1
|
Zhang B, Wang H, Ma C, Huang H, Fang Z, Qu J. LDAGM: prediction lncRNA-disease asociations by graph convolutional auto-encoder and multilayer perceptron based on multi-view heterogeneous networks. BMC Bioinformatics 2024; 25:332. [PMID: 39407120 PMCID: PMC11481433 DOI: 10.1186/s12859-024-05950-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2024] [Accepted: 10/01/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) can prevent, diagnose, and treat a variety of complex human diseases, and it is crucial to establish a method to efficiently predict lncRNA-disease associations. RESULTS In this paper, we propose a prediction method for the lncRNA-disease association relationship, named LDAGM, which is based on the Graph Convolutional Autoencoder and Multilayer Perceptron model. The method first extracts the functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, as well as the semantic similarity and Gaussian interaction profile kernel similarity of diseases. It then constructs six homogeneous networks and deeply fuses them using a deep topology feature extraction method. The fused networks facilitate feature complementation and deep mining of the original association relationships, capturing the deep connections between nodes. Next, by combining the obtained deep topological features with the similarity network of lncRNA, disease, and miRNA interactions, we construct a multi-view heterogeneous network model. The Graph Convolutional Autoencoder is employed for nonlinear feature extraction. Finally, the extracted nonlinear features are combined with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of the lncRNA-disease pair. Prediction of the lncRNA-disease association relationship is performed using the Multilayer Perceptron model. To enhance the performance and stability of the Multilayer Perceptron model, we introduce a hidden layer called the aggregation layer in the Multilayer Perceptron model. Through a gate mechanism, it controls the flow of information between each hidden layer in the Multilayer Perceptron model, aiming to achieve optimal feature extraction from each hidden layer. CONCLUSIONS Parameter analysis, ablation studies, and comparison experiments verified the effectiveness of this method, and case studies verified the accuracy of this method in predicting lncRNA-disease association relationships.
Collapse
Grants
- No. 62172123 National Natural Science Foundation, China
- No. 62172123 National Natural Science Foundation, China
- No. 62172123 National Natural Science Foundation, China
- No. 62172123 National Natural Science Foundation, China
- No. 62172123 National Natural Science Foundation, China
- No. 62172123 National Natural Science Foundation, China
- Grant No. 2022ZX01A36 the Key Research and Development Program of Heilongjiang
- Grant No. 2022ZX01A36 the Key Research and Development Program of Heilongjiang
- Grant No. 2022ZX01A36 the Key Research and Development Program of Heilongjiang
- Grant No. 2022ZX01A36 the Key Research and Development Program of Heilongjiang
- Grant No. 2022ZX01A36 the Key Research and Development Program of Heilongjiang
- Grant No. 2022ZX01A36 the Key Research and Development Program of Heilongjiang
- No. ZY20B11 the Special projects for the central government to guide the development of local science and technology, China
- No. ZY20B11 the Special projects for the central government to guide the development of local science and technology, China
- No. ZY20B11 the Special projects for the central government to guide the development of local science and technology, China
- No. ZY20B11 the Special projects for the central government to guide the development of local science and technology, China
- No. ZY20B11 the Special projects for the central government to guide the development of local science and technology, China
- No. ZY20B11 the Special projects for the central government to guide the development of local science and technology, China
- No. CXRC20221104236 the Harbin Manufacturing Technology Innovation Talent Project
- No. CXRC20221104236 the Harbin Manufacturing Technology Innovation Talent Project
- No. CXRC20221104236 the Harbin Manufacturing Technology Innovation Talent Project
- No. CXRC20221104236 the Harbin Manufacturing Technology Innovation Talent Project
- No. CXRC20221104236 the Harbin Manufacturing Technology Innovation Talent Project
- No. CXRC20221104236 the Harbin Manufacturing Technology Innovation Talent Project
Collapse
Affiliation(s)
- Bing Zhang
- Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
| | - Haoyu Wang
- Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China.
| | - Chao Ma
- Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
| | - Hai Huang
- Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
| | - Zhou Fang
- Cyberspace Research Center, Harbin, 150001, Heilongjiang province, China
| | - Jiaxing Qu
- Cyberspace Research Center, Harbin, 150001, Heilongjiang province, China
| |
Collapse
|
2
|
Xie G, Xie W, Gu G, Lin Z, Chen R, Liu S, Yu J. A vector projection similarity-based method for miRNA-disease association prediction. Anal Biochem 2024; 687:115431. [PMID: 38123111 DOI: 10.1016/j.ab.2023.115431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2023] [Revised: 12/06/2023] [Accepted: 12/15/2023] [Indexed: 12/23/2023]
Abstract
[S U M M A R Y] Many miRNA-disease association prediction models incorporate Gaussian interaction profile kernel similarity (GIPS). However, the GIPS fails to consider the specificity of the miRNA-disease association matrix, where matrix elements with a value of 0 represent miRNA and disease relationships that have not been discovered yet. To address this issue and better account for the impact of known and unknown miRNA-disease associations on similarity, we propose a method called vector projection similarity-based method for miRNA-disease association prediction (VPSMDA). In VPSMDA, we introduce three projection rules and combined with logistic functions for the miRNA-disease association matrix and propose a vector projection similarity measure for miRNAs and diseases. By integrating the vector projection similarity matrix with the original one, we obtain the improved miRNA and disease similarity matrix. Additionally, we construct a weight matrix using different numbers of neighbors to reduce the noise in the similarity matrix. In performance evaluation, both LOOCV and 5-fold CV experiments demonstrate that VPSMDA outperforms seven other state-of-the-art methods in AUC. Furthermore, in a case study, VPSMDA successfully predicted 10, 9, and 10 out of the top 10 associations for three important human diseases, respectively, and these predictions were confirmed by recent biomedical resources.
Collapse
Affiliation(s)
- Guobo Xie
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| | - Weijie Xie
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| | - Guosheng Gu
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China.
| | - Zhiyi Lin
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China.
| | - Ruibin Chen
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| | - Shigang Liu
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| | - Junrui Yu
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| |
Collapse
|
3
|
Zhang W, Liu B. iSnoDi-MDRF: Identifying snoRNA-Disease Associations Based on Multiple Biological Data by Ranking Framework. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3013-3019. [PMID: 37030816 DOI: 10.1109/tcbb.2023.3258448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Accumulating evidence indicates that the dysregulation of small nucleolar RNAs (snoRNAs) is relevant with diseases. Identifying snoRNA-disease associations by computational methods is desired for biologists, which can save considerable costs and time compared biological experiments. However, it still faces some challenges as followings: (i) Many snoRNAs are detected in recent years, but only a few snoRNAs have been proved to be associated with diseases; (ii) Computational predictors trained with only a few known snoRNA-disease associations fail to accurately identify the snoRNA-disease associations. In this study, we propose a ranking framework, called iSnoDi-MDRF, to identify potential snoRNA-disease associations based on multiple biological data, which has the following highlights: (i) iSnoDi-MDRF integrates ranking framework, which is not only able to identify potential associations between known snoRNAs and diseases, but also can identify diseases associated with new snoRNAs. (ii) Known gene-disease associations are employed to help train a mature model for predicting snoRNA-disease association. Experimental results illustrate that iSnoDi-MDRF is very suitable for identifying potential snoRNA-disease associations. The web server of iSnoDi-MDRF predictor is freely available at http://bliulab.net/iSnoDi-MDRF/.
Collapse
|
4
|
Zhu W, Yuan SS, Li J, Huang CB, Lin H, Liao B. A First Computational Frame for Recognizing Heparin-Binding Protein. Diagnostics (Basel) 2023; 13:2465. [PMID: 37510209 PMCID: PMC10377868 DOI: 10.3390/diagnostics13142465] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2023] [Revised: 07/13/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023] Open
Abstract
Heparin-binding protein (HBP) is a cationic antibacterial protein derived from multinuclear neutrophils and an important biomarker of infectious diseases. The correct identification of HBP is of great significance to the study of infectious diseases. This work provides the first HBP recognition framework based on machine learning to accurately identify HBP. By using four sequence descriptors, HBP and non-HBP samples were represented by discrete numbers. By inputting these features into a support vector machine (SVM) and random forest (RF) algorithm and comparing the prediction performances of these methods on training data and independent test data, it is found that the SVM-based classifier has the greatest potential to identify HBP. The model could produce an auROC of 0.981 ± 0.028 on training data using 10-fold cross-validation and an overall accuracy of 95.0% on independent test data. As the first model for HBP recognition, it will provide some help for infectious diseases and stimulate further research in related fields.
Collapse
Affiliation(s)
- Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
| | - Shi-Shi Yuan
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Jian Li
- School of Basic Medical Sciences, Chengdu University, Chengdu 610106, China
| | - Cheng-Bing Huang
- School of Computer Science and Technology, ABa Teachers University, Chengdu 623002, China
| | - Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
| |
Collapse
|
5
|
Shi H, Wu C, Bai T, Chen J, Li Y, Wu H. Identify essential genes based on clustering based synthetic minority oversampling technique. Comput Biol Med 2023; 153:106523. [PMID: 36652869 DOI: 10.1016/j.compbiomed.2022.106523] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2022] [Revised: 12/13/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
Prediction of essential genes in a life organism is one of the central tasks in synthetic biology. Computational predictors are desired because experimental data is often unavailable. Recently, some sequence-based predictors have been constructed to identify essential genes. However, their predictive performance should be further improved. One key problem is how to effectively extract the sequence-based features, which are able to discriminate the essential genes. Another problem is the imbalanced training set. The amount of essential genes in human cell lines is lower than that of non-essential genes. Therefore, predictors trained with such imbalanced training set tend to identify an unseen sequence as a non-essential gene. Here, a new over-sampling strategy was proposed called Clustering based Synthetic Minority Oversampling Technique (CSMOTE) to overcome the imbalanced data issue. Combining CSMOTE with the Z curve, the global features, and Support Vector Machines, a new protocol called iEsGene-CSMOTE was proposed to identify essential genes. The rigorous jackknife cross validation results indicated that iEsGene-CSMOTE is better than the other competing methods. The proposed method outperformed λ-interval Z curve by 35.48% and 11.25% in terms of Sn and BACC, respectively.
Collapse
Affiliation(s)
- Hua Shi
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China.
| | - Chenjin Wu
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China.
| | - Tao Bai
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; School of Mathematics & Computer Science, Yanan University, Shanxi, 716000, China.
| | - Jiahai Chen
- Xiamen Sankuai Online Technology Co., Ltd, Xiamen, China.
| | - Yan Li
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China.
| | - Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
| |
Collapse
|
6
|
Liang Q, Zhang W, Wu H, Liu B. LncRNA-disease association identification using graph auto-encoder and learning to rank. Brief Bioinform 2023; 24:6955271. [PMID: 36545805 DOI: 10.1093/bib/bbac539] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2022] [Revised: 10/18/2022] [Accepted: 11/08/2022] [Indexed: 12/24/2022] Open
Abstract
Discovering the relationships between long non-coding RNAs (lncRNAs) and diseases is significant in the treatment, diagnosis and prevention of diseases. However, current identified lncRNA-disease associations are not enough because of the expensive and heavy workload of wet laboratory experiments. Therefore, it is greatly important to develop an efficient computational method for predicting potential lncRNA-disease associations. Previous methods showed that combining the prediction results of the lncRNA-disease associations predicted by different classification methods via Learning to Rank (LTR) algorithm can be effective for predicting potential lncRNA-disease associations. However, when the classification results are incorrect, the ranking results will inevitably be affected. We propose the GraLTR-LDA predictor based on biological knowledge graphs and ranking framework for predicting potential lncRNA-disease associations. Firstly, homogeneous graph and heterogeneous graph are constructed by integrating multi-source biological information. Then, GraLTR-LDA integrates graph auto-encoder and attention mechanism to extract embedded features from the constructed graphs. Finally, GraLTR-LDA incorporates the embedded features into the LTR via feature crossing statistical strategies to predict priority order of diseases associated with query lncRNAs. Experimental results demonstrate that GraLTR-LDA outperforms the other state-of-the-art predictors and can effectively detect potential lncRNA-disease associations. Availability and implementation: Datasets and source codes are available at http://bliulab.net/GraLTR-LDA.
Collapse
Affiliation(s)
- Qi Liang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.,Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
| |
Collapse
|
7
|
Gu X, Ding Y, Xiao P, He T. A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins. Front Genet 2022; 13:935717. [PMID: 36506312 PMCID: PMC9727185 DOI: 10.3389/fgene.2022.935717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2022] [Accepted: 11/02/2022] [Indexed: 11/24/2022] Open
Abstract
There is a great deal of importance to SNARE proteins, and their absence from function can lead to a variety of diseases. The SNARE protein is known as a membrane fusion protein, and it is crucial for mediating vesicle fusion. The identification of SNARE proteins must therefore be conducted with an accurate method. Through extensive experiments, we have developed a model based on graph-regularized k-local hyperplane distance nearest neighbor model (GHKNN) binary classification. In this, the model uses the physicochemical property extraction method to extract protein sequence features and the SMOTE method to upsample protein sequence features. The combination achieves the most accurate performance for identifying all protein sequences. Finally, we compare the model based on GHKNN binary classification with other classifiers and measure them using four different metrics: SN, SP, ACC, and MCC. In experiments, the model performs significantly better than other classifiers.
Collapse
Affiliation(s)
- Xingyue Gu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Pengfeng Xiao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Tao He
- Beidahuang Industry Group General Hospital, Harbin, China
| |
Collapse
|
8
|
Yu L, Ju B, Ren S. HLGNN-MDA: Heuristic Learning Based on Graph Neural Networks for miRNA-Disease Association Prediction. Int J Mol Sci 2022; 23:13155. [PMID: 36361945 PMCID: PMC9657597 DOI: 10.3390/ijms232113155] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2022] [Revised: 10/23/2022] [Accepted: 10/26/2022] [Indexed: 01/12/2024] Open
Abstract
Identifying disease-related miRNAs can improve the understanding of complex diseases. However, experimentally finding the association between miRNAs and diseases is expensive in terms of time and resources. The computational screening of reliable miRNA-disease associations has thus become a necessary tool to guide biological experiments. "Similar miRNAs will be associated with the same disease" is the assumption on which most current miRNA-disease association prediction methods rely; however, biased prior knowledge, and incomplete and inaccurate miRNA similarity data and disease similarity data limit the performance of the model. Here, we propose heuristic learning based on graph neural networks to predict microRNA-disease associations (HLGNN-MDA). We learn the local graph topology features of the predicted miRNA-disease node pairs using graph neural networks. In particular, our improvements to the graph convolution layer of the graph neural network enable it to learn information among homogeneous nodes and among heterogeneous nodes. We illustrate the performance of HLGNN-MDA by performing tenfold cross-validation against excellent baseline models. The results show that we have promising performance in multiple metrics. We also focus on the role of the improvements to the graph convolution layer in the model. The case studies are supported by evidence on breast cancer, hepatocellular carcinoma and renal cell carcinoma. Given the above, the experiments demonstrate that HLGNN-MDA can serve as a reliable method to identify novel miRNA-disease associations.
Collapse
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China
| | | | | |
Collapse
|
9
|
Duan T, Kuang Z, Deng L. SVMMDR: Prediction of miRNAs-drug resistance using support vector machines based on heterogeneous network. Front Oncol 2022; 12:987609. [PMID: 36338674 PMCID: PMC9632662 DOI: 10.3389/fonc.2022.987609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Accepted: 09/14/2022] [Indexed: 11/21/2022] Open
Abstract
In recent years, the miRNA is considered as a potential high-value therapeutic target because of its complex and delicate mechanism of gene regulation. The abnormal expression of miRNA can cause drug resistance, affecting the therapeutic effect of the disease. Revealing the associations between miRNAs-drug resistance can help in the design of effective drugs or possible drug combinations. However, current conventional experiments for identification of miRNAs-drug resistance are time-consuming and high-cost. Therefore, it’s of pretty realistic value to develop an accurate and efficient computational method to predicting miRNAs-drug resistance. In this paper, a method based on the Support Vector Machines (SVM) to predict the association between MiRNA and Drug Resistance (SVMMDR) is proposed. The SVMMDR integrates miRNAs-drug resistance association, miRNAs sequence similarity, drug chemical structure similarity and other similarities, extracts path-based Hetesim features, and obtains inclined diffusion feature through restart random walk. By combining the multiple feature, the prediction score between miRNAs and drug resistance is obtained based on the SVM. The innovation of the SVMMDR is that the inclined diffusion feature is obtained by inclined restart random walk, the node information and path information in heterogeneous network are integrated, and the SVM is used to predict potential miRNAs-drug resistance associations. The average AUC of SVMMDR obtained is 0.978 in 10-fold cross-validation.
Collapse
|
10
|
Li M, Fan Y, Zhang Y, Lv Z. Using Sequence Similarity Based on CKSNP Features and a Graph Neural Network Model to Identify miRNA-Disease Associations. Genes (Basel) 2022; 13:1759. [PMID: 36292644 PMCID: PMC9602123 DOI: 10.3390/genes13101759] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 01/12/2024] Open
Abstract
Among many machine learning models for analyzing the relationship between miRNAs and diseases, the prediction results are optimized by establishing different machine learning models, and less attention is paid to the feature information contained in the miRNA sequence itself. This study focused on the impact of the different feature information of miRNA sequences on the relationship between miRNA and disease. It was found that when the graph neural network used was the same and the miRNA features based on the K-spacer nucleic acid pair composition (CKSNAP) feature were adopted, a better graph neural network prediction model of miRNA-disease relationship could be built (AUC = 93.71%), which was 0.15% greater than the best model in the literature based on the same benchmark dataset. The optimized model was also used to predict miRNAs related to lung tumors, esophageal tumors, and kidney tumors, and 47, 47, and 37 of the top 50 miRNAs related to three diseases predicted separately by the model were consistent with descriptions in the wet experiment validation database (dbDEMC).
Collapse
Affiliation(s)
- Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Yu Fan
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Yiting Zhang
- College of Biology, Southwest Jiaotong University, Chengdu 611756, China
- College of Biology, Georgia State University, Atlanta, GA 30302-3965, USA
| | - Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| |
Collapse
|
11
|
Zhang W, Hou J, Liu B. iPiDA-LTR: Identifying piwi-interacting RNA-disease associations based on Learning to Rank. PLoS Comput Biol 2022; 18:e1010404. [PMID: 35969645 PMCID: PMC9410559 DOI: 10.1371/journal.pcbi.1010404] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2022] [Revised: 08/25/2022] [Accepted: 07/18/2022] [Indexed: 12/01/2022] Open
Abstract
Piwi-interacting RNAs (piRNAs) are regarded as drug targets and biomarkers for the diagnosis and therapy of diseases. However, biological experiments cost substantial time and resources, and the existing computational methods only focus on identifying missing associations between known piRNAs and diseases. With the fast development of biological experiments, more and more piRNAs are detected. Therefore, the identification of piRNA-disease associations of newly detected piRNAs has significant theoretical value and practical significance on pathogenesis of diseases. In this study, the iPiDA-LTR predictor is proposed to identify associations between piRNAs and diseases based on Learning to Rank. The iPiDA-LTR predictor not only identifies the missing associations between known piRNAs and diseases, but also detects diseases associated with newly detected piRNAs. Experimental results demonstrate that iPiDA-LTR effectively predicts piRNA-disease associations outperforming the other related methods.
Collapse
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Jialu Hou
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
| |
Collapse
|
12
|
Chen D, Li Y. PredMHC: An Effective Predictor of Major Histocompatibility Complex Using Mixed Features. Front Genet 2022; 13:875112. [PMID: 35547252 PMCID: PMC9081368 DOI: 10.3389/fgene.2022.875112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2022] [Accepted: 03/07/2022] [Indexed: 12/03/2022] Open
Abstract
The major histocompatibility complex (MHC) is a large locus on vertebrate DNA that contains a tightly linked set of polymorphic genes encoding cell surface proteins essential for the adaptive immune system. The groups of proteins encoded in the MHC play an important role in the adaptive immune system. Therefore, the accurate identification of the MHC is necessary to understand its role in the adaptive immune system. An effective predictor called PredMHC is established in this study to identify the MHC from protein sequences. Firstly, PredMHC encoded a protein sequence with mixed features including 188D, APAAC, KSCTriad, CKSAAGP, and PAAC. Secondly, three classifiers including SGD, SMO, and random forest were trained on the mixed features of the protein sequence. Finally, the prediction result was obtained by the voting of the three classifiers. The experimental results of the 10-fold cross-validation test in the training dataset showed that PredMHC can obtain 91.69% accuracy. Experimental results on comparison with other features, classifiers, and existing methods showed the effectiveness of PredMHC in predicting the MHC.
Collapse
Affiliation(s)
- Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| | - Yanjuan Li
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| |
Collapse
|
13
|
Lv H, Yan K, Guo Y, Zou Q, Hesham AEL, Liu B. AMPpred-EL: An effective antimicrobial peptide prediction model based on ensemble learning. Comput Biol Med 2022; 146:105577. [PMID: 35576825 DOI: 10.1016/j.compbiomed.2022.105577] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Revised: 04/22/2022] [Accepted: 04/28/2022] [Indexed: 11/15/2022]
Abstract
Antimicrobial peptides (AMPs) are important for the human immune system and are currently applied in clinical trials. AMPs have been received much attention for accurate recognition. Recently, several computational methods for identifying AMPs have been proposed. However, existing methods have difficulty in accurately predicting AMPs. In this paper, we propose a novel AMP prediction method called AMPpred-EL based on an ensemble learning strategy. AMPred-EL is constructed based on ensemble learning combined with LightGBM and logistic regression. Experimental results demonstrate that AMPpred-EL outperforms several state-of-the-art methods on the benchmark datasets and then improves the efficiency performance.
Collapse
Affiliation(s)
- Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
| | - Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
| | - Abd El-Latif Hesham
- Genetics Department, Faculty of Agriculture, Beni-Suef University, Beni-Suef, 62511, Egypt.
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China.
| |
Collapse
|
14
|
Cai L, Gao M, Ren X, Fu X, Xu J, Wang P, Chen Y. MILNP: Plant lncRNA-miRNA Interaction Prediction Based on Improved Linear Neighborhood Similarity and Label Propagation. FRONTIERS IN PLANT SCIENCE 2022; 13:861886. [PMID: 35401586 PMCID: PMC8990282 DOI: 10.3389/fpls.2022.861886] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Knowledge of the interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) is the basis of understanding various biological activities and designing new drugs. Previous computational methods for predicting lncRNA-miRNA interactions lacked for plants, and they suffer from various limitations that affect the prediction accuracy and their applicability. Research on plant lncRNA-miRNA interactions is still in its infancy. In this paper, we propose an accurate predictor, MILNP, for predicting plant lncRNA-miRNA interactions based on improved linear neighborhood similarity measurement and linear neighborhood propagation algorithm. Specifically, we propose a novel similarity measure based on linear neighborhood similarity from multiple similarity profiles of lncRNAs and miRNAs and derive more precise neighborhood ranges so as to escape the limits of the existing methods. We then simultaneously update the lncRNA-miRNA interactions predicted from both similarity matrices based on label propagation. We comprehensively evaluate MILNP on the latest plant lncRNA-miRNA interaction benchmark datasets. The results demonstrate the superior performance of MILNP than the most up-to-date methods. What's more, MILNP can be leveraged for isolated plant lncRNAs (or miRNAs). Case studies suggest that MILNP can identify novel plant lncRNA-miRNA interactions, which are confirmed by classical tools. The implementation is available on https://github.com/HerSwain/gra/tree/MILNP.
Collapse
Affiliation(s)
| | | | | | - Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | | | - Peng Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | | |
Collapse
|
15
|
Han YM, Yang H, Huang QL, Sun ZJ, Li ML, Zhang JB, Deng KJ, Chen S, Lin H. Risk prediction of diabetes and pre-diabetes based on physical examination data. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:3597-3608. [PMID: 35341266 DOI: 10.3934/mbe.2022166] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Diabetes is a metabolic disorder caused by insufficient insulin secretion and insulin secretion disorders. From health to diabetes, there are generally three stages: health, pre-diabetes and type 2 diabetes. Early diagnosis of diabetes is the most effective way to prevent and control diabetes and its complications. In this work, we collected the physical examination data from Beijing Physical Examination Center from January 2006 to December 2017, and divided the population into three groups according to the WHO (1999) Diabetes Diagnostic Standards: normal fasting plasma glucose (NFG) (FPG < 6.1 mmol/L), mildly impaired fasting plasma glucose (IFG) (6.1 mmol/L ≤ FPG < 7.0 mmol/L) and type 2 diabetes (T2DM) (FPG > 7.0 mmol/L). Finally, we obtained1,221,598 NFG samples, 285,965 IFG samples and 387,076 T2DM samples, with a total of 15 physical examination indexes. Furthermore, taking eXtreme Gradient Boosting (XGBoost), random forest (RF), Logistic Regression (LR), and Fully connected neural network (FCN) as classifiers, four models were constructed to distinguish NFG, IFG and T2DM. The comparison results show that XGBoost has the best performance, with AUC (macro) of 0.7874 and AUC (micro) of 0.8633. In addition, based on the XGBoost classifier, three binary classification models were also established to discriminate NFG from IFG, NFG from T2DM, IFG from T2DM. On the independent dataset, the AUCs were 0.7808, 0.8687, 0.7067, respectively. Finally, we analyzed the importance of the features and identified the risk factors associated with diabetes.
Collapse
Affiliation(s)
- Yu-Mei Han
- Beijing Physical Examination Center, Beijing, China
| | - Hui Yang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Qin-Lai Huang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Jie Sun
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | | | | | - Ke-Jun Deng
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shuo Chen
- Beijing Physical Examination Center, Beijing, China
| | - Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
Collapse
|
16
|
Lin C, Wang L, Shi L. AAPred-CNN: accurate predictor based on deep convolution neural network for identification of anti-angiogenic peptides. Methods 2022; 204:442-448. [PMID: 35031486 DOI: 10.1016/j.ymeth.2022.01.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Revised: 12/28/2021] [Accepted: 01/09/2022] [Indexed: 12/13/2022] Open
Abstract
Recently, deep learning techniques have been developed for various bioactive peptide prediction tasks. However, there are only conventional machine learning-based methods for the prediction of anti-angiogenic peptides (AAP), which play an important role in cancer treatment. The main reason why no deep learning method has been involved in this field is that there are too few experimentally validated AAPs to support the training of deep models but researchers have believed that deep learning seriously depends on the amounts of labeled data. In this paper, as a tentative work, we try to predict AAP by constructing different classical deep learning models and propose the first deep convolution neural network-based predictor (AAPred-CNN) for AAP. Contrary to intuition, the experimental results show that deep learning models can achieve superior or comparable performance to the state-of-the-art model, although they are given a few labeled sequences to train. We also decipher the influence of hyper-parameters and training samples on the performance of deep learning models to help understand how the model work. Furthermore, we also visualize the learned embeddings by dimension reduction to increase the model interpretability and reveal the residue propensity of AAP through the statistics of convolutional features for different residues. In summary, this work demonstrates the powerful representation ability of AAPred-CNNfor AAP prediction, further improving the prediction accuracy of AAP.
Collapse
Affiliation(s)
- Changhang Lin
- School of Big Data and Artificial Intelligence, Fujian Polytechnic Normal University, Fuzhou, China
| | - Lei Wang
- Beidahuang Industry Group General Hospital, Harbin, China.
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China.
| |
Collapse
|
17
|
Luo J, Bao Y, Chen X, Shen C. Metapath-Based Deep Convolutional Neural Network for Predicting miRNA-Target Association on Heterogeneous Network. Interdiscip Sci 2021; 13:547-558. [PMID: 34170473 DOI: 10.1007/s12539-021-00454-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Revised: 06/17/2021] [Accepted: 06/17/2021] [Indexed: 06/13/2023]
Abstract
Predicting the interactions between microRNAs (miRNAs) and target genes is of great significance for understanding the regulatory mechanism of miRNA and treating complex diseases. The emergence of large-scale, heterogeneous biological networks has offered unprecedented opportunities for revealing miRNA-associated target genes. However, there are still some limitations about automatically learn the feature information of the network in the existing methods. Since network representation learning can self-adaptively capture structure information of the network, we propose a framework based on heterogeneous network representation, MDCNN (Metapath-Based Deep Convolutional Neural Network), to predict the associations between miRNAs and target genes. MDCNN samples the paths between the node pairs in the form of meta-path based on the heterogeneous information network (HIN) about miRNAs and target genes. Then the node feature and the path feature which is learned by the Deep Convolutional Neural Network (DCNN) are spliced together as the representation of the miRNA-target gene, to predict the miRNA-target gene interactions. The experiment results indicate that the performance of MDCNN outperforms other methods in multiple validation metrics by fivefold cross validation. We set an ablation study to identify the necessity of miRNA similarity and target gene similarity for improving the prediction ability of MDCNN. The case studies on hsa-miR-26b-5p and CDKN1A further demonstrates that MDCNN can successfully predict potential miRNA-target gene interactions.
Collapse
Affiliation(s)
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
| | - Yaoting Bao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
| | - Xiangtao Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China.
| | - Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
| |
Collapse
|
18
|
Xu H, Zhao B, Zhong W, Teng P, Qiao H. Identification of miRNA Signature Associated With Erectile Dysfunction in Type 2 Diabetes Mellitus by Support Vector Machine-Recursive Feature Elimination. Front Genet 2021; 12:762136. [PMID: 34707644 PMCID: PMC8542849 DOI: 10.3389/fgene.2021.762136] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2021] [Accepted: 09/22/2021] [Indexed: 01/10/2023] Open
Abstract
Diabetic mellitus erectile dysfunction (DMED) is one of the most common complications of diabetes mellitus (DM), which seriously affects the self-esteem and quality of life of diabetics. MicroRNAs (miRNAs) are endogenous non-coding RNAs whose expression levels can affect multiple cellular processes. Many pieces of studies have demonstrated that miRNA plays a role in the occurrence and development of DMED. However, the exact mechanism of this process is unclear. Hence, we apply miRNA sequencing from blood samples of 10 DMED patients and 10 DM controls to study the mechanisms of miRNA interactions in DMED patients. Firstly, we found four characteristic miRNAs as signature by the SVM-RFE method (hsa-let-7E-5p, hsa-miR-30 days-5p, hsa-miR-199b-5p, and hsa-miR-342–3p), called DMEDSig-4. Subsequently, we correlated DMEDSig-4 with clinical factors and further verified the ability of these miRNAs to classify samples. Finally, we functionally verified the relationship between DMEDSig-4 and DMED by pathway enrichment analysis of miRNA and its target genes. In brief, our study found four key miRNAs, which may be the key influencing factors of DMED. Meanwhile, the DMEDSig-4 could help in the development of new therapies for DMED.
Collapse
Affiliation(s)
- Haibo Xu
- The Second Affiliated Hospital of Harbin Medical University, Harbin, China.,The First Hospital of Qiqihar, Qiqihar, China
| | - Baoyin Zhao
- The First Hospital of Qiqihar, Qiqihar, China
| | - Wei Zhong
- The First Hospital of Qiqihar, Qiqihar, China
| | - Peng Teng
- The First Hospital of Qiqihar, Qiqihar, China
| | - Hong Qiao
- The Second Affiliated Hospital of Harbin Medical University, Harbin, China
| |
Collapse
|
19
|
Guo L, Shi K, Wang L. MLPMDA: Multi-layer linear projection for predicting miRNA-disease association. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106718] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|