1
Li X, Qu W, Yan J, Tan J. RPI-EDLCN: An Ensemble Deep Learning Framework Based on Capsule Network for ncRNA-Protein Interaction Prediction. J Chem Inf Model 2024; 64:2221-2235. [PMID: 37158609 DOI: 10.1021/acs.jcim.3c00377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
Noncoding RNAs (ncRNAs) play crucial roles in many cellular life activities by interacting with proteins. Identification of ncRNA-protein interactions (ncRPIs) is key to understanding the function of ncRNAs. Although a number of computational methods for predicting ncRPIs have been developed, the problem of predicting ncRPIs remains challenging. It has always been the focus of ncRPIs research to select suitable feature extraction methods and develop a deep learning architecture with better recognition performance. In this work, we proposed an ensemble deep learning framework, RPI-EDLCN, based on a capsule network (CapsuleNet) to predict ncRPIs. In terms of feature input, we extracted the sequence features, secondary structure sequence features, motif information, and physicochemical properties of ncRNA/protein. The sequence and secondary structure sequence features of ncRNA/protein are encoded by the conjoint k-mer method and then input into an ensemble deep learning model based on CapsuleNet by combining the motif information and physicochemical properties. In this model, the encoding features are processed by convolution neural network (CNN), deep neural network (DNN), and stacked autoencoder (SAE). Then the advanced features obtained from the processing are input into the CapsuleNet for further feature learning. Compared with other state-of-the-art methods under 5-fold cross-validation, the performance of RPI-EDLCN is the best, and the accuracy of RPI-EDLCN on RPI1807, RPI2241, and NPInter v2.0 data sets was 93.8%, 88.2%, and 91.9%, respectively. The results of the independent test indicated that RPI-EDLCN can effectively predict potential ncRPIs in different organisms. In addition, RPI-EDLCN successfully predicted hub ncRNAs and proteins in Mus musculus ncRNA-protein networks. Overall, our model can be used as an effective tool to predict ncRPIs and provides some useful guidance for future biological studies.
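The k-mer encoding step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "conjoint k-mer" means concatenating normalized k-mer frequency vectors for k = 1..max_k, and that proteins are first mapped to the 7-group reduced amino-acid alphabet commonly used in conjoint-triad encodings. Neither detail is stated in the abstract, and the function names (`kmer_frequencies`, `conjoint_kmer`, `encode_rna`, `encode_protein`) are hypothetical.

```python
from itertools import product

# Assumption: 7-group reduced amino-acid alphabet from conjoint-triad style
# encodings; the abstract does not say which grouping RPI-EDLCN uses.
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, grp in enumerate(AA_GROUPS) for aa in grp}


def kmer_frequencies(seq, alphabet, k):
    """Normalized counts of every k-mer over `alphabet`, in lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing unknown symbols are skipped
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def conjoint_kmer(seq, alphabet, max_k):
    """Concatenate the k-mer frequency vectors for k = 1..max_k."""
    features = []
    for k in range(1, max_k + 1):
        features.extend(kmer_frequencies(seq, alphabet, k))
    return features


def encode_rna(seq, max_k=4):
    """Encode an RNA sequence (T is treated as U)."""
    return conjoint_kmer(seq.upper().replace("T", "U"), "ACGU", max_k)


def encode_protein(seq, max_k=3):
    """Encode a protein after mapping residues to the 7-group alphabet."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in seq.upper())
    return conjoint_kmer(reduced, "0123456", max_k)
```

With these defaults an RNA vector has 4 + 16 + 64 + 256 = 340 entries and a protein vector has 7 + 49 + 343 = 399 entries. The values of `max_k` are illustrative choices, not the paper's settings.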
Affiliation(s)
- Xiaoyi Li
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Wenyan Qu
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Jing Yan
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Jianjun Tan
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
2
Peng Y, Wang Y, Wen Z, Xiang H, Guo L, Su L, He Y, Pang H, Zhou P, Zhan X. Deep learning and machine learning predictive models for neurological function after interventional embolization of intracranial aneurysms. Front Neurol 2024; 15:1321923. [PMID: 38327618 PMCID: PMC10848172 DOI: 10.3389/fneur.2024.1321923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 01/08/2024] [Indexed: 02/09/2024] Open
Abstract
Objective The objective of this study is to develop a model that predicts the postoperative Hunt-Hess grade in patients with intracranial aneurysms by integrating radiomics and deep learning technologies, using preoperative CTA imaging data, thereby assisting clinical decision-making and improving the assessment and prognosis of postoperative neurological function. Methods This retrospective study encompassed 101 patients who underwent aneurysm embolization surgery. A total of 851 radiomic features were extracted from CTA images, and 512 deep learning features were extracted from the last layer of a ResNet50 deep convolutional neural network. The feature screening pipeline encompassed intraclass correlation coefficient analysis, principal component analysis, the U test, Spearman correlation analysis, the minimum redundancy maximum relevance algorithm, and Lasso regression, to identify the features most correlated with postoperative Hunt-Hess grading. In the model construction phase, three distinct models were constructed: a radiomics feature-based model (RSM), a deep learning feature-based model (DLM), and a deep learning-radiomics feature fusion model (DLRSCM). The study also calculated the radiomics score and combined it with clinical data to construct a nomogram for predictive modeling. The DLM, RSM, and DLRSCM models were each constructed with 9 base algorithms and 1 ensemble learning algorithm, a stacking ensemble model. Model performance was evaluated based on the area under the receiver operating characteristic (ROC) curve (AUC), the Matthews correlation coefficient (MCC), calibration curves, and decision curve analysis. Results Five significant radiomic features and four significant deep learning features were obtained through the feature selection process and were used for model construction. The bootstrap resampling method was used for internal validation of the models. In the DLM model, the stacking ensemble algorithm achieved an AUC of 0.959 and an MCC of 0.815. In the RSM model, the stacking ensemble model achieved an AUC of 0.935 and an MCC of 0.793. The stacking ensemble model in DLRSCM outperformed the others, with an AUC of 0.968 and an MCC of 0.820. The results indicated that the ANN performed best among all base models, while the stacking ensemble model exhibited the highest predictive performance. Conclusion This study demonstrates that the combination of radiomics and deep learning is an effective approach to predict the postoperative Hunt-Hess grade in patients with intracranial aneurysms. This holds significant value in the early identification of postoperative neurological complications and in enhancing clinical decision-making.
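For readers unfamiliar with the MCC reported in this abstract, the following minimal sketch computes it from binary labels. It assumes a two-class outcome coded as 0/1, which the abstract does not spell out, and the helper names `confusion_counts` and `mcc` are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and unlike accuracy it uses all four cells of the confusion matrix, which makes it informative on small or imbalanced cohorts.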
Affiliation(s)
- Yan Peng
  - Department of Interventional Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Yiren Wang
  - School of Nursing, Southwest Medical University, Luzhou, China
  - Wound Healing Basic Research and Clinical Application Key Laboratory of Luzhou, Southwest Medical University, Luzhou, China
- Zhongjian Wen
  - School of Nursing, Southwest Medical University, Luzhou, China
  - Wound Healing Basic Research and Clinical Application Key Laboratory of Luzhou, Southwest Medical University, Luzhou, China
- Hongli Xiang
  - School of Nursing, Southwest Medical University, Luzhou, China
  - Wound Healing Basic Research and Clinical Application Key Laboratory of Luzhou, Southwest Medical University, Luzhou, China
- Ling Guo
  - Department of Oncology, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Lei Su
  - School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
- Yongcheng He
  - Department of Pharmacy, Sichuan Agriculture University, Chengdu, China
- Haowen Pang
  - Department of Oncology, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Ping Zhou
  - Wound Healing Basic Research and Clinical Application Key Laboratory of Luzhou, Southwest Medical University, Luzhou, China
  - Department of Nursing, The Affiliated Hospital of Southwest Medical University, Luzhou, China
  - Department of Radiology, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Xiang Zhan
  - Department of Radiology, The Affiliated Hospital of Southwest Medical University, Luzhou, China
3
Huiwen J, Kai S. Prediction of LncRNA-protein Interactions Using Auto-Encoder, SE-ResNet Models and Transfer Learning. Microrna 2024; 13:155-165. [PMID: 38591194 DOI: 10.2174/0122115366288068240322064431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 02/26/2024] [Accepted: 03/09/2024] [Indexed: 04/10/2024]
Abstract
BACKGROUND Long non-coding RNA (lncRNA) plays a crucial role in various biological processes, and mutations or imbalances of lncRNAs can lead to several diseases, including cancer, Prader-Willi syndrome, autism, Alzheimer's disease, cartilage-hair hypoplasia, and hearing loss. Understanding lncRNA-protein interactions (LPIs) is vital for elucidating basic cellular processes, human diseases, viral replication, transcription, and plant pathogen resistance. Despite the development of several LPI calculation methods, predicting LPI remains challenging, with the selection of variables and deep learning structure being the focus of LPI research. METHODS We propose a deep learning framework called AR-LPI, which extracts sequence and secondary structure features of proteins and lncRNAs. The framework utilizes an auto-encoder for feature extraction and employs SE-ResNet for prediction. Additionally, we apply transfer learning to the deep neural network SE-ResNet for predicting small-sample datasets. RESULTS Through comprehensive experimental comparison, we demonstrate that the AR-LPI architecture performs better in LPI prediction. Specifically, the accuracy of AR-LPI increases by 2.86% to 94.52%, while the F-value of AR-LPI increases by 2.71% to 94.73%. CONCLUSION Our experimental results show that the overall performance of AR-LPI is better than that of other LPI prediction tools.
Affiliation(s)
- Jiang Huiwen
  - School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong, China
- Song Kai
  - School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong, China
4
Gong L, Chen J, Cui X, Liu Y. RPIPCM: A deep network model for predicting lncRNA-protein interaction based on sequence feature encoding. Comput Biol Med 2023; 165:107366. [PMID: 37633089 DOI: 10.1016/j.compbiomed.2023.107366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 07/29/2023] [Accepted: 08/12/2023] [Indexed: 08/28/2023]
Abstract
LncRNA-protein interaction plays an important regulatory role in biological processes. In this paper, the proposed RPIPCM, based on a novel deep network model, uses the sequence feature encoding of both RNA and protein to predict lncRNA-protein interactions (LPIs). A sliding-window negative sampling method is proposed to address the imbalance between positive and negative samples. Comparative experiments show that the proposed negative sampling method is effective in addressing the data imbalance in existing LPI research. Experimental results also show that the proposed sequence feature encoding method performs well in predicting LPIs for datasets of different sizes and types. On the animal-related RPI488 dataset, compared with the direct original sequence encoding model, the accuracy of the sequence feature encoding model increased by 1.02%, the recall increased by 4.08%, and the MCC increased by 1.67%. On the plant dataset ATH948, the sequence feature-based encoding demonstrated a 1.58% higher accuracy, a 1.53% higher recall, a 1.62% higher specificity, a 1.62% higher precision, and a 3.16% higher MCC compared to the direct original sequence-based encoding. Compared with the latest prediction work on the ZEA22133 dataset, RPIPCM is shown to be more effective, with the accuracy increased by 2.23%, the recall increased by 1.78%, the specificity increased by 2.67%, the precision increased by 2.52%, and the MCC increased by 4.43%, which also demonstrates the effectiveness and robustness of RPIPCM. In conclusion, RPIPCM, a deep network model based on sequence feature encoding, can automatically mine the hidden feature information of the sequences in lncRNA-protein interactions without relying on external features or prior biomedical knowledge, and its low cost and high efficiency can provide a reference for biomedical researchers.
Affiliation(s)
- Lejun Gong
  - School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Jingmei Chen
  - School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Xiong Cui
  - School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Yang Liu
  - School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
5
Ballarino M, Pepe G, Helmer-Citterich M, Palma A. Exploring the landscape of tools and resources for the analysis of long non-coding RNAs. Comput Struct Biotechnol J 2023; 21:4706-4716. [PMID: 37841333 PMCID: PMC10568309 DOI: 10.1016/j.csbj.2023.09.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 09/28/2023] [Accepted: 09/28/2023] [Indexed: 10/17/2023] Open
Abstract
In recent years, research on long non-coding RNAs (lncRNAs) has gained considerable attention due to the increasing number of newly identified transcripts. Several characteristics make their functional evaluation challenging, which called for the urgent need to combine molecular biology with other disciplines, including bioinformatics. Indeed, the recent development of computational pipelines and resources has greatly facilitated both the discovery and the mechanisms of action of lncRNAs. In this review, we present a curated collection of the most recent computational resources, which have been categorized into distinct groups: databases and annotation, identification and classification, interaction prediction, and structure prediction. As the repertoire of lncRNAs and their analysis tools continues to expand over the years, standardizing the computational pipelines and improving the existing annotation of lncRNAs will be crucial to facilitate functional genomics studies.
Affiliation(s)
- Monica Ballarino
  - Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00161 Rome, Italy
- Gerardo Pepe
  - Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 1, 00133 Rome, Italy
- Manuela Helmer-Citterich
  - Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 1, 00133 Rome, Italy
- Alessandro Palma
  - Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00161 Rome, Italy
6
Mulugeta G, Zewotir T, Tegegne AS, Juhar LH, Muleta MB. Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia. BMC Med Inform Decis Mak 2023; 23:98. [PMID: 37217892 DOI: 10.1186/s12911-023-02185-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 04/25/2023] [Indexed: 05/24/2023] Open
Abstract
INTRODUCTION The prevalence of end-stage renal disease has raised the need for renal replacement therapy over recent decades. Even though a kidney transplant offers an improved quality of life and lower cost of care than dialysis, graft failure is possible after transplantation. Hence, this study aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models. METHODOLOGY The data were extracted from the retrospective cohort of kidney transplant recipients at the Ethiopian National Kidney Transplantation Center from September 2015 to February 2022. In response to the imbalanced nature of the data, we performed hyperparameter tuning, probability threshold moving, tree-based ensemble learning, stacking ensemble learning, and probability calibration to improve the prediction results. Merit-based selected probabilistic (logistic regression, naive Bayes, and artificial neural network) and tree-based ensemble (random forest, bagged tree, and stochastic gradient boosting) models were applied. Model comparison was performed in terms of discrimination and calibration performance. The best-performing model was then used to predict the risk of graft failure. RESULTS A total of 278 completed cases were analyzed, with 21 graft failures and 3 events per predictor. Of these, 74.8% are male, and 25.2% are female, with a median age of 37. In the comparison of individual models, the bagged tree and random forest have the top and equal discrimination performance (AUC-ROC = 0.84), whereas the random forest has the best calibration performance (Brier score = 0.045). When each individual model was tested as a meta-learner for stacking ensemble learning, stochastic gradient boosting as the meta-learner gave the top discrimination (AUC-ROC = 0.88) and calibration (Brier score = 0.048) performance. Regarding feature importance, chronic rejection, blood urea nitrogen, number of post-transplant admissions, phosphorus level, acute rejection, and urological complications are the top predictors of graft failure. CONCLUSIONS Bagging, boosting, and stacking, with probability calibration, are good choices for clinical risk predictions working on imbalanced data. The data-driven probability threshold is more beneficial than the natural threshold of 0.5 to improve the prediction result from imbalanced data. Integrating various techniques in a systematic framework is a smart strategy to improve prediction results from imbalanced data. It is recommended for clinical experts in kidney transplantation to use the final calibrated model as a decision support system to predict the risk of graft failure for individual patients.
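The two ideas this abstract leans on, a data-driven probability threshold and the Brier score, can be sketched as follows. The abstract does not say which criterion the authors used to pick the threshold; maximizing Youden's J (sensitivity + specificity - 1) is used here purely as an illustrative assumption, and the function names are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def best_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing Youden's J over the observed scores.

    Assumption: Youden's J is the selection criterion; other criteria
    (F1, cost-weighted error, ...) would plug into the same loop.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(y_prob)):
        # A case is predicted positive when its probability is >= t.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

On rare-event data the predicted probabilities tend to sit well below 0.5, so the default cutoff can label every case as negative. Scanning the observed scores for a better cutoff is one simple way to realize the "threshold moving" the abstract refers to.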
Affiliation(s)
- Getahun Mulugeta
  - Department of Statistics, Bahir Dar University, Bahir Dar, Ethiopia
- Temesgen Zewotir
  - School of Mathematics, Statistics, and Computer Science, KwaZulu-Natal University, Durban, South Africa
- Leja Hamza Juhar
  - St. Paul's Hospital Millennium Medical College, Addis Ababa, Ethiopia
7
Wei MM, Yu CQ, Li LP, You ZH, Ren ZH, Guan YJ, Wang XF, Li YC. LPIH2V: LncRNA-protein interactions prediction using HIN2Vec based on heterogeneous networks model. Front Genet 2023; 14:1122909. [PMID: 36845392 PMCID: PMC9950107 DOI: 10.3389/fgene.2023.1122909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 01/30/2023] [Indexed: 02/12/2023] Open
Abstract
LncRNA-protein interaction plays an important role in the development and treatment of many human diseases. Because experimental approaches for determining lncRNA-protein interactions are expensive and time-consuming, and few computational methods are available, efficient and accurate methods for predicting lncRNA-protein interactions are urgently needed. In this work, a model for heterogeneous network embedding based on meta-paths, namely LPIH2V, is proposed. The heterogeneous network is composed of lncRNA similarity networks, protein similarity networks, and known lncRNA-protein interaction networks. The behavioral features are extracted from the heterogeneous network using the HIN2Vec network embedding method. The results showed that LPIH2V obtains an AUC of 0.97 and an ACC of 0.95 in the 5-fold cross-validation test. The model showed superiority and good generalization ability. Compared to other models, LPIH2V not only extracts attribute characteristics by similarity, but also acquires behavior properties by meta-path walks in heterogeneous networks. LPIH2V would be beneficial in forecasting interactions between lncRNAs and proteins.
Affiliation(s)
- Meng-Meng Wei
  - School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu (corresponding author)
  - School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Li (corresponding author)
  - School of Information Engineering, Xijing University, Xi’an, China
  - College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Zhong-Hao Ren
  - School of Information Engineering, Xijing University, Xi’an, China
- Yong-Jian Guan
  - School of Information Engineering, Xijing University, Xi’an, China
8
Yu CQ, Wang XF, Li LP, You ZH, Huang WZ, Li YC, Ren ZH, Guan YJ. SGCNCMI: A New Model Combining Multi-Modal Information to Predict circRNA-Related miRNAs, Diseases and Genes. BIOLOGY 2022; 11:biology11091350. [PMID: 36138829 PMCID: PMC9495879 DOI: 10.3390/biology11091350] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/11/2022] [Revised: 08/21/2022] [Accepted: 09/08/2022] [Indexed: 11/16/2022]
Abstract
Computational prediction of miRNAs, diseases, and genes associated with circRNAs has important implications for circRNA research and provides a reference for wet-lab experiments that saves cost and time. In this study, SGCNCMI, a computational model combining multimodal information and graph convolutional neural networks, combines node similarity to form node information and then predicts associated nodes using a GCN with a distributive contribution mechanism. The model can be used not only to predict circRNA–miRNA interactions at the molecular level but also to predict circRNA–cancer and circRNA–gene associations. The AUCs of SGCNCMI for circRNA–miRNA, circRNA–disease, and circRNA–gene associations in the five-fold cross-validation experiment are 89.42%, 84.18%, and 82.44%, respectively. SGCNCMI is one of the few models in this field and achieved the best results. In addition, in our case study, six of the top ten relationship pairs with the highest prediction scores were verified in PubMed.
Affiliation(s)
- Chang-Qing Yu (corresponding author)
  - School of Information Engineering, Xijing University, Xi’an 710123, China
- Xin-Fei Wang
  - School of Information Engineering, Xijing University, Xi’an 710123, China
- Li-Ping Li
  - College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi 830052, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
- Wen-Zhun Huang
  - School of Information Engineering, Xijing University, Xi’an 710123, China
- Yue-Chao Li
  - School of Information Engineering, Xijing University, Xi’an 710123, China
- Zhong-Hao Ren
  - School of Information Engineering, Xijing University, Xi’an 710123, China
- Yong-Jian Guan
  - School of Information Engineering, Xijing University, Xi’an 710123, China
9
Chen Y, Li Z, Li Z. Prediction of Plant Resistance Proteins Based on Pairwise Energy Content and Stacking Framework. FRONTIERS IN PLANT SCIENCE 2022; 13:912599. [PMID: 35712582 PMCID: PMC9194944 DOI: 10.3389/fpls.2022.912599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 05/10/2022] [Indexed: 06/15/2023]
Abstract
Plant resistance proteins (R proteins) recognize effector proteins secreted by pathogenic microorganisms and trigger an immune response against pathogenic microbial infestation. Accurate identification of plant R proteins is an important research topic in plant pathology. Plant R protein prediction has achieved many research results. Recently, some machine learning-based methods have emerged to identify plant R proteins. Still, most of them only rely on protein sequence features, which ignore inter-amino acid features, thus limiting the further improvement of plant R protein prediction performance. In this manuscript, we propose a method called StackRPred to predict plant R proteins. Specifically, the StackRPred first obtains plant R protein feature information from the pairwise energy content of residues; then, the obtained feature information is fed into the stacking framework for training to construct a prediction model for plant R proteins. The results of both the five-fold cross-validation and independent test validation show that our proposed method outperforms other state-of-the-art methods, indicating that StackRPred is an effective tool for predicting plant R proteins. It is expected to bring some favorable contribution to the study of plant R proteins.
Affiliation(s)
- Yifan Chen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Zejun Li
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Zhiyong Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
10
Xu D, Yuan W, Fan C, Liu B, Lu MZ, Zhang J. Opportunities and Challenges of Predictive Approaches for the Non-coding RNA in Plants. FRONTIERS IN PLANT SCIENCE 2022; 13:890663. [PMID: 35498708 PMCID: PMC9048598 DOI: 10.3389/fpls.2022.890663] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/06/2022] [Accepted: 03/28/2022] [Indexed: 06/01/2023]
Affiliation(s)
- Dong Xu
  - State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Wenya Yuan
  - State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
- Chunjie Fan
  - State Key Laboratory of Tree Genetics and Breeding, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Guangzhou, China
- Bobin Liu
  - Jiangsu Key Laboratory for Bioresources of Saline Soils, Jiangsu Synthetic Innovation Center for Coastal Bio-agriculture, School of Wetlands, Yancheng Teachers University, Yancheng, China
- Meng-Zhu Lu
  - State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
- Jin Zhang
  - State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
11
Song J, Tian S, Yu L, Yang Q, Dai Q, Wang Y, Wu W, Duan X. RLF-LPI: An ensemble learning framework using sequence information for predicting lncRNA-protein interaction based on AE-ResLSTM and fuzzy decision. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:4749-4764. [PMID: 35430839 DOI: 10.3934/mbe.2022222] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Long non-coding RNAs (lncRNAs) play a regulatory role in many biological processes, and the recognition of lncRNA-protein interactions helps to reveal the functional mechanisms of lncRNAs. Identification of lncRNA-protein interactions by biological techniques is costly and time-consuming. Here, an ensemble learning framework, RLF-LPI, is proposed to predict lncRNA-protein interactions. RLF-LPI uses a residual LSTM autoencoder module with a fusion attention mechanism to extract latent feature representations and to capture the dependencies between sequences and structures encoded by the k-mer method. Finally, the relationship between lncRNA and protein is learned through a fuzzy decision method. The experimental results show that the ACC of RLF-LPI is 0.912 on the ATH948 dataset and 0.921 on the ZEA22133 dataset. Thus, the proposed method is shown to perform better than other methods in predicting lncRNA-protein interactions.
Affiliation(s)
- Jinmiao Song
  - Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
  - Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
- Shengwei Tian
  - Department of Software, Xinjiang University, Urumqi 830008, China
  - Key Laboratory of Signal and Information Processing, Xinjiang University, Urumqi 830008, China
  - Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi 830008, China
- Long Yu
  - Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
- Qimeng Yang
  - Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
- Qiguo Dai
  - Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
- Yuanxu Wang
  - Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
- Weidong Wu
  - Center for Science Education, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi 830001, China
- Xiaodong Duan
  - Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
12
Yu B, Wang X, Zhang Y, Gao H, Wang Y, Liu Y, Gao X. RPI-MDLStack: Predicting RNA-protein interactions through deep learning with stacking strategy and LASSO. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]
13
Peng L, Tan J, Tian X, Zhou L. EnANNDeep: An Ensemble-based lncRNA-protein Interaction Prediction Framework with Adaptive k-Nearest Neighbor Classifier and Deep Models. Interdiscip Sci 2022; 14:209-232. [PMID: 35006529 DOI: 10.1007/s12539-021-00483-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/06/2021] [Revised: 09/14/2021] [Accepted: 09/15/2021] [Indexed: 01/08/2023]
Abstract
Prediction of lncRNA-protein interactions (LPIs) can deepen the understanding of many important biological processes. Artificial intelligence methods have reported many possible LPIs. However, most computational techniques were evaluated mainly on one dataset, which may produce prediction bias. More importantly, they were validated only under cross validation on lncRNA-protein pairs and did not consider the performance under cross validations on lncRNAs and proteins, and thus fail to find related proteins/lncRNAs for a new lncRNA/protein. Under an ensemble learning framework (EnANNDeep) composed of an adaptive k-nearest neighbor classifier and deep models, this study focuses on systematically finding underlying linkages between lncRNAs and proteins. First, five LPI-related datasets are arranged. Second, multiple source features are integrated to depict an lncRNA-protein pair. Third, an adaptive k-nearest neighbor classifier, a deep neural network, and a deep forest are each designed to score unknown lncRNA-protein pairs. Finally, interaction probabilities from the three predictors are integrated based on a soft voting technique. Compared with five classical LPI identification models (SFPEL, PMDKN, CatBoost, PLIPCOM, and LPI-SKF) under fivefold cross validations on lncRNAs, proteins, and LPIs, EnANNDeep computes the best average AUCs of 0.8660, 0.8775, and 0.9166, respectively, and the best average AUPRs of 0.8545, 0.8595, and 0.9054, respectively, indicating its superior LPI prediction ability. Case study analyses indicate that SNHG10 may have dense linkage with Q15717. In the ensemble framework, the adaptive k-nearest neighbor classifier can separately pick the most appropriate k for each query lncRNA-protein pair. More importantly, deep models including the deep neural network and deep forest can effectively learn the representative features of lncRNAs and proteins.
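The soft-voting step named in this abstract amounts to averaging the interaction probabilities produced by the individual predictors. The sketch below assumes equal weights by default and a 0.5 decision cutoff; the abstract gives neither, and `soft_vote` and `predict_labels` are hypothetical names.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-model probabilities for each lncRNA-protein pair.

    prob_lists: one list of probabilities per model, all of the same length.
    weights: optional per-model weights (assumption: equal weights if omitted).
    """
    if not prob_lists:
        raise ValueError("at least one model is required")
    n_pairs = len(prob_lists[0])
    if any(len(probs) != n_pairs for probs in prob_lists):
        raise ValueError("all models must score the same pairs")
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_pairs)
    ]


def predict_labels(prob_lists, threshold=0.5, weights=None):
    """Turn the averaged probabilities into 0/1 interaction calls."""
    return [int(p >= threshold) for p in soft_vote(prob_lists, weights)]
```

Here each inner list would hold the scores of one predictor (for example the k-nearest neighbor classifier, the deep neural network, and the deep forest) over the same set of candidate pairs. Averaging probabilities rather than counting hard votes lets a confident model outweigh two uncertain ones.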
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
  - College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
- Jingwei Tan
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xiongfei Tian
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
|
14
|
Ren ZH, Yu CQ, Li LP, You ZH, Guan YJ, Li YC, Pan J. SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information. Front Genet 2022; 13:839540. [PMID: 35360836 PMCID: PMC8963817 DOI: 10.3389/fgene.2022.839540] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Accepted: 02/07/2022] [Indexed: 11/13/2022] Open
Abstract
Non-coding RNAs (ncRNAs) have essential effects on biological processes such as gene regulation. One critical way in which ncRNAs execute biological functions is through interactions with RNA-binding proteins (RBPs). Identifying the proteins involved in ncRNA-protein interactions helps clarify the function of ncRNAs. Many high-throughput experiments have been applied to recognize these interactions. Because these approaches are time- and labor-consuming, a great number of computational methods have been developed to advance ncRNA-protein interaction research. However, these methods may not be applicable to all RNAs and proteins, particularly new RNAs and proteins. Additionally, most of them cannot handle long sequences well. In this work, a computational method, SAWRPI, is proposed to predict ncRNA-protein interactions from sequence information. More specifically, the raw features of proteins and ncRNAs are first extracted through the k-mer sparse matrix with SVD reduction and by learning nucleic acid symbols through natural language processing with a local fusion strategy, respectively. Then, to ease classification, the Hilbert transformation is used to map the raw feature data to a new feature space. Finally, a stacking ensemble strategy is adopted to learn high-level abstract features automatically and generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are used. In comparison with state-of-the-art methods and other classification or feature extraction strategies, SAWRPI achieved high performance on the three datasets, which contain two kinds of lncRNA-protein interactions. According to our findings, SAWRPI is trustworthy, robust, and simple, and can be used as a beneficial supplement for the task of predicting ncRNA-protein interactions.
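As a rough illustration of k-mer based sequence encoding, the sketch below computes a normalized k-mer frequency vector for an RNA sequence. This is a plain frequency encoding, not the k-mer sparse matrix with SVD reduction that the abstract describes; the function name, the alphabet, and the example sequence are assumptions for illustration only.

```python
from itertools import product

def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the frequency of every possible k-mer, in lexicographic
    order of the alphabet. Windows containing symbols outside the
    alphabet are skipped."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[m] / windows for m in kmers]

# Hypothetical short RNA fragment; k=2 gives a 16-dimensional vector.
vec = kmer_frequencies("AUGGCUAUG", k=2)
```

The vector length grows as 4^k for RNA (20^k for raw protein sequences), which is one reason methods in this area apply dimensionality reduction or reduced amino acid alphabets.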
Affiliation(s)
- Zhong-Hao Ren
  - School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi’an, China
  - *Correspondence: Li-Ping Li; Chang-Qing Yu
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi’an, China
  - *Correspondence: Li-Ping Li; Chang-Qing Yu
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Yong-Jian Guan
  - School of Information Engineering, Xijing University, Xi’an, China
- Yue-Chao Li
  - School of Information Engineering, Xijing University, Xi’an, China
- Jie Pan
  - School of Information Engineering, Xijing University, Xi’an, China
|
15
|
Zhao G, Li P, Qiao X, Han X, Liu ZP. Predicting lncRNA–Protein Interactions by Heterogenous Network Embedding. Front Genet 2022; 12:814073. [PMID: 35186016 PMCID: PMC8854746 DOI: 10.3389/fgene.2021.814073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Accepted: 12/27/2021] [Indexed: 12/25/2022] Open
Abstract
lncRNA–protein interactions play essential roles in a variety of cellular processes. However, experimental methods for the systematic mapping of lncRNA–protein interactions remain time-consuming and expensive. Therefore, it is urgent to develop reliable computational methods for predicting lncRNA–protein interactions. In this study, we propose a computational method called LncPNet to predict potential lncRNA–protein interactions by embedding an lncRNA–protein heterogenous network. The experimental results indicate that LncPNet achieves promising performance on benchmark datasets extracted from the NPInter database, with an accuracy of 0.930 and an area under the ROC curve (AUC) of 0.971. In addition, we compare our method with eight other state-of-the-art methods, and the results illustrate that our method achieves superior prediction performance. LncPNet provides an effective method via a new perspective of representing the lncRNA–protein heterogenous network, which will greatly benefit the prediction of lncRNA–protein interactions.
Affiliation(s)
- Guoqing Zhao
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Pengpai Li
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Xu Qiao
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Xianhua Han
  - Faculty of Science, Yamaguchi University, Yamaguchi, Japan
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
  - *Correspondence: Zhi-Ping Liu
|
16
|
Staem5: A novel computational approach for accurate prediction of m5C site. MOLECULAR THERAPY. NUCLEIC ACIDS 2021; 26:1027-1034. [PMID: 34786208 PMCID: PMC8571400 DOI: 10.1016/j.omtn.2021.10.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/05/2021] [Revised: 08/27/2021] [Accepted: 10/06/2021] [Indexed: 12/25/2022]
Abstract
5-Methylcytosine (m5C) is an important post-transcriptional modification that has been extensively found in multiple types of RNAs. Many studies have shown that m5C plays vital roles in many biological functions, such as RNA structure stability and metabolism. Computational approaches act as an efficient way to identify m5C sites from high-throughput RNA sequence data and help interpret the functional mechanism of this important modification. This study proposed a novel species-specific computational approach, Staem5, to accurately predict RNA m5C sites in Mus musculus and Arabidopsis thaliana. Staem5 was developed by employing feature fusion tactics to leverage informatic sequence profiles, and a stacking ensemble learning framework combined five popular machine learning algorithms. Extensive benchmarking tests demonstrated that Staem5 outperformed state-of-the-art approaches in both cross-validation and independent tests. We provide the source code of Staem5, which is publicly available at https://github.com/Cxd-626/Staem5.git.
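The stacking idea mentioned in this abstract (base learners whose predictions feed a second-level learner) can be sketched as follows. This is a toy illustration under stated assumptions: the base learners here are nearest-centroid classifiers on different feature subsets, standing in for the five machine learning algorithms the abstract refers to but does not name, and all function names and data are hypothetical.

```python
def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class label."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict_centroids(model, X):
    """Assign each row to the class with the closest centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda c: sq_dist(model[c], x)) for x in X]

def select(X, cols):
    """Keep only the given feature columns."""
    return [[row[c] for c in cols] for row in X]

def out_of_fold(X, y, n_folds=2):
    """Predict every sample with a model that was not trained on it."""
    oof = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict_centroids(model, [X[i] for i in test])):
            oof[i] = p
    return oof

def fit_stack(X, y, views, n_folds=2):
    # Level 0: out-of-fold predictions of each base learner become meta-features.
    oof_cols = [out_of_fold(select(X, cols), y, n_folds) for cols in views]
    meta_X = [list(row) for row in zip(*oof_cols)]
    # Level 1: the meta-learner is trained on those meta-features.
    meta_model = fit_centroids(meta_X, y)
    # Base learners are refit on all data for use at prediction time.
    base_models = [fit_centroids(select(X, cols), y) for cols in views]
    return base_models, meta_model

def predict_stack(stack, X, views):
    base_models, meta_model = stack
    base_preds = [predict_centroids(m, select(X, cols))
                  for m, cols in zip(base_models, views)]
    meta_X = [list(row) for row in zip(*base_preds)]
    return predict_centroids(meta_model, meta_X)

# Hypothetical two-feature toy data with well-separated classes.
X = [[0.1, 0.2], [0.3, 0.1], [5.0, 5.2], [4.8, 5.1],
     [0.2, 0.4], [0.0, 0.3], [5.3, 4.9], [5.1, 5.0]]
y = [0, 0, 1, 1, 0, 0, 1, 1]
views = [[0], [1], [0, 1]]  # each base learner sees a different feature subset

stack = fit_stack(X, y, views)
predictions = predict_stack(stack, [[0.2, 0.2], [5.0, 5.0]], views)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for samples the base learners have already seen, is what keeps the second level from simply learning the base learners' overfitting.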
|
17
|
Kang Q, Meng J, Su C, Luan Y. Mining plant endogenous target mimics from miRNA-lncRNA interactions based on dual-path parallel ensemble pruning method. Brief Bioinform 2021; 23:6399881. [PMID: 34662389 DOI: 10.1093/bib/bbab440] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Revised: 09/07/2021] [Accepted: 09/24/2021] [Indexed: 12/14/2022] Open
Abstract
The interactions between microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) play important roles in biological activities. Specifically, lncRNAs acting as endogenous target mimics (eTMs) can bind miRNAs to regulate the expression of target messenger RNAs (mRNAs). A growing number of studies focus on animals, but studies on plants are scarce and many functions of plant eTMs are unknown. This study first proposes a novel ensemble pruning protocol for predicting plant miRNA-lncRNA interactions. It adaptively prunes the base models using a dual-path parallel ensemble method to meet the challenge of cross-species prediction. Potential eTMs are then mined from the predicted results. The expression levels of RNAs are identified through biological experiments to construct the lncRNA-miRNA-mRNA regulatory network, and the functions of potential eTMs are inferred through enrichment analysis. Experimental results show that the proposed protocol outperforms existing methods and state-of-the-art predictors on various plant species. A total of 17 potential eTMs are verified by biological experiments to be involved in 22 regulations, and 14 potential eTMs are inferred by Gene Ontology enrichment analysis to be involved in 63 functions, which is significant for further research.
Affiliation(s)
- Qiang Kang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Chenglin Su
  - School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116024, China
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116024, China
|
18
|
Liang X, Li F, Chen J, Li J, Wu H, Li S, Song J, Liu Q. Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification. Brief Bioinform 2021; 22:bbaa312. [PMID: 33316035 PMCID: PMC8294543 DOI: 10.1093/bib/bbaa312] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Revised: 09/30/2020] [Accepted: 08/25/2020] [Indexed: 12/13/2022] Open
Abstract
Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. Due to their unique ability to target cancer cells without affecting healthy cells directly, they have been extensively studied. Many peptide-based drugs are currently evaluated in the preclinical and clinical trials. Accurate identification of ACPs has received considerable attention in recent years; as such, a number of machine learning-based methods for in silico identification of ACPs have been developed. These methods promote the research on the mechanism of ACPs therapeutics against cancer to some extent. There is a vast difference in these methods in terms of their training/testing datasets, machine learning algorithms, feature encoding schemes, feature selection methods and evaluation strategies used. Therefore, it is desirable to summarize the advantages and disadvantages of the existing methods, provide useful insights and suggestions for the development and improvement of novel computational tools to characterize and identify ACPs. With this in mind, we firstly comprehensively investigate 16 state-of-the-art predictors for ACPs in terms of their core algorithms, feature encoding schemes, performance evaluation metrics and webserver/software usability. Then, comprehensive performance assessment is conducted to evaluate the robustness and scalability of the existing predictors using a well-prepared benchmark dataset. We provide potential strategies for the model performance improvement. Moreover, we propose a novel ensemble learning framework, termed ACPredStackL, for the accurate identification of ACPs. ACPredStackL is developed based on the stacking ensemble strategy combined with SVM, Naïve Bayesian, lightGBM and KNN. Empirical benchmarking experiments against the state-of-the-art methods demonstrate that ACPredStackL achieves a comparative performance for predicting ACPs. 
The webserver and source code of ACPredStackL are freely available at http://bigdata.biocie.cn/ACPredStackL/ and https://github.com/liangxiaoq/ACPredStackL, respectively.
Affiliation(s)
- Xiao Liang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
- Fuyi Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia
  - Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
- Jinxiang Chen
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Junlong Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Hao Wu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Shuqin Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
|
19
|
Yi HC, You ZH, Wang L, Su XR, Zhou X, Jiang TH. In silico drug repositioning using deep learning and comprehensive similarity measures. BMC Bioinformatics 2021; 22:293. [PMID: 34074242 PMCID: PMC8170943 DOI: 10.1186/s12859-020-03882-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Accepted: 11/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Drug repositioning, meaning finding new uses for existing drugs, can accelerate the process of new drug research and development. Various computational methods have been presented to predict novel drug-disease associations for drug repositioning based on similarity measures among drugs and diseases. However, there are some known associations between drugs and diseases that previous studies did not utilize. METHODS In this work, we develop a deep gated recurrent units model to predict potential drug-disease interactions using comprehensive similarity measures and the Gaussian interaction profile kernel. More specifically, the similarity measure is used to derive discriminative features for drugs based on their chemical fingerprints. Meanwhile, the Gaussian interaction profile kernel is employed to obtain efficient features of diseases based on known disease-disease associations. Then, a deep gated recurrent units model is developed to predict potential drug-disease interactions. RESULTS The performance of the proposed model is evaluated on two benchmark datasets under tenfold cross-validation. To further verify the predictive ability, case studies predicting new potential indications of drugs were carried out. CONCLUSION The experimental results show that the proposed model is a useful tool for predicting new indications for drugs or new treatments for diseases, and can accelerate drug repositioning and related drug research and discovery.
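The Gaussian interaction profile kernel named in this abstract is commonly defined as K(i, j) = exp(-γ ‖IP(i) − IP(j)‖²), with γ set to a bandwidth γ' divided by the mean squared norm of the profiles. The sketch below follows that common definition; the binary association profiles and the choice γ' = 1 are assumptions for illustration, since the abstract does not give the settings used.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over binary association profiles.

    profiles: one row per entity (e.g. a disease), one column per partner.
    gamma_prime: bandwidth parameter (assumed value, not from the source).
    """
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * d2)
    return kernel

# Hypothetical profiles: rows 0 and 1 share partners, row 2 shares none.
profiles = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
K = gip_kernel(profiles)
```

Entities with identical profiles get similarity 1, and the similarity decays toward 0 as their known associations diverge.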
Affiliation(s)
- Hai-Cheng Yi
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhu-Hong You
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Lei Wang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Xiao-Rui Su
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xi Zhou
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Tong-Hai Jiang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
|
20
|
Liang M, Chang T, An B, Duan X, Du L, Wang X, Miao J, Xu L, Gao X, Zhang L, Li J, Gao H. A Stacking Ensemble Learning Framework for Genomic Prediction. Front Genet 2021; 12:600040. [PMID: 33747037 PMCID: PMC7969712 DOI: 10.3389/fgene.2021.600040] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Accepted: 01/12/2021] [Indexed: 11/22/2022] Open
Abstract
Machine learning (ML) is perhaps the most useful tool for the interpretation of large genomic datasets. However, the performance of a single machine learning method in genomic selection (GS) is currently unsatisfactory. To improve genomic predictions, we constructed a stacking ensemble learning framework (SELF), integrating three machine learning methods, to predict genomic estimated breeding values (GEBVs). The present study evaluated the prediction ability of SELF by analyzing three real datasets with different genetic architectures and comparing the prediction accuracy of SELF, its base learners, genomic best linear unbiased prediction (GBLUP), and BayesB. For each trait, SELF performed better than the base learners, which included support vector regression (SVR), kernel ridge regression (KRR), and elastic net (ENET). The prediction accuracy of SELF was, on average, 7.70% higher than that of GBLUP in the three datasets. Except for the milk fat percentage (MFP) trait of the German Holstein dairy cattle dataset, SELF was more robust than BayesB in all remaining traits. Therefore, we believe that SELF has the potential to be promoted to estimate GEBVs in other animals and plants.
Affiliation(s)
- Mang Liang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Tianpeng Chang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Bingxing An
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xinghai Duan
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lili Du
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xiaoqiao Wang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jian Miao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Huijiang Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
|
21
|
Zhang Q, Liu P, Wang X, Zhang Y, Han Y, Yu B. StackPDB: Predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106921] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
|
22
|
Shaw D, Chen H, Xie M, Jiang T. DeepLPI: a multimodal deep learning method for predicting the interactions between lncRNAs and protein isoforms. BMC Bioinformatics 2021; 22:24. [PMID: 33461501 PMCID: PMC7814738 DOI: 10.1186/s12859-020-03914-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2020] [Accepted: 11/30/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) regulate diverse biological processes via interactions with proteins. Since the experimental methods to identify these interactions are expensive and time-consuming, many computational methods have been proposed. Although these computational methods have achieved promising prediction performance, they neglect the fact that a gene may encode multiple protein isoforms and different isoforms of the same gene may interact differently with the same lncRNA. RESULTS In this study, we propose a novel method, DeepLPI, for predicting the interactions between lncRNAs and protein isoforms. Our method uses sequence and structure data to extract intrinsic features and expression data to extract topological features. To combine these different data, we adopt a hybrid framework by integrating a multimodal deep learning neural network and a conditional random field. To overcome the lack of known interactions between lncRNAs and protein isoforms, we apply a multiple instance learning (MIL) approach. In our experiment concerning the human lncRNA-protein interactions in the NPInter v3.0 database, DeepLPI improved the prediction performance by 4.7% in terms of AUC and 5.9% in terms of AUPRC over the state-of-the-art methods. Our further correlation analyses between interactive lncRNAs and protein isoforms also illustrated that their co-expression information helped predict the interactions. Finally, we give some examples where DeepLPI was able to outperform the other methods in predicting mouse lncRNA-protein interactions and novel human lncRNA-protein interactions. CONCLUSION Our results demonstrated that the use of isoforms and MIL contributed significantly to the improvement of performance in predicting lncRNA and protein interactions. We believe that such an approach would find more applications in predicting other functional roles of RNAs and proteins.
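The multiple instance learning setting described in this abstract treats the isoforms of one gene as a bag: the label is known for the gene, not for each isoform. A minimal sketch of two common bag-level aggregation rules follows. The abstract does not say which rule DeepLPI uses, so both functions, the gene names, and the scores are illustrative assumptions.

```python
def bag_score_max(instance_scores):
    """Bag is as positive as its most confident instance (max pooling)."""
    return max(instance_scores)

def bag_score_noisy_or(instance_scores):
    """Probability that at least one instance is positive,
    treating instance scores as independent probabilities."""
    none_positive = 1.0
    for p in instance_scores:
        none_positive *= (1.0 - p)
    return 1.0 - none_positive

# Hypothetical isoform-level interaction scores with one lncRNA,
# grouped by the gene that encodes the isoforms.
isoform_scores = {
    "geneA": [0.1, 0.8, 0.3],
    "geneB": [0.2, 0.1],
}

gene_max = {g: bag_score_max(s) for g, s in isoform_scores.items()}
gene_noisy_or = {g: bag_score_noisy_or(s) for g, s in isoform_scores.items()}
```

Either rule lets a gene-level label supervise isoform-level scores: a positive gene only requires one of its isoforms to score high, which matches the observation that isoforms of the same gene may interact differently with the same lncRNA.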
Affiliation(s)
- Dipan Shaw
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Hao Chen
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Tao Jiang
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
  - Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
|
23
|
Wekesa JS, Meng J, Luan Y. A deep learning model for plant lncRNA-protein interaction prediction with graph attention. Mol Genet Genomics 2020; 295:1091-1102. [DOI: 10.1007/s00438-020-01682-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2020] [Accepted: 05/01/2020] [Indexed: 02/06/2023]
|