1
|
Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024; 300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024] Open
Abstract
RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequence into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies hinder the effectiveness of machine learning models in addressing the performance challenges associated with predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred using support vector machine classifier to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model compared to state-of-art models in the domain of RNA modification sites identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can be applied to feature extraction from DNA sequences to predict other biological modification sites.
Collapse
Affiliation(s)
- Mingzhao Wang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Haider Ali
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Yandi Xu
- School of Computer Science, Shaanxi Normal University, Xi'an, China; College of Life Sciences, Shaanxi Normal University, Xi'an, China
| | - Juanying Xie
- School of Computer Science, Shaanxi Normal University, Xi'an, China.
| | - Shengquan Xu
- College of Life Sciences, Shaanxi Normal University, Xi'an, China.
| |
Collapse
|
2
|
Zhang X, Wang S, Xie L, Zhu Y. PseU-ST: A new stacked ensemble-learning method for identifying RNA pseudouridine sites. Front Genet 2023; 14:1121694. [PMID: 36741328 PMCID: PMC9892456 DOI: 10.3389/fgene.2023.1121694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2022] [Accepted: 01/09/2023] [Indexed: 01/20/2023] Open
Abstract
Background: Pseudouridine (Ψ) is one of the most abundant RNA modifications found in a variety of RNA types, and it plays a significant role in many biological processes. The key to studying the various biochemical functions and mechanisms of Ψ is to identify the Ψ sites. However, identifying Ψ sites using experimental methods is time-consuming and expensive. Therefore, it is necessary to develop computational methods that can accurately predict Ψ sites based on RNA sequence information. Methods: In this study, we proposed a new model called PseU-ST to identify Ψ sites in Homo sapiens (H. sapiens), Saccharomyces cerevisiae (S. cerevisiae), and Mus musculus (M. musculus). We selected the best six encoding schemes and four machine learning algorithms based on a comprehensive test of almost all of the RNA sequence encoding schemes available in the iLearnPlus software package, and selected the optimal features for each encoding scheme using chi-square and incremental feature selection algorithms. Then, we selected the optimal feature combination and the best base-classifier combination for each species through an extensive performance comparison and employed a stacking strategy to build the predictive model. Results: The results demonstrated that PseU-ST achieved better prediction performance compared with other existing models. The PseU-ST accuracy scores were 93.64%, 87.74%, and 89.64% on H_990, S_628, and M_944, respectively, representing increments of 13.94%, 6.05%, and 0.26%, respectively, higher than the best existing methods on the same benchmark training datasets. Conclusion: The data indicate that PseU-ST is a very competitive prediction model for identifying RNA Ψ sites in H. sapiens, M. musculus, and S. cerevisiae. In addition, we found that the Position-specific trinucleotide propensity based on single strand (PSTNPss) and Position-specific of three nucleotides (PS3) features play an important role in Ψ site identification. The source code for PseU-ST and the data are obtainable in our GitHub repository (https://github.com/jluzhangxinrubio/PseU-ST).
Collapse
|
3
|
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N 6-methyladenosine (m6A), 5-methylcytosine (m5C), N 1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most popularly used techniques applied for profiling RNA epitranscriptome, to provide new ideas for growth and development.
Collapse
Affiliation(s)
- Jia Zou
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China,Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Hui Liu
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China,Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Wei Tan
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
| | - Yi-qi Chen
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China,Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Jing Dong
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China,Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Shu-yuan Bai
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China,Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Zhao-xia Wu
- Community Health Service Center, Wuchang Hospital, Wuhan, China
| | - Yan Zeng
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China,Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China,School of Public Health, Wuhan University of Science and Technology, Wuhan, China,*Correspondence: Yan Zeng,
| |
Collapse
|
4
|
Peng L, Yang J, Wang M, Zhou L. Editorial: Machine Learning-Based Methods for RNA Data Analysis. Front Genet 2022; 13:828575. [PMID: 35692815 PMCID: PMC9175173 DOI: 10.3389/fgene.2022.828575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2021] [Accepted: 04/12/2022] [Indexed: 11/13/2022] Open
Affiliation(s)
- Lihong Peng
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
- School of Computer, Hunan University of Technology, Zhuzhou, China
| | | | - Minxian Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Liqian Zhou
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
- *Correspondence: Liqian Zhou,
| |
Collapse
|
5
|
Wang H, Wang S, Zhang Y, Bi S, Zhu X. A brief review of machine learning methods for RNA methylation sites prediction. Methods 2022; 203:399-421. [DOI: 10.1016/j.ymeth.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2021] [Revised: 02/15/2022] [Accepted: 03/01/2022] [Indexed: 02/07/2023] Open
|
6
|
Wang X, Lin X, Wang R, Han N, Fan K, Han L, Ding Z. A Feature Fusion Predictor for RNA Pseudouridine Sites with Particle Swarm Optimizer Based Feature Selection and Ensemble Learning Approach. Curr Issues Mol Biol 2021; 43:1844-1858. [PMID: 34889887 PMCID: PMC8929013 DOI: 10.3390/cimb43030129] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Revised: 10/17/2021] [Accepted: 10/19/2021] [Indexed: 01/28/2023] Open
Abstract
RNA pseudouridine modification is particularly important in a variety of cellular biological and physiological processes. It plays a significant role in understanding RNA functions, RNA structure stabilization, translation processes, etc. To understand its functional mechanisms, it is necessary to accurately identify pseudouridine sites in RNA sequences. Although some computational methods have been proposed for the identification of pseudouridine sites, it is still a challenge to improve the identification accuracy and generalization ability. To address this challenge, a novel feature fusion predictor, named PsoEL-PseU, is proposed for the prediction of pseudouridine sites. Firstly, this study systematically and comprehensively explored different types of feature descriptors and determined six feature descriptors with various properties. To improve the feature representation ability, a binary particle swarm optimizer was used to capture the optimal feature subset for six feature descriptors. Secondly, six individual predictors were trained by using the six optimal feature subsets. Finally, to fuse the effects of all six features, six individual predictors were fused into an ensemble predictor by a parallel fusion strategy. Ten-fold cross-validation on three benchmark datasets indicated that the PsoEL-PseU predictor significantly outperformed the current state-of-the-art predictors. Additionally, the new predictor achieved better accuracy in the independent dataset evaluation-accuracy which is significantly higher than that of its existing counterparts-and the user-friendly webserver developed by the PsoEL-PseU predictor has been made freely accessible.
Collapse
Affiliation(s)
- Xiao Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China; (X.L.); (R.W.); (N.H.); (L.H.); (Z.D.)
- Correspondence:
| | - Xi Lin
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China; (X.L.); (R.W.); (N.H.); (L.H.); (Z.D.)
| | - Rong Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China; (X.L.); (R.W.); (N.H.); (L.H.); (Z.D.)
| | - Nijia Han
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China; (X.L.); (R.W.); (N.H.); (L.H.); (Z.D.)
| | - Kaiqi Fan
- School of Material and Chemical Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China;
| | - Lijun Han
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China; (X.L.); (R.W.); (N.H.); (L.H.); (Z.D.)
| | - Zhaoyuan Ding
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China; (X.L.); (R.W.); (N.H.); (L.H.); (Z.D.)
| |
Collapse
|
7
|
Zhou H, Wekesa JS, Luan Y, Meng J. PRPI-SC: an ensemble deep learning model for predicting plant lncRNA-protein interactions. BMC Bioinformatics 2021; 22:415. [PMID: 34429059 PMCID: PMC8385908 DOI: 10.1186/s12859-021-04328-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2020] [Accepted: 11/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Plant long non-coding RNAs (lncRNAs) play vital roles in many biological processes mainly through interactions with RNA-binding protein (RBP). To understand the function of lncRNAs, a fundamental method is to identify which types of proteins interact with the lncRNAs. However, the models or rules of interactions are a major challenge when calculating and estimating the types of RBP. RESULTS In this study, we propose an ensemble deep learning model to predict plant lncRNA-protein interactions using stacked denoising autoencoder and convolutional neural network based on sequence and structural information, named PRPI-SC. PRPI-SC predicts interactions between lncRNAs and proteins based on the k-mer features of RNAs and proteins. Experiments proved good results on Arabidopsis thaliana and Zea mays datasets (ATH948 and ZEA22133). The accuracy rates of ATH948 and ZEA22133 datasets were 88.9% and 82.6%, respectively. PRPI-SC also performed well on some public RNA protein interaction datasets. CONCLUSIONS PRPI-SC accurately predicts the interaction between plant lncRNA and protein, which plays a guiding role in studying the function and expression of plant lncRNA. At the same time, PRPI-SC has a strong generalization ability and good prediction effect for non-plant data.
Collapse
Affiliation(s)
- Haoran Zhou
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Jael Sanyanda Wekesa
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
| |
Collapse
|
8
|
Aziz AZB, Hasan MAM, Shin J. Identification of RNA pseudouridine sites using deep learning approaches. PLoS One 2021; 16:e0247511. [PMID: 33621235 PMCID: PMC7901771 DOI: 10.1371/journal.pone.0247511] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Accepted: 02/08/2021] [Indexed: 01/05/2023] Open
Abstract
Pseudouridine(Ψ) is widely popular among various RNA modifications which have been confirmed to occur in rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, identifying them has vital significance in academic research, drug development and gene therapies. Several laboratory techniques for Ψ identification have been introduced over the years. Although these techniques produce satisfactory results, they are costly, time-consuming and requires skilled experience. As the lengths of RNA sequences are getting longer day by day, an efficient method for identifying pseudouridine sites using computational approaches is very important. In this paper, we proposed a multi-channel convolution neural network using binary encoding. We employed k-fold cross-validation and grid search to tune the hyperparameters. We evaluated its performance in the independent datasets and found promising results. The results proved that our method can be used to identify pseudouridine sites for associated purposes. We have also implemented an easily accessible web server at http://103.99.176.239/ipseumulticnn/.
Collapse
Affiliation(s)
- Abu Zahid Bin Aziz
- Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
- * E-mail:
| | - Md. Al Mehedi Hasan
- Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
| | - Jungpil Shin
- School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu, Japan
| |
Collapse
|
9
|
Wang W, Guan X, Khan MT, Xiong Y, Wei DQ. LMI-DForest: A deep forest model towards the prediction of lncRNA-miRNA interactions. Comput Biol Chem 2020; 89:107406. [PMID: 33120126 DOI: 10.1016/j.compbiolchem.2020.107406] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2020] [Revised: 10/12/2020] [Accepted: 10/15/2020] [Indexed: 02/07/2023]
Abstract
The interactions between miRNAs and long non-coding RNAs (lncRNAs) are subject to intensive recent studies due to its critical role in gene regulations. Computational prediction of lncRNA-miRNA interactions has become a popular alternative strategy to the experimental methods for identification of underlying interactions. It is desirable to develop the machine learning-based models for prediction of lncRNA-miRNA based on the experimentally validated interactions between lncRNAs and miRNAs. The accuracy and robustness of existing models based on machine learning techniques are subject to further improvement. Considering that the attributes of lncRNA and miRNA contribute key importance in the interaction between these two RNAs, a deep learning model, named LMI-DForest, is proposed here by combining the deep forest and autoencoder strategies. Systematic comparison on the experiment validated datasets for lncRNA-miRNA interaction datasets demonstrates that the proposed method consistently shows superior performance over the other machine learning models in the lncRNA-miRNA interaction prediction.
Collapse
Affiliation(s)
- Wei Wang
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaoqing Guan
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Muhammad Tahir Khan
- Institute of Molecular Biology and Biotechnology, The University of Lahore Pakistan, Pakistan
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, Guangdong, China.
| |
Collapse
|