1
Lu W, Cao Y, Wu H, Ding Y, Song Z, Zhang Y, Fu Q, Li H. Research on RNA secondary structure predicting via bidirectional recurrent neural network. BMC Bioinformatics 2021; 22:431. [PMID: 34496763 PMCID: PMC8427827 DOI: 10.1186/s12859-021-04332-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/08/2021] [Accepted: 08/23/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: RNA secondary structure prediction is an important research topic in bioinformatics. Predicting RNA secondary structure with pseudoknots has been proved to be an NP-hard problem. Because of constraints in their own model design, traditional machine learning methods cannot effectively use sequence information from sequences of different lengths when predicting RNA secondary structure. In addition, the numbers of paired and unpaired bases in RNA sequences differ greatly, and this imbalance between positive and negative samples can easily trap the model in a local optimum. To solve these problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. By introducing a flag vector, the model can accept sequences of different lengths. It can also make full use of the base information before and after the predicted base and avoids losing information through truncation. A weight vector dynamically adjusts the loss function of each base during training on the RNA training set, which addresses the sample imbalance problem.
RESULTS: The proposed algorithm is compared with existing algorithms on five representative subsets of the RNA STRAND data set. The experimental results show that the accuracy and Matthews correlation coefficient of the method are improved by 4.7% and 11.4%, respectively.
CONCLUSIONS: The introduced flag vector allows the model to effectively use the information before and after each base in the sequence; the introduced weight vector addresses the sample imbalance problem. Compared with the other algorithms, the proposed VLDB GRU algorithm achieves the best results.
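The two ideas in this abstract, ignoring padded positions and re-weighting the per-base loss, can be illustrated with a minimal sketch. This is not the authors' VLDB GRU implementation: the `mask` list is only a stand-in for the flag vector, `paired_weight` is a hypothetical stand-in for the weight vector, and the function names are assumptions made for illustration. The Matthews correlation coefficient is computed from its standard confusion-matrix definition.

```python
import math


def masked_weighted_bce(probs, labels, mask, paired_weight):
    """Mean binary cross-entropy over real (unpadded) positions only.

    probs, labels and mask are equal-length lists for one padded sequence.
    mask[i] is 1 for a real base and 0 for padding (stand-in for a flag vector).
    paired_weight scales the loss of positions labelled 1 (assumed: paired);
    positions labelled 0 keep weight 1.0 (stand-in for a weight vector).
    """
    eps = 1e-12
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if not m:
            continue  # padding contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        w = paired_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# One sequence of two real bases followed by one padded position.
loss = masked_weighted_bce([0.5, 0.5, 0.9], [1, 0, 1], [1, 1, 0], paired_weight=2.0)
```

Because the masked position is skipped, changing the prediction at a padded index leaves the loss unchanged, which is what allows sequences of different lengths to share one padded batch.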
Affiliation(s)
- Weizhong Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Yan Cao
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Zhengwei Song
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Yu Zhang
  - Suzhou Industrial Park Institute of Services Outsourcing, Suzhou, 215123, China
- Qiming Fu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Haiou Li
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
2
Xu C, Gao L, Li J, Shen L, Liang H, Luan K, Wu X. Prediction of RNA secondary structure based on stem region replacement using the RSRNA algorithm. Comput Methods Biomech Biomed Engin 2020; 24:101-114. [PMID: 32901523 DOI: 10.1080/10255842.2020.1813280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
RNA functions, including the regulation of various cellular activities, seem to be closely related to RNA structure. However, accurately predicting RNA secondary structures can be difficult. Structural prediction can be achieved by selecting suitable, mutually compatible stem regions from stem pools. Here, we propose a method for predicting the secondary structure of non-coding RNA based on stem region substitution, which we named RSRNA. This method is compatible with nested RNA secondary structures and reduces randomness. Our algorithm had higher performance and prediction accuracy than other algorithms, which makes it more effective for future RNA structure studies.
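To make "selecting compatible stem regions from a stem pool" concrete, the following is a minimal sketch and not the RSRNA algorithm itself. It assumes canonical and G-U wobble pairing, represents a stem as a hypothetical tuple `(i, j, length)` covering pairs `(i+k, j-k)`, treats two stems as compatible when they share no base and do not cross (nested structures only), and uses a simple longest-first greedy pick in place of the paper's stem region substitution. The thresholds `min_len` and `min_loop` are illustrative defaults.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pool(seq, min_len=3, min_loop=3):
    """Enumerate maximal stems as (i, j, length), with pairs (i+k, j-k)."""
    n = len(seq)
    pool = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip stems that could be extended outward: they are not maximal.
            if i > 0 and j + 1 < n and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while the enclosed loop keeps more than min_loop bases.
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                pool.append((i, j, length))
    return pool


def compatible(s, t):
    """True if stems s and t share no base and do not cross (nested only)."""
    (i1, j1, l1), (i2, j2, l2) = s, t
    side_by_side = j1 < i2 or j2 < i1
    t_inside_s = i1 + l1 - 1 < i2 and j2 < j1 - l1 + 1
    s_inside_t = i2 + l2 - 1 < i1 and j1 < j2 - l2 + 1
    return side_by_side or t_inside_s or s_inside_t


def greedy_select(pool):
    """Keep the longest stems first, adding a stem only if it fits the rest."""
    chosen = []
    for stem in sorted(pool, key=lambda s: -s[2]):
        if all(compatible(stem, c) for c in chosen):
            chosen.append(stem)
    return chosen


pool = stem_pool("GGGAAAACCC")
structure = greedy_select(pool)
```

A greedy pick is only one possible selection rule; any search over the pool can reuse the same `compatible` predicate to keep candidate structures nested.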
Affiliation(s)
- Chengzhen Xu
  - School of Computer Science and Technology, Huaibei Normal University, Huaibei, China
  - College of Life Sciences, Huaibei Normal University, Huaibei, China
- Longjian Gao
  - School of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- Jin Li
  - College of Automation, Harbin Engineering University, Harbin, China
- Longfeng Shen
  - School of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- Hong Liang
  - College of Automation, Harbin Engineering University, Harbin, China
- Kuan Luan
  - College of Automation, Harbin Engineering University, Harbin, China
- Xiaomin Wu
  - College of Life Sciences, Huaibei Normal University, Huaibei, China
3
Zhu Y, Xie Z, Li Y, Zhu M, Chen YPP. Research on folding diversity in statistical learning methods for RNA secondary structure prediction. Int J Biol Sci 2018; 14:872-882. [PMID: 29989089 PMCID: PMC6036747 DOI: 10.7150/ijbs.24595] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/27/2017] [Accepted: 02/21/2018] [Indexed: 12/24/2022]
Abstract
How to improve the prediction accuracy of RNA secondary structure is currently a hot topic. The existing prediction methods for a single sequence do not fully consider the folding diversity which may occur among RNAs with different functions or sources. This paper explores the relationship between folding diversity and prediction accuracy, and puts forward a new method to improve the prediction accuracy of RNA secondary structure. Our research investigates the following:
1. The folding feature based on stochastic context-free grammar is proposed. By using dimension reduction and clustering techniques, some public data sets are analyzed. The results show that there is significant folding diversity among different RNA families.
2. To assign folding rules to RNAs without structural information, a classification method based on production probability is proposed. The experimental results show that the classification method proposed in this paper can effectively classify the RNAs of unknown structure.
3. Based on the existing prediction methods of statistical learning models, an RNA secondary structure prediction framework is proposed, namely "Cluster - Training - Parameter Selection - Prediction". The results show that, with information on folding diversity, prediction accuracy can be significantly improved.
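The "Parameter Selection" step of the framework can be pictured as choosing, for a sequence of unknown structure, the cluster whose trained parameters assign it the highest probability. The sketch below is a toy illustration only: it replaces the paper's stochastic context-free grammar production probabilities with independent per-base frequencies, and the cluster names, frequency values, and function names are all hypothetical.

```python
import math


def log_likelihood(seq, params):
    """Toy stand-in for an SCFG production probability: i.i.d. base frequencies."""
    return sum(math.log(params[base]) for base in seq)


def select_parameters(seq, cluster_params):
    """Return the name of the cluster whose parameters best explain seq."""
    return max(cluster_params, key=lambda name: log_likelihood(seq, cluster_params[name]))


# Hypothetical parameter sets, as if learned separately for two clusters.
cluster_params = {
    "gc_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "U": 0.1},
    "au_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "U": 0.4},
}
chosen = select_parameters("GGCC", cluster_params)
```

In the framework described above, the selected cluster's parameters would then be handed to the structure predictor; that final step is not modelled here.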
Affiliation(s)
- Yu Zhu
  - College of Computer Science, Sichuan University, China
- ZhaoYang Xie
  - College of Computer Science, Sichuan University, China
- YiZhou Li
  - College of Chemistry, Sichuan University, China
- Min Zhu
  - Vice Dean of College of Computer Science, Sichuan University
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Information Technology, La Trobe University, Australia