1
Yuan Y, Yang E, Zhang R. Wfold: A new method for predicting RNA secondary structure with deep learning. Comput Biol Med 2024; 182:109207. [PMID: 39341115 DOI: 10.1016/j.compbiomed.2024.109207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 09/21/2024] [Accepted: 09/23/2024] [Indexed: 09/30/2024]
Abstract
Precise estimation of RNA secondary structure has the potential to reveal the various roles that non-coding RNAs play in regulating cellular activity. However, traditional RNA secondary structure prediction methods rely mainly on thermodynamic models and free energy minimization, a laborious process that requires extensive prior knowledge. Here, we propose Wfold, an end-to-end deep learning-based approach to RNA secondary structure prediction. Wfold is trained directly on annotated data and base-pairing criteria. It uses an image-like representation of RNA sequences, which is processed effectively by an enhanced U-Net integrated with a transformer encoder. By combining the self-attention mechanism's ability to mine long-range information with the U-Net's ability to gather local information, Wfold improves the accuracy of RNA secondary structure prediction. We evaluate Wfold on both within-family and cross-family RNA datasets. When trained and evaluated on different RNA families, it performs comparably to traditional methods, and on within-family datasets it dramatically outperforms state-of-the-art methods. Moreover, Wfold can reliably predict pseudoknots. These findings suggest that Wfold may be useful for improving sequence alignment, functional annotation, and RNA structure modeling.
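The abstract does not spell out how Wfold builds its image-like input. As an illustration only, one widely used encoding for image-based RNA folding models is sketched below under that assumption: each nucleotide is one-hot encoded, the pairwise outer product gives 16 channels for every position pair (i, j), and one extra channel flags canonical Watson-Crick or G-U wobble pairs as a simple stand-in for base-pairing criteria. The function names and the 17-channel layout are hypothetical, not Wfold's published design.

```python
import numpy as np

ALPHABET = "ACGU"
# Canonical Watson-Crick pairs plus the G-U wobble pair
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def one_hot(seq: str) -> np.ndarray:
    """Return an (L, 4) one-hot matrix over ACGU; unknown symbols stay all-zero."""
    s = seq.upper().replace("T", "U")
    mat = np.zeros((len(s), len(ALPHABET)), dtype=np.float32)
    for i, base in enumerate(s):
        j = ALPHABET.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat


def pairwise_image(seq: str) -> np.ndarray:
    """Build a hypothetical (L, L, 17) image-like tensor for an RNA sequence.

    Channels 0-15: outer product of the one-hot vectors of bases i and j,
    so channel 4 * a + b is 1 when base i is ALPHABET[a] and base j is ALPHABET[b].
    Channel 16: 1 where bases i and j could form a canonical or G-U pair.
    """
    x = one_hot(seq)
    s = seq.upper().replace("T", "U")
    n = len(s)
    outer = np.einsum("ia,jb->ijab", x, x).reshape(n, n, 16)
    mask = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            if (s[i], s[j]) in CANONICAL:
                mask[i, j] = 1.0
    return np.concatenate([outer, mask[..., None]], axis=-1)
```

For `pairwise_image("GGGAAACCC")` the result has shape (9, 9, 17). A U-Net-style network would treat such a tensor like a 17-channel image and output an L × L map of pairing scores.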
Affiliation(s)
- Yongna Yuan
- School of Information Science & Engineering, Lanzhou University, South Tianshui Road, Lanzhou, 730000, Gansu, China
- Enjie Yang
- School of Information Science & Engineering, Lanzhou University, South Tianshui Road, Lanzhou, 730000, Gansu, China
- Ruisheng Zhang
- School of Information Science & Engineering, Lanzhou University, South Tianshui Road, Lanzhou, 730000, Gansu, China
2
Zhao Q, Zhao Z, Fan X, Yuan Z, Mao Q, Yao Y. Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol 2021; 17:e1009291. [PMID: 34437528 PMCID: PMC8389396 DOI: 10.1371/journal.pcbi.1009291] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/05/2022]
Abstract
Secondary structure plays an important role in determining the function of noncoding RNAs, so identifying RNA secondary structures is of great research value. Computational prediction is a mainstream approach to predicting RNA secondary structure. Unfortunately, although new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine learning (ML), especially deep learning, have alleviated this issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on ML and a tabulated summary of the most important methods in this field. Open challenges in RNA secondary structure prediction and future trends are also discussed.
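Performance comparisons in this field are commonly reported as precision, recall and F1 score over base pairs, with predicted and reference structures often written in dot-bracket notation. The sketch below assumes that convention with exact pair matching (some benchmarks also credit pairs shifted by one position, which this version does not); extra bracket types such as `[]` encode pseudoknotted pairs. It illustrates the general idea and is not the metric definition used by any particular paper listed here.

```python
from typing import Set, Tuple

# Closing bracket -> opening bracket; extra bracket types mark pseudoknotted pairs
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_dot_bracket(db: str) -> Set[Tuple[int, int]]:
    """Convert dot-bracket notation into a set of (i, j) base pairs with i < j."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs: Set[Tuple[int, int]] = set()
    for j, ch in enumerate(db):
        if ch in stacks:
            stacks[ch].append(j)
        elif ch in BRACKETS:
            opener = BRACKETS[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.add((stacks[opener].pop(), j))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def pair_f1(predicted: str, reference: str) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exactly matching base pairs."""
    pred, ref = parse_dot_bracket(predicted), parse_dot_bracket(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

For example, `pair_f1("(((...)))", "((.....))")` gives precision 2/3, recall 1.0 and F1 0.8, because the predicted innermost pair (2, 6) does not appear in the reference.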
Affiliation(s)
- Qi Zhao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
- Zheng Zhao
- School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
- Xiaoya Fan
- School of Software, Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, Liaoning, China
- Zhengwei Yuan
- Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital of China Medical University, Shenyang, Liaoning, China
- Qian Mao
- College of Light Industry, Liaoning University, Shenyang, Liaoning, China
- Key Laboratory of Agroproducts Processing Technology, Changchun University, Changchun, Jilin, China
- Yudong Yao
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey, United States of America
3
Xu C, Gao L, Li J, Shen L, Liang H, Luan K, Wu X. Prediction of RNA secondary structure based on stem region replacement using the RSRNA algorithm. Comput Methods Biomech Biomed Engin 2020; 24:101-114. [PMID: 32901523 DOI: 10.1080/10255842.2020.1813280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]
Abstract
RNA function, including the regulation of various cellular activities, appears to be closely related to RNA structure. However, accurately predicting RNA secondary structure can be difficult. Structure prediction can be achieved by selecting suitable, mutually compatible stem regions from a stem pool. Here, we propose RSRNA, a method for predicting the secondary structure of non-coding RNA based on stem region substitution. The method is compatible with nested RNA secondary structures while reducing randomness. Our algorithm achieved higher performance and prediction accuracy than other algorithms, suggesting that it is more effective for future RNA structure studies.
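The abstract describes prediction as choosing suitable, compatible stems from a stem pool, but not how RSRNA builds or scores that pool. The sketch below shows only the two generic ingredients, under stated assumptions: a stem is taken to be a maximal run of at least `min_len` stacked canonical pairs (Watson-Crick or G-U), each pair spanning at least `min_loop` intervening bases, and two stems are treated as compatible when they share no bases and do not cross, which keeps the combined structure nested. The stem-substitution search itself is not reproduced, and all names here are hypothetical.

```python
from typing import List, Tuple

# Canonical Watson-Crick pairs plus the G-U wobble pair
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
Stem = Tuple[int, int, int]  # (i, j, length): stacked pairs (i, j), (i+1, j-1), ...


def stem_pool(seq: str, min_len: int = 3, min_loop: int = 3) -> List[Stem]:
    """Enumerate maximal stems of stacked canonical pairs (hypothetical pool builder)."""
    s = seq.upper().replace("T", "U")
    n = len(s)

    def can_pair(i: int, j: int) -> bool:
        # Pair must be canonical and span at least min_loop intervening bases
        return j - i > min_loop and (s[i], s[j]) in PAIRS

    stems: List[Stem] = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            # Start only at the outermost pair of a run so each stem is maximal
            if not can_pair(i, j) or (i > 0 and j < n - 1 and can_pair(i - 1, j + 1)):
                continue
            length = 0
            while can_pair(i + length, j - length):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def compatible(a: Stem, b: Stem) -> bool:
    """True if two stems share no bases and do not cross (nested or side by side)."""
    def bases(stem: Stem) -> set:
        i, j, k = stem
        return set(range(i, i + k)) | set(range(j - k + 1, j + 1))

    if bases(a) & bases(b):
        return False
    ai, aj, _ = a
    # Crossing (pseudoknot-like) when exactly one end of b lies strictly inside a's span
    return (ai < b[0] < aj) == (ai < b[1] < aj)
```

`stem_pool("GGGAAACCC")` returns `[(0, 8, 3)]`, a single three-pair stem. A search procedure could then combine stems from such a pool as long as `compatible` holds for every pair of chosen stems.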
Affiliation(s)
- Chengzhen Xu
- School of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- College of Life Sciences, Huaibei Normal University, Huaibei, China
- Longjian Gao
- School of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- Jin Li
- College of Automation, Harbin Engineering University, Harbin, China
- Longfeng Shen
- School of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- Hong Liang
- College of Automation, Harbin Engineering University, Harbin, China
- Kuan Luan
- College of Automation, Harbin Engineering University, Harbin, China
- Xiaomin Wu
- College of Life Sciences, Huaibei Normal University, Huaibei, China
4
Wang L, Liu Y, Zhong X, Liu H, Lu C, Li C, Zhang H. DMfold: A Novel Method to Predict RNA Secondary Structure With Pseudoknots Based on Deep Learning and Improved Base Pair Maximization Principle. Front Genet 2019; 10:143. [PMID: 30886627 PMCID: PMC6409321 DOI: 10.3389/fgene.2019.00143] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 12/02/2018] [Accepted: 02/12/2019] [Indexed: 01/21/2023]
Abstract
Predicting the secondary structure of RNA is vital for studying its function, but it is challenging, especially for structures with pseudoknots. Several excellent computational methods can be used to predict secondary structure (with or without pseudoknots), but each has its own merits and demerits. These methods fall into two categories: multi-sequence methods and single-sequence methods. The main advantage of multi-sequence methods is that they use auxiliary sequences to assist in predicting the secondary structure, but they succeed only when multiple highly homologous sequences are available. The major merit of single-sequence methods is ease of use (only the target sequence is needed to predict its secondary structure), but their folding parameters capture features common to diverse RNAs and cannot describe the unique characteristics of a given RNA, potentially resulting in low prediction accuracy for some RNAs. In this paper, we propose DMfold, a method based on deep learning and an improved base pair maximization principle, to predict secondary structure with pseudoknots; it fully absorbs the advantages of both categories while avoiding some of their disadvantages. Notably, DMfold predicts RNA secondary structure by learning from similar RNAs with known structures, using similar RNA sequences instead of the highly homologous sequences required by multi-sequence methods, thereby reducing the requirement for auxiliary sequences. DMfold needs only the target sequence as input. Its folding parameters are extracted automatically by deep learning, which avoids the shortage of folding parameters in single-sequence methods. Experiments show that our method is not only simple to operate but also improves prediction accuracy compared with multiple excellent prediction methods.
A repository containing our code can be found at https://github.com/linyuwangPHD/RNA-Secondary-Structure-Database.
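DMfold's improved base pair maximization principle is not described in the abstract. For orientation, the classic Nussinov dynamic program, the textbook form of base-pair maximization, is sketched below: it finds a nested structure with the largest number of canonical or G-U pairs, subject to a minimum hairpin loop length (`min_loop`, assumed to be 3 here). It cannot produce pseudoknots, whereas DMfold targets structures with pseudoknots, so it should not be read as DMfold's algorithm.

```python
from typing import List, Tuple

# Canonical Watson-Crick pairs plus the G-U wobble pair
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Classic Nussinov base-pair maximization (nested pairs only).

    Returns the maximum number of pairs and one optimal structure in dot-bracket form.
    """
    s = seq.upper().replace("T", "U")
    n = len(s)
    # dp[i][j] = maximum number of pairs in s[i..j]; spans shorter than min_loop + 1 stay 0
    dp: List[List[int]] = [[0] * n for _ in range(n)]

    def pair_score(i: int, j: int, k: int) -> int:
        """Score when base j pairs with base k (i <= k < j - min_loop)."""
        left = dp[i][k - 1] if k > i else 0
        return left + dp[k + 1][j - 1] + 1

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j left unpaired
            for k in range(i, j - min_loop):
                if (s[k], s[j]) in PAIRS:
                    best = max(best, pair_score(i, j, k))
            dp[i][j] = best

    # Traceback: recover one structure that achieves dp[0][n - 1]
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (s[k], s[j]) in PAIRS and dp[i][j] == pair_score(i, j, k):
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return (dp[0][n - 1] if n else 0), "".join(structure)
```

`nussinov("GGGAAACCC")` returns `(3, "(((...)))")`. The fill step runs in O(n³) time and O(n²) memory.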
Affiliation(s)
- Linyu Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Yuanning Liu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Xiaodan Zhong
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Department of Pediatric Oncology, The First Hospital of Jilin University, Changchun, China
- Haiming Liu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Chao Lu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Cong Li
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Hao Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
5
Lin H, Peng S, Huang J. Special issue on Computational Resources and Methods in Biological Sciences. Int J Biol Sci 2018; 14:807-810. [PMID: 29989106 PMCID: PMC6036761 DOI: 10.7150/ijbs.27554] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/30/2018] [Accepted: 06/03/2018] [Indexed: 12/11/2022]
Abstract
This special issue covers a wide range of topics in computational biology, such as database construction, sequence analysis and function prediction with machine learning methods, disease-related diagnosis, drug-target and drug discovery, and electronic health record system construction.
Affiliation(s)
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China
- Shaoliang Peng
- School of Computer Science, National University of Defense Technology, Changsha 410073, China
- Jian Huang
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China