1
|
Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024; 300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024]
Abstract
RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequences into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies limit the performance of machine learning models in predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred, which uses a support vector machine classifier to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model compared to state-of-the-art models for RNA modification site identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can also be applied to feature extraction from DNA sequences to predict other biological modification sites.
Affiliation(s)
- Mingzhao Wang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Haider Ali
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Yandi Xu
- School of Computer Science, Shaanxi Normal University, Xi'an, China; College of Life Sciences, Shaanxi Normal University, Xi'an, China
| | - Juanying Xie
- School of Computer Science, Shaanxi Normal University, Xi'an, China.
| | - Shengquan Xu
- College of Life Sciences, Shaanxi Normal University, Xi'an, China.
|
2
|
Jin Z, Sheng J, Hu Y, Zhang Y, Wang X, Huang Y. Shining a spotlight on m6A and the vital role of RNA modification in endometrial cancer: a review. Front Genet 2023; 14:1247309. [PMID: 37886684 PMCID: PMC10598767 DOI: 10.3389/fgene.2023.1247309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 09/19/2023] [Indexed: 10/28/2023]
Abstract
RNA modifications are mostly dynamically reversible post-transcriptional modifications, of which m6A is the most prevalent in eukaryotic mRNAs. A growing number of studies indicate that RNA modification can finely tune gene expression and modulate RNA metabolic homeostasis, which in turn affects the self-renewal, proliferation, apoptosis, migration, and invasion of tumor cells. Endometrial carcinoma (EC) is the most common gynecologic tumor in developed countries. Although it can be diagnosed early and then has a favorable prognosis, some cases become metastatic or recurrent, with a worse prognosis. Fortunately, immunotherapy and targeted therapy are promising methods of treating endometrial cancer patients. Gene modifications may also contribute to these treatments, especially given recent developments of new targeted therapeutic genes and diagnostic biomarkers for EC, even though current findings on the relationship between RNA modification, especially m6A, and EC are still very limited. For example, what is the detailed mechanism by which RNA modification affects EC progression? Taking m6A modification as an example, what is the conversion mode of methylation and demethylation for RNAs, and how is selective recognition of specific RNAs achieved? Understanding how RNA modifications respond to various stimuli during biological development, disease, and tumor occurrence and development, in vivo and in vitro, is valuable, and RNA modifications provide a distinctive insight into genetic information. These roles may become a new direction for cancer research in the future.
In this review, we summarize the category, characteristics, and therapeutic precis of RNA modification, m6A in particular, with the purpose of seeking the systematic regulation axis related to RNA modification to provide a better solution for the treatment of EC.
Affiliation(s)
- Zujian Jin
- Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
| | - Jingjing Sheng
- Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
| | - Yingying Hu
- Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
| | - Yu Zhang
- Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
| | - Xiaoxia Wang
- Reproductive Medicine Center, School of Medicine, The Fourth Affiliated Hospital, Zhejiang University, Yiwu, Zhejiang, China
| | - Yiping Huang
- Department of Gynecology and Obstetrics, The Fourth Affiliated Hospital, Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
|
3
|
Zhang X, Zhu WY, Shen SY, Shen JH, Chen XD. Biological roles of RNA m7G modification and its implications in cancer. Biol Direct 2023; 18:58. [PMID: 37710294 PMCID: PMC10500781 DOI: 10.1186/s13062-023-00414-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/04/2023] [Accepted: 09/07/2023] [Indexed: 09/16/2023]
Abstract
m7G modification, one of the common post-transcriptional modifications of RNA, is present in many different types of RNAs. With the accurate identification of m7G modifications within RNAs, their functional roles in the regulation of gene expression and different physiological functions have been revealed. In addition, there is growing evidence that m7G modifications are crucial in the emergence of cancer. Here, we review the most recent findings regarding the detection techniques, distribution, biological functions, and regulators of m7G. We also summarize the connections between m7G modifications and cancer development, drug resistance, and the tumor microenvironment, and discuss future research directions and trends.
Affiliation(s)
- Xin Zhang
- Department of Dermatology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, China
| | - Wen-Yan Zhu
- Department of Dermatology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, China
| | - Shu-Yi Shen
- Department of Dermatology, the First Affiliated Hospital of Fujian Medical University, Fuzhou, China
| | - Jia-Hao Shen
- Department of Dermatology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, China
| | - Xiao-Dong Chen
- Department of Dermatology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, China.
|
4
|
Zhang Y, Yu L, Jing R, Han B, Luo J. Fast and Efficient Design of Deep Neural Networks for Predicting N7-Methylguanosine Sites Using autoBioSeqpy. ACS Omega 2023; 8:19728-19740. [PMID: 37305295 PMCID: PMC10249100 DOI: 10.1021/acsomega.3c01371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 05/10/2023] [Indexed: 06/13/2023]
Abstract
N7-Methylguanosine (m7G) is a crucial post-transcriptional RNA modification that plays a pivotal role in regulating gene expression. Accurately identifying m7G sites is a fundamental step in understanding the biological functions and regulatory mechanisms associated with this modification. While whole-genome sequencing is the gold standard for RNA modification site detection, it is a time-consuming, expensive, and intricate process. Recently, computational approaches, especially deep learning (DL) techniques, have gained popularity in achieving this objective. Convolutional neural networks and recurrent neural networks are examples of DL algorithms that have emerged as versatile tools for modeling biological sequence data. However, developing an efficient network architecture with superior performance remains a challenging task, requiring significant expertise, time, and effort. To address this, we previously introduced a tool called autoBioSeqpy, which streamlines the design and implementation of DL networks for biological sequence classification. In this study, we utilized autoBioSeqpy to develop, train, evaluate, and fine-tune sequence-level DL models for predicting m7G sites. We provided detailed descriptions of these models, along with a step-by-step guide on their execution. The same methodology can be applied to other systems dealing with similar biological questions. The benchmark data and code utilized in this study can be accessed for free at http://github.com/jingry/autoBioSeeqpy/tree/2.0/examples/m7G.
Affiliation(s)
- Yonglin Zhang
- Department of Pharmacy, Affiliated Hospital of North Sichuan Medical College, Nanchong 637000, China
| | - Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550024, China
| | - Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu 610017, China
| | - Bin Han
- GCP Center/Institute of Drug Clinical Trials, Affiliated Hospital of North Sichuan Medical College, Nanchong 637503, China
| | - Jiesi Luo
- Basic Medical College, Southwest Medical University, Luzhou 646099, Sichuan, China
- Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, Southwest Medical University, Luzhou 646099, China
|
5
|
Liu Z, Lan P, Liu T, Liu X, Liu T. m6Aminer: Predicting the m6Am Sites on mRNA by Fusing Multiple Sequence-Derived Features into a CatBoost-Based Classifier. Int J Mol Sci 2023; 24:ijms24097878. [PMID: 37175594 PMCID: PMC10177809 DOI: 10.3390/ijms24097878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 04/20/2023] [Accepted: 04/24/2023] [Indexed: 05/15/2023]
Abstract
As one of the most important post-transcriptional modifications, m6Am plays a fairly important role in conferring mRNA stability and in the progression of cancers. The accurate identification of the m6Am sites is critical for explaining its biological significance and developing its application in the medical field. However, conventional experimental approaches are time-consuming and expensive, making them unsuitable for the large-scale identification of the m6Am sites. To address this challenge, we exploit a CatBoost-based method, m6Aminer, to identify the m6Am sites on mRNA. For feature extraction, nine different feature-encoding schemes (pseudo electron-ion interaction potential, hash decimal conversion method, dinucleotide binary encoding, nucleotide chemical properties, pseudo k-tuple composition, dinucleotide numerical mapping, K monomeric units, series correlation pseudo trinucleotide composition, and K-spaced nucleotide pair frequency) were utilized to form the initial feature space. To obtain the optimized feature subset, the ExtraTreesClassifier algorithm was adopted to perform feature importance ranking, and the top 300 features were selected as the optimal feature subset. With different performance assessment methods, 10-fold cross-validation and independent test, m6Aminer achieved average AUC of 0.913 and 0.754, demonstrating a competitive performance with the state-of-the-art models m6AmPred (0.905 and 0.735) and DLm6Am (0.897 and 0.730). The prediction model developed in this study can be used to identify the m6Am sites in the whole transcriptome, laying a foundation for the functional research of m6Am.
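One of the nine encodings named in this abstract, K-spaced nucleotide pair frequency, can be illustrated with a minimal Python sketch. It assumes the common definition in which a pair consists of two nucleotides separated by exactly k intervening positions; the function name `kspaced_pair_frequency`, the `ACGU` alphabet ordering, and the normalisation by the number of valid pairs are illustrative assumptions, not details taken from the m6Aminer paper.

```python
from itertools import product
from typing import Dict

ALPHABET = "ACGU"  # assumed RNA alphabet; other symbols are skipped


def kspaced_pair_frequency(seq: str, k: int) -> Dict[str, float]:
    """Frequency of nucleotide pairs separated by exactly k positions.

    Hypothetical helper: returns a 16-entry dict keyed by pairs such as "AG".
    """
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    # seq[i] is paired with seq[i + k + 1], i.e. k nucleotides lie between them
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    # Normalise by the number of valid pairs (assumption); all zeros if none
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


if __name__ == "__main__":
    features = kspaced_pair_frequency("ACGUACGU", k=1)
    print([round(features[p], 3) for p in sorted(features)])
```

Concatenating the 16-dimensional vectors for several values of k gives one feature block; a ranking step such as the one described in the abstract would then operate on the combined feature space.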
Affiliation(s)
- Ze Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
| | - Pengfei Lan
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
| | - Ting Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
- Department of Mechanical Engineering, Faculty of Engineering, The University of Hong Kong, Hong Kong 999077, China
| | - Xudong Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
- College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
| | - Tao Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Xianyang 712100, China
|
6
|
Xia X, Wang Y, Zheng JC. Internal m7G methylation: A novel epitranscriptomic contributor in brain development and diseases. Molecular Therapy. Nucleic Acids 2023; 31:295-308. [PMID: 36726408 PMCID: PMC9883147 DOI: 10.1016/j.omtn.2023.01.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 01/13/2023]
Abstract
In recent years, N7-methylguanosine (m7G) methylation, originally considered a messenger RNA (mRNA) 5' cap modification, has been identified at defined internal positions within multiple types of RNAs, including transfer RNAs, ribosomal RNAs, miRNAs, and mRNAs. Scientists have put substantial effort into discovering m7G methyltransferases and methylated sites in RNAs to unveil the essential roles of m7G modifications in the regulation of gene expression and to determine the association of m7G dysregulation with various diseases, including neurological disorders. Here, we review recent findings regarding the distribution, abundance, biogenesis, modifiers, and functions of m7G modifications. We also provide an up-to-date summary of m7G detection and profile mapping techniques, databases for validated and predicted m7G RNA sites, and web servers for m7G methylation prediction. Furthermore, we discuss the pathological roles of METTL1/WDR-driven m7G methylation in neurological disorders. Last, we outline a roadmap for future directions and trends of m7G modification research, particularly in the central nervous system.
Affiliation(s)
- Xiaohuan Xia
- Center for Translational Neurodegeneration and Regenerative Therapy, Tongji Hospital affiliated to Tongji University School of Medicine, Shanghai 200072, China; Shanghai Frontiers Science Center of Nanocatalytic Medicine, Shanghai 200331, China; Corresponding author: Xiaohuan Xia, Center for Translational Neurodegeneration and Regenerative Therapy, Tongji Hospital affiliated to Tongji University School of Medicine, Shanghai 200065, China.
| | - Yi Wang
- Shanghai Frontiers Science Center of Nanocatalytic Medicine, Shanghai 200331, China; Translational Research Center, Shanghai Yangzhi Rehabilitation Hospital affiliated to Tongji University School of Medicine, Shanghai 201613, China
| | - Jialin C. Zheng
- Center for Translational Neurodegeneration and Regenerative Therapy, Tongji Hospital affiliated to Tongji University School of Medicine, Shanghai 200072, China; Shanghai Frontiers Science Center of Nanocatalytic Medicine, Shanghai 200331, China; Corresponding author: Jialin C. Zheng, Center for Translational Neurodegeneration and Regenerative Therapy, Tongji Hospital affiliated to Tongji University School of Medicine, Shanghai 200065, China.
|
7
|
Feng Q, Wang D, Xue T, Lin C, Gao Y, Sun L, Jin Y, Liu D. The role of RNA modification in hepatocellular carcinoma. Front Pharmacol 2022; 13:984453. [PMID: 36120301 PMCID: PMC9479111 DOI: 10.3389/fphar.2022.984453] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/02/2022] [Accepted: 08/11/2022] [Indexed: 12/25/2022]
Abstract
Hepatocellular carcinoma (HCC) is a highly lethal type of primary liver cancer. Abnormal epigenetic modifications are present in HCC, and RNA modification, which is dynamic and reversible, is a key post-transcriptional regulator. In-depth studies of post-transcriptional modifications have shown that RNA modifications are aberrantly expressed in human cancers. Moreover, the regulators of RNA modifications can be used as potential targets for cancer therapy. Among RNA modifications, N6-methyladenosine (m6A), N7-methylguanosine (m7G), and 5-methylcytosine (m5C) and their regulators have important regulatory roles in HCC progression and represent potential novel biomarkers for the diagnosis and treatment of HCC. This review focuses on RNA modifications in HCC and the roles and mechanisms of m6A, m7G, m5C, N1-methyladenosine (m1A), N3-methylcytosine (m3C), and pseudouridine (ψ) in its development and maintenance. The potential therapeutic strategies based on RNA modifications are elaborated for HCC.
Affiliation(s)
- Qiang Feng
- Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
| | - Dongxu Wang
- Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
| | - Tianyi Xue
- Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
| | - Chao Lin
- School of Grain Science and Technology, Jilin Business and Technology College, Changchun, China
| | - Yongjian Gao
- Department of Gastrointestinal Colorectal and Anal Surgery, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Liqun Sun
- Department of Pediatrics, First Hospital of Jilin University, Changchun, China
| | - Ye Jin
- School of Pharmacy, Changchun University of Chinese Medicine, Changchun, China
| | - Dianfeng Liu
- Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
- *Correspondence: Dianfeng Liu,
|
8
|
An Effective Deep Learning-Based Architecture for Prediction of N7-Methylguanosine Sites in Health Systems. Electronics 2022. [DOI: 10.3390/electronics11121917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
N7-methylguanosine (m7G) is one of the most important epigenetic modifications found in rRNA, mRNA, and tRNA, and plays a promising role in gene expression regulation. Owing to its significance, traditional laboratory-based techniques have been used to identify m7G sites. However, these approaches were found to be time-consuming and costly. To move on from these traditional approaches and predict N7-methylguanosine sites with high precision, artificial intelligence has been adopted. In this study, an intelligent computational model called N7-methylguanosine-Long short-term memory (m7G-LSTM) is introduced for the prediction of N7-methylguanosine sites. One-hot encoding and word2vec feature schemes are used to represent the biological sequences, while the LSTM and CNN algorithms are employed for classification. The proposed “m7G-LSTM” model obtained an accuracy of 95.95%, a specificity of 95.94%, a sensitivity of 95.97%, and a Matthews correlation coefficient (MCC) of 0.919. The proposed m7G-LSTM model achieved significantly better outcomes than previous models on all evaluation metrics. The proposed m7G-LSTM computational system aims to support the drug industry and help researchers in bioinformatics advance the prediction of the behavior of N7-methylguanosine sites.
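The four evaluation measures quoted in this abstract (accuracy, specificity, sensitivity, and MCC) are all functions of a binary confusion matrix. The sketch below shows the standard formulas in Python; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    # Matthews correlation coefficient; defined as 0 when a marginal is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas
    for name, value in binary_metrics(tp=90, tn=80, fp=20, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why methylation-site predictors are often compared on it when positive and negative sets differ in size.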
|
9
|
Zou H. iAHTP-LH: Integrating Low-Order and High-Order Correlation Information for Identifying Antihypertensive Peptides. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10414-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
10
|
Liu HY, Du PF. i5hmCVec: Identifying 5-Hydroxymethylcytosine Sites of Drosophila RNA Using Sequence Feature Embeddings. Front Genet 2022; 13:896925. [PMID: 35591855 PMCID: PMC9110757 DOI: 10.3389/fgene.2022.896925] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/15/2022] [Accepted: 03/30/2022] [Indexed: 01/27/2023]
Abstract
5-Hydroxymethylcytosine (5hmC), one of the most important RNA modifications, plays an important role in many biological processes. Accurately identifying RNA modification sites helps in understanding the function of RNA modification. In this work, we propose a computational method for identifying 5hmC-modified regions using machine learning algorithms. We applied a sequence feature embedding method based on the dna2vec algorithm to represent the RNA sequence. The results showed that the performance of our model is better than that of state-of-the-art methods. All datasets and source code used in this study are available at: https://github.com/liu-h-y/5hmC_model.
|
11
|
Shoombuatong W, Basith S, Pitti T, Lee G, Manavalan B. THRONE: a new approach for accurate prediction of human RNA N7-methylguanosine sites. J Mol Biol 2022; 434:167549. [DOI: 10.1016/j.jmb.2022.167549] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/16/2021] [Revised: 03/08/2022] [Accepted: 03/10/2022] [Indexed: 12/30/2022]
|
12
|
Wang H, Wang S, Zhang Y, Bi S, Zhu X. A brief review of machine learning methods for RNA methylation sites prediction. Methods 2022; 203:399-421. [DOI: 10.1016/j.ymeth.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2021] [Revised: 02/15/2022] [Accepted: 03/01/2022] [Indexed: 02/07/2023]
|
13
|
Zou H, Yang F, Yin Z. Identifying N7-methylguanosine sites by integrating multiple features. Biopolymers 2021; 113:e23480. [PMID: 34709657 DOI: 10.1002/bip.23480] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/27/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 11/10/2022]
Abstract
Recent studies reported that N7-methylguanosine (m7G) plays a vital role in gene expression regulation. As a consequence, determining the distribution of m7G is a crucial step towards further understanding its biological functions. Although biological experimental approaches are capable of accurately locating m7G sites, they are labor-intensive, costly, and time-consuming. Therefore, it is necessary to develop more effective and robust computational methods to replace, or at least complement current experimental methods. In this study, we developed a novel sequence-based computational tool to identify RNA m7G sites. In this model, 22 kinds of dinucleotide physicochemical (PC) properties were employed to encode the RNA sequence. Three types of descriptors, including auto-covariance, cross-covariance, and discrete wavelet transform were adopted to extract effective features from the PC matrix. The least absolute shrinkage and selection operator (LASSO) algorithm was utilized to reduce the influence of irrelevant or redundant features. Finally, these selected features were fed into a support vector machine (SVM) for distinguishing m7G from non-m7G sites. The proposed method significantly outperforms existing predictors across all evaluation metrics. It indicates that the approach is effective in identifying RNA m7G sites.
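Of the three descriptors named in this abstract, auto-covariance is the simplest to illustrate. The Python sketch below assumes the commonly used definition, in which the mean-centred property values at positions i and i + lag are multiplied and averaged over the sequence; the function name `auto_covariance` and the example values are hypothetical, and the exact normalisation used in the paper may differ.

```python
from typing import Sequence


def auto_covariance(values: Sequence[float], lag: int) -> float:
    """Auto-covariance of one physicochemical property along a sequence.

    `values` holds the property value of each consecutive dinucleotide
    (hypothetical input); `lag` is the distance between compared positions.
    """
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    # Average the products of mean-centred values that are `lag` apart
    products = (
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return sum(products) / (n - lag)


if __name__ == "__main__":
    # Made-up property values for a short run of dinucleotides
    prop = [0.12, -0.30, 0.45, 0.08, -0.22, 0.31]
    print([round(auto_covariance(prop, lag), 4) for lag in (1, 2, 3)])
```

Repeating this for each of the 22 properties and several lags would yield one fixed-length feature vector per sequence, regardless of sequence length.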
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
|
14
|
Zou H, Yin Z. m7G-DPP: Identifying N7-methylguanosine sites based on dinucleotide physicochemical properties of RNA. Biophys Chem 2021; 279:106697. [PMID: 34628276 DOI: 10.1016/j.bpc.2021.106697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/18/2021] [Revised: 10/01/2021] [Accepted: 10/02/2021] [Indexed: 11/17/2022]
Abstract
N7-methylguanosine (m7G) modification is one of the most common post-transcriptional RNA modifications and plays a vital role in the regulation of gene expression. Dysfunction of m7G may result in developmental defects and the appearance of some serious diseases. Thus, the fast and accurate identification of m7G sites is an urgent task. Because experimental approaches are costly and time-consuming, researchers have focused their attention on computational models. Hence, in the current study, we proposed a novel predictor called m7G-DPP to identify m7G sites. In the predictor, the RNA sequences were first encoded by the physicochemical (PC) properties of dinucleotides. Then, a sliding window approach was adopted to divide the PC matrix into multiple matrices, and Pearson's correlation coefficient (PCC), dynamic time warping (DTW), and distance correlation (DC) were employed to extract classification features at each window. Next, the least absolute shrinkage and selection operator (LASSO) algorithm was applied to select discriminative features. Finally, these selected features were fed into a support vector machine to identify m7G sites. Experimental results showed that the proposed method is effective and may play a complementary role in current m7G site prediction studies. The MATLAB codes and dataset can be obtained from the website at https://figshare.com/articles/online_resource/m7G-DPP/15000348.
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, China.
| | - Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, China
|
15
|
El Allali A, Elhamraoui Z, Daoud R. Machine learning applications in RNA modification sites prediction. Comput Struct Biotechnol J 2021; 19:5510-5524. [PMID: 34712397 PMCID: PMC8517552 DOI: 10.1016/j.csbj.2021.09.025] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/30/2021] [Revised: 09/24/2021] [Accepted: 09/25/2021] [Indexed: 12/15/2022]
Abstract
Ribonucleic acid (RNA) modifications are post-transcriptional chemical composition changes that have a fundamental role in regulating the main aspect of RNA function. Recently, large datasets have become available thanks to the recent development in deep sequencing and large-scale profiling. This availability of transcriptomic datasets has led to increased use of machine learning based approaches in epitranscriptomics, particularly in identifying RNA modifications. In this review, we comprehensively explore machine learning based approaches used for the prediction of 11 RNA modification types, namely, m1A, m6A, m5C, 5hmC, ψ, 2'-O-Me, ac4C, m7G, A-to-I, m2G, and D. This review covers the life cycle of machine learning methods to predict RNA modification sites including available benchmark datasets, feature extraction, and classification algorithms. We compare available methods in terms of datasets, target species, approach, and accuracy for each RNA modification type. Finally, we discuss the advantages and limitations of the reviewed approaches and suggest future perspectives.
Affiliation(s)
- A. El Allali
- African Genome Center, University Mohamed VI Polytechnic, Morocco
| | - Zahra Elhamraoui
- African Genome Center, University Mohamed VI Polytechnic, Morocco
| | - Rachid Daoud
- African Genome Center, University Mohamed VI Polytechnic, Morocco
|
16
|
BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information. Computational and Mathematical Methods in Medicine 2021; 2021:7764764. [PMID: 34484416 PMCID: PMC8413034 DOI: 10.1155/2021/7764764] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/16/2021] [Accepted: 08/13/2021] [Indexed: 01/19/2023]
Abstract
As one of the most prevalent posttranscriptional modifications of RNA, N7-methylguanosine (m7G) plays an essential role in the regulation of gene expression. Accurate identification of m7G sites in the transcriptome is invaluable for better revealing their potential functional mechanisms. Although high-throughput experimental methods can locate m7G sites precisely, they are expensive and time-consuming. Hence, it is imperative to design an efficient computational method that can accurately identify m7G sites. In this study, we propose a novel method that incorporates a BERT-based multilingual model into bioinformatics to represent the information in RNA sequences. First, we treat RNA sequences as natural sentences and then employ the bidirectional encoder representations from transformers (BERT) model to transform them into fixed-length numerical matrices. Second, a feature selection scheme based on the elastic net method is constructed to eliminate redundant features and retain important features. Finally, the selected feature subset is input into a stacking ensemble classifier to predict m7G sites, and the hyperparameters of the classifier are tuned with the tree-structured Parzen estimator (TPE) approach. Under 10-fold cross-validation, BERT-m7G achieves an ACC of 95.48% and an MCC of 0.9100. The experimental results indicate that the proposed method significantly outperforms state-of-the-art prediction methods in the identification of m7G modifications.
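Treating an RNA sequence as a natural sentence requires first splitting it into word-like tokens. The abstract does not state how this is done, so the Python sketch below assumes one common choice, overlapping k-mers joined by spaces; the function names `kmer_tokens` and `as_sentence` and the defaults k = 3 and stride = 1 are illustrative assumptions rather than the authors' settings.

```python
from typing import List


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a sequence into k-mers taken every `stride` positions."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    # Sequences shorter than k yield an empty token list
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def as_sentence(seq: str, k: int = 3) -> str:
    """Join overlapping k-mers with spaces so a text tokenizer sees 'words'."""
    return " ".join(kmer_tokens(seq, k))


if __name__ == "__main__":
    print(as_sentence("AUGGCUAGC"))
```

The resulting string can be passed to any text tokenizer; how the k-mers map onto a pretrained vocabulary depends on the model and is outside the scope of this sketch.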
|
17
|
ANOX: A robust computational model for predicting the antioxidant proteins based on multiple features. Anal Biochem 2021; 631:114257. [PMID: 34043981 DOI: 10.1016/j.ab.2021.114257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 05/12/2021] [Accepted: 05/14/2021] [Indexed: 11/20/2022]
Abstract
As an indispensable component of various living organisms, antioxidant proteins have been studied for anti-aging and for the prevention of various diseases, such as altitude sickness, coronary heart disease, and even cancer. However, the traditional experimental methods for identifying antioxidant proteins are very expensive and time-consuming. To address this challenge, a new predictor, named ANOX, was developed in this study. Multiple features, such as frequency matrix features (FRE), amino acid and dipeptide composition (AADP), evolutionary difference formula features (EEDP), k-separated bigrams (KSB), and PSI-PRED secondary structure (PRED), were extracted to generate the original feature space. To find the optimized feature subset, the Max-Relevance-Max-Distance (MRMD) algorithm was implemented for feature ranking, and our model achieved the best performance with the top 1170 features. Rigorous tests were performed to evaluate the performance of ANOX, and the results showed that ANOX achieved a major improvement in the prediction accuracy for antioxidant proteins (AUC: 0.930 and 0.935 using 5-fold cross-validation and the jackknife test, respectively) compared to the state-of-the-art predictor AOPs-SVM (AUC: 0.869 and 0.885). The dataset used in this study and the source code of ANOX are available at https://github.com/NWAFU-LiuLab/ANOX.
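As a rough illustration of one feature family named above, the following sketch computes amino acid composition and dipeptide composition directly from a primary sequence, giving a 420-dimensional vector (20 + 400). This is a simplified, sequence-level reading of "AADP"; the ANOX implementation may compute these features differently (for example, from evolutionary profiles), and the function name `aadp_features` is hypothetical.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aadp_features(protein: str) -> List[float]:
    """Return 20 amino acid frequencies followed by 400 dipeptide frequencies.

    Assumption: features are taken from the raw sequence; non-standard
    residues are ignored when counting.
    """
    protein = protein.upper()
    n = len(protein)
    if n < 2:
        raise ValueError("sequence must contain at least two residues")
    # Amino acid composition: fraction of positions occupied by each residue.
    aac = [protein.count(a) / n for a in AMINO_ACIDS]
    # Dipeptide composition: fraction of adjacent pairs matching each dipeptide.
    pair_counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n - 1):
        pair = protein[i:i + 2]
        if pair in pair_counts:
            pair_counts[pair] += 1
    dpc = [pair_counts[d] / (n - 1) for d in DIPEPTIDES]
    return aac + dpc


features = aadp_features("ACAC")
print(len(features))  # prints 420
```

Vectors like this would be concatenated with the other feature families before a ranking step such as MRMD selects the most informative subset.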
|
18
|
Zhang L, Qin X, Liu M, Xu Z, Liu G. DNN-m6A: A Cross-Species Method for Identifying RNA N6-Methyladenosine Sites Based on Deep Neural Network with Multi-Information Fusion. Genes (Basel) 2021; 12:354. [PMID: 33670877 PMCID: PMC7997228 DOI: 10.3390/genes12030354] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/22/2021] [Revised: 02/22/2021] [Accepted: 02/25/2021] [Indexed: 12/16/2022] Open
Abstract
As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, the accurate genome-wide identification of m6A sites is vital. As the traditional experimental methods are time-consuming and cost-prohibitive, it is necessary to design a more efficient computational method to detect m6A sites. In this study, we propose a novel cross-species computational method, DNN-m6A, based on a deep neural network (DNN) to identify m6A sites in multiple tissues of human, mouse, and rat. First, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Second, we use elastic net to eliminate redundant features while building the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. The five-fold cross-validation test on the training datasets shows that the proposed DNN-m6A method outperforms the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58%-83.38% and an area under the curve (AUC) of 81.39%-91.04%. Furthermore, on the independent datasets the method achieved an ACC of 72.95%-83.04% and an AUC of 80.79%-91.09%, which demonstrates its excellent generalization ability.
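Two of the encodings listed above, binary encoding and K-spaced nucleotide pair frequencies, can be sketched as follows. This is a minimal illustration under common definitions of these features; the nucleotide ordering `ACGU`, the normalization by the number of pairs, and the function names are assumptions, and the DNN-m6A implementation may differ in detail.

```python
from itertools import product
from typing import List

NUCLEOTIDES = "ACGU"
# One-hot vector per nucleotide, e.g. A -> [1, 0, 0, 0].
ONE_HOT = {n: [int(i == j) for j in range(4)] for i, n in enumerate(NUCLEOTIDES)}
# All 16 ordered nucleotide pairs, in a fixed order.
PAIRS = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def binary_encoding(seq: str) -> List[int]:
    """Concatenate one-hot vectors for each position (4 values per nucleotide)."""
    # Raises KeyError on characters outside A, C, G, U.
    return [bit for nt in seq.upper() for bit in ONE_HOT[nt]]


def kspaced_pair_frequencies(seq: str, k: int) -> List[float]:
    """Frequencies of the 16 nucleotide pairs separated by k intervening positions."""
    seq = seq.upper()
    total = len(seq) - k - 1  # number of (i, i + k + 1) position pairs
    if k < 0 or total < 1:
        raise ValueError("sequence too short for the requested gap k")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        counts[seq[i] + seq[i + k + 1]] += 1
    return [counts[p] / total for p in PAIRS]


print(binary_encoding("AU"))  # prints [1, 0, 0, 0, 0, 0, 0, 1]
print(kspaced_pair_frequencies("ACGU", k=1)[PAIRS.index("AG")])  # prints 0.5
```

Fusing features then amounts to concatenating such vectors (for several values of `k` and for the other encodings) into one long vector per sequence, which is why a selection step such as elastic net is applied afterwards.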
Affiliation(s)
- Lu Zhang
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Xinyi Qin
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Min Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Ziwei Xu
  - Polytech Nantes, Bâtiment Ireste, 44300 Nantes, France
- Guangzhong Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
|