1
|
Kurata H, Harun-Or-Roshid M, Mehedi Hasan M, Tsukiyama S, Maeda K, Manavalan B. MLm5C: A high-precision human RNA 5-methylcytosine sites predictor based on a combination of hybrid machine learning models. Methods 2024; 227:37-47. [PMID: 38729455 DOI: 10.1016/j.ymeth.2024.05.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 04/22/2024] [Accepted: 05/06/2024] [Indexed: 05/12/2024]
Abstract
RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency and cell differentiation and is also associated with human diseases, including Alzheimer's disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C is critical for understanding the RNA modification mechanisms and the epigenetic regulation of associated diseases. However, the large-scale experimental identification of m5C presents significant challenges due to labor intensity and time requirements. Several computational tools, using machine learning, have been developed to supplement experimental methods, but their identification of these sites lacks accuracy and efficiency. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites using sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate baseline models. We ranked these 44 models based on their performance and subsequently stacked the top 20 baseline models as the best model, named MLm5C. MLm5C outperformed the state-of-the-art predictors. Notably, the optimization of the sequence length surrounding the modification sites significantly improved the prediction performance. MLm5C is an invaluable tool in accelerating the detection of m5C sites within the human genome, thereby facilitating the characterization of their roles in post-transcriptional regulation.
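The selection step described in this abstract (11 encodings x 4 algorithms = 44 baseline models, of which the top 20 are stacked) can be sketched roughly as follows. This is only an illustration: the abstract does not name the encodings, the algorithms, the ranking metric, or the meta-learner, so the names `feature_i` and `algorithm_j` and all scores below are hypothetical placeholders, not values from the paper.

```python
import itertools
import random

# Hypothetical placeholders: the abstract does not name the eleven
# encodings or the four algorithms.
FEATURES = [f"feature_{i}" for i in range(1, 12)]      # 11 encodings
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 5)]   # 4 learners


def rank_baselines(scores, top_k=20):
    """Return the top_k (feature, algorithm) pairs by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


def stacked_input(baseline_probs, selected):
    """Meta-model input: one predicted probability per selected baseline."""
    return [baseline_probs[pair] for pair in selected]


rng = random.Random(0)
pairs = list(itertools.product(FEATURES, ALGORITHMS))         # 44 baselines
cv_scores = {pair: rng.random() for pair in pairs}            # made-up scores
selected = rank_baselines(cv_scores)                          # top 20 pairs
probs_for_one_site = {pair: rng.random() for pair in pairs}   # made-up outputs
meta_features = stacked_input(probs_for_one_site, selected)   # length-20 vector
```

In a real pipeline the made-up scores would come from cross-validation of each baseline, and `meta_features` would be passed to whichever second-level learner is chosen.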
Affiliation(s)
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
- Md Harun-Or-Roshid
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Md Mehedi Hasan
- Division of Biotechnology and Molecular Medicine, Department of Pathobiological Science, School of Veterinary Medicine, Louisiana State University, Baton Rouge, LA 70803, USA
- Sho Tsukiyama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Kazuhiro Maeda
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.
|
2
|
Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024; 300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024]
Abstract
RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequences into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies hinder the effectiveness of machine learning models in addressing the performance challenges associated with predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred, using a support vector machine classifier, to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model compared to state-of-the-art models in the domain of RNA modification site identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can be applied to feature extraction from DNA sequences to predict other biological modification sites.
Affiliation(s)
- Mingzhao Wang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Haider Ali
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yandi Xu
- School of Computer Science, Shaanxi Normal University, Xi'an, China; College of Life Sciences, Shaanxi Normal University, Xi'an, China
- Juanying Xie
- School of Computer Science, Shaanxi Normal University, Xi'an, China.
- Shengquan Xu
- College of Life Sciences, Shaanxi Normal University, Xi'an, China.
|
3
|
Jia J, Lei R, Qin L, Wei X. i5mC-DCGA: an improved hybrid network framework based on the CBAM attention mechanism for identifying promoter 5mC sites. BMC Genomics 2024; 25:242. [PMID: 38443802 PMCID: PMC10913688 DOI: 10.1186/s12864-024-10154-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 02/22/2024] [Indexed: 03/07/2024]
Abstract
BACKGROUND 5-Methylcytosine (5mC) plays a very important role in gene stability, transcription, and development. Therefore, accurate identification of 5mC sites is of key importance in genetic and pathological studies. However, traditional experimental methods for identifying 5mC sites are time-consuming and costly, so there is an urgent need to develop computational methods to automatically detect and identify these 5mC sites. RESULTS Deep learning methods have shown great potential in 5mC site prediction, so we developed a deep learning combinatorial model called i5mC-DCGA. The model uses the Convolutional Block Attention Module (CBAM) to improve the Dense Convolutional Network (DenseNet) so that it extracts advanced local feature information. Subsequently, we combined a Bidirectional Gated Recurrent Unit (BiGRU) and a Self-Attention mechanism to extract global feature information. Our model can learn abstract and complex feature representations from simple sequence coding, while also handling the sample imbalance problem in benchmark datasets. The experimental results show that the i5mC-DCGA model achieves 97.02%, 96.52%, 96.58% and 85.58% in sensitivity (Sn), specificity (Sp), accuracy (Acc) and Matthews correlation coefficient (MCC), respectively. CONCLUSIONS The i5mC-DCGA model outperforms other existing prediction tools in predicting 5mC sites, and it is currently the most representative promoter 5mC site prediction tool. The benchmark dataset and source code for the i5mC-DCGA model can be found at https://github.com/leirufeng/i5mC-DCGA.
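For readers unfamiliar with the four reported metrics, the sketch below computes Sn, Sp, Acc and MCC from a binary confusion matrix using their standard definitions. The counts in the usage line are invented for illustration and are not taken from the paper; the function assumes both classes are present so that the Sn and Sp denominators are non-zero.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when a marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Invented, imbalanced example: 100 positives and 1000 negatives.
metrics = binary_metrics(tp=90, fn=10, tn=950, fp=50)
```

On imbalanced data such as this invented example, MCC comes out noticeably lower than Acc even when Sn and Sp are both high, which is the same pattern as in the figures quoted in the abstract.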
Grants
- Nos. 61761023, 62162032, and 31760315 National Natural Science Foundation of China
- Nos. 20202BABL202004 and 20202BAB202007 Natural Science Foundation of Jiangxi Province
- GJJ190695 and GJJ212419 Scientific Research Plan of the Department of Education of Jiangxi Province, China
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, 333403, Jingdezhen, China.
- Rufeng Lei
- School of Information Engineering, Jingdezhen Ceramic University, 333403, Jingdezhen, China.
- Lulu Qin
- School of Information Engineering, Jingdezhen Ceramic University, 333403, Jingdezhen, China
- Xin Wei
- Business School, Jiangxi Institute of Fashion Technology, 330044, Nanchang, China
|
4
|
Aslam I, Shah S, Jabeen S, ELAffendi M, A Abdel Latif A, Ul Haq N, Ali G. A CNN based m5c RNA methylation predictor. Sci Rep 2023; 13:21885. [PMID: 38081880 PMCID: PMC10713599 DOI: 10.1038/s41598-023-48751-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
Post-transcriptional modifications of RNA play a key role in a variety of biological processes, such as stability and immune tolerance, RNA splicing, protein translation and RNA degradation. One of these RNA modifications is m5c, which participates in various cellular functions such as RNA structural stability and translation efficiency and has gained popularity among biologists. Detecting RNA m5c methylation sites through biological experiments requires considerable effort, time and money. Most researchers use pre-processed RNA sequences of 41 nucleotides in which the methylated cytosine is at the center. It is therefore possible that some of the information around these motifs is lost. Conventional methods are unable to process the RNA sequence directly because of its high dimensionality and thus need optimized techniques for better feature extraction. To address these challenges, the goal of this study is to employ an end-to-end, 1D CNN based model to classify and interpret m5c methylated data sites. Moreover, our aim is to analyze the sequence in its full length, where the methylated cytosine may not be at the center. The evaluation of the proposed architecture showed promising results, outperforming state-of-the-art techniques in terms of sensitivity and accuracy. Our model achieves 96.70% sensitivity and 96.21% accuracy for 41-nucleotide sequences and 96.10% accuracy for full-length sequences.
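The 41-nucleotide input format mentioned here (20 flanking nucleotides on each side of the candidate cytosine) can be illustrated with a small helper. This is a sketch, not the authors' preprocessing: the function name, the 0-based `site` index, and the use of "N" as padding near transcript ends are all assumptions made for the example.

```python
def centered_window(sequence, site, flank=20, pad="N"):
    """Return 2*flank+1 nucleotides centered on the 0-based index `site`.

    Positions that fall outside the sequence are filled with `pad`
    (an assumed convention; other pipelines may discard such sites).
    """
    if not 0 <= site < len(sequence):
        raise IndexError("site is outside the sequence")
    left = max(0, site - flank)
    right = min(len(sequence), site + flank + 1)
    window = sequence[left:right]
    # Pad each side so the site always lands at index `flank`.
    window = pad * (flank - (site - left)) + window
    window = window + pad * (flank - (right - 1 - site))
    return window


transcript = "A" * 30 + "C" + "G" * 30
window = centered_window(transcript, site=30)   # 41 nt, cytosine at index 20
```

With the default `flank=20` the result is always 41 characters long, which makes the trade-off in the abstract concrete: everything beyond 20 nucleotides from the site is discarded by this representation.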
Affiliation(s)
- Irum Aslam
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Sajid Shah
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Saima Jabeen
- College of Engineering, AI Research Center, Alfaisal University, Riyadh, 50927, Saudi Arabia.
- Mohammed ELAffendi
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Asmaa A Abdel Latif
- Public Health and Community Medicine Department (Industrial Medicine and Occupational Health specialty), Faculty of Medicine, Menoufia University, Shibîn el Kôm, Egypt
- Nuhman Ul Haq
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Gauhar Ali
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
|
5
|
Zhang Y, Ge F, Li F, Yang X, Song J, Yu DJ. Prediction of Multiple Types of RNA Modifications via Biological Language Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3205-3214. [PMID: 37289599 DOI: 10.1109/tcbb.2023.3283985] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]
Abstract
It has been demonstrated that RNA modifications play essential roles in multiple biological processes. Accurate identification of RNA modifications in the transcriptome is critical for providing insights into the biological functions and mechanisms. Many tools have been developed for predicting RNA modifications at single-base resolution, which employ conventional feature engineering methods that focus on feature design and feature selection processes that require extensive biological expertise and may introduce redundant information. With the rapid development of artificial intelligence technologies, end-to-end methods are favorably received by researchers. Nevertheless, each well-trained model is only suitable for a specific RNA methylation modification type for nearly all of these approaches. In this study, we present MRM-BERT by feeding task-specific sequences into the powerful BERT (Bidirectional Encoder Representations from Transformers) model and implementing fine-tuning, which exhibits competitive performance to the state-of-the-art methods. MRM-BERT avoids repeated de novo training of the model and can predict multiple RNA modifications such as pseudouridine, m6A, m5C, and m1A in Mus musculus, Arabidopsis thaliana, and Saccharomyces cerevisiae. In addition, we analyse the attention heads to provide high attention regions for the prediction, and conduct saturated in silico mutagenesis of the input sequences to discover potential changes of RNA modifications, which can better assist researchers in their follow-up research.
|
6
|
Abbas Z, Rehman MU, Tayara H, Zou Q, Chong KT. XGBoost framework with feature selection for the prediction of RNA N5-methylcytosine sites. Mol Ther 2023; 31:2543-2551. [PMID: 37271991 PMCID: PMC10422016 DOI: 10.1016/j.ymthe.2023.05.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/28/2022] [Revised: 01/06/2023] [Accepted: 05/31/2023] [Indexed: 06/06/2023]
Abstract
5-methylcytosine (m5C) is a critical post-transcriptional alteration that is widely present in various kinds of RNAs and is crucial to fundamental biological processes. By correctly identifying the m5C-methylation sites on RNA, clinicians can more clearly comprehend the precise function of these m5C-sites in different biological processes. Due to their effectiveness and affordability, computational methods have received greater attention over the last few years for the identification of methylation sites in various species. To precisely identify RNA m5C locations in five different species including Homo sapiens, Arabidopsis thaliana, Mus musculus, Drosophila melanogaster, and Danio rerio, we proposed a more effective and accurate model named m5C-pred. To create m5C-pred, five distinct feature encoding techniques were combined to extract features from the RNA sequence, and then we used SHapley Additive exPlanations to choose the best features among them, followed by XGBoost as a classifier. We applied the optimization method called Optuna to quickly and efficiently determine the best hyperparameters. Finally, the proposed model was evaluated using independent test datasets, and we compared the results with the previous methods. Our approach, m5C-pred, is anticipated to be useful for accurately identifying m5C sites, outperforming the currently available state-of-the-art techniques.
Affiliation(s)
- Zeeshan Abbas
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Mobeen Ur Rehman
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea.
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea.
|
7
|
Jia J, Qin L, Lei R. DGA-5mC: A 5-methylcytosine site prediction model based on an improved DenseNet and bidirectional GRU method. Mathematical Biosciences and Engineering: MBE 2023; 20:9759-9780. [PMID: 37322910 DOI: 10.3934/mbe.2023428] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/17/2023]
Abstract
The 5-methylcytosine (5mC) in the promoter region plays a significant role in biological processes and diseases. A few high-throughput sequencing technologies and traditional machine learning algorithms are often used by researchers to detect 5mC modification sites. However, high-throughput identification is laborious, time-consuming and expensive; moreover, the machine learning algorithms are not so advanced. Therefore, there is an urgent need to develop a more efficient computational approach to replace those traditional methods. Since deep learning algorithms are more popular and have powerful computational advantages, we constructed a novel prediction model, called DGA-5mC, to identify 5mC modification sites in promoter regions by using a deep learning algorithm based on an improved densely connected convolutional network (DenseNet) and the bidirectional GRU approach. Furthermore, we added a self-attention module to evaluate the importance of various 5mC features. The deep learning-based DGA-5mC model algorithm automatically handles large proportions of unbalanced data for both positive and negative samples, highlighting the model's reliability and superiority. So far as the authors are aware, this is the first time that the combination of an improved DenseNet and bidirectional GRU methods has been used to predict the 5mC modification sites in promoter regions. It can be seen that the DGA-5mC model, after using a combination of one-hot coding, nucleotide chemical property coding and nucleotide density coding, performed well in terms of sensitivity, specificity, accuracy, the Matthews correlation coefficient (MCC), area under the curve and Gmean in the independent test dataset: 90.19%, 92.74%, 92.54%, 64.64%, 96.43% and 91.46%, respectively. In addition, all datasets and source codes for the DGA-5mC model are freely accessible at https://github.com/lulukoss/DGA-5mC.
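The three encodings named in this abstract (one-hot, nucleotide chemical property, nucleotide density) are often combined per position as sketched below. The abstract does not give the exact tables, so the one-hot column order and the chemical-property triples (ring structure, functional group, hydrogen bonding) are assumptions based on a commonly used convention, and the function name is hypothetical.

```python
# Assumed column order A, C, G, T for the one-hot part.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

# Assumed chemical-property triples: (purine ring, amino group, weak H-bond).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def encode_sequence(sequence):
    """Encode each position as one-hot (4) + chemical property (3) + density (1)."""
    counts = {}
    rows = []
    for position, base in enumerate(sequence, start=1):
        counts[base] = counts.get(base, 0) + 1
        # Density: how often this base has occurred in the prefix so far.
        density = counts[base] / position
        rows.append(ONE_HOT[base] + NCP[base] + (density,))
    return rows


encoded = encode_sequence("ACCA")   # 4 positions x 8 values
```

Each position therefore becomes an 8-value row, and a sequence of length L becomes an L x 8 matrix that a convolutional or recurrent network can consume directly.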
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Lulu Qin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Rufeng Lei
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
|
8
|
Bu Y, Zheng J, Jia C. An efficient deep learning based predictor for identifying miRNA-triggered phasiRNA loci in plant. Mathematical Biosciences and Engineering: MBE 2023; 20:6853-6865. [PMID: 37161131 DOI: 10.3934/mbe.2023295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Phasic small interfering RNAs are plant secondary small interfering RNAs that are typically generated by the convergence of miRNAs and polyadenylated mRNAs. A growing number of studies have shown that miRNA-initiated phasiRNAs play crucial roles in regulating plant growth and stress responses. Experimental verification of miRNA-initiated phasiRNA loci may take considerable time, energy and labor. Therefore, computational methods capable of processing high-throughput data have been successively proposed. In this work, we propose a predictor (DIGITAL) for identifying miRNA-initiated phasiRNAs in plants, which combines a multi-scale residual network with a bi-directional long short-term memory network. The negative dataset was constructed from the positive data by randomly replacing 60% of the nucleotides in each positive sample. Our predictor achieved accuracies of 98.48% and 94.02%, respectively, on two independent test datasets with different sequence lengths. These independent testing results indicate the effectiveness of our model. Furthermore, DIGITAL is robust and generalizes well, and thus can be easily extended and applied to miRNA target recognition in other species. We provide the source code of DIGITAL, which is freely available at https://github.com/yuanyuanbu/DIGITAL.
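The negative-set construction described here (randomly replacing 60% of the nucleotides in each positive sample) might look roughly like the following. The abstract does not say whether a replaced position must change to a different nucleotide or which alphabet is used, so both choices below (always-different replacement, "ACGU") are assumptions, as is the function name.

```python
import random


def make_negative(positive, fraction=0.6, alphabet="ACGU", rng=None):
    """Derive a negative sample by replacing a fraction of positions.

    Assumption: each chosen position is changed to a *different* nucleotide,
    so exactly round(len * fraction) positions differ from the positive.
    """
    rng = rng or random.Random()
    n_replace = round(len(positive) * fraction)
    positions = rng.sample(range(len(positive)), n_replace)
    bases = list(positive)
    for i in positions:
        choices = [b for b in alphabet if b != bases[i]]
        bases[i] = rng.choice(choices)
    return "".join(bases)


positive_sample = "ACGU" * 5                        # toy 20-nt sequence
negative_sample = make_negative(positive_sample, rng=random.Random(1))
```

Passing a seeded `random.Random` makes the derived negatives reproducible, which matters when the same split has to be regenerated for independent testing.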
Affiliation(s)
- Yuanyuan Bu
- School of Science, Dalian Maritime University, Dalian 116026, China
- Jia Zheng
- School of Science, Dalian Maritime University, Dalian 116026, China
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
|
9
|
Abstract
The epitranscriptome, defined as RNA modifications that do not involve alterations in the nucleotide sequence, is a popular topic in the genomic sciences. Because we need massive computational techniques to identify epitranscriptomes within individual transcripts, many tools have been developed to infer epitranscriptomic sites as well as to process datasets using high-throughput sequencing. In this review, we summarize recent developments in epitranscriptome spatial detection and data analysis and discuss their progression.
Affiliation(s)
- Y-H Taguchi
- Department of Physics, Chuo University, Tokyo, Japan
|
10
|
Guo X, Li F, Song J. Predicting Pseudouridine Sites with Porpoise. Methods Mol Biol 2023; 2624:139-151. [PMID: 36723814 DOI: 10.1007/978-1-0716-2962-8_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Pseudouridine is a ubiquitous RNA modification and plays a crucial role in many biological processes. However, it remains a challenging task to identify pseudouridine sites using expensive and time-consuming experimental research. To this end, we present Porpoise, a computational approach to identify pseudouridine sites from RNA sequence data. Porpoise builds on a stacking ensemble learning framework with several informative features and achieves competitive performance compared with state-of-the-art approaches. This protocol elaborates on step-by-step use and execution of the local stand-alone version and the webserver of Porpoise. In addition, we also provide a general machine learning framework that can help identify the optimal stacking ensemble learning model using different combinations of feature-based features. This general machine learning framework can facilitate users to build their pseudouridine predictors using their in-house datasets.
Affiliation(s)
- Xudong Guo
- College of Information Engineering, Northwest A&F University, Yangling, China
- Fuyi Li
- College of Information Engineering, Northwest A&F University, Yangling, China.
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia.
- Jiangning Song
- Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia.
- Monash Data Futures Institute, Monash University, Melbourne, VIC, Australia.
|
11
|
Catacalos C, Krohannon A, Somalraju S, Meyer KD, Janga SC, Chakrabarti K. Epitranscriptomics in parasitic protists: Role of RNA chemical modifications in posttranscriptional gene regulation. PLoS Pathog 2022; 18:e1010972. [PMID: 36548245 PMCID: PMC9778586 DOI: 10.1371/journal.ppat.1010972] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/24/2022]
Abstract
"Epitranscriptomics" is the new RNA code that represents an ensemble of posttranscriptional RNA chemical modifications, which can precisely coordinate gene expression and biological processes. There are several RNA base modifications, such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), and pseudouridine (Ψ), that play pivotal roles in fine-tuning gene expression in almost all eukaryotes, and emerging evidence suggests that parasitic protists are no exception. In this review, we primarily focus on m6A, which is the most abundant epitranscriptomic mark and regulates numerous cellular processes, ranging from nuclear export, mRNA splicing, polyadenylation, and stability to translation. We highlight the universal features of spatiotemporal m6A RNA modifications in eukaryotic phylogeny, their homologs, and unique processes in 3 unicellular parasites (Plasmodium sp., Toxoplasma sp., and Trypanosoma sp.), as well as some technological advances in this rapidly developing research area that can significantly improve our understanding of gene expression regulation in parasites.
Affiliation(s)
- Cassandra Catacalos
- Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Alexander Krohannon
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), Indianapolis, Indiana, United States of America
- Sahiti Somalraju
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), Indianapolis, Indiana, United States of America
- Kate D. Meyer
- Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina, United States of America
- Sarath Chandra Janga
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), Indianapolis, Indiana, United States of America
- Kausik Chakrabarti
- Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
|
12
|
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022]
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most popularly used techniques applied for profiling the RNA epitranscriptome, to provide new ideas for growth and development.
Affiliation(s)
- Jia Zou
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Hui Liu
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Wei Tan
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Yi-qi Chen
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Jing Dong
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Shu-yuan Bai
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Zhao-xia Wu
- Community Health Service Center, Wuchang Hospital, Wuhan, China
- Yan Zeng
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China; School of Public Health, Wuhan University of Science and Technology, Wuhan, China
|
13
|
Li X, Zhang S, Shi H. An improved residual network using deep fusion for identifying RNA 5-methylcytosine sites. Bioinformatics 2022; 38:4271-4277. [PMID: 35866985 DOI: 10.1093/bioinformatics/btac532] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/11/2022] [Revised: 06/30/2022] [Accepted: 07/21/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION 5-Methylcytosine (m5C) is a crucial post-transcriptional modification. With the development of technology, it has been found widely in various RNAs. Numerous studies have indicated that m5C plays an essential role in various activities of organisms, such as tRNA recognition, stabilization of RNA structure, and RNA metabolism. Traditional identification by wet-lab experiments is costly and time-consuming. Therefore, computational models are commonly used to identify m5C sites. Given the computing advantages of deep learning, it is feasible to construct the predictive model with deep learning algorithms. RESULTS In this study, we construct a model to identify m5C based on a deep fusion approach with an improved residual network. First, sequence features are extracted from the RNA sequences using Kmer, K-tuple nucleotide frequency component (KNFC), Pseudo dinucleotide composition (PseDNC) and Physical and chemical property (PCP). Kmer and KNFC extract information from a statistical point of view. PseDNC and PCP extract information from the physicochemical properties of RNA sequences. Then, the two parts of information are fused into new features using bidirectional long short-term memory and attention mechanisms, respectively. The fused features are then fed into the improved residual network for classification. Finally, 10-fold cross-validation and independent set testing are used to verify the credibility of the model. The results show that the accuracy reaches 91.87%, 95.55%, 92.27% and 95.60% on the training sets and independent test sets of Arabidopsis thaliana and M. musculus, respectively. This is a considerable improvement compared to previous studies and demonstrates the robust performance of our model. AVAILABILITY AND IMPLEMENTATION The data and code related to the study are available at https://github.com/alivelxj/m5c-DFRESG.
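Of the four feature types listed in this abstract, Kmer is the simplest to illustrate: it counts overlapping k-length substrings and normalizes by the number of windows. The sketch below shows that general idea only; the value of k, the "ACGU" alphabet, the lexicographic ordering of the output vector, and the function name are assumptions rather than details from the paper.

```python
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGU"):
    """Return normalized frequencies of all len(alphabet)**k possible k-mers."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(sequence) - k + 1          # number of overlapping windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        kmer = sequence[i:i + k]
        if kmer in counts:                 # skip windows with unknown symbols
            counts[kmer] += 1
    return [counts[kmer] / total for kmer in kmers]


vector = kmer_frequencies("ACGUAC", k=2)   # 16 dinucleotide frequencies
```

The vector length is fixed at 4**k regardless of sequence length, which is why this kind of statistical feature is easy to concatenate with physicochemical descriptors before fusion.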
Affiliation(s)
- Xinjie Li: School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
- Shengli Zhang: School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
- Hongyan Shi: School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
|
14
|
Thi Phan L, Woo Park H, Pitti T, Madhavan T, Jeon YJ, Manavalan B. MLACP 2.0: An updated machine learning tool for anticancer peptide prediction. Comput Struct Biotechnol J 2022; 20:4473-4480. [PMID: 36051870 PMCID: PMC9421197 DOI: 10.1016/j.csbj.2022.07.043] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 04/18/2022] [Revised: 07/25/2022] [Accepted: 07/25/2022] [Indexed: 12/24/2022]
Abstract
We present a novel meta-approach, MLACP 2.0, and implement it as a user-friendly webserver for the accurate identification of anticancer peptides (ACPs). MLACP 2.0 employs 11 different encoding schemes and eight different classifiers, including convolutional neural networks, to create a stable meta-model. A benchmarking study demonstrates that MLACP 2.0 achieves superior performance in ACP prediction compared to publicly available state-of-the-art predictors.
Anticancer peptides are emerging anticancer drugs that offer fewer side effects and are more effective than chemotherapy and targeted therapy. Predicting anticancer peptides from sequence information is one of the most challenging tasks in immunoinformatics. In the past ten years, machine learning-based approaches have been proposed for identifying ACP activity from peptide sequences. These methods include our previous method MLACP (developed in 2017), which made a significant impact on anticancer research. The MLACP tool has been widely used by the research community; however, its robustness must be improved significantly for its continued practical application. In this study, the first large non-redundant training and independent datasets were constructed for ACP research. Using the training dataset, the study explored a wide range of feature encodings and developed their respective models using seven different conventional classifiers. Subsequently, a subset of encoding-based models was selected for each classifier based on performance, and their predicted scores were concatenated and used to train a convolutional neural network (CNN); the resulting predictor is named MLACP 2.0. The evaluation of MLACP 2.0 on a very diverse independent dataset showed excellent performance, and it significantly outperformed recent ACP prediction tools. Additionally, MLACP 2.0 exhibits superior performance during cross-validation and independent assessment when compared to CNN-based embedding models and conventional single models. Consequently, we anticipate that MLACP 2.0 will facilitate the design of hypothesis-driven experiments by making it easier to discover novel ACPs. MLACP 2.0 is freely available at https://balalab-skku.org/mlacp2.
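The score-concatenation idea in the abstract above can be sketched in a few lines. This is a toy under stated assumptions, not the MLACP 2.0 pipeline: the two baseline scorers (`lysine_fraction`, `leucine_fraction`) are hypothetical stand-ins for trained encoding-based models, the peptides and labels are made up, and a plain logistic regression replaces the CNN meta-learner to keep the example dependency-free.

```python
import math


def lysine_fraction(peptide):
    """Hypothetical toy scorer: fraction of lysine (K) residues."""
    return peptide.count("K") / len(peptide)


def leucine_fraction(peptide):
    """Hypothetical toy scorer: fraction of leucine (L) residues."""
    return peptide.count("L") / len(peptide)


def stack_scores(baseline_models, samples):
    """Concatenate every baseline score into one meta-feature row per sample."""
    return [[model(x) for model in baseline_models] for x in samples]


def predict_proba(weights, bias, row):
    """Logistic output of the meta-learner for one meta-feature row."""
    z = bias + sum(w * v for w, v in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Assumption: this stands in for the CNN meta-learner named in the abstract.
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            err = predict_proba(weights, bias, row) - y
            for j, v in enumerate(row):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up toy data: 1 = ACP-like, 0 = non-ACP-like.
train_peptides = ["KKLL", "KLKL", "AGAG", "GGSA"]
train_labels = [1, 1, 0, 0]
baselines = [lysine_fraction, leucine_fraction]

meta_rows = stack_scores(baselines, train_peptides)
weights, bias = train_logistic(meta_rows, train_labels)
print(predict_proba(weights, bias, stack_scores(baselines, ["KLLK"])[0]))
```

Stacked models are usually trained on out-of-fold baseline scores to limit information leakage; this sketch skips that step for brevity.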
|
15
|
Hasan MM, Tsukiyama S, Cho JY, Kurata H, Alam MA, Liu X, Manavalan B, Deng HW. Deepm5C: A deep learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy. Mol Ther 2022; 30:2856-2867. [PMID: 35526094 PMCID: PMC9372321 DOI: 10.1016/j.ymthe.2022.05.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 10/09/2021] [Revised: 04/25/2022] [Accepted: 05/03/2022] [Indexed: 11/30/2022]
Abstract
As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models were developed using small training datasets. Hence, their practical application in genome-wide detection is quite limited. To overcome these limitations, we propose Deepm5C, a bioinformatics method to identify RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature encoding algorithms and a feature derived from word embedding approaches. Afterwards, four variants of deep learning classifiers and four commonly used conventional classifiers were trained with the four encodings, ultimately yielding 32 baseline models. A stacking strategy then integrates the predicted outputs of the optimal baseline models, which are used to train a 1-D convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation, with a Matthews correlation coefficient and accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5C sites and formulating novel testable biological hypotheses.
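For readers less familiar with the two metrics quoted in the abstract above, the following minimal sketch computes accuracy and the Matthews correlation coefficient (MCC) from binary labels. The function names and the toy labels are assumptions for illustration; the example does not reproduce the reported Deepm5C values.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up labels: 1 = m5C site, 0 = non-site.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), mcc(*counts))
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it less sensitive to class imbalance than accuracy alone.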
Affiliation(s)
- Md Mehedi Hasan: Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Sho Tsukiyama: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Jae Youl Cho: Molecular Immunology Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
- Hiroyuki Kurata: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Md Ashad Alam: Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Xiaowen Liu: Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Balachandran Manavalan: Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
- Hong-Wen Deng: Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
|
16
|
He Z, Xu J, Shi H, Wu S. m5CRegpred: Epitranscriptome Target Prediction of 5-Methylcytosine (m5C) Regulators Based on Sequencing Features. Genes (Basel) 2022; 13:677. [PMID: 35456483 PMCID: PMC9025882 DOI: 10.3390/genes13040677] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/02/2022] [Revised: 04/02/2022] [Accepted: 04/05/2022] [Indexed: 02/04/2023]
Abstract
5-methylcytosine (m5C) is a common post-transcriptional modification observed in a variety of RNAs. m5C has been demonstrated to be important in a variety of biological processes, including RNA structural stability and metabolism. Driven by the importance of m5C modification, many studies on m5C site prediction have been reported. To better understand the upstream and downstream regulation of m5C, we present a bioinformatics framework, m5CRegpred, which for the first time predicts the substrates of the m5C writer NSUN2 and the m5C readers YBX1 and ALYREF. After feature comparison, window length selection and algorithm comparison on the mature mRNA model, our model achieved AUROC scores of 0.869, 0.724 and 0.889 for NSUN2, YBX1 and ALYREF, respectively, in an independent test. Our work suggests that the substrates of m5C regulators can be distinguished, and it may help research on m5C regulators under specific conditions, such as predicting the substrates of hyper- or hypo-expressed m5C regulators in human disease.
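The AUROC scores quoted in the abstract above can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison, counting ties as half; the function name `auroc` and the toy scores are assumptions for illustration and are unrelated to the m5CRegpred data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    Quadratic in the number of samples; fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: 1 = regulator substrate, 0 = non-substrate.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the gap between the YBX1 score and the other two.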
Affiliation(s)
- Zhizhou He: Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China; Department of Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Jing Xu: Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
- Haoran Shi (corresponding author): Research Center for BioSystems, Land Use, and Nutrition (IFZ), Institute of Applied Microbiology, Justus-Liebig-University Giessen, Heinrich-Buff-Ring 26-32, 35392 Giessen, Germany
- Shuxiang Wu (corresponding author): Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China; Fujian Key Laboratory of Tumor Microbiology, Department of Medical Microbiology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
|
17
|
Biological roles of RNA m5C modification and its implications in Cancer immunotherapy. Biomark Res 2022; 10:15. [PMID: 35365216 PMCID: PMC8973801 DOI: 10.1186/s40364-022-00362-8] [Citation(s) in RCA: 67] [Impact Index Per Article: 33.5] [Received: 12/15/2021] [Accepted: 03/03/2022] [Indexed: 01/08/2023]
Abstract
Epigenetics, including DNA and RNA modifications, has been a hotspot of the life sciences in the post-genome era. Since the first mapping of N6-methyladenosine (m6A) and the discovery of its widespread presence in mRNA, at least 160-170 RNA modifications have been discovered. These methylations occur in different RNA types, and their distribution is species-specific. 5-methylcytosine (m5C) has been found in the mRNA, rRNA and tRNA of representative organisms from all kinds of species. As reversible epigenetic modifications, m5C modifications of RNA affect the fate of the modified RNA molecules and play important roles in various biological processes, including RNA stability control, protein synthesis and transcriptional regulation. Furthermore, accumulating evidence also implicates RNA m5C in tumorigenesis. Here, we review the latest progress on the biological roles of m5C modifications and how they are regulated by the corresponding "writer", "reader" and "eraser" proteins, as well as the potential molecular mechanisms in tumorigenesis and cancer immunotherapy.
|
18
|
Liu Y, Shen Y, Wang H, Zhang Y, Zhu X. m5Cpred-XS: A New Method for Predicting RNA m5C Sites Based on XGBoost and SHAP. Front Genet 2022; 13:853258. [PMID: 35432446 PMCID: PMC9005994 DOI: 10.3389/fgene.2022.853258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2022] [Accepted: 02/16/2022] [Indexed: 11/13/2022]
Abstract
As one of the most important post-transcriptional modifications of RNA, 5-cytosine-methylation (m5C) is reported to be closely related to many chemical reactions and biological functions in cells. Recently, several computational methods have been proposed for identifying m5C sites. However, their accuracy and efficiency are still not satisfactory. In this study, we propose a new method, m5Cpred-XS, for predicting m5C sites in H. sapiens, M. musculus and A. thaliana. First, the SHAP method was used to select the optimal feature subset from seven different kinds of sequence-based features. Second, different machine learning algorithms were used to train the models. The results of five-fold cross-validation indicate that the model based on XGBoost achieved the highest prediction accuracy. Finally, our model was compared with other state-of-the-art models, and the comparison indicates that m5Cpred-XS is superior to the other methods. Moreover, we deployed the model on a web server that can be accessed at http://m5cpred-xs.zhulab.org.cn/, and m5Cpred-XS is expected to be a useful tool for studying m5C sites.
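The five-fold cross-validation mentioned in the abstract above can be illustrated with a small index splitter. This is a generic sketch, not the m5Cpred-XS code: the function name `k_fold_indices`, the fixed `seed` and the unstratified round-robin split are assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption: samples are shuffled once and dealt round-robin into k folds;
    class balance within each fold is not enforced.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample lands in the test fold exactly once across the five rounds.
for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))
```

A model would be trained on `train_idx` and scored on `test_idx` in each round, and the five scores averaged; a stratified split is often preferred when positive and negative sites are imbalanced.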
Affiliation(s)
- Yong Zhang
- Correspondence: Xiaolei Zhu; Yong Zhang
|
19
|
Wang H, Wang S, Zhang Y, Bi S, Zhu X. A brief review of machine learning methods for RNA methylation sites prediction. Methods 2022; 203:399-421. [DOI: 10.1016/j.ymeth.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2021] [Revised: 02/15/2022] [Accepted: 03/01/2022] [Indexed: 02/07/2023]
|