1
Wang H, Huang T, Wang D, Zeng W, Sun Y, Zhang L. MSCAN: multi-scale self- and cross-attention network for RNA methylation site prediction. BMC Bioinformatics 2024; 25:32. [PMID: 38233745 PMCID: PMC10795237 DOI: 10.1186/s12859-024-05649-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 01/11/2024] [Indexed: 01/19/2024]
Abstract
BACKGROUND Epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all RNA types. Precise recognition of RNA modifications is critical for understanding their functions and regulatory mechanisms. However, wet-lab experimental methods are often costly and time-consuming, which limits their wider application. Therefore, recent research has focused on developing computational methods, particularly deep learning (DL). Bidirectional long short-term memory (BiLSTM), convolutional neural network (CNN), and Transformer models have all been applied successfully to modification site prediction. However, BiLSTM cannot be computed in parallel, leading to long training times; CNN cannot learn long-range dependencies within a sequence; and the Transformer lacks information interaction between sequences at different scales. These limitations underscore the need for continued research in natural language processing (NLP) and DL to devise an enhanced prediction framework that addresses them.
RESULTS This study presents a multi-scale self- and cross-attention network (MSCAN) that identifies RNA methylation sites using NLP and DL techniques. Experimental results on twelve RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) show that MSCAN obtains areas under the receiver operating characteristic curve of 98.34%, 85.41%, 97.29%, 96.74%, 99.04%, 79.94%, 76.22%, 65.69%, 92.92%, 92.03%, 95.77%, and 89.66%, respectively, which is better than the state-of-the-art prediction model. This indicates that the model has strong generalization capabilities. Furthermore, MSCAN reveals a strong association among different types of RNA modifications from an experimental perspective. A user-friendly web server for predicting these twelve widely occurring human RNA modification sites is available at http://47.242.23.141/MSCAN/index.php.
CONCLUSIONS A predictor framework has been developed through binary classification to predict RNA methylation sites.
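The abstract above contrasts self-attention within one scale with cross-attention between scales. The following is a minimal sketch of that distinction using plain scaled dot-product attention. It is not the MSCAN implementation: the function names, the two "scales", and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Each output row is a weighted average of `values`, with weights
    given by the softmax of the query-key dot products.
    """
    d = len(keys[0])
    width = len(values[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)])
    return out


# Hypothetical embeddings of the same site seen through two window sizes.
short_scale = [[1.0, 0.0], [0.0, 1.0]]
long_scale = [[1.0, 1.0], [0.5, 0.0], [0.0, 2.0]]

# Self-attention: queries, keys and values all come from one scale.
self_out = attention(short_scale, short_scale, short_scale)

# Cross-attention: queries from one scale, keys and values from the other.
cross_out = attention(short_scale, long_scale, long_scale)
```

In this sketch the only difference between the two calls is where the keys and values come from, which is the sense in which cross-attention lets one scale draw information from another.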
Affiliation(s)
- Honglei Wang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
  - School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, 221400, China
- Tao Huang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Dong Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Wenliang Zeng
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yanjing Sun
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
2
Lang X, Yu C, Shen M, Gu L, Qian Q, Zhou D, Tan J, Li Y, Peng X, Diao S, Deng Z, Ruan Z, Xu Z, Xing J, Li C, Wang R, Ding C, Cao Y, Liu Q. PRMD: an integrated database for plant RNA modifications. Nucleic Acids Res 2024; 52:D1597-D1613. [PMID: 37831097 PMCID: PMC10768107 DOI: 10.1093/nar/gkad851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 08/23/2023] [Accepted: 09/23/2023] [Indexed: 10/14/2023]
Abstract
The scope and function of RNA modifications in model plant systems have been extensively studied, resulting in the identification of an increasing number of novel RNA modifications in recent years. Researchers have gradually revealed that RNA modifications, especially N6-methyladenosine (m6A), which is one of the most abundant and commonly studied RNA modifications in plants, have important roles in physiological and pathological processes. These modifications alter the structure of RNA, which affects its molecular complementarity and binding to specific proteins, thereby resulting in various physiological effects. The increasing interest in plant RNA modifications has necessitated research into RNA modifications and associated datasets. However, there is a lack of a convenient and integrated database with comprehensive annotations and intuitive visualization of plant RNA modifications. Here, we developed the Plant RNA Modification Database (PRMD; http://bioinformatics.sc.cn/PRMD and http://rnainformatics.org.cn/PRMD) to facilitate RNA modification research. This database contains information regarding 20 plant species and provides an intuitive interface for displaying information. Moreover, PRMD offers multiple tools, including RMlevelDiff, RMplantVar, RNAmodNet and Blast (for functional analyses), and mRNAbrowse, RNAlollipop, JBrowse and Integrative Genomics Viewer (for displaying data). Furthermore, PRMD is freely available, making it useful for the rapid development and promotion of research on plant RNA modifications.
Affiliation(s)
- Xiaoqiang Lang
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
  - Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, 610041, China
- Chunyan Yu
  - Frontiers Science Center for Disease-related Molecular Network, Laboratory of Omics Technology and Bioinformatics, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Mengyuan Shen
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangdong Key Laboratory of New Technology in Rice Breeding, Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China
- Lei Gu
  - Epigenetics Laboratory, Max Planck Institute for Heart and Lung Research & Cardiopulmonary Institute (CPI), Parkstr. 1, 61231 Bad Nauheim, Germany
- Qian Qian
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangdong Key Laboratory of New Technology in Rice Breeding, Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China
- Degui Zhou
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangdong Key Laboratory of New Technology in Rice Breeding, Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China
- Jiantao Tan
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangdong Key Laboratory of New Technology in Rice Breeding, Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China
- Yiliang Li
  - Guangdong Provincial Key Laboratory of Silviculture, Protection and Utilization/Guangdong Academy of Forestry, Guangzhou, Guangdong 510520, China
- Xin Peng
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangdong Key Laboratory of New Technology in Rice Breeding, Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China
- Shu Diao
  - Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Hangzhou, China
- Zhujun Deng
  - Precision Medicine Center, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Zhaohui Ruan
  - Sun Yat-sen University Cancer Center, State Key Laboratory Oncology in South China, Collaborative Innovation Center of Cancer Medicine, 510060, Guangzhou, China
- Zhi Xu
  - Guangxi Key Laboratory of Images and Graphics Intelligent Processing, Guilin University of Electronics Technology, Guilin, 541004, China
- Junlian Xing
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangdong Key Laboratory of New Technology in Rice Breeding, Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China
- Chen Li
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangdong Key Laboratory of New Technology in Rice Breeding, Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China
- Runfeng Wang
  - Guangdong Provincial Key Laboratory of Crop Genetic Improvement, Crops Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Changjun Ding
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Yi Cao
  - Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, College of Life Science, Sichuan University, Chengdu, Sichuan, 610041, China
- Qi Liu
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Guangdong Key Laboratory of New Technology in Rice Breeding, Guangdong Rice Engineering Laboratory, Guangzhou, 510640, China
3
Bai J, Yang H, Wu C. MLACNN: an attention mechanism-based CNN architecture for predicting genome-wide DNA methylation. Theory Biosci 2023; 142:359-370. [PMID: 37648910 PMCID: PMC10564812 DOI: 10.1007/s12064-023-00402-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 07/31/2023] [Indexed: 09/01/2023]
Abstract
Methylation is an important epigenetic mechanism of gene regulation that plays a crucial role in biological processes. While traditional methods for detecting methylation in biological experiments are constantly improving, the development of artificial intelligence has led to the emergence of deep learning and machine learning methods as a new trend. However, traditional machine learning-based methods rely heavily on manual feature extraction, and most deep learning methods for studying methylation extract fewer features due to their simple network structures. To address this, we propose a bottleneck network based on an attention mechanism and use new methods to ensure that the deep network can learn more effective features while minimizing overfitting. This approach enables the model to learn more features from nucleotide sequences and make better predictions of methylation. The model uses three coding methods to encode the original DNA sequence and then applies feature fusion based on attention mechanisms to obtain the best fusion method. Our results demonstrate that MLACNN outperforms previous methods and achieves more satisfactory performance.
Affiliation(s)
- JianGuo Bai
  - Shandong Jiaotong University, Jinan City, Shandong Province, China
- Hai Yang
  - Shandong Jiaotong University, Jinan City, Shandong Province, China
- ChangDe Wu
  - Shandong Jiaotong University, Jinan City, Shandong Province, China
4
Xiang S, Zhang T, Wu M. M6ATMR: identifying N6-methyladenosine sites through RNA sequence similarity matrix reconstruction guided by Transformer. PeerJ 2023; 11:e15899. [PMID: 37719113 PMCID: PMC10501384 DOI: 10.7717/peerj.15899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 07/24/2023] [Indexed: 09/19/2023]
Abstract
Numerous studies have focused on the classification of N6-methyladenosine (m6A) modification sites in RNA sequences, treating it as a multi-feature extraction task. In these studies, the incorporation of physicochemical properties of nucleotides has been applied to enhance recognition efficacy. However, the introduction of excessive supplementary information may introduce noise to the RNA sequence features, and the utilization of sequence similarity information remains underexplored. In this research, we present a novel method for RNA m6A modification site recognition called M6ATMR. Our approach relies solely on sequence information, leveraging Transformer to guide the reconstruction of the sequence similarity matrix, thereby enhancing feature representation. Initially, M6ATMR encodes RNA sequences using 3-mers to generate the sequence similarity matrix. Meanwhile, Transformer is applied to extract sequence structure graphs for each RNA sequence. Subsequently, to capture low-dimensional representations of similarity matrices and structure graphs, we introduce a graph self-correlation convolution block. These representations are then fused and reconstructed through the local-global fusion block. Notably, we adopt iteratively updated sequence structure graphs to continuously optimize the similarity matrix, thereby constraining the end-to-end feature extraction process. Finally, we employ the random forest (RF) algorithm for identifying m6A modification sites based on the reconstructed features. Experimental results demonstrate that M6ATMR achieves promising performance by solely utilizing RNA sequences for m6A modification site identification. Our proposed method can be considered an effective complement to existing RNA m6A modification site recognition approaches.
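The first step described above, encoding RNA sequences as 3-mers and deriving a sequence similarity matrix, can be sketched as follows. The abstract does not state which similarity measure M6ATMR uses, so the Jaccard index over 3-mer sets here is an assumption made purely for illustration, as are the function names and the toy sequences.

```python
def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def jaccard(a, b):
    """Jaccard index of two k-mer collections (assumed similarity measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def similarity_matrix(seqs, k=3):
    """Pairwise similarity between sequences based on shared k-mers."""
    tokens = [kmers(s, k) for s in seqs]
    return [[jaccard(ti, tj) for tj in tokens] for ti in tokens]


# Hypothetical toy sequences, far shorter than real site windows.
seqs = ["GGACU", "GGACA", "UUUUU"]
matrix = similarity_matrix(seqs)
```

With these toy inputs the first two sequences share two of their four distinct 3-mers, so their entry is 0.5, while the third sequence shares none with either and scores 0.0.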
Affiliation(s)
- Shuang Xiang
  - Changjiang Water Resources and Hydropower Development Group, Wuhan, Hubei, China
- Te Zhang
  - Changjiang Water Resources and Hydropower Development Group, Wuhan, Hubei, China
- Minghao Wu
  - Changjiang Water Resources and Hydropower Development Group, Wuhan, Hubei, China
5
Meng Q, Schatten H, Zhou Q, Chen J. Crosstalk between m6A and coding/non-coding RNA in cancer and detection methods of m6A modification residues. Aging (Albany NY) 2023; 15:6577-6619. [PMID: 37437245 PMCID: PMC10373953 DOI: 10.18632/aging.204836] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/01/2023] [Accepted: 06/15/2023] [Indexed: 07/14/2023]
Abstract
N6-methyladenosine (m6A) is one of the most common and well-known internal RNA modifications that occur on mRNAs or ncRNAs. It affects various aspects of RNA metabolism, including splicing, stability, translocation, and translation. An abundance of evidence demonstrates that m6A plays a crucial role in various pathological and biological processes, especially in tumorigenesis and tumor progression. In this article, we introduce the potential functions of m6A regulators, including "writers" that install m6A marks, "erasers" that demethylate m6A, and "readers" that determine the fate of m6A-modified targets. We have conducted a review on the molecular functions of m6A, focusing on both coding and noncoding RNAs. Additionally, we have compiled an overview of the effects noncoding RNAs have on m6A regulators and explored the dual roles of m6A in the development and advancement of cancer. Our review also includes a detailed summary of the most advanced databases for m6A, state-of-the-art experimental and sequencing detection methods, and machine learning-based computational predictors for identifying m6A sites.
Affiliation(s)
- Qingren Meng
  - National Clinical Research Center for Infectious Diseases, Shenzhen Third People’s Hospital, The Second Hospital Affiliated with the Southern University of Science and Technology, Shenzhen, Guangdong Province, China
- Heide Schatten
  - Department of Veterinary Pathobiology, University of Missouri, Columbia, MO 65211, USA
- Qian Zhou
  - International Cancer Center, Shenzhen University Medical School, Shenzhen, Guangdong Province, China
- Jun Chen
  - National Clinical Research Center for Infectious Diseases, Shenzhen Third People’s Hospital, The Second Hospital Affiliated with the Southern University of Science and Technology, Shenzhen, Guangdong Province, China
6
Acera Mateos P, Zhou Y, Zarnack K, Eyras E. Concepts and methods for transcriptome-wide prediction of chemical messenger RNA modifications with machine learning. Brief Bioinform 2023; 24:7150742. [PMID: 37139545 DOI: 10.1093/bib/bbad163] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/06/2023] [Revised: 03/03/2023] [Indexed: 05/05/2023]
Abstract
The expanding field of epitranscriptomics might rival the epigenome in the diversity of biological processes impacted. In recent years, the development of new high-throughput experimental and computational techniques has been a key driving force in discovering the properties of RNA modifications. Machine learning applications, such as for classification, clustering or de novo identification, have been critical in these advances. Nonetheless, various challenges remain before the full potential of machine learning for epitranscriptomics can be leveraged. In this review, we provide a comprehensive survey of machine learning methods to detect RNA modifications using diverse input data sources. We describe strategies to train and test machine learning methods and to encode and interpret features that are relevant for epitranscriptomics. Finally, we identify some of the current challenges and open questions about RNA modification analysis, including the ambiguity in predicting RNA modifications in transcript isoforms or in single nucleotides, or the lack of complete ground truth sets to test RNA modifications. We believe this review will inspire and benefit the rapidly developing field of epitranscriptomics in addressing the current limitations through the effective use of machine learning.
Affiliation(s)
- Pablo Acera Mateos
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- You Zhou
  - Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
  - Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Kathi Zarnack
  - Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
  - Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Eduardo Eyras
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
7
Abstract
The epitranscriptome, defined as RNA modifications that do not involve alterations in the nucleotide sequence, is a popular topic in the genomic sciences. Because we need massive computational techniques to identify epitranscriptomes within individual transcripts, many tools have been developed to infer epitranscriptomic sites as well as to process datasets using high-throughput sequencing. In this review, we summarize recent developments in epitranscriptome spatial detection and data analysis and discuss their progression.
Affiliation(s)
- Y-H Taguchi
  - Department of Physics, Chuo University, Tokyo, Japan
8
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022]
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most widely used techniques for profiling the RNA epitranscriptome, to provide new ideas for growth and development.
Affiliation(s)
- Jia Zou
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Hui Liu
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Wei Tan
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Yi-qi Chen
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Jing Dong
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Shu-yuan Bai
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Zhao-xia Wu
  - Community Health Service Center, Wuchang Hospital, Wuhan, China
- Yan Zeng (corresponding author)
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
  - School of Public Health, Wuhan University of Science and Technology, Wuhan, China
9
Luo Z, Lou L, Qiu W, Xu Z, Xiao X. Predicting N6-Methyladenosine Sites in Multiple Tissues of Mammals through Ensemble Deep Learning. Int J Mol Sci 2022; 23:15490. [PMID: 36555143 PMCID: PMC9778682 DOI: 10.3390/ijms232415490] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/03/2022] [Revised: 12/03/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022]
Abstract
N6-methyladenosine (m6A) is the most abundant modification of eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, existing computational models usually predict m6A sites in a single species without considering tissue-level differences, and most of them are built on low-confidence data generated by m6A antibody immunoprecipitation (IP)-based sequencing, which restricts the reliability and generalizability of the proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred using ensemble deep learning to accurately identify m6A sites based on high-confidence level data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms and three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by a five-fold cross-validation test to achieve an effective stacked model. Our model im6APred achieves an area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that our model has the ability to learn general methylation rules on RNA bases and generalize to m6A transcriptome-wide identification. Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.
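The AUROC values quoted above can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes the metric directly from that definition. It is an illustrative implementation, not the evaluation code used by im6APred; the function name and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 ground-truth classes.
    scores: iterable of predicted scores, higher meaning more positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for four candidate sites.
example = auroc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In the toy example three of the four positive-negative pairs are ordered correctly, so the result is 0.75. The pairwise loop is quadratic in the number of examples; a rank-based formulation would be preferable for large benchmark sets.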
10
Wang H, Zhao S, Cheng Y, Bi S, Zhu X. MTDeepM6A-2S: A two-stage multi-task deep learning method for predicting RNA N6-methyladenosine sites of Saccharomyces cerevisiae. Front Microbiol 2022; 13:999506. [PMID: 36274691 PMCID: PMC9579691 DOI: 10.3389/fmicb.2022.999506] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 11/13/2022]
Abstract
N6-methyladenosine (m6A) is one of the most important RNA modifications and is involved in many biological activities. Computational methods have been developed to detect m6A sites due to their high efficiency and low costs. Because Saccharomyces cerevisiae is one of the most widely utilized model organisms, many methods have been developed for predicting its m6A sites. However, the generalization of these methods was hampered by the limited size of the benchmark datasets. On the other hand, over 60,000 low resolution m6A sites and more than 10,000 base resolution m6A sites of Saccharomyces cerevisiae are recorded in RMBase and m6A-Atlas, respectively. The base resolution m6A sites are often obtained from low resolution results by post calibration. In view of this, we proposed a two-stage deep learning method, named MTDeepM6A-2S, to predict RNA m6A sites of Saccharomyces cerevisiae based on RNA sequence information. In the first stage, a multi-task model with a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) deep framework was built to not only detect the low resolution m6A sites but also assign a reasonable probability to each predicted site. In the second stage, a transfer-learning strategy was used to build the model that predicts the base resolution m6A sites from those low resolution m6A sites. The effectiveness of our model was validated on both training and independent test sets. The results show that our model outperforms other state-of-the-art models on the independent test set, which indicates that our model holds high potential to become a useful tool for epitranscriptomics analysis.
11
Ma L, He LN, Kang S, Gu B, Gao S, Zuo Z. Advances in detecting N6-methyladenosine modification in circRNAs. Methods 2022; 205:234-246. [PMID: 35878749 DOI: 10.1016/j.ymeth.2022.07.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2022] [Revised: 07/15/2022] [Accepted: 07/18/2022] [Indexed: 12/14/2022]
Abstract
Circular RNAs (circRNAs) are a class of noncoding RNAs with single-stranded, covalently closed loop structures derived from back-splicing events of linear precursor mRNAs (pre-mRNAs). N6-methyladenosine (m6A), the most abundant epigenetic modification in eukaryotic RNAs, has been shown to play a crucial role in regulating the fate and biological function of circRNAs, thus affecting various physiological and pathological processes. Accurate identification of m6A modification in circRNAs is an essential step to fully elucidate the crosstalk between m6A and circRNAs. In recent years, the rapid development of high-throughput sequencing technology and bioinformatic methodology has propelled the establishment of a multitude of approaches to detect circRNAs and m6A modification, including in vitro-based and in silico methods. Based on this, the research community has started on a new journey to develop methods for identification of m6A modification in circRNAs. In this review, we provide a comprehensive review and evaluation of the existing methods for detecting circRNAs, m6A modification, and especially m6A modification in circRNAs, focusing mainly on those based on high-throughput technologies and bioinformatic methodology. This handy reference can help researchers see the direction in which this field is heading.
Affiliation(s)
- Lixia Ma
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Li-Na He
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Shiyang Kang
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Bianli Gu
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Shegan Gao
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Zhixiang Zuo
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
| |
Collapse
|
12
|
Yang X, Patil S, Joshi S, Jamla M, Kumar V. Exploring epitranscriptomics for crop improvement and environmental stress tolerance. PLANT PHYSIOLOGY AND BIOCHEMISTRY : PPB 2022; 183:56-71. [PMID: 35567875 DOI: 10.1016/j.plaphy.2022.04.031] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/18/2022] [Revised: 04/27/2022] [Accepted: 04/30/2022] [Indexed: 06/15/2023]
Abstract
Climate change and stressful environmental conditions severely hamper crop growth, development and yield. Plants respond to environmental perturbations through the plasticity provided by key genes, which are governed at the transcriptional and post-transcriptional levels. Gene regulation in plants is a multilevel process controlled by diverse cellular entities, including transcription factors (TFs), epigenetic regulators and non-coding RNAs, among others. Studies have confirmed the role of epigenetic modifications (DNA methylation and histone modifications) in gene expression. Recent years have witnessed the emergence of a highly specialized field, "epitranscriptomics". Epitranscriptomics investigates the post-transcriptional chemical modifications of RNA, present across life forms, that change the structural, functional and biological characteristics of RNA. However, epitranscriptomic modifications, with >140 types known so far, are yet to be fully understood. Researchers have identified epitranscriptome marks (writers, erasers and readers) and mapped the site-specific RNA modifications (m6A, m5C, 3' uridylation, etc.) responsible for fine-tuning gene expression in plants. Simultaneous advances in sequencing platforms, upgraded bioinformatic tools and pipelines, along with conventional labelled techniques, have further given a statistical picture of these epitranscriptomic modifications, leading to their potential applicability in crop improvement and in developing climate-smart crops. We present herein insights into the epitranscriptomic machinery in plants and into how the epitranscriptome and epitranscriptomic modifications underlie plant growth, development and environmental stress responses and adaptations. Third-generation sequencing technology, advanced bioinformatics tools and databases used in plant epitranscriptomics are also discussed. Emphasis is given to the potential of epitranscriptome engineering for crop improvement and for developing environmental-stress-tolerant plants, covering current status, challenges and future directions.
Affiliation(s)
- Xiangbo Yang
  - College of Agriculture, Jilin Agricultural Science and Technology University, Jilin, 132101, PR China
- Suraj Patil
  - Department of Biotechnology, Modern College of Arts, Science and Commerce, Savitribai Phule Pune University, Ganeshkhind, Pune, 411016, India
- Shrushti Joshi
  - Department of Biotechnology, Modern College of Arts, Science and Commerce, Savitribai Phule Pune University, Ganeshkhind, Pune, 411016, India
- Monica Jamla
  - Department of Biotechnology, Modern College of Arts, Science and Commerce, Savitribai Phule Pune University, Ganeshkhind, Pune, 411016, India
- Vinay Kumar
  - Department of Biotechnology, Modern College of Arts, Science and Commerce, Savitribai Phule Pune University, Ganeshkhind, Pune, 411016, India
|
13
|
CNNLSTMac4CPred: A Hybrid Model for N4-Acetylcytidine Prediction. Interdiscip Sci 2022; 14:439-451. [PMID: 35106702 DOI: 10.1007/s12539-021-00500-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/25/2021] [Revised: 12/04/2021] [Accepted: 12/13/2021] [Indexed: 12/23/2022]
Abstract
N4-Acetylcytidine (ac4C) is a highly conserved and widespread post-transcriptional RNA modification that plays versatile roles in cellular processes. Owing to the limitations of current techniques and knowledge, large-scale identification of ac4C remains challenging. RNA sequences resemble natural-language sentences that carry semantics. Inspired by the semantics of language, we proposed a hybrid model for ac4C prediction. The model used long short-term memory and a convolutional neural network to extract the semantic features hidden in the sequences. The semantic features and two traditional features (k-nucleotide frequencies and pseudo tri-tuple nucleotide composition) were combined to represent ac4C or non-ac4C sequences. eXtreme Gradient Boosting was used as the learning algorithm. Five-fold cross-validation over the training set, consisting of 1160 ac4C and 10,855 non-ac4C sequences, obtained an area under the receiver operating characteristic curve (AUROC) of 0.9004, and the independent test over 469 ac4C and 4343 non-ac4C sequences reached an AUROC of 0.8825. The model obtained a sensitivity of 0.6474 in the five-fold cross-validation and 0.6290 in the independent test, outperforming two state-of-the-art methods. The semantic features alone performed better than k-nucleotide frequencies and pseudo tri-tuple nucleotide composition, implying that ac4C sequences carry semantic information. The proposed hybrid model was implemented as a user-friendly web server, freely available to the scientific community at http://47.113.117.61/ac4c/ . The presented model and tool are useful for identifying ac4C on a large scale.
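As a rough illustration of one of the traditional features this abstract names, k-nucleotide frequencies can be obtained by sliding a window of length k along a sequence and normalizing the counts. The sketch below shows the general technique only and is not the authors' code; the function name `kmer_frequencies`, the ACGU alphabet, and the T-to-U conversion are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return the relative frequency of every possible k-mer over ACGU.

    Hypothetical helper: windows containing other symbols (e.g. N) are
    counted in the denominator but have no entry in the result.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    total = len(seq) - k + 1             # number of sliding windows
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    kmers = ("".join(p) for p in product("ACGU", repeat=k))
    return {kmer: counts[kmer] / total for kmer in kmers}


freqs = kmer_frequencies("ACGUAC", k=2)
print(freqs["AC"])  # AC occupies 2 of the 5 windows
```

The resulting fixed-length vector (16 entries for k = 2) could then be concatenated with other features before being passed to a classifier.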
|
14
|
Yu B, Zhang Y, Wang X, Gao H, Sun J, Gao X. Identification of DNA modification sites based on elastic net and bidirectional gated recurrent unit with convolutional neural network. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103566] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023]
|
15
|
m6A-Finder: Detecting m6A methylation sites from RNA transcriptomes using physical and statistical properties based features. Comput Biol Chem 2022; 97:107640. [DOI: 10.1016/j.compbiolchem.2022.107640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/01/2021] [Revised: 11/25/2021] [Accepted: 02/07/2022] [Indexed: 11/23/2022]
|
16
|
Wang H, Wang S, Zhang Y, Bi S, Zhu X. A brief review of machine learning methods for RNA methylation sites prediction. Methods 2022; 203:399-421. [DOI: 10.1016/j.ymeth.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2021] [Revised: 02/15/2022] [Accepted: 03/01/2022] [Indexed: 02/07/2023]
|
17
|
Cui C, Wu X, Zhou Y. GlyinsRNA: a webserver for predicting glycosylation sites on small RNAs. RNA Biol 2021; 18:600-603. [PMID: 34559595 DOI: 10.1080/15476286.2021.1982574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Versatile RNA modifications play important roles in the post-transcriptional regulation of gene expression. Among them, glycosylation modifications on small RNAs have emerged as a novel class whose characteristics need further investigation. Here, we demonstrated that the sequence pattern around RNA glycosylation sites is not random and can be exploited for glycosylation site prediction. A machine learning predictor, GlyinsRNA, which integrates multiple RNA sequence representation encodings, was established. GlyinsRNA achieved an AUROC (area under the receiver operating characteristic curve) of 0.7933 and 0.7979 in five-fold cross-validation and independent tests, respectively. GlyinsRNA was implemented as an online webserver, where both the predicted glycosylation sites and the overrepresented RNA-binding protein (RBP)-related motifs are annotated for users. The GlyinsRNA webserver is freely available at http://www.rnanut.net/glyinsrna.
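Because AUROC is the headline metric in this and several neighbouring abstracts, a small worked example may help. The sketch below uses the standard rank-based definition: the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half. It is a generic illustration, not the evaluation code of any cited tool, and the function name `auroc` and the toy labels and scores are made up for the example.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of predicted scores (higher means more likely positive)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for a demonstration; for large benchmarks a sort-based implementation would be the more usual choice.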
Affiliation(s)
- Chunmei Cui
  - Department of Biomedical Informatics, Moe Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
- Xiaobin Wu
  - Department of Biomedical Informatics, Moe Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
- Yuan Zhou
  - Department of Biomedical Informatics, Moe Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
|
18
|
Li J, He S, Guo F, Zou Q. HSM6AP: a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching. RNA Biol 2021; 18:1882-1892. [PMID: 33446014 PMCID: PMC8583144 DOI: 10.1080/15476286.2021.1875180] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/24/2020] [Revised: 12/02/2020] [Accepted: 01/08/2021] [Indexed: 01/21/2023]
Abstract
Recent studies have shown that RNA methylation modification can affect RNA transcription, metabolism, splicing and stability. In addition, RNA methylation modification has been associated with cancer, obesity and other diseases. Based on information about the human genome and machine learning, this paper discusses the effect of fused sequence and gene-level feature extraction on the accuracy of methylation site recognition. The discovery of new features exposed significant limitations of existing computational tools: (1) most prediction models are based solely on sequence features and use SVM or random forest as the classification method; (2) limited by the number of samples, the models may not achieve good performance. To establish a better prediction model for methylation sites, we must set specific weighting strategies for training samples and find more powerful and informative feature matrices to build a comprehensive model. In this paper, we present HSM6AP, a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching. In contrast to existing methods, HSM6AP weights its samples during training and explores a wide range of features. Max-Relevance-Max-Distance (MRMD) is employed for feature selection, and the feature matrix is generated by fusing the individual features. Extreme gradient boosting (XGBoost), an ensemble machine learning algorithm based on decision trees, is used for model training, and model performance is improved through parameter adjustment. Two rigorous independent data sets demonstrated the superiority of HSM6AP in identifying methylation sites. HSM6AP is an advanced predictor that can be directly employed by users (especially non-professional users) to predict methylation sites. Users can access our related tools and data sets at http://lab.malab.cn/~lijing/HSM6AP.html . The code of our tool is publicly accessible at https://github.com/lijingtju/HSm6AP.git.
Affiliation(s)
- Jing Li
  - Institute of computational biology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Shida He
  - Institute of computational biology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
  - Institute of computational biology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
  - Bioinformatics Laboratory, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
19
|
Islam N, Park J. bCNN-Methylpred: Feature-Based Prediction of RNA Sequence Modification Using Branch Convolutional Neural Network. Genes (Basel) 2021; 12:genes12081155. [PMID: 34440330 PMCID: PMC8392086 DOI: 10.3390/genes12081155] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/01/2021] [Revised: 07/24/2021] [Accepted: 07/26/2021] [Indexed: 11/16/2022]
Abstract
RNA modification is vital to various cellular and biological processes. Among the existing RNA modifications, N6-methyladenosine (m6A) is considered the most important modification owing to its involvement in many biological processes. The prediction of m6A sites is crucial because it can provide a better understanding of their functional mechanisms. In this regard, although experimental methods are useful, they are time consuming. Previously, researchers have attempted to predict m6A sites using computational methods to overcome the limitations of experimental methods. Some of these approaches are based on classical machine-learning techniques that rely on handcrafted features and require domain knowledge, whereas other methods are based on deep learning. However, both methods lack robustness and yield low accuracy. Hence, we develop a branch-based convolutional neural network and a novel RNA sequence representation. The proposed network automatically extracts features from each branch of the designated inputs. Subsequently, these features are concatenated in the feature space to predict the m6A sites. Finally, we conduct experiments using four different species. The proposed approach outperforms existing state-of-the-art methods, achieving accuracies of 94.91%, 94.28%, 88.46%, and 94.8% for the H. sapiens, M. musculus, S. cerevisiae, and A. thaliana datasets, respectively.
Affiliation(s)
- Naeem Islam
  - Core Research Institute of Intelligent Robots, Jeonbuk National University, Jeonju 54896, Korea
  - College of Electrical & Mechanical Engineering, NUST, Islamabad 44000, Pakistan
- Jaebyung Park
  - Core Research Institute of Intelligent Robots, Jeonbuk National University, Jeonju 54896, Korea
  - Division of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
  - Correspondence: Tel.: +82-63-270-4283
|
20
|
Wang M, Xie J, Xu S. M6A-BiNP: predicting N6-methyladenosine sites based on bidirectional position-specific propensities of polynucleotides and pointwise joint mutual information. RNA Biol 2021; 18:2498-2512. [PMID: 34161188 PMCID: PMC8632114 DOI: 10.1080/15476286.2021.1930729] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
N6-methyladenosine (m6A) plays an important role in various biological processes. Identifying m6A sites is a key step in exploring its biological functions. One of the biggest challenges in identifying m6A sites is how to extract features that carry enough class information to distinguish m6A from non-m6A sites. To address this challenge, we propose bidirectional dinucleotide and trinucleotide position-specific propensities. Based on these, we propose two feature-encoding algorithms: Position-Specific Propensities and Pointwise Mutual Information (PSP-PMI) and Position-Specific Propensities and Pointwise Joint Mutual Information (PSP-PJMI). PSP-PMI is based on the bidirectional dinucleotide propensity and pointwise mutual information, while PSP-PJMI is based on the bidirectional trinucleotide position-specific propensity and the pointwise joint mutual information proposed in this paper. We introduce parameters α and β in PSP-PMI and PSP-PJMI, respectively, to represent the distance from a nucleotide to its forward or backward adjacent nucleotide or dinucleotide, so as to extract features containing both local and global classification information. Finally, we propose the M6A-BiNP predictor based on PSP-PMI or PSP-PJMI and an SVM classifier. The 10-fold cross-validation results on benchmark datasets of non-single-base resolution and single-base resolution demonstrate that PSP-PMI and PSP-PJMI can extract features with strong capabilities to distinguish m6A from non-m6A sites. The M6A-BiNP predictor based on our proposed feature-encoding algorithm PSP-PJMI outperforms the state-of-the-art predictors and is so far the best model for identifying m6A and non-m6A sites.
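The pointwise mutual information that PSP-PMI builds on can be illustrated with a small sketch. The code below computes plain PMI, log2(p(x, y) / (p(x) p(y))), for nucleotide pairs separated by a fixed gap across a set of sequences. It is not the paper's PSP-PMI encoding, which additionally involves position-specific propensities; the function name `gapped_pmi` and the reading of `alpha` as a simple gap length are assumptions made for illustration.

```python
import math
from collections import Counter


def gapped_pmi(seqs, alpha=1):
    """PMI of nucleotide pairs (x, y) where y lies alpha positions after x.

    Hypothetical helper: probabilities are estimated from raw pair counts
    pooled over all sequences, with no smoothing.
    """
    pair_counts = Counter()
    left_counts = Counter()   # how often x appears as the first element
    right_counts = Counter()  # how often y appears as the second element
    for seq in seqs:
        for i in range(len(seq) - alpha):
            x, y = seq[i], seq[i + alpha]
            pair_counts[(x, y)] += 1
            left_counts[x] += 1
            right_counts[y] += 1
    total = sum(pair_counts.values())
    return {
        (x, y): math.log2(
            (c / total) / ((left_counts[x] / total) * (right_counts[y] / total))
        )
        for (x, y), c in pair_counts.items()
    }


pmi = gapped_pmi(["ACAC"], alpha=1)
print(pmi[("A", "C")])  # log2(1.5): A is followed by C more often than chance
```

A positive value means the pair co-occurs at that distance more often than independence would predict; unseen pairs are simply absent from the result, so a real encoder would need a smoothing or default-value policy.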
Affiliation(s)
- Mingzhao Wang
  - College of Life Sciences, Shaanxi Normal University, Xi'an, China
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Juanying Xie
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Shengquan Xu
  - College of Life Sciences, Shaanxi Normal University, Xi'an, China
|
21
|
EDLm6APred: ensemble deep learning approach for mRNA m6A site prediction. BMC Bioinformatics 2021; 22:288. [PMID: 34051729 PMCID: PMC8164815 DOI: 10.1186/s12859-021-04206-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/19/2021] [Accepted: 05/18/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND As a common and abundant RNA methylation modification, N6-methyladenosine (m6A) is widespread in the transcriptomes of various species and is closely related to the occurrence and development of various life processes and diseases. Thus, accurate identification of m6A methylation sites has become a hot topic. Most biological methods rely on high-throughput sequencing technology, which places great demands on sequencing library preparation and data analysis. Various machine learning methods have therefore been proposed that extract different types of sequence-based features and then employ conventional classifiers, such as SVM and RF, for m6A methylation site identification. However, the identification performance relies heavily on the extracted features, which still need to be improved. RESULTS This paper studies feature extraction and classification of m6A methylation sites from a natural language processing perspective, integrating feature extraction and classification in a single model while taking into account the upstream and downstream information of m6A sites. One-hot, RNA word embedding, and Word2vec encodings are adopted to depict sites from the perspectives of the base as well as its upstream and downstream sequence. A BiLSTM model, a well-known sequence model, was then constructed to discriminate sequences with potential m6A sites. Since the three feature extraction methods focus on different perspectives of m6A sites, an ensemble deep learning predictor (EDLm6APred) was finally constructed for m6A site prediction. Experimental results on human and mouse data sets show that EDLm6APred outperforms each of the single models, indicating that base, upstream, and downstream information are all essential for m6A site detection. Compared with existing m6A methylation site prediction models that do not use genomic features, EDLm6APred obtains an area under the receiver operating characteristic curve of 86.6% on the human data sets, indicating the effectiveness of sequential modeling on RNA. To maximize user convenience, a webserver was developed as an implementation of EDLm6APred and made publicly available at www.xjtlu.edu.cn/biologicalsciences/EDLm6APred . CONCLUSIONS Our proposed EDLm6APred method is a reliable predictor for m6A methylation sites.
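Of the three encodings named in this abstract, one-hot is the simplest to show concretely: each base becomes a four-element indicator vector, so a sequence of length L becomes an L x 4 matrix that a sequence model such as a BiLSTM can consume. The sketch below is a generic illustration rather than the EDLm6APred implementation; the function name `one_hot_rna`, the A, C, G, U column order, and the all-zero row for unknown bases are assumptions.

```python
def one_hot_rna(seq, alphabet="ACGU"):
    """Encode an RNA sequence as a list of one-hot rows (one row per base).

    Hypothetical helper: bases outside the alphabet (e.g. N) map to an
    all-zero row, and T is treated as U.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


for row in one_hot_rna("GAN"):
    print(row)
# [0, 0, 1, 0]
# [1, 0, 0, 0]
# [0, 0, 0, 0]
```

Unlike learned embeddings such as Word2vec, this encoding carries no information about sequence context, which is one plausible reason an ensemble over several encodings can help.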
|
22
|
Epigenetics: Roles and therapeutic implications of non-coding RNA modifications in human cancers. MOLECULAR THERAPY. NUCLEIC ACIDS 2021; 25:67-82. [PMID: 34188972 PMCID: PMC8217334 DOI: 10.1016/j.omtn.2021.04.021] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Indexed: 02/06/2023]
Abstract
With the rapid advance of next-generation sequencing (NGS), more than 160 covalent RNA modification processes have been reported, and they are widely present in every organism and every RNA type. Many RNA modification processes add a new layer to gene regulation, giving rise to the field of RNA epigenetics. The most common RNA modifications include pseudouridine (Ψ), N7-methylguanosine (m7G), 5-hydroxymethylcytosine (hm5C), 5-methylcytosine (m5C), N1-methyladenosine (m1A), N6-methyladenosine (m6A), and others. In this study, we focus on non-coding RNAs (ncRNAs) and summarize the epigenetic consequences of RNA modifications, their roles in the pathogenesis of cancer and as diagnostic markers and therapeutic targets for cancer, and the mechanisms by which they affect the immune environment of cancer. In addition, we summarize the current status of epigenetic drugs for tumor therapy based on ncRNA modifications and the recent progress of bioinformatics methods in elucidating RNA modifications.
|
23
|
Dai C, Feng P, Cui L, Su R, Chen W, Wei L. Iterative feature representation algorithm to improve the predictive performance of N7-methylguanosine sites. Brief Bioinform 2020; 22:5964186. [PMID: 33169141 DOI: 10.1093/bib/bbaa278] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 08/18/2020] [Revised: 09/11/2020] [Accepted: 09/21/2020] [Indexed: 01/13/2023]
Abstract
MOTIVATION N7-methylguanosine (m7G) is an important epigenetic modification that plays an essential role in the regulation of gene expression. Accurate identification of m7G modifications will therefore help to reveal and understand in depth their potential functional mechanisms. Although high-throughput experimental methods are capable of precisely locating m7G sites, they are still not cost-effective. It is therefore necessary to develop new methods to identify m7G sites. RESULTS In this work, using the iterative feature representation algorithm, we developed a machine-learning-based method, namely m7G-IFL, to identify m7G sites. To demonstrate its superiority, m7G-IFL was evaluated and compared with existing predictors. The results demonstrate that our predictor outperforms existing predictors in terms of accuracy for identifying m7G sites. By analyzing and comparing the features used in the predictors, we found that the positive and negative samples were more clearly separated in our feature space than in existing feature spaces. This result demonstrates that our features captured more discriminative information through the iterative feature learning process and thus contributed to the improvement in predictive performance.
Affiliation(s)
- Chichi Dai
  - Bachelor of Engineering in Software Engineering from Sichuan University
- Lizhen Cui
  - School of Software, Shandong University, the Deputy Director of the E-Commerce Research Center
- Ran Su
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wei Chen
  - School of Life Sciences, North China University of Science and Technology, 21 Bohai Road, Caofeidian Xincheng, Tangshan 063210, China
- Leyi Wei
  - Computer Science from Xiamen University, China
|
24
|
Liu L, Song B, Ma J, Song Y, Zhang SY, Tang Y, Wu X, Wei Z, Chen K, Su J, Rong R, Lu Z, de Magalhães JP, Rigden DJ, Zhang L, Zhang SW, Huang Y, Lei X, Liu H, Meng J. Bioinformatics approaches for deciphering the epitranscriptome: Recent progress and emerging topics. Comput Struct Biotechnol J 2020; 18:1587-1604. [PMID: 32670500 PMCID: PMC7334300 DOI: 10.1016/j.csbj.2020.06.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/08/2020] [Revised: 06/02/2020] [Accepted: 06/07/2020] [Indexed: 12/13/2022]
Abstract
Post-transcriptional RNA modification occurs on all types of RNA and plays a vital role in regulating every aspect of RNA function. Thanks to the development of high-throughput sequencing technologies, transcriptome-wide profiling of RNA modifications has become possible. With the accumulation of a large number of high-throughput datasets, bioinformatics approaches have become increasingly critical for unraveling the epitranscriptome. We review here the recent progress in bioinformatics approaches for deciphering the epitranscriptome, including epitranscriptome data analysis techniques, RNA modification databases, disease-association inference, general functional annotation, and studies on RNA modification site prediction. We also discuss the limitations of existing approaches and offer some future perspectives.
Affiliation(s)
- Lian Liu
  - School of Computer Sciences, Shannxi Normal University, Xi’an, Shaanxi 710119, China
- Bowen Song
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jiani Ma
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Yi Song
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Song-Yao Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China
- Yujiao Tang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Xiangyu Wu
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
- Kunqi Chen
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
- Jionglong Su
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Rong Rong
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Zhiliang Lu
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- João Pedro de Magalhães
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
- Daniel J. Rigden
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Shao-Wu Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Yufei Huang
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
  - Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Xiujuan Lei
  - School of Computer Sciences, Shannxi Normal University, Xi’an, Shaanxi 710119, China
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
|
25
|
Liu L, Lei X, Fang Z, Tang Y, Meng J, Wei Z. LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor. Front Genet 2020; 11:545. [PMID: 32582286 PMCID: PMC7297269 DOI: 10.3389/fgene.2020.00545] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/19/2019] [Accepted: 05/06/2020] [Indexed: 12/31/2022]
Abstract
N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications and plays an important role in many biological processes, such as splicing, RNA localization, and degradation. Studies have shown that m6A on lncRNA has important functions, including regulating the expression and functions of lncRNA, regulating the synthesis of pre-mRNA, promoting the proliferation of cancer cells, and affecting cell differentiation, among many others. Although a number of methods have been proposed to predict m6A RNA methylation sites, most of them target general m6A site prediction without accounting for the uniqueness of the lncRNA methylation prediction problem. Since many lncRNAs do not have a polyA tail and cannot be captured in the polyA selection step of the most widely adopted RNA-seq library preparation protocol, lncRNA methylation sites cannot be effectively captured and are thus likely to be significantly underrepresented in existing experimental data, which affects the accuracy of existing predictors. In this paper, we propose a new computational framework, LITHOPHONE, which stands for long noncoding RNA methylation sites prediction from sequence characteristics and genomic information with an ensemble predictor. We show that the methylation sites of lncRNA and mRNA exhibit different patterns in the extracted features and should be handled differently when making predictions. Owing to the experimental protocols used, the number of known lncRNA m6A sites is limited and insufficient to train a reliable predictor; performance can therefore be improved by combining lncRNA and mRNA data using an ensemble predictor. We show that the newly developed LITHOPHONE approach achieved reasonably good performance when tested on independent datasets (AUC: 0.966 and 0.835 under full transcript and mature mRNA modes, respectively), marking a substantial improvement over existing methods. Additionally, LITHOPHONE was applied to scan the entire human lncRNAome for all possible lncRNA m6A sites, and the results are freely accessible at: http://180.208.58.19/lith/.
Affiliation(s)
- Lian Liu
- School of Computer Sciences, Shannxi Normal University, Xi'an, China
| | - Xiujuan Lei
- School of Computer Sciences, Shannxi Normal University, Xi'an, China
| | - Zengqiang Fang
- School of Computer Sciences, Shannxi Normal University, Xi'an, China
| | - Yujiao Tang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
| | - Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
| | - Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
| |
|
26
|
Zhu X, He J, Zhao S, Tao W, Xiong Y, Bi S. A comprehensive comparison and analysis of computational predictors for RNA N6-methyladenosine sites of Saccharomyces cerevisiae. Brief Funct Genomics 2020; 18:367-376. [PMID: 31609411 DOI: 10.1093/bfgp/elz018] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/20/2019] [Revised: 07/07/2019] [Accepted: 07/15/2019] [Indexed: 12/16/2022] Open
Abstract
N6-methyladenosine (m6A) modification, as one of the commonest post-transcription modifications in RNAs, has been reported to be highly related to many biological processes. Over the past decade, several tools for m6A sites prediction of Saccharomyces cerevisiae have been developed and are freely available online. However, the quality of predictions by these tools is difficult to quantify and compare. In this study, an independent dataset M6Atest6540 was compiled to systematically evaluate nine publicly available m6A prediction tools for S. cerevisiae. The experimental results indicate that RAM-ESVM achieved the best performance on M6Atest6540; however, most models performed substantially worse than their performances reported in the original papers. The benchmark dataset Met2614, which was used as the training dataset for the nine methods, was further analyzed by using a position bias index. The results demonstrated the significantly different bias of dataset Met2614 compared with the RNA segments around m6A sites recorded in RMBase. Moreover, newMet2614 was collected by randomly selecting RNA segments from non-redundant data recorded in RMBase, and three different kinds of features were extracted. The performances of the models built on Met2614 and newMet2614 with the features were compared, which shows the better generalization of models built on newMet2614. Our results also indicate that the position-specific propensity-based features outperform other features, although they are also easily over-fitted on a biased dataset.
Affiliation(s)
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China.,School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
| | - Jingjing He
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
| | - Shihao Zhao
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Wei Tao
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Shoudong Bi
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| |
|
27
|
Wu P, Mo Y, Peng M, Tang T, Zhong Y, Deng X, Xiong F, Guo C, Wu X, Li Y, Li X, Li G, Zeng Z, Xiong W. Emerging role of tumor-related functional peptides encoded by lncRNA and circRNA. Mol Cancer 2020; 19:22. [PMID: 32019587 PMCID: PMC6998289 DOI: 10.1186/s12943-020-1147-3] [Citation(s) in RCA: 320] [Impact Index Per Article: 80.0] [Received: 11/19/2019] [Accepted: 01/28/2020] [Indexed: 02/08/2023] Open
Abstract
Non-coding RNAs do not encode proteins and regulate various oncological processes. They are also important potential cancer diagnostic and prognostic biomarkers. Bioinformatics and translation omics have begun to elucidate the roles and modes of action of the functional peptides encoded by ncRNA. Here, recent advances in long non-coding RNA (lncRNA) and circular RNA (circRNA)-encoded small peptides are compiled and synthesized. We introduce both the computational and analytical methods used to forecast prospective ncRNAs encoding oncologically functional oligopeptides. We also present numerous specific lncRNA and circRNA-encoded proteins and their cancer-promoting or cancer-inhibiting molecular mechanisms. This information may expedite the discovery, development, and optimization of novel and efficacious cancer diagnostic, therapeutic, and prognostic protein-based tools derived from non-coding RNAs. The role of ncRNA-encoding functional peptides has promising application perspectives and potential challenges in cancer research. The aim of this review is to provide a theoretical basis and relevant references, which may promote the discovery of more functional peptides encoded by ncRNAs, and further develop novel anticancer therapeutic targets, as well as diagnostic and prognostic cancer markers.
Affiliation(s)
- Pan Wu
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, the Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Yongzhen Mo
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Miao Peng
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Ting Tang
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Yu Zhong
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Xiangying Deng
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Fang Xiong
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Can Guo
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Xu Wu
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Yong Li
- Department of Medicine, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas, USA
| | - Xiaoling Li
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
| | - Guiyuan Li
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, the Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Zhaoyang Zeng
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, the Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Wei Xiong
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China.,Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China.,Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, the Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| |
|
28
|
Liu L, Lei X, Meng J, Wei Z. WITMSG: Large-scale Prediction of Human Intronic m6A RNA Methylation Sites from Sequence and Genomic Features. Curr Genomics 2020; 21:67-76. [PMID: 32655300 PMCID: PMC7324894 DOI: 10.2174/1389202921666200211104140] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/24/2019] [Revised: 01/14/2020] [Accepted: 01/27/2020] [Indexed: 02/07/2023] Open
Abstract
INTRODUCTION N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications. It plays important roles in various biological processes, such as splicing, RNA localization and degradation, many of which are related to the functions of introns. Although a number of computational approaches have been proposed to predict the m6A sites in different species, none of them were optimized for intronic m6A sites. As existing experimental data overwhelmingly relied on polyA selection in sample preparation and the intronic RNAs are usually underrepresented in the captured RNA library, the accuracy of general m6A sites prediction approaches is limited for the intronic m6A sites prediction task. METHODOLOGY A computational framework, WITMSG, dedicated to the large-scale prediction of intronic m6A RNA methylation sites in humans has been proposed here for the first time. Based on the random forest algorithm and using only known intronic m6A sites as the training data, WITMSG takes advantage of both conventional sequence features and a variety of genomic characteristics for improved prediction performance of intron-specific m6A sites. RESULTS AND CONCLUSION It has been observed that WITMSG outperformed competing approaches (trained with all the m6A sites or intronic m6A sites only) in 10-fold cross-validation (AUC: 0.940) and when tested on independent datasets (AUC: 0.946). WITMSG was also applied intronome-wide in humans to predict all possible intronic m6A sites, and the prediction results are freely accessible at http://rnamd.com/intron/.
Affiliation(s)
- Lian Liu
- 1School of Computer Sciences, Shannxi Normal University, Xi'an, Shaanxi, 710119, China; 2Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
| | - Xiujuan Lei
- 1School of Computer Sciences, Shannxi Normal University, Xi'an, Shaanxi, 710119, China; 2Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
| | - Jia Meng
- 1School of Computer Sciences, Shannxi Normal University, Xi'an, Shaanxi, 710119, China; 2Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
| | - Zhen Wei
- 1School of Computer Sciences, Shannxi Normal University, Xi'an, Shaanxi, 710119, China; 2Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
| |
|
29
|
Cai J, Wang D, Chen R, Niu Y, Ye X, Su R, Xiao G, Wei L. A Bioinformatics Tool for the Prediction of DNA N6-Methyladenine Modifications Based on Feature Fusion and Optimization Protocol. Front Bioeng Biotechnol 2020; 8:502. [PMID: 32582654 PMCID: PMC7287168 DOI: 10.3389/fbioe.2020.00502] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/17/2020] [Accepted: 04/29/2020] [Indexed: 01/04/2023] Open
Abstract
DNA N6-methyladenine (6mA) is closely involved with various biological processes. Identifying the distribution of 6mA modifications at genome scale is of great significance for understanding their functions in depth. In recent years, various experimental and computational methods have been proposed for this purpose. Unfortunately, existing methods cannot provide accurate and fast 6mA prediction. In this study, we present 6mAPred-FO, a bioinformatics tool that enables researchers to make predictions based on sequences only. To sufficiently capture the characteristics of 6mA sites, we integrate the sequence-order information with nucleotide positional specificity information for feature encoding, and further improve the feature representation capacity with an analysis of variance-based feature optimization protocol. The experimental results show that using this feature protocol, we can significantly improve the predictive performance. Via further feature analysis, we found that the sequence-order information and positional specificity information are complementary to each other, contributing to the performance improvement. On the other hand, the improvement is also due to the use of the feature optimization protocol, which is capable of effectively capturing the most informative features from the original feature space. Moreover, benchmarking comparison results demonstrate that our 6mAPred-FO outperforms several existing predictors. Finally, we establish a web server that implements the proposed method for the convenience of researchers, which is currently available at http://server.malab.cn/6mAPred-FO.
Affiliation(s)
- Jianhua Cai
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
| | - Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Riqing Chen
- College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Yuzhen Niu
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
| | - Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Guobao Xiao
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
- *Correspondence: Guobao Xiao
| | - Leyi Wei
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
- School of Software, Shandong University, Jinan, China
- Leyi Wei
| |
|
30
|
Liu Z, Dong W, Luo W, Jiang W, Li Q, He Z. HLMethy: a machine learning-based model to identify the hidden labels of m6A candidates. PLANT MOLECULAR BIOLOGY 2019; 101:575-584. [PMID: 31722090 DOI: 10.1007/s11103-019-00930-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/22/2019] [Accepted: 11/01/2019] [Indexed: 06/10/2023]
Abstract
We developed a machine learning-based model to identify the hidden labels of m6A candidates from noisy m6A-seq data. Peak-calling approaches, such as MeRIP-seq or m6A-seq, are commonly used to map m6A modifications. However, these technologies can only map m6A sites with 100-200 nt resolution and cannot reveal the precise location or the number of modified residues in a transcript. To address this challenge, we developed a novel machine learning-based approach, named HLMethy, to assign labels to m6A candidates from noisy m6A-seq data. The multiple instance learning framework was adopted and two different training strategies were used to generate the classification model. To test the performance of our model, the m6A sites with single-base resolution were used and our model achieved comparable performance against existing instance-level predictors, which suggests that our model has the potential to improve the data quality of m6A-seq at reduced costs. Moreover, our generic framework can be extended to other newly found modifications that are found by peak-calling approaches. The source code of HLMethy is available at https://github.com/liuze-nwafu/HLMethy.
Affiliation(s)
- Ze Liu
- College of Water Resources and Architectural Engineering, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A & F University, Yangling, 712100, Shaanxi, China
| | - Wei Dong
- College of Water Resources and Architectural Engineering, Northwest A & F University, Yangling, 712100, Shaanxi, China.
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A & F University, Yangling, 712100, Shaanxi, China.
| | - WenJie Luo
- College of Water Resources and Architectural Engineering, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A & F University, Yangling, 712100, Shaanxi, China
| | - Wei Jiang
- College of Water Resources and Architectural Engineering, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A & F University, Yangling, 712100, Shaanxi, China
| | - QuanWu Li
- College of Water Resources and Architectural Engineering, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A & F University, Yangling, 712100, Shaanxi, China
| | - ZiLi He
- College of Water Resources and Architectural Engineering, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A & F University, Yangling, 712100, Shaanxi, China
| |
|
31
|
Chen Z, Zhao P, Li F, Wang Y, Smith AI, Webb GI, Akutsu T, Baggag A, Bensmail H, Song J. Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences. Brief Bioinform 2019; 21:1676-1696. [DOI: 10.1093/bib/bbz112] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 07/04/2019] [Revised: 07/31/2019] [Accepted: 08/07/2019] [Indexed: 12/14/2022] Open
Abstract
RNA post-transcriptional modifications play a crucial role in a myriad of biological processes and cellular functions. To date, more than 160 RNA modifications have been discovered; therefore, accurate identification of RNA-modification sites is fundamental for a better understanding of RNA-mediated biological functions and mechanisms. However, due to limitations in experimental methods, systematic identification of different types of RNA-modification sites remains a major challenge. Recently, more than 20 computational methods have been developed to identify RNA-modification sites in tandem with high-throughput experimental methods, with most of these capable of predicting only single types of RNA-modification sites. These methods show high diversity in their dataset size, data quality, core algorithms, features extracted and feature selection techniques and evaluation strategies. Therefore, there is an urgent need to revisit these methods and summarize their methodologies, in order to improve and further develop computational techniques to identify and characterize RNA-modification sites from the large amounts of sequence data. With this goal in mind, first, we provide a comprehensive survey on a large collection of 27 state-of-the-art approaches for predicting N1-methyladenosine and N6-methyladenosine sites. We cover a variety of important aspects that are crucial for the development of successful predictors, including the dataset quality, operating algorithms, sequence and genomic features, feature selection, model performance evaluation and software utility. In addition, we also provide our thoughts on potential strategies to improve the model performance. Second, we propose a computational approach called DeepPromise based on deep learning techniques for simultaneous prediction of N1-methyladenosine and N6-methyladenosine. 
To extract the sequence context surrounding the modification sites, three feature encodings, including enhanced nucleic acid composition, one-hot encoding, and RNA embedding, were used as the input to seven consecutive layers of convolutional neural networks (CNNs), respectively. Moreover, DeepPromise further combined the prediction score of the CNN-based models and achieved around 43% higher area under receiver-operating curve (AUROC) for m1A site prediction and 2–6% higher AUROC for m6A site prediction, respectively, when compared with several existing state-of-the-art approaches on the independent test. In-depth analyses of characteristic sequence motifs identified from the convolution-layer filters indicated that nucleotide presentation at proximal positions surrounding the modification sites contributed most to the classification, whereas those at distal positions also affected classification but to different extents. To maximize user convenience, a web server was developed as an implementation of DeepPromise and made publicly available at http://DeepPromise.erc.monash.edu/, with the server accepting both RNA sequences and genomic sequences to allow prediction of two types of putative RNA-modification sites.
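One of the three feature encodings named in the abstract above, one-hot encoding, can be illustrated with a minimal sketch. This is not DeepPromise's code: the function name `one_hot_encode`, the A/C/G/U column order, and the all-zero handling of unknown bases are assumptions made only for illustration.

```python
def one_hot_encode(seq):
    """Encode an RNA sequence as a list of 4-element one-hot rows.

    Assumed (hypothetical) conventions: columns are ordered A, C, G, U;
    DNA-style T is treated as U; any other symbol (e.g. N) becomes an
    all-zero row.
    """
    alphabet = "ACGU"
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper().replace("T", "U"):
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # A short window around a hypothetical candidate site.
    for base, row in zip("GGACU", one_hot_encode("GGACU")):
        print(base, row)
```

A matrix of this shape (sequence length by 4) is the kind of input a convolutional layer can scan along the sequence axis; how the published model arranges its inputs is not stated in the abstract.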
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, China
| | - Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
| | - Fuyi Li
- Northwest A&F University, China
| | | | - A Ian Smith
- Prince Henrys Institute Melbourne and Monash University, Australia
| | | | | | - Abdelkader Baggag
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Victoria 3800, Australia
| |
|
32
|
Chen K, Wei Z, Zhang Q, Wu X, Rong R, Lu Z, Su J, de Magalhães JP, Rigden DJ, Meng J. WHISTLE: a high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucleic Acids Res 2019; 47:e41. [PMID: 30993345 PMCID: PMC6468314 DOI: 10.1093/nar/gkz074] [Citation(s) in RCA: 137] [Impact Index Per Article: 27.4] [Received: 01/11/2019] [Revised: 01/27/2019] [Accepted: 02/01/2019] [Indexed: 12/24/2022] Open
Abstract
N6-methyladenosine (m6A) is the most prevalent post-transcriptional modification in eukaryotes, and plays a pivotal role in various biological processes, such as splicing, RNA degradation and RNA–protein interaction. We report here a prediction framework WHISTLE for transcriptome-wide m6A RNA-methylation site prediction. When tested on six independent datasets, our approach, which integrated 35 additional genomic features besides the conventional sequence features, achieved a major improvement in the accuracy of m6A site prediction (average AUC: 0.948 and 0.880 under the full transcript or mature messenger RNA models, respectively) compared to the state-of-the-art computational approaches MethyRNA (AUC: 0.790 and 0.732) and SRAMP (AUC: 0.761 and 0.706). It also out-performed the existing epitranscriptome databases MeT-DB (AUC: 0.798 and 0.744) and RMBase (AUC: 0.786 and 0.736), which were built upon hundreds of epitranscriptome high-throughput sequencing samples. To probe the putative biological processes impacted by changes in an individual m6A site, a network-based approach was implemented according to the ‘guilt-by-association’ principle by integrating RNA methylation profiles, gene expression profiles and protein–protein interaction data. Finally, the WHISTLE web server was built to facilitate the query of our high-accuracy map of the human m6A epitranscriptome, and the server is freely available at: www.xjtlu.edu.cn/biologicalsciences/whistle and http://whistle-epitranscriptome.com.
Affiliation(s)
- Kunqi Chen
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX Liverpool, UK
| | - Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX Liverpool, UK
| | - Qing Zhang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | - Xiangyu Wu
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX Liverpool, UK
| | - Rong Rong
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Institute of Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Zhiliang Lu
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Institute of Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Jionglong Su
- Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | | | - Daniel J Rigden
- Institute of Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China.,Institute of Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| |
|
33
|
Zhao W, Zhou Y, Cui Q, Zhou Y. PACES: prediction of N4-acetylcytidine (ac4C) modification sites in mRNA. Sci Rep 2019; 9:11112. [PMID: 31366994 PMCID: PMC6668381 DOI: 10.1038/s41598-019-47594-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 03/08/2019] [Accepted: 07/19/2019] [Indexed: 01/27/2023] Open
Abstract
N4-acetylcytidine (ac4C) is a highly conserved RNA modification and is the first acetylation event described in mRNA. ac4C in mRNA has been demonstrated to be involved in the regulation of mRNA stability, processing and translation, but the exact means by which ac4C works remain unclear. In addition, ac4C is widely distributed within the human transcriptome at physiologically relevant levels and so far only a small fraction of modified sequences have been detected by experiments. In this study, we developed a predictor of ac4C sites in human mRNA named PACES to help mining possible modified motifs. PACES combines two random forest classifiers, position-specific dinucleotide sequence profile and K-nucleotide frequencies. With genomic sequences as input, PACES gives possible modified sequences based on the training model. PACES is freely available at http://www.rnanut.net/paces/.
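The K-nucleotide frequency feature mentioned in the abstract above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the PACES implementation: the function name `kmer_frequencies`, the overlapping-window counting, and the choice to skip windows containing unknown bases are all hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return the relative frequency of every possible k-mer in seq.

    Assumptions (hypothetical, for illustration): windows overlap with a
    step of one nucleotide, T is treated as U, and windows containing
    symbols outside the alphabet are ignored.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


if __name__ == "__main__":
    freqs = kmer_frequencies("AACA", k=2)
    print({kmer: round(f, 3) for kmer, f in freqs.items() if f > 0})
```

The resulting fixed-length vector (4^k values) is the kind of input a random forest classifier can consume directly; the values of k used by the published tool are not given in the abstract.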
Affiliation(s)
- Wanqing Zhao
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
| | - Yiran Zhou
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
| | - Qinghua Cui
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China.
- Center of Bioinformatics, Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Yuan Zhou
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China.
| |
|
34
|
Yue H, Nie X, Yan Z, Weining S. N6-methyladenosine regulatory machinery in plants: composition, function and evolution. PLANT BIOTECHNOLOGY JOURNAL 2019; 17:1194-1208. [PMID: 31070865 PMCID: PMC6576107 DOI: 10.1111/pbi.13149] [Citation(s) in RCA: 118] [Impact Index Per Article: 23.6] [Received: 02/02/2019] [Revised: 04/28/2019] [Accepted: 05/01/2019] [Indexed: 05/04/2023]
Abstract
N6-methyladenosine (m6A) RNA methylation, one of the most pivotal internal modifications of RNA, is a conserved post-transcriptional mechanism to enrich and regulate genetic information in eukaryotes. The scope and function of this modification in plants has been an intense focus of study, especially in model plant systems. The characterization of plant m6A writers, erasers and readers, as well as the elucidation of their functions, is currently one of the most fascinating hotspots in plant biology research. The functional analysis of m6A in plants will be booming in the foreseeable future, which could contribute to crop genetic improvement through epitranscriptome manipulation. In this review, we systematically analysed and summarized recent advances in the understanding of the structure and composition of plant m6A regulatory machinery, and the biological functions of m6A in plant growth, development and stress response. Finally, our analysis showed that the evolutionary relationships between m6A modification components were highly conserved across the plant kingdom.
Affiliation(s)
- Hong Yue
- College of Life Sciences, State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling, Shaanxi, China
| | - Xiaojun Nie
- College of Life Sciences, State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling, Shaanxi, China
| | - Zhaogui Yan
- College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Song Weining
- College of Life Sciences, State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling, Shaanxi, China
| |
|
35
|
Zhang SY, Zhang SW, Fan XN, Meng J, Chen Y, Gao SJ, Huang Y. Global analysis of N6-methyladenosine functions and its disease association using deep learning and network-based methods. PLoS Comput Biol 2019; 15:e1006663. [PMID: 30601803 PMCID: PMC6331136 DOI: 10.1371/journal.pcbi.1006663] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 06/01/2018] [Revised: 01/14/2019] [Accepted: 11/21/2018] [Indexed: 02/03/2023] Open
Abstract
N6-methyladenosine (m6A) is the most abundant methylation, existing in >25% of human mRNAs. Exciting recent discoveries indicate the close involvement of m6A in regulating many different aspects of mRNA metabolism and diseases like cancer. However, our current knowledge about how m6A levels are controlled and whether and how regulation of m6A levels of a specific gene can play a role in cancer and other diseases is mostly elusive. We propose in this paper a computational scheme for predicting m6A-regulated genes and m6A-associated disease, which includes Deep-m6A, the first model for detecting condition-specific m6A sites from MeRIP-Seq data with a single base resolution using deep learning and Hot-m6A, a new network-based pipeline that prioritizes functional significant m6A genes and its associated diseases using the Protein-Protein Interaction (PPI) and gene-disease heterogeneous networks. We applied Deep-m6A and this pipeline to 75 MeRIP-seq human samples, which produced a compact set of 709 functionally significant m6A-regulated genes and nine functionally enriched subnetworks. The functional enrichment analysis of these genes and networks reveal that m6A targets key genes of many critical biological processes including transcription, cell organization and transport, and cell proliferation and cancer-related pathways such as Wnt pathway. The m6A-associated disease analysis prioritized five significantly associated diseases including leukemia and renal cell carcinoma. These results demonstrate the power of our proposed computational scheme and provide new leads for understanding m6A regulatory functions and its roles in diseases.
Affiliation(s)
- Song-Yao Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, Texas, United States of America
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Xiao-Nan Fan
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Jia Meng
- Department of Biological Sciences, HRINU, SUERI, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
| | - Yidong Chen
- Department of Epidemiology and Biostatistics, University of Texas Health San Antonio, San Antonio, Texas, United States of America
| | - Shou-Jiang Gao
- UPMC Hillman Cancer Center and Department of Microbiology and Molecular Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Yufei Huang
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, Texas, United States of America
- Department of Epidemiology and Biostatistics, University of Texas Health San Antonio, San Antonio, Texas, United States of America
| |
|
36
|
Wei L, Su R, Wang B, Li X, Zou Q, Gao X. Integration of deep feature representations and handcrafted features to improve the prediction of N6-methyladenosine sites. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.04.082] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Indexed: 12/01/2022]
|
37
|
Qiang X, Chen H, Ye X, Su R, Wei L. M6AMRFS: Robust Prediction of N6-Methyladenosine Sites With Sequence-Based Features in Multiple Species. Front Genet 2018; 9:495. [PMID: 30410501 PMCID: PMC6209681 DOI: 10.3389/fgene.2018.00495] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 07/18/2018] [Accepted: 10/04/2018] [Indexed: 12/23/2022] Open
Abstract
As one of the well-studied RNA methylation modifications, N6-methyladenosine (m6A) plays important roles in various biological progresses, such as RNA splicing and degradation, etc. Identification of m6A sites is fundamentally important for better understanding of their functional mechanisms. Recently, machine learning based prediction methods have emerged as an effective approach for fast and accurate identification of m6A sites. In this paper, we proposed "M6AMRFS", a new machine learning based predictor for the identification of m6A sites. In this predictor, we exploited a new feature representation algorithm to encode RNA sequences with two feature descriptors (dinucleotide binary encoding and Local position-specific dinucleotide frequency), and used the F-score algorithm combined with SFS (Sequential Forward Search) to enhance the feature representation ability. To predict m6A sites, we employed the eXtreme Gradient Boosting (XGBoost) algorithm to build a predictive model. Benchmarking results showed that the proposed predictor is competitive with the state-of-the art predictors. Importantly, robust predictions for multiple species by our predictor demonstrate that our predictive models have strong generalization ability. To the best of our knowledge, M6AMRFS is the first tool that can be used for the identification of m6A sites in multiple species. To facilitate the use of our predictor, we have established a user-friendly webserver with the implementation of M6AMRFS, which is currently available in http://server.malab.cn/M6AMRFS/. We anticipate that it will be a useful tool for the relevant research of m6A sites.
Affiliation(s)
- Xiaoli Qiang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Huangrong Chen
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
| | - Ran Su
- School of Software, Tianjin University, Tianjin, China
| | - Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| |
|
38
|
Huang Y, He N, Chen Y, Chen Z, Li L. BERMP: a cross-species classifier for predicting m6A sites by integrating a deep learning algorithm and a random forest approach. Int J Biol Sci 2018; 14:1669-1677. [PMID: 30416381 PMCID: PMC6216033 DOI: 10.7150/ijbs.27819] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 06/11/2018] [Accepted: 08/14/2018] [Indexed: 11/12/2022] Open
Abstract
N6-methyladenosine (m6A) is a prevalent RNA methylation modification involved in several biological processes. Hundreds or thousands of m6A sites identified from different species using high-throughput experiments provides a rich resource to construct in-silico approaches for identifying m6A sites. The existing m6A predictors are developed using conventional machine-learning (ML) algorithms and most are species-centric. In this paper, we develop a novel cross-species deep-learning classifier based on bidirectional Gated Recurrent Unit (BGRU) for the prediction of m6A sites. In comparison with conventional ML approaches, BGRU achieves outstanding performance for the Mammalia dataset that contains over fifty thousand m6A sites but inferior for the Saccharomyces cerevisiae dataset that covers around a thousand positives. The accuracy of BGRU is sensitive to the data size and the sensitivity is compensated by the integration of a random forest classifier with a novel encoding of enhanced nucleic acid content. The integrated approach dubbed as BGRU-based Ensemble RNA Methylation site Predictor (BERMP) has competitive performance in both cross-validation test and independent test. BERMP also outperforms existing m6A predictors for different species. Therefore, BERMP is a novel multi-species tool for identifying m6A sites with high confidence. This classifier is freely available at http://www.bioinfogo.org/bermp.
Affiliation(s)
- Yu Huang
- School of Data Science and Software Engineering, Qingdao University, 266021, Qingdao, China
| | - Ningning He
- School of Basic Medicine, Qingdao University, 266021, Qingdao, China
| | - Yu Chen
- School of Data Science and Software Engineering, Qingdao University, 266021, Qingdao, China
| | - Zhen Chen
- School of Basic Medicine, Qingdao University, 266021, Qingdao, China
| | - Lei Li
- School of Data Science and Software Engineering, Qingdao University, 266021, Qingdao, China; School of Basic Medicine, Qingdao University, 266021, Qingdao, China; Cancer Institute, the Affiliated Hospital of Qingdao University, Qingdao, Shandong, 266061, China; Qingdao Cancer Institute, Qingdao, Shandong 266061, China
| |
|
39
|
Zhao Z, Peng H, Lan C, Zheng Y, Fang L, Li J. Imbalance learning for the prediction of N6-Methylation sites in mRNAs. BMC Genomics 2018; 19:574. [PMID: 30068294 PMCID: PMC6090857 DOI: 10.1186/s12864-018-4928-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 03/21/2018] [Accepted: 07/04/2018] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND N6-methyladenosine (m6A) is an important epigenetic modification which plays various roles in mRNA metabolism and embryogenesis directly related to human diseases. To identify m6A in a large scale, machine learning methods have been developed to make predictions on m6A sites. However, there are two main drawbacks of these methods. The first is the inadequate learning of the imbalanced m6A samples which are much less than the non-m6A samples, by their balanced learning approaches. Second, the features used by these methods are not outstanding to represent m6A sequence characteristics. RESULTS We propose to use cost-sensitive learning ideas to resolve the imbalance data issues in the human mRNA m6A prediction problem. This cost-sensitive approach applies to the entire imbalanced dataset, without random equal-size selection of negative samples, for an adequate learning. Along with site location and entropy features, top-ranked positions with the highest single nucleotide polymorphism specificity in the window sequences are taken as new features in our imbalance learning. On an independent dataset, our overall prediction performance is much superior to the existing predictors. Our method shows stronger robustness against the imbalance changes in the tests on 9 datasets whose imbalance ratios range from 1:1 to 9:1. Our method also outperforms the existing predictors on 1226 individual transcripts. It is found that the new types of features are indeed of high significance in the m6A prediction. The case studies on gene c-Jun and CBFB demonstrate the detailed prediction capacity to improve the prediction performance. CONCLUSION The proposed cost-sensitive model and the new features are useful in human mRNA m6A prediction. Our method achieves better correctness and robustness than the existing predictors in independent test and case studies. The results suggest that imbalance learning is promising to improve the performance of m6A prediction.
Affiliation(s)
- Zhixun Zhao
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia
| | - Hui Peng
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia
| | - Chaowang Lan
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia
| | - Yi Zheng
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia
| | - Liang Fang
- School of Computer, National University of Defense Technology, Changsha, 410073, China
| | - Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia.
| |
|
40
|
Wang X, Yan R. RFAthM6A: a new tool for predicting m6A sites in Arabidopsis thaliana. PLANT MOLECULAR BIOLOGY 2018; 96:327-337. [PMID: 29340952 DOI: 10.1007/s11103-018-0698-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 09/30/2017] [Accepted: 01/05/2018] [Indexed: 06/07/2023]
Abstract
We curated a reliable dataset of m6A sites in Arabidopsis thaliana, built competitive models for predicting m6A sites, extracted predominant rules from the prediction models and analyzed the most important features. In biological RNA, approximately 150 chemical modifications have been discovered, of which N6-methyladenine (m6A) is the most prevalent and abundant. This modification plays an essential role in a myriad of biological mechanisms and regulates RNA localization, nuclear export, translation, stability, alternative splicing, and other processes. However, m6A-seq and other wet-lab techniques do not easily facilitate accurate and complete determination of m6A sites across the transcriptome. Therefore, the use of computational methods to establish accurate models for predicting m6A sites is essential. In this work, we manually curated a reliable dataset of m6A sites and non-m6A sites and developed a new tool called RFAthM6A for predicting m6A sites in Arabidopsis thaliana. Briefly, RFAthM6A consists of four independent models named RFPSNSP, RFPSDSP, RFKSNPF and RFKNF and strict benchmarks show that the AUC values of the four models reached 0.894, 0.914, 0.920 and 0.926, respectively in a fivefold cross validation and the prediction performance of RFPSDSP, RFKSNPF and RFKNF exceeded that of three previously reported models (AthMethPre, M6ATH and RAM-NPPS). Linear combination of the prediction scores of RFPSDSP, RFKSNPF and RFKNF improved the prediction performance. We also extracted several predominant rules that underlie the m6A site identification from the trained models. Furthermore, the most important features of the predictors for the m6A site identification were also analyzed in depth. To facilitate use of our proposed models by interested researchers, all the source codes and datasets are publicly deposited at https://github.com/nongdaxiaofeng/RFAthM6A .
Affiliation(s)
- Xiaofeng Wang
- College of Mathematics and Computer Science, Shanxi Normal University, Linfen, 041004, China.
| | - Renxiang Yan
- Institute of Applied Genomics, School of Biological Sciences and Engineering, Fuzhou University, Fuzhou, 350002, China.
| |
|
41
|
Chen X, Sun YZ, Liu H, Zhang L, Li JQ, Meng J. RNA methylation and diseases: experimental results, databases, Web servers and computational models. Brief Bioinform 2017; 20:896-917. [DOI: 10.1093/bib/bbx142] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 08/04/2017] [Revised: 09/12/2017] [Indexed: 12/15/2022] Open
Affiliation(s)
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Ya-Zhou Sun
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
| | - Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
| | - Jia Meng
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University
| |
|
42
|
Chen W, Lin H. Recent Advances in Identification of RNA Modifications. Noncoding RNA 2016; 3:ncrna3010001. [PMID: 29657273 PMCID: PMC5831996 DOI: 10.3390/ncrna3010001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/22/2016] [Revised: 12/19/2016] [Accepted: 12/23/2016] [Indexed: 12/18/2022] Open
Abstract
RNA modifications are involved in a broad spectrum of biological and physiological processes. To reveal the functions of RNA modifications, it is important to accurately predict their positions. Although high-throughput experimental techniques have been proposed, they are cost-ineffective. As good complements of experiments, many computational methods have been proposed to predict RNA modification sites in recent years. In this review, we will summarize the existing computational approaches directed at predicting RNA modification sites. We will also discuss the challenges and future perspectives in developing reliable methods for predicting RNA modification sites.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China.
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
|