1. Avila Santos AP, de Almeida BLS, Bonidia RP, Stadler PF, Stefanic P, Mandic-Mulec I, Rocha U, Sanches DS, de Carvalho ACPLF. BioDeepfuse: a hybrid deep learning approach with integrated feature extraction techniques for enhanced non-coding RNA classification. RNA Biol 2024; 21:1-12. [PMID: 38528797 DOI: 10.1080/15476286.2024.2329451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/23/2024] [Indexed: 03/27/2024] Open
Abstract
The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework integrating convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. This framework employs a combination of k-mer one-hot, k-mer dictionary, and feature extraction techniques for input representation. Extracted features, when embedded into the deep network, enable optimal utilization of spatial and sequential nuances of ncRNA sequences. Using benchmark datasets and real-world RNA samples from bacterial organisms, we evaluated the performance of BioDeepFuse. Results exhibited high accuracy in ncRNA classification, underscoring the robustness of our tool in addressing complex ncRNA sequence data challenges. The effective melding of CNN or BiLSTM with external features heralds promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into ncRNAs in cellular processes and disease manifestations. In addition to its original application in the context of bacterial organisms, the methodologies and techniques integrated into our framework can potentially render BioDeepFuse effective in various and broader domains.
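The two k-mer input representations named in this abstract can be illustrated with a minimal sketch. It is not the BioDeepFuse implementation: the four-letter alphabet, the stride of 1, the index assignment, and all function names below are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; ambiguous bases are not handled


def kmers(seq, k):
    """Return the overlapping k-mers of seq (stride 1, an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vocabulary(k):
    """Map every possible k-mer to an integer index; 0 is left free for padding."""
    return {"".join(p): i
            for i, p in enumerate(product(ALPHABET, repeat=k), start=1)}


def kmer_dictionary_encode(seq, k):
    """Encode seq as a list of integer k-mer indices (dictionary encoding)."""
    vocab = kmer_vocabulary(k)
    return [vocab[km] for km in kmers(seq, k)]


def kmer_one_hot_encode(seq, k):
    """Encode seq as one row per k-mer, with a single 1 in a vector of length 4**k."""
    vocab = kmer_vocabulary(k)
    rows = []
    for km in kmers(seq, k):
        row = [0] * len(vocab)
        row[vocab[km] - 1] = 1
        rows.append(row)
    return rows
```

The dictionary form yields a compact integer sequence suited to an embedding layer, while the one-hot form yields a sparse matrix that a convolutional layer can consume directly; which of the two a given model uses is a design choice outside this sketch.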
Affiliation(s)
- Anderson P Avila Santos
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
  - Department of Applied Microbial Ecology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony, Germany
- Breno L S de Almeida
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Robson P Bonidia
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
  - Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Brazil
- Peter F Stadler
  - Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony, Germany
- Polonca Stefanic
  - Department of Food Science and Technology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Ines Mandic-Mulec
  - Department of Food Science and Technology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Ulisses Rocha
  - Department of Applied Microbial Ecology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony, Germany
- Danilo S Sanches
  - Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Brazil

2. Huang G, Huang X, Luo W. 6mA-StackingCV: an improved stacking ensemble model for predicting DNA N6-methyladenine site. BioData Min 2023; 16:34. [PMID: 38012796 PMCID: PMC10680251 DOI: 10.1186/s13040-023-00348-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 11/04/2023] [Indexed: 11/29/2023] Open
Abstract
DNA N6-adenine methylation (N6-methyladenine, 6mA) plays a key regulatory role in cellular processes. Precisely recognizing 6mA sites is important for further exploring its biological functions. Although many computational methods for 6mA site prediction have been developed over the past decades, there is still considerable room for improvement. We present a cross-validation-based stacking ensemble model for 6mA site prediction, called 6mA-StackingCV. 6mA-StackingCV is a type of meta-learning algorithm that uses the output of cross-validation as input to the final classifier. 6mA-StackingCV achieved state-of-the-art performance in the Rosaceae independent test. Extensive tests demonstrated the stability and flexibility of 6mA-StackingCV. We implemented 6mA-StackingCV as a user-friendly web application, which allows one to restrictively choose representations or learning algorithms. This application is freely available at http://www.biolscience.cn/6mA-stackingCV/ . The source code and experimental data are available at https://github.com/Xiaohong-source/6mA-stackingCV .
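The idea of feeding cross-validation output to a final classifier can be sketched as follows. This is a generic illustration of out-of-fold stacking, not the 6mA-StackingCV code: the contiguous fold split, the toy single-feature base learner, and every function name are assumptions.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(X, y, fit, k=5):
    """Predict every sample with a model that did not see it during training."""
    oof = [None] * len(X)
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = model(X[i])
    return oof


def fit_centroid_scorer(feature):
    """Toy base learner (hypothetical): label by the nearer class mean of one feature.

    Assumes every training split contains both classes.
    """
    def fit(X, y):
        pos = [x[feature] for x, t in zip(X, y) if t == 1]
        neg = [x[feature] for x, t in zip(X, y) if t == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        return lambda x: 1.0 if abs(x[feature] - mean_pos) < abs(x[feature] - mean_neg) else 0.0
    return fit


def stack_features(X, y, base_fits, k=5):
    """Build the meta-feature matrix: one out-of-fold column per base learner."""
    columns = [out_of_fold_predictions(X, y, fit, k) for fit in base_fits]
    return [list(row) for row in zip(*columns)]
```

A final classifier would then be trained on the rows returned by `stack_features` together with the original labels. Because each meta-feature comes from a model that never saw that sample, the final classifier is not fitted on leaked training predictions.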
Affiliation(s)
- Guohua Huang
  - School of Information Technology and Administration, Hunan University of Finance and Economics, Changsha, China
  - College of Information Science and Engineering, Shaoyang University, Shaoyang, Hunan, 422000, China
- Xiaohong Huang
  - College of Information Science and Engineering, Shaoyang University, Shaoyang, Hunan, 422000, China
- Wei Luo
  - College of Information Science and Engineering, Shaoyang University, Shaoyang, Hunan, 422000, China

3. Bai J, Yang H, Wu C. MLACNN: an attention mechanism-based CNN architecture for predicting genome-wide DNA methylation. Theory Biosci 2023; 142:359-370. [PMID: 37648910 PMCID: PMC10564812 DOI: 10.1007/s12064-023-00402-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 07/31/2023] [Indexed: 09/01/2023]
Abstract
Methylation is an important epigenetic modification of genes that plays a crucial role in regulating biological processes. While traditional methods for detecting methylation in biological experiments are constantly improving, the development of artificial intelligence has led to the emergence of deep learning and machine learning methods as a new trend. However, traditional machine learning-based methods rely heavily on manual feature extraction, and most deep learning methods for studying methylation extract fewer features due to their simple network structures. To address this, we propose a bottleneck network based on an attention mechanism and use new methods to ensure that the deep network can learn more effective features while minimizing overfitting. This approach enables the model to learn more features from nucleotide sequences and make better predictions of methylation. The model uses three coding methods to encode the original DNA sequence and then applies feature fusion based on attention mechanisms to obtain the best fusion method. Our results demonstrate that MLACNN outperforms previous methods and achieves more satisfactory performance.
Affiliation(s)
- JianGuo Bai
  - Shandong Jiaotong University, Jinan City, Shandong Province, China
- Hai Yang
  - Shandong Jiaotong University, Jinan City, Shandong Province, China
- ChangDe Wu
  - Shandong Jiaotong University, Jinan City, Shandong Province, China

4. Liang S, Zhao Y, Jin J, Qiao J, Wang D, Wang Y, Wei L. Rm-LR: A long-range-based deep learning model for predicting multiple types of RNA modifications. Comput Biol Med 2023; 164:107238. [PMID: 37515874 DOI: 10.1016/j.compbiomed.2023.107238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent research has highlighted the pivotal role of RNA post-transcriptional modifications in the regulation of RNA expression and function. Accurate identification of RNA modification sites is important for understanding RNA function. In this study, we propose a novel RNA modification prediction method, namely Rm-LR, which leverages a long-range-based deep learning approach to accurately predict multiple types of RNA modifications using RNA sequences only. Rm-LR incorporates two large-scale RNA language pre-trained models to capture discriminative sequential information and learn local important features, which are subsequently integrated through a bilinear attention network. Rm-LR supports a total of ten RNA modification types (m6A, m1A, m5C, m5U, m6Am, Ψ, Am, Cm, Gm, and Um) and significantly outperforms the state-of-the-art methods in terms of predictive capability on benchmark datasets. Experimental results show the effectiveness and superiority of Rm-LR in prediction of various RNA modifications, demonstrating the strong adaptability and robustness of our proposed model. We demonstrate that RNA language pretrained models can learn dense biological sequential representations from a large-scale long-range RNA corpus, while also enhancing the interpretability of the models. This work contributes to the development of accurate and reliable computational models for RNA modification prediction, providing insights into the complex landscape of RNA modifications.
Affiliation(s)
- Sirui Liang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yanxi Zhao
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Junru Jin
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Ding Wang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yu Wang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Leyi Wei
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China

5. Acera Mateos P, Zhou Y, Zarnack K, Eyras E. Concepts and methods for transcriptome-wide prediction of chemical messenger RNA modifications with machine learning. Brief Bioinform 2023; 24:7150742. [PMID: 37139545 DOI: 10.1093/bib/bbad163] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/06/2023] [Revised: 03/03/2023] [Indexed: 05/05/2023] Open
Abstract
The expanding field of epitranscriptomics might rival the epigenome in the diversity of biological processes impacted. In recent years, the development of new high-throughput experimental and computational techniques has been a key driving force in discovering the properties of RNA modifications. Machine learning applications, such as for classification, clustering or de novo identification, have been critical in these advances. Nonetheless, various challenges remain before the full potential of machine learning for epitranscriptomics can be leveraged. In this review, we provide a comprehensive survey of machine learning methods to detect RNA modifications using diverse input data sources. We describe strategies to train and test machine learning methods and to encode and interpret features that are relevant for epitranscriptomics. Finally, we identify some of the current challenges and open questions about RNA modification analysis, including the ambiguity in predicting RNA modifications in transcript isoforms or in single nucleotides, or the lack of complete ground truth sets to test RNA modifications. We believe this review will inspire and benefit the rapidly developing field of epitranscriptomics in addressing the current limitations through the effective use of machine learning.
Affiliation(s)
- Pablo Acera Mateos
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- You Zhou
  - Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
  - Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Kathi Zarnack
  - Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
  - Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Eduardo Eyras
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia

6. Wang R, Chung CR, Huang HD, Lee TY. Identification of species-specific RNA N6-methyladinosine modification sites from RNA sequences. Brief Bioinform 2023; 24:7008797. [PMID: 36715277 DOI: 10.1093/bib/bbac573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 11/11/2022] [Accepted: 11/24/2022] [Indexed: 01/31/2023] Open
Abstract
N6-methyladenosine (m6A) modification is the most abundant co-transcriptional modification in eukaryotic RNA and plays important roles in cellular regulation. Traditional high-throughput sequencing experiments used to explore functional mechanisms are time-consuming and labor-intensive, and most of the proposed methods focused on limited species types. To further understand the relevant biological mechanisms among different species with the same RNA modification, it is necessary to develop a computational scheme that can be applied to different species. To achieve this, we proposed an attention-based deep learning method, adaptive-m6A, which consists of a convolutional neural network, a bi-directional long short-term memory network and an attention mechanism, to identify m6A sites in multiple species. In addition, three conventional machine learning (ML) methods, including support vector machine, random forest and logistic regression classifiers, were considered in this work. In addition to the performance of ML methods for multi-species prediction, the optimal performance of adaptive-m6A yielded an accuracy of 0.9832 and the area under the receiver operating characteristic curve of 0.98. Moreover, the motif analysis and cross-validation among different species were conducted to test the robustness of one model towards multiple species, which helped improve our understanding about the sequence characteristics and biological functions of RNA modifications in different species.
Affiliation(s)
- Rulan Wang
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- Chia-Ru Chung
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
  - School of Life Sciences, University of Science and Technology of China, 230026, Hefei, Anhui, P.R. China
- Hsien-Da Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China

7.
Abstract
The epitranscriptome, defined as RNA modifications that do not involve alterations in the nucleotide sequence, is a popular topic in the genomic sciences. Because we need massive computational techniques to identify epitranscriptomes within individual transcripts, many tools have been developed to infer epitranscriptomic sites as well as to process datasets using high-throughput sequencing. In this review, we summarize recent developments in epitranscriptome spatial detection and data analysis and discuss their progression.
Affiliation(s)
- Y-H Taguchi
  - Department of Physics, Chuo University, Tokyo, Japan

8. Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most popularly used techniques applied for profiling RNA epitranscriptome, to provide new ideas for growth and development.
Affiliation(s)
- Jia Zou
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Hui Liu
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Wei Tan
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Yi-qi Chen
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Jing Dong
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Shu-yuan Bai
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Zhao-xia Wu
  - Community Health Service Center, Wuchang Hospital, Wuhan, China
- Yan Zeng (corresponding author)
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
  - School of Public Health, Wuhan University of Science and Technology, Wuhan, China

9. Luo Z, Lou L, Qiu W, Xu Z, Xiao X. Predicting N6-Methyladenosine Sites in Multiple Tissues of Mammals through Ensemble Deep Learning. Int J Mol Sci 2022; 23:ijms232415490. [PMID: 36555143 PMCID: PMC9778682 DOI: 10.3390/ijms232415490] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/03/2022] [Revised: 12/03/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022] Open
Abstract
N6-methyladenosine (m6A) is the most abundant modification in eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, the existing computational models are usually only for m6A site prediction in a single species, without considering the tissue level of species, while most of them are constructed based on low-confidence level data generated by an m6A antibody immunoprecipitation (IP)-based sequencing method, thereby restricting reliability and generalizability of proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred using ensemble deep learning to accurately identify m6A sites based on high-confidence level data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms and three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation test to achieve an effective stacked model. Our model im6APred can produce the area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that our model has the ability to learn general methylation rules on RNA bases and generalize to m6A transcriptome-wide identification.
Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.
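For readers unfamiliar with the AUROC figures quoted in these abstracts, the metric can be computed as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise definition; it is a generic illustration with assumed function and variable names, not code from im6APred.

```python
def auroc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, so a rank-based formulation would be preferable for large benchmark sets.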

10. Wang H, Zhao S, Cheng Y, Bi S, Zhu X. MTDeepM6A-2S: A two-stage multi-task deep learning method for predicting RNA N6-methyladenosine sites of Saccharomyces cerevisiae. Front Microbiol 2022; 13:999506. [PMID: 36274691 PMCID: PMC9579691 DOI: 10.3389/fmicb.2022.999506] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 11/13/2022] Open
Abstract
N6-methyladenosine (m6A) is one of the most important RNA modifications and is involved in many biological activities. Computational methods have been developed to detect m6A sites due to their high efficiency and low costs. Saccharomyces cerevisiae is one of the most widely used model organisms, and many methods have been developed for predicting its m6A sites. However, the generalization of these methods was hampered by the limited size of the benchmark datasets. On the other hand, over 60,000 low resolution m6A sites and more than 10,000 base resolution m6A sites of Saccharomyces cerevisiae are recorded in RMBase and m6A-Atlas, respectively. The base resolution m6A sites are often obtained from low resolution results by post calibration. In view of these, we proposed a two-stage deep learning method, named MTDeepM6A-2S, to predict RNA m6A sites of Saccharomyces cerevisiae based on RNA sequence information. In the first stage, a multi-task model with convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) deep framework was built to not only detect the low resolution m6A sites but also assign a reasonable probability for the predicted site. In the second stage, a transfer-learning strategy was used to build the model to predict the base resolution m6A sites from those low resolution m6A sites. The effectiveness of our model was validated on both training and independent test sets. The results show that our model outperforms other state-of-the-art models on the independent test set, which indicates that our model holds high potential to become a useful tool for epitranscriptomics analysis.

11. N(6)-methyladenosine modification: A vital role of programmed cell death in myocardial ischemia/reperfusion injury. Int J Cardiol 2022; 367:11-19. [PMID: 36002042 DOI: 10.1016/j.ijcard.2022.08.042] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/15/2022] [Revised: 07/08/2022] [Accepted: 08/19/2022] [Indexed: 11/20/2022]
Abstract
N(6)-methyladenosine (m6A) modification is closely associated with myocardial ischemia/reperfusion injury (MIRI). As the most common modification among RNA modifications, the reversible m6A modification is processed by methylase ("writers") and demethylase ("erasers"). The biological effects of RNA modified by m6A are regulated under the corresponding RNA binding proteins (RBPs) ("readers"). m6A modification regulates the whole process of RNA, including transcription, processing, splicing, nuclear export, stability, degradation, and translation. Programmed cell death (PCD) is a regulated mechanism that maintains the internal environment's stability. PCD plays an essential role in MIRI, including apoptosis, autophagy, pyroptosis, ferroptosis, and necroptosis. However, the relationship between PCD modified with m6A and MIRI is still not clear. This review summarizes the regulators of m6A modification and their bioeffects on PCD in MIRI.

12. Ma L, He LN, Kang S, Gu B, Gao S, Zuo Z. Advances in detecting N6-methyladenosine modification in circRNAs. Methods 2022; 205:234-246. [PMID: 35878749 DOI: 10.1016/j.ymeth.2022.07.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2022] [Revised: 07/15/2022] [Accepted: 07/18/2022] [Indexed: 12/14/2022] Open
Abstract
Circular RNAs (circRNAs) are a class of noncoding RNAs with covalently closed, single-stranded loop structures derived from back-splicing events of linear precursor mRNAs (pre-mRNAs). N6-methyladenosine (m6A), the most abundant epigenetic modification in eukaryotic RNAs, has been shown to play a crucial role in regulating the fate and biological function of circRNAs, thereby affecting various physiological and pathological processes. Accurate identification of m6A modification in circRNAs is an essential step to fully elucidate the crosstalk between m6A and circRNAs. In recent years, the rapid development of high-throughput sequencing technology and bioinformatic methodology has propelled the establishment of a multitude of approaches to detect circRNAs and m6A modification, including in vitro-based and in silico methods. Based on this, the research community has started to develop methods for identification of m6A modification in circRNAs. In this review, we provide a comprehensive evaluation of the existing methods for detecting circRNAs, m6A modification, and especially m6A modification in circRNAs, focusing mainly on those based on high-throughput technologies and bioinformatic methodology. This handy reference can help researchers work out the direction in which this field is heading.
Affiliation(s)
- Lixia Ma
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Li-Na He
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Shiyang Kang
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Bianli Gu
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Shegan Gao
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Zhixiang Zuo
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China

13. Chen S, Zhang L, Lu L, Meng J, Liu H. FBCwPlaid: A Functional Biclustering Analysis of Epi-Transcriptome Profiling Data Via a Weighted Plaid Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1640-1650. [PMID: 33400655 DOI: 10.1109/tcbb.2021.3049366] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Recent studies have shown that in-depth studies on epi-transcriptomic patterns of N6-methyladenosine (m6A) may help understand its complex functions and co-regulatory mechanisms. Since most biclustering algorithms are developed in scenarios of gene expression analysis, which does not share the same characteristics with m6A methylation profile, we propose a weighted Plaid biclustering model (FBCwPlaid) based on the Lagrange multiplier method to discover the potential functional patterns. Each pattern is achieved by minimizing approximation error between FBCwPlaid predicted value and real data. To address the issue that site expression level determines methylation level confidence, it uses RNA expression levels of each site as weights to make lower expressed sites less confident. FBCwPlaid also allows overlapping biclusters, indicating some sites may participate in multiple biological functions. FBCwPlaid was then applied on MeRIP-Seq data of 69,446 methylation sites under 32 experimental conditions, each of which represented a stimulus to a particular cell line or environment. Finally, three patterns were discovered, and further pathway analysis and enzyme specificity test showed that sites involved in each pattern are highly relevant to m6A methyltransferases. Further detailed analyses showed that some patterns are condition-specific, indicating that some specific sites' methylation profiles may occur in specific cell lines or conditions.
14
Wang H, Wang S, Zhang Y, Bi S, Zhu X. A brief review of machine learning methods for RNA methylation sites prediction. Methods 2022; 203:399-421. [DOI: 10.1016/j.ymeth.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2021] [Revised: 02/15/2022] [Accepted: 03/01/2022] [Indexed: 02/07/2023]
15
Zhou Y, Yang J, Tian Z, Zeng J, Shen W. Research progress concerning m6A methylation and cancer. Oncol Lett 2021; 22:775. [PMID: 34589154 PMCID: PMC8442141 DOI: 10.3892/ol.2021.13036] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/28/2021] [Accepted: 08/20/2021] [Indexed: 12/12/2022]
Abstract
N6-methyladenosine (m6A) methylation is a type of methylation modification on RNA molecules, which was first discovered in 1974, and has become a hot topic in life science in recent years. m6A modification is an epigenetic regulation similar to DNA and histone modification and is dynamically reversible in mammalian cells. This chemical marker of RNA is produced by m6A 'writers' (methylase) and can be degraded by m6A 'erasers' (demethylase). Methylated reading protein is the 'reader', that can recognize the mRNA containing m6A and regulate the expression of downstream genes accordingly. m6A methylation is involved in all stages of the RNA life cycle, including RNA processing, nuclear export, translation and regulation of RNA degradation, indicating that m6A plays a crucial role in RNA metabolism. Recent studies have shown that m6A modification is a complicated regulatory network in different cell lines, tissues and spatio-temporal models, and m6A methylation is associated with the occurrence and development of tumors. The present review describes the regulatory mechanism and physiological functions of m6A methylation, and its research progress in several types of human tumor, to provide novel approaches for early diagnosis and targeted treatment of cancer.
Affiliation(s)
- Yang Zhou
  - Department of Cell Biology, School of Medicine of Yangzhou University, Yangzhou, Jiangsu 225000, P.R. China
- Jie Yang
  - Department of Cell Biology, School of Medicine of Yangzhou University, Yangzhou, Jiangsu 225000, P.R. China
- Zheng Tian
  - Department of Cell Biology, School of Medicine of Yangzhou University, Yangzhou, Jiangsu 225000, P.R. China
- Jing Zeng
  - Department of Cell Biology, School of Medicine of Yangzhou University, Yangzhou, Jiangsu 225000, P.R. China
- Weigan Shen
  - Department of Cell Biology, School of Medicine of Yangzhou University, Yangzhou, Jiangsu 225000, P.R. China
16
Islam N, Park J. bCNN-Methylpred: Feature-Based Prediction of RNA Sequence Modification Using Branch Convolutional Neural Network. Genes (Basel) 2021; 12:1155. [PMID: 34440330 PMCID: PMC8392086 DOI: 10.3390/genes12081155] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/01/2021] [Revised: 07/24/2021] [Accepted: 07/26/2021] [Indexed: 11/16/2022]
Abstract
RNA modification is vital to various cellular and biological processes. Among the existing RNA modifications, N6-methyladenosine (m6A) is considered the most important modification owing to its involvement in many biological processes. The prediction of m6A sites is crucial because it can provide a better understanding of their functional mechanisms. In this regard, although experimental methods are useful, they are time consuming. Previously, researchers have attempted to predict m6A sites using computational methods to overcome the limitations of experimental methods. Some of these approaches are based on classical machine-learning techniques that rely on handcrafted features and require domain knowledge, whereas other methods are based on deep learning. However, both methods lack robustness and yield low accuracy. Hence, we develop a branch-based convolutional neural network and a novel RNA sequence representation. The proposed network automatically extracts features from each branch of the designated inputs. Subsequently, these features are concatenated in the feature space to predict the m6A sites. Finally, we conduct experiments using four different species. The proposed approach outperforms existing state-of-the-art methods, achieving accuracies of 94.91%, 94.28%, 88.46%, and 94.8% for the H. sapiens, M. musculus, S. cerevisiae, and A. thaliana datasets, respectively.
Affiliation(s)
- Naeem Islam
  - Core Research Institute of Intelligent Robots, Jeonbuk National University, Jeonju 54896, Korea
  - College of Electrical & Mechanical Engineering, NUST, Islamabad 44000, Pakistan
- Jaebyung Park
  - Core Research Institute of Intelligent Robots, Jeonbuk National University, Jeonju 54896, Korea
  - Division of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
  - Correspondence: Tel.: +82-63-270-4283
17
Feng P, Feng L, Tang C. Comparison and Analysis of Computational Methods for Identifying N6-Methyladenosine Sites in Saccharomyces cerevisiae. Curr Pharm Des 2021; 27:1219-1229. [PMID: 33167827 DOI: 10.2174/1381612826666201109110703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2020] [Accepted: 07/20/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND N6-methyladenosine (m6A) plays critical roles in a broad range of biological processes. Knowledge about the precise location of m6A sites in the transcriptome is vital for deciphering its biological functions. Although experimental techniques have made substantial contributions to identifying m6A, they are still labor intensive and time consuming. As a complement to experimental methods, a series of computational approaches have been proposed in the past few years to identify m6A sites. METHODS To help researchers select appropriate methods for identifying m6A sites, it is necessary to conduct a comprehensive review and comparison of existing methods. RESULTS Since research works on m6A in Saccharomyces cerevisiae are relatively clear, in this review, we summarized recent progress in the computational prediction of m6A sites in S. cerevisiae and assessed the performance of existing computational methods. Finally, future directions for computationally identifying m6A sites are presented. CONCLUSION Taken together, we anticipate that this review will serve as an important guide for computational analysis of m6A modifications.
Affiliation(s)
- Pengmian Feng
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
- Lijing Feng
  - School of Sciences, North China University of Science and Technology, Tangshan 063000, China
- Chaohui Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
18
Ao C, Zou Q, Yu L. RFhy-m2G: Identification of RNA N2-methylguanosine modification sites based on random forest and hybrid features. Methods 2021; 203:32-39. [PMID: 34033879 DOI: 10.1016/j.ymeth.2021.05.016] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 04/18/2021] [Revised: 05/04/2021] [Accepted: 05/20/2021] [Indexed: 12/31/2022]
Abstract
N2-methylguanosine is a post-transcriptional modification of RNA that is found in eukaryotes and archaea. The biological function of m2G modification discovered so far is to control and stabilize the three-dimensional structure of tRNA and the dynamic barrier of reverse transcription. To discover additional biological functions of m2G, it is necessary to develop time-saving and labor-saving calculation tools to identify m2G. In this paper, based on hybrid features and a random forest, a novel predictor, RFhy-m2G, was developed to identify the m2G modification sites for three species. The hybrid feature used by the predictor is used to fuse the three features of ENAC, PseDNC, and NPPS. These three features include primary sequence derivation properties, physicochemical properties, and position-specific properties. Since there are redundant features in hybrid features, MRMD2.0 is used for optimal feature selection. Through feature analysis, it is found that the optimal hybrid features obtained still contain three kinds of properties, and the hybrid features can more accurately identify m2G modification sites and improve prediction performance. Based on five-fold cross-validation and independent testing to evaluate the prediction model, the accuracies obtained were 0.9982 and 0.9417, respectively. The robustness of the predictor is demonstrated by comparisons with other predictors.
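Of the three feature families named in this abstract, ENAC (enhanced nucleic acid composition) is the simplest to illustrate. The sketch below follows the common definition of ENAC, in which nucleotide frequencies are computed inside a sliding window. The function name `enac`, the default window of 5, and the A/C/G/U ordering are assumptions made for illustration and are not taken from the RFhy-m2G code.

```python
from collections import Counter

# Fixed nucleotide order for the output vector (an assumption for this sketch).
NUCLEOTIDES = "ACGU"


def enac(sequence, window=5):
    """Return sliding-window nucleotide frequencies for an RNA sequence.

    For each window position, four values are emitted: the fraction of
    A, C, G and U inside that window. DNA input is accepted by mapping T to U.
    """
    seq = sequence.upper().replace("T", "U")
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")

    features = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        # Counter returns 0 for nucleotides absent from the window.
        features.extend(counts[n] / window for n in NUCLEOTIDES)
    return features


if __name__ == "__main__":
    # One window of length 5 over "ACGUA": A appears twice, C, G, U once each.
    print(enac("ACGUA"))
```

A hybrid feature vector in the sense of the abstract would concatenate such a vector with the other encodings (PseDNC, NPPS) before feature selection; those encodings are not sketched here.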
Affiliation(s)
- Chunyan Ao
  - School of Computer Science and Technology, Xidian University, Xi'an, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an, China
19
Liu K, Chen W. iMRM: a platform for simultaneously identifying multiple kinds of RNA modifications. Bioinformatics 2020; 36:3336-3342. [PMID: 32134472 DOI: 10.1093/bioinformatics/btaa155] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 01/08/2020] [Revised: 02/26/2020] [Accepted: 02/28/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION RNA modifications play critical roles in a series of cellular and developmental processes. Knowledge about the distributions of RNA modifications in the transcriptomes will provide clues to revealing their functions. Since experimental methods for detecting RNA modifications are time consuming and laborious, computational methods have been proposed for this aim in the past five years. However, both experimental and computational methods have drawbacks in simultaneously identifying modifications occurring on different nucleotides. RESULTS To address this challenge, in this article, we developed a new predictor called iMRM, which is able to simultaneously identify m6A, m5C, m1A, ψ and A-to-I modifications in Homo sapiens, Mus musculus and Saccharomyces cerevisiae. In iMRM, the feature selection technique was used to pick out the optimal features. The results from both 10-fold cross-validation and jackknife test demonstrated that the performance of iMRM is superior to existing methods for identifying RNA modifications. AVAILABILITY AND IMPLEMENTATION A user-friendly web server for iMRM was established at http://www.bioml.cn/XG_iRNA/home. The off-line command-line version is available at https://github.com/liukeweiaway/iMRM. CONTACT greatchen@ncst.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kewei Liu
  - School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China
- Wei Chen
  - School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China; Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
20
Karthiya R, Khandelia P. m6A RNA Methylation: Ramifications for Gene Expression and Human Health. Mol Biotechnol 2020; 62:467-484. [PMID: 32840728 DOI: 10.1007/s12033-020-00269-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Accepted: 08/14/2020] [Indexed: 12/12/2022]
Abstract
Cellular transcriptomes are frequently adorned by a variety of chemical modification marks, which in turn have a profound influence on their functioning. Of these modifications, the one that has attracted a lot of attention in recent years is m6A RNA methylation, leading to the development of RNA epigenetics or epitranscriptomics as a frontier research area. m6A RNA methylation is one of the most abundant reversible internal modifications seen in cellular RNAs. Studies in the last few years have shed light not only on the molecular machinery involved in m6A RNA methylation but also on the impact of this modification in regulating gene expression and hence biological processes. In this review, we will emphasize the biological impact of this modification in normal organismal development and diseases.
Affiliation(s)
- R Karthiya
  - Department of Biological Sciences, Birla Institute of Technology and Science, Pilani - Hyderabad Campus, Jawahar Nagar, Kapra Mandal, Medchal District, Hyderabad, Telangana, 500078, India
- Piyush Khandelia
  - Department of Biological Sciences, Birla Institute of Technology and Science, Pilani - Hyderabad Campus, Jawahar Nagar, Kapra Mandal, Medchal District, Hyderabad, Telangana, 500078, India
21
Wu S, Zhang S, Wu X, Zhou X. m6A RNA Methylation in Cardiovascular Diseases. Mol Ther 2020; 28:2111-2119. [PMID: 32910911 DOI: 10.1016/j.ymthe.2020.08.010] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 04/13/2020] [Revised: 08/01/2020] [Accepted: 08/13/2020] [Indexed: 01/01/2023]
Abstract
Cardiovascular diseases (CVDs) remain the leading cause of death and disability worldwide, despite marked improvements in prevention, diagnosis, and early intervention. There is an urgent need to discover more effective therapeutic strategies, which would be facilitated by a more in-depth understanding of CVDs and their underlying molecular mechanisms. Recent advances in knowledge about epigenetic mechanisms, especially RNA methylation, have revealed a close relationship between epigenetic modifications and CVDs and have brought to light potential novel targets for diagnosis and treatment. Here, we provide a review of recent studies exploring RNA N6-methyladenosine (m6A) modification, with particular emphasis on its role in CVDs, such as coronary heart disease, hypertension, cardiac hypertrophy, and heart failure. We also introduce the "life cycle" of m6A and its dominant function in several biological processes. Finally, we highlight the prospects of treatment based on interfering with m6A, which could have a transformative effect on clinical medicine.
Affiliation(s)
- Siyi Wu
  - Department of Cardiology, The Second Affiliated Hospital of Soochow University, Suzhou 215004, P.R. China
- Shuchen Zhang
  - Department of Cardiology, The Second Affiliated Hospital of Soochow University, Suzhou 215004, P.R. China
- Xiaoguang Wu
  - Department of Cardiology, The Second Affiliated Hospital of Soochow University, Suzhou 215004, P.R. China
- Xiang Zhou
  - Department of Cardiology, The Second Affiliated Hospital of Soochow University, Suzhou 215004, P.R. China
22
Liu L, Song B, Ma J, Song Y, Zhang SY, Tang Y, Wu X, Wei Z, Chen K, Su J, Rong R, Lu Z, de Magalhães JP, Rigden DJ, Zhang L, Zhang SW, Huang Y, Lei X, Liu H, Meng J. Bioinformatics approaches for deciphering the epitranscriptome: Recent progress and emerging topics. Comput Struct Biotechnol J 2020; 18:1587-1604. [PMID: 32670500 PMCID: PMC7334300 DOI: 10.1016/j.csbj.2020.06.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/08/2020] [Revised: 06/02/2020] [Accepted: 06/07/2020] [Indexed: 12/13/2022]
Abstract
Post-transcriptional RNA modification occurs on all types of RNA and plays a vital role in regulating every aspect of RNA function. Thanks to the development of high-throughput sequencing technologies, transcriptome-wide profiling of RNA modifications has been made possible. With the accumulation of a large number of high-throughput datasets, bioinformatics approaches have become increasingly critical for unraveling the epitranscriptome. We review here the recent progress in bioinformatics approaches for deciphering the epitranscriptome, including epitranscriptome data analysis techniques, RNA modification databases, disease-association inference, general functional annotation, and studies on RNA modification site prediction. We also discuss the limitations of existing approaches and offer some future perspectives.
Affiliation(s)
- Lian Liu
  - School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi 710119, China
- Bowen Song
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jiani Ma
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Yi Song
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Song-Yao Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China
- Yujiao Tang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Xiangyu Wu
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
- Kunqi Chen
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
- Jionglong Su
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Rong Rong
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Zhiliang Lu
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- João Pedro de Magalhães
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
- Daniel J. Rigden
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Shao-Wu Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Yufei Huang
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
  - Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Xiujuan Lei
  - School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi 710119, China
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
23
Liu L, Lei X, Fang Z, Tang Y, Meng J, Wei Z. LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor. Front Genet 2020; 11:545. [PMID: 32582286 PMCID: PMC7297269 DOI: 10.3389/fgene.2020.00545] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/19/2019] [Accepted: 05/06/2020] [Indexed: 12/31/2022]
Abstract
N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications, which plays an important role in many biological processes, such as splicing, RNA localization, and degradation. Studies have shown that m6A on lncRNA has important functions, including regulating the expression and functions of lncRNA, regulating the synthesis of pre-mRNA, promoting the proliferation of cancer cells, and affecting cell differentiation and many others. Although a number of methods have been proposed to predict m6A RNA methylation sites, most of these methods aimed at general m6A site prediction without noticing the uniqueness of the lncRNA methylation prediction problem. Since many lncRNAs do not have a polyA tail and cannot be captured in the polyA selection step of the most widely adopted RNA-seq library preparation protocol, lncRNA methylation sites cannot be effectively captured and are thus likely to be significantly underrepresented in existing experimental data, affecting the accuracy of existing predictors. In this paper, we propose a new computational framework, LITHOPHONE, which stands for long noncoding RNA methylation sites prediction from sequence characteristics and genomic information with an ensemble predictor. We show that the methylation sites of lncRNA and mRNA have different patterns exhibited in the extracted features and should be differently handled when making predictions. Owing to the experimental protocols used, the number of known lncRNA m6A sites is limited, and insufficient to train a reliable predictor; thus, the performance can be improved by combining both lncRNA and mRNA data using an ensemble predictor. We show that the newly developed LITHOPHONE approach achieved a reasonably good performance when tested on independent datasets (AUC: 0.966 and 0.835 under full transcript and mature mRNA modes, respectively), marking a substantial improvement compared with existing methods.
Additionally, LITHOPHONE was applied to scan the entire human lncRNAome for all possible lncRNA m6A sites, and the results are freely accessible at: http://180.208.58.19/lith/.
Affiliation(s)
- Lian Liu
  - School of Computer Sciences, Shaanxi Normal University, Xi'an, China
- Xiujuan Lei
  - School of Computer Sciences, Shaanxi Normal University, Xi'an, China
- Zengqiang Fang
  - School of Computer Sciences, Shaanxi Normal University, Xi'an, China
- Yujiao Tang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
24
Zhu X, He J, Zhao S, Tao W, Xiong Y, Bi S. A comprehensive comparison and analysis of computational predictors for RNA N6-methyladenosine sites of Saccharomyces cerevisiae. Brief Funct Genomics 2020; 18:367-376. [PMID: 31609411 DOI: 10.1093/bfgp/elz018] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/20/2019] [Revised: 07/07/2019] [Accepted: 07/15/2019] [Indexed: 12/16/2022]
Abstract
N6-methyladenosine (m6A) modification, as one of the commonest post-transcription modifications in RNAs, has been reported to be highly related to many biological processes. Over the past decade, several tools for m6A site prediction in Saccharomyces cerevisiae have been developed and are freely available online. However, the quality of predictions by these tools is difficult to quantify and compare. In this study, an independent dataset M6Atest6540 was compiled to systematically evaluate nine publicly available m6A prediction tools for S. cerevisiae. The experimental results indicate that RAM-ESVM achieved the best performance on M6Atest6540; however, most models performed substantially worse than their performances reported in the original papers. The benchmark dataset Met2614, which was used as the training dataset for the nine methods, was further analyzed by using a position bias index. The results demonstrated the significantly different bias of dataset Met2614 compared with the RNA segments around m6A sites recorded in RMBase. Moreover, newMet2614 was collected by randomly selecting RNA segments from non-redundant data recorded in RMBase, and three different kinds of features were extracted. The performances of the models built on Met2614 and newMet2614 with the features were compared, which shows the better generalization of models built on newMet2614. Our results also indicate that the position-specific propensity-based features outperform other features, although they are also easily over-fitted on a biased dataset.
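Comparing several predictors on one independent test set, as this abstract describes, usually comes down to computing the same confusion-matrix metrics for each tool. The abstract does not list the metrics used, so the sketch below assumes the four that are conventional in site-prediction benchmarks: accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). The function name `evaluate` and the 0/1 label encoding are illustrative choices.

```python
import math


def evaluate(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and MCC for binary labels.

    Labels are assumed to be 1 (modified site) or 0 (non-site).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard each ratio against an empty denominator.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy labels only; these are not results from any published tool.
    print(evaluate([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

Running the same function over each tool's predictions on a shared test set gives directly comparable numbers, which is the point of compiling an independent dataset.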
Affiliation(s)
- Xiaolei Zhu
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China; School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Jingjing He
  - School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Shihao Zhao
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Wei Tao
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Yi Xiong
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shoudong Bi
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
25
Zhu ZM, Huo FC, Pei DS. Function and evolution of RNA N6-methyladenosine modification. Int J Biol Sci 2020; 16:1929-1940. [PMID: 32398960 PMCID: PMC7211178 DOI: 10.7150/ijbs.45231] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 02/23/2020] [Accepted: 04/05/2020] [Indexed: 02/06/2023]
Abstract
N6-methyladenosine (m6A) is identified as the most prevalent and abundant internal RNA modification, especially within eukaryotic mRNAs, and has attracted much attention in recent years because of its importance in regulating gene expression and deciding cell fate. m6A modification is installed by RNA methyltransferases METTL3, METTL14 and WTAP (Writers), removed by the demethylases FTO and ALKBH5 (Erasers) and recognized by m6A binding proteins, such as YT521-B homology (YTH) domain-containing proteins (Readers). Accumulating evidence shows that m6A RNA methylation participates in almost all aspects of RNA processing, implying an association with important bioprocesses. In this review, we mainly summarize and discuss the functional relevance and importance of m6A modification in cellular processes.
Affiliation(s)
- Zhi-Man Zhu
  - Department of Pathology, Xuzhou Medical University, Xuzhou 221004, China
- Fu-Chun Huo
  - Department of Pathology, Xuzhou Medical University, Xuzhou 221004, China
- Dong-Sheng Pei
  - Department of Pathology, Xuzhou Medical University, Xuzhou 221004, China
26
Govindaraj RG, Subramaniyam S, Manavalan B. Extremely-randomized-tree-based Prediction of N6-Methyladenosine Sites in Saccharomyces cerevisiae. Curr Genomics 2020; 21:26-33. [PMID: 32655295 PMCID: PMC7324895 DOI: 10.2174/1389202921666200219125625] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/01/2019] [Revised: 12/28/2019] [Accepted: 01/24/2020] [Indexed: 02/07/2023]
Abstract
Introduction N6-methyladenosine (m6A) is one of the most common post-transcriptional modifications in RNA, which has been related to several biological processes. The accurate prediction of m6A sites from RNA sequences is one of the challenging tasks in computational biology. Several computational methods utilizing machine-learning algorithms have been proposed that accelerate in silico screening of m6A sites, thereby drastically reducing the experimental time and labor costs involved. Methodology In this study, we proposed a novel computational predictor termed ERT-m6Apred, for the accurate prediction of m6A sites. To identify the feature encodings with more discriminative capability, we applied a two-step feature selection technique on seven different feature encodings and identified the corresponding optimal feature set. Results Subsequently, performance comparison of the corresponding optimal feature set-based extremely randomized tree model revealed that Pseudo k-tuple composition encoding, which includes 14 physicochemical properties significantly outperformed other encodings. Moreover, ERT-m6Apred achieved an accuracy of 78.84% during cross-validation analysis, which is comparatively better than recently reported predictors. Conclusion In summary, ERT-m6Apred predicts Saccharomyces cerevisiae m6A sites with higher accuracy, thus facilitating biological hypothesis generation and experimental validations.
Affiliation(s)
- Rajiv G Govindaraj
  - 1 HotSpot Therapeutics, 50 Milk Street, 16 Floor, Boston, MA 02109, USA; 2 Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; 3 Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; 4 Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Sathiyamoorthy Subramaniyam
  - 1 HotSpot Therapeutics, 50 Milk Street, 16 Floor, Boston, MA 02109, USA; 2 Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; 3 Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; 4 Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Balachandran Manavalan
  - 1 HotSpot Therapeutics, 50 Milk Street, 16 Floor, Boston, MA 02109, USA; 2 Research and Development Center, In-silicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea; 3 Department of Biotechnology, Dr. N.G.P. Arts and Science College, Coimbatore, Tamil Nadu 641048, India; 4 Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
27
Li Y, Wang J, Huang C, Shen M, Zhan H, Xu K. RNA N6-methyladenosine: a promising molecular target in metabolic diseases. Cell Biosci 2020; 10:19. [PMID: 32110378 PMCID: PMC7035649 DOI: 10.1186/s13578-020-00385-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/28/2019] [Accepted: 02/11/2020] [Indexed: 12/12/2022]
Abstract
N6-methyladenosine is a prevalent and abundant transcriptome modification that regulates various aspects of RNA biology, including transcription, translation, processing and metabolism. N6-methyladenosine methylation is closely associated with numerous cellular processes and plays important roles in physiological processes and in the development of diseases. The high prevalence of metabolic diseases poses a serious threat to human health, but their pathological mechanisms remain poorly understood. Recent studies have reported that the progression of metabolic diseases is closely related to the level of RNA N6-methyladenosine modification. In this review, we summarize the biological and clinical significance of RNA N6-methyladenosine modification in metabolic diseases, including obesity, type 2 diabetes, non-alcoholic fatty liver disease, hypertension, cardiovascular diseases, osteoporosis and immune-related metabolic diseases.
Affiliation(s)
- Yan Li
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072 Sichuan China
- Jiawen Wang
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072 Sichuan China
- Chunyan Huang
- Houjie Hospital of Dongguan, Dongguan, 523945 Guangdong China
- Meng Shen
- Chengdu Tumor Hospital, Chengdu, 610041 Sichuan China
- Huakui Zhan
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072 Sichuan China
- Keyang Xu
- Hangzhou Xixi Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310023 Zhejiang China
28
Liu L, Lei X, Meng J, Wei Z. WITMSG: Large-scale Prediction of Human Intronic m6A RNA Methylation Sites from Sequence and Genomic Features. Curr Genomics 2020; 21:67-76. [PMID: 32655300 PMCID: PMC7324894 DOI: 10.2174/1389202921666200211104140] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/24/2019] [Revised: 01/14/2020] [Accepted: 01/27/2020] [Indexed: 02/07/2023]
Abstract
INTRODUCTION N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications. It plays important roles in various biological processes, such as splicing, RNA localization and degradation, many of which are related to the functions of introns. Although a number of computational approaches have been proposed to predict m6A sites in different species, none of them was optimized for intronic m6A sites. Because existing experimental data overwhelmingly relied on polyA selection in sample preparation, and intronic RNAs are usually underrepresented in the captured RNA library, the accuracy of general m6A site prediction approaches is limited for the intronic m6A site prediction task. METHODOLOGY A computational framework, WITMSG, dedicated to the large-scale prediction of intronic m6A RNA methylation sites in humans is proposed here for the first time. Based on the random forest algorithm and using only known intronic m6A sites as the training data, WITMSG takes advantage of both conventional sequence features and a variety of genomic characteristics to improve the prediction of intron-specific m6A sites. RESULTS AND CONCLUSION WITMSG outperformed competing approaches (trained with all the m6A sites or with intronic m6A sites only) in 10-fold cross-validation (AUC: 0.940) and when tested on independent datasets (AUC: 0.946). WITMSG was also applied intronome-wide in humans to predict all possible intronic m6A sites, and the prediction results are freely accessible at http://rnamd.com/intron/.
Affiliation(s)
- Xiujuan Lei
- Address correspondence to these authors at the School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi, 710119, China; and Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Zhen Wei
- Address correspondence to these authors at the School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi, 710119, China; and Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
29
Liu T, Li C, Jin L, Li C, Wang L. The Prognostic Value of m6A RNA Methylation Regulators in Colon Adenocarcinoma. Med Sci Monit 2019; 25:9435-9445. [PMID: 31823961 PMCID: PMC6926093 DOI: 10.12659/msm.920381] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Indexed: 01/22/2023]
Abstract
Background The RNA-seq FPKM data of 331 colorectal adenocarcinoma samples in The Cancer Genome Atlas database with matching clinical data were analyzed in order to reveal the prognostic value of m6A RNA methylation regulators in colon adenocarcinoma. Material/Methods The expression of 13 m6A RNA methylation regulators in the samples was analyzed. The samples were classified into Cluster I and II by consistent clustering. The gene distribution was analyzed by principal component analysis. Further functional analysis of selected m6A RNA genes was performed, and potential risk characteristics were developed using the Lasso Cox regression algorithm. Using minimum criteria, the risk coefficients of YTHDF1 and HNRNPC were detected for Cluster II. Patients were divided into high-risk and low-risk subgroups based on the risk characteristics. The clinical data were analyzed by univariate and multivariate Cox regression analysis. Results Expression of the detected m6A RNA methylation regulators, except YTHDC2, in tumors was significantly different from that in adjacent mucosa. Among them, only ALKBH5 and METTL4 were downregulated in tumors. The gene distribution differed between the 2 subgroups. The expression of m6A RNA methylation regulators including YTHDF1, HNRNPC, YTHDC2, YTHDC1, ZC3H13, and RBM15 differed between the 2 groups (P<0.05). The prognostic characteristics of the high-risk and low-risk groups were significantly different (P<0.05), with good predictive significance for prognosis (area under the curve [AUC]=0.62). The P values for the risk score were less than 0.05, suggesting that the risk score was an independent prognostic factor for colon adenocarcinoma. Conclusions m6A RNA methylation regulators YTHDF1 and HNRNPC can be used as prognostic factors of colon cancer, which has potential value for colon cancer treatment.
Affiliation(s)
- Tao Liu
- Department of Colorectal and Anal Surgery, The First Hospital of Jilin University, Changchun, Jilin, China (mainland)
- Chenyao Li
- Department of Colorectal and Anal Surgery, The First Hospital of Jilin University, Changchun, Jilin, China (mainland)
- Lipeng Jin
- Department of Colorectal and Anal Surgery, The First Hospital of Jilin University, Changchun, Jilin, China (mainland)
- Chao Li
- Department of Colorectal and Anal Surgery, The First Hospital of Jilin University, Changchun, Jilin, China (mainland)
- Lei Wang
- Department of Colorectal and Anal Surgery, The First Hospital of Jilin University, Changchun, Jilin, China (mainland)
30
Zhao W, Qi X, Liu L, Ma S, Liu J, Wu J. Epigenetic Regulation of m6A Modifications in Human Cancer. MOLECULAR THERAPY-NUCLEIC ACIDS 2019; 19:405-412. [PMID: 31887551 PMCID: PMC6938965 DOI: 10.1016/j.omtn.2019.11.022] [Citation(s) in RCA: 157] [Impact Index Per Article: 31.4] [Received: 09/09/2019] [Revised: 11/03/2019] [Accepted: 11/22/2019] [Indexed: 01/22/2023]
Abstract
N6-methyladenosine (m6A) is the most prevalent internal RNA modification, especially within eukaryotic messenger RNAs (mRNAs). m6A modifications of RNA regulate splicing, translocation, stability, and translation into proteins. m6A modifications are catalyzed by RNA methyltransferases, such as METTL3, METTL14, and WTAP (writers); the modifications are removed by the demethylases fat mass and obesity-associated protein (FTO) and ALKBH5 (ALKB homolog 5) (erasers); and the modifications are recognized by m6A-binding proteins, such as YTHDF domain-containing proteins and IGF2BPs (readers). Abnormal changes in the m6A levels of these genes are closely related to tumor occurrence and development. In this paper, we review the role of m6A in human cancer and summarize its prospective applications in cancer.
Affiliation(s)
- Wei Zhao
- The School and Hospital of Stomatology, Tianjin Medical University, Tianjin 300070, P.R. China
- Xiaoqian Qi
- The School and Hospital of Stomatology, Tianjin Medical University, Tianjin 300070, P.R. China
- Lina Liu
- Department of Prosthodontics, Tianjin Stomatological Hospital, Hospital of Stomatology, NanKai University, Tianjin 300041, P.R. China
- Shiqing Ma
- The School and Hospital of Stomatology, Tianjin Medical University, Tianjin 300070, P.R. China
- Jingwen Liu
- The School and Hospital of Stomatology, Tianjin Medical University, Tianjin 300070, P.R. China
- Jie Wu
- The School and Hospital of Stomatology, Tianjin Medical University, Tianjin 300070, P.R. China
31
Chen Z, Zhao P, Li F, Wang Y, Smith AI, Webb GI, Akutsu T, Baggag A, Bensmail H, Song J. Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences. Brief Bioinform 2019; 21:1676-1696. [DOI: 10.1093/bib/bbz112] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 07/04/2019] [Revised: 07/31/2019] [Accepted: 08/07/2019] [Indexed: 12/14/2022]
Abstract
RNA post-transcriptional modifications play a crucial role in a myriad of biological processes and cellular functions. To date, more than 160 RNA modifications have been discovered; therefore, accurate identification of RNA-modification sites is fundamental for a better understanding of RNA-mediated biological functions and mechanisms. However, due to limitations in experimental methods, systematic identification of different types of RNA-modification sites remains a major challenge. Recently, more than 20 computational methods have been developed to identify RNA-modification sites in tandem with high-throughput experimental methods, with most of these capable of predicting only single types of RNA-modification sites. These methods show high diversity in their dataset size, data quality, core algorithms, features extracted and feature selection techniques and evaluation strategies. Therefore, there is an urgent need to revisit these methods and summarize their methodologies, in order to improve and further develop computational techniques to identify and characterize RNA-modification sites from the large amounts of sequence data. With this goal in mind, first, we provide a comprehensive survey on a large collection of 27 state-of-the-art approaches for predicting N1-methyladenosine and N6-methyladenosine sites. We cover a variety of important aspects that are crucial for the development of successful predictors, including the dataset quality, operating algorithms, sequence and genomic features, feature selection, model performance evaluation and software utility. In addition, we also provide our thoughts on potential strategies to improve the model performance. Second, we propose a computational approach called DeepPromise based on deep learning techniques for simultaneous prediction of N1-methyladenosine and N6-methyladenosine. 
To extract the sequence context surrounding the modification sites, three feature encodings, including enhanced nucleic acid composition, one-hot encoding, and RNA embedding, were used as the input to seven consecutive layers of convolutional neural networks (CNNs), respectively. Moreover, DeepPromise further combined the prediction score of the CNN-based models and achieved around 43% higher area under receiver-operating curve (AUROC) for m1A site prediction and 2–6% higher AUROC for m6A site prediction, respectively, when compared with several existing state-of-the-art approaches on the independent test. In-depth analyses of characteristic sequence motifs identified from the convolution-layer filters indicated that nucleotide presentation at proximal positions surrounding the modification sites contributed most to the classification, whereas those at distal positions also affected classification but to different extents. To maximize user convenience, a web server was developed as an implementation of DeepPromise and made publicly available at http://DeepPromise.erc.monash.edu/, with the server accepting both RNA sequences and genomic sequences to allow prediction of two types of putative RNA-modification sites.
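As an illustration of the one-hot encoding named in this abstract, the following minimal Python sketch turns an RNA sequence window into a matrix with one row per nucleotide. It is not taken from DeepPromise: the function name `one_hot_encode`, the alphabet order `ACGU`, and the all-zero row for unknown bases are assumptions made for the example.

```python
# Minimal sketch of one-hot encoding for an RNA sequence window.
# ALPHABET order and the handling of unknown bases are assumptions,
# not details given in the abstract.
ALPHABET = "ACGU"


def one_hot_encode(seq):
    """Return one 4-element row per nucleotide; unknown bases map to all zeros."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    # Treat DNA-style input as RNA by mapping T to U.
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("GGACU", one_hot_encode("GGACU")):
        print(base, row)
```

A matrix of this shape (window length by 4) is the kind of input a convolutional layer can scan along the sequence axis.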
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, China
- Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Fuyi Li
- Northwest A&F University, China
- A Ian Smith
- Prince Henrys Institute Melbourne and Monash University, Australia
- Abdelkader Baggag
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
- Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Victoria 3800, Australia
32
Zhang M, Li F, Marquez-Lago TT, Leier A, Fan C, Kwoh CK, Chou KC, Song J, Jia C. MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics 2019; 35:2957-2965. [PMID: 30649179 PMCID: PMC6736106 DOI: 10.1093/bioinformatics/btz016] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 10/30/2018] [Revised: 12/09/2018] [Accepted: 01/05/2019] [Indexed: 12/22/2022]
Abstract
MOTIVATION Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
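The k-tuple nucleotide composition mentioned in this abstract can be sketched as a normalized count of overlapping k-mers. The Python below is a generic illustration, not the MULTiPly implementation; the function name `k_tuple_composition`, the `ACGT` alphabet order, and the choice to skip k-mers containing ambiguous bases are assumptions.

```python
from itertools import product


def k_tuple_composition(seq, k=2, alphabet="ACGT"):
    """Return the frequency of every possible k-mer, in lexicographic order.

    Overlapping windows are counted; windows containing characters outside
    the alphabet are skipped (an assumption made for this sketch).
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    # Normalize so the vector sums to 1 when at least one k-mer was counted.
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    vector = k_tuple_composition("TATAAT", k=2)
    print(len(vector), sum(vector))
```

For k = 2 this yields a 16-dimensional vector per sequence, regardless of sequence length, which is what makes the encoding convenient as classifier input.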
Affiliation(s)
- Meng Zhang
- School of Science, Dalian Maritime University, Dalian, China
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Tatiana T Marquez-Lago
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- André Leier
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Cunshuo Fan
- College of Information Engineering, Northwest A&F University, Yangling, China
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian, China
- College of Information Engineering, Northwest A&F University, Yangling, China
33
Dang W, Xie Y, Cao P, Xin S, Wang J, Li S, Li Y, Lu J. N6-Methyladenosine and Viral Infection. Front Microbiol 2019; 10:417. [PMID: 30891023 PMCID: PMC6413633 DOI: 10.3389/fmicb.2019.00417] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 11/30/2018] [Accepted: 02/18/2019] [Indexed: 12/12/2022]
Abstract
N6-methyladenosine (m6A), a dynamic posttranscriptional RNA modification, recently gave rise to the field of viral epitranscriptomics. The interaction between virus and host is affected by m6A. Multiple m6A-modified viral RNAs have been observed. The m6A epitranscriptome of host cells is altered after viral infection. The expression of viral genes, the replication of the virus and the generation of progeny virions are influenced by m6A modifications in viral RNAs during infection. Meanwhile, m6A modifications in host mRNAs can either make viral infection more likely or enhance the resistance of the host to infection. However, the mechanism of m6A regulation in viral infection and the host immune response has not been thoroughly elucidated to date. With the development of sequencing-based biotechnologies, transcriptome-wide mapping of m6A in viruses has been achieved, laying the foundation for exploring its functions and the corresponding mechanisms. In this report, we summarize the positive and negative effects of m6A in distinct viral infections. Given the increasingly important roles of m6A in diverse viruses, m6A represents a novel potential target for antiviral therapy.
Affiliation(s)
- Wei Dang
- Department of Hematology, Xiangya Hospital, Central South University, Changsha, China; Department of Microbiology, Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, China
- Yan Xie
- Department of Hematology, Xiangya Hospital, Central South University, Changsha, China; Department of Microbiology, Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, China
- Pengfei Cao
- Department of Hematology, Xiangya Hospital, Central South University, Changsha, China
- Shuyu Xin
- Department of Microbiology, Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, China
- Jia Wang
- Department of Hematology, Xiangya Hospital, Central South University, Changsha, China; Department of Microbiology, Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, China
- Shen Li
- Department of Microbiology, Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, China
- Yanling Li
- Department of Microbiology, Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, China
- Jianhong Lu
- Department of Hematology, Xiangya Hospital, Central South University, Changsha, China; Department of Microbiology, Cancer Research Institute, School of Basic Medical Science, Central South University, Changsha, China
34
Wei L, Su R, Wang B, Li X, Zou Q, Gao X. Integration of deep feature representations and handcrafted features to improve the prediction of N6-methyladenosine sites. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.04.082] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Indexed: 12/01/2022]
35
Zhang L, He Y, Wang H, Liu H, Huang Y, Wang X, Meng J. Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model. Curr Bioinform 2018. [DOI: 10.2174/1574893613666180601080008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Background: RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect the epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low reads coverage, which calls for novel clustering approaches.
Objective: Besides the low reads coverage, it is also necessary to keep the integer property to approach clustering analysis of count-based RNA methylation sequencing data.
Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. Besides, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters so as to avoid the common model selection problem in clustering analysis.
Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. It also revealed on real dataset two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, which are two known regulatory components of the RNA m6A methyltransferase complex.
Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low reads coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
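To make the beta-binomial component of this abstract concrete, the sketch below evaluates the log-probability of observing `k` methylated reads out of `n` total reads at one site under a beta-binomial distribution with parameters `alpha` and `beta`. It is a generic Python illustration and not code from the DPBBM R package; the function names and the reading of `k` and `n` as read counts are assumptions, and the Dirichlet process and Gibbs sampler described in the abstract are omitted.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(k, n, alpha, beta):
    """log P(k | n, alpha, beta) for a beta-binomial distribution.

    k and n are integer read counts (hypothetical naming), so no continuous
    methylation level has to be estimated first.
    """
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


if __name__ == "__main__":
    # Probability of each possible count at a low-coverage site with 5 reads.
    for k in range(6):
        print(k, exp(beta_binomial_logpmf(k, 5, alpha=2.0, beta=5.0)))
```

In a mixture model, each cluster would carry its own `alpha` and `beta`, and a site would be assigned to clusters according to these per-cluster likelihoods.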
Affiliation(s)
- Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Yanling He
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Huaizhi Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Yufei Huang
- Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio TX 78229, United States
- Xuesong Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
36
Support Vector Machine Classifier for Accurate Identification of piRNA. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8112204] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
Piwi-interacting RNA (piRNA) is a newly identified class of small non-coding RNAs. It can combine with PIWI proteins to regulate the transcriptional gene silencing process and heterochromatin modifications, and to maintain germline and stem cell function in animals. To better understand the function of piRNA, it is imperative to improve the accuracy of identifying piRNAs. In this study, the feature vector was constructed from sequence information comprising the single nucleotide composition, the 16 dinucleotide compositions, six physicochemical properties of RNA, the position specificities of nucleotides at both the N-terminal and C-terminal, and the proportions of similar peptide sequences at both the N-terminal and C-terminal in positive and negative samples. Then, the F-score was applied to choose an optimal single type of feature. By combining these selected features, we achieved the best results on the jackknife test and on 5-fold cross-validation repeated 10 times, based on the support vector machine algorithm. Moreover, we further evaluated the stability and robustness of our new method.
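The abstract does not define the F-score it uses. The Python sketch below assumes the definition commonly used for feature ranking in binary classification: the squared deviations of the two class means from the overall mean, divided by the sum of the within-class sample variances. The function name `f_score` and the toy feature values are hypothetical.

```python
def f_score(pos, neg):
    """F-score of one feature from its values in positive and negative samples.

    Assumed definition:
    ((mean_pos - mean_all)^2 + (mean_neg - mean_all)^2) / (var_pos + var_neg),
    with sample variances (divisor n - 1). Each class needs at least 2 values.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    def sample_var(xs, m):
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    mean_all = mean(pos + neg)
    mean_pos, mean_neg = mean(pos), mean(neg)
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    denominator = sample_var(pos, mean_pos) + sample_var(neg, mean_neg)
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Toy values: feature "a" separates the classes, feature "b" does not.
    features = {
        "a": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
        "b": ([1.0, 5.0, 3.0], [2.0, 4.0, 3.0]),
    }
    ranked = sorted(features, key=lambda name: f_score(*features[name]), reverse=True)
    print(ranked)
```

A larger score indicates a feature whose class means are far apart relative to the spread within each class, so features are usually ranked in descending order of this value.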
37
Qiang X, Chen H, Ye X, Su R, Wei L. M6AMRFS: Robust Prediction of N6-Methyladenosine Sites With Sequence-Based Features in Multiple Species. Front Genet 2018; 9:495. [PMID: 30410501 PMCID: PMC6209681 DOI: 10.3389/fgene.2018.00495] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 07/18/2018] [Accepted: 10/04/2018] [Indexed: 12/23/2022]
Abstract
As one of the well-studied RNA methylation modifications, N6-methyladenosine (m6A) plays important roles in various biological processes, such as RNA splicing and degradation. Identification of m6A sites is fundamentally important for a better understanding of their functional mechanisms. Recently, machine learning based prediction methods have emerged as an effective approach for fast and accurate identification of m6A sites. In this paper, we propose "M6AMRFS", a new machine learning based predictor for the identification of m6A sites. In this predictor, we exploit a new feature representation algorithm to encode RNA sequences with two feature descriptors (dinucleotide binary encoding and local position-specific dinucleotide frequency), and use the F-score algorithm combined with SFS (Sequential Forward Search) to enhance the feature representation ability. To predict m6A sites, we employ the eXtreme Gradient Boosting (XGBoost) algorithm to build a predictive model. Benchmarking results showed that the proposed predictor is competitive with the state-of-the-art predictors. Importantly, robust predictions for multiple species demonstrate that our predictive models have strong generalization ability. To the best of our knowledge, M6AMRFS is the first tool that can be used for the identification of m6A sites in multiple species. To facilitate the use of our predictor, we have established a user-friendly webserver implementing M6AMRFS, which is currently available at http://server.malab.cn/M6AMRFS/. We anticipate that it will be a useful tool for research on m6A sites.
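A dinucleotide binary encoding of the kind named in this abstract can be sketched as follows. The example assumes a 2-bit code per nucleotide (A = 00, C = 01, G = 10, U = 11), so each overlapping dinucleotide becomes 4 bits; this mapping, and the function name `dinucleotide_binary_encoding`, are assumptions for illustration and may differ from the scheme used by M6AMRFS.

```python
# Assumed 2-bit code per nucleotide; the abstract does not give the mapping.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "U": (1, 1)}


def dinucleotide_binary_encoding(seq):
    """Encode each overlapping dinucleotide as 4 bits and concatenate them.

    A sequence of length L yields 4 * (L - 1) binary features. Characters
    outside A/C/G/U (after mapping T to U) raise KeyError in this sketch.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for first, second in zip(seq, seq[1:]):
        features.extend(BITS[first] + BITS[second])
    return features


if __name__ == "__main__":
    print(dinucleotide_binary_encoding("GGACU"))
```

Because the encoding is positional, every sequence in a dataset must have the same window length for the resulting vectors to be comparable.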
Affiliation(s)
- Xiaoli Qiang
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Huangrong Chen
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Ran Su
- School of Software, Tianjin University, Tianjin, China
- Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin, China
38
Wei L, Chen H, Su R. M6APred-EL: A Sequence-Based Predictor for Identifying N6-methyladenosine Sites Using Ensemble Learning. MOLECULAR THERAPY-NUCLEIC ACIDS 2018; 12:635-644. [PMID: 30081234 PMCID: PMC6082921 DOI: 10.1016/j.omtn.2018.07.004] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Received: 03/31/2018] [Revised: 07/03/2018] [Accepted: 07/03/2018] [Indexed: 12/28/2022]
Abstract
N6-methyladenosine (m6A) modification is the most abundant RNA methylation modification and is involved in various biological processes, such as RNA splicing and degradation. Recent studies have demonstrated the feasibility of identifying m6A peaks using high-throughput sequencing techniques. However, such techniques cannot accurately identify specific methylated sites, which is important for a better understanding of m6A functions. In this study, we develop a novel machine learning-based predictor called M6APred-EL for the identification of m6A sites. To predict m6A sites accurately within genomic sequences, we trained an ensemble of three support vector machine classifiers that exploit position-specific and physical-chemical information from three feature types: position-specific k-mer nucleotide propensity, physical-chemical properties, and ring-function-hydrogen-chemical properties. We examined and compared the performance of our predictor with other state-of-the-art methods on benchmarking datasets. Comparative results showed that the proposed M6APred-EL performed more accurately for m6A site identification. Moreover, a user-friendly web server that implements M6APred-EL has been established and is currently available at http://server.malab.cn/M6APred-EL/. It is expected to be a practical and effective tool for the investigation of m6A functional mechanisms.
Affiliation(s)
- Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin, China; State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China
- Huangrong Chen
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Ran Su
- School of Computer Software, Tianjin University, Tianjin, China; State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China
39
Jia C, Yang Q, Zou Q. NucPosPred: Predicting species-specific genomic nucleosome positioning via four different modes of general PseKNC. J Theor Biol 2018; 450:15-21. [PMID: 29678692 DOI: 10.1016/j.jtbi.2018.04.025] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 01/26/2018] [Revised: 04/13/2018] [Accepted: 04/16/2018] [Indexed: 11/20/2022]
Abstract
The nucleosome is the basic structure of chromatin in eukaryotic cells, with essential roles in the regulation of many biological processes, such as DNA transcription, replication and repair, and RNA splicing. Because of the importance of nucleosomes, the factors that determine their positioning within genomes should be investigated. High-resolution nucleosome-positioning maps are now available for organisms including Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans, enabling the identification of nucleosome positioning by application of computational tools. Here, we describe a novel predictor called NucPosPred, which was specifically designed for large-scale identification of nucleosome positioning in C. elegans and D. melanogaster genomes. NucPosPred was separately optimized for each species for four types of DNA sequence feature extraction, with consideration of two classification algorithms (gradient-boosting decision tree and support vector machine). The overall accuracy obtained with NucPosPred was 92.29% for C. elegans and 88.26% for D. melanogaster, outperforming previous methods and demonstrating the potential for species-specific prediction of nucleosome positioning. For the convenience of most experimental scientists, a web-server for the predictor NucPosPred is available at http://121.42.167.206/NucPosPred/index.jsp.
Affiliation(s)
- Cangzhi Jia
  - Science of College, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.
- Qing Yang
  - Science of College, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China.
40
|
Chen X, Sun YZ, Liu H, Zhang L, Li JQ, Meng J. RNA methylation and diseases: experimental results, databases, Web servers and computational models. Brief Bioinform 2017; 20:896-917. [DOI: 10.1093/bib/bbx142] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 08/04/2017] [Revised: 09/12/2017] [Indexed: 12/15/2022] Open
Affiliation(s)
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Ya-Zhou Sun
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Jian-Qiang Li
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Jia Meng
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University
41
|
Wei W, Ji X, Guo X, Ji S. Regulatory Role of N6-methyladenosine (m6A) Methylation in RNA Processing and Human Diseases. J Cell Biochem 2017; 118:2534-2543. [PMID: 28256005 DOI: 10.1002/jcb.25967] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 02/27/2017] [Accepted: 02/28/2017] [Indexed: 12/21/2022]
Abstract
N6-methyladenosine (m6A) modification is an abundant and conserved RNA modification in bacterial and eukaryotic cells. m6A modification mainly occurs in the 3' untranslated regions (UTRs) and near the stop codons of mRNA. Diverse strategies have been developed for identifying m6A sites at single-nucleotide resolution. Dynamic regulation of m6A is found in metabolism, embryogenesis, and developmental processes, indicating a possible epigenetic regulatory role in RNA processing and in exerting biological functions. m6A editing is known to be involved in nuclear RNA export, mRNA degradation, protein translation, and RNA splicing. Deficiency of m6A modification leads to various diseases, such as obesity, cancer, type 2 diabetes mellitus (T2DM), infertility, and developmental arrest. Some specific inhibitors of methyltransferase and demethylase have been developed to selectively regulate m6A modification, which may be advantageous for the treatment of m6A-related diseases. J. Cell. Biochem. 118: 2534-2543, 2017. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Wenqiang Wei
  - Laboratory of Cell Signal Transduction, Basic Medical School of Henan University, Kaifeng, Henan, 475004, China; Department of Medical Microbiology, Basic Medical School of Henan University, Kaifeng, Henan, 475004, China
- Xinying Ji
  - Department of Medical Microbiology, Basic Medical School of Henan University, Kaifeng, Henan, 475004, China
- Xiangqian Guo
  - Laboratory of Cell Signal Transduction, Basic Medical School of Henan University, Kaifeng, Henan, 475004, China
- Shaoping Ji
  - Laboratory of Cell Signal Transduction, Basic Medical School of Henan University, Kaifeng, Henan, 475004, China; Department of Oncology, The First Affiliated Hospital of Henan University, Kaifeng, 475001, China
42
|
He W, Jia C. EnhancerPred2.0: predicting enhancers and their strength based on position-specific trinucleotide propensity and electron–ion interaction potential feature selection. Mol Biosyst 2017; 13:767-774. [DOI: 10.1039/c7mb00054e] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/16/2022]
Abstract
Enhancers are cis-acting elements that play major roles in upregulating eukaryotic gene expression by providing binding sites for transcription factors and their complexes.
Affiliation(s)
- Wenying He
  - Department of Mathematics, Dalian Maritime University, Dalian 116026, China
- Cangzhi Jia
  - Department of Mathematics, Dalian Maritime University, Dalian 116026, China
43
|
Jia C, He W. EnhancerPred: a predictor for discovering enhancers based on the combination and selection of multiple features. Sci Rep 2016; 6:38741. [PMID: 27941893 PMCID: PMC5150536 DOI: 10.1038/srep38741] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 09/30/2016] [Accepted: 11/11/2016] [Indexed: 12/31/2022] Open
Abstract
Enhancers are cis elements that play an important role in regulating gene expression by enhancing it. Recent studies of modifications revealed that enhancers are a large group of functional elements with many different subgroups, which have different biological activities and regulatory effects on target genes. As powerful auxiliary tools, several computational methods have been proposed to distinguish enhancers from other regulatory elements, but only one method has considered clustering them into subgroups. In this study, we developed a predictor (called EnhancerPred) to distinguish between enhancers and nonenhancers and to determine enhancers' strength. A two-step wrapper-based feature selection method was applied to the high-dimensional feature vector from bi-profile Bayes and pseudo-nucleotide composition. Finally, the combination of 104 features from bi-profile Bayes, 1 feature from nucleotide composition and 9 features from pseudo-nucleotide composition yielded the best performance for identifying enhancers and nonenhancers, with an overall Acc of 77.39%. The combination of 89 features from bi-profile Bayes and 10 features from pseudo-nucleotide composition yielded the best performance for identifying strong and weak enhancers, with an overall Acc of 68.19%. The process and steps of feature optimization illustrated that it is necessary to construct a particular model for identifying strong enhancers and weak enhancers.
Affiliation(s)
- Cangzhi Jia
  - Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
- Wenying He
  - Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China