1. Alromema N, Suleman MT, Malebary SJ, Ahmed A, Ali Mohammed Al-Rami Al-Ghamdi B, Khan YD. Identification of 6-methyladenosine sites using novel feature encoding methods and ensemble models. Sci Rep 2024; 14:8180. [PMID: 38589431 PMCID: PMC11001897 DOI: 10.1038/s41598-024-58353-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 03/28/2024] [Indexed: 04/10/2024] Open
Abstract
N6-methyladenosine (m6A) is the most common internal modification in eukaryotic mRNA. Mass spectrometry and site-directed mutagenesis, two of the most common conventional approaches, have been shown to be laborious and challenging. In recent years, there has been a rising interest in analyzing RNA sequences to systematically investigate mutated locations. Using novel methods for feature development, the current work aimed to identify m6A locations in RNA sequences. These novel features were then used to train an ensemble of models using methods such as stacking, boosting, and bagging. The trained ensemble models were assessed using an independent test set and k-fold cross-validation. Compared with baseline predictors, the proposed model performed better across the key accuracy measures.
Affiliation(s)
- Nashwan Alromema
  - Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
- Muhammad Taseer Suleman
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
  - Department of Criminology and Forensic Sciences, Lahore Garrison University, Lahore, Pakistan
- Sharaf J Malebary
  - Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, 21911, Rabigh, Saudi Arabia
- Amir Ahmed
  - Department of Information Systems and Security, College of Information Technology, United Arab Emirates University, Alain, United Arab Emirates
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
2. Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024; 300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024] Open
Abstract
RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequences into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies limit the performance of machine learning models in predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred, which uses a support vector machine classifier to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model over state-of-the-art models for RNA modification site identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can also be applied to feature extraction from DNA sequences to predict other biological modification sites.
Affiliation(s)
- Mingzhao Wang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Haider Ali
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yandi Xu
  - School of Computer Science, Shaanxi Normal University, Xi'an, China; College of Life Sciences, Shaanxi Normal University, Xi'an, China
- Juanying Xie
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Shengquan Xu
  - College of Life Sciences, Shaanxi Normal University, Xi'an, China
3. Meng Q, Schatten H, Zhou Q, Chen J. Crosstalk between m6A and coding/non-coding RNA in cancer and detection methods of m6A modification residues. Aging (Albany NY) 2023; 15:6577-6619. [PMID: 37437245 PMCID: PMC10373953 DOI: 10.18632/aging.204836] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/01/2023] [Accepted: 06/15/2023] [Indexed: 07/14/2023]
Abstract
N6-methyladenosine (m6A) is one of the most common and well-known internal RNA modifications that occur on mRNAs or ncRNAs. It affects various aspects of RNA metabolism, including splicing, stability, translocation, and translation. An abundance of evidence demonstrates that m6A plays a crucial role in various pathological and biological processes, especially in tumorigenesis and tumor progression. In this article, we introduce the potential functions of m6A regulators, including "writers" that install m6A marks, "erasers" that demethylate m6A, and "readers" that determine the fate of m6A-modified targets. We have conducted a review on the molecular functions of m6A, focusing on both coding and noncoding RNAs. Additionally, we have compiled an overview of the effects noncoding RNAs have on m6A regulators and explored the dual roles of m6A in the development and advancement of cancer. Our review also includes a detailed summary of the most advanced databases for m6A, state-of-the-art experimental and sequencing detection methods, and machine learning-based computational predictors for identifying m6A sites.
Affiliation(s)
- Qingren Meng
  - National Clinical Research Center for Infectious Diseases, Shenzhen Third People’s Hospital, The Second Hospital Affiliated with the Southern University of Science and Technology, Shenzhen, Guangdong Province, China
- Heide Schatten
  - Department of Veterinary Pathobiology, University of Missouri, Columbia, MO 65211, USA
- Qian Zhou
  - International Cancer Center, Shenzhen University Medical School, Shenzhen, Guangdong Province, China
- Jun Chen
  - National Clinical Research Center for Infectious Diseases, Shenzhen Third People’s Hospital, The Second Hospital Affiliated with the Southern University of Science and Technology, Shenzhen, Guangdong Province, China
4. Yang S, Yang Z, Yang J. 4mCBERT: A computing tool for the identification of DNA N4-methylcytosine sites by sequence- and chemical-derived information based on ensemble learning strategies. Int J Biol Macromol 2023; 231:123180. [PMID: 36646347 DOI: 10.1016/j.ijbiomac.2023.123180] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/19/2022] [Revised: 11/26/2022] [Accepted: 12/30/2022] [Indexed: 01/15/2023]
Abstract
N4-methylcytosine (4mC) is an important DNA chemical modification discovered in recent years that plays critical roles in gene expression regulation, defense against invading genetic elements, genomic imprinting, and so on. Identifying 4mC sites from DNA sequence segments contributes to discovering more novel modification patterns. In this paper, we present a model called 4mCBERT that encodes DNA sequence segments using sequence characteristics (one-hot, electron-ion interaction pseudopotential, nucleotide chemical property, and word2vec) and chemical information (physicochemical properties (PCP) and chemical bidirectional encoder representations from transformers (chemical BERT)), and employs an ensemble learning framework to develop a prediction model. PCP and chemical BERT features are constructed and applied to 4mC site prediction for the first time, and they contribute positively to identifying 4mC. In terms of the Matthews correlation coefficient, 4mCBERT significantly outperformed other state-of-the-art models on six independent benchmark datasets, A. thaliana, C. elegans, D. melanogaster, E. coli, G. pickeringii, and G. subterraneus, by 4.32% to 24.39%, 2.52% to 31.65%, 2% to 16.49%, 6.63% to 35.15%, 8.59% to 61.85%, and 8.45% to 34.45%, respectively. Moreover, 4mCBERT is designed to allow users to predict 4mC sites and retrain 4mC prediction models. In brief, 4mCBERT shows higher performance on six benchmark datasets by incorporating sequence- and chemical-driven information and is available at http://cczubio.top/4mCBERT and https://github.com/abcair/4mCBERT.
Affiliation(s)
- Sen Yang
  - School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou 213164, China; The Affiliated Changzhou No 2 People's Hospital of Nanjing Medical University, Changzhou 213164, China
- Zexi Yang
  - School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou 213164, China
- Jun Yang
  - School of Educational Sciences, Yili Normal University, Yining 835000, China
5
Abstract
The epitranscriptome, defined as RNA modifications that do not involve alterations in the nucleotide sequence, is a popular topic in the genomic sciences. Because we need massive computational techniques to identify epitranscriptomes within individual transcripts, many tools have been developed to infer epitranscriptomic sites as well as to process datasets using high-throughput sequencing. In this review, we summarize recent developments in epitranscriptome spatial detection and data analysis and discuss their progression.
Affiliation(s)
- Y-H Taguchi
  - Department of Physics, Chuo University, Tokyo, Japan
6. Zhang L, Ju T, Jin X, Ji J, Han J, Zhou X, Yuan Z. Network regression analysis for binary and ordinal categorical phenotypes in transcriptome-wide association studies. Genetics 2022; 222:iyac153. [PMID: 36227056 PMCID: PMC9713396 DOI: 10.1093/genetics/iyac153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 09/28/2022] [Indexed: 12/13/2022] Open
Abstract
Transcriptome-wide association studies aim to integrate genome-wide association studies and expression quantitative trait loci mapping studies to explore the gene regulatory mechanisms underlying diseases. Existing transcriptome-wide association study methods primarily focus on one gene at a time. However, complex diseases seldom result from the abnormality of a single gene, but rather from a biological network involving multiple genes. In addition, binary or ordinal categorical phenotypes are commonly encountered in biomedicine. We develop a Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study to detect the association between a network and a binary or ordinal categorical phenotype. The model relies on a 2-stage transcriptome-wide association study framework. It first adopts the distribution-robust nonparametric Dirichlet process regression model in the expression quantitative trait loci study to obtain the SNP effect estimate on each gene within the network. It then uses pointwise mutual information to represent the general relationship among the network nodes of predicted gene expression in the genome-wide association study, followed by association analysis with all nodes and edges included in the proportional odds logistic model. A key feature of the model is its ability to simultaneously identify disease-related network nodes or edges. With extensive realistic simulations, including those under various between-node correlation patterns, we show that the model provides calibrated type I error control and yields higher power than other existing methods. We finally apply the model to analyze bipolar and major depression status and blood pressure from UK Biobank to illustrate its benefits in real data analysis.
Affiliation(s)
- Liye Zhang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
  - Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
- Tao Ju
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
  - Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
- Xiuyuan Jin
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
  - Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
- Jiadong Ji
  - Institute for Financial Studies, Shandong University, Jinan, Shandong 250100, China
- Jiayi Han
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
  - Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
- Xiang Zhou
  - Department of Biostatistics, The University of Michigan, Ann Arbor, MI 48109, USA
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
  - Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
7. N(6)-methyladenosine modification: A vital role of programmed cell death in myocardial ischemia/reperfusion injury. Int J Cardiol 2022; 367:11-19. [PMID: 36002042 DOI: 10.1016/j.ijcard.2022.08.042] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/15/2022] [Revised: 07/08/2022] [Accepted: 08/19/2022] [Indexed: 11/20/2022]
Abstract
N(6)-methyladenosine (m6A) modification is closely associated with myocardial ischemia/reperfusion injury (MIRI). As the most common RNA modification, the reversible m6A modification is installed by methylases ("writers") and removed by demethylases ("erasers"). The biological effects of m6A-modified RNA are regulated by the corresponding RNA binding proteins (RBPs) ("readers"). m6A modification regulates the whole RNA life cycle, including transcription, processing, splicing, nuclear export, stability, degradation, and translation. Programmed cell death (PCD) is a regulated mechanism that maintains the stability of the internal environment. PCD, including apoptosis, autophagy, pyroptosis, ferroptosis, and necroptosis, plays an essential role in MIRI. However, the relationship between m6A-modified PCD and MIRI is still not clear. This review summarizes the regulators of m6A modification and their bioeffects on PCD in MIRI.
8. PSP-PJMI: An innovative feature representation algorithm for identifying DNA N4-methylcytosine sites. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.05.060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]