Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rehman MU, Tayara H, Chong KT. DL-m6A: Identification of N6-Methyladenosine Sites in Mammals Using Deep Learning Based on Different Encoding Schemes. IEEE/ACM Trans Comput Biol Bioinform 2023;20:904-911. [PMID: 35857733 DOI: 10.1109/tcbb.2022.3192572] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 05/04/2023]

Cited by Other Article(s)

Li M, Li R, Zhang Y, Peng S, Lv Z. Using statistical analysis to explore the influencing factors of data imbalance for machine learning identification methods of human transcriptome m6A modification sites. Comput Biol Chem 2025;115:108351. [PMID: 39837162 DOI: 10.1016/j.compbiolchem.2025.108351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2024] [Revised: 12/12/2024] [Accepted: 01/09/2025] [Indexed: 01/23/2025]

Abstract

RNA methylation, particularly through m6A modification, represents a crucial epigenetic mechanism that governs gene expression and influences a range of biological functions. Accurate identification of methylation sites is crucial for understanding their biological functions. Traditional experimental methods, however, are often costly and can be influenced by experimental conditions, making machine learning, especially deep learning techniques, a vital tool for m6A site identification. Despite their utility, current machine learning models struggle with unbalanced datasets, a common issue in bioinformatics. This study addresses the RNA methylation site data imbalance problem from three key perspectives: feature encoding representation, deep learning models, and data resampling strategies. Using the K-mer one-hot encoding strategy, we effectively extracted RNA sequence features and developed classification prediction models utilizing long short-term memory (LSTM) networks and their variant, multiplicative LSTM (mLSTM). We further enhanced model performance with ensemble and weighted strategy models. Additionally, we utilized the sequence generative adversarial network (SeqGAN) and the synthetic minority oversampling technique (SMOTE) to construct balanced datasets for RNA methylation sites. The prediction results were rigorously analyzed using the Wilcoxon test and multivariate linear regression to explore the effects of different K-mer values, model architectures, and sampling methods on classification outcomes. The analysis underscored the significant impact of feature selection, model architecture, and sampling techniques in addressing data imbalance. Notably, the optimal prediction performance was achieved with a K value of 5 using the mLSTM-ensemble model. These findings not only offer new insights and methodologies for RNA methylation site identification but also provide valuable guidance for addressing similar challenges in bioinformatics.
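The abstract names a K-mer one-hot encoding but does not spell out its details. The sketch below is one common reading, not the authors' implementation: it assumes overlapping K-mers taken with a stride of 1 over the alphabet A, C, G, U, each mapped to a one-hot vector of length 4^K. The function name `kmer_one_hot` and the toy sequence are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; not stated in the abstract


def kmer_one_hot(seq, k):
    """Return one one-hot vector (length 4**k) per overlapping k-mer in seq."""
    # Map every possible k-mer to a fixed position, in lexicographic order.
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vectors = []
    for start in range(len(seq) - k + 1):  # stride of 1 is an assumption
        vec = [0] * len(index)
        vec[index[seq[start:start + k]]] = 1
        vectors.append(vec)
    return vectors


# Toy sequence (hypothetical): four overlapping 2-mers GG, GA, AC, CU.
encoded = kmer_one_hot("GGACU", k=2)
```

With K = 5, the value the abstract reports as optimal, each vector would have 4^5 = 1024 entries. Characters outside the assumed alphabet (for example N) raise a KeyError in this sketch and would need separate handling.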

Li G, Zhao B, Su X, Yang Y, Zeng Z, Hu P, Hu L. Capturing short-range and long-range dependencies of nucleotides for identifying RNA N6-methyladenosine modification sites. Comput Biol Med 2025;186:109625. [PMID: 39756188 DOI: 10.1016/j.compbiomed.2024.109625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 11/17/2024] [Accepted: 12/23/2024] [Indexed: 01/07/2025]

Yu Y, Xiang S, Wu M. Injecting structure-aware insights for the learning of RNA sequence representations to identify m6A modification sites. PeerJ 2025;13:e18878. [PMID: 40017651 PMCID: PMC11867033 DOI: 10.7717/peerj.18878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Accepted: 12/28/2024] [Indexed: 03/01/2025] Open

Abstract

N6-methyladenosine (m6A) represents one of the most prevalent methylation modifications in eukaryotes and it is crucial to accurately identify its modification sites on RNA sequences. Traditional machine learning based approaches to m6A modification site identification primarily focus on RNA sequence data but often incorporate additional biological domain knowledge and rely on manually crafted features. These methods typically overlook the structural insights inherent in RNA sequences. To address this limitation, we propose M6A-SAI, an advanced predictor for RNA m6A modifications. M6A-SAI leverages a transformer-based deep learning framework to integrate structure-aware insights into sequence representation learning, thereby enhancing the precision of m6A modification site identification. The core innovation of M6A-SAI lies in its ability to incorporate structural information through a multi-step process: initially, the model utilizes a Transformer encoder to learn RNA sequence representations. It then constructs a similarity graph based on Manhattan distance to capture sequence correlations. To address the limitations of the smooth similarity graph, M6A-SAI integrates a structure-aware optimization block, which refines the graph by defining anchor sets and generating an awareness graph through PageRank. Following this, M6A-SAI employs a self-correlation fusion graph convolution framework to merge information from both the similarity and awareness graphs, thus producing enriched sequence representations. Finally, a support vector machine is utilized for classifying these representations. Experimental results validate that M6A-SAI substantially improves the recognition of m6A modification sites by incorporating structure-aware insights, demonstrating its efficacy as a robust method for identifying RNA m6A modification sites.
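The abstract says M6A-SAI builds a similarity graph from Manhattan distances between sequence representations, but not how edges are selected. As a rough illustration only, the sketch below assumes a k-nearest-neighbour rule; the helper names `manhattan` and `knn_graph` and the toy embeddings are hypothetical and are not taken from the paper.

```python
def manhattan(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_graph(embeddings, k):
    """Adjacency lists: each node links to its k nearest nodes by L1 distance."""
    graph = {}
    for i, a in enumerate(embeddings):
        # Sort all other nodes by distance to node i (ties broken by index).
        dists = sorted(
            (manhattan(a, b), j) for j, b in enumerate(embeddings) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Toy 2-D "sequence representations" (hypothetical values).
toy = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
graph = knn_graph(toy, k=1)
```

In this toy case the two nearby pairs link to each other. The anchor sets, PageRank-based awareness graph, and graph convolution described in the abstract are outside the scope of this sketch.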

Park S, To Chong K, Tayara H. CpGFuse: a holistic approach for accurate identification of methylation states of DNA CpG sites. Brief Bioinform 2024;26:bbaf063. [PMID: 39968737 PMCID: PMC11836533 DOI: 10.1093/bib/bbaf063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 12/27/2024] [Accepted: 02/07/2025] [Indexed: 02/20/2025] Open

Liu M, Sun ZL, Zeng Z, Lam KM. Multi-kernel feature extraction with dynamic fusion and downsampled residual feature embedding for predicting rice RNA N6-methyladenine sites. Brief Bioinform 2024;26:bbae647. [PMID: 39674264 DOI: 10.1093/bib/bbae647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 11/15/2024] [Accepted: 11/30/2024] [Indexed: 12/16/2024] Open

Abstract

RNA N$^{6}$-methyladenosine (m$^{6}$A) is a critical epigenetic modification closely related to rice growth, development, and stress response. Accurate m$^{6}$A identification, which bears directly on precision rice breeding and improvement, is fundamental to revealing phenotype regulation and molecular mechanisms. Because rice m$^{6}$A sequences vary in length, maximum-length padding and label encoding are usually adopted to produce max-length padded sequences as model input. Although this retains the complete sequence information, it yields sparse information and invalid padding, which reduces feature extraction accuracy. In addition, existing rice-specific m$^{6}$A prediction methods are still at an early stage. To address these issues, we develop a new end-to-end deep learning framework, MFDm$^{6}$ARice, for predicting rice m$^{6}$A sites. In particular, to alleviate sparseness, we construct a multi-kernel feature fusion module that mines essential information in max-length padded sequences with a multi-kernel feature extraction function and transfers information effectively through a global-local dynamic fusion function. Concurrently, considering the complexity and computational cost of the high-dimensional features caused by invalid padding, we design a downsampling residual feature embedding module to optimize feature space compression and achieve accurate feature expression and efficient computation. Experiments show that MFDm$^{6}$ARice outperforms comparison methods in cross-validation and on same- and cross-species independent test sets, demonstrating good robustness and generalization. Its application to maize m$^{6}$A indicates the scalability of MFDm$^{6}$ARice. Further investigations have shown that combining different kernel features, focusing on global channel-local spatial information, and employing reasonable downsampling and residual connections can improve feature representation and extraction, ensure effective information transfer, and significantly enhance model performance.

Abbas Z, Rehman MU, Tayara H, Lee SW, Chong KT. m5C-Seq: Machine learning-enhanced profiling of RNA 5-methylcytosine modifications. Comput Biol Med 2024;182:109087. [PMID: 39232403 DOI: 10.1016/j.compbiomed.2024.109087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 08/13/2024] [Accepted: 08/29/2024] [Indexed: 09/06/2024]

Mir BA, Tayara H, Chong KT. SB-Net: Synergizing CNN and LSTM networks for uncovering retrosynthetic pathways in organic synthesis. Comput Biol Chem 2024;112:108130. [PMID: 38954849 DOI: 10.1016/j.compbiolchem.2024.108130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/17/2024] [Accepted: 06/12/2024] [Indexed: 07/04/2024]

Zahid MU, Nisar MD, Fazil A, Ryu J, Shah MH. Composite Ensemble Learning Framework for Passive Drone Radio Frequency Fingerprinting in Sixth-Generation Networks. Sensors (Basel) 2024;24:5618. [PMID: 39275529 PMCID: PMC11397939 DOI: 10.3390/s24175618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 08/09/2024] [Accepted: 08/28/2024] [Indexed: 09/16/2024]

Hassan MT, Tayara H, Chong KT. NaII-Pred: An ensemble-learning framework for the identification and interpretation of sodium ion inhibitors by fusing multiple feature representation. Comput Biol Med 2024;178:108737. [PMID: 38879934 DOI: 10.1016/j.compbiomed.2024.108737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/21/2024] [Accepted: 06/08/2024] [Indexed: 06/18/2024]

Zhang T, Gao S, Zhang SW, Cui XD. m⁶Aexpress-enet: Predicting the regulatory expression m⁶A sites by an enet-regularization negative binomial regression model. Methods 2024;226:61-70. [PMID: 38631404 DOI: 10.1016/j.ymeth.2024.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 04/04/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024] Open

Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024;300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024] Open

Abstract

RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequences into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies hinder the effectiveness of machine learning models in addressing the performance challenges associated with predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred using a support vector machine classifier to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model compared to state-of-the-art models in the domain of RNA modification site identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can be applied to feature extraction from DNA sequences to predict other biological modification sites.

Wang L, Zhou Y. MRM-BERT: a novel deep neural network predictor of multiple RNA modifications by fusing BERT representation and sequence features. RNA Biol 2024;21:1-10. [PMID: 38357904 PMCID: PMC10877979 DOI: 10.1080/15476286.2024.2315384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 12/26/2023] [Accepted: 02/02/2024] [Indexed: 02/16/2024] Open

Venkatesan VK, Kuppusamy Murugesan KR, Chandrasekaran KA, Thyluru Ramakrishna M, Khan SB, Almusharraf A, Albuali A. Cancer Diagnosis through Contour Visualization of Gene Expression Leveraging Deep Learning Techniques. Diagnostics (Basel) 2023;13:3452. [PMID: 37998588 PMCID: PMC10670706 DOI: 10.3390/diagnostics13223452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 10/30/2023] [Accepted: 11/04/2023] [Indexed: 11/25/2023] Open

Hossain PS, Kim K, Uddin J, Samad MA, Choi K. Enhancing Taxonomic Categorization of DNA Sequences with Deep Learning: A Multi-Label Approach. Bioengineering (Basel) 2023;10:1293. [PMID: 38002417 PMCID: PMC10669241 DOI: 10.3390/bioengineering10111293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 11/02/2023] [Accepted: 11/05/2023] [Indexed: 11/26/2023] Open

Jia J, Wei Z, Sun M. EMDL_m6Am: identifying N6,2'-O-dimethyladenosine sites based on stacking ensemble deep learning. BMC Bioinformatics 2023;24:397. [PMID: 37880673 PMCID: PMC10598967 DOI: 10.1186/s12859-023-05543-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023] Open

Park S, Rehman MU, Ullah F, Tayara H, Chong KT. iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data. Bioinformatics 2023;39:btad474. [PMID: 37555812 PMCID: PMC10444964 DOI: 10.1093/bioinformatics/btad474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2023] [Revised: 05/11/2023] [Accepted: 08/08/2023] [Indexed: 08/10/2023] Open

Abstract

MOTIVATION

The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes it challenging to implement single-stranded DNA methylation sequencing technologies, highlighting the need for an efficient prediction model. Models are required to create an understanding of the underlying biological systems and to project single-cell (methylated) data accurately.

RESULTS

In this study, we developed positional features for predicting CpG sites. Positional characteristics of the sequence are derived using data from CpG regions and the separation between nearby CpG sites. Multiple optimized classifiers and different ensemble learning approaches are evaluated. The OPTUNA framework is used to optimize the algorithms. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers.

AVAILABILITY AND IMPLEMENTATION

The data and methodologies used in this study are openly accessible to the research community. Researchers can access the positional features and algorithms used for predicting CpG site methylation patterns. To achieve superior performance, we employed the CatBoost algorithm followed by the stacking algorithm, which outperformed existing DNA methylation identifiers. The proposed iCpG-Pos approach utilizes only positional features, resulting in a substantial reduction in computational complexity compared to other known approaches for detecting CpG site methylation patterns. In conclusion, our study introduces a novel approach, iCpG-Pos, for predicting CpG site methylation patterns. By focusing on positional features, our model offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being.

Ahmad W, Tayara H, Chong KT. Attention-Based Graph Neural Network for Molecular Solubility Prediction. ACS Omega 2023;8:3236-3244. [PMID: 36713733 PMCID: PMC9878542 DOI: 10.1021/acsomega.2c06702] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/18/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]

Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022;16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open

Affiliation(s)

Jia Zou: Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
Hui Liu: Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
Wei Tan: Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
Yi-qi Chen: Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
Jing Dong: Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
Shu-yuan Bai: Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
Zhao-xia Wu: Community Health Service Center, Wuchang Hospital, Wuhan, China
Yan Zeng: Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China; School of Public Health, Wuhan University of Science and Technology, Wuhan, China
*Correspondence: Yan Zeng

Jaganathan K, Rehman MU, Tayara H, Chong KT. XML-CIMT: Explainable Machine Learning (XML) Model for Predicting Chemical-Induced Mitochondrial Toxicity. Int J Mol Sci 2022;23:ijms232415655. [PMID: 36555297 PMCID: PMC9779353 DOI: 10.3390/ijms232415655] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/04/2022] [Revised: 12/06/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022] Open