1
Li G, Zhao B, Su X, Yang Y, Zeng Z, Hu P, Hu L. Capturing short-range and long-range dependencies of nucleotides for identifying RNA N6-methyladenosine modification sites. Comput Biol Med 2025; 186:109625. [PMID: 39756188 DOI: 10.1016/j.compbiomed.2024.109625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 11/17/2024] [Accepted: 12/23/2024] [Indexed: 01/07/2025]
Abstract
N6-methyladenosine (m6A) plays a crucial role in enriching RNA functional and genetic information, and the identification of m6A modification sites is therefore an important task to promote the understanding of RNA epigenetics. In the identification process, current studies are mainly concentrated on capturing the short-range dependencies between adjacent nucleotides in RNA sequences, while ignoring the impact of long-range dependencies between non-adjacent nucleotides for learning high-quality representation of RNA sequences. In this work, we propose an end-to-end prediction model, called m6ASLD, to improve the identification accuracy of m6A modification sites by capturing the short-range and long-range dependencies of nucleotides. Specifically, m6ASLD first encodes the type and position information of nucleotides to construct the initial embeddings of RNA sequences. A self-correlation map is then generated to characterize both short-range and long-range dependencies with a designed map generating block for each RNA sequence. After that, m6ASLD learns the global and local representations of RNA sequences by using a graph convolution process and a designed dependency searching block respectively, and finally achieves its identification task under a joint training scheme. Extensive experiments have demonstrated the promising performance of m6ASLD on 11 benchmark datasets across several evaluation metrics.
Affiliation(s)
- Guodong Li
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Bowei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Xiaorui Su
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Yue Yang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Zhi Zeng
  - College of Computer Science and Technology, Xi'an Jiaotong University, 710049, Xi'an, China.
- Pengwei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
2
Liu M, Sun ZL, Zeng Z, Lam KM. Multi-kernel feature extraction with dynamic fusion and downsampled residual feature embedding for predicting rice RNA N6-methyladenine sites. Brief Bioinform 2024; 26:bbae647. [PMID: 39674264 DOI: 10.1093/bib/bbae647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 11/15/2024] [Accepted: 11/30/2024] [Indexed: 12/16/2024] Open
Abstract
RNA N$^{6}$-methyladenosine (m$^{6}$A) is a critical epigenetic modification closely related to rice growth, development, and stress response. Accurate m$^{6}$A identification, which is directly related to precision rice breeding and improvement, is fundamental to revealing phenotype regulation and molecular mechanisms. Because rice m$^{6}$A sequences vary in length, maximum-length padding and label encoding are usually adopted to obtain max-length padded sequences as model input. Although this retains complete sequence information, it results in sparse information and invalid padding, which reduces feature extraction accuracy. Moreover, existing rice-specific m$^{6}$A prediction methods are still at an early stage. To address these issues, we develop a new end-to-end deep learning framework, MFDm$^{6}$ARice, for predicting rice m$^{6}$A sites. In particular, to alleviate sparseness, we construct a multi-kernel feature fusion module that mines essential information in max-length padded sequences with a multi-kernel feature extraction function and effectively transfers information through a global-local dynamic fusion function. Concurrently, considering the complexity and computational cost of the high-dimensional features caused by invalid padding, we design a downsampling residual feature embedding module to optimize feature space compression and achieve accurate feature expression and efficient computational performance. Experiments show that MFDm$^{6}$ARice outperforms comparison methods in cross-validation and on same- and cross-species independent test sets, demonstrating good robustness and generalization. The application to maize m$^{6}$A indicates MFDm$^{6}$ARice's scalability. Further investigations have shown that combining different kernel features, focusing on global channel and local spatial information, and employing reasonable downsampling and residual connections can improve feature representation and extraction, ensure effective information transfer, and significantly enhance model performance.
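The padding issue described in this abstract can be illustrated with a minimal sketch. The label map, the choice of 0 as the padding value, and the function names below are assumptions made for illustration; they are not taken from MFDm$^{6}$ARice.

```python
# Hypothetical label map; 0 is reserved for padding (an assumption, not from the paper).
LABELS = {"A": 1, "C": 2, "G": 3, "U": 4}


def label_encode(seq):
    """Map each nucleotide to an integer label."""
    return [LABELS[base] for base in seq.upper()]


def pad_to_max(seqs, pad_value=0):
    """Label-encode every sequence and right-pad it to the longest length."""
    encoded = [label_encode(s) for s in seqs]
    max_len = max(len(e) for e in encoded)
    return [e + [pad_value] * (max_len - len(e)) for e in encoded]


def padding_fraction(padded, pad_value=0):
    """Share of positions that carry no sequence information."""
    total = sum(len(row) for row in padded)
    pads = sum(row.count(pad_value) for row in padded)
    return pads / total


batch = pad_to_max(["AUGC", "GA", "CUGAUA"])
print(batch)
print(padding_fraction(batch))
```

In this toy batch a third of all positions are padding, which is the kind of sparsity the abstract says the multi-kernel fusion and downsampling modules are meant to mitigate.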
Affiliation(s)
- Mengya Liu
  - School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Zhan-Li Sun
  - School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
- Zhigang Zeng
  - School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Kin-Man Lam
  - Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, China
3
Abbas Z, Rehman MU, Tayara H, Lee SW, Chong KT. m5C-Seq: Machine learning-enhanced profiling of RNA 5-methylcytosine modifications. Comput Biol Med 2024; 182:109087. [PMID: 39232403 DOI: 10.1016/j.compbiomed.2024.109087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 08/13/2024] [Accepted: 08/29/2024] [Indexed: 09/06/2024]
Abstract
Epigenetic modifications, particularly RNA methylation and histone alterations, play a crucial role in heredity, development, and disease. Among these, RNA 5-methylcytosine (m5C) is the most prevalent RNA modification in mammalian cells, essential for processes such as ribosome synthesis, translational fidelity, mRNA nuclear export, turnover, and translation. The increasing volume of nucleotide sequences has led to the development of machine learning-based predictors for m5C site prediction. However, these predictors often face challenges related to training data limitations and overfitting due to insufficient external validation. This study introduces m5C-Seq, an ensemble learning approach for RNA modification profiling, designed to address these issues. m5C-Seq employs a meta-classifier that integrates 15 probabilities generated from a novel, large dataset using systematic encoding methods to make final predictions. Demonstrating superior performance compared to existing predictors, m5C-Seq represents a significant advancement in accurate RNA modification profiling. The code and the newly established datasets are made available through GitHub at https://github.com/Z-Abbas/m5C-Seq.
Affiliation(s)
- Zeeshan Abbas
  - Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, South Korea
- Mobeen Ur Rehman
  - Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, United Arab Emirates
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Seung Won Lee
  - Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, South Korea.
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea.
4
Mir BA, Tayara H, Chong KT. SB-Net: Synergizing CNN and LSTM networks for uncovering retrosynthetic pathways in organic synthesis. Comput Biol Chem 2024; 112:108130. [PMID: 38954849 DOI: 10.1016/j.compbiolchem.2024.108130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/17/2024] [Accepted: 06/12/2024] [Indexed: 07/04/2024]
Abstract
Retrosynthesis is vital in synthesizing target products, guiding reaction pathway design crucial for drug and material discovery. Current models often neglect multi-scale feature extraction, limiting efficacy in leveraging molecular descriptors. Our proposed SB-Net model, a deep-learning architecture tailored for retrosynthesis prediction, addresses this gap. SB-Net combines CNN and Bi-LSTM architectures, excelling in capturing multi-scale molecular features. It integrates parallel branches for processing one-hot encoded descriptors and ECFP, merging through dense layers. Experimental results demonstrate SB-Net's superiority, achieving 73.6 % top-1 and 94.6 % top-10 accuracy on USPTO-50k data. Versatility is validated on MetaNetX, with rates of 52.8 % top-1, 74.3 % top-3, 79.8 % top-5, and 83.5 % top-10. SB-Net's success in bioretrosynthesis prediction tasks indicates its efficacy. This research advances computational chemistry, offering a robust deep-learning model for retrosynthesis prediction. With implications for drug discovery and synthesis planning, SB-Net promises innovative and efficient pathways.
Affiliation(s)
- Bilal Ahmad Mir
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea.
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea.
5
Zahid MU, Nisar MD, Fazil A, Ryu J, Shah MH. Composite Ensemble Learning Framework for Passive Drone Radio Frequency Fingerprinting in Sixth-Generation Networks. Sensors (Basel) 2024; 24:5618. [PMID: 39275529 PMCID: PMC11397939 DOI: 10.3390/s24175618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 08/09/2024] [Accepted: 08/28/2024] [Indexed: 09/16/2024]
Abstract
The rapid evolution of drone technology has introduced unprecedented challenges in security, particularly concerning the threat of unconventional drone and swarm attacks. To deal with these threats, drones need to be classified by intercepting their Radio Frequency (RF) signals. With the arrival of Sixth Generation (6G) networks, sophisticated methods are required to properly categorize drone signals in order to achieve optimal resource sharing, high security levels, and mobility management. However, deep ensemble learning has not been investigated properly in the case of 6G. It is anticipated that 6G will incorporate drone-based BTS and cellular networks that, in one way or another, may be subjected to jamming, intentional interference, or other dangers from unauthorized UAVs. Thus, this study uses Radio Frequency Fingerprinting (RFF) of drones to detect unauthorized ones so that proper actions can be taken to protect the network's security and integrity. This paper proposes a novel method, a Composite Ensemble Learning (CEL)-based neural network, for drone signal classification. The proposed method integrates wavelet-based denoising and combines automatic and manual feature extraction techniques to foster feature diversity, robustness, and performance enhancement. Through extensive experiments conducted on open-source benchmark datasets of drones, our approach demonstrates superior classification accuracies compared to recent benchmark deep learning techniques across various Signal-to-Noise Ratios (SNRs). This novel approach holds promise for enhancing communication efficiency, security, and safety in 6G networks amidst the proliferation of drone-based applications.
Affiliation(s)
- Muhammad Usama Zahid
  - Electrical and Computer Engineering Department, Sir Syed CASE Institute of Technology, Islamabad 04524, Pakistan
- Muhammad Danish Nisar
  - Electrical and Computer Engineering Department, Sir Syed CASE Institute of Technology, Islamabad 04524, Pakistan
- Adnan Fazil
  - Department of Avionics Engineering, Air University, E-9, Islamabad 44230, Pakistan
- Jihyoung Ryu
  - Electronics and Telecommunications Research Institute (ETRI), Gwangju 61012, Republic of Korea
- Maqsood Hussain Shah
  - SFI Insight Centre for Data Analytics and the School of Electronic Engineering, Dublin City University, D09 V209 Dublin, Ireland
6
Hassan MT, Tayara H, Chong KT. NaII-Pred: An ensemble-learning framework for the identification and interpretation of sodium ion inhibitors by fusing multiple feature representation. Comput Biol Med 2024; 178:108737. [PMID: 38879934 DOI: 10.1016/j.compbiomed.2024.108737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/21/2024] [Accepted: 06/08/2024] [Indexed: 06/18/2024]
Abstract
High-affinity ligand peptides for ion channels are essential for controlling the flow of ions across the plasma membrane. These peptides are now being investigated as potential therapeutics for a variety of illnesses, including cancer and cardiovascular disease. Therefore, the identification and interpretation of ligand peptide inhibitors that control ion flow across cells have become pivotal. In this work, we developed an ensemble-based model, NaII-Pred, for the identification of sodium ion inhibitors. The ensemble model was trained, tested, and evaluated on three different datasets. The NaII-Pred method employs six different descriptors and a hybrid feature set in conjunction with five conventional machine learning classifiers to create 35 baseline models. Through an ensemble approach, the top five baseline models trained on the hybrid feature set were integrated to yield the final predictive model, NaII-Pred. Our proposed model, NaII-Pred, outperforms the baseline models and the current predictors on both datasets. We believe NaII-Pred will play a critical role in screening and identifying potential sodium ion inhibitors and will be an invaluable tool.
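The abstract does not say how the top five baseline models are integrated. One common option is soft voting over the best-ranked models, sketched below. The function name, the ranking by a validation score, and the averaging rule are all assumptions made for illustration, not NaII-Pred's documented procedure.

```python
def top_k_soft_vote(model_probs, val_scores, k=5, threshold=0.5):
    """Average the predicted probabilities of the k best-scoring models.

    model_probs: dict mapping model name -> list of positive-class probabilities
    val_scores:  dict mapping model name -> validation score (higher is better)
    """
    # Rank models by validation score and keep the k best.
    ranked = sorted(val_scores, key=val_scores.get, reverse=True)[:k]
    n_samples = len(model_probs[ranked[0]])
    # Soft voting: plain average of the selected models' probabilities.
    avg = [
        sum(model_probs[name][i] for name in ranked) / len(ranked)
        for i in range(n_samples)
    ]
    labels = [int(p >= threshold) for p in avg]
    return ranked, avg, labels


# Toy example with three hypothetical baseline models and k=2.
probs = {"svm": [0.9, 0.2], "rf": [0.7, 0.4], "knn": [0.1, 0.9]}
scores = {"svm": 0.90, "rf": 0.80, "knn": 0.50}
ranked, avg, labels = top_k_soft_vote(probs, scores, k=2)
print(ranked, avg, labels)
```

A stacked meta-classifier trained on the baseline probabilities would be an equally plausible reading of "ensemble approach"; the sketch only shows the simplest variant.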
Affiliation(s)
- Mir Tanveerul Hassan
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea.
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea; Advances Electronics and Information Research Centre, Jeonbuk National University, Jeonju, 54896, South Korea.
7
Zhang T, Gao S, Zhang SW, Cui XD. m6Aexpress-enet: Predicting the regulatory expression m6A sites by an enet-regularization negative binomial regression model. Methods 2024; 226:61-70. [PMID: 38631404 DOI: 10.1016/j.ymeth.2024.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 04/04/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024] Open
Abstract
As the most abundant mRNA modification, m6A controls and influences many aspects of mRNA metabolism, including mRNA stability and degradation. However, the role of specific m6A sites in regulating gene expression remains unclear. In addition, the multicollinearity caused by correlated methylation levels of multiple m6A sites in each gene could influence prediction performance. To address these challenges, we propose an elastic-net regularized negative binomial regression model (called m6Aexpress-enet) to predict which m6A sites could potentially regulate the expression of their genes. Comprehensive evaluations on simulated datasets demonstrate that m6Aexpress-enet achieves the top prediction performance. Applying m6Aexpress-enet to real MeRIP-seq data from human lymphoblastoid cell lines, we have uncovered the complex regulatory pattern of predicted m6A sites and the unique enrichment pathways of the constructed co-methylation modules. m6Aexpress-enet proves itself a powerful tool that enables biologists to discover the mechanisms by which m6A regulates gene expression. Furthermore, the source code and the step-by-step implementation of m6Aexpress-enet are freely accessible at https://github.com/tengzhangs/m6Aexpress-enet.
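As a rough illustration of what an elastic-net regularized negative binomial regression optimizes, the objective can be written in the generic form below. The notation is an assumption made here and is not taken from the paper: $y_i$ is the expression count for observation $i$, $x_i$ is the vector of methylation levels of the candidate m6A sites, $\phi$ is the dispersion, $\lambda$ is the penalty strength, and $\alpha \in [0, 1]$ is a glmnet-style mixing parameter.

$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\; -\sum_{i} \log \mathrm{NB}\left(y_i \mid \mu_i, \phi\right) + \lambda\left[\alpha \lVert \beta \rVert_1 + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2\right], \qquad \log \mu_i = \beta_0 + x_i^{\top}\beta
$$

The $\ell_1$ term drives the coefficients of uninformative sites to exactly zero, while the $\ell_2$ term stabilizes the estimates when the methylation levels of several sites are correlated, which is the multicollinearity problem the abstract refers to.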
Affiliation(s)
- Teng Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710027 Shaanxi, China; School of Computer, Jiangsu University of Science and Technology, ZhenJiang, 212100 JiangSu, China
- Shang Gao
  - School of Computer, Jiangsu University of Science and Technology, ZhenJiang, 212100 JiangSu, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710027 Shaanxi, China.
- Xiao-Dong Cui
  - School of Marine Science and Technology Northwestern Polytechnical University, Xi'an, 710027 Shaanxi, China.
8
Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024; 300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024] Open
Abstract
RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequences into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies hinder the effectiveness of machine learning models in addressing the performance challenges associated with predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred, which uses a support vector machine classifier to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model compared to state-of-the-art models for RNA modification site identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can be applied to feature extraction from DNA sequences to predict other biological modification sites.
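To make the role of the interval ξ concrete, the sketch below pairs each nucleotide with the dinucleotide that lies ξ positions ahead of it (forward) or ξ positions behind it (backward). The exact indexing convention and the function name are assumptions made for illustration. The sketch covers only the extraction step and omits the position-specific propensity statistics that BiPSTP computes from these trinucleotides.

```python
def gapped_trinucleotides(seq, xi):
    """Collect forward and backward gapped trinucleotides for every position.

    xi is the number of nucleotides skipped between the current nucleotide
    and the dinucleotide (an assumed convention; xi=0 gives contiguous triplets).
    """
    forward, backward = [], []
    n = len(seq)
    for i in range(n):
        # Forward: current nucleotide followed by the dinucleotide xi steps ahead.
        start = i + xi + 1
        if start + 2 <= n:
            forward.append((i, seq[i] + seq[start:start + 2]))
        # Backward: dinucleotide xi steps behind, followed by the current nucleotide.
        end = i - xi
        if end - 2 >= 0:
            backward.append((i, seq[end - 2:end] + seq[i]))
    return forward, backward


fwd, bwd = gapped_trinucleotides("AUGCA", xi=1)
print(fwd)
print(bwd)
```

Positions too close to either end of the sequence simply contribute no triplet in the corresponding direction.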
Affiliation(s)
- Mingzhao Wang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Haider Ali
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yandi Xu
  - School of Computer Science, Shaanxi Normal University, Xi'an, China; College of Life Sciences, Shaanxi Normal University, Xi'an, China
- Juanying Xie
  - School of Computer Science, Shaanxi Normal University, Xi'an, China.
- Shengquan Xu
  - College of Life Sciences, Shaanxi Normal University, Xi'an, China.
9
Wang L, Zhou Y. MRM-BERT: a novel deep neural network predictor of multiple RNA modifications by fusing BERT representation and sequence features. RNA Biol 2024; 21:1-10. [PMID: 38357904 PMCID: PMC10877979 DOI: 10.1080/15476286.2024.2315384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 12/26/2023] [Accepted: 02/02/2024] [Indexed: 02/16/2024] Open
Abstract
RNA modifications play crucial roles in various biological processes and diseases. Accurate prediction of RNA modification sites is essential for understanding their functions. In this study, we propose a hybrid approach that fuses a pre-trained sequence representation with various sequence features to predict multiple types of RNA modifications in one combined prediction framework. We developed MRM-BERT, a deep learning method that combined the pre-trained DNABERT deep sequence representation module and the convolutional neural network (CNN) exploiting four traditional sequence feature encodings to improve the prediction performance. MRM-BERT was evaluated on multiple datasets of 12 commonly occurring RNA modifications, including m6A, m5C, m1A and so on. The results demonstrate that our hybrid model outperforms other models in terms of area under receiver operating characteristic curve (AUC) for all 12 types of RNA modifications. MRM-BERT is available as an online tool (http://117.122.208.21:8501) or source code (https://github.com/abhhba999/MRM-BERT), which allows users to predict RNA modification sites and visualize the results. Overall, our study provides an effective and efficient approach to predict multiple RNA modifications, contributing to the understanding of RNA biology and the development of therapeutic strategies.
Affiliation(s)
- Linshu Wang
  - Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- Yuan Zhou
  - Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
  - Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing, China
10
Venkatesan VK, Kuppusamy Murugesan KR, Chandrasekaran KA, Thyluru Ramakrishna M, Khan SB, Almusharraf A, Albuali A. Cancer Diagnosis through Contour Visualization of Gene Expression Leveraging Deep Learning Techniques. Diagnostics (Basel) 2023; 13:3452. [PMID: 37998588 PMCID: PMC10670706 DOI: 10.3390/diagnostics13223452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 10/30/2023] [Accepted: 11/04/2023] [Indexed: 11/25/2023] Open
Abstract
Prompt diagnostics and appropriate cancer therapy necessitate the use of gene expression databases. The integration of analytical methods can enhance detection precision by capturing intricate patterns and subtle connections in the data. This study proposes a diagnostic-integrated approach combining Empirical Bayes Harmonization (EBS), Jensen-Shannon Divergence (JSD), deep learning, and contour mathematics for cancer detection using gene expression data. EBS preprocesses the gene expression data, while JSD measures the distributional differences between cancerous and non-cancerous samples, providing invaluable insights into gene expression patterns. Deep learning (DL) models are employed for automatic deep feature extraction and to discern complex patterns from the data. Contour mathematics is applied to visualize decision boundaries and regions in the high-dimensional feature space. JSD imparts significant information to the deep learning model, directing it to concentrate on pertinent features associated with cancerous samples. Contour visualization elucidates the model's decision-making process, bolstering interpretability. The amalgamation of JSD, deep learning, and contour mathematics in gene expression dataset analysis diagnostics presents a promising pathway for precise cancer detection. This method taps into the prowess of deep learning for feature extraction while employing JSD to pinpoint distributional differences and contour mathematics for visual elucidation. The outcomes underscore its potential as a formidable instrument for cancer detection, furnishing crucial insights for timely diagnostics and tailor-made treatment strategies.
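For reference, the Jensen-Shannon divergence mentioned in this abstract can be computed for two discrete distributions as in the sketch below. The function names and the toy histograms are illustrative assumptions; the abstract does not describe how the paper bins or normalizes expression values.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric, bounded divergence between two discrete distributions."""
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy normalized expression histograms for one gene (hypothetical values).
cancer = [0.1, 0.2, 0.7]
normal = [0.6, 0.3, 0.1]
print(jensen_shannon_divergence(cancer, normal))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (disjoint support), which makes per-gene scores directly comparable.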
Affiliation(s)
- Vinoth Kumar Venkatesan
  - School of Computer Science Engineering and Information Systems (SCORE), Vellore Institute of Technology, Vellore 632014, India
- Karthick Raghunath Kuppusamy Murugesan
  - Department of Computer Science and Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), Bangalore 562112, India
- Mahesh Thyluru Ramakrishna
  - Department of Computer Science and Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), Bangalore 562112, India
- Surbhi Bhatia Khan
  - Department of Data Science, School of Science Engineering and Environment, University of Salford, Manchester M5 4WT, UK
  - Department of Engineering and Environment, University of Religions and Denominations, Qom 37491-13357, Iran
  - Department of Electrical and Computer Engineering, Lebanese American University, Byblos P.O. Box 13-5053, Lebanon
- Ahlam Almusharraf
  - Department of Business Administration, College of Business and Administration, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Abdullah Albuali
  - Department of Computer Science, School of Computer Science and Information Technology, King Faisal University, Hofuf 11671, Saudi Arabia
11
Hossain PS, Kim K, Uddin J, Samad MA, Choi K. Enhancing Taxonomic Categorization of DNA Sequences with Deep Learning: A Multi-Label Approach. Bioengineering (Basel) 2023; 10:1293. [PMID: 38002417 PMCID: PMC10669241 DOI: 10.3390/bioengineering10111293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 11/02/2023] [Accepted: 11/05/2023] [Indexed: 11/26/2023] Open
Abstract
The application of deep learning for taxonomic categorization of DNA sequences is investigated in this study. Two deep learning architectures, namely the Stacked Convolutional Autoencoder (SCAE) with Multilabel Extreme Learning Machine (MLELM) and the Variational Convolutional Autoencoder (VCAE) with MLELM, have been proposed. These designs provide precise feature maps for individual and inter-label interactions within DNA sequences, capturing their spatial and temporal properties. The collected features are subsequently fed into MLELM networks, which yield soft classification scores and hard labels. The proposed algorithms underwent thorough training and testing on unsupervised data, whereby one or more labels were concurrently taken into account. The introduction of the clade label resulted in improved accuracy for both models compared to the class or genus labels, probably owing to the occurrence of large clusters of similar nucleotides inside a DNA strand. In all circumstances, the VCAE-MLELM model consistently outperformed the SCAE-MLELM model. The best accuracy attained by the VCAE-MLELM model when the clade and family labels were combined was 94%. However, accuracy ratings for single-label categorization using either approach were less than 65%. The approach's effectiveness is based on MLELM networks, which record connected patterns across classes for accurate label categorization. This study advances deep learning in biological taxonomy by emphasizing the significance of combining numerous labels for increased classification accuracy.
Affiliation(s)
- Kyungsup Kim
  - Department of Computer Engineering, Chungnam National University, Yuseong-gu, Daejeon 34134, Republic of Korea
- Jia Uddin
  - Artificial Intelligence and Big Data Department, Endicott College, Woosong University, Daejeon 34606, Republic of Korea
- Md Abdus Samad
  - Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si 38541, Gyeongsangbuk-do, Republic of Korea
- Kwonhue Choi
  - Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si 38541, Gyeongsangbuk-do, Republic of Korea
12
Jia J, Wei Z, Sun M. EMDL_m6Am: identifying N6,2'-O-dimethyladenosine sites based on stacking ensemble deep learning. BMC Bioinformatics 2023; 24:397. [PMID: 37880673 PMCID: PMC10598967 DOI: 10.1186/s12859-023-05543-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023] Open
Abstract
BACKGROUND: N6,2'-O-dimethyladenosine (m6Am) is an abundant RNA methylation modification on vertebrate mRNAs and is present in the transcription initiation region of mRNAs. It has recently been experimentally shown to be associated with several human disorders, including obesity and stomach cancer, among others. As a result, correctly identifying m6Am sites is an important step toward understanding their role in RNA regulation. RESULTS: This study proposes a novel deep learning-based m6Am prediction model, EMDL_m6Am, which employs one-hot encoding to express the feature map of the RNA sequence and recognizes m6Am sites by integrating different CNN models via stacking, including DenseNet, Inflated Convolutional Network (DCNN) and Deep Multiscale Residual Network (MSRN). The sensitivity (Sn), specificity (Sp), accuracy (ACC), Matthews correlation coefficient (MCC) and area under the curve (AUC) of our model on the training data set reach 86.62%, 88.94%, 87.78%, 0.7590 and 0.8778, respectively, and the corresponding results on the independent test set are 82.25%, 79.72%, 80.98%, 0.6199, and 0.8211. CONCLUSIONS: The experimental results demonstrate that EMDL_m6Am greatly improves the predictive performance for m6Am sites and could provide a valuable reference for further studies. The source code and experimental data are available at: https://github.com/13133989982/EMDL-m6Am.
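The threshold-based metrics reported in this abstract (Sn, Sp, ACC, MCC) all derive from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the example are made up for illustration and are not the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts for 100 positive and 100 negative sites.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

AUC is not included because it requires the ranked prediction scores rather than a single confusion matrix.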
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Zhangying Wei
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Mingwei Sun
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
|
13
|
Park S, Rehman MU, Ullah F, Tayara H, Chong KT. iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data. Bioinformatics 2023; 39:btad474. [PMID: 37555812 PMCID: PMC10444964 DOI: 10.1093/bioinformatics/btad474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2023] [Revised: 05/11/2023] [Accepted: 08/08/2023] [Indexed: 08/10/2023]
Abstract
MOTIVATION The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes it challenging to implement single-stranded DNA methylation sequencing technologies, highlighting the need for an efficient prediction model. Models are required to create an understanding of the underlying biological systems and to project single-cell (methylated) data accurately. RESULTS In this study, we developed positional features for predicting CpG sites. Positional characteristics of the sequence are derived using data from CpG regions and the separation between nearby CpG sites. Multiple optimized classifiers and different ensemble learning approaches are evaluated. The OPTUNA framework is used to optimize the algorithms. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers. AVAILABILITY AND IMPLEMENTATION The data and methodologies used in this study are openly accessible to the research community. Researchers can access the positional features and algorithms used for predicting CpG site methylation patterns. The proposed iCpG-Pos approach utilizes only positional features, resulting in a substantial reduction in computational complexity compared to other known approaches for detecting CpG site methylation patterns. By focusing on positional features, the model offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being.
Affiliation(s)
- Sehi Park
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Mobeen Ur Rehman
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Farman Ullah
- College of Information Technology in the United Arab Emirates University (UAEU), Abu Dhabi 15551, UAE
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
|
14
|
Ahmad W, Tayara H, Chong KT. Attention-Based Graph Neural Network for Molecular Solubility Prediction. ACS Omega 2023; 8:3236-3244. [PMID: 36713733 PMCID: PMC9878542 DOI: 10.1021/acsomega.2c06702] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/18/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]
Abstract
Drug discovery (DD) research is aimed at the discovery of new medications. Solubility is an important physicochemical property in drug development. Active pharmaceutical ingredients (APIs) are essential substances for high drug efficacy. During DD research, aqueous solubility (AS) is a key physicochemical attribute required for API characterization. High-precision in silico solubility prediction reduces the experimental cost and time of drug development. Several artificial intelligence tools have been employed for solubility prediction using machine learning and deep learning techniques. This study aims to create different deep learning models that can predict the solubility of a wide range of molecules using the largest currently available solubility data set. Simplified molecular-input line-entry system (SMILES) strings were used as the molecular representation, and models were developed using simple graph convolution, graph isomorphism network, graph attention network, and AttentiveFP network. Based on the performance of the models, the AttentiveFP-based network model was finally selected. The model was trained and tested on 9943 compounds. On 62 anticancer compounds, the model achieved a Pearson correlation R² of 0.52 and a root-mean-square error of 0.61. AS prediction can be improved by refining the graph algorithm or adding more molecular properties.
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
|
15
|
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022]
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most popularly used techniques applied for profiling RNA epitranscriptome, to provide new ideas for growth and development.
Affiliation(s)
- Jia Zou
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Hui Liu
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Wei Tan
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Yi-qi Chen
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Jing Dong
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Shu-yuan Bai
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Zhao-xia Wu
- Community Health Service Center, Wuchang Hospital, Wuhan, China
- Yan Zeng
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China; School of Public Health, Wuhan University of Science and Technology, Wuhan, China; *Correspondence: Yan Zeng,
|
16
|
Jaganathan K, Rehman MU, Tayara H, Chong KT. XML-CIMT: Explainable Machine Learning (XML) Model for Predicting Chemical-Induced Mitochondrial Toxicity. Int J Mol Sci 2022; 23:15655. [PMID: 36555297 PMCID: PMC9779353 DOI: 10.3390/ijms232415655] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/04/2022] [Revised: 12/06/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022]
Abstract
Organ toxicity caused by chemicals is a serious problem in the creation and usage of chemicals such as medications, insecticides, chemical products, and cosmetics. In recent decades, the initiation and development of chemical-induced organ damage have been related to mitochondrial dysfunction, among several adverse effects. Recently, many drugs, for example, troglitazone, have been removed from the marketplace because of significant mitochondrial toxicity. As a result, it is an urgent requirement to develop in silico models that can reliably anticipate chemical-induced mitochondrial toxicity. In this paper, we have proposed an explainable machine-learning model to classify mitochondrially toxic and non-toxic compounds. After several experiments, the Mordred feature descriptor was shortlisted to be used after feature selection. The selected features used with the CatBoost learning algorithm achieved a prediction accuracy of 85% in 10-fold cross-validation and 87.1% in independent testing. The proposed model has illustrated improved prediction accuracy when compared with the existing state-of-the-art method available in the literature. The proposed tree-based ensemble model, along with the global model explanation, will aid pharmaceutical chemists in better understanding the prediction of mitochondrial toxicity.
Affiliation(s)
- Keerthana Jaganathan
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Mobeen Ur Rehman
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Correspondence: (H.T); (K.T.C)
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Correspondence: (H.T); (K.T.C)
|