1
Malebary SJ, Alromema N, Suleman MT, Saleem M. m5c-iDeep: 5-Methylcytosine sites identification through deep learning. Methods 2024:S1046-2023(24)00170-1. [PMID: 39089345 DOI: 10.1016/j.ymeth.2024.07.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Revised: 07/16/2024] [Accepted: 07/23/2024] [Indexed: 08/03/2024] Open
Abstract
5-Methylcytosine (m5c) is a modified cytosine base formed by the addition of a methyl group at the carbon-5 position. It is one of the most common post-transcriptional modifications (PTMs) and occurs in almost all types of RNA. Conventional laboratory methods do not provide quick, reliable identification of m5c sites. However, the availability of sequence data has made it feasible to develop computationally intelligent models that optimize the identification process for accuracy and robustness. The present research focused on the development of in-silico methods built using deep learning models. The encoded sequence data were fed into deep learning models, which included gated recurrent unit (GRU), long short-term memory (LSTM), and bi-directional LSTM (Bi-LSTM). The models were then subjected to a rigorous evaluation process that included both independent set testing and 10-fold cross-validation. The results revealed that the LSTM-based model, m5c-iDeep, outperformed existing m5c predictors, achieving 99.9% accuracy. To facilitate researchers, m5c-iDeep was also deployed on a web-based server, which is accessible at https://taseersuleman-m5c-ideep-m5c-ideep.streamlit.app/.
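The 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `k_fold_indices`, the sample count, and the seed are assumptions chosen only for illustration, and the sketch covers the index split alone, not model training.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slice with stride k so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every fold except the held-out one contributes to the training set.
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx

# 25 samples is an arbitrary illustrative size.
splits = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so every sequence is used for testing once and for training in the remaining nine rounds.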
Affiliation(s)
- Sharaf J Malebary
- Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia
- Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia
- Muhammad Taseer Suleman
- Department of Criminology and Forensic Sciences, Lahore Garrison University, Lahore, Pakistan
- Maham Saleem
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan
2
Suleman MT, Alturise F, Alkhalifah T, Khan YD. m1A-Ensem: accurate identification of 1-methyladenosine sites through ensemble models. BioData Min 2024; 17:4. [PMID: 38360720 PMCID: PMC10868122 DOI: 10.1186/s13040-023-00353-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 12/31/2023] [Indexed: 02/17/2024] Open
Abstract
BACKGROUND 1-Methyladenosine (m1A) is a variant of methyladenosine that carries a methyl substituent at position 1 and has a prominent role in RNA stability and human metabolites. OBJECTIVE Traditional approaches, such as mass spectrometry and site-directed mutagenesis, proved to be time-consuming and complicated. METHODOLOGY The present research focused on the identification of m1A sites within RNA sequences using novel feature development mechanisms. The obtained features were used to train the ensemble models, including blending, boosting, and bagging. Independent testing and k-fold cross-validation were then performed on the trained ensemble models. RESULTS The proposed model outperformed the preexisting predictors and revealed optimized scores based on major accuracy metrics. CONCLUSION For research purposes, a user-friendly webserver of the proposed model can be accessed through https://taseersuleman-m1a-ensem1.streamlit.app/.
Affiliation(s)
- Muhammad Taseer Suleman
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
- Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
3
Rafay A, Asghar Z, Manzoor H, Hussain W. EyeCNN: exploring the potential of convolutional neural networks for identification of multiple eye diseases through retinal imagery. Int Ophthalmol 2023; 43:3569-3586. [PMID: 37291412 DOI: 10.1007/s10792-023-02764-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 05/21/2023] [Indexed: 06/10/2023]
Abstract
BACKGROUND The eyes are the most important part of the human body, as they are directly connected to the brain and help us perceive imagery in daily life; yet eye diseases are mostly ignored and underestimated until it is too late. Manual diagnosis of eye disorders by a physician can be very costly and time-consuming. OBJECTIVE To tackle this, a novel method named EyeCNN is proposed for identifying eye diseases from retinal images using EfficientNet B3. METHODS A dataset of retinal imagery covering three diseases, i.e., Diabetic Retinopathy, Glaucoma, and Cataract, was used to train 12 convolutional networks; EfficientNet B3 was the top-performing model of the 12, with a testing accuracy of 94.30%. RESULTS After preprocessing the dataset and training the models, various experiments were performed to see where our model stands. The evaluation was performed using some well-defined measures, and the final model was deployed on the Streamlit server as a prototype for public usage. The proposed model has the potential to help diagnose eye diseases early, which can facilitate timely treatment. CONCLUSION The use of EyeCNN for classifying eye diseases has the potential to aid ophthalmologists in diagnosing conditions accurately and efficiently. This research may also lead to a deeper understanding of these diseases and to new treatments. The webserver of EyeCNN can be accessed at https://abdulrafay97-eyecnn-app-rd9wgz.streamlit.app/.
Affiliation(s)
- Abdul Rafay
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Zaeem Asghar
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Hamza Manzoor
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Waqar Hussain
- Department of Artificial Intelligence, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
4
Ali Z, Alturise F, Alkhalifah T, Khan YD. IGPred-HDnet: Prediction of Immunoglobulin Proteins Using Graphical Features and the Hierarchal Deep Learning-Based Approach. Computational Intelligence and Neuroscience 2023; 2023:2465414. [PMID: 36744119 PMCID: PMC9891831 DOI: 10.1155/2023/2465414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/16/2022] [Accepted: 10/12/2022] [Indexed: 01/26/2023]
Abstract
Motivation. Immunoglobulin proteins (IGP) (also called antibodies) are glycoproteins that act as B-cell receptors against external or internal antigens like viruses and bacteria. IGPs play a significant role in diverse cellular processes ranging from adhesion to cell recognition. IGP identifications via the in-silico approach are faster and more cost-effective than wet-lab technological methods. Methods. In this study, we developed an intelligent theoretical deep learning framework, "IGPred-HDnet", for the discrimination of IGPs and non-IGPs. Three types of promising descriptors, namely feature extraction based on graphical and statistical features (FEGS), amphiphilic pseudo-amino acid composition (Amp-PseAAC), and dipeptide composition (DPC), are used to extract the graphical, physicochemical, and sequential features. Next, the extracted attributes are evaluated through machine learning, i.e., decision tree (DT), support vector machine (SVM), k-nearest neighbour (KNN), and hierarchical deep network (HDnet) classifiers. The proposed predictor IGPred-HDnet was trained and tested using 10-fold cross-validation and an independent test. Results and Conclusion. The success rates of IGPred-HDnet on the training and independent datasets (Dtrain and Dtest) are ACC = 98.00% and 99.10% in terms of accuracy (ACC), and MCC = 0.958 and 0.980 in terms of Matthews correlation coefficient (MCC), respectively. The empirical outcomes demonstrate that IGPred-HDnet, using the novel FEGS feature and HDnet algorithm, achieved superior predictions on both datasets compared with other existing computational models. We hope this research will provide great insights into the large-scale identification of IGPs and assist pharmaceutical companies in new drug design.
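Of the three descriptors named in this abstract, dipeptide composition (DPC) is the simplest to show concretely: it is the relative frequency of each of the 400 ordered pairs of adjacent standard amino acids. The sketch below is an illustration under stated assumptions, not the authors' implementation; the function name, the example sequence, and the choice to skip pairs containing non-standard residues are all hypothetical.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for a protein sequence.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        # Assumption: pairs containing non-standard residues are ignored.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    # Normalize by the number of counted pairs so the vector sums to 1.
    return [counts[p] / total for p in pairs]

# "ACDAC" is an arbitrary toy sequence with adjacent pairs AC, CD, DA, AC.
dpc = dipeptide_composition("ACDAC")
```

The resulting fixed-length vector can be passed to any classifier regardless of the original sequence length, which is the usual reason for using composition-based descriptors.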
Affiliation(s)
- Zakir Ali
- Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
5
Suleman MT, Alturise F, Alkhalifah T, Khan YD. iDHU-Ensem: Identification of dihydrouridine sites through ensemble learning models. Digit Health 2023; 9:20552076231165963. [PMID: 37009307 PMCID: PMC10064468 DOI: 10.1177/20552076231165963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Accepted: 03/09/2023] [Indexed: 04/04/2023] Open
Abstract
Background Dihydrouridine (D) is one of the most significant uridine modifications that have a prominent occurrence in eukaryotes. The folding and conformational flexibility of transfer RNA (tRNA) can be attained through this modification. Objective The modification also triggers lung cancer in humans. The identification of D sites was carried out through conventional laboratory methods; however, those were costly and time-consuming. The readiness of RNA sequences helps in the identification of D sites through computationally intelligent models. However, the most challenging part is turning these biological sequences into distinct vectors. Methods The current research proposed novel feature extraction mechanisms and the identification of D sites in tRNA sequences using ensemble models. The ensemble models were then subjected to evaluation using k-fold cross-validation and independent testing. Results The results revealed that the stacking ensemble model outperformed all the ensemble models by revealing 0.98 accuracy, 0.98 specificity, 0.97 sensitivity, and 0.92 Matthews Correlation Coefficient. The proposed model, iDHU-Ensem, was also compared with pre-existing predictors using an independent test. The accuracy scores have shown that the proposed model in this research study performed better than the available predictors. Conclusion The current research contributed towards the enhancement of D site identification capabilities through computationally intelligent methods. A web-based server, iDHU-Ensem, was also made available for the researchers at https://taseersuleman-idhu-ensem-idhu-ensem.streamlit.app/.
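The four scores reported in this abstract (accuracy, specificity, sensitivity, and Matthews correlation coefficient) all follow from the counts of a binary confusion matrix. The sketch below shows the standard formulas; the function name and the example counts are hypothetical and are not data from the paper.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, and MCC from confusion counts.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true sites that are predicted as sites.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that are predicted as non-sites.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Made-up counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=45, tn=40, fp=10, fn=5)
```

MCC uses all four counts at once, which is why it is often reported alongside accuracy when the positive and negative classes are imbalanced.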
Affiliation(s)
- Muhammad Taseer Suleman
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
6
Dao FY, Lv H, Fullwood MJ, Lin H. Accurate Identification of DNA Replication Origin by Fusing Epigenomics and Chromatin Interaction Information. Research (Washington, D.C.) 2022; 2022:9780293. [PMID: 36405252 PMCID: PMC9667886 DOI: 10.34133/2022/9780293] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/20/2022] [Accepted: 09/30/2022] [Indexed: 07/29/2023]
Abstract
DNA replication initiation is a complex process involving various genetic and epigenomic signatures. The correct identification of replication origins (ORIs) could provide important clues for the study of a variety of diseases caused by replication. Here, we design a computational approach named iORI-Epi to recognize ORIs by incorporating epigenome-based features, sequence-based features, and 3D genome-based features. The iORI-Epi displays excellent robustness and generalization ability on both training datasets and independent datasets of K562 cell line. Further experiments confirm that iORI-Epi is highly scalable in other cell lines (MCF7 and HCT116). We also analyze and clarify the regulatory role of epigenomic marks, DNA motifs, and chromatin interaction in DNA replication initiation of eukaryotic genomes. Finally, we discuss gene enrichment pathways from the perspective of ORIs in different replication timing states and heuristically dissect the effect of promoters on replication initiation. Our computational methodology is worth extending to ORI identification in other eukaryotic species.
Affiliation(s)
- Fu-Ying Dao
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- School of Biological Sciences, Nanyang Technological University, Singapore 639798, Singapore
- Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore 117599, Singapore
- Hao Lv
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Melissa J. Fullwood
- School of Biological Sciences, Nanyang Technological University, Singapore 639798, Singapore
- Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore 117599, Singapore
- Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Singapore 138673, Singapore
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
7
Shah AA, Alturise F, Alkhalifah T, Khan YD. Evaluation of deep learning techniques for identification of sarcoma-causing carcinogenic mutations. Digit Health 2022; 8:20552076221133703. [PMID: 36312852 PMCID: PMC9597026 DOI: 10.1177/20552076221133703] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/31/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022] Open
Abstract
The abnormal growth of human healthy cells is called cancer. One of the major
types of cancer is sarcoma, mostly found in human bones and soft tissue cells.
It commonly occurs in children. According to a survey of the United States of
America, there are more than 17,000 sarcoma patients registered each year which
is 15% of all cancer cases. Recognition of cancer at its early stage saves many
lives. The proposed study developed a framework for the early detection of human
sarcoma cancer using deep learning Recurrent Neural Network (RNN) algorithms.
The DNA of a human cell is made up of 25,000 to 30,000 genes. Each gene is
represented by sequences of nucleotides. The nucleotides in a sequence of a
driver gene can change which is termed as mutations. Some mutations can cause
cancer. There are seven types of a gene whose mutation causes sarcoma cancer.
The study uses the dataset which has been taken from more than 134 samples and
includes 141 mutations in 8 driver genes. On these gene sequences, the RNN algorithms
Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bi-directional LSTM
(Bi-LSTM) are used for training. Rigorous testing techniques such as
self-consistency testing, independent set testing, and 10-fold cross-validation
are applied for the validation of results. These validation techniques yield
several metrics such as Area Under the Curve (AUC), sensitivity, specificity,
Matthews correlation coefficient, loss, and accuracy. The proposed algorithm
exhibits an accuracy of 99.6% with an AUC value of 1.00.
Affiliation(s)
- Asghar Ali Shah
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- Department of Computer Sciences, Bahria University Lahore Campus, Lahore, Pakistan
- Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan