1
|
Ye Y, Jiang H, Xu R, Wang S, Zheng L, Guo J. The INSIGHT platform: Enhancing NAD(P)-dependent specificity prediction for co-factor specificity engineering. Int J Biol Macromol 2024; 278:135064. [PMID: 39182884 DOI: 10.1016/j.ijbiomac.2024.135064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2024] [Revised: 08/22/2024] [Accepted: 08/23/2024] [Indexed: 08/27/2024]
Abstract
Enzyme specificity towards cofactors like NAD(P)H is crucial for applications in bioremediation and eco-friendly chemical synthesis. Despite their role in converting pollutants and creating sustainable products, predicting enzyme specificity faces challenges due to sparse data and inadequate models. To bridge this gap, we developed the cutting-edge INSIGHT platform to enhance the prediction of coenzyme specificity in NAD(P)-dependent enzymes. INSIGHT integrates extensive data from principal bioinformatics resources, concentrating on both NADH and NADPH specificities, and utilizes advanced protein language models to refine the predictions. This integration not only strengthens computational predictions but also meets the practical demands of high-throughput screening and optimization. Experimental validation confirms INSIGHT's effectiveness, boosting our ability to engineer enzymes for efficient, sustainable industrial and environmental processes. This work advances the practical use of computational tools in enzyme research, addressing industrial needs and offering scalable solutions for environmental challenges.
Collapse
Affiliation(s)
- Yilin Ye
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao
| | | | - Ran Xu
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao
| | - Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., China
| | | | - Jingjing Guo
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao.
| |
Collapse
|
2
|
Adnan A, Hongya W, Ali F, Khalid M, Alghushairy O, Alsini R. A bi-layer model for identification of piwiRNA using deep neural learning. J Biomol Struct Dyn 2024; 42:5725-5733. [PMID: 37608578 DOI: 10.1080/07391102.2023.2243523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2023] [Accepted: 06/15/2023] [Indexed: 08/24/2023]
Abstract
piwiRNA is a kind of non-coding RNA (ncRNA) that cannot be translated into proteins. It helps in understanding the study of gametes generation and regulation of gene expression over both transcriptional and post-transcriptional levels. piwiRNA has the function of instructing deadenylation, animal fertility, silencing transposons, fighting viruses, and regulating endogenous genes. Due to the great significance of piwiRNA, prediction of piwiRNA is essential for crucial cellular functions. Several predictors were established for prediction of piwiRNA. However, improving the prediction of piwiRNA is highly desirable. In the current study, we developed a more promising predictor named, BLP-piwiRNA. The features are explored by reverse complement k-mer, gapped-k-mer composition, and k-mer composition. The feature set of all descriptors is fused and the best features are selected by cascade and relief feature selection strategies. The best feature sets are provided to random forest (RF), deep neural network (DNN), and support vector machine (SVM). The models validation are examined by 10-fold test. DNN with optimal features of Cascade feature selection approach secured the highest prediction results. The results illustrate that BLP-piwiRNA effectively outperforms the existing studies. The proposed approach would be beneficial for both research community and drug development industry. BLP-piwiRNA would serve as novel biomarkers and therapeutic targets for tumor diagnostics and treatment.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Adnan Adnan
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Wang Hongya
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Farman Ali
- Department of Software Engineering, Sarhad University of Science and Information Technology, Peshawar, Pakistan
| | - Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| |
Collapse
|
3
|
Ali F, Almuhaimeed A, Khalid M, Alshanbari H, Masmoudi A, Alsini R. DEEP-EP: Identification of epigenetic protein by ensemble residual convolutional neural network for drug discovery. Methods 2024; 226:49-53. [PMID: 38621436 DOI: 10.1016/j.ymeth.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Revised: 04/06/2024] [Accepted: 04/08/2024] [Indexed: 04/17/2024] Open
Abstract
Epigenetic proteins (EP) play a role in the progression of a wide range of diseases, including autoimmune disorders, neurological disorders, and cancer. Recognizing their different functions has prompted researchers to investigate them as potential therapeutic targets and pharmacological targets. This paper proposes a novel deep learning-based model that accurately predicts EP. This study introduces a novel deep learning-based model that accurately predicts EP. Our approach entails generating two distinct datasets for training and evaluating the model. We then use three distinct strategies to transform protein sequences to numerical representations: Dipeptide Deviation from Expected Mean (DDE), Dipeptide Composition (DPC), and Group Amino Acid (GAAC). Following that, we train and compare the performance of four advanced deep learning models algorithms: Ensemble Residual Convolutional Neural Network (ERCNN), Generative Adversarial Network (GAN), Convolutional Neural Network (CNN), and Gated Recurrent Unit (GRU). The DDE encoding combined with the ERCNN model demonstrates the best performance on both datasets. This study demonstrates deep learning's potential for precisely predicting EP, which can considerably accelerate research and streamline drug discovery efforts. This analytical method has the potential to find new therapeutic targets and advance our understanding of EP activities in disease.
Collapse
Affiliation(s)
- Farman Ali
- Department of Computer Science, Bahria University Islamabad Campus, Pakistan.
| | - Abdullah Almuhaimeed
- Digital Health Institute, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
| | - Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah 21955, Saudi Arabia
| | - Hanan Alshanbari
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah 21955, Saudi Arabia
| | - Atef Masmoudi
- College of Computer Science, King Khalid University, Abha, Saudi Arabia
| | - Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
Collapse
|
4
|
Khalid M, Ali F, Alghamdi W, Alzahrani A, Alsini R, Alzahrani A. An ensemble computational model for prediction of clathrin protein by coupling machine learning with discrete cosine transform. J Biomol Struct Dyn 2024:1-9. [PMID: 38498362 DOI: 10.1080/07391102.2024.2329777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Accepted: 02/19/2024] [Indexed: 03/20/2024]
Abstract
Clathrin protein (CP) plays a pivotal role in numerous cellular processes, including endocytosis, signal transduction, and neuronal function. Dysregulation of CP has been associated with a spectrum of diseases. Given its involvement in various cellular functions, CP has garnered significant attention for its potential applications in drug design and medicine, ranging from targeted drug delivery to addressing viral infections, neurological disorders, and cancer. The accurate identification of CP is crucial for unraveling its function and devising novel therapeutic strategies. Computational methods offer a rapid, cost-effective, and less labor-intensive alternative to traditional identification methods, making them especially appealing for high-throughput screening. This paper introduces CL-Pred, a novel computational method for CP identification. CL-Pred leverages three feature descriptors: Dipeptide Deviation from Expected Mean (DDE), Bigram Position Specific Scoring Matrix (BiPSSM), and Position Specific Scoring Matrix-Tetra Slice-Discrete Cosine Transform (PSSM-TS-DCT). The model is trained using three classifiers: Support Vector Machine (SVM), Extremely Randomized Tree (ERT), and Light eXtreme Gradient Boosting (LiXGB). Notably, the LiXGB-based model achieves outstanding performance, demonstrating accuracies of 94.63% and 93.65% on the training and testing datasets, respectively. The proposed CL-Pred method is poised to significantly advance our comprehension of clathrin-mediated endocytosis, cellular physiology, and disease pathogenesis. Furthermore, it holds promise for identifying potential drug targets across a spectrum of diseases.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Farman Ali
- Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Mardan, Pakistan
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Abdulrahman Alzahrani
- Department of Information System and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ahmed Alzahrani
- College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| |
Collapse
|
5
|
Alsini R, Almuhaimeed A, Ali F, Khalid M, Farrash M, Masmoudi A. Deep-VEGF: deep stacked ensemble model for prediction of vascular endothelial growth factor by concatenating gated recurrent unit with two-dimensional convolutional neural network. J Biomol Struct Dyn 2024:1-11. [PMID: 38450715 DOI: 10.1080/07391102.2024.2323144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2023] [Accepted: 02/16/2024] [Indexed: 03/08/2024]
Abstract
Vascular endothelial growth factor (VEGF) is involved in the development and progression of various diseases, including cancer, diabetic retinopathy, macular degeneration and arthritis. Understanding the role of VEGF in various disorders has led to the development of effective treatments, including anti-VEGF drugs, which have significantly improved therapeutic methods. Accurate VEGF identification is critical, yet experimental identification is expensive and time-consuming. This study presents Deep-VEGF, a novel computational model for VEGF prediction based on deep-stacked ensemble learning. We formulated two datasets using primary sequences. A novel feature descriptor named K-Space Tri Slicing-Bigram position-specific scoring metrix (KSTS-BPSSM) is constructed to extract numerical features from primary sequences. The model training is performed by deep learning techniques, including gated recurrent unit (GRU), generative adversarial network (GAN) and convolutional neural network (CNN). The GRU and CNN are ensembled using stacking learning approach. KSTS-BPSSM-based ensemble model secured the most accurate predictive outcomes, surpassing other competitive predictors across both training and testing datasets. This demonstrates the potential of leveraging deep learning for accurate VEGF prediction as a powerful tool to accelerate research, streamline drug discovery and uncover novel therapeutic targets. This insightful approach holds promise for expanding our knowledge of VEGF's role in health and disease.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Abdullah Almuhaimeed
- Digital Health Institute, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
| | - Farman Ali
- Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Pakistan
| | - Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Majed Farrash
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Atef Masmoudi
- College of Computer Science, King Khalid University, Abha, Saudi Arabia
| |
Collapse
|
6
|
Alghushairy O, Ali F, Alghamdi W, Khalid M, Alsini R, Asiry O. Machine learning-based model for accurate identification of druggable proteins using light extreme gradient boosting. J Biomol Struct Dyn 2023:1-12. [PMID: 37850427 DOI: 10.1080/07391102.2023.2269280] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The identification of druggable proteins (DPs) is significant for the development of new drugs, personalized medicine, understanding of disease mechanisms, drug repurposing, and economic benefits. By identifying new druggable targets, researchers can develop new therapies for a range of diseases, leading to better patient outcomes. Identification of DPs by machine learning strategies is more efficient and cost-effective than conventional methods. In this study, a computational predictor, namely Drug-LXGB, is introduced to enhance the identification of DPs. Features are discovered by composition, transition, and distribution (CTD), composition of K-spaced amino acid pair (CKSAAP), pseudo-position-specific scoring matrix (PsePSSM), and a novel descriptor, called multi-block pseudo amino acid composition (MB-PseAAC). The dimensions of CTD, CKSAAP, PsePSSM, and MB-PseAAC are integrated and utilized the sequential forward selection as feature selection algorithm. The best characteristics are provided by random forest, extreme gradient boosting, and light eXtreme gradient boosting (LXGB). The predictive analysis of these learning methods is measured via 10-fold cross-validation. The LXGB-based model secures the highest results than other existing predictors. Our novel protocol will perform an active role in designing novel drugs and would be fruitful to explore the potential target. This study will help better to capture a more universal view of a potential target.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Farman Ali
- Department of Software Engineering, Sarhad University of Science and Information Technology Peshawar Mardan Campus, Peshawar, Pakistan
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Othman Asiry
- Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
| |
Collapse
|
7
|
Ali F, Alghamdi W, Almagrabi AO, Alghushairy O, Banjar A, Khalid M. Deep-AGP: Prediction of angiogenic protein by integrating two-dimensional convolutional neural network with discrete cosine transform. Int J Biol Macromol 2023; 243:125296. [PMID: 37301349 DOI: 10.1016/j.ijbiomac.2023.125296] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2023] [Revised: 06/05/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023]
Abstract
Angiogenic proteins (AGPs) play a primary role in the formation of new blood vessels from pre-existing ones. AGPs have diverse applications in cancer, including serving as biomarkers, guiding anti-angiogenic therapies, and aiding in tumor imaging. Understanding the role of AGPs in cardiovascular and neurodegenerative diseases is vital for developing new diagnostic tools and therapeutic approaches. Considering the significance of AGPs, in this research, we first time established a computational model using deep learning for identifying AGPs. First, we constructed a sequence-based dataset. Second, we explored features by designing a novel feature encoder, called position-specific scoring matrix-decomposition-discrete cosine transform (PSSM-DC-DCT) and existing descriptors including Dipeptide Deviation from Expected Mean (DDE) and bigram-position-specific scoring matrix (Bi-PSSM). Third, each feature set is fed into two-dimensional convolutional neural network (2D-CNN) and machine learning classifiers. Finally, the performance of each learning model is validated by 10-fold cross-validation (CV). The experimental results demonstrate that 2D-CNN with proposed novel feature descriptor achieved the highest success rate on both training and testing datasets. In addition to being an accurate predictor for identification of angiogenic proteins, our proposed method (Deep-AGP) might be fruitful in understanding cancer, cardiovascular, and neurodegenerative diseases, development of their novel therapeutic methods and drug designing.
Collapse
Affiliation(s)
- Farman Ali
- Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Pakistan.
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Alaa Omran Almagrabi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia.
| | - Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Ameen Banjar
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah 21955, Saudi Arabia
| |
Collapse
|
8
|
Ali F, Kumar H, Alghamdi W, Kateb FA, Alarfaj FK. Recent Advances in Machine Learning-Based Models for Prediction of Antiviral Peptides. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING : STATE OF THE ART REVIEWS 2023; 30:1-12. [PMID: 37359746 PMCID: PMC10148704 DOI: 10.1007/s11831-023-09933-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/05/2023] [Accepted: 04/19/2023] [Indexed: 06/28/2023]
Abstract
Viruses have killed and infected millions of people across the world. It causes several chronic diseases like COVID-19, HIV, and hepatitis. To cope with such diseases and virus infections, antiviral peptides (AVPs) have been applied in the design of drugs. Keeping in view the significant role in pharmaceutical industry and other research fields, identification of AVPs is highly indispensable. In this connection, experimental and computational methods were proposed to identify AVPs. However, more accurate predictors for boosting AVPs identification are highly desirable. This work presents a thorough study and reports the available predictors of AVPs. We explained applied datasets, feature representation approaches, classification algorithms, and evaluation parameters of performance. In this study, the limitations of the existing studies and the best methods were emphasized. Provided the pros and cons of the applied classifiers. The future insights demonstrate efficient feature encoding approaches, best feature optimization schemes, and effective classification techniques that can improve the performance of novel method for accurate prediction of AVPs.
Collapse
Affiliation(s)
- Farman Ali
- Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Khyber Pakhtunkhwa, Pakistan
| | - Harish Kumar
- Department of Computer Science, College of Computer Science, King Khalid University, Abha, Saudi Arabia
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589 Saudi Arabia
| | - Faris A. Kateb
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589 Saudi Arabia
| | - Fawaz Khaled Alarfaj
- Department of Management Information Systems, King Faisal University, Hufof, Saudi Arabia
| |
Collapse
|
9
|
Khan A, Uddin J, Ali F, Kumar H, Alghamdi W, Ahmad A. AFP-SPTS: An Accurate Prediction of Antifreeze Proteins Using Sequential and Pseudo-Tri-Slicing Evolutionary Features with an Extremely Randomized Tree. J Chem Inf Model 2023; 63:826-834. [PMID: 36649569 DOI: 10.1021/acs.jcim.2c01417] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
The development of intracellular ice in the bodies of cold-blooded living organisms may cause them to die. These species yield antifreeze proteins (AFPs) to live in subzero temperature environments. Additionally, AFPs are implemented in biotechnological, industrial, agricultural, and medical fields. Machine learning-based predictors were presented for AFP identification. However, more accurate predictors are still highly desirable for boosting the AFP prediction. This work presents a novel approach, named AFP-SPTS, for the correct prediction of AFPs. We explored the discriminative features with four schemes, namely, dipeptide deviation from the expected mean (DDE), reduced amino acid alphabet (RAAA), grouped dipeptide composition (GDPC), and a novel representative method, called pseudo-position-specific scoring matrix tri-slicing (PseTS-PSSM). Considering the advantages of ensemble learning strategy, we fused each feature vector into different combinations and trained the models with five machine learning algorithms, i.e., multilayer perceptron (MLP), extremely randomized tree (ERT), decision tree (DT), random forest (RF), and AdaBoost. Among all models, PseTS-PSSM + RAAA with an extremely randomized tree attained the best outcomes. The proposed predictor (AFP-SPTS) boosted the accuracies of AFPs in the literature by 1.82 and 4.1%.
Collapse
Affiliation(s)
- Adnan Khan
- Qurtuba University of Science and Information Technology, Peshawar5000, Khyber Pakhtunkhwa, Pakistan
| | - Jamal Uddin
- Qurtuba University of Science and Information Technology, Peshawar5000, Khyber Pakhtunkhwa, Pakistan
| | - Farman Ali
- Sarhad University of Science and Information Technology, Mardan Campus, Peshawar23200, Pakistan.,Department of Elementary and Secondary Education Department, Government of Khyber Pakhtunkhwa, Peshawar5000, Khyber Pakhtunkhwa, Pakistan
| | - Harish Kumar
- Department of Computer Science, College of Computer Science, King Khalid University, Abha61421, Saudi Arabia
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King AbdulAziz University, Jeddah21589, Saudi Arabia
| | - Aftab Ahmad
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan23200, Pakistan
| |
Collapse
|
10
|
Ali Z, Alturise F, Alkhalifah T, Khan YD. IGPred-HDnet: Prediction of Immunoglobulin Proteins Using Graphical Features and the Hierarchal Deep Learning-Based Approach. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023; 2023:2465414. [PMID: 36744119 PMCID: PMC9891831 DOI: 10.1155/2023/2465414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Revised: 09/16/2022] [Accepted: 10/12/2022] [Indexed: 01/26/2023]
Abstract
Motivation. Immunoglobulin proteins (IGP) (also called antibodies) are glycoproteins that act as B-cell receptors against external or internal antigens like viruses and bacteria. IGPs play a significant role in diverse cellular processes ranging from adhesion to cell recognition. IGP identifications via the in-silico approach are faster and more cost-effective than wet-lab technological methods. Methods. In this study, we developed an intelligent theoretical deep learning framework, "IGPred-HDnet" for the discrimination of IGPs and non-IGPs. Three types of promising descriptors are feature extraction based on graphical and statistical features (FEGS), amphiphilic pseudo-amino acid composition (Amp-PseAAC), and dipeptide composition (DPC) to extract the graphical, physicochemical, and sequential features. Next, the extracted attributes are evaluated through machine learning, i.e., decision tree (DT), support vector machine (SVM), k-nearest neighbour (KNN), and hierarchical deep network (HDnet) classifiers. The proposed predictor IGPred-HDnet was trained and tested using a 10-fold cross-validation and independent test. Results and Conclusion. The success rates in terms of accuracy (ACC) and Matthew's correlation coefficient (MCC) of IGPred-HDnet on training and independent dataset (Dtrain Dtest) are ACC = 98.00%, 99.10%, and MCC = 0.958, and 0.980 points, respectively. The empirical outcomes demonstrate that the IGPred-HDnet model efficacy on both datasets using the novel FEGS feature and HDnet algorithm achieved superior predictions to other existing computational models. We hope this research will provide great insights into the large-scale identification of IGPs and pharmaceutical companies in new drug design.
Collapse
Affiliation(s)
- Zakir Ali
- Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
| | - Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
| | - Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
| | - Yaser Daanial Khan
- Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
| |
Collapse
|
11
|
Khan A, Uddin J, Ali F, Ahmad A, Alghushairy O, Banjar A, Daud A. Prediction of antifreeze proteins using machine learning. Sci Rep 2022; 12:20672. [PMID: 36450775 PMCID: PMC9712683 DOI: 10.1038/s41598-022-24501-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2022] [Accepted: 11/16/2022] [Indexed: 12/03/2022] Open
Abstract
Living organisms including fishes, microbes, and animals can live in extremely cold weather. To stay alive in cold environments, these species generate antifreeze proteins (AFPs), also referred to as ice-binding proteins. Moreover, AFPs are extensively utilized in many important fields including medical, agricultural, industrial, and biotechnological. Several predictors were constructed to identify AFPs. However, due to the sequence and structural heterogeneity of AFPs, correct identification is still a challenging task. It is highly desirable to develop a more promising predictor. In this research, a novel computational method, named AFP-LXGB has been proposed for prediction of AFPs more precisely. The information is explored by Dipeptide Composition (DPC), Grouped Amino Acid Composition (GAAC), Position Specific Scoring Matrix-Segmentation-Autocorrelation Transformation (Sg-PSSM-ACT), and Pseudo Position Specific Scoring Matrix Tri-Slicing (PseTS-PSSM). Keeping the benefits of ensemble learning, these feature sets are concatenated into different combinations. The best feature set is selected by Extremely Randomized Tree-Recursive Feature Elimination (ERT-RFE). The models are trained by Light eXtreme Gradient Boosting (LXGB), Random Forest (RF), and Extremely Randomized Tree (ERT). Among classifiers, LXGB has obtained the best prediction results. The novel method (AFP-LXGB) improved the accuracies by 3.70% and 4.09% than the best methods. These results verified that AFP-LXGB can predict AFPs more accurately and can participate in a significant role in medical, agricultural, industrial, and biotechnological fields.
Collapse
Affiliation(s)
- Adnan Khan
- grid.444994.00000 0004 0609 284XQurtuba University of Science and Technology, Peshawar, Khyber Pakhtunkhwa Pakistan
| | - Jamal Uddin
- grid.444994.00000 0004 0609 284XQurtuba University of Science and Technology, Peshawar, Khyber Pakhtunkhwa Pakistan
| | - Farman Ali
- Department of Elementary and Secondary Education, Peshawar, Khyber Pakhtunkhwa Pakistan ,grid.444996.20000 0004 0609 292XSarhad University of Science and Information Technology, Mardan, Pakistan
| | - Ashfaq Ahmad
- grid.440522.50000 0004 0478 6450Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Omar Alghushairy
- grid.460099.2Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Ameen Banjar
- grid.460099.2Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Ali Daud
- Abu Dhabi School of Management, Abu Dhabi, United Arab Emirates ,grid.460099.2Department of Computer Science and Artificial Intelligence, University of Jeddah, Jeddah, Saudi Arabia
| |
Collapse
|
12
|
DBP-iDWT: Improving DNA-Binding Proteins Prediction Using Multi-Perspective Evolutionary Profile and Discrete Wavelet Transform. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2987407. [PMID: 36211019 PMCID: PMC9534628 DOI: 10.1155/2022/2987407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 08/19/2022] [Accepted: 09/09/2022] [Indexed: 11/17/2022]
Abstract
DNA-binding proteins (DBPs) have crucial biotic activities including DNA replication, recombination, and transcription. DBPs are highly concerned with chronic diseases and are used in the manufacturing of antibiotics and steroids. A series of predictors were established to identify DBPs. However, researchers are still working to further enhance the identification of DBPs. This research designed a novel predictor to identify DBPs more accurately. The features from the sequences are transformed by F-PSSM (Filtered position-specific scoring matrix), PSSM-DPC (Position specific scoring matrix-dipeptide composition), and R-PSSM (Reduced position-specific scoring matrix). To eliminate the noisy attributes, we extended DWT (discrete wavelet transform) to F-PSSM, PSSM-DPC, and R-PSSM and introduced three novel descriptors, namely, F-PSSM-DWT, PSSM-DPC-DWT, and R-PSSM-DWT. Onward, the training of the four models were performed using LiXGB (Light eXtreme gradient boosting), XGB (eXtreme gradient boosting, ERT (extremely randomized trees), and Adaboost. LiXGB with R-PSSM-DWT has attained 6.55% higher accuracy on training and 5.93% on testing dataset than the best existing predictors. The results reveal the excellent performance of our novel predictor over the past studies. DBP-iDWT would be fruitful for establishing more operative therapeutic strategies for fatal disease treatment.
Collapse
|
13
|
Ali F, Kumar H, Patil S, Ahmad A, Babour A, Daud A. Deep-GHBP: Improving prediction of Growth Hormone-binding proteins using deep learning model. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103856] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
14
|
Ali F, Kumar H, Patil S, Kotecha K, Banjar A, Daud A. Target-DBPPred: An intelligent model for prediction of DNA-binding proteins using discrete wavelet transform based compression and light eXtreme gradient boosting. Comput Biol Med 2022; 145:105533. [PMID: 35447463 DOI: 10.1016/j.compbiomed.2022.105533] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2022] [Revised: 04/11/2022] [Accepted: 04/13/2022] [Indexed: 11/03/2022]
Abstract
DNA-protein interaction is a critical biological process that performs influential activities, including DNA transcription and recombination. DBPs (DNA-binding proteins) are closely associated with different kinds of human diseases (asthma, cancer, and AIDS), while some of the DBPs are used in the production of antibiotics, steroids, and anti-inflammatories. Several methods have been reported for the prediction of DBPs. However, a more intelligent method is still highly desirable for the accurate prediction of DBPs. This study presents an intelligent computational method, Target-DBPPred, to improve DBPs prediction. Important features from primary protein sequences are investigated via a novel feature descriptor, called EDF-PSSM-DWT (Evolutionary difference formula position-specific scoring matrix-discrete wavelet transform) and several other multi-evolutionary methods, including F-PSSM (Filtered position-specific scoring matrix), EDF-PSSM (Evolutionary difference formula position-specific scoring matrix), PSSM-DPC (Position-specific scoring matrix-dipeptide composition), and Lead-BiPSSM (Lead-bigram-position specific scoring matrix) to encapsulate diverse multivariate features. The best feature set from the features of each descriptor is selected using sequential forward selection (SFS). Further, four models are trained using Adaboost, XGB (eXtreme gradient boosting), ERT (extremely randomized trees), and LiXGB (Light eXtreme gradient boosting) classifiers. LiXGB, with the best feature set of EDF-PSSM-DWT, has attained 6.69% and 15.07% higher performance in terms of accuracies using training and testing datasets, respectively. The obtained results verify the improved performance of our proposed predictor over the existing predictors.
Collapse
Affiliation(s)
- Farman Ali
- Department of Elementary and Secondary Education, Peshawar, Khyber Pakhtunkhwa, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
| | - Harish Kumar
- Department of Computer Science, College of Computer Science, King Khalid University, Abha, Saudi Arabia
| | - Shruti Patil
- Symbiosis Institute of Technology, Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International University, Pune, India
| | - Ketan Kotecha
- Symbiosis Institute of Technology, Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International University, Pune, India.
| | - Ameen Banjar
- Department of Information Systems, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Ali Daud
- Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province, School of Information Engineering, Zhejiang Ocean University, Zhoushan, 316022, China; Department of Computer Science and Artificial Intelligence, University of Jeddah, Jeddah, Saudi Arabia.
| |
Collapse
|