1. Nagm AM, Moussa MM, Shoitan R, Ali A, Mashhour M, Salama AS, AbdulWakel HI. Detecting image manipulation with ELA-CNN integration: a powerful framework for authenticity verification. PeerJ Comput Sci 2024; 10:e2205. [PMID: 39145198 PMCID: PMC11323046 DOI: 10.7717/peerj-cs.2205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 06/26/2024] [Indexed: 08/16/2024]
Abstract
The rapid progress of image editing software has contributed to a sharp rise in the production of fake images. Consequently, various techniques and approaches have been developed to detect manipulated images. These methods aim to distinguish genuine from altered images and thereby combat the proliferation of deceptive visual content. However, further advances are needed to improve their accuracy and precision. Therefore, this research proposes an image forgery detection algorithm that integrates error level analysis (ELA) and a convolutional neural network (CNN). The system primarily focuses on detecting copy-move and splicing forgeries in images. The input image is fed to the ELA algorithm to identify regions within the image that have different compression levels. Afterward, the resulting ELA images are used as input to train the proposed CNN model. The CNN model is constructed from two consecutive convolution layers, followed by one max pooling layer and two dense layers. Two dropout layers are inserted between the layers to improve model generalization. The experiments were conducted on the CASIA 2 dataset, and the simulation results show that the proposed algorithm achieves a training accuracy of 99.05%, testing accuracy of 94.14%, precision of 94.1%, and recall of 94.07%. Notably, it outperforms state-of-the-art techniques in both accuracy and precision.
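As a rough illustration of the layer stack described in this abstract (two convolution layers, one max pooling layer, two dense layers), the Python sketch below traces tensor shapes through such a network. The input size, kernel size, filter count, and dense widths are assumptions chosen only for illustration; the abstract does not state them, and this is not the authors' implementation. Dropout layers are left out because they do not change shapes.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, channels=3, filters=32, kernel=5,
                 hidden_units=256, num_classes=2):
    """Return (layer name, output shape) pairs for the assumed layer stack.

    All default values are hypothetical; only the layer order follows
    the abstract (conv, conv, max pool, dense, dense).
    """
    shapes = [("input", (height, width, channels))]
    h, w = height, width
    for name in ("conv1", "conv2"):
        # Unpadded convolution with stride 1 shrinks each side by kernel - 1.
        h, w = conv_out(h, kernel), conv_out(w, kernel)
        shapes.append((name, (h, w, filters)))
    # 2x2 max pooling with stride 2 halves each spatial dimension.
    h, w = h // 2, w // 2
    shapes.append(("maxpool", (h, w, filters)))
    shapes.append(("flatten", (h * w * filters,)))
    shapes.append(("dense1", (hidden_units,)))
    shapes.append(("dense2", (num_classes,)))
    return shapes


if __name__ == "__main__":
    for layer, shape in trace_shapes(128, 128):
        print(f"{layer:8s} {shape}")
```

With the assumed 128 x 128 RGB input, the flattened feature vector feeding the first dense layer has 60 * 60 * 32 = 115200 elements, which shows why the first dense layer would dominate the parameter count in a network of this shape.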
Affiliation(s)
- Ahmad M. Nagm
  - Department of Computer Engineering and Electronics, Cairo Higher Institute for Engineering, Computer Science and Management, Cairo, Egypt
- Mona M. Moussa
  - Computer and Systems Department, Electronics Research Institute, Cairo, Egypt
- Rasha Shoitan
  - Computer and Systems Department, Electronics Research Institute, Cairo, Egypt
- Ahmed Ali
  - Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
  - Computer Science, Higher Future Institute for Specialized Technological Studies, Cairo, Egypt
- Mohamed Mashhour
  - Department of Computer Engineering and Electronics, Cairo Higher Institute for Engineering, Computer Science and Management, Cairo, Egypt
  - Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt
- Ahmed S. Salama
  - Department of Computer Engineering and Electronics, Cairo Higher Institute for Engineering, Computer Science and Management, Cairo, Egypt
  - Electrical Engineering Department, Faculty of Engineering & Technology, Future University in Egypt, New Cairo, Egypt
- Hamada I. AbdulWakel
  - Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt

2. Bukhari SNH, Ogudo KA. Hybrid Predictive Machine Learning Model for the Prediction of Immunodominant Peptides of Respiratory Syncytial Virus. Bioengineering (Basel) 2024; 11:791. [PMID: 39199749 PMCID: PMC11351268 DOI: 10.3390/bioengineering11080791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 07/26/2024] [Accepted: 08/02/2024] [Indexed: 09/01/2024] Open
Abstract
Respiratory syncytial virus (RSV) is a common respiratory pathogen that infects the human lungs and respiratory tract, often causing symptoms similar to the common cold. Vaccination is the most effective strategy for managing viral outbreaks. Currently, extensive efforts are focused on developing a vaccine for RSV. Traditional vaccine design typically involves using an attenuated form of the pathogen to elicit an immune response. In contrast, peptide-based vaccines (PBVs) aim to identify and chemically synthesize specific immunodominant peptides (IPs), known as T-cell epitopes (TCEs), to induce a targeted immune response. Despite their potential for enhancing vaccine safety and immunogenicity, PBVs have received comparatively less attention. Identifying IPs for PBV design through conventional wet-lab experiments is challenging, costly, and time-consuming. Machine learning (ML) techniques offer a promising alternative, accurately predicting TCEs and significantly reducing the time and cost of vaccine development. This study proposes the development and evaluation of eight hybrid ML predictive models created through the permutations and combinations of two classification methods, two feature weighting techniques, and two feature selection algorithms, all aimed at predicting the TCEs of RSV. The models were trained using the experimentally determined TCEs and non-TCE sequences acquired from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) repository. The hybrid model composed of the XGBoost (XGB) classifier, chi-squared (ChST) weighting technique, and backward search (BST) as the optimal feature selection algorithm (ChST-BST-XGB) was identified as the best model, achieving an accuracy, sensitivity, specificity, F1 score, AUC, precision, and MCC of 97.10%, 0.98, 0.97, 0.98, 0.99, 0.99, and 0.96, respectively. 
Additionally, K-fold cross-validation (KFCV) was performed to ensure the model's reliability and an average accuracy of 97.21% was recorded for the ChST-BST-XGB model. The results indicate that the hybrid XGBoost model consistently outperforms other hybrid approaches. The epitopes predicted by the proposed model may serve as promising vaccine candidates for RSV, subject to in vitro and in vivo scientific assessments. This model can assist the scientific community in expediting the screening of active TCE candidates for RSV, ultimately saving time and resources in vaccine development.
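The count of eight hybrid models follows from crossing two classifiers, two feature weighting techniques, and two feature selection algorithms (2 x 2 x 2 = 8). A minimal Python sketch of that enumeration is given below. Only XGBoost, chi-squared weighting, and backward search are named in the abstract; the other three component names are placeholders introduced here, not the authors' choices.

```python
from itertools import product

# Named in the abstract: XGBoost, chi-squared weighting, backward search.
# The "*_B" entries are hypothetical placeholders for the unnamed components.
classifiers = ["XGB", "classifier_B"]
weightings = ["ChST", "weighting_B"]
selectors = ["BST", "selector_B"]

# Each hybrid model is one (weighting, selection, classifier) combination,
# labelled in the same order as the abstract's "ChST-BST-XGB".
hybrids = [
    f"{weighting}-{selector}-{classifier}"
    for classifier, weighting, selector in product(classifiers, weightings, selectors)
]

if __name__ == "__main__":
    for name in hybrids:
        print(name)
```

The resulting list contains eight labels, one of which is the `ChST-BST-XGB` combination that the abstract reports as the best model.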
Affiliation(s)
- Syed Nisar Hussain Bukhari
  - National Institute of Electronics and Information Technology (NIELIT), Ministry of Electronics and Information Technology (MeitY), Government of India, Srinagar 191132, India
- Kingsley A. Ogudo
  - Department of Electrical & Electronics Engineering, Faculty of Engineering and the Built Environment, University of Johannesburg, Johannesburg 0524, South Africa

3. Fu Y, Guo L, Huang F. A lightweight CNN model for pepper leaf disease recognition in a human palm background. Heliyon 2024; 10:e33447. [PMID: 39027426 PMCID: PMC11254718 DOI: 10.1016/j.heliyon.2024.e33447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 06/20/2024] [Accepted: 06/21/2024] [Indexed: 07/20/2024] Open
Abstract
The identification of pepper leaf diseases is crucial for ensuring the safety and quality of pepper yield. However, existing methods heavily rely on manual diagnosis, resulting in inefficiencies and inaccuracies. In this study, we propose a lightweight convolutional neural network (CNN) model for recognizing pepper leaf diseases and subsequently develop an application based on this model. To begin with, we acquired various images depicting healthy leaves as well as leaves affected by viral diseases, brown spots, and leaf mold. It is noteworthy that these images were captured against a background of human palms, which is commonly encountered in field conditions. The proposed CNN model adopts the GGM-VGG16 architecture, incorporating Ghost modules, global average pooling, and multi-scale convolution. Following training with the collected image dataset, the model was deployed on a mobile terminal, where an application for pepper leaf disease recognition was developed using Android Studio. Experimental results indicate that the proposed model achieved 100% accuracy on images with a human palm background, while also demonstrating satisfactory performance on images with other backgrounds, achieving an accuracy of 87.38%. Furthermore, the developed application has a compact size of only 12.84 MB and exhibits robust performance in recognizing pepper leaf diseases.
Affiliation(s)
- Youyao Fu
  - School of Electronic & Information Engineering, Taizhou University, Taizhou, 318000, China
  - Nuclear Technology Application Engineering Research Center of the Ministry of Education, Nanchang, 330013, China
- Linsheng Guo
  - School of Geophysics and Measurement-Control Technology, East China University of Technology, Nanchang, 330013, China
- Fang Huang
  - School of Electronic & Information Engineering, Taizhou University, Taizhou, 318000, China

4. Bukhari SNH, Elshiekh E, Abbas M. Physicochemical properties-based hybrid machine learning technique for the prediction of SARS-CoV-2 T-cell epitopes as vaccine targets. PeerJ Comput Sci 2024; 10:e1980. [PMID: 38686005 PMCID: PMC11057572 DOI: 10.7717/peerj-cs.1980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 03/15/2024] [Indexed: 05/02/2024]
Abstract
The majority of existing SARS-CoV-2 vaccines work by presenting the whole pathogen in attenuated form to the immune system to invoke an immune response. In contrast, a peptide-based vaccine (PBV) relies on identifying and chemically synthesizing only the immunodominant peptides, known as T-cell epitopes (TCEs), to induce a specific immune response against a particular pathogen. However, PBVs have received less attention despite holding huge untapped potential for boosting vaccine safety and immunogenicity. Identifying these TCEs for PBV design through wet-lab experiments is difficult, expensive, and time-consuming. Machine learning (ML) techniques can accurately predict TCEs, saving time and cost for speedy vaccine development. This work proposes novel hybrid ML techniques based on the physicochemical properties of peptides to predict SARS-CoV-2 TCEs. The proposed hybrid ML technique was evaluated using various ML model evaluation metrics and demonstrated promising results. The hybrid technique combining a decision tree classifier with the chi-squared feature weighting technique and the forward search optimal feature searching algorithm was identified as the best model, with an accuracy of 98.19%. Furthermore, K-fold cross-validation (KFCV) was performed to ensure that the model is reliable, and the results indicate that the hybrid random forest model performs consistently well in terms of accuracy with respect to other hybrid approaches. The predicted TCEs are highly likely to serve as promising vaccine targets, subject to evaluations both in vivo and in vitro. This development could potentially save countless lives globally, prevent future epidemic-scale outbreaks, and reduce the risk of mutation escape.
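K-fold cross-validation, as mentioned in this abstract, splits the samples into k folds, holds each fold out once for testing, and averages the per-fold scores. The Python sketch below shows only the index bookkeeping. The function names and the contiguous, unshuffled split are assumptions made for illustration; the abstract does not state k or how the folds were drawn.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. No shuffling is done here;
    in practice the data would usually be shuffled or stratified first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def mean_score(fold_scores):
    """Average of per-fold scores, e.g. per-fold accuracies."""
    return sum(fold_scores) / len(fold_scores)


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print("train:", train, "test:", test)
```

Every sample lands in exactly one test fold, so each model is scored on data it did not see during that fold's training run.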
Affiliation(s)
- Syed Nisar Hussain Bukhari
  - National Institute of Electronics and Information Technology (NIELIT), Srinagar, Jammu and Kashmir, India
- E. Elshiekh
  - Department of Radiological Sciences, College of Applied Medical Sciences, King Khalid University, Abha, Saudi Arabia
- Mohamed Abbas
  - Electrical Engineering Department, College of Engineering, King Khalid University, Abha, Saudi Arabia

5. Ozsahin DU, Ameen ZS, Hassan AS, Mubarak AS. Enhancing explainable SARS-CoV-2 vaccine development leveraging bee colony optimised Bi-LSTM, Bi-GRU models and bioinformatic analysis. Sci Rep 2024; 14:6737. [PMID: 38509174 PMCID: PMC10954760 DOI: 10.1038/s41598-024-55762-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 02/27/2024] [Indexed: 03/22/2024] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded RNA virus that caused the outbreak of the coronavirus disease 2019 (COVID-19). The COVID-19 outbreak has led to millions of deaths and economic losses globally. Vaccination is the most practical solution, but finding epitopes (antigenic peptide regions) in the SARS-CoV-2 proteome is challenging, costly, and time-consuming. Here, we propose a deep learning method based on standalone recurrent neural networks to predict epitopes from SARS-CoV-2 proteins easily. We optimised the standalone Bidirectional Long Short-Term Memory (Bi-LSTM) and Bidirectional Gated Recurrent Unit (Bi-GRU) with a bioinspired optimisation algorithm, namely, Bee Colony Optimization (BCO). The study shows that LSTM-based models, particularly BCO-Bi-LSTM, outperform all other models and achieve an accuracy of 0.92 and AUC of 0.944. To overcome the challenge of understanding the model predictions, explainable AI using the SHapley Additive exPlanations (SHAP) method was employed to explain how black-box models make decisions. Finally, the predicted epitopes led to the development of a multi-epitope vaccine. The multi-epitope vaccine effectiveness evaluation is based on vaccine toxicity, allergic response risk, and antigenic and biochemical characteristics using bioinformatic tools. The developed multi-epitope vaccine is non-toxic and highly antigenic. Codon adaptation, cloning, and gel electrophoresis assess genomic sequence, protein composition, expression and purification, while docking and IMMSIM servers simulate interactions and immunological response, respectively. These investigations provide a conceptual framework for developing a SARS-CoV-2 vaccine.
Affiliation(s)
- Dilber Uzun Ozsahin
  - Department of Medical Diagnostic Imaging, College of Health Science, University of Sharjah, Sharjah, UAE
  - Research Institute for Medical and Health Sciences, University of Sharjah, Sharjah, UAE
  - Operational Research Centre in Healthcare, Near East University, TRNC Mersin 10, Nicosia, 99138, Turkey
- Zubaida Said Ameen
  - Operational Research Centre in Healthcare, Near East University, TRNC Mersin 10, Nicosia, 99138, Turkey
  - Department of Biochemistry, Yusuf Maitama Sule University, Kano, Nigeria
- Abdurrahman Shuaibu Hassan
  - Department of Electrical Electronics and Automation Systems Engineering, Kampala International University, Kampala, Uganda
- Auwalu Saleh Mubarak
  - Operational Research Centre in Healthcare, Near East University, TRNC Mersin 10, Nicosia, 99138, Turkey
  - Department of Electrical Engineering, Aliko Dangote University of Science and Technology, Wudil, Kano, Nigeria

6. Beikzadeh B. Immunoinformatics design of novel multi-epitope vaccine against Trueperella Pyogenes using collagen adhesion protein, fimbriae, and pyolysin. Arch Microbiol 2024; 206:90. [PMID: 38315222 DOI: 10.1007/s00203-023-03814-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 12/02/2023] [Accepted: 12/22/2023] [Indexed: 02/07/2024]
Abstract
Trueperella pyogenes (T. pyogenes) is an opportunistic pathogen that causes infertility, mastitis, and metritis in animals. T. pyogenes is also a zoonotic pathogen and is considered a cause of economic loss in the livestock industry. Therefore, vaccine development is necessary. Using an immunoinformatics approach, this study aimed to construct a multi-epitope vaccine against T. pyogenes. The collagen adhesion protein, fimbriae, and pyolysin (PLO) sequences were initially retrieved. The HTL, CTL, and B cell epitopes were predicted. The vaccine was designed by joining these epitopes with linkers. To increase vaccine immunogenicity, profilin was added to the N-terminal of the vaccine construct. The antigenic features and safety of the vaccine model were investigated. Docking, molecular dynamics simulation of the vaccine with immune receptors, and immunological simulation were used to evaluate the vaccine's efficacy. The vaccine's sequence was then optimized for cloning. The vaccine construct was designed based on 18 epitopes of T. pyogenes. The computational tools validated the vaccine as non-allergenic, non-toxic, hydrophilic, and stable at different temperatures with acceptable antigenic features. The vaccine model had good affinity and stability to bovine TLR2, 4, and 5 as well as stimulation of IgM, IgG, IL-2, IFN-γ, and Th1 responses. This vaccine also increased long-lived memory cells, dendritic cells, and macrophage population. In addition, the sequence was codon-optimized and cloned into the E. coli K12 expression vector (pET-28a). For the first time, this study introduced a novel multi-epitope vaccine candidate based on collagen adhesion protein, fimbriae, and PLO of T. pyogenes. This vaccine is expected to stimulate an effective immune response to prevent T. pyogenes infection.
Affiliation(s)
- Babak Beikzadeh
  - Department of Cell and Molecular Biology & Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran

7. Halawani R, Buchert M, Chen YPP. Deep learning exploration of single-cell and spatially resolved cancer transcriptomics to unravel tumour heterogeneity. Comput Biol Med 2023; 164:107274. [PMID: 37506451 DOI: 10.1016/j.compbiomed.2023.107274] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/02/2023] [Revised: 07/03/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023]
Abstract
Tumour heterogeneity is one of the critical confounding aspects in decoding tumour growth. Malignant cells display variations in their gene transcription profiles and mutation spectra even when originating from a single progenitor cell. Single-cell and spatial transcriptomics sequencing have recently emerged as key technologies for unravelling tumour heterogeneity. Single-cell sequencing promotes individual cell-type identification through transcriptome-wide gene expression measurements of each cell. Spatial transcriptomics facilitates identification of cell-cell interactions and the structural organization of heterogeneous cells within a tumour tissue through associating spatial RNA abundance of cells at distinct spots in the tissue section. However, extracting features and analyzing single-cell and spatial transcriptomics data poses challenges. Single-cell transcriptome data is extremely noisy and its sparse nature and dropouts can lead to misinterpretation of gene expression and the misclassification of cell types. Deep learning predictive power can overcome data challenges, provide high-resolution analysis and enhance precision oncology applications that involve early cancer prognosis, diagnosis, patient survival estimation and anti-cancer therapy planning. In this paper, we provide a background to and review of the recent progress of deep learning frameworks to investigate tumour heterogeneity using both single-cell and spatial transcriptomics data types.
Affiliation(s)
- Raid Halawani
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Michael Buchert
  - School of Cancer Medicine, La Trobe University, Melbourne, Victoria, Australia
  - Olivia Newton-John Cancer Research Institute, Melbourne, Victoria, Australia
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia

8. Li F, Guo X, Bi Y, Jia R, Pitt ME, Pan S, Li S, Gasser RB, Coin LJ, Song J. Digerati - A multipath parallel hybrid deep learning framework for the identification of mycobacterial PE/PPE proteins. Comput Biol Med 2023; 163:107155. [PMID: 37356289 DOI: 10.1016/j.compbiomed.2023.107155] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Revised: 06/05/2023] [Accepted: 06/07/2023] [Indexed: 06/27/2023]
Abstract
The genome of Mycobacterium tuberculosis contains a relatively high percentage (10%) of genes that are poorly characterised because of their highly repetitive nature and high GC content. Some of these genes encode proteins of the PE/PPE family, which are thought to be involved in host-pathogen interactions, virulence, and disease pathogenicity. Members of this family are genetically divergent and challenging to both identify and classify using conventional computational tools. Thus, advanced in silico methods are needed to efficiently identify proteins of this family for subsequent functional annotation. In this study, we developed the first deep learning-based approach, termed Digerati, for the rapid and accurate identification of PE and PPE family proteins. Digerati was built upon a multipath parallel hybrid deep learning framework, which combines multi-layer convolutional neural networks with bidirectional long short-term memory and a self-attention module to effectively learn the higher-order feature representations of PE/PPE proteins. Empirical studies demonstrated that Digerati achieved a significantly better performance (∼18-20%) than alignment-based approaches, including BLASTP, PHMMER, and HHsuite, in both prediction accuracy and speed. Digerati is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE/PPE family members. The webserver and source codes of Digerati are publicly available at http://web.unimelb-bioinfortools.cloud.edu.au/Digerati/.
Affiliation(s)
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Yue Bi
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, 3800, Australia
- Runchang Jia
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Miranda E Pitt
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia
- Shirui Pan
  - School of Information and Communication Technology, Griffith University, QLD, 4222, Australia
- Shuqin Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Robin B Gasser
  - Melbourne Veterinary School, Faculty of Science, The University of Melbourne, VIC, 3010, Australia
- Lachlan Jm Coin
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, 3800, Australia

9. Wang L, Wang Y, Xuan C, Zhang B, Wu H, Gao J. Predicting potential microbe-disease associations based on multi-source features and deep learning. Brief Bioinform 2023; 24:bbad255. [PMID: 37406190 DOI: 10.1093/bib/bbad255] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/03/2023] [Revised: 05/30/2023] [Accepted: 06/20/2023] [Indexed: 07/07/2023] Open
Abstract
Studies have confirmed that the occurrence of many complex diseases in the human body is closely related to the microbial community, and microbes can affect tumorigenesis and metastasis by regulating the tumor microenvironment. However, there are still large gaps in the clinical observation of the microbiota in disease. Although biological experiments are accurate in identifying disease-associated microbes, they are also time-consuming and expensive. Computational models that effectively identify disease-related microbes can shorten this process and reduce capital and time costs. Based on this, in this paper, a model named DSAE_RF is presented to predict latent microbe-disease associations by combining multi-source features and deep learning. DSAE_RF calculates four similarities between microbes and diseases, which are then used as feature vectors for the disease-microbe pairs. Next, reliable negative samples are screened by k-means clustering, and a deep sparse autoencoder neural network is further used to extract effective features of the disease-microbe pairs. On this foundation, a random forest classifier is used to predict the associations between microbes and diseases. To assess the performance of the model in this paper, 10-fold cross-validation is implemented on the same dataset. As a result, the AUC and AUPR of the model are 0.9448 and 0.9431, respectively. Furthermore, we also conduct a variety of experiments, including comparison of negative sample selection methods, comparison with different models and classifiers, Kolmogorov-Smirnov test and t-test, ablation experiments, robustness analysis, and case studies on Covid-19 and colorectal cancer. The results demonstrate the reliability and availability of our model.
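The AUC reported in this abstract can be read as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The Python sketch below computes it directly from that definition. The function name and the toy labels and scores are made up for illustration and are not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: iterable of predicted scores, higher meaning more positive.
    Runs in O(P * N) time, which is fine for small examples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical toy data: three of the four positive/negative pairs
    # are ranked correctly.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

A perfect ranking gives 1.0 and a random ranking gives about 0.5, which is the scale on which the reported 0.9448 should be read.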
Affiliation(s)
- Liugen Wang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
- Yan Wang
  - School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
- Chenxu Xuan
  - School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
- Bai Zhang
  - School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
- Hanwen Wu
  - School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
- Jie Gao
  - School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China

10. Pan X, Coban Akdemir ZH, Gao R, Jiang X, Sheynkman GM, Wu E, Huang JH, Sahni N, Yi SS. AD-Syn-Net: systematic identification of Alzheimer's disease-associated mutation and co-mutation vulnerabilities via deep learning. Brief Bioinform 2023; 24:bbad030. [PMID: 36752347 PMCID: PMC10025433 DOI: 10.1093/bib/bbad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2022] [Revised: 12/19/2022] [Accepted: 01/13/2023] [Indexed: 02/09/2023] Open
Abstract
Alzheimer's disease (AD) is one of the most challenging neurodegenerative diseases because of its complicated and progressive mechanisms and multiple risk factors. Increasing research evidence demonstrates that genetics may be a key factor responsible for the occurrence of the disease. Although previous reports identified quite a few AD-associated genes, they were mostly limited by patient sample size and selection bias. There is a lack of comprehensive research aimed at systematically identifying AD-associated risk mutations. To address this challenge, we construct a large-scale AD mutation and co-mutation framework ('AD-Syn-Net'), and propose deep learning models named Deep-SMCI and Deep-CMCI, configured with fully connected layers, that are capable of predicting cognitive impairment of subjects effectively based on genetic mutation and co-mutation profiles. Next, we apply the customized frameworks to data sets to evaluate the importance scores of the mutations and identify mutation effectors and co-mutation combination vulnerabilities contributing to cognitive impairment. Furthermore, we evaluate the influence of mutation pairs on the network architecture to dissect the genetic organization of AD and identify novel co-mutations that could be responsible for dementia, laying a solid foundation for proposing future targeted therapy for AD precision medicine. Our deep learning model codes are available open access here: https://github.com/Pan-Bio/AD-mutation-effectors.
Affiliation(s)
- Xingxin Pan
  - Livestrong Cancer Institutes, and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Zeynep H Coban Akdemir
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Ruixuan Gao
  - Departments of Chemistry and Biological Sciences, University of Illinois Chicago, Chicago, IL 60607, USA
- Xiaoqian Jiang
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, USA
- Gloria M Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Center for Public Health Genomics, and UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
- Erxi Wu
  - Livestrong Cancer Institutes, and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
  - Neuroscience Institute and Department of Neurosurgery, Baylor Scott & White Health, Temple, TX 76502, USA
  - Department of Surgery, Texas A & M University Health Science Center, College of Medicine, Temple, TX 76508, USA
  - Department of Pharmaceutical Sciences, Texas A & M University Health Science Center, College of Pharmacy, College Station, TX 77843, USA
- Jason H Huang
  - Neuroscience Institute and Department of Neurosurgery, Baylor Scott & White Health, Temple, TX 76502, USA
  - Department of Surgery, Texas A & M University Health Science Center, College of Medicine, Temple, TX 76508, USA
- Nidhi Sahni
  - Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Quantitative and Computational Biosciences Program, Baylor College of Medicine, Houston, TX 77030, USA
- S Stephen Yi
  - Livestrong Cancer Institutes, and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
  - Oden Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, Austin, TX 78712, USA
  - Interdisciplinary Life Sciences Graduate Programs (ILSGP), College of Natural Sciences, The University of Texas at Austin, Austin, TX 78712, USA
  - Department of Biomedical Engineering, Cockrell School of Engineering, The University of Texas at Austin, Austin, TX 78712, USA

11. Chen M, Jia S, Xue M, Huang H, Xu Z, Yang D, Zhu W, Song Q. Dual-Stream Subspace Clustering Network for revealing gene targets in Alzheimer's disease. Comput Biol Med 2022; 151:106305. [PMID: 36401971 DOI: 10.1016/j.compbiomed.2022.106305] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/30/2022] [Revised: 11/02/2022] [Accepted: 11/06/2022] [Indexed: 11/13/2022]
Abstract
The rapid development of scRNA-seq technology in recent years has enabled us to capture high-throughput gene expression profiles at single-cell resolution, reveal the heterogeneity of complex cell populations, and greatly advance our understanding of the underlying mechanisms in human diseases. Traditional methods for gene co-expression clustering are limited in their ability to discover effective gene groups in scRNA-seq data. In this paper, we propose a novel gene clustering method based on convolutional neural networks called Dual-Stream Subspace Clustering Network (DS-SCNet). DS-SCNet can accurately identify important gene clusters from large-scale single-cell RNA-seq data and provide useful information for downstream analysis. On simulated datasets, DS-SCNet successfully clusters genes into different groups and outperforms mainstream gene clustering methods, such as DBSCAN and DESC, across different evaluation metrics. To explore the biological insights of our proposed method, we applied it to real scRNA-seq data of patients with Alzheimer's disease (AD). DS-SCNet analyzed the single-cell RNA-seq data with 10,850 genes, and accurately identified 8 optimal clusters from 6,673 cells. Enrichment analysis of these gene clusters revealed functional signaling pathways including the ILS signaling, the Rho GTPase signaling, and hemostasis pathways. Further analysis of gene regulatory networks identified new hub genes such as ELF4 as important regulators of AD, which indicates that DS-SCNet contributes to the discovery and understanding of the pathogenesis of Alzheimer's disease.
Affiliation(s)
- Minghan Chen
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, USA
| | - Shishen Jia
- School of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
| | - Mengfan Xue
- School of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, China; Zhejiang Lab, Hangzhou, Zhejiang, China
| | | | - Ziang Xu
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, USA
| | - Defu Yang
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Wentao Zhu
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Qianqian Song
- Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Wake Forest Baptist Medical Center, Winston Salem, NC, USA; Department of Cancer Biology, Wake Forest School of Medicine, Winston Salem, NC, USA
| |
|
12
|
Rashid S, Ng TA, Kwoh CK. Jupytope: computational extraction of structural properties of viral epitopes. Brief Bioinform 2022; 23:6696137. [PMID: 36094101 DOI: 10.1093/bib/bbac362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Revised: 07/29/2022] [Accepted: 08/02/2022] [Indexed: 12/14/2022] Open
Abstract
Epitope residues located on viral surface proteins are of immense interest in immunology and related applications such as vaccine development, disease diagnosis and drug design. Most tools rely on sequence-based statistical comparisons, such as information entropy of residue positions in aligned columns, to infer the location and properties of epitope sites. To facilitate cross-structural comparisons of epitopes on viral surface proteins, a Python-based extraction tool implemented as a Jupyter notebook is presented (Jupytope). Given a viral antigen structure of interest, a list of known epitope sites and a reference structure, the corresponding epitope structural properties can quickly be obtained. The tool integrates Biopython modules for commonly used software such as NACCESS and DSSP, as well as residue depth, and outputs a list of structure-derived properties such as dihedral angles, solvent accessibility, residue depth and secondary structure that can be saved in several convenient data formats. To ensure correct spatial alignment, Jupytope takes a list of given epitope sites and their corresponding reference structure and aligns them before extracting the desired properties. Examples are demonstrated for epitopes of Influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) viral strains. The extracted properties assist detection of two Influenza subtypes and show potential in distinguishing between four major clades of SARS-CoV2, as compared with randomized labels. The tool will facilitate analytical and predictive work on viral epitopes through the extracted structural information. Jupytope and extracted datasets are available at https://github.com/shamimarashid/Jupytope.
Affiliation(s)
- Shamima Rashid
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
| | - Teng Ann Ng
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
| |
|
13
|
Algorithmically-guided discovery of viral epitopes via linguistic parsing: Problem formulation and solving by soft computing. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
14
|
Mohammad Shabani NR, Khairul Hisyam Ismail CM, Anthony AA, Leow CH, Chuah C, Abdul Majeed AB, Nor NM, He Y, Banga Singh KK, Leow CY. Mass spectrometry-based immunopeptidomics and computational vaccinology strategies for the identification of universal Shigella immunogenic candidates. Comput Biol Med 2022; 148:105900. [DOI: 10.1016/j.compbiomed.2022.105900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 06/26/2022] [Accepted: 07/16/2022] [Indexed: 11/16/2022]
|
15
|
CapsProm: a capsule network for promoter prediction. Comput Biol Med 2022; 147:105627. [DOI: 10.1016/j.compbiomed.2022.105627] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/10/2021] [Revised: 04/05/2022] [Accepted: 04/11/2022] [Indexed: 11/21/2022]
|
16
|
Bukhari SNH, Webber J, Mehbodniya A. Decision tree based ensemble machine learning model for the prediction of Zika virus T-cell epitopes as potential vaccine candidates. Sci Rep 2022; 12:7810. [PMID: 35552469 PMCID: PMC9096330 DOI: 10.1038/s41598-022-11731-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/29/2021] [Accepted: 04/25/2022] [Indexed: 12/30/2022] Open
Abstract
Zika fever is an infectious disease caused by the Zika virus (ZIKV). The disease is claiming millions of lives worldwide, primarily in developing countries. In addition to vector control strategies, the most effective way to prevent the spread of ZIKV infection is vaccination. There is no clinically approved vaccine to combat ZIKV infection and curb its pandemic. An epitope-based peptide vaccine (EBPV) is seen as a powerful alternative to conventional vaccinations because of its low production cost and short production time. Nonetheless, EBPVs have received less attention, even though they have significant untapped potential for enhancing vaccine safety, immunogenicity, and cross-reactivity. Such a vaccine technology is based on selected antigenic peptides of the target pathogen, called T-cell epitopes (TCEs), which are synthesized chemically based on their amino acid sequences. Identifying TCEs with a wet-lab experimental approach is challenging, expensive, and time-consuming. Therefore, in this study, we present a computational model for the prediction of ZIKV TCEs. The proposed model is an ensemble of decision trees that utilizes the physicochemical properties of amino acids. In this way, a large amount of time and effort would be saved for quick vaccine development. The peptide sequence dataset for model training was retrieved from the Virus Pathogen Database and Analysis Resource (ViPR). The dataset consists of experimentally verified TCEs and non-TCEs. The model demonstrated promising results when evaluated on the test dataset. The evaluation metrics, namely accuracy, AUC, sensitivity, specificity, Gini and Matthews correlation coefficient (MCC), recorded values of 0.9789, 0.984, 0.981, 0.987, 0.974 and 0.948, respectively. The consistency and reliability of the model were assessed by five-fold cross-validation, and a mean accuracy of 0.97864 was reported. Finally, the model was compared with standard machine learning (ML) algorithms, and the proposed model outperformed all of them. The proposed model will aid in predicting novel and immunodominant TCEs of ZIKV. The predicted TCEs may have a high possibility of acting as prospective vaccine targets, subject to in-vivo and in-vitro scientific assessments, thereby saving lives worldwide, preventing future epidemic-scale outbreaks, and lowering the possibility of mutation escape.
Affiliation(s)
- Syed Nisar Hussain Bukhari
- National Institute of Electronics and Information Technology (NIELIT), Ministry of Electronics and Information Technology (MeitY), Govt. of India, Srinagar, J&K, 191132, India
| | - Julian Webber
- Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha Area, Kuwait
| | - Abolfazl Mehbodniya
- Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha Area, Kuwait
| |
|
17
|
Bukhari SNH, Jain A, Haq E, Mehbodniya A, Webber J. Machine Learning Techniques for the Prediction of B-Cell and T-Cell Epitopes as Potential Vaccine Targets with a Specific Focus on SARS-CoV-2 Pathogen: A Review. Pathogens 2022; 11:146. [PMID: 35215090 PMCID: PMC8879824 DOI: 10.3390/pathogens11020146] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 12/13/2021] [Revised: 01/19/2022] [Accepted: 01/21/2022] [Indexed: 02/01/2023] Open
Abstract
Only the part of an antigen (a protein molecule found on the surface of a pathogen) that is composed of epitopes specific to T and B cells is recognized by the human immune system (HIS). Identification of epitopes is considered critical for designing an epitope-based peptide vaccine (EBPV). Although there are a number of vaccine types, EBPVs have received less attention thus far. It is important to mention that EBPVs have a great deal of untapped potential for boosting vaccination safety; they are also less expensive and take a short time to produce. Thus, in order to quickly contain global pandemics such as the ongoing outbreak of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), as well as epidemics and endemics, EBPVs are considered promising vaccine types. The high mutation rate of SARS-CoV-2 has posed a great challenge to public health worldwide because either the composition of existing vaccines has to be changed or a new vaccine has to be developed to protect against its different variants. In such scenarios, time being the critical factor, EBPVs can be a promising alternative. To design an effective and viable EBPV against different strains of a pathogen, it is important to identify the putative T- and B-cell epitopes. Using the wet-lab experimental approach to identify these epitopes is time-consuming and costly because the experimental screening of a vast number of potential epitope candidates is required. Fortunately, various available machine learning (ML)-based prediction methods have reduced the burden related to the epitope mapping process by decreasing the potential epitope candidate list for experimental trials. Moreover, these methods are also cost-effective, scalable, and fast. This paper presents a systematic review of various state-of-the-art and relevant ML-based methods and tools for predicting T- and B-cell epitopes. Special emphasis is placed on highlighting and analyzing various models for predicting epitopes of SARS-CoV-2, the causative agent of COVID-19. Based on the various methods and tools discussed, future research directions for epitope prediction are presented.
Affiliation(s)
- Syed Nisar Hussain Bukhari
- University Institute of Computing, Chandigarh University, NH-95, Chandigarh-Ludhiana Highway, Mohali 140413, India
| | - Amit Jain
- University Institute of Computing, Chandigarh University, NH-95, Chandigarh-Ludhiana Highway, Mohali 140413, India
| | - Ehtishamul Haq
- Department of Biotechnology, University of Kashmir, Srinagar 190006, India
| | - Abolfazl Mehbodniya
- Department of Electronics and Communication Engineering, Kuwait College of Science and Technology, Kuwait City 20185145, Kuwait
| | - Julian Webber
- Graduate School of Engineering Science, Osaka University, Osaka 560-8531, Japan
| |
|