1
Ge F, Li HY, Zhang M, Arif M, Alam T. TCellPredX: A Novel Approach for Accurate Prediction of Hepatitis C Virus Linear T Cell Epitopes. ACS Omega 2024; 9:51494-51507. [PMID: 39758636 PMCID: PMC11696426 DOI: 10.1021/acsomega.4c08715] [Received: 09/23/2024] [Revised: 11/29/2024] [Accepted: 12/04/2024] [Indexed: 01/07/2025]
Abstract
Hepatitis C Virus (HCV) is a bloodborne RNA virus that leads to severe liver diseases, and currently, no effective prophylactic biologics are available to prevent its transmission. The prevention of HCV is closely related to the major histocompatibility complex (MHC). Linear antigenic peptides of HCV, known as T cell epitopes (TCEs), are crucial in the presentation process by MHC molecules to T cells, playing a key role in immune responses. Therefore, the rapid and accurate identification of these TCE-HCVs is essential for advancing vaccine development. Herein, we propose TCellPredX, a novel integrated predictor for TCE-HCV identification. TCellPredX leverages five distinct feature encoding schemes, including local and global sequence encodings, composition-transition-distribution descriptors, physicochemical properties, and embeddings from two protein language models, which are processed through 12 machine learning algorithms. Our results indicate that feature fusion significantly enhances predictive accuracy. Moreover, the maximal relevance minimal redundancy feature selection method is particularly effective in isolating informative features, ensuring that the model relies on the most relevant data. Additionally, ensemble models, especially when combined with an averaged voting strategy, demonstrate superior stability and accuracy compared to individual classifiers, effectively reducing noise and enhancing model robustness. TCellPredX achieves notable accuracies of 0.900 and 0.897 in 10-fold cross-validation and independent testing, respectively. Furthermore, TCellPredX's high accuracy is validated on experimentally verified peptide sequences documented for their potential benefits in vaccine development. Overall, TCellPredX can offer a robust tool for the precise identification of TCE-HCV, potentially serving as a cornerstone for future epitope research and advancing HCV vaccine development.
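The averaged voting strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes each base classifier outputs a positive-class probability per peptide; the function name `average_vote`, the 0.5 threshold, and the example probabilities are hypothetical and are not taken from TCellPredX.

```python
from statistics import fmean


def average_vote(prob_lists, threshold=0.5):
    """Average positive-class probabilities across models, then threshold."""
    if not prob_lists:
        raise ValueError("need predictions from at least one model")
    n_samples = len(prob_lists[0])
    if any(len(probs) != n_samples for probs in prob_lists):
        raise ValueError("every model must score the same peptides")
    # zip(*prob_lists) groups the scores that all models gave to one peptide.
    averaged = [fmean(column) for column in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical probabilities from three base classifiers for three peptides.
model_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.7],
    [0.7, 0.3, 0.8],
]
averaged, labels = average_vote(model_probs)
print(labels)
```

Averaging dampens the effect of any single classifier's outlier score, which is one plausible reading of the stability gain the abstract reports.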
Affiliation(s)
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, 6 Wenyuan Road, Nanjing 210023, China
- Hao-Yang Li
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Ming Zhang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
2
Wang R, Liang X, Zhao Y, Xue W, Liang G. UniBioPAN: A Novel Universal Classification Architecture for Bioactive Peptides Inspired by Video Action Recognition. J Chem Inf Model 2024; 64:9276-9285. [PMID: 39571078 DOI: 10.1021/acs.jcim.4c01599] [Indexed: 12/24/2024]
Abstract
The classification of bioactive peptides is of great importance in protein biology, but a universal and effective classifier is still lacking. Inspired by video action recognition, we developed the UniBioPAN architecture to create a universal peptide classifier that addresses this problem. The architecture treats the peptide sequence as a video sequence and the molecular image of each amino acid in the peptide sequence as a video frame, enabling feature extraction and classification using convolutional neural networks, bidirectional long short-term memory networks, and fully connected networks. As a novel peptide classification architecture, UniBioPAN significantly outperforms other universal architectures in ACC, AUC, and MCC across 11 data sets, and in F1 score on 9 data sets. UniBioPAN is available in three forms: a Python script, a Jupyter notebook, and a web server (https://gzliang.cqu.edu.cn/software/UniBioPAN.html). In summary, UniBioPAN is a universal, convenient, and high-performance peptide classification architecture. UniBioPAN holds significant importance for the discovery of bioactive peptides and the advancement of peptide classifiers. All code and data sets are publicly available at https://github.com/sanwrh/UniBioPAN.
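For readers unfamiliar with the reported metrics, the sketch below computes ACC, F1, and MCC from binary labels using their standard confusion-matrix definitions. The helper name `binary_metrics` and the toy labels are hypothetical; this is not code from the UniBioPAN repository, and AUC is omitted because it needs ranked scores rather than hard labels.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Return (ACC, F1, MCC) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    acc = (tp + tn) / len(pairs)
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return acc, f1, mcc


# Toy example: 8 peptides, 4 positives and 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1, mcc = binary_metrics(y_true, y_pred)
print(acc, f1, mcc)
```

MCC uses all four confusion-matrix cells, so it stays informative on imbalanced data sets where ACC alone can look deceptively high.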
Affiliation(s)
- Ruihong Wang
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
- Xiao Liang
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
- Yi Zhao
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
- Wenjun Xue
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
- Guizhao Liang
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
- Bioengineering College of Chongqing University, No. 174, Shazheng Street, Shapingba District, Chongqing 400030, China
3
Zhang M, Zhou J, Wang X, Wang X, Ge F. DeepBP: Ensemble deep learning strategy for bioactive peptide prediction. BMC Bioinformatics 2024; 25:352. [PMID: 39528950 PMCID: PMC11556071 DOI: 10.1186/s12859-024-05974-5] [Received: 04/09/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND Bioactive peptides are important molecules composed of short chains of amino acids that play various crucial roles in the body, such as regulating physiological processes and promoting immune responses and antibacterial effects. Due to their significance, bioactive peptides have broad application potential in drug development, food science, and biotechnology. Understanding their biological mechanisms will contribute new ideas for drug discovery and disease treatment.
RESULTS This study employs generative adversarial capsule networks (CapsuleGAN), gated recurrent units (GRU), and convolutional neural networks (CNN) as base classifiers and combines them through voting-based ensemble learning, which obtains high-precision predictions on both the angiotensin-converting enzyme (ACE) inhibitory peptide dataset and the anticancer peptide (ACP) dataset. We first used the protein language model Evolutionary Scale Modeling (ESM-2) to extract relevant features for the ACE inhibitory peptide and ACP datasets. Following feature extraction, we trained three deep learning models (CapsuleGAN, GRU, and CNN) while continuously adjusting the model parameters throughout the training process. Finally, during the voting stage, different weights were assigned to the models based on their prediction accuracy, allowing full use of each model's performance. Experimental results show that on the ACE inhibitory peptide dataset, the balanced accuracy is 0.926, the Matthews correlation coefficient (MCC) is 0.831, and the area under the curve is 0.966; on the ACP dataset, the accuracy (ACC) is 0.779, and the MCC is 0.558. The experimental results on both datasets are superior to existing methods, demonstrating the effectiveness of the approach.
CONCLUSION In this study, CapsuleGAN, GRU, and CNN were successfully employed as base classifiers to implement ensemble learning, which not only achieved good results in the prediction of two datasets but also surpassed existing methods. The ability to predict peptides with strong ACE inhibitory activity and ACPs more accurately and quickly is significant, and this work provides valuable insights for predicting other functional peptides. The source code and dataset for this experiment are publicly available at https://github.com/Zhou-Jianren/bioactive-peptides.
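The accuracy-based weighted voting described in the abstract might look like the following sketch. It assumes weights proportional to each model's validation accuracy, which is one simple choice; the abstract does not state the exact weighting rule. The function name `weighted_vote`, the accuracies, and the probabilities are all hypothetical.

```python
def weighted_vote(probs_by_model, accuracy_by_model, threshold=0.5):
    """Combine per-model probabilities using accuracy-proportional weights."""
    total_accuracy = sum(accuracy_by_model.values())
    # Assumption: weight = model accuracy / sum of all accuracies.
    weights = {name: acc / total_accuracy
               for name, acc in accuracy_by_model.items()}
    n_samples = len(next(iter(probs_by_model.values())))
    scores = [
        sum(weights[name] * probs[i] for name, probs in probs_by_model.items())
        for i in range(n_samples)
    ]
    labels = [1 if s >= threshold else 0 for s in scores]
    return weights, scores, labels


# Hypothetical validation accuracies and predictions for two peptides.
accuracies = {"capsule_gan": 0.90, "gru": 0.85, "cnn": 0.80}
probabilities = {
    "capsule_gan": [0.9, 0.3],
    "gru": [0.6, 0.4],
    "cnn": [0.2, 0.7],
}
weights, scores, labels = weighted_vote(probabilities, accuracies)
print(labels)
```

With accuracies this close together the weights are nearly uniform, so a sharper scheme (for example, weights from a softmax over accuracies) would be needed if one model should dominate.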
Affiliation(s)
- Ming Zhang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang, 212100, China
- Jianren Zhou
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang, 212100, China
- Xiaohua Wang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang, 212100, China
- Xun Wang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang, 212100, China
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing, 210023, China
4
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework. J Biomol Struct Dyn 2024:1-13. [PMID: 38385478 DOI: 10.1080/07391102.2024.2318482] [Received: 12/07/2023] [Accepted: 02/08/2024] [Indexed: 02/23/2024]
Abstract
Plant-allergenic proteins (PAPs) have the potential to induce allergic reactions in certain individuals. While these proteins are generally innocuous for the majority of people, they can elicit an immune response in those with particular sensitivities. Thus, screening and prioritizing the allergenic potential of plant proteins is indispensable for the development of diagnostic tools, therapeutic interventions or medications to treat allergic reactions. However, investigating the allergenic potential of plant proteins based on experimental methods is costly and labour-intensive. Therefore, we developed StackPAP, a three-layer stacking ensemble framework for accurate large-scale identification of PAPs. In StackPAP, at the first layer, we conducted a comprehensive analysis of an extensive set of feature descriptors. Subsequently, we selected and fused five potential sequence-based feature descriptors, including amphiphilic pseudo-amino acid composition, dipeptide deviation from expected mean, amino acid composition, pseudo amino acid composition and dipeptide composition. Additionally, we applied an efficient genetic algorithm (GA-SAR) to determine informative feature sets. In the second layer, 12 powerful machine learning (ML) methods, in combination with all the informative feature sets, were employed to construct a pool of base classifiers. Finally, 13 potential base classifiers were selected using the GA-SAR method and combined to develop the final meta-classifier. Our experimental results revealed the promising prediction performance of StackPAP, with an accuracy, Matthews correlation coefficient and AUC of 0.984, 0.969 and 0.993, respectively, as judged by the independent test dataset. In conclusion, both cross-validation and independent test results indicated the superior performance of StackPAP compared with several ML-based classifiers. To accelerate the identification of the allergenicity of plant proteins, we developed a user-friendly web server for StackPAP (https://pmlabqsar.pythonanywhere.com/StackPAP). We anticipate that StackPAP will be an efficient and useful tool for rapidly screening PAPs from a vast number of plant proteins. Communicated by Ramaswamy H. Sarma.
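Two of the five descriptors named in the abstract, amino acid composition (AAC) and dipeptide composition (DPC), have simple standard definitions and can be sketched as below. Fusing them by plain concatenation is an assumption made here for illustration; the function names and the toy sequence are hypothetical, and the other three descriptors, GA-SAR selection, and the stacking layers are not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """20-dim vector: fraction of the sequence made up by each residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Non-standard letters count toward the length but get no feature slot.
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dim vector: fraction of overlapping residue pairs per dipeptide."""
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoids division by zero for length 1
    return [pairs[dp] / total for dp in DIPEPTIDES]


def fused_features(seq):
    """Assumed fusion: concatenate AAC (20) and DPC (400) into 420 values."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


# Toy sequence, not a real protein.
features = fused_features("ACCA")
print(len(features))
```

A vector like this is what a pool of base classifiers would consume before any feature selection step narrows it down.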
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Thailand
- Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand
- Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, Thailand
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand