1
Doneva N, Dimitrov I. Viral Immunogenicity Prediction by Machine Learning Methods. Int J Mol Sci 2024; 25:2949. [PMID: 38474194 DOI: 10.3390/ijms25052949] [Received: 01/09/2024] [Revised: 02/28/2024] [Accepted: 02/29/2024] [Indexed: 03/14/2024] Open access
Abstract
Since viruses are one of the main causes of infectious illnesses, prophylaxis is essential for efficient disease control. Vaccines play a pivotal role in mitigating the transmission of various viral infections and fortifying our defenses against them. The initial step in modern vaccine design and development involves the identification of potential vaccine targets through computational techniques. Here, using datasets of 1588 known viral immunogens and 468 viral non-immunogens, we apply machine learning algorithms to develop models for the prediction of protective immunogens of viral origin. The datasets are split into training and test sets in a 4:1 ratio. The protein structures are encoded by E-descriptors and transformed into uniform vectors by the auto- and cross-covariance methods. The most relevant descriptors are selected by the gain ratio technique. The models generated by Random Forest, Multilayer Perceptron, and XGBoost algorithms demonstrate superior predictive performance on the test sets, surpassing predictions made by VaxiJen 2.0, an established gold standard in viral immunogenicity prediction. The key attributes determining immunogenicity in viral proteins are specific fingerprints in hydrophobicity and steric properties.
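The auto- and cross-covariance (ACC) step mentioned above turns a variable-length residue-by-descriptor matrix into a fixed-length vector, which is what lets proteins of different sizes share one classifier. The sketch below assumes the commonly used ACC formulation (products of descriptor values at residues separated by a lag, averaged over n − lag positions, with no additional centering) and a user-chosen maximum lag. The descriptor table `E_TABLE` and the helper `encode` are hypothetical placeholders filled with random numbers, not the published E-descriptor values or the authors' code.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL placeholder: 5 random values per residue standing in for the
# E1-E5 descriptors. Replace with the published E-descriptor table.
_rng = np.random.default_rng(42)
E_TABLE = {aa: _rng.normal(size=5) for aa in AMINO_ACIDS}


def encode(sequence: str) -> np.ndarray:
    """Map a protein sequence to an (n_residues, n_descriptors) matrix."""
    return np.array([E_TABLE[aa] for aa in sequence])


def acc_transform(X: np.ndarray, max_lag: int) -> np.ndarray:
    """Auto- and cross-covariance of descriptor matrix X for lags 1..max_lag.

    Entry (j, k) at a given lag is sum_i X[i, j] * X[i + lag, k] / (n - lag);
    j == k gives the auto-covariance terms, j != k the cross-covariance terms.
    The result has length d * d * max_lag regardless of sequence length.
    """
    n, d = X.shape
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and sequence length - 1")
    blocks = []
    for lag in range(1, max_lag + 1):
        # (d, n - lag) @ (n - lag, d) -> (d, d) matrix of lagged products
        cov = X[:-lag].T @ X[lag:] / (n - lag)
        blocks.append(cov.ravel())
    return np.concatenate(blocks)


if __name__ == "__main__":
    short = acc_transform(encode("MKTAYIAKQR"), max_lag=3)
    longer = acc_transform(encode("MKTAYIAKQRQISFVKSHFSRQ"), max_lag=3)
    print(short.shape, longer.shape)  # both (75,)
```

With 5 descriptors and a maximum lag of 3, every sequence yields 5 × 5 × 3 = 75 values; the lag is a tunable assumption here, and feature selection (gain ratio in the study) would then operate on these columns.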
Affiliation(s)
- Nikolet Doneva
- Faculty of Pharmacy, Medical University-Sofia, 1000 Sofia, Bulgaria
- Ivan Dimitrov
- Faculty of Pharmacy, Medical University-Sofia, 1000 Sofia, Bulgaria

2
Feng H, Wang F, Li N, Xu Q, Zheng G, Sun X, Hu M, Li X, Xing G, Zhang G. Use of tree-based machine learning methods to screen affinitive peptides based on docking data. Mol Inform 2023; 42:e202300143. [PMID: 37696773 DOI: 10.1002/minf.202300143] [Received: 06/13/2023] [Revised: 09/03/2023] [Accepted: 09/11/2023] [Indexed: 09/13/2023]
Abstract
Screening peptides with good affinity is an important step in peptide-drug discovery. Recent advances in computer and data science have made machine learning a useful tool for accurately screening affinitive peptides. In the current study, four tree-based algorithms (Classification and Regression Trees (CART), the C5.0 decision tree (C50), Bagged CART (BAG), and Random Forest (RF)) were used to explore the relationship between experimental peptide affinities and virtual docking data, and the performance of the resulting models was compared in parallel. All four algorithms performed better on the dataset that had been scaled, centered, and transformed by principal component analysis (PCA) than on datasets preprocessed in other ways. After the models were rebuilt and their hyperparameters optimized, the optimal C50 model (C50O) performed best in terms of Accuracy, Kappa, Sensitivity, Specificity, F1, MCC, and AUC when validated on the test data and on an unknown PEDV dataset (Accuracy = 80.4%). BAG and RFO (the optimal RF), the two best models during training, did not perform as expected in the test and unknown-dataset validations. The high correlation of RFO and BAG predictions with those of C50O suggested that their predictions were stable and robust. Although CARTO (the optimal CART) performed well on the unknown dataset, its poor performance in test-data validation and correlation analysis indicated that it could not be used to predict future data. To evaluate peptide affinity accurately, this study presents the first comparison of tree-based models for affinitive-peptide prediction using virtual docking data, which could expand the application of machine learning algorithms to the study of peptide-protein interactions (PepPIs) and benefit the development of peptide therapeutics.
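The models in this abstract are ranked on Accuracy, Kappa, Sensitivity, Specificity, F1, and MCC. For a binary active/inactive classifier, all six follow from the four confusion-matrix counts. The sketch below computes them with the standard definitions (Cohen's kappa for Kappa); the `Confusion` container and `binary_metrics` function are illustrative names, and the counts in the example are hypothetical, not results from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # actives predicted active
    fn: int  # actives predicted inactive
    tn: int  # inactives predicted inactive
    fp: int  # inactives predicted active


def binary_metrics(c: Confusion) -> dict:
    n = c.tp + c.fn + c.tn + c.fp
    accuracy = (c.tp + c.tn) / n
    sensitivity = c.tp / (c.tp + c.fn)  # recall on the active class
    specificity = c.tn / (c.tn + c.fp)
    f1 = 2 * c.tp / (2 * c.tp + c.fp + c.fn)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    p_expected = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    # Matthews correlation coefficient; reported as 0 when any marginal is empty
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "kappa": kappa, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only
    for name, value in binary_metrics(Confusion(tp=40, fn=10, tn=30, fp=20)).items():
        print(f"{name}: {value:.3f}")
```

For these made-up counts the script prints accuracy 0.700, kappa 0.400, sensitivity 0.800, specificity 0.600, f1 0.727, and mcc 0.408, which illustrates why Kappa and MCC are stricter than raw accuracy when classes or errors are unbalanced.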
Affiliation(s)
- Hua Feng
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Fangyu Wang
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Ning Li
- College of Food Science and Technology, Henan Agricultural University, Zhengzhou, China
- Qian Xu
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Guanming Zheng
- Public Health and Preventive Medicine Teaching and Research Center, Henan University of Chinese Medicine, Zhengzhou, Henan, China
- Xuefeng Sun
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Man Hu
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Xuewu Li
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Guangxu Xing
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Gaiping Zhang
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Longhu Modern Immunology Laboratory, Zhengzhou, China
- School of Advanced Agricultural Sciences, Peking University, Beijing, China
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, Jiangsu, China

3
Feng H, Wang F, Li N, Xu Q, Zheng G, Sun X, Hu M, Xing G, Zhang G. A Random Forest Model for Peptide Classification Based on Virtual Docking Data. Int J Mol Sci 2023; 24:11409. [PMID: 37511165 PMCID: PMC10380188 DOI: 10.3390/ijms241411409] [Received: 06/06/2023] [Revised: 06/25/2023] [Accepted: 07/11/2023] [Indexed: 07/30/2023] Open access
Abstract
The affinity of peptides is a crucial factor in studying peptide-protein interactions. Although various techniques have been developed to evaluate peptide-receptor affinity, their results may not always accurately reflect the actual affinity of the peptides. The current study provides a free tool to assess actual peptide affinity based on virtual docking data. The study used a dataset that combined actual peptide affinity information (active and inactive) with virtual peptide-receptor docking data, and applied several machine learning algorithms to it. The random forest (RF) algorithm performed best among the algorithms tested and was used to build three RF models with different numbers of significant features (four, three, and two). Further analysis showed that the four-feature RF model achieved the highest Accuracy, 0.714, in classifying an independent, previously unseen peptide dataset designed with the PEDV spike protein; the same analysis revealed overfitting in the other models. This four-feature RF model was then used to evaluate peptide affinity by modeling the relationship between the actual affinity of peptides and their virtual docking scores against their receptors.
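This abstract builds random forest models on four, three, and two significant docking-derived features and uses performance on independent data to expose overfitting. The sketch below reproduces that comparison pattern with scikit-learn's `RandomForestClassifier` on synthetic data. Everything specific here is an assumption: the features and labels are random, the 80/20 split and forest size are arbitrary, and taking the first k columns as the "top-k" features stands in for the study's own significance ranking, which is not described in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# HYPOTHETICAL synthetic data: 4 columns standing in for docking-score features,
# assumed to be ordered from most to least significant.
rng = np.random.default_rng(0)
n_peptides = 300
X = rng.normal(size=(n_peptides, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n_peptides) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)


def evaluate_top_k(k: int) -> tuple[float, float]:
    """Fit an RF on the first k features; return (train accuracy, test accuracy)."""
    model = RandomForestClassifier(n_estimators=300, random_state=0)
    model.fit(X_train[:, :k], y_train)
    return model.score(X_train[:, :k], y_train), model.score(X_test[:, :k], y_test)


results = {k: evaluate_top_k(k) for k in (4, 3, 2)}

if __name__ == "__main__":
    for k, (train_acc, test_acc) in results.items():
        # A large train-test gap is a warning sign of overfitting
        print(f"{k} features: train={train_acc:.3f} test={test_acc:.3f} "
              f"gap={train_acc - test_acc:.3f}")
```

A random forest often fits its training data almost perfectly, so training accuracy alone says little; comparing it against held-out or independent data, as the study did with its PEDV-derived peptide set, is what reveals overfitting.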
Affiliation(s)
- Hua Feng
- Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Fangyu Wang
- Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Ning Li
- Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Qian Xu
- Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Guanming Zheng
- Public Health and Preventive Medicine Teaching and Research Center, Henan University of Chinese Medicine, Zhengzhou 450046, China
- Xuefeng Sun
- Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Man Hu
- Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Guangxu Xing
- Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Gaiping Zhang
- Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Longhu Modern Immunology Laboratory, Zhengzhou 450002, China
- School of Advanced Agricultural Sciences, Peking University, Beijing 100871, China
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China

4
Sharma T, Saralamma VVG, Lee DC, Imran MA, Choi J, Baig MH, Dong JJ. Combining structure-based pharmacophore modeling and machine learning for the identification of novel BTK inhibitors. Int J Biol Macromol 2022; 222:239-250. [PMID: 36130643 DOI: 10.1016/j.ijbiomac.2022.09.151] [Received: 07/20/2022] [Revised: 09/13/2022] [Accepted: 09/16/2022] [Indexed: 11/05/2022]
Abstract
Bruton's tyrosine kinase (BTK) is a critical enzyme involved in multiple signaling pathways that regulate cellular survival, activation, and proliferation, making it a major cancer therapeutic target. We applied a novel integrated approach combining structure-based pharmacophore modeling, machine learning, and other in silico methods to screen the Korean chemical database (KCB) for potential BTK inhibitors (BTKi). Further evaluation of these inhibitors on three different human cancer cell lines showed significant cell growth inhibitory activity. Of the 13 compounds shortlisted, four showed consistent cell inhibition activity across breast, gastric, and lung cancer cells (IC50 below 3 μM). The selected compounds also showed significant kinase inhibition activity (IC50 below 5 μM). The current study suggests the potential of these inhibitors for targeting BTK in malignant tumors.
Affiliation(s)
- Tanuj Sharma
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Venu Venkatarame Gowda Saralamma
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Duk Chul Lee
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Mohammad Azhar Imran
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Jaehyuk Choi
- BNJBiopharma, 2nd floor Memorial Hall, 85, Songdogwahak-ro, Yeonsu-gu, Incheon 21983, Republic of Korea
- Mohammad Hassan Baig
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Jae-June Dong
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea