1
|
Yang S, Xu P. LLM4THP: a computing tool to identify tumor homing peptides by molecular and sequence representation of large language model based on two-layer ensemble model strategy. Amino Acids 2024; 56:62. [PMID: 39404804 PMCID: PMC11480143 DOI: 10.1007/s00726-024-03422-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2024] [Accepted: 10/04/2024] [Indexed: 10/19/2024]
Abstract
Tumor homing peptides (THPs) have a distinctive capacity to specifically attach to tumor cells, providing a promising approach for targeted cancer treatment and detection. Although THPs have the potential for significant impact, their detection by conventional methods is both time-consuming and expensive. To tackle this issue, we provide LLM4THP, an innovative computational approach that utilizes large language models (LLMs) to quickly and effectively detect THPs. LLM4THP utilizes two protein LLMs, ESM2 and Prot_T5_XL_UniRef50, to encode peptide sequences. This allows for the capture of complex patterns and relationships within the peptide data. In addition, we utilize inherent sequence characteristics such as Amino Acid Composition (AAC), Pseudo Amino Acid Composition (PAAC), Amphiphilic Pseudo Amino Acid Composition (APAAC), and Composition, Transition, and Distribution (CTD) to improve the representation of peptides. The RDKitDescriptors feature representation approach transforms peptide sequences into molecular objects and computes chemical characteristics, resulting in enhanced THP identification. The LLM4THP ensemble strategy incorporates various features into a two-layer learning architecture. The first layer consists of LightGBM, XGBoost, Random Forest, and Extremely Randomized Trees, which generate a set of meta results. The second layer utilizes Logistic Regression to further refine the identification of sequences as either THP or non-THP. LLM4THP exhibits exceptional performance compared to the most advanced methods, showcasing enhancements in accuracy, Matthew's correlation coefficient, F1 score, area under the curve, and average precision. The source code and dataset can be accessed at the following URL: https://github.com/abcair/LLM4THP.
Collapse
Affiliation(s)
- Sen Yang
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou, 213164, China
- The Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China
| | - Piao Xu
- College of Economics and Management, Nanjing Forestry University, Nanjing, 210037, China.
| |
Collapse
|
2
|
Milewska S, Sadowska A, Stefaniuk N, Misztalewska-Turkowicz I, Wilczewska AZ, Car H, Niemirowicz-Laskowska K. Tumor-Homing Peptides as Crucial Component of Magnetic-Based Delivery Systems: Recent Developments and Pharmacoeconomical Perspective. Int J Mol Sci 2024; 25:6219. [PMID: 38892406 PMCID: PMC11172452 DOI: 10.3390/ijms25116219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2024] [Revised: 05/24/2024] [Accepted: 05/27/2024] [Indexed: 06/21/2024] Open
Abstract
According to data from the World Health Organization (WHO), cancer is considered to be one of the leading causes of death worldwide, and new therapeutic approaches, especially improved novel cancer treatment regimens, are in high demand. Considering that many chemotherapeutic drugs tend to have poor pharmacokinetic profiles, including rapid clearance and limited on-site accumulation, a combined approach with tumor-homing peptide (THP)-functionalized magnetic nanoparticles could lead to remarkable improvements. This is confirmed by an increasing number of papers in this field, showing that the on-target peptide functionalization of magnetic nanoparticles improves their penetration properties and ensures tumor-specific binding, which results in an increased clinical response. This review aims to highlight the potential applications of THPs in combination with magnetic carriers across various fields, including a pharmacoeconomic perspective.
Collapse
Affiliation(s)
- Sylwia Milewska
- Department of Experimental Pharmacology, Medical University of Bialystok, Szpitalna 37, 15-295 Bialystok, Poland; (S.M.); (A.S.); (N.S.); (H.C.)
| | - Anna Sadowska
- Department of Experimental Pharmacology, Medical University of Bialystok, Szpitalna 37, 15-295 Bialystok, Poland; (S.M.); (A.S.); (N.S.); (H.C.)
| | - Natalia Stefaniuk
- Department of Experimental Pharmacology, Medical University of Bialystok, Szpitalna 37, 15-295 Bialystok, Poland; (S.M.); (A.S.); (N.S.); (H.C.)
| | | | - Agnieszka Z. Wilczewska
- Faculty of Chemistry, University of Bialystok, Ciolkowskiego 1K, 15-245 Bialystok, Poland; (I.M.-T.); (A.Z.W.)
| | - Halina Car
- Department of Experimental Pharmacology, Medical University of Bialystok, Szpitalna 37, 15-295 Bialystok, Poland; (S.M.); (A.S.); (N.S.); (H.C.)
| | - Katarzyna Niemirowicz-Laskowska
- Department of Experimental Pharmacology, Medical University of Bialystok, Szpitalna 37, 15-295 Bialystok, Poland; (S.M.); (A.S.); (N.S.); (H.C.)
| |
Collapse
|
3
|
Arif R, Kanwal S, Ahmed S, Kabir M. A Computational Predictor for Accurate Identification of Tumor Homing Peptides by Integrating Sequential and Deep BiLSTM Features. Interdiscip Sci 2024; 16:503-518. [PMID: 38733473 DOI: 10.1007/s12539-024-00628-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 03/16/2024] [Accepted: 03/27/2024] [Indexed: 05/13/2024]
Abstract
Cancer remains a severe illness, and current research indicates that tumor homing peptides (THPs) play an important part in cancer therapy. The identification of THPs can provide crucial insights for drug-discovery and pharmaceutical industries as they allow for tailored medication delivery towards cancer cells. These peptides have a high affinity enabling particular receptors present upon tumor surfaces, allowing for the creation of precision medications that reduce off-target consequences and enhance cancer patient treatment results. Wet-lab techniques are considered essential tools for studying THPs; however, they're labor-extensive and time-consuming, therefore making prediction of THPs a challenging task for the researchers. Computational-techniques, on the other hand, are considered significant tools in identifying THPs according to the sequence data. Despite many strategies have been presented to predict new THP, there is still a need to develop a robust method with higher rates of success. In this paper, we developed a novel framework, THP-DF, for accurately identifying THPs on a large-scale. Firstly, the peptide sequences are encoded through various sequential features. Secondly, each feature is passed to BiLSTM and attention layers to extract simplified deep features. Finally, an ensemble-framework is formed via integrating sequential- and deep features which are fed to a support vector machine which with 10-fold cross-validation to carry to validate the efficiency. The experimental results showed that THP-DF worked better on both [Formula: see text] and [Formula: see text] datasets by achieving accuracy of > 95% which are higher than existing predictors both datasets. This indicates that the proposed predictor could be a beneficial tool to precisely and rapidly identify THPs and will contribute to the cutting-edge cancer treatment strategies and pharmaceuticals.
Collapse
Affiliation(s)
- Roha Arif
- School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
| | - Sameera Kanwal
- School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
| | - Saeed Ahmed
- School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
| | - Muhammad Kabir
- School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan.
| |
Collapse
|
4
|
Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Na v blocking peptides prediction. Sci Rep 2024; 14:4463. [PMID: 38396246 PMCID: PMC10891130 DOI: 10.1038/s41598-024-55160-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2023] [Accepted: 02/21/2024] [Indexed: 02/25/2024] Open
Abstract
The voltage-gated sodium (Nav) channel is a crucial molecular component responsible for initiating and propagating action potentials. While the α subunit, forming the channel pore, plays a central role in this function, the complete physiological function of Nav channels relies on crucial interactions between the α subunit and auxiliary proteins, known as protein-protein interactions (PPI). Nav blocking peptides (NaBPs) have been recognized as a promising and alternative therapeutic agent for pain and itch. Although traditional experimental methods can precisely determine the effect and activity of NaBPs, they remain time-consuming and costly. Hence, machine learning (ML)-based methods that are capable of accurately contributing in silico prediction of NaBPs are highly desirable. In this study, we develop an innovative meta-learning-based NaBP prediction method (MetaNaBP). MetaNaBP generates new feature representations by employing a wide range of sequence-based feature descriptors that cover multiple perspectives, in combination with powerful ML algorithms. Then, these feature representations were optimized to identify informative features using a two-step feature selection method. Finally, the selected informative features were applied to develop the final meta-predictor. To the best of our knowledge, MetaNaBP is the first meta-predictor for NaBP prediction. Experimental results demonstrated that MetaNaBP achieved an accuracy of 0.948 and a Matthews correlation coefficient of 0.898 over the independent test dataset, which were 5.79% and 11.76% higher than the existing method. In addition, the discriminative power of our feature representations surpassed that of conventional feature descriptors over both the training and independent test datasets. We anticipate that MetaNaBP will be exploited for the large-scale prediction and analysis of NaBPs to narrow down the potential NaBPs.
Collapse
Affiliation(s)
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| | - Nutta Homdee
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
- Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
| |
Collapse
|
5
|
Iwaniak A, Minkiewicz P, Darewicz M. Bioinformatics and bioactive peptides from foods: Do they work together? ADVANCES IN FOOD AND NUTRITION RESEARCH 2024; 108:35-111. [PMID: 38461003 DOI: 10.1016/bs.afnr.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/11/2024]
Abstract
We live in the Big Data Era which affects many aspects of science, including research on bioactive peptides derived from foods, which during the last few decades have been a focus of interest for scientists. These two issues, i.e., the development of computer technologies and progress in the discovery of novel peptides with health-beneficial properties, are closely interrelated. This Chapter presents the example applications of bioinformatics for studying biopeptides, focusing on main aspects of peptide analysis as the starting point, including: (i) the role of peptide databases; (ii) aspects of bioactivity prediction; (iii) simulation of peptide release from proteins. Bioinformatics can also be used for predicting other features of peptides, including ADMET, QSAR, structure, and taste. To answer the question asked "bioinformatics and bioactive peptides from foods: do they work together?", currently it is almost impossible to find examples of peptide research with no bioinformatics involved. However, theoretical predictions are not equivalent to experimental work and always require critical scrutiny. The aspects of compatibility of in silico and in vitro results are also summarized herein.
Collapse
Affiliation(s)
- Anna Iwaniak
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland.
| | - Piotr Minkiewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
| | - Małgorzata Darewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
| |
Collapse
|
6
|
Shoombuatong W, Schaduangrat N, Nikom J. Empirical comparison and analysis of machine learning-based approaches for druggable protein identification. EXCLI JOURNAL 2023; 22:915-927. [PMID: 37780939 PMCID: PMC10539545 DOI: 10.17179/excli2023-6410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Figures] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Accepted: 08/15/2023] [Indexed: 10/03/2023]
Abstract
Efficiently and precisely identifying drug targets is crucial for developing and discovering potential medications. While conventional experimental approaches can accurately pinpoint these targets, they suffer from time constraints and are not easily adaptable to high-throughput processes. On the other hand, computational approaches, particularly those utilizing machine learning (ML), offer an efficient means to accelerate the prediction of druggable proteins based solely on their primary sequences. Recently, several state-of-the-art computational methods have been developed for predicting and analyzing druggable proteins. These computational methods showed high diversity in terms of benchmark datasets, feature extraction schemes, ML algorithms, evaluation strategies and webserver/software usability. Thus, our objective is to reexamine these computational approaches and conduct a comprehensive assessment of their strengths and weaknesses across multiple aspects. In this study, we deliver the first comprehensive survey regarding the state-of-the-art computational approaches for in silico prediction of druggable proteins. First, we provided information regarding the existing benchmark datasets and the types of ML methods employed. Second, we investigated the effectiveness of these computational methods in druggable protein identification for each benchmark dataset. Third, we summarized the important features used in this field and the existing webserver/software. Finally, we addressed the present constraints of the existing methods and offer valuable guidance to the scientific community in designing and developing novel prediction models. We anticipate that this comprehensive review will provide crucial information for the development of more accurate and efficient druggable protein predictors.
Collapse
Affiliation(s)
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| | - Jaru Nikom
- Research Methodology and Data Analytics Program, Faculty of Science & Technology, Prince of Songkla University, Pattani, Thailand, 94000
| |
Collapse
|
7
|
Guan J, Yao L, Chung CR, Chiang YC, Lee TY. StackTHPred: Identifying Tumor-Homing Peptides through GBDT-Based Feature Selection with Stacking Ensemble Architecture. Int J Mol Sci 2023; 24:10348. [PMID: 37373494 DOI: 10.3390/ijms241210348] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023] Open
Abstract
One of the major challenges in cancer therapy lies in the limited targeting specificity exhibited by existing anti-cancer drugs. Tumor-homing peptides (THPs) have emerged as a promising solution to this issue, due to their capability to specifically bind to and accumulate in tumor tissues while minimally impacting healthy tissues. THPs are short oligopeptides that offer a superior biological safety profile, with minimal antigenicity, and faster incorporation rates into target cells/tissues. However, identifying THPs experimentally, using methods such as phage display or in vivo screening, is a complex, time-consuming task, hence the need for computational methods. In this study, we proposed StackTHPred, a novel machine learning-based framework that predicts THPs using optimal features and a stacking architecture. With an effective feature selection algorithm and three tree-based machine learning algorithms, StackTHPred has demonstrated advanced performance, surpassing existing THP prediction methods. It achieved an accuracy of 0.915 and a 0.831 Matthews Correlation Coefficient (MCC) score on the main dataset, and an accuracy of 0.883 and a 0.767 MCC score on the small dataset. StackTHPred also offers favorable interpretability, enabling researchers to better understand the intrinsic characteristics of THPs. Overall, StackTHPred is beneficial for both the exploration and identification of THPs and facilitates the development of innovative cancer therapies.
Collapse
Affiliation(s)
- Jiahui Guan
- School of Medicine, The Chinese University of Hong Kong (Shenzhen) 2001 Longxiang Road, Shenzhen 518172, China
| | - Lantian Yao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Chia-Ru Chung
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Ying-Chih Chiang
- School of Medicine, The Chinese University of Hong Kong (Shenzhen) 2001 Longxiang Road, Shenzhen 518172, China
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| |
Collapse
|
8
|
Yue ZX, Yan TC, Xu HQ, Liu YH, Hong YF, Chen GX, Xie T, Tao L. A systematic review on the state-of-the-art strategies for protein representation. Comput Biol Med 2023; 152:106440. [PMID: 36543002 DOI: 10.1016/j.compbiomed.2022.106440] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The study of drug-target protein interaction is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, due to their automated nature, predictive power, and expected efficiency. Protein representation is a key step in the study of drug-target protein interaction by machine learning, which plays a fundamental role in the ultimate accomplishment of accurate research. With the progress of machine learning, protein representation methods have gradually attracted attention and have consequently developed rapidly. Therefore, in this review, we systematically classify current protein representation methods, comprehensively review them, and discuss the latest advances of interest. According to the information extraction methods and information sources, these representation methods are generally divided into structure and sequence-based representation methods. Each primary class can be further divided into specific subcategories. As for the particular representation methods involve both traditional and the latest approaches. This review contains a comprehensive assessment of the various methods which researchers can use as a reference for their specific protein-related research requirements, including drug research.
Collapse
Affiliation(s)
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
| | - Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
| |
Collapse
|
9
|
Hemmati S, Rasekhi Kazerooni H. Polypharmacological Cell-Penetrating Peptides from Venomous Marine Animals Based on Immunomodulating, Antimicrobial, and Anticancer Properties. Mar Drugs 2022; 20:md20120763. [PMID: 36547910 PMCID: PMC9787916 DOI: 10.3390/md20120763] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2022] [Revised: 11/25/2022] [Accepted: 11/30/2022] [Indexed: 12/09/2022] Open
Abstract
Complex pathological diseases, such as cancer, infection, and Alzheimer's, need to be targeted by multipronged curative. Various omics technologies, with a high rate of data generation, demand artificial intelligence to translate these data into druggable targets. In this study, 82 marine venomous animal species were retrieved, and 3505 cryptic cell-penetrating peptides (CPPs) were identified in their toxins. A total of 279 safe peptides were further analyzed for antimicrobial, anticancer, and immunomodulatory characteristics. Protease-resistant CPPs with endosomal-escape ability in Hydrophis hardwickii, nuclear-localizing peptides in Scorpaena plumieri, and mitochondrial-targeting peptides from Synanceia horrida were suitable for compartmental drug delivery. A broad-spectrum S. horrida-derived antimicrobial peptide with a high binding-affinity to bacterial membranes was an antigen-presenting cell (APC) stimulator that primes cytokine release and naïve T-cell maturation simultaneously. While antibiofilm and wound-healing peptides were detected in Synanceia verrucosa, APC epitopes as universal adjuvants for antiviral vaccination were in Pterois volitans and Conus monile. Conus pennaceus-derived anticancer peptides showed antiangiogenic and IL-2-inducing properties with moderate BBB-permeation and were defined to be a tumor-homing peptide (THP) with the ability to inhibit programmed death ligand-1 (PDL-1). Isoforms of RGD-containing peptides with innate antiangiogenic characteristics were in Conus tessulatus for tumor targeting. Inhibitors of neuropilin-1 in C. pennaceus are proposed for imaging probes or therapeutic delivery. A Conus betulinus cryptic peptide, with BBB-permeation, mitochondrial-targeting, and antioxidant capacity, was a stimulator of anti-inflammatory cytokines and non-inducer of proinflammation proposed for Alzheimer's. Conclusively, we have considered the dynamic interaction of cells, their microenvironment, and proportional-orchestrating-host- immune pathways by multi-target-directed CPPs resembling single-molecule polypharmacology. This strategy might fill the therapeutic gap in complex resistant disorders and increase the candidates' clinical-translation chance.
Collapse
Affiliation(s)
- Shiva Hemmati
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz 71345-1583, Iran
- Department of Pharmaceutical Biology, Faculty of Pharmaceutical Sciences, UCSI University, Cheras, Kuala Lumpur 56000, Malaysia
- Biotechnology Research Center, Shiraz University of Medical Sciences, Shiraz 71345-1583, Iran
- Correspondence: ; Tel.: +98-7132-424-128
| | | |
Collapse
|
10
|
Charoenkwan P, Schaduangrat N, Lio P, Moni MA, Chumnanpuen P, Shoombuatong W. iAMAP-SCM: A Novel Computational Tool for Large-Scale Identification of Antimalarial Peptides Using Estimated Propensity Scores of Dipeptides. ACS OMEGA 2022; 7:41082-41095. [PMID: 36406571 PMCID: PMC9670693 DOI: 10.1021/acsomega.2c04465] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/15/2022] [Accepted: 10/20/2022] [Indexed: 06/16/2023]
Abstract
Antimalarial peptides (AMAPs) varying in length, amino acid composition, charge, conformational structure, hydrophobicity, and amphipathicity reflect their diversity in antimalarial mechanisms. Due to the worldwide major health problem concerning antimicrobial resistance, these peptides possess great therapeutic value owing to their low incidences of drug resistance as compared to conventional antibiotics. Although well-known experimental methods are able to precisely determine the antimalarial activity of peptides, these methods are still time-consuming and costly. Thus, machine learning (ML)-based methods that are capable of identifying AMAPs rapidly by using only sequence information would be beneficial for the high-throughput identification of AMAPs. In this study, we propose the first computational model (termed iAMAP-SCM) for the large-scale identification and characterization of peptides with antimalarial activity by using only sequence information. Specifically, we employed an interpretable scoring card method (SCM) to develop iAMAP-SCM and estimate propensities of 20 amino acids and 400 dipeptides to be AMAPs in a supervised manner. Experimental results showed that iAMAP-SCM could achieve a maximum accuracy and Matthew's coefficient correlation of 0.957 and 0.834, respectively, on the independent test dataset. In addition, SCM-derived propensities of 20 amino acids and selected physicochemical properties were used to provide an understanding of the functional mechanisms of AMAPs. Finally, a user-friendly online computational platform of iAMAP-SCM is publicly available at http://pmlabstack.pythonanywhere.com/iAMAP-SCM. The iAMAP-SCM predictor is anticipated to assist experimental scientists in the high-throughput identification of potential AMAP candidates for the treatment of malaria and other clinical applications.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern
Management and Information Technology, College of Arts, Media and
Technology, Chiang Mai University, Chiang Mai50200, Thailand
| | - Nalini Schaduangrat
- Center
of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok10700, Thailand
| | - Pietro Lio
- Department
of Computer Science and Technology, University
of Cambridge, CambridgeshireCB3 0FD, U.K.
| | - Mohammad Ali Moni
- Artificial
Intelligence & Digital Health, School of Health and Rehabilitation
Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St LuciaQLD 4072, Australia
| | - Pramote Chumnanpuen
- Department
of Zoology, Faculty of Science, Kasetsart
University, Bangkok10900, Thailand
- Omics Center
for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok10900, Thailand
| | - Watshara Shoombuatong
- Center
of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok10700, Thailand
| |
Collapse
|
11
|
SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins. Comput Biol Med 2022; 146:105704. [PMID: 35690478 DOI: 10.1016/j.compbiomed.2022.105704] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2022] [Revised: 05/15/2022] [Accepted: 06/04/2022] [Indexed: 11/22/2022]
Abstract
Thermophilic proteins (TPPs) are important in the field of protein biochemistry and development of new enzymes. Thus, computational methods must be urgently developed to accurately and rapidly identify TPPs. To date, several computational methods have been developed for TPP identification; however, few limitations in terms of performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, to achieve more accurate identification of TPPs using only sequence information without any need for structural information. We combined twelve different feature encodings representing different perspectives and six popular machine learning algorithms to train 72 baseline models and extract the key information of TPPs. Subsequently, the informative predicted probabilities from the baseline models were mined and selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the final meta-predictor, SAPPHIRE, was built and optimized by applying an optimal feature set. The performance of SAPPHIRE in the 10-fold cross-validation test showed that a superior predictive performance compared with several baseline models could be achieved. Moreover, SAPPHIRE yielded an accuracy of 0.942 and Matthew's coefficient correlation of 0.884, which were 7.68 and 5.12% higher than those of the current existing methods, respectively, as indicated by the independent test. The proposed computational approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The codes and datasets are available at https://github.com/plenoi/SAPPHIRE.
Collapse
|
12
|
Charoenkwan P, Schaduangrat N, Lio' P, Moni MA, Manavalan B, Shoombuatong W. NEPTUNE: A novel computational approach for accurate and large-scale identification of tumor homing peptides. Comput Biol Med 2022; 148:105700. [PMID: 35715261 DOI: 10.1016/j.compbiomed.2022.105700] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Revised: 05/31/2022] [Accepted: 06/04/2022] [Indexed: 11/16/2022]
Abstract
Tumor homing peptides (THPs) play a crucial role in recognizing and specifically binding to cancer cells. Although experimental approaches can facilitate the precise identification of THPs, they are usually time-consuming, labor-intensive, and not cost-effective. However, computational approaches can identify THPs by utilizing sequence information alone, thus highlighting their great potential for large-scale identification of THPs. Herein, we propose NEPTUNE, a novel computational approach for the accurate and large-scale identification of THPs from sequence information. Specifically, we constructed variant baseline models from multiple feature encoding schemes coupled with six popular machine learning algorithms. Subsequently, we comprehensively assessed and investigated the effects of these baseline models on THP prediction. Finally, the probabilistic information generated by the optimal baseline models is fed into a support vector machine-based classifier to construct the final meta-predictor (NEPTUNE). Cross-validation and independent tests demonstrated that NEPTUNE achieved superior performance for THP prediction compared with its constituent baseline models and the existing methods. Moreover, we employed the powerful SHapley additive exPlanations method to improve the interpretation of NEPTUNE and elucidate the most important features for identifying THPs. Finally, we implemented an online web server using NEPTUNE, which is available at http://pmlabstack.pythonanywhere.com/NEPTUNE. NEPTUNE could be beneficial for the large-scale identification of unknown THP candidates for follow-up experimental validation.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Mohammad Ali Moni
- Artificial Intelligence & Digital Health, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland St Lucia, QLD, 4072, Australia
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
Collapse
|
13
|
Charoenkwan P, Schaduangrat N, Hasan MM, Moni MA, Lió P, Shoombuatong W. Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. EXCLI JOURNAL 2022; 21:554-570. [PMID: 35651661 PMCID: PMC9150013 DOI: 10.17179/excli2022-4723] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022]
Abstract
Thermophilic proteins (TPPs) are critical for basic research and in the food industry due to their ability to maintain a thermodynamically stable fold at extremely high temperatures. Thus, the expeditious identification of novel TPPs through computational models from protein sequences is very desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, for in silico prediction of TPPs have been developed. Therefore, it is desirable to revisit these methods and summarize their advantages and disadvantages in order to further develop new computational approaches to achieve more accurate and improved prediction of TPPs. With this goal in mind, we comprehensively investigate a large collection of fourteen state-of-the-art TPP predictors in terms of their dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies and web server/software usability. To the best of our knowledge, this article represents the first comprehensive review on the development of ML-based methods for in silico prediction of TPPs. Among these TPP predictors, they can be classified into two groups according to the interpretability of ML algorithms employed (i.e., computational black-box methods and computational white-box methods). In order to perform the comparative analysis, we conducted a comparative study on several currently available TPP predictors based on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will facilitate researchers in selecting an appropriate TPP predictor that is the most suitable one to deal with their purposes and provide useful perspectives for the development of more effective and accurate TPP predictors.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| | - Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
| | - Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, the University of Queensland, St Lucia, QLD 4072, Australia
| | - Pietro Lió
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| |
Collapse
|
14
|
A Novel Network Science and Similarity-Searching-Based Approach for Discovering Potential Tumor-Homing Peptides from Antimicrobials. Antibiotics (Basel) 2022; 11:antibiotics11030401. [PMID: 35326864 PMCID: PMC8944733 DOI: 10.3390/antibiotics11030401] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2022] [Revised: 03/13/2022] [Accepted: 03/15/2022] [Indexed: 02/01/2023] Open
Abstract
Peptide-based drugs are promising anticancer candidates due to their biocompatibility and low toxicity. In particular, tumor-homing peptides (THPs) have the ability to bind specifically to cancer cell receptors and tumor vasculature. Despite their potential to develop antitumor drugs, there are few available prediction tools to assist the discovery of new THPs. Two webservers based on machine learning models are currently active, the TumorHPD and the THPep, and more recently the SCMTHP. Herein, a novel method based on network science and similarity searching implemented in the starPep toolbox is presented for THP discovery. The approach leverages from exploring the structural space of THPs with Chemical Space Networks (CSNs) and from applying centrality measures to identify the most relevant and non-redundant THP sequences within the CSN. Such THPs were considered as queries (Qs) for multi-query similarity searches that apply a group fusion (MAX-SIM rule) model. The resulting multi-query similarity searching models (SSMs) were validated with three benchmarking datasets of THPs/non-THPs. The predictions achieved accuracies that ranged from 92.64 to 99.18% and Matthews Correlation Coefficients between 0.894–0.98, outperforming state-of-the-art predictors. The best model was applied to repurpose AMPs from the starPep database as THPs, which were subsequently optimized for the TH activity. Finally, 54 promising THP leads were discovered, and their sequences were analyzed to encounter novel motifs. These results demonstrate the potential of CSNs and multi-query similarity searching for the rapid and accurate identification of THPs.
Collapse
|