1
Arif R, Kanwal S, Ahmed S, Kabir M. A Computational Predictor for Accurate Identification of Tumor Homing Peptides by Integrating Sequential and Deep BiLSTM Features. Interdiscip Sci 2024:10.1007/s12539-024-00628-9. [PMID: 38733473 DOI: 10.1007/s12539-024-00628-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 03/16/2024] [Accepted: 03/27/2024] [Indexed: 05/13/2024]
Abstract
Cancer remains a severe illness, and current research indicates that tumor homing peptides (THPs) play an important part in cancer therapy. The identification of THPs can provide crucial insights for the drug-discovery and pharmaceutical industries, as they allow for tailored medication delivery to cancer cells. These peptides have a high affinity for particular receptors present on tumor surfaces, allowing for the creation of precision medications that reduce off-target effects and improve treatment outcomes for cancer patients. Wet-lab techniques are considered essential tools for studying THPs; however, they are labor-intensive and time-consuming, which makes the identification of THPs a challenging task for researchers. Computational techniques, on the other hand, are considered significant tools for identifying THPs from sequence data. Although many strategies have been presented to predict new THPs, there is still a need to develop a robust method with higher rates of success. In this paper, we developed a novel framework, THP-DF, for accurately identifying THPs on a large scale. First, the peptide sequences are encoded through various sequential features. Second, each feature is passed to BiLSTM and attention layers to extract simplified deep features. Finally, an ensemble framework is formed by integrating sequential and deep features, which are fed to a support vector machine, with 10-fold cross-validation used to validate its efficiency. The experimental results showed that THP-DF worked better on both the [Formula: see text] and [Formula: see text] datasets, achieving an accuracy of > 95%, which is higher than that of existing predictors on both datasets. This indicates that the proposed predictor could be a beneficial tool to precisely and rapidly identify THPs and will contribute to cutting-edge cancer treatment strategies and pharmaceuticals.
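The 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It only shows how samples might be partitioned into ten folds; the function name `k_fold_indices`, the seed, and the sample count are illustrative assumptions and are not taken from THP-DF.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread samples as evenly as possible: the first n % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 25 peptides split into 10 folds.
folds = list(k_fold_indices(n_samples=25, k=10))
```

Each sample appears in exactly one test fold, so every peptide is used for validation once and for training in the remaining nine rounds.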
Affiliation(s)
- Roha Arif
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Sameera Kanwal
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Saeed Ahmed
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Muhammad Kabir
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
2
Abbasi AF, Asim MN, Ahmed S, Dengel A. Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns. Sci Rep 2024; 14:9466. [PMID: 38658614 PMCID: PMC11043385 DOI: 10.1038/s41598-024-57457-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 03/18/2024] [Indexed: 04/26/2024] Open
Abstract
Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA is important for investigating its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights into disease mechanisms and potential therapeutic approaches. Conventionally, leccDNA is identified with wet lab-based methods, which are hindered by the need for prior knowledge and by resource-intensive processes, potentially limiting their broader applicability. To support leccDNA identification across multiple species, this paper presents the first computational predictor. The proposed iLEC-DNA predictor uses an SVM classifier along with sequence-derived nucleotide distribution patterns and physicochemical property-based features. In addition, the study introduces a set of 12 benchmark leccDNA datasets related to three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation across the 12 benchmark datasets under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms baseline predictors and encoder ensembles across diverse leccDNA datasets, producing average performance values of 81.09%, 62.2% and 81.08% in terms of ACC, MCC and AUC-ROC across all the datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.
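As a rough illustration of what a sequence-derived nucleotide distribution feature can look like, the sketch below computes normalized k-mer frequencies for a DNA string. This is a generic encoding chosen for illustration, not the specific encoder used by iLEC-DNA; the function name and the example sequence are assumptions.

```python
from itertools import product

def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical toy sequence; k=2 gives a 16-dimensional feature vector.
features = kmer_frequencies("ACGTACGT", k=2)
```

A fixed-length vector like this can be passed to any classifier, including an SVM, regardless of the original sequence length.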
Affiliation(s)
- Ahtisham Fazeel Abbasi
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, 67663, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
- Andreas Dengel
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, 67663, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
3
Kanwal S, Arif R, Ahmed S, Kabir M. A novel stacking-based predictor for accurate prediction of antimicrobial peptides. J Biomol Struct Dyn 2024:1-12. [PMID: 38500243 DOI: 10.1080/07391102.2024.2329298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 03/06/2024] [Indexed: 03/20/2024]
Abstract
Antimicrobial peptides (AMPs) are gaining acceptance and support as a chief antibiotic substitute since they boost human immunity. They retain a wide range of actions and have a low risk of developing resistance, which are critical properties for drug discovery in the pharmaceutical industry. Antibiotic sensitivity, however, is an issue that affects people all around the world and has the potential to one day lead to an epidemic. As cutting-edge therapeutic agents, AMPs are also expected to cure microbial infections. In order to produce tolerable drugs, it is crucial to understand the significance of the basic architecture of AMPs. Traditional laboratory methods for AMP testing and detection are expensive and time-consuming. Currently, bioinformatics techniques are being successfully applied to the detection of AMPs. In this study, we have developed a novel STacking-based ensemble learning framework for AntiMicrobial Peptide (STAMP) prediction. First, we constructed 84 different baseline models by using 12 different feature encoding schemes and 7 popular machine learning algorithms. Second, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, based on the feature selection strategy, we determined the optimal probabilistic feature vector, which was further utilized for the construction of our stacked model. As a result, the STAMP predictor achieved excellent performance during cross-validation, with an accuracy and Matthews correlation coefficient of 0.930 and 0.860, respectively. The corresponding metrics during the independent test were 0.710 and 0.464, respectively. Overall, STAMP achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, STAMP is expected to assist community-wide efforts in identifying AMPs and will contribute to the development of novel therapeutic methods and drug design for immunity.
Communicated by Ramaswamy H. Sarma.
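The first two steps of the stacking procedure described above (12 encodings x 7 algorithms = 84 baseline models, whose outputs form a probabilistic feature vector) can be sketched as follows. All names here are placeholders: the encodings, the algorithms, and `dummy_predict_proba` are hypothetical stand-ins, since the abstract does not list them.

```python
from itertools import product

# Placeholder names; the abstract does not enumerate the actual schemes.
encodings = [f"encoding_{i}" for i in range(1, 13)]    # 12 feature encodings
algorithms = [f"algorithm_{j}" for j in range(1, 8)]   # 7 ML algorithms

# One baseline model per (encoding, algorithm) pair.
baseline_models = list(product(encodings, algorithms))

def probabilistic_feature_vector(peptide, predict_proba):
    """Collect one predicted probability per baseline model."""
    return [predict_proba(enc, alg, peptide) for enc, alg in baseline_models]

def dummy_predict_proba(encoding, algorithm, peptide):
    # Stand-in for a trained model; always returns an uninformative 0.5.
    return 0.5

vector = probabilistic_feature_vector("GLFDIVKKVV", dummy_predict_proba)
```

In a stacked model, a vector like this (after feature selection) would serve as the input to the final meta-learner instead of the raw sequence features.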
Affiliation(s)
- Sameera Kanwal
  - School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Roha Arif
  - School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Saeed Ahmed
  - School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Muhammad Kabir
  - School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
4
Khalid M, Ali F, Alghamdi W, Alzahrani A, Alsini R, Alzahrani A. An ensemble computational model for prediction of clathrin protein by coupling machine learning with discrete cosine transform. J Biomol Struct Dyn 2024:1-9. [PMID: 38498362 DOI: 10.1080/07391102.2024.2329777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 02/19/2024] [Indexed: 03/20/2024]
Abstract
Clathrin protein (CP) plays a pivotal role in numerous cellular processes, including endocytosis, signal transduction, and neuronal function. Dysregulation of CP has been associated with a spectrum of diseases. Given its involvement in various cellular functions, CP has garnered significant attention for its potential applications in drug design and medicine, ranging from targeted drug delivery to addressing viral infections, neurological disorders, and cancer. The accurate identification of CP is crucial for unraveling its function and devising novel therapeutic strategies. Computational methods offer a rapid, cost-effective, and less labor-intensive alternative to traditional identification methods, making them especially appealing for high-throughput screening. This paper introduces CL-Pred, a novel computational method for CP identification. CL-Pred leverages three feature descriptors: Dipeptide Deviation from Expected Mean (DDE), Bigram Position Specific Scoring Matrix (BiPSSM), and Position Specific Scoring Matrix-Tetra Slice-Discrete Cosine Transform (PSSM-TS-DCT). The model is trained using three classifiers: Support Vector Machine (SVM), Extremely Randomized Tree (ERT), and Light eXtreme Gradient Boosting (LiXGB). Notably, the LiXGB-based model achieves outstanding performance, demonstrating accuracies of 94.63% and 93.65% on the training and testing datasets, respectively. The proposed CL-Pred method is poised to significantly advance our comprehension of clathrin-mediated endocytosis, cellular physiology, and disease pathogenesis. Furthermore, it holds promise for identifying potential drug targets across a spectrum of diseases.
Communicated by Ramaswamy H. Sarma.
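To give a feel for the discrete cosine transform step named in PSSM-TS-DCT, here is a minimal one-dimensional DCT-II applied to a single column of scores, keeping only the first few coefficients. This is a sketch under assumptions: the abstract does not specify the DCT variant, dimensionality, or number of retained coefficients, and `compress_column` and `keep=4` are illustrative choices.

```python
import math

def dct_ii(signal):
    """Unnormalized 1-D DCT-II: X_k = sum_n x_n * cos(pi/N * (n + 0.5) * k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]

def compress_column(column, keep=4):
    """Keep the first `keep` low-frequency coefficients (illustrative choice)."""
    return dct_ii(column)[:keep]

# Hypothetical column of PSSM-like scores for one amino acid along a protein.
coefficients = compress_column([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], keep=4)
```

The low-frequency coefficients summarize the overall trend of the column, which is why a DCT is often used to turn variable-length profiles into compact fixed-length features.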
Affiliation(s)
- Majdi Khalid
  - Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
- Farman Ali
  - Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Mardan, Pakistan
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Abdulrahman Alzahrani
  - Department of Information System and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Raed Alsini
  - Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Ahmed Alzahrani
  - College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
5
Akbar S, Raza A, Zou Q. Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model. BMC Bioinformatics 2024; 25:102. [PMID: 38454333 PMCID: PMC10921744 DOI: 10.1186/s12859-024-05726-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 03/01/2024] [Indexed: 03/09/2024] Open
Abstract
BACKGROUND: Viral infections have been the main health issue in the last decade. Antiviral peptides (AVPs) are a subclass of antimicrobial peptides (AMPs) with substantial potential to protect the human body against various viral diseases. However, there has been significant production of antiviral vaccines and medications. The recent development of AVPs as antiviral agents suggests an effective way to treat virus-affected cells. The use of intelligent machine learning techniques for developing peptide-based therapeutic agents is also attracting increasing interest due to its significant outcomes. Existing wet-laboratory-based methods are expensive and time-consuming, and cannot effectively screen and predict the targeted motifs of antiviral peptides.
METHODS: In this paper, we proposed a novel computational model called Deepstacked-AVPs to discriminate AVPs accurately. The training sequences are numerically encoded using a novel Tri-segmentation-based position-specific scoring matrix (PSSM-TS) and word2vec-based semantic features. Composition/Transition/Distribution-Transition (CTDT) is also employed to represent the physicochemical properties based on structural features. In addition, a fused vector is formed from the PSSM-TS features, semantic information, and CTDT descriptors to compensate for the limitations of single encoding methods. Information gain (IG) is applied to choose the optimal feature set. The selected features are used to train a stacked-ensemble classifier.
RESULTS: The proposed Deepstacked-AVPs model achieved a predictive accuracy of 96.60%, an area under the curve (AUC) of 0.98, and a precision-recall (PR) value of 0.97 using training samples. In the case of the independent samples, our model obtained an accuracy of 95.15%, an AUC of 0.97, and a PR value of 0.97.
CONCLUSION: Our Deepstacked-AVPs model outperformed existing models with ~4% and ~2% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed Deepstacked-AVPs model make it a valuable tool for scientists, and it may play a beneficial role in pharmaceutical design and academic research.
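The information gain criterion mentioned in the METHODS can be sketched as the reduction in label entropy after splitting on a feature. The version below scores a numeric feature with a single threshold split; the threshold-based split and the function names are assumptions made for illustration, not details taken from Deepstacked-AVPs.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result

def information_gain(feature_values, labels, threshold):
    """Entropy reduction from splitting samples at `threshold` on one feature."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted

# Hypothetical toy data: the first feature separates the classes, the second does not.
labels = [0, 0, 1, 1]
ig_useful = information_gain([0.1, 0.2, 0.8, 0.9], labels, threshold=0.5)
ig_useless = information_gain([0.1, 0.8, 0.2, 0.9], labels, threshold=0.5)
```

Ranking features by a score like this and keeping the top-scoring ones is one common way to reduce a fused feature vector before training.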
Affiliation(s)
- Shahid Akbar
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
- Ali Raza
  - Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, 25124, KP, Pakistan
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, People's Republic of China
6
Hassan MT, Tayara H, Chong KT. An integrative machine learning model for the identification of tumor T-cell antigens. Biosystems 2024; 237:105177. [PMID: 38458346 DOI: 10.1016/j.biosystems.2024.105177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 01/28/2024] [Accepted: 03/05/2024] [Indexed: 03/10/2024]
Abstract
The escalating global incidence of cancer poses significant health challenges, underscoring the need for innovative and more efficacious treatments. Cancer immunotherapy, a promising approach leveraging the body's immune system against cancer, emerges as a compelling solution. Consequently, the identification and characterization of tumor T-cell antigens (TTCAs) have become pivotal for exploration. In this manuscript, we introduce TTCA-IF, an integrative machine learning-based framework designed for TTCAs identification. TTCA-IF employs ten feature encoding types in conjunction with five conventional machine learning classifiers. To establish a robust foundation, these classifiers are trained, resulting in the creation of 150 baseline models. The outputs from these baseline models are then fed back into the five classifiers, generating their respective meta-models. Through an ensemble approach, the five meta-models are seamlessly integrated to yield the final predictive model, the TTCA-IF model. Our proposed model, TTCA-IF, surpasses both baseline models and existing predictors in performance. In a comparative analysis involving nine novel peptide sequences, TTCA-IF demonstrated exceptional accuracy by correctly identifying 8 out of 9 peptides as TTCAs. As a tool for screening and pinpointing potential TTCAs, we anticipate TTCA-IF to be invaluable in advancing cancer immunotherapy.
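The final integration of five meta-models can be pictured with a simple soft-voting sketch. The abstract does not say how TTCA-IF combines its meta-models, so averaging the predicted probabilities is an assumption here, as are the function names, the 0.5 threshold, and the example values.

```python
def ensemble_probability(meta_probs):
    """Average the probabilities predicted by the meta-models (soft voting)."""
    return sum(meta_probs) / len(meta_probs)

def ensemble_label(meta_probs, threshold=0.5):
    """Return 1 (predicted TTCA) if the averaged probability reaches the threshold."""
    return 1 if ensemble_probability(meta_probs) >= threshold else 0

# Hypothetical outputs of five meta-models for one peptide.
probs = [0.9, 0.8, 0.4, 0.7, 0.6]
label = ensemble_label(probs)
```

Averaging lets a confident majority outweigh a single dissenting meta-model, which is one reason ensembles tend to be more stable than their members.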
Affiliation(s)
- Mir Tanveerul Hassan
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea; Advances Electronics and Information Research Centre, Jeonbuk National University, Jeonju, 54896, South Korea
7
Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction. Sci Rep 2024; 14:4463. [PMID: 38396246 PMCID: PMC10891130 DOI: 10.1038/s41598-024-55160-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 02/21/2024] [Indexed: 02/25/2024] Open
Abstract
The voltage-gated sodium (Nav) channel is a crucial molecular component responsible for initiating and propagating action potentials. While the α subunit, forming the channel pore, plays a central role in this function, the complete physiological function of Nav channels relies on crucial interactions between the α subunit and auxiliary proteins, known as protein-protein interactions (PPI). Nav blocking peptides (NaBPs) have been recognized as a promising and alternative therapeutic agent for pain and itch. Although traditional experimental methods can precisely determine the effect and activity of NaBPs, they remain time-consuming and costly. Hence, machine learning (ML)-based methods that are capable of accurately contributing in silico prediction of NaBPs are highly desirable. In this study, we develop an innovative meta-learning-based NaBP prediction method (MetaNaBP). MetaNaBP generates new feature representations by employing a wide range of sequence-based feature descriptors that cover multiple perspectives, in combination with powerful ML algorithms. Then, these feature representations were optimized to identify informative features using a two-step feature selection method. Finally, the selected informative features were applied to develop the final meta-predictor. To the best of our knowledge, MetaNaBP is the first meta-predictor for NaBP prediction. Experimental results demonstrated that MetaNaBP achieved an accuracy of 0.948 and a Matthews correlation coefficient of 0.898 over the independent test dataset, which were 5.79% and 11.76% higher than the existing method. In addition, the discriminative power of our feature representations surpassed that of conventional feature descriptors over both the training and independent test datasets. We anticipate that MetaNaBP will be exploited for the large-scale prediction and analysis of NaBPs to narrow down the potential NaBPs.
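Accuracy and the Matthews correlation coefficient (MCC), the two metrics reported above, can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the counts in the example are made up for illustration and are not MetaNaBP results.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical confusion matrix for a balanced test set of 100 peptides.
acc_value = accuracy(tp=45, tn=45, fp=5, fn=5)
mcc_value = mcc(tp=45, tn=45, fp=5, fn=5)
```

MCC ranges from -1 to 1 and takes all four cells of the confusion matrix into account, so it is usually lower than accuracy for the same predictions and is more informative on imbalanced data.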
Affiliation(s)
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nutta Homdee
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
8
Liu GY, Yu D, Fan MM, Zhang X, Jin ZY, Tang C, Liu XF. Antimicrobial resistance crisis: could artificial intelligence be the solution? Mil Med Res 2024; 11:7. [PMID: 38254241 PMCID: PMC10804841 DOI: 10.1186/s40779-024-00510-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 01/08/2024] [Indexed: 01/24/2024] Open
Abstract
Antimicrobial resistance is a global public health threat, and the World Health Organization (WHO) has announced a priority list of the most threatening pathogens against which novel antibiotics need to be developed. The discovery and introduction of novel antibiotics are time-consuming and expensive. According to WHO's report of antibacterial agents in clinical development, only 18 novel antibiotics have been approved since 2014. Therefore, novel antibiotics are critically needed. Artificial intelligence (AI) has been rapidly applied to drug development since its recent technical breakthrough and has dramatically improved the efficiency of the discovery of novel antibiotics. Here, we first summarized recently marketed novel antibiotics, and antibiotic candidates in clinical development. In addition, we systematically reviewed the involvement of AI in antibacterial drug development and utilization, including small molecules, antimicrobial peptides, phage therapy, essential oils, as well as resistance mechanism prediction, and antibiotic stewardship.
Affiliation(s)
- Guang-Yu Liu
  - Department of Immunology and Pathogen Biology, School of Basic Medical Sciences, Hangzhou Normal University, Key Laboratory of Aging and Cancer Biology of Zhejiang Province, Key Laboratory of Inflammation and Immunoregulation of Hangzhou, Hangzhou Normal University, Hangzhou, 311121, China
- Dan Yu
  - National Key Discipline of Pediatrics Key Laboratory of Major Diseases in Children Ministry of Education, Laboratory of Dermatology, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Mei-Mei Fan
  - Department of Immunology and Pathogen Biology, School of Basic Medical Sciences, Hangzhou Normal University, Key Laboratory of Aging and Cancer Biology of Zhejiang Province, Key Laboratory of Inflammation and Immunoregulation of Hangzhou, Hangzhou Normal University, Hangzhou, 311121, China
- Xu Zhang
  - Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, 55905, USA
- Ze-Yu Jin
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Christoph Tang
  - Sir William Dunn School of Pathology, University of Oxford, Oxford, OX1 3RE, UK
- Xiao-Fen Liu
  - Institute of Antibiotics, Huashan Hospital, Fudan University, Key Laboratory of Clinical Pharmacology of Antibiotics, National Health Commission of the People's Republic of China, National Clinical Research Centre for Aging and Medicine, Huashan Hospital, Fudan University, Shanghai, 200040, China
9
Singh S, Le NQK, Wang C. VF-Pred: Predicting virulence factor using sequence alignment percentage and ensemble learning models. Comput Biol Med 2024; 168:107662. [PMID: 37979206 DOI: 10.1016/j.compbiomed.2023.107662] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/03/2023] [Revised: 10/02/2023] [Accepted: 10/31/2023] [Indexed: 11/20/2023]
Abstract
This study introduces VF-Pred, a novel framework developed for detecting virulence factors (VFs) through the analysis of genomic data. VFs are crucial for pathogens to successfully infect host tissue and evade the immune system, leading to the onset of infectious diseases. Identifying VFs accurately is of utmost importance in the quest for developing potent drugs and vaccines to counter these diseases. To accomplish this, VF-Pred combines various feature engineering techniques to generate inputs for distinct machine learning classification models. The collective predictions of these models are then consolidated by a final downstream model using an innovative ensembling approach. One notable aspect of VF-Pred is the inclusion of a novel Seq-Alignment feature, which significantly enhances the accuracy of the employed machine learning algorithms. The framework was trained on 982 features obtained from extensive feature engineering, utilizing a comprehensive ensemble of 25 models. The new downstream ensembling technique adopted by VF-Pred surpasses existing stacking strategies and other ensembling methods, delivering superior performance in VF detection. Although similar studies have been done earlier, VF-Pred stands out in comparison, showing higher accuracy (83.5%) and higher sensitivity (87%) in the identification of VFs. VF-Pred is accessible through a user-friendly web page: given an identifier and a protein sequence, it predicts a high or low likelihood of the protein being a VF. Overall, VF-Pred showcases a highly promising methodology for the identification of VFs, potentially paving the way for the development of more effective strategies in the battle against infectious diseases.
Affiliation(s)
- Shreya Singh
  - NUS-ISS, National University of Singapore, 119615, Singapore
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, 110, Taiwan
- Cheng Wang
  - NUS-ISS, National University of Singapore, 119615, Singapore
10
Schaduangrat N, Homdee N, Shoombuatong W. StackER: a novel SMILES-based stacked approach for the accelerated and efficient discovery of ERα and ERβ antagonists. Sci Rep 2023; 13:22994. [PMID: 38151513 PMCID: PMC10752908 DOI: 10.1038/s41598-023-50393-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 12/19/2023] [Indexed: 12/29/2023] Open
Abstract
The role of estrogen receptors (ERs) in breast cancer is of great importance in both clinical practice and scientific exploration. However, around 15-30% of those affected do not see benefits from the usual treatments owing to the innate resistance mechanisms, while 30-40% will gain resistance through treatments. In order to address this problem and facilitate community-wide efforts, machine learning (ML)-based approaches are considered one of the most cost-effective and large-scale identification methods. Herein, we propose a new SMILES-based stacked approach, termed StackER, for the accelerated and efficient identification of ERα and ERβ inhibitors. In StackER, we first established an up-to-date dataset consisting of 1,996 and 1,207 compounds for ERα and ERβ, respectively. Using the up-to-date dataset, StackER explored a wide range of different SMILES-based feature descriptors and ML algorithms in order to generate probabilistic features (PFs). Finally, the selected PFs derived from the two-step feature selection strategy were used for the development of an efficient stacked model. Both cross-validation and independent tests showed that StackER surpassed several conventional ML classifiers and the existing method in precisely predicting ERα and ERβ inhibitors. Remarkably, StackER achieved MCC values of 0.829-0.847 and 0.712-0.786 in terms of the cross-validation and independent tests, respectively, which were 5.92-8.29 and 1.59-3.45% higher than the existing method. In addition, StackER was applied to determine useful features for being ERα and ERβ inhibitors and identify FDA-approved drugs as potential ERα inhibitors in efforts to facilitate drug repurposing. This innovative stacked method is anticipated to facilitate community-wide efforts in efficiently narrowing down ER inhibitor screening.
Affiliation(s)
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nutta Homdee
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
11
Zou H, Yu W. Integrating Low-Order and High-Order Correlation Information for Identifying Phage Virion Proteins. J Comput Biol 2023; 30:1131-1143. [PMID: 37729064 DOI: 10.1089/cmb.2022.0237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023] Open
Abstract
Phage virion proteins (PVPs) play an important role in the host cell. Fast and accurate identification of PVPs is beneficial for the discovery and development of related drugs. Although wet experimental approaches are the first choice to identify PVPs, they are costly and time-consuming. Thus, researchers have turned their attention to computational models, which can speed up related studies. Therefore, we proposed a novel machine-learning model to identify PVPs in the current study. First, 50 different types of physicochemical properties were used to denote protein sequences. Next, two different approaches, including Pearson's correlation coefficient (PCC) and maximal information coefficient (MIC), were employed to extract discriminative information. Further, to capture the high-order correlation information, we used PCC and MIC once again. After that, we adopted the least absolute shrinkage and selection operator algorithm to select the optimal feature subset. Finally, these chosen features were fed into a support vector machine to discriminate PVPs from phage non-virion proteins. We performed experiments on two different datasets to validate the effectiveness of our proposed method. Experimental results showed a significant improvement in performance compared with state-of-the-art approaches. It indicates that the proposed computational model may become a powerful predictor in identifying PVPs.
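The Pearson's correlation coefficient (PCC) step described above can be illustrated on two property profiles of the same protein. The sketch below implements the standard PCC formula; the idea of correlating two per-residue property profiles and the toy numbers are assumptions for illustration and do not reproduce the authors' exact feature construction.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / (std_x * std_y)

# Hypothetical per-residue values of two physicochemical properties.
property_a = [1.0, 2.0, 3.0, 4.0]
property_b = [2.0, 4.0, 6.0, 8.0]
pcc = pearson(property_a, property_b)
```

Computing such a coefficient for every pair of property profiles yields a fixed-length set of correlation features, whatever the length of the protein.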
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
| | - Wanting Yu
- College of Animal Science and Technology, Jiangxi Agricultural University, Nanchang, China
| |
|
12
|
Charoenkwan P, Waramit S, Chumnanpuen P, Schaduangrat N, Shoombuatong W. TROLLOPE: A novel sequence-based stacked approach for the accelerated discovery of linear T-cell epitopes of hepatitis C virus. PLoS One 2023; 18:e0290538. [PMID: 37624802 PMCID: PMC10456195 DOI: 10.1371/journal.pone.0290538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023]
Abstract
Hepatitis C virus (HCV) infection is a concerning health issue that causes chronic liver diseases. Despite many successful therapeutic outcomes, no effective HCV vaccines are currently available. Focusing on T cell activity, the primary effector for HCV clearance, T cell epitopes of HCV (TCE-HCV) are considered promising elements to accelerate HCV vaccine efficacy. Thus, accurate and rapid identification of TCE-HCVs is recommended to obtain more efficient therapy for chronic HCV infection. In this study, a novel sequence-based stacked approach, termed TROLLOPE, is proposed to accurately identify TCE-HCVs from sequence information. Specifically, we employed 12 different sequence-based feature descriptors from heterogeneous perspectives, such as physicochemical properties, composition-transition-distribution information and composition information. These descriptors were used in cooperation with 12 popular machine learning (ML) algorithms to create 144 base-classifiers. To maximize the utility of these base-classifiers, we used a feature selection strategy to determine a collection of potential base-classifiers and integrated them to develop the meta-classifier. Comprehensive experiments based on both cross-validation and independent tests demonstrated the superior predictive performance of TROLLOPE compared with conventional ML classifiers, with cross-validation and independent test accuracies of 0.745 and 0.747, respectively. Finally, a user-friendly online web server of TROLLOPE (http://pmlabqsar.pythonanywhere.com/TROLLOPE) has been developed to serve research efforts in the large-scale identification of potential TCE-HCVs for follow-up experimental verification.
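The pool of 144 base-classifiers described in this abstract is the cross product of 12 feature descriptors and 12 ML algorithms. A minimal sketch of that enumeration follows; the descriptor and algorithm names are placeholders, since the abstract gives only the counts.

```python
from itertools import product

# Placeholder names: the abstract states only that there are 12 of each.
descriptors = [f"descriptor_{i}" for i in range(1, 13)]
algorithms = [f"algorithm_{j}" for j in range(1, 13)]

# One base-classifier per (descriptor, algorithm) pair.
base_classifiers = list(product(descriptors, algorithms))
print(len(base_classifiers))  # 144
```

In a stacked approach of this kind, each pair would be trained separately and a feature selection step would then pick the subset whose outputs feed the meta-classifier.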
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand
| | - Sajee Waramit
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand
| | - Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand
- Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
| |
|
13
|
Charoenkwan P, Schaduangrat N, Shoombuatong W. StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens. BMC Bioinformatics 2023; 24:301. [PMID: 37507654 PMCID: PMC10386778 DOI: 10.1186/s12859-023-05421-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
BACKGROUND The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher rates of accuracy and precision. RESULTS In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by using 12 different feature encoding schemes and 13 popular ML algorithms. Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used for the construction of our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods in terms of the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866. CONCLUSIONS In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server ( http://2pmlab.camt.cmu.ac.th/StackTTCA ) to maximize user convenience for high-throughput screening of novel TTCAs.
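The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts in the example are made up for illustration and are not the paper's results.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Made-up counts for a balanced test set of 100 samples.
print(accuracy(45, 45, 5, 5))
print(mcc(45, 45, 5, 5))
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the two classes are imbalanced, which is presumably why both are reported.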
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
14
|
Syahid NF, Weerapreeyakul N, Srisongkram T. StackBRAF: A Large-Scale Stacking Ensemble Learning for BRAF Affinity Prediction. ACS OMEGA 2023; 8:20881-20891. [PMID: 37332807 PMCID: PMC10268632 DOI: 10.1021/acsomega.3c01641] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/10/2023] [Accepted: 05/22/2023] [Indexed: 06/20/2023]
Abstract
The B-rapidly accelerated fibrosarcoma (BRAF) is a proto-oncogene that plays a vital role in cell signaling and growth regulation. Identifying a potent BRAF inhibitor can enhance therapeutic success in high-stage cancers, particularly metastatic melanoma. In this study, we proposed a stacking ensemble learning framework for the accurate prediction of BRAF inhibitors. We obtained 3857 curated molecules with BRAF inhibitory activity expressed as a predicted half-maximal inhibitory concentration value (pIC50) from the ChEMBL database. Twelve molecular fingerprints from PaDeL-Descriptor were calculated for model training. Three machine learning algorithms including extreme gradient boosting, support vector regression, and multilayer perceptron were utilized for constructing new predictive features (PFs). The meta-ensemble random forest regression, called StackBRAF, was created based on the 36 PFs. The StackBRAF model achieves lower mean absolute error (MAE) and higher coefficient of determination (R2 and Q2) than the individual baseline models. The stacking ensemble learning model provides good y-randomization results, indicating a strong correlation between molecular features and pIC50. An applicability domain of the model with an acceptable Tanimoto similarity score was also defined. Moreover, a large-scale high-throughput screening of 2123 FDA-approved drugs against the BRAF protein was successfully demonstrated using the StackBRAF algorithm. Thus, the StackBRAF model proved beneficial as a drug design algorithm for BRAF inhibitor drug discovery and drug development.
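The abstract mentions an applicability domain defined through a Tanimoto similarity score. A minimal sketch of that idea follows, assuming fingerprints are represented as sets of "on" bit positions and that a query counts as in-domain when its best similarity to any training molecule reaches a threshold. The helper `in_domain` and the 0.5 threshold are assumptions for illustration; the paper's actual cutoff is not given in the abstract.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    return len(a & b) / len(union)


def in_domain(query, training_set, threshold=0.5):
    """Hypothetical rule: in-domain if the nearest training fingerprint is similar enough."""
    best = max((tanimoto(query, fp) for fp in training_set), default=0.0)
    return best >= threshold


# Made-up fingerprints.
training = [{2, 3, 4}, {9}]
print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(in_domain({1, 2, 3}, training))
```

Predictions for molecules that fall outside such a domain would normally be flagged as less reliable rather than discarded.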
Affiliation(s)
- Nur Fadhilah Syahid
- Graduate School in the Program of Pharmaceutical Chemistry and Natural Products, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
| | - Natthida Weerapreeyakul
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
- Human High Performance and Health Promotion Research Institute, Khon Kaen University, Khon Kaen 40002, Thailand
| | - Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
- Human High Performance and Health Promotion Research Institute, Khon Kaen University, Khon Kaen 40002, Thailand
| |
|
15
|
Charoenkwan P, Schaduangrat N, Pham NT, Manavalan B, Shoombuatong W. Pretoria: An effective computational approach for accurate and high-throughput identification of CD8+ t-cell epitopes of eukaryotic pathogens. Int J Biol Macromol 2023; 238:124228. [PMID: 36996953 DOI: 10.1016/j.ijbiomac.2023.124228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/11/2023] [Accepted: 03/25/2023] [Indexed: 03/31/2023]
Abstract
T-cells recognize antigenic epitopes present on major histocompatibility complex (MHC) molecules, triggering an adaptive immune response in the host. T-cell epitope (TCE) identification is challenging because of the extensive number of undetermined proteins found in eukaryotic pathogens, as well as MHC polymorphisms. In addition, conventional experimental approaches for TCE identification are time-consuming and expensive. Thus, computational approaches that can accurately and rapidly identify CD8+ T-cell epitopes (TCEs) of eukaryotic pathogens based solely on sequence information may facilitate the discovery of novel CD8+ TCEs in a cost-effective manner. Here, Pretoria (Predictor of CD8+ TCEs of eukaryotic pathogens) is proposed as the first stack-based approach for accurate and large-scale identification of CD8+ TCEs of eukaryotic pathogens. In particular, Pretoria enabled the extraction and exploration of crucial information embedded in CD8+ TCEs by employing a comprehensive set of 12 well-known feature descriptors extracted from multiple groups, including physicochemical properties, composition-transition-distribution, pseudo-amino acid composition, and amino acid composition. These feature descriptors were then utilized to construct a pool of 144 different machine learning (ML)-based classifiers based on 12 popular ML algorithms. Finally, the feature selection method was used to effectively determine the important ML classifiers for the construction of our stacked model. The experimental results indicated that Pretoria is an accurate and effective computational approach for CD8+ TCE prediction; it was superior to several conventional ML classifiers and the existing method in terms of the independent test, with an accuracy of 0.866, MCC of 0.732, and AUC of 0.921. Additionally, to maximize user convenience for high-throughput identification of CD8+ TCEs of eukaryotic pathogens, a user-friendly web server of Pretoria (http://pmlabstack.pythonanywhere.com/Pretoria) was developed and made freely available.
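Of the descriptor groups named in this abstract, amino acid composition is the simplest to illustrate: it maps a peptide to the fraction of each of the 20 standard residues. The sketch below is a generic version of that descriptor, not Pretoria's implementation; the function name and the example peptide are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(peptide):
    """Return a 20-dimensional vector of residue frequencies for a peptide."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    # Non-standard residues are counted in the length but get no slot of their own,
    # so the vector can sum to less than 1 for such sequences.
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


# Hypothetical peptide used only to show the output shape.
vector = amino_acid_composition("AAC")
print(len(vector), vector[0], vector[1])
```

Fixed-length vectors like this one are what allow peptides of different lengths to be passed to conventional ML classifiers.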
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Nhat Truong Pham
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| |
|
16
|
de Paiva BBM, Pereira PD, de Andrade CMV, Gomes VMR, Souza-Silva MVR, Martins KPMP, Sales TLS, de Carvalho RLR, Pires MC, Ramos LEF, Silva RT, de Freitas Martins Vieira A, Nunes AGS, de Oliveira Jorge A, de Oliveira Maurílio A, Scotton ALBA, da Silva CTCA, Cimini CCR, Ponce D, Pereira EC, Manenti ERF, Rodrigues FD, Anschau F, Botoni FA, Bartolazzi F, Grizende GMS, Noal HC, Duani H, Gomes IM, Costa JHSM, di Sabatino Santos Guimarães J, Tupinambás JT, Rugolo JM, Batista JDL, de Alvarenga JC, Chatkin JM, Ruschel KB, Zandoná LB, Pinheiro LS, Menezes LSM, de Oliveira LMC, Kopittke L, Assis LA, Marques LM, Raposo MC, Floriani MA, Bicalho MAC, Nogueira MCA, de Oliveira NR, Ziegelmann PK, Paraiso PG, de Lima Martelli PJ, Senger R, Menezes RM, Francisco SC, Araújo SF, Kurtz T, Fereguetti TO, de Oliveira TC, Ribeiro YCNMB, Ramires YC, Lima MCPB, Carneiro M, Bezerra AFB, Schwarzbold AV, de Moura Costa AS, Farace BL, Silveira DV, de Almeida Cenci EP, Lucas FB, Aranha FG, Bastos GAN, Vietta GG, Nascimento GF, Vianna HR, Guimarães HC, de Morais JDP, Moreira LB, de Oliveira LS, de Deus Sousa L, de Souza Viana L, de Souza Cabral MA, Ferreira MAP, de Godoy MF, de Figueiredo MP, Guimarães-Junior MH, de Paula de Sordi MA, da Cunha Severino Sampaio N, Assaf PL, Lutkmeier R, Valacio RA, Finger RG, de Freitas R, Guimarães SMM, Oliveira TF, Diniz THO, Gonçalves MA, Marcolino MS. Potential and limitations of machine meta-learning (ensemble) methods for predicting COVID-19 mortality in a large inhospital Brazilian dataset. Sci Rep 2023; 13:3463. [PMID: 36859446 PMCID: PMC9975879 DOI: 10.1038/s41598-023-28579-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2021] [Accepted: 01/20/2023] [Indexed: 03/03/2023]
Abstract
The majority of early prediction scores and methods to predict COVID-19 mortality are bound by methodological flaws and technological limitations (e.g., the use of a single prediction model). Our aim is to provide a thorough comparative study that tackles those methodological issues, considering multiple techniques to build mortality prediction models, including modern machine learning (neural) algorithms and traditional statistical techniques, as well as meta-learning (ensemble) approaches. This study used a dataset from a multicenter cohort of 10,897 adult Brazilian COVID-19 patients admitted from March 2020 to November 2021 (median age 60 [interquartile range 48-71], 46% women). We also proposed new, original population-based meta-features that have not been devised in the literature. Stacking was shown to achieve the best results reported in the literature for the death prediction task, improving over the previous state of the art by more than 46% in Recall for predicting death, with AUROC of 0.826 and MacroF1 of 65.4%. The newly proposed meta-features were highly discriminative of death but fell short of producing large improvements in final prediction performance, suggesting that we are possibly at the limits of the prediction capabilities that can be achieved with the current set of ML techniques and (meta-)features. Finally, we investigated how the trained models perform on different hospitals, showing that there are indeed large differences in classifier performance between hospitals, further making the case that errors are produced by factors that cannot be modeled with the current predictors.
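Among the metrics this abstract reports, MacroF1 is the unweighted mean of the per-class F1 scores, so the rarer class (here, death) weighs as much as the common one. The sketch below implements that standard definition; the labels in the example are made up and unrelated to the study's data.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined) without true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Made-up labels: 1 = died, 0 = survived.
print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```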
Affiliation(s)
- Bruno Barbosa Miranda de Paiva
- Computer Science Department, Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, Belo Horizonte, Brazil
| | - Polianna Delfino Pereira
- Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, Belo Horizonte, Brazil; Institute for Health Technology Assessment (IATS/CNPq), R. Ramiro Barcelos, 2359, building 21, room 507, Porto Alegre, Brazil
| | - Claudio Moisés Valiense de Andrade
- Computer Science Department, Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, Belo Horizonte, Brazil
| | - Virginia Mara Reis Gomes
- Medical School and University Hospital, Universidade Federal de Minas Gerais, Av. Professor Alfredo Balena, 190, room 246, Belo Horizonte, Brazil
| | - Maira Viana Rego Souza-Silva
- Medical School and University Hospital, Universidade Federal de Minas Gerais, Av. Professor Alfredo Balena, 190, room 246, Belo Horizonte, Brazil
| | - Karina Paula Medeiros Prado Martins
- Medical School and University Hospital, Universidade Federal de Minas Gerais, Av. Professor Alfredo Balena, 190, room 246, Belo Horizonte, Brazil
| | - Thaís Lorenna Souza Sales
- Universidade Federal de São João del-Rei, R. Sebastião Gonçalves Coelho, 400, Divinópolis, Brazil
| | | | - Magda Carvalho Pires
- Department of Statistics, Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, ICEx, room 4071, Belo Horizonte, Brazil
| | - Lucas Emanuel Ferreira Ramos
- Department of Statistics, Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, ICEx, room 4071, Belo Horizonte, Brazil
| | - Rafael Tavares Silva
- Department of Statistics, Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, ICEx, room 4071, Belo Horizonte, Brazil
| | | | | | | | | | | | | | | | - Daniela Ponce
- Faculdade de Medicina de Botucatu-Universidade Estadual Paulista “Júlio de Mesquita Filho”, Av. Prof. Mário Rubens Guimarães Montenegro, s/n-UNESP-Campus de Botucatu, Botucatu, Brazil
| | | | | | - Fernanda d’Athayde Rodrigues
- Hospital de Clínicas de Porto Alegre, R. Ramiro Barcelos, 2350, Porto Alegre, Brazil
| | - Fernando Anschau
- Hospital Nossa Senhora da Conceição and Hospital Cristo Redentor, Av. Francisco Trein, 326, Porto Alegre, Brazil
| | | | - Frederico Bartolazzi
- Hospital Santo Antônio, Pç. Dr. Márcio Carvalho Lopes Filho, 501, Curvelo, Brazil
| | - Genna Maira Santos Grizende
- Hospital Santa Casa de Misericórdia de Belo Horizonte, Av. Francisco Sales, 1111, Belo Horizonte, Brazil
| | - Helena Carolina Noal
- Universidade Federal de Santa Maria/Hospital Universitário/EBSERH, Av. Roraima, 1000, building 22, Santa Maria, Brazil
| | - Helena Duani
- Medical School and University Hospital, Universidade Federal de Minas Gerais, Av. Professor Alfredo Balena, 190, room 246, Belo Horizonte, Brazil
| | - Isabela Moraes Gomes
- Medical School and University Hospital, Universidade Federal de Minas Gerais, Av. Professor Alfredo Balena, 190, room 246, Belo Horizonte, Brazil
| | | | | | | | - Juliana Machado Rugolo
- Faculdade de Medicina de Botucatu-Universidade Estadual Paulista “Júlio de Mesquita Filho”, Av. Prof. Mário Rubens Guimarães Montenegro, s/n-UNESP-Campus de Botucatu, Botucatu, Brazil
| | - Joanna d’Arc Lyra Batista
- Universidade Federal da Fronteira Sul, Av. Fernando Machado, 108E, Chapecó, Brazil
| | | | - José Miguel Chatkin
- Hospital São Lucas PUCRS, Av. Ipiranga, 6690, Porto Alegre, Brazil
| | - Karen Brasil Ruschel
- Hospital Mãe de Deus, R. José de Alencar, 286, Porto Alegre, Brazil
| | | | | | - Luanna Silva Monteiro Menezes
- Hospital Metropolitano Odilon Behrens, R. Formiga, 50, Belo Horizonte, Brazil; Hospital Luxemburgo, R. Gentios, 1350, Belo Horizonte, Brazil
| | | | - Luciane Kopittke
- Hospital Nossa Senhora da Conceição and Hospital Cristo Redentor, Av. Francisco Trein, 326, Porto Alegre, Brazil
| | - Luisa Argolo Assis
- Pontifícia Universidade Católica de Minas Gerais, Av. Dom José Gaspar, 500, Belo Horizonte, Brazil
| | - Luiza Margoto Marques
- Faculdade de Ciências Médicas de Minas Gerais, Al. Ezequiel Dias, 275, Belo Horizonte, Brazil
| | - Magda Cesar Raposo
- Universidade Federal de São João del-Rei, R. Sebastião Gonçalves Coelho, 400, Divinópolis, Brazil
| | - Maiara Anschau Floriani
- Hospital Moinhos de Vento, R. Ramiro Barcelos, 910, Porto Alegre, Brazil; Moinhos Research Institute, 910 Ramiro Barcelos Street, 5 floor, Porto Alegre, Brazil
| | - Maria Aparecida Camargos Bicalho
- Fundação Hospitalar do Estado de Minas Gerais–FHEMIG, Cidade Administrativa de Minas Gerais, Edifício Gerais, 13rd floor, Rod. Papa João Paulo II, 3777, Belo Horizonte, Brazil
| | | | - Neimy Ramos de Oliveira
- Hospital Eduardo de Menezes, R. Dr. Cristiano Rezende, 2213, Belo Horizonte, Brazil
| | | | | | | | - Roberta Senger
- Universidade Federal de Santa Maria/Hospital Universitário/EBSERH, Av. Roraima, 1000, building 22, Santa Maria, Brazil
| | | | | | | | - Tatiana Kurtz
- Hospital Santa Cruz, R. Fernando Abott, 174, Santa Cruz do Sul, Brazil
| | - Tatiani Oliveira Fereguetti
- Hospital Eduardo de Menezes, R. Dr. Cristiano Rezende, 2213, Belo Horizonte, Brazil
| | | | | | | | | | - Marcelo Carneiro
- Hospital Santa Cruz, R. Fernando Abott, 174, Santa Cruz do Sul, Brazil
| | | | - Alexandre Vargas Schwarzbold
- Universidade Federal de Santa Maria/Hospital Universitário/EBSERH, Av. Roraima, 1000, building 22, Santa Maria, Brazil
| | | | - Barbara Lopes Farace
- Hospital Risoleta Tolentino Neves, R. das Gabirobas, 01, Belo Horizonte, Brazil
| | | | | | | | | | - Gisele Alsina Nader Bastos
- Hospital Moinhos de Vento, R. Ramiro Barcelos, 910, Porto Alegre, Brazil
| | | | | | | | | | | | - Leila Beltrami Moreira
- Hospital de Clínicas de Porto Alegre, R. Ramiro Barcelos, 2350, Porto Alegre, Brazil
| | | | | | | | - Máderson Alvares de Souza Cabral
- Medical School and University Hospital, Universidade Federal de Minas Gerais, Av. Professor Alfredo Balena, 190, room 246, Belo Horizonte, Brazil
| | - Maria Angélica Pires Ferreira
- Hospital de Clínicas de Porto Alegre, R. Ramiro Barcelos, 2350, Porto Alegre, Brazil
| | - Mariana Frizzo de Godoy
- Hospital São Lucas PUCRS, Av. Ipiranga, 6690, Porto Alegre, Brazil
| | | | | | - Mônica Aparecida de Paula de Sordi
- Faculdade de Medicina de Botucatu-Universidade Estadual Paulista “Júlio de Mesquita Filho”, Av. Prof. Mário Rubens Guimarães Montenegro, s/n-UNESP-Campus de Botucatu, Botucatu, Brazil
| | | | - Pedro Ledic Assaf
- Hospital Metropolitano Doutor Célio de Castro, R. Dona Luiza, 311, Belo Horizonte, Brazil
| | - Raquel Lutkmeier
- Hospital Nossa Senhora da Conceição and Hospital Cristo Redentor, Av. Francisco Trein, 326, Porto Alegre, Brazil
| | | | | | - Rufino de Freitas
- Hospital São João de Deus, R. do Cobre, 800, São João de Deus, Brazil
| | | | | | | | - Marcos André Gonçalves
- Computer Science Department, Universidade Federal de Minas Gerais, Av. Presidente Antônio Carlos, 6627, Belo Horizonte, Brazil
| | - Milena Soriano Marcolino
- Institute for Health Technology Assessment (IATS/CNPq), R. Ramiro Barcelos, 2359, building 21, room 507, Porto Alegre, Brazil; Medical School and University Hospital, Universidade Federal de Minas Gerais, Av. Professor Alfredo Balena, 190, room 246, Belo Horizonte, Brazil; Telehealth Center, University Hospital, Universidade Federal de Minas Gerais, Avenida Professor Alfredo Balena, 110 room 107. Santa Efigênia, Belo Horizonte, MG, CEP 30130-100, Brazil.
| |
|
17
|
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022]
Abstract
In the current era, one of the major challenges is to manage the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics in treating bacterial infections caused by drug-resistant strains of bacteria. In this review, a systematic attempt has been made to summarize phage-based therapy in depth. The review is divided into two sections: general information and computer-aided phage therapy (CAPT). Under general information, we cover the history of phage therapy, the mechanism of action, the status of phage-based products (approved and in clinical trials) and the challenges. The review emphasizes CAPT, where we cover primary phage-associated resources, phage prediction methods and pipelines. It covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant strain of bacterium is crucial for developing alternative treatments for drug-resistant bacteria, and this remains a challenging problem. Thus, we compile all phage-associated prediction methods, which include the prediction of phages for a bacterial strain, the host for a phage and the identification of interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discusses recent advances in the field of CAPT, where we briefly describe computational tools available for predicting phage virions, the life cycle of phages and prophage identification. Finally, we describe the advantages, challenges and opportunities of phage-based therapy.
Affiliation(s)
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Suchet Aggarwal
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| |
|
18
|
Fang Z, Feng T, Zhou H, Chen M. DeePVP: Identification and classification of phage virion proteins using deep learning. Gigascience 2022; 11:6661052. [PMID: 35950840 PMCID: PMC9366990 DOI: 10.1093/gigascience/giac076] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/08/2022] [Revised: 06/08/2022] [Accepted: 07/11/2022] [Indexed: 01/04/2023]
Abstract
BACKGROUND Many biological properties of phages are determined by phage virion proteins (PVPs), and the poor annotation of PVPs is a bottleneck for many areas of viral research, such as viral phylogenetic analysis, viral host identification, and antibacterial drug design. Because of the high diversity of PVP sequences, the PVP annotation of a phage genome remains a particularly challenging bioinformatic task. FINDINGS Based on deep learning, we developed DeePVP. The main module of DeePVP aims to discriminate PVPs from non-PVPs within a phage genome, while the extended module of DeePVP can further classify predicted PVPs into the 10 major classes of PVPs. Compared with the present state-of-the-art tools, the main module of DeePVP performs better, with a 9.05% higher F1-score in the PVP identification task. Moreover, the overall accuracy of the extended module of DeePVP in the PVP classification task is approximately 3.72% higher than that of PhANNs. Two application cases show that the predictions of DeePVP are more reliable and can better reveal the compact PVP-enriched region than the current state-of-the-art tools. Particularly, in the Escherichia phage phiEC1 genome, a novel PVP-enriched region that is conserved in many other Escherichia phage genomes was identified, indicating that DeePVP will be a useful tool for the analysis of phage genomic structures. CONCLUSIONS DeePVP outperforms state-of-the-art tools. The program is optimized in both a virtual machine with graphical user interface and a docker so that the tool can be easily run by noncomputer professionals. DeePVP is freely available at https://github.com/fangzcbio/DeePVP/.
Affiliation(s)
- Zhencheng Fang
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
| | - Tao Feng
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
| | - Hongwei Zhou
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
| | - Muxuan Chen
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
| |
|
19
|
Charoenkwan P, Schaduangrat N, Mahmud SMH, Thinnukool O, Shoombuatong W. Recent development of machine learning-based methods for the prediction of defensin family and subfamily. EXCLI JOURNAL 2022; 21:757-771. [PMID: 35949489 PMCID: PMC9360473 DOI: 10.17179/excli2022-4913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 05/03/2022] [Indexed: 11/05/2022]
Abstract
Nearly all living species possess host defense peptides called defensins, which are crucial for innate immunity. These peptides work by activating the immune system, which kills microbes directly or indirectly, thus providing protection to the host. Numerous peptide-based drugs are currently being evaluated in preclinical and clinical trials. Although experimental methods can help to precisely identify the defensin peptide family and subfamily, these approaches are often time-consuming and costly. On the other hand, machine learning (ML) methods are able to effectively employ protein sequence information without knowledge of a protein's three-dimensional structure, which makes them well suited to large-scale identification. To date, several ML methods have been developed for the in silico identification of the defensin peptide family and subfamily. Therefore, summarizing the advantages and disadvantages of the existing methods is urgently needed in order to provide useful suggestions for the development and improvement of new computational models for the identification of the defensin peptide family and subfamily. With this goal in mind, we first provide a comprehensive survey of six state-of-the-art computational approaches for predicting the defensin peptide family and subfamily. Herein, we cover different important aspects, including dataset quality, feature encoding methods, feature selection schemes, ML algorithms, cross-validation methods and web server availability/usability. Moreover, we provide our thoughts on the limitations of existing methods and future perspectives for improving prediction performance and model interpretability. The insights and suggestions gained from this review are anticipated to serve as valuable guidance for researchers developing more robust and useful predictors.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| | - S. M. Hasan Mahmud
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700; Department of Computer Science, American International University-Bangladesh (AIUB), Kuratoli, Dhaka 1229, Bangladesh
| | - Orawit Thinnukool
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700. *To whom correspondence should be addressed: Watshara Shoombuatong, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700; Phone: +66 2 441 4371, Fax: +66 2 441 4380, E-mail:
| |
|