1
|
Beltrán JF, Belén LH, Farias JG, Zamorano M, Lefin N, Miranda J, Parraguez-Contreras F. VirusHound-I: prediction of viral proteins involved in the evasion of host adaptive immune response using the random forest algorithm and generative adversarial network for data augmentation. Brief Bioinform 2023; 25:bbad434. [PMID: 38033292 PMCID: PMC10753651 DOI: 10.1093/bib/bbad434] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2023] [Revised: 10/18/2023] [Accepted: 11/05/2023] [Indexed: 12/02/2023] Open
Abstract
Throughout evolution, pathogenic viruses have developed different strategies to evade the response of the adaptive immune system. To carry out successful replication, some pathogenic viruses encode different proteins that manipulate the molecular mechanisms of host cells. Currently, there are different bioinformatics tools for virus research; however, none of them focus on predicting viral proteins that evade the adaptive system. In this work, we have developed a novel tool based on machine and deep learning for predicting this type of viral protein named VirusHound-I. This tool is based on a model developed with the multilayer perceptron algorithm using the dipeptide composition molecular descriptor. In this study, we have also demonstrated the robustness of our strategy for data augmentation of the positive dataset based on generative antagonistic networks. During the 10-fold cross-validation step in the training dataset, the predictive model showed 0.947 accuracy, 0.994 precision, 0.943 F1 score, 0.995 specificity, 0.896 sensitivity, 0.894 kappa, 0.898 Matthew's correlation coefficient and 0.989 AUC. On the other hand, during the testing step, the model showed 0.964 accuracy, 1.0 precision, 0.967 F1 score, 1.0 specificity, 0.936 sensitivity, 0.929 kappa, 0.931 Matthew's correlation coefficient and 1.0 AUC. Taking this model into account, we have developed a tool called VirusHound-I that makes it possible to predict viral proteins that evade the host's adaptive immune system. We believe that VirusHound-I can be very useful in accelerating studies on the molecular mechanisms of evasion of pathogenic viruses, as well as in the discovery of therapeutic targets.
Collapse
Affiliation(s)
- Jorge F Beltrán
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
| | | | - Jorge G Farias
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
| | - Mauricio Zamorano
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
| | - Nicolás Lefin
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
| | - Javiera Miranda
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
| | - Fernanda Parraguez-Contreras
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
| |
Collapse
|
2
|
Kumari S, Gupta R, Ambasta RK, Kumar P. Emerging trends in post-translational modification: Shedding light on Glioblastoma multiforme. Biochim Biophys Acta Rev Cancer 2023; 1878:188999. [PMID: 37858622 DOI: 10.1016/j.bbcan.2023.188999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 10/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
Recent multi-omics studies, including proteomics, transcriptomics, genomics, and metabolomics have revealed the critical role of post-translational modifications (PTMs) in the progression and pathogenesis of Glioblastoma multiforme (GBM). Further, PTMs alter the oncogenic signaling events and offer a novel avenue in GBM therapeutics research through PTM enzymes as potential biomarkers for drug targeting. In addition, PTMs are critical regulators of chromatin architecture, gene expression, and tumor microenvironment (TME), that play a crucial function in tumorigenesis. Moreover, the implementation of artificial intelligence and machine learning algorithms enhances GBM therapeutics research through the identification of novel PTM enzymes and residues. Herein, we briefly explain the mechanism of protein modifications in GBM etiology, and in altering the biologics of GBM cells through chromatin remodeling, modulation of the TME, and signaling pathways. In addition, we highlighted the importance of PTM enzymes as therapeutic biomarkers and the role of artificial intelligence and machine learning in protein PTM prediction.
Collapse
Affiliation(s)
- Smita Kumari
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological, University, India
| | - Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological, University, India; School of Medicine, University of South Carolina, Columbia, SC, United States of America
| | - Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological, University, India; Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India.
| | - Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological, University, India.
| |
Collapse
|
3
|
Khandelwal M, Rout RK. PRMxAI: protein arginine methylation sites prediction based on amino acid spatial distribution using explainable artificial intelligence. BMC Bioinformatics 2023; 24:376. [PMID: 37794362 PMCID: PMC10548713 DOI: 10.1186/s12859-023-05491-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 09/21/2023] [Indexed: 10/06/2023] Open
Abstract
BACKGROUND Protein methylation, a post-translational modification, is crucial in regulating various cellular functions. Arginine methylation is required to understand crucial biochemical activities and biological functions, like gene regulation, signal transduction, etc. However, some experimental methods, including Chip-Chip, mass spectrometry, and methylation-specific antibodies, exist for the prediction of methylated proteins. These experimental methods are expensive and tedious. As a result, computational methods based on machine learning play an efficient role in predicting arginine methylation sites. RESULTS In this research, a novel method called PRMxAI has been proposed to predict arginine methylation sites. The proposed PRMxAI extract sequence-based features, such as dipeptide composition, physicochemical properties, amino acid composition, and information theory-based features (Arimoto, Havrda-Charvat, Renyi, and Shannon entropy), to represent the protein sequences into numerical format. Various machine learning algorithms are implemented to select the better classifier, such as Decision trees, Naive Bayes, Random Forest, Support vector machines, and K-nearest neighbors. The random forest algorithm is selected as the underlying classifier for the PRMxAI model. The performance of PRMxAI is evaluated by employing 10-fold cross-validation, and it yields 87.17% and 90.40% accuracy on mono-methylarginine and di-methylarginine data sets, respectively. This research also examines the impact of various features on both data sets using explainable artificial intelligence. CONCLUSIONS The proposed PRMxAI shows the effectiveness of the features for predicting arginine methylation sites. Additionally, the SHapley Additive exPlanation method is used to interpret the predictive mechanism of the proposed model. The results indicate that the proposed PRMxAI model outperforms other state-of-the-art predictors.
Collapse
Affiliation(s)
- Monika Khandelwal
- Computer Science and Engineering Department, National Institute of Technology Srinagar, Hazratbal, Srinagar, J&K 190006 India
| | - Ranjeet Kumar Rout
- Computer Science and Engineering Department, National Institute of Technology Srinagar, Hazratbal, Srinagar, J&K 190006 India
| |
Collapse
|
4
|
Sohrawordi M, Hossain MA, Hasan MAM. PLP_FS: prediction of lysine phosphoglycerylation sites in protein using support vector machine and fusion of multiple F_Score feature selection. Brief Bioinform 2022; 23:6655632. [PMID: 35929355 DOI: 10.1093/bib/bbac306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2022] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 11/14/2022] Open
Abstract
A newly invented post-translational modification (PTM), phosphoglycerylation, has shown its essential role in the construction and functional properties of proteins and dangerous human diseases. Hence, it is very urgent to know about the molecular mechanism behind the phosphoglycerylation process to develop the drugs for related diseases. But accurately identifying of phosphoglycerylation site from a protein sequence in a laboratory is a very difficult and challenging task. Hence, the construction of an efficient computation model is greatly sought for this purpose. A little number of computational models are currently available for identifying the phosphoglycerylation sites, which are not able to reach their prediction capability at a satisfactory level. Therefore, an effective predictor named PLP_FS has been designed and constructed to identify phosphoglycerylation sites in this study. For the training purpose, an optimal number of feature sets was obtained by fusion of multiple F_Score feature selection techniques from the features generated by three types of sequence-based feature extraction methods and fitted with the support vector machine classification technique to the prediction model. On the other hand, the k-neighbor near cleaning and SMOTE methods were also implemented to balance the benchmark dataset. The suggested model in 10-fold cross-validation obtained an accuracy of 99.22%, a sensitivity of 98.17% and a specificity of 99.75% according to the experimental findings, which are better than other currently available predictors for accurately identifying the phosphoglycerylation sites.
Collapse
Affiliation(s)
- Md Sohrawordi
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
| | - Md Ali Hossain
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
| | - Md Al Mehedi Hasan
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
| |
Collapse
|
5
|
Xia Y, Jiang M, Luo Y, Feng G, Jia G, Zhang H, Wang P, Ge R. SuccSPred2.0: A Two-Step Model to Predict Succinylation Sites Based on Multifeature Fusion and Selection Algorithm. J Comput Biol 2022; 29:1085-1094. [PMID: 35714347 DOI: 10.1089/cmb.2022.0109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Protein succinylation is a novel type of post-translational modification in recent decade years. It played an important role in biological structure and functions verified by experiments. However, it is time consuming and laborious for the wet experimental identification of succinylation sites. Traditional technology cannot adapt to the rapid growth of the biological sequence data sets. In this study, a new computational method named SuccSPred2.0 was proposed to identify succinylation sites in the protein sequences based on multifeature fusion and maximal information coefficient (MIC) method. SuccSPred2.0 was implemented based on a two-step strategy. At first, high-dimension features were reduced by linear discriminant analysis to prevent overfitting. Subsequently, MIC method was employed to select the important features binding classifiers to predict succinylation sites. From the compared experiments on 10-fold cross-validation and independent test data sets, SuccSPred2.0 obtained promising improvements. Comparative experiments showed that SuccSPred2.0 was superior to previous tools in identifying succinylation sites in the given proteins.
Collapse
Affiliation(s)
- Yixiao Xia
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Minchao Jiang
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Yizhang Luo
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Guanwen Feng
- Xi'an Key Laboratory of Big Data and Intelligent Vision, School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Gangyong Jia
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Hua Zhang
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Pu Wang
- Computer School, Hubei University of Arts and Science, Xiangyang, China
| | - Ruiquan Ge
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| |
Collapse
|
6
|
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM ) is a ubiquitous phenomenon in both eukaryotes and prokaryotes which gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification to polypeptide chain and proteolytic cleavage. Understanding and characterization of PTM is a fundamental step toward understanding the underpinning of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs and complementary computational approaches are becoming popular. Recently, due to the various advancements in the field of Deep Learning (DL), along with the explosion of applications of DL to various fields, the field of computational prediction of PTM has also witnessed the development of a plethora of deep learning (DL)-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we also review the recent advances in the not-so-studied PTM , that is, proteolytic cleavage predictions. We describe advances in PTM prediction by highlighting the Deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.
Collapse
|
7
|
Sorkhi AG, Pirgazi J, Ghasemi V. A hybrid feature extraction scheme for efficient malonylation site prediction. Sci Rep 2022; 12:5756. [PMID: 35388017 PMCID: PMC8987080 DOI: 10.1038/s41598-022-08555-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Accepted: 03/07/2022] [Indexed: 11/09/2022] Open
Abstract
Lysine malonylation is one of the most important post-translational modifications (PTMs). It affects the functionality of cells. Malonylation site prediction in proteins can unfold the mechanisms of cellular functionalities. Experimental methods are one of the due prediction approaches. But they are typically costly and time-consuming to implement. Recently, methods based on machine-learning solutions have been proposed to tackle this problem. Such practices have been shown to reduce costs and time complexities and increase accuracy. However, these approaches also have specific shortcomings, including inappropriate feature extraction out of protein sequences, high-dimensional features, and inefficient underlying classifiers. A machine learning-based method is proposed in this paper to cope with these problems. In the proposed approach, seven different features are extracted. Then, the extracted features are combined, ranked based on the Fisher's score (F-score), and the most efficient ones are selected. Afterward, malonylation sites are predicted using various classifiers. Simulation results show that the proposed method has acceptable performance compared with some state-of-the-art approaches. In addition, the XGBOOST classifier, founded on extracted features such as TFCRF, has a higher prediction rate than the other methods. The codes are publicly available at: https://github.com/jimy2020/Malonylation-site-prediction.
Collapse
Affiliation(s)
- Ali Ghanbari Sorkhi
- Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
| | - Jamshid Pirgazi
- Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran.
| | - Vahid Ghasemi
- Department of Computer Engineering, Faculty of Information Technology, Kermanshah University of Technology, Kermanshah, Iran
| |
Collapse
|
8
|
Liu Y, Liu Y, Wang GA, Cheng Y, Bi S, Zhu X. BERT-Kgly: A Bidirectional Encoder Representations From Transformers (BERT)-Based Model for Predicting Lysine Glycation Site for Homo sapiens. FRONTIERS IN BIOINFORMATICS 2022; 2:834153. [PMID: 36304324 PMCID: PMC9580886 DOI: 10.3389/fbinf.2022.834153] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Accepted: 01/20/2022] [Indexed: 12/21/2022] Open
Abstract
As one of the most important posttranslational modifications (PTMs), protein lysine glycation changes the characteristics of the proteins and leads to the dysfunction of the proteins, which may cause diseases. Accurately detecting the glycation sites is of great benefit for understanding the biological function and potential mechanism of glycation in the treatment of diseases. However, experimental methods are expensive and time-consuming for lysine glycation site identification. Instead, computational methods, with their higher efficiency and lower cost, could be an important supplement to the experimental methods. In this study, we proposed a novel predictor, BERT-Kgly, for protein lysine glycation site prediction, which was developed by extracting embedding features of protein segments from pretrained Bidirectional Encoder Representations from Transformers (BERT) models. Three pretrained BERT models were explored to get the embeddings with optimal representability, and three downstream deep networks were employed to build our models. Our results showed that the model based on embeddings extracted from the BERT model pretrained on 556,603 protein sequences of UniProt outperforms other models. In addition, an independent test set was used to evaluate and compare our model with other existing methods, which indicated that our model was superior to other existing models.
Collapse
Affiliation(s)
| | | | | | | | - Shoudong Bi
- *Correspondence: Shoudong Bi, ; Xiaolei Zhu,
| | - Xiaolei Zhu
- *Correspondence: Shoudong Bi, ; Xiaolei Zhu,
| |
Collapse
|
9
|
Wang C, Tan X, Tang D, Gou Y, Han C, Ning W, Lin S, Zhang W, Chen M, Peng D, Xue Y. GPS-Uber: a hybrid-learning framework for prediction of general and E3-specific lysine ubiquitination sites. Brief Bioinform 2022; 23:6509047. [PMID: 35037020 DOI: 10.1093/bib/bbab574] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Revised: 12/11/2021] [Accepted: 12/14/2021] [Indexed: 12/13/2022] Open
Abstract
As an important post-translational modification, lysine ubiquitination participates in numerous biological processes and is involved in human diseases, whereas the site specificity of ubiquitination is mainly decided by ubiquitin-protein ligases (E3s). Although numerous ubiquitination predictors have been developed, computational prediction of E3-specific ubiquitination sites is still a great challenge. Here, we carefully reviewed the existing tools for the prediction of general ubiquitination sites. Also, we developed a tool named GPS-Uber for the prediction of general and E3-specific ubiquitination sites. From the literature, we manually collected 1311 experimentally identified site-specific E3-substrate relations, which were classified into different clusters based on corresponding E3s at different levels. To predict general ubiquitination sites, we integrated 10 types of sequence and structure features, as well as three types of algorithms including penalized logistic regression, deep neural network and convolutional neural network. Compared with other existing tools, the general model in GPS-Uber exhibited a highly competitive accuracy, with an area under curve values of 0.7649. Then, transfer learning was adopted for each E3 cluster to construct E3-specific models, and in total 112 individual E3-specific predictors were implemented. Using GPS-Uber, we conducted a systematic prediction of human cancer-associated ubiquitination events, which could be helpful for further experimental consideration. GPS-Uber will be regularly updated, and its online service is free for academic research at http://gpsuber.biocuckoo.cn/.
Collapse
Affiliation(s)
- Chenwei Wang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Xiaodan Tan
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Dachao Tang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Yujie Gou
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Cheng Han
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Wanshan Ning
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Shaofeng Lin
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Weizhi Zhang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Miaomiao Chen
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Di Peng
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Yu Xue
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| |
Collapse
|
10
|
Basith S, Lee G, Manavalan B. STALLION: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction. Brief Bioinform 2021; 23:6370848. [PMID: 34532736 PMCID: PMC8769686 DOI: 10.1093/bib/bbab376] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2021] [Revised: 08/22/2021] [Accepted: 08/24/2021] [Indexed: 12/13/2022] Open
Abstract
Protein post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and disease states. Acetylation on lysine residues is one of the most potent PTMs owing to its critical role in cellular metabolism and regulatory processes. Identifying protein lysine acetylation (Kace) sites is a challenging task in bioinformatics. To date, several machine learning-based methods for the in silico identification of Kace sites have been developed. Of those, a few are prokaryotic species-specific. Despite their attractive advantages and performances, these methods have certain limitations. Therefore, this study proposes a novel predictor STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), containing six prokaryotic species-specific models to identify Kace sites accurately. To extract crucial patterns around Kace sites, we employed 11 different encodings representing three different characteristics. Subsequently, a systematic and rigorous feature selection approach was employed to identify the optimal feature set independently for five tree-based ensemble algorithms and built their respective baseline model for each species. Finally, the predicted values from baseline models were utilized and trained with an appropriate classifier using the stacking strategy to develop STALLION. Comparative benchmarking experiments showed that STALLION significantly outperformed existing predictor on independent tests. To expedite direct accessibility to the STALLION models, a user-friendly online predictor was implemented, which is available at: http://thegleelab.org/STALLION.
Collapse
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Republic of Korea
| | - Gwang Lee
- Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
| | | |
Collapse
|