1
|
Qin Z, Ren H, Zhao P, Wang K, Liu H, Miao C, Du Y, Li J, Wu L, Chen Z. Current computational tools for protein lysine acylation site prediction. Brief Bioinform 2024; 25:bbae469. [PMID: 39316944 PMCID: PMC11421846 DOI: 10.1093/bib/bbae469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2024] [Revised: 08/20/2024] [Accepted: 09/07/2024] [Indexed: 09/26/2024] Open
Abstract
As a main subtype of post-translational modification (PTM), protein lysine acylations (PLAs) play crucial roles in regulating diverse functions of proteins. With recent advancements in proteomics technology, the identification of PTM is becoming a data-rich field. A large amount of experimentally verified data is urgently required to be translated into valuable biological insights. With computational approaches, PLA can be accurately detected across the whole proteome, even for organisms with small-scale datasets. Herein, a comprehensive summary of 166 in silico PLA prediction methods is presented, including a single type of PLA site and multiple types of PLA sites. This recapitulation covers important aspects that are critical for the development of a robust predictor, including data collection and preparation, sample selection, feature representation, classification algorithm design, model evaluation, and method availability. Notably, we discuss the application of protein language models and transfer learning to solve the small-sample learning issue. We also highlight the prediction methods developed for functionally relevant PLA sites and species/substrate/cell-type-specific PLA sites. In conclusion, this systematic review could potentially facilitate the development of novel PLA predictors and offer useful insights to researchers from various disciplines.
Collapse
Affiliation(s)
- Zhaohui Qin
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Haoran Ren
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
| | - Kaiyuan Wang
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Huixia Liu
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Chunbo Miao
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Yanxiu Du
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Junzhou Li
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Liuji Wu
- National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| |
Collapse
|
2
|
Hou X, Zhu L, Xu H, Shi J, Ji S. Dysregulation of protein succinylation and disease development. Front Mol Biosci 2024; 11:1407505. [PMID: 38882606 PMCID: PMC11176430 DOI: 10.3389/fmolb.2024.1407505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2024] [Accepted: 05/15/2024] [Indexed: 06/18/2024] Open
Abstract
As a novel post-translational modification of proteins, succinylation is widely present in both prokaryotes and eukaryotes. By regulating protein translocation and activity, particularly involved in regulation of gene expression, succinylation actively participates in diverse biological processes such as cell proliferation, differentiation and metabolism. Dysregulation of succinylation is closely related to many diseases. Consequently, it has increasingly attracted attention from basic and clinical researchers. For a thorough understanding of succinylation dysregulation and its implications for disease development, such as inflammation, tumors, cardiovascular and neurological diseases, this paper provides a comprehensive review of the research progress on abnormal succinylation. This understanding of association of dysregulation of succinylation with pathological processes will provide valuable directions for disease prevention/treatment strategies as well as drug development.
Collapse
Affiliation(s)
- Xiaoli Hou
- Center for Molecular Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, Henan, China
| | - Lijuan Zhu
- Zhengzhou Central Hospital Affiliated to Zhengzhou University, Zhengzhou, Henan, China
| | - Haiying Xu
- Center for Molecular Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, Henan, China
| | - Jie Shi
- Zhoukou Vocational and Technical College, Zhoukou, Henan, China
| | - Shaoping Ji
- Center for Molecular Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, Henan, China
- Department of Biochemistry and Molecular Biology, Medical School, Henan University, Kaifeng, Henan, China
| |
Collapse
|
3
|
Hou X, Chen Y, Li X, Gu X, Dong W, Shi J, Ji S. Protein succinylation: regulating metabolism and beyond. Front Nutr 2024; 11:1336057. [PMID: 38379549 PMCID: PMC10876795 DOI: 10.3389/fnut.2024.1336057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2023] [Accepted: 01/22/2024] [Indexed: 02/22/2024] Open
Abstract
Modifications of protein post-translation are critical modulatory processes, which alters target protein biological activity,function and/or location, even involved in pathogenesis of some diseases. So far, there are at least 16 types of post-translation modifications identified, particularly through recent mass spectrometry analysis. Among them, succinylation (Ksuc) on protein lysine residues causes a variety of biological changes. Succinylation of proteins contributes to many cellular processes such as proliferation, growth, differentiation, metabolism and even tumorigenesis. Mechanically, Succinylation leads to conformation alteration of chromatin or remodeling. As a result, transcription/expression of target genes is changed accordingly. Recent research indicated that succinylation mainly contributes to metabolism modulations, from gene expression of metabolic enzymes to their activity modulation. In this review, we will conclude roles of succinylation in metabolic regulation of glucose, fat, amino acids and related metabolic disease launched by aberrant succinylation. Our goal is to stimulate extra attention to these still not well researched perhaps important succinylation modification on proteins and cell processes.
Collapse
Affiliation(s)
- Xiaoli Hou
- Department of Basic Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, China
| | - Yiqiu Chen
- Department of Basic Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, China
| | - Xiao Li
- Department of Basic Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, China
| | - Xianliang Gu
- Department of Basic Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, China
| | - Weixia Dong
- Department of Basic Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, China
| | - Jie Shi
- Zhoukou Vocational and Technical College, Zhoukou, China
| | - Shaoping Ji
- Department of Basic Medicine, Zhengzhou Shuqing Medical College, Zhengzhou, China
- Department of Biochemistry and Molecular Biology, Medical School, Henan University, Kaifeng, China
| |
Collapse
|
4
|
Adejor J, Tumukunde E, Li G, Lin H, Xie R, Wang S. Impact of Lysine Succinylation on the Biology of Fungi. Curr Issues Mol Biol 2024; 46:1020-1046. [PMID: 38392183 PMCID: PMC10888112 DOI: 10.3390/cimb46020065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 02/24/2024] Open
Abstract
Post-translational modifications (PTMs) play a crucial role in protein functionality and the control of various cellular processes and secondary metabolites (SMs) in fungi. Lysine succinylation (Ksuc) is an emerging protein PTM characterized by the addition of a succinyl group to a lysine residue, which induces substantial alteration in the chemical and structural properties of the affected protein. This chemical alteration is reversible, dynamic in nature, and evolutionarily conserved. Recent investigations of numerous proteins that undergo significant succinylation have underscored the potential significance of Ksuc in various biological processes, encompassing normal physiological functions and the development of certain pathological processes and metabolites. This review aims to elucidate the molecular mechanisms underlying Ksuc and its diverse functions in fungi. Both conventional investigation techniques and predictive tools for identifying Ksuc sites were also considered. A more profound comprehension of Ksuc and its impact on the biology of fungi have the potential to unveil new insights into post-translational modification and may pave the way for innovative approaches that can be applied across various clinical contexts in the management of mycotoxins.
Collapse
Affiliation(s)
- John Adejor
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Elisabeth Tumukunde
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Guoqi Li
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Hong Lin
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Rui Xie
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Shihua Wang
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| |
Collapse
|
5
|
Wang X, Ding Z, Wang R, Lin X. Deepro-Glu: combination of convolutional neural network and Bi-LSTM models using ProtBert and handcrafted features to identify lysine glutarylation sites. Brief Bioinform 2023; 24:6991122. [PMID: 36653898 DOI: 10.1093/bib/bbac631] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2022] [Revised: 12/11/2022] [Accepted: 12/28/2022] [Indexed: 01/20/2023] Open
Abstract
Lysine glutarylation (Kglu) is a newly discovered post-translational modification of proteins with important roles in mitochondrial functions, oxidative damage, etc. The established biological experimental methods to identify glutarylation sites are often time-consuming and costly. Therefore, there is an urgent need to develop computational methods for efficient and accurate identification of glutarylation sites. Most of the existing computational methods only utilize handcrafted features to construct the prediction model and do not consider the positive impact of the pre-trained protein language model on the prediction performance. Based on this, we develop an ensemble deep-learning predictor Deepro-Glu that combines convolutional neural network and bidirectional long short-term memory network using the deep learning features and traditional handcrafted features to predict lysine glutaryation sites. The deep learning features are generated from the pre-trained protein language model called ProtBert, and the handcrafted features consist of sequence-based features, physicochemical property-based features and evolution information-based features. Furthermore, the attention mechanism is used to efficiently integrate the deep learning features and the handcrafted features by learning the appropriate attention weights. 10-fold cross-validation and independent tests demonstrate that Deepro-Glu achieves competitive or superior performance than the state-of-the-art methods. The source codes and data are publicly available at https://github.com/xwanggroup/Deepro-Glu.
Collapse
Affiliation(s)
- Xiao Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
| | - Zhaoyuan Ding
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
| | - Rong Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
| | - Xi Lin
- Instiute of Artificial Intelligence, Xiamen University, No.4221, Xiang'an South Road, 361000, Xiamen, China
| |
Collapse
|
6
|
Xia Y, Jiang M, Luo Y, Feng G, Jia G, Zhang H, Wang P, Ge R. SuccSPred2.0: A Two-Step Model to Predict Succinylation Sites Based on Multifeature Fusion and Selection Algorithm. J Comput Biol 2022; 29:1085-1094. [PMID: 35714347 DOI: 10.1089/cmb.2022.0109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Protein succinylation is a novel type of post-translational modification in recent decade years. It played an important role in biological structure and functions verified by experiments. However, it is time consuming and laborious for the wet experimental identification of succinylation sites. Traditional technology cannot adapt to the rapid growth of the biological sequence data sets. In this study, a new computational method named SuccSPred2.0 was proposed to identify succinylation sites in the protein sequences based on multifeature fusion and maximal information coefficient (MIC) method. SuccSPred2.0 was implemented based on a two-step strategy. At first, high-dimension features were reduced by linear discriminant analysis to prevent overfitting. Subsequently, MIC method was employed to select the important features binding classifiers to predict succinylation sites. From the compared experiments on 10-fold cross-validation and independent test data sets, SuccSPred2.0 obtained promising improvements. Comparative experiments showed that SuccSPred2.0 was superior to previous tools in identifying succinylation sites in the given proteins.
Collapse
Affiliation(s)
- Yixiao Xia
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Minchao Jiang
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Yizhang Luo
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Guanwen Feng
- Xi'an Key Laboratory of Big Data and Intelligent Vision, School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Gangyong Jia
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Hua Zhang
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| | - Pu Wang
- Computer School, Hubei University of Arts and Science, Xiangyang, China
| | - Ruiquan Ge
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
| |
Collapse
|
7
|
Wang H, Zhao H, Zhang J, Han J, Liu Z. A parallel model of DenseCNN and ordered-neuron LSTM for generic and species-specific succinylation site prediction. Biotechnol Bioeng 2022; 119:1755-1767. [PMID: 35320585 DOI: 10.1002/bit.28091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2021] [Revised: 03/12/2022] [Accepted: 03/19/2022] [Indexed: 11/07/2022]
Abstract
Lysine succinylation (Ksucc) regulates various metabolic processes, participates in vital life processes, ans is involved in the occurrence and development of numerous diseases. Accurate recognition of succinylation sites can reveal underlying functional mechanisms and pathogenesis. However, most remain undetected. Moreover, a deep learning architecture focusing on generic and species-specific predictions is still lacking. Thus, we proposed a deep learning-based framework named Deep-Ksucc, combining a dense convolutional network (DenseCNN) and ordered-neuron long short-term memory (OnLSTM) in parallel, which took the cascading characteristics of sequence information and physicochemical properties as the input. The results of the generic and species-specific predictions indicated that Deep-Ksucc can identify sequence patterns of different organisms and recognize plenty of succinylation sites. The case study showed that Deep-Ksucc can serve as a reliable tool for biology verification and computer-aided recognition of succinylation sites. This article is protected by copyright. All rights reserved.
Collapse
Affiliation(s)
- Huiqing Wang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
| | - Hong Zhao
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
| | - Jing Zhang
- Engineering Training Center, Taiyuan University of Technology, Taiyuan, 030024, China
| | - Jiale Han
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
| | - Zhihao Liu
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
| |
Collapse
|
8
|
Wang H, Zhao H, Yan Z, Zhao J, Han J. MDCAN-Lys: A Model for Predicting Succinylation Sites Based on Multilane Dense Convolutional Attention Network. Biomolecules 2021; 11:biom11060872. [PMID: 34208298 PMCID: PMC8231176 DOI: 10.3390/biom11060872] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2021] [Revised: 05/30/2021] [Accepted: 06/07/2021] [Indexed: 12/26/2022] Open
Abstract
Lysine succinylation is an important post-translational modification, whose abnormalities are closely related to the occurrence and development of many diseases. Therefore, exploring effective methods to identify succinylation sites is helpful for disease treatment and research of related drugs. However, most existing computational methods for the prediction of succinylation sites are still based on machine learning. With the increasing volume of data and complexity of feature representations, it is necessary to explore effective deep learning methods to recognize succinylation sites. In this paper, we propose a multilane dense convolutional attention network, MDCAN-Lys. MDCAN-Lys extracts sequence information, physicochemical properties of amino acids, and structural properties of proteins using a three-way network, and it constructs feature space. For each sub-network, MDCAN-Lys uses the cascading model of dense convolutional block and convolutional block attention module to capture feature information at different levels and improve the abstraction ability of the network. The experimental results of 10-fold cross-validation and independent testing show that MDCAN-Lys can recognize more succinylation sites, which is consistent with the conclusion of the case study. Thus, it is worthwhile to explore deep learning-based methods for the recognition of succinylation sites.
Collapse
|
9
|
Nilamyani AN, Auliah FN, Moni MA, Shoombuatong W, Hasan MM, Kurata H. PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features. Int J Mol Sci 2021; 22:2704. [PMID: 33800121 PMCID: PMC7962192 DOI: 10.3390/ijms22052704] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2021] [Revised: 03/02/2021] [Accepted: 03/03/2021] [Indexed: 12/15/2022] Open
Abstract
Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role before the biological experimentation. Herein, we developed a computational predictor PredNTS by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different, single encoding-employing RF models. The resultant PredNTS predictor achieved an area under a curve (AUC) of 0.910 using five-fold cross validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. The PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web-application with the curated datasets of the PredNTS is publicly available.
Collapse
Affiliation(s)
- Andi Nur Nilamyani
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (A.N.N.); (F.N.A.)
| | - Firda Nurul Auliah
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (A.N.N.); (F.N.A.)
| | - Mohammad Ali Moni
- WHO Collaborating Centre on eHealth, UNSW Digital Health, School of Public Health and Community Medicine, Faculty of Medicine, UNSW Sydney, Sydney, NSW 2052, Australia;
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (A.N.N.); (F.N.A.)
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (A.N.N.); (F.N.A.)
| |
Collapse
|
10
|
Auliah FN, Nilamyani AN, Shoombuatong W, Alam MA, Hasan MM, Kurata H. PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations. Int J Mol Sci 2021; 22:ijms22042120. [PMID: 33672741 PMCID: PMC7924619 DOI: 10.3390/ijms22042120] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2021] [Revised: 02/12/2021] [Accepted: 02/18/2021] [Indexed: 12/30/2022] Open
Abstract
Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms are highly needed that can predict potential pupylation sites using sequence features. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. Meanwhile, we explored the five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. The PUP-Fuse achieved a Mathew correlation value of 0.768 by a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The web server of the PUP-Fuse with curated datasets is freely available.
Collapse
Affiliation(s)
- Firda Nurul Auliah
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
| | - Andi Nur Nilamyani
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA;
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
- Correspondence:
| |
Collapse
|