1
Gou Y, Liu D, Chen M, Wei Y, Huang X, Han C, Feng Z, Zhang C, Lu T, Peng D, Xue Y. GPS-SUMO 2.0: an updated online service for the prediction of SUMOylation sites and SUMO-interacting motifs. Nucleic Acids Res 2024; 52:W238-W247. [PMID: 38709873 PMCID: PMC11223847 DOI: 10.1093/nar/gkae346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/08/2024] [Accepted: 04/18/2024] [Indexed: 05/08/2024] Open
Abstract
Small ubiquitin-like modifiers (SUMOs) are tiny but important protein regulators involved in orchestrating a broad spectrum of biological processes, either by covalently modifying protein substrates or by noncovalently interacting with other proteins. Here, we report an updated server, GPS-SUMO 2.0, for the prediction of SUMOylation sites and SUMO-interacting motifs (SIMs). For predictor training, we adopted three machine learning algorithms, penalized logistic regression (PLR), a deep neural network (DNN), and a transformer, and used 52 404 nonredundant SUMOylation sites in 8262 proteins and 163 SIMs in 102 proteins. To further increase the accuracy of predicting SUMOylation sites, a pretraining model was first constructed using 145 545 protein lysine modification sites, followed by transfer learning to fine-tune the model. GPS-SUMO 2.0 exhibited greater accuracy in predicting SUMOylation sites than did other existing tools. For users, one or multiple protein sequences or identifiers can be input, and the prediction results are shown in a tabular list. In addition to the basic statistics, we integrated knowledge from 35 public resources to annotate SUMOylation sites or SIMs. The GPS-SUMO 2.0 server is freely available at https://sumo.biocuckoo.cn/. We believe that GPS-SUMO 2.0 can serve as a useful tool for further analysis of SUMOylation and SUMO interactions.
Affiliation(s)
- Yujie Gou
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Dan Liu
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Miaomiao Chen
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yuxiang Wei
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Xinhe Huang
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Cheng Han
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Zihao Feng
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Chi Zhang
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Teng Lu
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- Di Peng
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yu Xue
  - Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  - Nanjing University Institute of Artificial Intelligence Biomedicine, Nanjing 210031, China
2
Lv Z, Wei X, Hu S, Lin G, Qiu W. iSUMO-RsFPN: A predictor for identifying lysine SUMOylation sites based on multi-features and feature pyramid networks. Anal Biochem 2024; 687:115460. [PMID: 38191118 DOI: 10.1016/j.ab.2024.115460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 01/10/2024]
Abstract
SUMOylation is a protein post-translational modification that plays an essential role in cellular functions. For predicting SUMO sites, numerous researchers have proposed advanced methods based on conventional machine learning algorithms. These methods have shown excellent predictive performance, but there is room for improvement. In this study, we constructed a novel deep neural network, the Residual Pyramid Network (RsFPN), and developed an ensemble deep learning predictor called iSUMO-RsFPN. First, three feature extraction methods were employed to extract features from samples. Next, weak classifiers were trained based on RsFPN for each feature type. Finally, the weak classifiers were integrated to construct the final classifier. The predictor was systematically tested on an independent test dataset, where the results demonstrated a significant improvement over the existing state-of-the-art predictors. The code of iSUMO-RsFPN is freely available at https://github.com/454170054/iSUMO-RsFPN.
Affiliation(s)
- Zhe Lv
  - School of Mega Data, Jiangxi Institute of Fashion Technology, 330201, Nanchang, Jiangxi, China
- Xin Wei
  - Business School, Jiangxi Institute of Fashion Technology, 330201, Nanchang, Jiangxi, China
- Siqin Hu
  - School of Mega Data, Jiangxi Institute of Fashion Technology, 330201, Nanchang, Jiangxi, China
- Gang Lin
  - School of Mega Data, Jiangxi Institute of Fashion Technology, 330201, Nanchang, Jiangxi, China
- Wangren Qiu
  - Computer Department, Jingdezhen Ceramic University, 333403, Jingdezhen, Jiangxi, China
3
Chang X, Zhu Y, Chen Y, Li L. DeepNphos: A deep-learning architecture for prediction of N-phosphorylation sites. Comput Biol Med 2024; 170:108079. [PMID: 38295472 DOI: 10.1016/j.compbiomed.2024.108079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 01/25/2024] [Accepted: 01/27/2024] [Indexed: 02/02/2024]
Abstract
MOTIVATION: Phosphorylation, a prevalent post-translational modification, plays a crucial role in regulating cellular activities. This process encompasses O-phosphorylation (e.g., phosphoserine) and N-phosphorylation (e.g., phospho-lysine (pK), phospho-arginine (pR), and phospho-histidine (pH)). While significant research has focused on O-phosphorylation, resulting in various algorithms that predict O-phosphorylation sites with commendable performance, there has been a notable absence of models designed to predict N-phosphorylation sites. This study introduces an integrated model named DeepNphos, designed to predict N-phosphorylation sites. The model is developed based on the analysis of thousands of experimentally identified pK, pR and pH sites. RESULTS: We observed that the Convolutional Neural Network (CNN) model with One-Hot encoding features performs favorably compared with other models when predicting pK, pR, and pH sites. Additionally, pK exhibits similarities to other lysine modification types, and integrating the CNN model with a deep-transfer learning (DTL) strategy based on tens of thousands of known lysine modification sites could enhance pK prediction performance. In contrast, pR exhibits little similarity to other arginine modification types, and the integration of DTL has minimal impact on pR prediction performance. Furthermore, we did not incorporate the DTL strategy in predicting pH sites, given the scarcity of histidine modification sites beyond those associated with pH. The final classifiers for predicting pK, pR, and pH sites achieve AUC values of 0.856, 0.805 and 0.802, respectively, in ten-fold cross-validation. Overall, DeepNphos is the first classifier for predicting N-phosphorylation sites and is accessible at https://github.com/ChangXulinmessi/DeepNPhos.
Affiliation(s)
- Xulin Chang
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
- Yafei Zhu
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
- Yu Chen
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
- Lei Li
  - School of Health and Life Sciences, University of Health and Rehabilitation Sciences, Qingdao, 266000, China
4
Wu W, Zheng J, Wang R, Wang Y. Ion channels regulate energy homeostasis and the progression of metabolic disorders: Novel mechanisms and pharmacology of their modulators. Biochem Pharmacol 2023; 218:115863. [PMID: 37863328 DOI: 10.1016/j.bcp.2023.115863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 10/13/2023] [Accepted: 10/16/2023] [Indexed: 10/22/2023]
Abstract
The progression of metabolic diseases, characterized by dysregulated metabolic signaling pathways, is orchestrated by numerous signaling networks. Among the regulators, ion channels transport ions across membranes and trigger downstream signal transduction. They critically regulate energy homeostasis and the pathogenesis of metabolic diseases and are potential therapeutic targets for treating metabolic disorders. Ion channel blockers have been used for decades to treat diabetes by stimulating insulin secretion, yet they cause hypoglycemia and other adverse effects. This calls for a deeper understanding of the largely elusive regulatory mechanisms, which would facilitate the identification of new therapeutic targets and safe drugs against ion channels. In this article, we critically assess how the two principal regulatory mechanisms, protein-channel interaction and post-translational modification, act on the activities of ion channels to modulate energy homeostasis and metabolic disorders through multiple novel mechanisms. Moreover, we discuss the multidisciplinary methods that provide tools for elucidating the regulatory mechanisms by which ion channels mediate metabolic disorders. From a translational perspective, we discuss in detail the mechanistic analysis of recently validated ion channels that regulate insulin resistance and body weight control, and the adverse effects of current ion channel antagonists. Their small-molecule modulators serve as promising new drug candidates to combat metabolic disorders.
Affiliation(s)
- Wenyi Wu
  - School of Kinesiology, Shanghai University of Sport, Shanghai 200438, China
- Jianan Zheng
  - School of Kinesiology, Shanghai University of Sport, Shanghai 200438, China
- Ru Wang
  - School of Kinesiology, Shanghai University of Sport, Shanghai 200438, China
  - Shanghai Frontiers Science Research Base of Exercise and Metabolic Health, China
- Yibing Wang
  - School of Kinesiology, Shanghai University of Sport, Shanghai 200438, China
  - Shanghai Frontiers Science Research Base of Exercise and Metabolic Health, China
5
Khan S, Khan M, Iqbal N, Dilshad N, Almufareh MF, Alsubaie N. Enhancing Sumoylation Site Prediction: A Deep Neural Network with Discriminative Features. Life (Basel) 2023; 13:2153. [PMID: 38004293 PMCID: PMC10672286 DOI: 10.3390/life13112153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 10/18/2023] [Accepted: 10/25/2023] [Indexed: 11/26/2023] Open
Abstract
Sumoylation is a post-translational modification (PTM) mechanism involved in many critical biological processes, such as gene expression, protein localization and stabilization, and genome replication. Moreover, sumoylation sites are associated with different diseases, including Parkinson's and Alzheimer's. Because of this vital role, identifying sumoylation sites in proteins is significant for monitoring protein functions and discovering multiple diseases. Several computational models utilizing conventional ML methods have therefore been introduced in the literature to classify sumoylation sites. However, these models cannot accurately classify sumoylation sites due to intrinsic limitations of conventional learning methods. This paper proposes a robust computational model (called Deep-Sumo) for predicting sumoylation sites based on a deep-learning algorithm with efficient feature representation methods. The proposed model employs a half-sphere exposure method to represent protein sequences as feature vectors. Principal Component Analysis is applied to extract discriminative features by eliminating noisy and redundant features. The discriminative features are given to a multilayer Deep Neural Network (DNN) model to predict sumoylation sites accurately. The performance of the proposed model is extensively evaluated using a 10-fold cross-validation test with various statistical performance metrics. First, the proposed DNN is compared with traditional learning algorithms; subsequently, the performance of Deep-Sumo is compared with existing models. The validation results show that the proposed model reports an average accuracy of 96.47%, an improvement over existing models. It is anticipated that the proposed model can be used as an effective tool for drug discovery and the diagnosis of multiple diseases.
Affiliation(s)
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University, Mardan 23200, Pakistan
- Mukhtaj Khan
  - Department of Information Technology, The University of Haripur, Haripur 22620, Pakistan
- Nadeem Iqbal
  - Department of Computer Science, Abdul Wali Khan University, Mardan 23200, Pakistan
- Naqqash Dilshad
  - Department of Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea
- Maram Fahaad Almufareh
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
- Najah Alsubaie
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University (PNU), P.O. Box 84428, Riyadh 11671, Saudi Arabia
6
Kumari S, Gupta R, Ambasta RK, Kumar P. Emerging trends in post-translational modification: Shedding light on Glioblastoma multiforme. Biochim Biophys Acta Rev Cancer 2023; 1878:188999. [PMID: 37858622 DOI: 10.1016/j.bbcan.2023.188999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
Recent multi-omics studies, including proteomics, transcriptomics, genomics, and metabolomics, have revealed the critical role of post-translational modifications (PTMs) in the progression and pathogenesis of Glioblastoma multiforme (GBM). Further, PTMs alter oncogenic signaling events and offer a novel avenue in GBM therapeutics research through PTM enzymes as potential biomarkers for drug targeting. In addition, PTMs are critical regulators of chromatin architecture, gene expression, and the tumor microenvironment (TME), all of which play a crucial role in tumorigenesis. Moreover, the implementation of artificial intelligence and machine learning algorithms enhances GBM therapeutics research through the identification of novel PTM enzymes and residues. Herein, we briefly explain the mechanism of protein modifications in GBM etiology and in altering the biology of GBM cells through chromatin remodeling, modulation of the TME, and signaling pathways. In addition, we highlight the importance of PTM enzymes as therapeutic biomarkers and the role of artificial intelligence and machine learning in protein PTM prediction.
Affiliation(s)
- Smita Kumari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
  - School of Medicine, University of South Carolina, Columbia, SC, United States of America
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
  - Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
7
Lin X, Gao Y, Lei F. An application of topological data analysis in predicting sumoylation sites. PeerJ 2023; 11:e16204. [PMID: 37846308 PMCID: PMC10576966 DOI: 10.7717/peerj.16204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Accepted: 09/08/2023] [Indexed: 10/18/2023] Open
Abstract
Sumoylation is a reversible post-translational modification that regulates certain significant biochemical functions in proteins. The protein alterations caused by sumoylation are associated with the incidence of some human diseases. Therefore, identifying the sites of sumoylation in proteins may provide a direction for mechanistic research and drug development. Here, we propose a new computational approach for identifying sumoylation sites using an encoding method based on topological data analysis. The features of our model capture the key physical and biological properties of proteins at multiple scales. In 10-fold cross-validation, our model achieved a sensitivity (Sn) of 96.45%, an accuracy (Acc) of 94.65%, a Matthews correlation coefficient (MCC) of 0.8946, and an area under the curve (AUC) of 0.99. The proposed predictor, using only topological features, achieves the best MCC and AUC among the released methods compared. Our results suggest that topological information is an additional parameter that can assist in the prediction of sumoylation sites and provide a novel perspective for further research in protein sumoylation.
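For readers unfamiliar with the threshold-based metrics quoted in this abstract, the following is a minimal Python sketch of how Sn, Acc, and MCC follow from confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, accuracy, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Sensitivity: fraction of true sites that were predicted as sites.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all predictions that were correct.
    acc = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Acc": acc, "MCC": mcc}

# Hypothetical counts for illustration only (not the paper's data).
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
print(m)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when positive and negative sites are imbalanced.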
Affiliation(s)
- Xiaoxi Lin
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Yaru Gao
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Fengchun Lei
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
8
Tang H, Tang Q, Zhang Q, Feng P. O-GlyThr: Prediction of human O-linked threonine glycosites using multi-feature fusion. Int J Biol Macromol 2023; 242:124761. [PMID: 37156312 DOI: 10.1016/j.ijbiomac.2023.124761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 05/01/2023] [Accepted: 05/02/2023] [Indexed: 05/10/2023]
Abstract
O-linked glycosylation is one of the most complex post-translational modifications (PTMs) of human proteins, modulating various cellular metabolic and signaling pathways. Unlike N-glycosylation, O-glycosylation has nonspecific sequence features and an unstable glycan core structure, which makes the identification of O-glycosites more challenging by either experimental or computational methods. Biochemical experiments to identify O-glycosites in batches are technically and economically demanding. Therefore, the development of computation-based methods is greatly warranted. This study constructed a prediction model based on feature fusion for O-glycosites linked to threonine residues in Homo sapiens. For model training, we collected and curated high-quality human protein data with O-linked threonine glycosites. Seven feature coding methods were fused to represent the sample sequences. After comparing different algorithms, random forest was selected as the final classifier to construct the classification model. Through 5-fold cross-validation, the proposed model, namely O-GlyThr, performed satisfactorily on both the training set (AUC: 0.9308) and the independent validation dataset (AUC: 0.9323). Compared with previously published predictors, O-GlyThr achieved the highest ACC of 0.8475 on the independent test dataset. These results demonstrate the high competency of our predictor in identifying O-glycosites on threonine residues. Furthermore, a user-friendly webserver named O-GlyThr (http://cbcb.cdutcm.edu.cn/O-GlyThr/) was developed to assist glycobiologists in research on glycosylation structure and function.
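As a rough illustration of the k-fold cross-validation procedure mentioned in this abstract, the Python sketch below partitions sample indices into five folds so that each sample is held out exactly once. The function name `k_fold_indices`, the sample count, and the seed are hypothetical; the authors' actual splitting code and any stratification they used are not described here.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Strided slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training indices are all samples outside the held-out fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical dataset of 23 samples split into 5 folds.
splits = list(k_fold_indices(23, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

In practice a model would be trained on `train_idx` and scored on `test_idx` in each round, and the per-fold scores averaged.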
Affiliation(s)
- Hua Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Qiang Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Qian Zhang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Pengmian Feng
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
9
Jiang H, Shang S, Sha Y, Zhang L, He N, Li L. EdeepSADPr: an extensive deep-learning architecture for prediction of the in situ crosstalks of serine phosphorylation and ADP-ribosylation. Front Cell Dev Biol 2023; 11:1149535. [PMID: 37187615 PMCID: PMC10175571 DOI: 10.3389/fcell.2023.1149535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2023] [Accepted: 04/17/2023] [Indexed: 05/17/2023] Open
Abstract
In situ post-translational modification (PTM) crosstalk refers to the interactions between different types of PTMs that occur on the same residue site of a protein. Crosstalk sites generally have different characteristics from sites with a single PTM type. The features of single-PTM sites have been widely studied, while studies on the characteristics of crosstalk sites are rare. For example, the characteristics of serine phosphorylation (pS) and serine ADP-ribosylation (SADPr) have been investigated, whereas those of their in situ crosstalks (pSADPr) are unknown. In this study, we collected 3,250 human pSADPr, 7,520 SADPr, 151,227 pS and 80,096 unmodified serine sites and explored the features of the pSADPr sites. We found that the characteristics of pSADPr sites are more similar to those of SADPr sites than to those of pS or unmodified serine sites. Moreover, the crosstalk sites are likely to be phosphorylated by some kinase families (e.g., AGC, CAMK, STE and TKL) rather than others (e.g., CK1 and CMGC). Additionally, we constructed three classifiers to predict pSADPr sites from the pS dataset, the SADPr dataset and the protein sequences separately. We built and evaluated five deep-learning classifiers using ten-fold cross-validation and independent test datasets. We also used the classifiers as base classifiers to develop a few stacking-based ensemble classifiers to improve performance. The best classifiers had AUC values of 0.700, 0.914 and 0.954 for recognizing pSADPr sites from the SADPr, pS and unmodified serine sites, respectively. The lowest prediction accuracy was obtained when separating pSADPr from SADPr sites, which is consistent with the observation that pSADPr's characteristics are more similar to those of SADPr than to the rest. Finally, we developed an online tool for extensively predicting human pSADPr sites based on the CNNOH classifier, dubbed EdeepSADPr. It is freely available through http://edeepsadpr.bioinfogo.org/. We expect our investigation will promote a comprehensive understanding of crosstalks.
Affiliation(s)
- Haoqiang Jiang
  - College of Basic Medicine, Qingdao University, Qingdao, China
  - Sino Genomics Technology Co., Ltd., Qingdao, China
- Shipeng Shang
  - College of Basic Medicine, Qingdao University, Qingdao, China
- Yutong Sha
  - College of Basic Medicine, Qingdao University, Qingdao, China
- Lin Zhang
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Ningning He
  - College of Basic Medicine, Qingdao University, Qingdao, China
- Lei Li
  - College of Basic Medicine, Qingdao University, Qingdao, China
  - Faculty of Biomedical and Rehabilitation Engineering, University of Health and Rehabilitation Sciences, Qingdao, China
  - *Correspondence: Lei Li,
10
Zhao J, Jiang H, Zou G, Lin Q, Wang Q, Liu J, Ma L. CNNArginineMe: A CNN structure for training models for predicting arginine methylation sites based on the One-Hot encoding of peptide sequence. Front Genet 2022; 13:1036862. [PMID: 36324513 PMCID: PMC9618650 DOI: 10.3389/fgene.2022.1036862] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/06/2022] [Accepted: 10/04/2022] [Indexed: 11/30/2022] Open
Abstract
Protein arginine methylation (PRme), a post-translational modification, plays a critical role in numerous cellular processes and regulates critical cellular functions. Although several in silico models for predicting PRme sites have been reported, new models may need to be developed given the significant increase in identified PRme sites. In this study, we constructed multiple machine-learning and deep-learning models. The CNN deep-learning model combined with One-Hot encoding showed the best performance and was dubbed CNNArginineMe. CNNArginineMe achieved the best AUC in comparisons with several reported predictors. Additionally, we employed CNNArginineMe to predict the arginine-methylated proteome and performed functional analysis. The arginine-methylated proteome is significantly enriched in the amyotrophic lateral sclerosis (ALS) pathway. CNNArginineMe is freely available at https://github.com/guoyangzou/CNNArginineMe.
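To make the One-Hot encoding of a peptide sequence concrete, here is a minimal Python sketch that maps each residue to a 20-dimensional indicator vector. The residue ordering, the helper name `one_hot_encode`, the example peptide, and the all-zero treatment of unknown or padding symbols are assumptions for illustration; the encoding details used by CNNArginineMe may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed ordering)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(peptide):
    """Encode a peptide as a list of 20-dimensional 0/1 vectors, one per residue.

    Symbols outside the 20 standard residues (e.g. 'X' or a padding '-')
    are encoded as all-zero vectors; this is an assumption of the sketch.
    """
    encoded = []
    for residue in peptide.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in INDEX:
            row[INDEX[residue]] = 1
        encoded.append(row)
    return encoded

# Hypothetical arginine-centered peptide, for illustration only.
matrix = one_hot_encode("GRGRA")
print(len(matrix), len(matrix[0]))
```

The resulting length-by-20 matrix is the kind of input a one-dimensional convolutional network can scan along the sequence axis.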
Affiliation(s)
- Jiaojiao Zhao
  - Cancer Institute of the Affiliated Hospital of Qingdao University and Qingdao Cancer Institute, Qingdao University, Qingdao, China
  - School of Basic Medicine, Qingdao University, Qingdao, China
- Haoqiang Jiang
  - School of Basic Medicine, Qingdao University, Qingdao, China
- Guoyang Zou
  - School of Basic Medicine, Qingdao University, Qingdao, China
- Qian Lin
  - Cancer Institute of the Affiliated Hospital of Qingdao University and Qingdao Cancer Institute, Qingdao University, Qingdao, China
- Qiang Wang
  - Oncology Department, Shandong Second Provincial General Hospital, Jinan, China
- Jia Liu
  - Department of Pharmacology, School of Pharmacy, Qingdao University, Qingdao, China
- Leina Ma
  - Cancer Institute of the Affiliated Hospital of Qingdao University and Qingdao Cancer Institute, Qingdao University, Qingdao, China
  - *Correspondence: Leina Ma,