1
Shazia, Ullah FUM, Rho S, Lee MY. Predictive modeling for ubiquitin proteins through advanced machine learning technique. Heliyon 2024; 10:e32517. [PMID: 38975176 PMCID: PMC11225741 DOI: 10.1016/j.heliyon.2024.e32517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 06/05/2024] [Indexed: 07/09/2024]
Abstract
Ubiquitination is an essential post-translational modification mechanism in which the ubiquitin protein is bonded to a substrate protein. It is crucial in a variety of physiological activities, including cell survival and differentiation as well as innate and adaptive immunity, and alterations in the ubiquitin system can lead to the development of various human diseases. Numerous studies show that the ubiquitin system is highly reversible and dynamic, which makes experimental identification difficult. To address this issue, this article develops a machine learning model that aims to predict ubiquitin proteins more precisely. The ubiquitination data are processed with several feature extraction methods, followed by classification. Evaluation is conducted with jackknife tests and 10-fold cross-validation. The proposed method achieved accuracies of 100%, 99.88%, and 99.84% on Dataset-I, Dataset-II, and Dataset-III, respectively. Using the jackknife test, the method achieves 100%, 99.91%, and 99.99% for Dataset-I, Dataset-II, and Dataset-III, respectively. The analysis concludes that the proposed method outperforms state-of-the-art methods in identifying ubiquitination sites and may be helpful in the development of current clinical therapies. The source code and datasets will be made available on GitHub.
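The abstract reports results under both the jackknife test and 10-fold cross-validation. The sketch below (not the authors' code) shows how the two schemes partition a dataset: the jackknife, or leave-one-out, test is simply k-fold cross-validation with k equal to the number of samples. The helper names, the 1-nearest-neighbour classifier, and the toy data are hypothetical stand-ins, and folds are assigned in order without shuffling for clarity.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_indices(n: int, k: int) -> Iterator[Split]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are assigned to contiguous folds in order (no shuffling);
    fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def jackknife_indices(n: int) -> Iterator[Split]:
    """Jackknife (leave-one-out) test: each sample is held out exactly once."""
    return k_fold_indices(n, n)


def pooled_accuracy(xs: Sequence[float], ys: Sequence[int],
                    train_and_predict: Callable[[List[float], List[int], List[float]], List[int]],
                    splits: Iterator[Split]) -> float:
    """Fraction of samples predicted correctly while they are held out."""
    correct = 0
    for train, test in splits:
        preds = train_and_predict([xs[i] for i in train], [ys[i] for i in train],
                                  [xs[i] for i in test])
        correct += sum(p == ys[i] for p, i in zip(preds, test))
    return correct / len(ys)


def one_nn(train_x: List[float], train_y: List[int], test_x: List[float]) -> List[int]:
    """Hypothetical stand-in classifier: label of the nearest training point (1-D)."""
    return [train_y[min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))]
            for x in test_x]


xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]  # toy 1-D feature values
ys = [0, 0, 0, 1, 1, 1]
print(pooled_accuracy(xs, ys, one_nn, k_fold_indices(len(xs), 3)))  # 1.0
print(pooled_accuracy(xs, ys, one_nn, jackknife_indices(len(xs))))  # 1.0
```

In practice the data would usually be shuffled (and often stratified by class) before folding. On N samples the jackknife trains N models, which is why it is typically reserved for relatively small benchmark datasets.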
Affiliation(s)
- Shazia
- Mardan College of Nursing, Bacha Khan Medical College, Mardan, Pakistan
- Fath U Min Ullah
- Department of Computing, School of Engineering and Computing, University of Central Lancashire, Preston, United Kingdom
- Seungmin Rho
- Department of Industrial Security, Chung-Ang University, Seoul 06974, Republic of Korea
- Mi Young Lee
- Chung-Ang University, Seoul 06974, Republic of Korea
2
Fan Y, Xu F, Wang R, He J. Lysine 222 in PPAR γ1 functions as the key site of MuRF2-mediated ubiquitination modification. Sci Rep 2023; 13:1999. [PMID: 36737649 PMCID: PMC9898238 DOI: 10.1038/s41598-023-28905-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 01/27/2023] [Indexed: 02/05/2023]
Abstract
Peroxisome proliferator-activated receptor gamma (PPAR γ) plays key roles in the development, physiology, reproduction, and homeostasis of organisms. Its expression and activity are regulated by various posttranslational modifications. We previously reported that the E3 ubiquitin ligase muscle ring finger protein 2 (MuRF2) suppresses cardiac PPAR γ1 protein level and activity, thereby protecting the heart from diabetic cardiomyopathy; furthermore, using a GST-pulldown assay, we found that MuRF2 modifies PPAR γ1 via poly-ubiquitination and accelerates its proteasomal degradation. However, the key ubiquitination site on PPAR γ that MuRF2 targets remains unclear. In the present study, using computational prediction models, immunoprecipitation, ubiquitination assays, cycloheximide chase assays, and RT-qPCR, we demonstrate that lysine 222 is the key acceptor site for MuRF2-mediated PPAR γ1 ubiquitination. Our findings elucidate how MuRF2 protects the heart from diabetic cardiomyopathy through the PPAR γ1 regulatory pathway.
Affiliation(s)
- Yucheng Fan
- Department of Pathology, The First People's Hospital of Shizuishan, Affiliated to Ningxia Medical University, Shizuishan, China
- Fangjing Xu
- School of Clinical Medicine, Ningxia Medical University, Yinchuan, China
- Rui Wang
- School of Basic Medical Sciences, Ningxia Medical University, Yinchuan, China
- Jun He
- Department of Cardiovascular Internal Medicine, General Hospital of Ningxia Medical University, Yinchuan, China
3
Wang H, Li H, Gao W, Xie J. PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy. Anal Biochem 2022; 658:114935. [PMID: 36206844 DOI: 10.1016/j.ab.2022.114935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 12/30/2022]
Abstract
Identification of ubiquitination sites is central to many biological experiments. Ubiquitination is a type of post-translational protein modification (PTM); it is a key mechanism for increasing protein diversity and plays a vital role in regulating cell function. In recent years, many models have been developed to predict ubiquitination sites in humans, mice, and yeast, but few studies have addressed Arabidopsis thaliana. This work therefore proposes a deep network model, PrUb-EL, to predict ubiquitination sites in Arabidopsis thaliana. First, six sequence-based features are extracted: amino acid index database (AAindex) features, dipeptide deviation from expected mean (DDE), dipeptide composition (DPC), blocks substitution matrix (BLOSUM62), enhanced amino acid composition (EAAC), and binary encoding. Second, the synthetic minority over-sampling technique (SMOTE) is used to handle the imbalanced data set. A new classifier, DG, is then presented, which comprises a Dense block, a Residual block, and a Gated recurrent unit (GRU) block. Finally, each of the six feature types is fed into the DG model, and an ensemble learning strategy combines the outputs into the final prediction. Experimental results show that PrUb-EL achieves an accuracy (ACC) of 91.00% and an area under the ROC curve (auROC) of 97.70% under 5-fold cross-validation, and an ACC of 88.58% and an auROC of 96.09% on the independent test. Compared with previous studies, the model shows significantly improved performance, making it an effective method for identifying ubiquitination sites in Arabidopsis thaliana. The datasets and code used for the article are available at https://github.com/Tom-Wangy/PreUb-EL.git.
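Two of the six encodings named above, DPC and EAAC, and the SMOTE oversampling step can be sketched in a few lines. This follows the commonly used definitions of these techniques, not necessarily PrUb-EL's exact implementation: the EAAC window of 5 and the neighbour count k = 5 are assumed defaults, and `smote` here is a minimal hypothetical re-implementation rather than any library's API.

```python
import math
import random
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def dpc(seq: str) -> List[float]:
    """Dipeptide composition: relative frequency of each of the 400 dipeptides.

    Adjacent pairs containing non-standard symbols (e.g. padding 'X') are skipped,
    and frequencies are normalized over the pairs that were counted.
    """
    counts = [0] * len(DIPEPTIDES)
    for a, b in zip(seq, seq[1:]):
        idx = DIPEPTIDE_INDEX.get(a + b)
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(DIPEPTIDES)


def eaac(seq: str, window: int = 5) -> List[float]:
    """Enhanced amino acid composition: 20-dim composition of every sliding window."""
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    features: List[float] = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        features.extend(segment.count(aa) / window for aa in AMINO_ACIDS)
    return features


def smote(minority: List[List[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Minimal SMOTE: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(base, minority[j]))[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


print(len(dpc("MAKLVKE")), len(eaac("MAKLVKE")))  # 400 60
```

Oversampling is typically applied to the training portion only, so that synthetic samples do not leak into the evaluation folds.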
Affiliation(s)
- Houqiang Wang
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Hong Li
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Weifeng Gao
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Jin Xie
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
4
Zhu F, Yang S, Meng F, Zheng Y, Ku X, Luo C, Hu G, Liang Z. Leveraging Protein Dynamics to Identify Functional Phosphorylation Sites using Deep Learning Models. J Chem Inf Model 2022; 62:3331-3345. [PMID: 35816597 DOI: 10.1021/acs.jcim.2c00484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Accurate prediction of post-translational modifications (PTMs), which regulate cellular processes by modulating protein structure and dynamics, is of great significance. With the rapid growth of protein data at different "omics" levels, machine learning models have greatly enriched PTM prediction. However, most machine learning models rely only on protein sequence and little structural information, and the lack of systematic analysis of the dynamics underlying PTMs largely limits functional prediction. In this research, we present two dynamics-centric deep learning models, cDL-PAU and cDL-FuncPhos, which incorporate sequence-, structure-, and dynamics-based features to elucidate the molecular basis and functional landscape of PTMs. cDL-PAU achieved area under the curve (AUC) scores of 0.804-0.888 for predicting phosphorylation, acetylation, and ubiquitination (PAU) sites, while cDL-FuncPhos achieved an AUC of 0.771 for predicting functional phosphorylation (FuncPhos) sites, showing reliable improvements. Feature selection showed that the dynamics-based coupling and commute ability features contribute substantially to identifying PAU and FuncPhos sites, suggesting an allosteric propensity for important PTMs. Applying cDL-FuncPhos to three oncoproteins not only corroborates its strong performance in prioritizing FuncPhos sites but also provides insight into the physical basis of their functions. The source code and data set of cDL-PAU and cDL-FuncPhos are available at https://github.com/ComputeSuda/PTM_ML.
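cDL-PAU and cDL-FuncPhos are compared by AUC. As a reminder of what that number measures, the following sketch (not the authors' evaluation code) computes AUC directly from its rank interpretation: 0.5 corresponds to random ranking and 1.0 to perfect separation of sites from non-sites. The function name and toy scores are illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney U statistic.

    AUC is the probability that a randomly chosen positive (label 1) receives a
    higher score than a randomly chosen negative (label 0); ties count as 1/2.
    This pairwise version is O(P * N), which is fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one positive (score 0.2) is ranked below one negative (score 0.3).
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))  # 0.75
```

For large test sets a sort-based rank computation gives the same value in O(n log n) time.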
Affiliation(s)
- Fei Zhu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Sijie Yang
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Fanwang Meng
- Department of Chemistry and Chemical Biology, McMaster University, Hamilton L8S 4L8, Ontario, Canada
- Yuxiang Zheng
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Xin Ku
- Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Cheng Luo
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Zhongjie Liang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
5
Wang C, Tan X, Tang D, Gou Y, Han C, Ning W, Lin S, Zhang W, Chen M, Peng D, Xue Y. GPS-Uber: a hybrid-learning framework for prediction of general and E3-specific lysine ubiquitination sites. Brief Bioinform 2022; 23:6509047. [PMID: 35037020 DOI: 10.1093/bib/bbab574] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/03/2021] [Revised: 12/11/2021] [Accepted: 12/14/2021] [Indexed: 12/13/2022]
Abstract
As an important post-translational modification, lysine ubiquitination participates in numerous biological processes and is involved in human diseases, and its site specificity is mainly determined by ubiquitin-protein ligases (E3s). Although numerous ubiquitination predictors have been developed, computational prediction of E3-specific ubiquitination sites remains a great challenge. Here, we carefully reviewed the existing tools for predicting general ubiquitination sites and developed a tool named GPS-Uber for predicting both general and E3-specific ubiquitination sites. From the literature, we manually collected 1311 experimentally identified site-specific E3-substrate relations, which were classified into clusters based on the corresponding E3s at different levels. To predict general ubiquitination sites, we integrated 10 types of sequence and structure features with three types of algorithms: penalized logistic regression, deep neural networks, and convolutional neural networks. Compared with other existing tools, the general model in GPS-Uber exhibited highly competitive accuracy, with an area under the curve (AUC) value of 0.7649. Transfer learning was then adopted for each E3 cluster to construct E3-specific models, and in total 112 individual E3-specific predictors were implemented. Using GPS-Uber, we conducted a systematic prediction of human cancer-associated ubiquitination events, which could help guide further experimental work. GPS-Uber will be regularly updated, and its online service is free for academic research at http://gpsuber.biocuckoo.cn/.
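Penalized logistic regression is one of the three algorithm families that GPS-Uber integrates. The abstract does not specify the penalty, so this minimal sketch assumes an L2 (ridge) penalty and plain batch gradient descent; the function names, learning rate, epoch count, and toy data are illustrative assumptions, and this is not GPS-Uber's implementation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that x is a ubiquitination site (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_l2_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.1,
                    lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The bias term b is not penalized. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # gradient step on the averaged loss plus the L2 penalty on w only
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X = [[0.0], [0.2], [0.8], [1.0]]  # hypothetical one-dimensional feature values
y = [0, 0, 1, 1]
w, b = fit_l2_logistic(X, y)
print(predict_proba(w, b, [1.0]), predict_proba(w, b, [0.0]))
```

Increasing `lam` shrinks the weights toward zero, trading some fit on the training data for a simpler model; with this fixed learning rate, very large `lam` values would also need a smaller `lr` to remain stable.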
Affiliation(s)
- Chenwei Wang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xiaodan Tan
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Dachao Tang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yujie Gou
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Cheng Han
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Wanshan Ning
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Shaofeng Lin
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Weizhi Zhang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Miaomiao Chen
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Di Peng
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yu Xue
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
6
Luo Y, Jiang J, Zhu J, Huang Q, Li W, Wang Y, Gao Y. A Caps-Ubi Model for Protein Ubiquitination Site Prediction. Front Plant Sci 2022; 13:884903. [PMID: 35693166 PMCID: PMC9175003 DOI: 10.3389/fpls.2022.884903] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2022] [Accepted: 04/26/2022] [Indexed: 05/12/2023]
Abstract
Ubiquitination, a widespread mechanism for regulating cellular responses in plants, is one of the most important post-translational modifications of proteins in many biological processes and is involved in regulating plant disease resistance responses. Predicting ubiquitination sites is therefore an important technique for plant protection. Traditional methods for determining ubiquitination sites are costly and time-consuming, whereas computational methods can predict them accurately and efficiently. At present, capsule networks and other deep learning models are used separately for prediction, with limited gains. The capsule network captures the spatial relationships among the internal features of a neural network, but it cannot identify long-distance dependencies or focus on the amino acids in a protein sequence according to their importance. In this study, we combined convolutional neural networks and capsule networks to design a novel model, "Caps-Ubi," which first characterizes ubiquitination sites with a hybrid of one-hot and amino acid continuous-type encoding. The Caps-Ubi model then captures sequence patterns, the dependencies within the encoded protein sequences, and the relative importance of individual amino acids, and is used for multispecies ubiquitination site prediction. Experiments show that Caps-Ubi outperforms other similar methods in predicting ubiquitination sites.
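Caps-Ubi first encodes candidate sites with one-hot and an "amino acid continuous type" encoding. The abstract does not define the latter, so the sketch below covers only the usual preprocessing step of cutting a fixed-length window around each lysine, plus one-hot encoding. The half-width of 10 residues (a 21-residue window), the "X" padding symbol, and the function names are assumptions, not values taken from the paper.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed symbol for positions beyond either terminus
ALPHABET = AMINO_ACIDS + PAD  # 21 symbols; unknown residues are mapped to PAD


def lysine_windows(seq: str, half: int = 10) -> List[Tuple[int, str]]:
    """Return (0-based position, window) for every lysine (K) in seq.

    Each window has length 2 * half + 1 and is centred on the lysine; it is
    padded with PAD where it extends past the N- or C-terminus.
    """
    padded = PAD * half + seq + PAD * half
    return [(i, padded[i:i + 2 * half + 1]) for i, aa in enumerate(seq) if aa == "K"]


def one_hot(window: str) -> List[List[int]]:
    """Encode each residue as a 21-dimensional one-hot row."""
    rows = []
    for aa in window:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(aa)
        row[idx if idx >= 0 else ALPHABET.index(PAD)] = 1
        rows.append(row)
    return rows


for pos, win in lysine_windows("MAKLVKE", half=2):
    print(pos, win)  # prints "2 MAKLV", then "5 LVKEX"
```

Each window then becomes a (2 * half + 1) x 21 matrix that a convolutional layer can scan.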
Affiliation(s)
- Yin Luo
- School of Life Sciences, East China Normal University, Shanghai, China
- Jiulei Jiang
- School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China
- *Correspondence: Jiulei Jiang
- Jiajie Zhu
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Qiyi Huang
- School of Life Sciences, East China Normal University, Shanghai, China
- School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Weimin Li
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
- *Correspondence: Weimin Li
- Ying Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China
- Yamin Gao
- School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China
7
Siraj A, Lim DY, Tayara H, Chong KT. UbiComb: A Hybrid Deep Learning Model for Predicting Plant-Specific Protein Ubiquitylation Sites. Genes (Basel) 2021; 12:genes12050717. [PMID: 34064731 PMCID: PMC8151217 DOI: 10.3390/genes12050717] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/15/2021] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 12/11/2022]
Abstract
Protein ubiquitylation is an essential post-translational modification that plays a critical role in a wide range of biological functions, and even a degenerative role in certain diseases; it is consequently regarded as a promising target for treating various diseases. Ubiquitylation sites can be identified by enzymatic approaches, mass spectrometry analysis, and combinations of multidimensional liquid chromatography and tandem mass spectrometry. However, these large-scale experimental screening techniques are time-consuming, expensive, and laborious. To overcome these drawbacks, machine learning and deep learning-based predictors have been proposed to predict sites in a timely and cost-effective manner. Several computational predictors have been published across species; however, predictors tend to be species-specific because ubiquitylation patterns differ among species in ways that remain unclear. In this study, we propose a novel approach for predicting plant ubiquitylation sites using a hybrid deep learning model that combines a convolutional neural network and long short-term memory. The proposed method takes the raw protein sequence and physicochemical properties as inputs and provides more robust predictions. The proposed predictor achieved accuracies of 80% and 81% and F-scores of 79% and 82% in 10-fold cross-validation and on an independent dataset, respectively. Moreover, a comparison on the independent dataset with popular ubiquitylation predictors shows that our model significantly outperforms the other methods in classification performance.
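UbiComb feeds physicochemical properties to the model alongside the sequence. The abstract does not list which properties, so this sketch uses the Kyte-Doolittle hydropathy scale purely as an example of a per-residue property channel; the choice of scale, the neutral value for padding, the min-max rescaling, and the function names are all assumptions.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(window: str, scale: Dict[str, float] = KYTE_DOOLITTLE,
                       missing: float = 0.0) -> List[float]:
    """Map each residue to its scale value; padding or unknown symbols get `missing`."""
    return [scale.get(aa, missing) for aa in window]


def min_max_scale(values: Sequence[float],
                  scale: Dict[str, float] = KYTE_DOOLITTLE) -> List[float]:
    """Rescale values to [0, 1] using the minimum and maximum of the whole scale."""
    lo, hi = min(scale.values()), max(scale.values())
    return [(v - lo) / (hi - lo) for v in values]


print(hydropathy_profile("AKX"))    # [1.8, -3.9, 0.0]
print(min_max_scale([4.5, -4.5]))   # [1.0, 0.0]
```

A per-position channel like this can be stacked with a one-hot matrix so that a CNN-LSTM receives one feature vector per residue.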
Affiliation(s)
- Arslan Siraj
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
- Dae Yeong Lim
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea
- Correspondence: (H.T.); (K.T.C.)
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Korea
- Correspondence: (H.T.); (K.T.C.)
8
Wang H, Wang Z, Li Z, Lee TY. Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites. Front Cell Dev Biol 2020; 8:572195. [PMID: 33102477 PMCID: PMC7554246 DOI: 10.3389/fcell.2020.572195] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/13/2020] [Accepted: 08/24/2020] [Indexed: 12/17/2022]
Abstract
Protein ubiquitylation is an important posttranslational modification (PTM) that is involved in diverse biological processes and plays an essential role in regulating physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins, together with their substrate sites, from more than 20 species. Consequently, numerous studies have developed ubiquitylation site prediction tools across all species, mainly relying on predefined sequence features and machine learning algorithms. However, the differences in ubiquitylation patterns between species remain unclear. In this work, sequence-based characterization of ubiquitylated substrate sites revealed remarkable differences among plants, animals, and fungi. An improved word-embedding scheme based on a transfer learning strategy was then combined with a multilayer convolutional neural network (CNN) to identify protein ubiquitylation sites. For predicting plant ubiquitylation sites, the proposed deep learning scheme outperformed machine learning-based methods, with an accuracy of 75.6%, precision of 73.3%, recall of 76.7%, F-score of 0.7493, and AUC of 0.82 on the independent testing set. Although the specificity of ubiquitylated substrate sites is complicated, this work demonstrates that word embedding can extract informative features and help identify ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction.
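The abstract describes an improved word-embedding scheme with transfer learning but not how sequences are tokenized. A common approach, assumed here, treats overlapping 3-mers as "words", maps them to integer ids, and looks each id up in an embedding matrix that feeds the CNN. The random table is a placeholder for pretrained (transferred) embeddings, and none of these function names come from the authors' repository.

```python
import random
from typing import Dict, Iterable, List


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers ('words') with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences: Iterable[str], k: int = 3) -> Dict[str, int]:
    """Give each distinct k-mer an integer id; id 0 is reserved for unseen words."""
    vocab = {"<unk>": 0}
    for seq in sequences:
        for word in kmer_words(seq, k):
            vocab.setdefault(word, len(vocab))  # new words get the next free id
    return vocab


def encode(seq: str, vocab: Dict[str, int], k: int = 3) -> List[int]:
    """Map a sequence to the ids of its k-mers (unseen k-mers -> 0)."""
    return [vocab.get(word, 0) for word in kmer_words(seq, k)]


def random_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Placeholder embedding matrix; a transfer-learning setup would load pretrained rows."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


vocab = build_vocab(["MKLVK", "AKLVE"])
ids = encode("MKLVE", vocab)
print(ids)  # [1, 2, 5]
table = random_embedding_table(len(vocab), dim=8)
embedded = [table[i] for i in ids]  # one 8-dim vector per word, the CNN's input
```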
Affiliation(s)
- Hongfei Wang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
- Zhuo Wang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China; School of Life Sciences, University of Science and Technology of China, Hefei, China
- Zhongyan Li
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, China
- Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, China
9
Fu H, Yang Y, Wang X, Wang H, Xu Y. DeepUbi: a deep learning framework for prediction of ubiquitination sites in proteins. BMC Bioinformatics 2019; 20:86. [PMID: 30777029 PMCID: PMC6379983 DOI: 10.1186/s12859-019-2677-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 11/08/2018] [Accepted: 02/12/2019] [Indexed: 01/22/2023]
Abstract
Background: Protein ubiquitination occurs when the ubiquitin protein binds to a lysine (K) residue of a target protein; in eukaryotes, it is an important regulator of many cellular functions, such as signal transduction, cell division, and immune reactions. Experimental and clinical studies have shown that ubiquitination plays a key role in several human diseases, and recent advances in proteomic technology have spurred interest in identifying ubiquitination sites. However, most current computational tools for predicting target sites are based on small-scale data and shallow machine learning algorithms. Results: As more experimentally validated ubiquitination sites emerge, a predictor is needed that can identify lysine ubiquitination sites in large-scale proteome data. In this work, we propose DeepUbi, a deep learning predictor based on convolutional neural networks. Four different features are derived from the sequences and their physicochemical properties. In 10-fold cross-validation, DeepUbi obtains an AUC (area under the receiver operating characteristic curve) of 0.9, and its accuracy, sensitivity, and specificity exceed 85%. The more comprehensive indicator, MCC, reaches 0.78. We also provide a software package that can be freely downloaded from https://github.com/Sunmile/DeepUbi. Conclusion: Our results show that DeepUbi performs excellently in predicting ubiquitination sites from large-scale data. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2677-9) contains supplementary material, which is available to authorized users.
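DeepUbi, like several other entries in this list, reports accuracy, sensitivity, specificity, F-score, and MCC. These follow the standard confusion-matrix definitions; the sketch below computes them for hypothetical predictions. Unlike accuracy, MCC uses all four confusion-matrix cells, which is why it is often quoted as the more comprehensive indicator on imbalanced site data. Returning 0.0 for an undefined ratio is a convention chosen here, not a universal rule, and the function name is illustrative.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary site prediction (1 = site, 0 = non-site).

    Ratios with a zero denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall / true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "ACC": ratio(tp + tn, len(pairs)),
        "Sn": sensitivity,
        "Sp": specificity,
        "Precision": precision,
        "F1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "MCC": ratio(tp * tn - fp * fn,
                     math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


# Hypothetical predictions: ACC, Sn, Sp, Precision and F1 are all 2/3 here; MCC is 1/3.
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```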
Affiliation(s)
- Hongli Fu
- Department of Information and Computing Science, University of Science and Technology Beijing, Beijing, 100083, China
- Yingxi Yang
- Department of Information and Computing Science, University of Science and Technology Beijing, Beijing, 100083, China
- Xiaobo Wang
- Department of Information and Computing Science, University of Science and Technology Beijing, Beijing, 100083, China
- Hui Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Yan Xu
- Department of Information and Computing Science, University of Science and Technology Beijing, Beijing, 100083, China; Beijing Key Laboratory for Magneto-photoelectrical Composite and Interface Science, University of Science and Technology Beijing, Beijing, 100083, China
10
He W, Wei L, Zou Q. Research progress in protein posttranslational modification site prediction. Brief Funct Genomics 2018; 18:220-229. [DOI: 10.1093/bfgp/ely039] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/18/2018] [Revised: 11/15/2018] [Accepted: 11/22/2018] [Indexed: 01/24/2023]
Abstract
Posttranslational modifications (PTMs) play an important role in regulating protein folding, activity, and function and are involved in almost all cellular processes. Identifying protein PTMs is the basis for elucidating the mechanisms of cell biology and for disease treatment. Compared with laborious experimental work, PTM prediction using various machine learning methods can provide accurate, simple, and rapid research solutions and generate valuable information for further laboratory studies. In this review, we manually curate most of the bioinformatics tools published since 2008. We also summarize approaches for predicting ubiquitination and glycosylation sites. Moreover, we discuss the challenges facing current PTM bioinformatics tools and look forward to future research possibilities.
Affiliation(s)
- Wenying He
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
11
He F, Wang R, Li J, Bao L, Xu D, Zhao X. Large-scale prediction of protein ubiquitination sites using a multimodal deep architecture. BMC Syst Biol 2018; 12:109. [PMID: 30463553 PMCID: PMC6249717 DOI: 10.1186/s12918-018-0628-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Ubiquitination, also called "lysine ubiquitination", occurs when a ubiquitin is attached to lysine (K) residues in target proteins. As one of the most important post-translational modifications (PTMs), it plays a significant role not only in protein degradation but also in other cellular functions. Systematic dissection of the ubiquitination proteome is therefore an appealing and challenging research topic. Existing methods for identifying protein ubiquitination sites fall into two kinds: mass spectrometry and computational methods. Mass spectrometry-based experimental methods can discover ubiquitination sites in eukaryotes but are time-consuming and expensive, so it is a priority to develop computational approaches that can identify protein ubiquitination sites effectively and accurately. RESULTS: Existing computational methods usually require feature engineering, which may lead to redundant and biased representations, whereas deep learning can extract underlying characteristics from large-scale training data through multiple network layers and non-linear mappings. In this paper, we propose a multimodal deep architecture to identify ubiquitination sites. First, based on prior biological knowledge, we encoded protein sequence fragments around candidate ubiquitination sites into three modalities, namely raw protein sequence fragments, physico-chemical properties, and sequence profiles, and designed different deep network layers to extract hidden representations from each. The deep representations of the three modalities were then merged to build the final model. We applied our algorithm to PLMD, the largest available protein ubiquitination site database, and achieved 66.4% specificity, 66.7% sensitivity, 66.43% accuracy, and an MCC of 0.221. A number of comparative experiments also indicated that our multimodal deep architecture outperformed several popular protein ubiquitination site prediction tools. CONCLUSION: The comparative experiments validated the effectiveness of our deep network and showed that our method outperformed several popular protein ubiquitination site prediction tools. The source code of our proposed method is available at https://github.com/jiagenlee/deepUbiquitylation.
Affiliation(s)
- Fei He
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China; Institution of Computational Biology, Northeast Normal University, Changchun, 130117, China
- Rui Wang
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Jiagen Li
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Lingling Bao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Xiaowei Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China; Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, People's Republic of China