1
Zahiri Z, Mehrshad N, Mehrshad M. DF-Phos: Prediction of Protein Phosphorylation Sites by Deep Forest. J Biochem 2024; 175:447-456. [PMID: 38153271 DOI: 10.1093/jb/mvad116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/29/2023] Open
Abstract
Phosphorylation is the most important and most studied post-translational modification (PTM), and it plays a crucial role in protein function studies and experimental design. Many significant studies have been performed to predict phosphorylation sites using various machine-learning methods. Recently, several studies have claimed that deep learning-based methods are the best way to predict phosphorylation sites, because deep learning, as an advanced machine learning method, can automatically detect complex representations of phosphorylation patterns from raw sequences and thus offers a powerful tool to improve phosphorylation site prediction. In this study, we report DF-Phos, a new phosphosite predictor based on Deep Forest. In DF-Phos, the feature vector produced by the CkSAApair method serves as input to a Deep Forest framework for predicting phosphorylation sites. The results of 10-fold cross-validation show that the Deep Forest method achieves higher performance than other available methods. We implemented a Python program of DF-Phos, which is freely available for non-commercial use at https://github.com/zahiriz/DF-Phos. Moreover, users can use it for various PTM predictions.
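The abstract above names the CkSAApair encoding without defining it. As a rough illustration only, the sketch below computes the composition of k-spaced amino acid pairs as it is commonly defined: for each gap k, count every ordered residue pair separated by k positions and normalise by the number of pairs counted. The function name `cksaap`, the default `k_max=2`, the 20-letter alphabet, and the normalisation are assumptions for illustration and are not taken from the DF-Phos code.

```python
from itertools import product

# Assumption: the 20 standard amino acids; other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(window, k_max=2):
    """Return k-spaced pair frequencies for gaps k = 0..k_max as a flat list.

    The result has 400 * (k_max + 1) entries: one block of 400 ordered
    pairs (AA, AC, ..., YY) per gap value.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        # Residue i is paired with the residue k positions further along.
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
                total += 1
        # Normalise by the number of counted pairs; all zeros if none.
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features
```

A vector built this way could then be passed to any classifier; the Deep Forest step itself is not sketched here.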
Affiliation(s)
- Zeynab Zahiri
  - Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Nasser Mehrshad
  - Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Maliheh Mehrshad
  - Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, 750 07 Sweden
2
Deng Q, Zhang J, Liu J, Liu Y, Dai Z, Zou X, Li Z. Identifying Protein Phosphorylation Site-Disease Associations Based on Multi-Similarity Fusion and Negative Sample Selection by Convolutional Neural Network. Interdiscip Sci 2024:10.1007/s12539-024-00615-0. [PMID: 38457108 DOI: 10.1007/s12539-024-00615-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 03/09/2024]
Abstract
As one of the most important post-translational modifications (PTMs), protein phosphorylation plays a key role in a variety of biological processes. Many studies have shown that protein phosphorylation is associated with various human diseases. Therefore, identifying protein phosphorylation site-disease associations can help to elucidate the pathogenesis of disease and discover new drug targets. Networks of sequence similarity and Gaussian interaction profile kernel similarity were constructed for phosphorylation sites, and networks of disease semantic similarity, disease symptom similarity and Gaussian interaction profile kernel similarity were constructed for diseases. To effectively combine the different phosphorylation site and disease similarity information, the random walk with restart algorithm was used to obtain the topology information of each network. Then, the diffusion component analysis method was used to obtain the comprehensive phosphorylation site similarity and disease similarity. Meanwhile, reliable negative samples were screened based on the Euclidean distance method. Finally, a convolutional neural network (CNN) model was constructed to identify potential associations between phosphorylation sites and diseases. Based on tenfold cross-validation, the evaluation indicators obtained were an accuracy of 93.48%, specificity of 96.82%, sensitivity of 90.15%, precision of 96.62%, Matthews correlation coefficient of 0.8719, area under the receiver operating characteristic curve of 0.9786 and area under the precision-recall curve of 0.9836. Additionally, most of the top 20 predicted disease-related phosphorylation sites (19/20 for Alzheimer's disease; 16/20 for neuroblastoma) were verified by the literature and databases. These results show that the proposed method has an outstanding prediction performance and a high practical value.
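The abstract above relies on Gaussian interaction profile (GIP) kernel similarity without giving its form. The sketch below uses the definition that is common in association-prediction work: each item is described by a binary interaction profile, and the similarity of two items is exp(-gamma * squared distance), with gamma set to a bandwidth parameter divided by the mean squared norm of the profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy profiles are assumptions for illustration; the paper's exact settings are not stated in the abstract.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Return the GIP kernel similarity matrix for a list of profiles.

    profiles: list of equal-length numeric vectors, e.g. one row per
    phosphorylation site with a 1 for each disease it is associated with.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are zero")
    # Bandwidth is scaled by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        [math.exp(-gamma * sq_dist(profiles[i], profiles[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three sites, two diseases.
toy_profiles = [[1, 0], [0, 1], [1, 0]]
toy_similarity = gip_kernel(toy_profiles)
```

Items with identical profiles get similarity 1, and the matrix is symmetric, which is what makes it usable as one layer in a multi-similarity network.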
Affiliation(s)
- Qian Deng
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
- Jing Zhang
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
- Jie Liu
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
- Yuqi Liu
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
- Zong Dai
  - School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, 510275, China
- Xiaoyong Zou
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, China
- Zhanchao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
3
Shrestha P, Kandel J, Tayara H, Chong KT. DL-SPhos: Prediction of serine phosphorylation sites using transformer language model. Comput Biol Med 2024; 169:107925. [PMID: 38183701 DOI: 10.1016/j.compbiomed.2024.107925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/21/2023] [Accepted: 01/01/2024] [Indexed: 01/08/2024]
Abstract
Serine phosphorylation plays a pivotal role in the pathogenesis of various cellular processes and diseases. Roughly 81% of human diseases have links to phosphorylation, and an overwhelming 86.4% of protein phosphorylation takes place at serine residues. In eukaryotes, over a quarter of proteins undergo phosphorylation, with more than half implicated in numerous disorders, notably cancer and reproductive system diseases. This study primarily focuses on serine-phosphorylation-driven pathogenesis and the critical role of conserved motif identification. While numerous techniques exist for predicting serine phosphorylation sites, traditional wet lab experiments are resource-intensive. Our paper introduces a cutting-edge deep learning tool for predicting S phosphorylation sites, integrating explainable AI for motif identification, a transformer language model, and deep neural network components. We trained our model on protein sequences from UniProt, validated it against the dbPTM benchmark dataset, and employed the PTMD dataset to explore motifs related to mammalian disorders. Our results highlight that our model surpasses other deep learning predictors by a significant 3%. Furthermore, we utilized the local interpretable model-agnostic explanations (LIME) approach to shed light on the predictions, emphasizing the amino acid residues crucial for S phosphorylation. Notably, our model also outperformed competitors in kinase-specific serine phosphorylation prediction on benchmark datasets.
Affiliation(s)
- Palistha Shrestha
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
- Jeevan Kandel
  - Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea; Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
4
Yan Y, Wang D, Xin R, Soriano RA, Ng DCM, Wang W, Ping P. Protocol for the prediction, interpretation, and mutation evaluation of post-translational modification using MIND-S. STAR Protoc 2023; 4:102682. [PMID: 37979178 PMCID: PMC10694567 DOI: 10.1016/j.xpro.2023.102682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 09/06/2023] [Accepted: 10/10/2023] [Indexed: 11/20/2023] Open
Abstract
Post-translational modifications (PTMs) serve as key regulatory mechanisms in various cellular processes; altered PTMs can potentially lead to human diseases. We present a protocol for using MIND-S (multi-label interpretable deep-learning approach for PTM prediction-structure version) to study PTMs. This protocol consists of a step-by-step guide and includes three key applications of MIND-S: PTM prediction based on protein sequences, identification of important amino acids, and elucidation of the altered PTM landscape resulting from molecular mutations. For complete details on the use and execution of this protocol, please refer to Yan et al. (2023).
Affiliation(s)
- Yu Yan
  - NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program at UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Dr. South, Los Angeles, CA 90095-1760, USA; Medical Informatics Program, University of California at Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Physiology, UCLA School of Medicine, Suite 1-609, MRL Building, 675 Charles E. Young Dr., Los Angeles, CA 90095-1760, USA
- Dean Wang
  - NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program at UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Dr. South, Los Angeles, CA 90095-1760, USA; Department of Physiology, UCLA School of Medicine, Suite 1-609, MRL Building, 675 Charles E. Young Dr., Los Angeles, CA 90095-1760, USA
- Ruiqi Xin
  - Computational and Systems Biology Interdepartmental Program (IDP), University of California at Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Raine A Soriano
  - Department of Computer Science, UCLA School of Engineering, Los Angeles, CA 90095, USA
- Dominic C M Ng
  - NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program at UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Dr. South, Los Angeles, CA 90095-1760, USA; Department of Physiology, UCLA School of Medicine, Suite 1-609, MRL Building, 675 Charles E. Young Dr., Los Angeles, CA 90095-1760, USA
- Wei Wang
  - Scalable Analytics Institute (ScAi) at Department of Computer Science, UCLA School of Engineering, Los Angeles, CA 90095, USA; Department of Computer Science, UCLA School of Engineering, Los Angeles, CA 90095, USA
- Peipei Ping
  - NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program at UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Dr. South, Los Angeles, CA 90095-1760, USA; Medical Informatics Program, University of California at Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Physiology, UCLA School of Medicine, Suite 1-609, MRL Building, 675 Charles E. Young Dr., Los Angeles, CA 90095-1760, USA; Scalable Analytics Institute (ScAi) at Department of Computer Science, UCLA School of Engineering, Los Angeles, CA 90095, USA; Department of Medicine (Cardiology), UCLA School of Medicine, Suite 1-609, MRL Building, 675 Charles E. Young Dr. South, Los Angeles, CA 90095-1760, USA
5
Pakhrin SC, Pokharel S, Pratyush P, Chaudhari M, Ismail HD, Kc DB. LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model. J Proteome Res 2023; 22:2548-2557. [PMID: 37459437 DOI: 10.1021/acs.jproteome.2c00667] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 08/05/2023]
Abstract
Phosphorylation is one of the most important post-translational modifications and plays a pivotal role in various cellular processes. Although several computational tools exist to predict phosphorylation sites, existing tools have not yet harnessed the knowledge distilled by pretrained protein language models. Herein, we present a novel deep learning-based approach called LMPhosSite for general phosphorylation site prediction that integrates embeddings from the local window sequence with the contextualized embedding obtained from the global (overall) protein sequence by a pretrained protein language model to improve prediction performance. Thus, LMPhosSite consists of two base models: one for capturing an effective local representation and the other for capturing the global per-residue contextualized embedding from a pretrained protein language model. The outputs of these base models are integrated using a score-level fusion approach. LMPhosSite achieves a precision, recall, Matthews correlation coefficient, and F1-score of 38.78%, 67.12%, 0.390, and 49.15%, respectively, for the combined serine and threonine independent test data set and 34.90%, 62.03%, 0.298, and 44.67%, respectively, for the tyrosine independent test data set, which is better than the compared approaches. These results demonstrate that LMPhosSite is a robust computational tool for the prediction of general phosphorylation sites in proteins.
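The F1-scores quoted in the abstract above can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name `f1_score` is chosen here for illustration, and the inputs are the rounded percentages from the abstract, so agreement is only expected to within rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages as reported in the abstract.
f1_ser_thr = f1_score(38.78, 67.12)  # combined serine and threonine test set
f1_tyr = f1_score(34.90, 62.03)      # tyrosine test set

print(f"S/T F1: {f1_ser_thr:.2f}%  (reported 49.15%)")
print(f"Y   F1: {f1_tyr:.2f}%  (reported 44.67%)")
```

Both recomputed values land within about 0.01 percentage points of the reported figures, which is consistent with the inputs having been rounded to two decimals.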
Affiliation(s)
- Subash C Pakhrin
  - School of Computing, Wichita State University, 1845 Fairmount St., Wichita, Kansas 67260, United States
  - Department of Computer Science & Engineering Technology, University of Houston-Downtown, 1 Main St., Houston, Texas 77002, United States
- Suresh Pokharel
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States
- Pawel Pratyush
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States
- Meenal Chaudhari
  - Department of Biology, North Carolina A&T State University, Greensboro, North Carolina 27411, United States
- Hamid D Ismail
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States
- Dukka B Kc
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States
6
Zhou H, Tan W, Shi S. DeepGpgs: a novel deep learning framework for predicting arginine methylation sites combined with Gaussian prior and gated self-attention mechanism. Brief Bioinform 2023; 24:7000314. [PMID: 36694944 DOI: 10.1093/bib/bbad018] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Revised: 12/26/2022] [Accepted: 01/04/2023] [Indexed: 01/26/2023] Open
Abstract
Protein arginine methylation is an important posttranslational modification (PTM) associated with protein functional diversity and pathological conditions including cancer. Identification of methylation binding sites facilitates a better understanding of the molecular function of proteins. Recent developments in the field of deep neural networks have led to a proliferation of deep learning-based methylation identification studies because of their fast and accurate prediction. In this paper, we propose DeepGpgs, an advanced deep learning model incorporating Gaussian prior and gated attention mechanism. We introduce a residual network channel to extract the evolutionary information of proteins. Then we combine the adaptive embedding with bidirectional long short-term memory networks to form a context-shared encoder layer. A gated multi-head attention mechanism is followed to obtain the global information about the sequence. A Gaussian prior is injected into the sequence to assist in predicting PTMs. We also propose a weighted joint loss function to alleviate the false negative problem. We empirically show that DeepGpgs improves Matthews correlation coefficient by 6.3% on the arginine methylation independent test set compared with the existing state-of-the-art methylation site prediction methods. Furthermore, DeepGpgs has good robustness in phosphorylation site prediction of SARS-CoV-2, which indicates that DeepGpgs has good transferability and the potential to be extended to other modification sites prediction. The open-source code and data of the DeepGpgs can be obtained from https://github.com/saizhou1/DeepGpgs.
Affiliation(s)
- Haiwei Zhou
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Wenxi Tan
  - School of Mathematical Sciences, Fudan University, Shanghai 200433, China
- Shaoping Shi
  - Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
7
Ahmed F, Dehzangi I, Hasan MM, Shatabda S. Accurately predicting microbial phosphorylation sites using evolutionary and structural features. Gene 2023; 851:146993. [DOI: 10.1016/j.gene.2022.146993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 10/05/2022] [Accepted: 10/14/2022] [Indexed: 11/27/2022]
8
Zhao J, Zhuang M, Liu J, Zhang M, Zeng C, Jiang B, Wu J, Song X. pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties. BMC Bioinformatics 2022; 23:399. [PMID: 36171552 PMCID: PMC9520798 DOI: 10.1186/s12859-022-04938-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/15/2022] [Accepted: 09/16/2022] [Indexed: 11/17/2022] Open
Abstract
Background: Protein histidine phosphorylation (pHis) plays critical roles in prokaryotic signal transduction pathways and various eukaryotic cellular processes. It is estimated to account for 6–10% of the phosphoproteome; however, only hundreds of pHis sites have been discovered to date. Due to the inherent disadvantages of experimental methods, developing efficient computational approaches to identify pHis sites is an urgent task.
Results: Here, we present a novel tool, pHisPred, for accurately identifying pHis sites from protein sequences. We manually collected the largest number of experimentally validated pHis sites to build benchmark datasets. Using randomized tenfold CV, the weighted SVM-RBF model performs better than four other commonly used classification models (LR, KNN, RF, and MLP). From tens of thousands of features, the 140 and 150 most informative features were selected for the eukaryotic and prokaryotic models, respectively. The average AUC and F1-score values of pHisPred were (0.81, 0.40) and (0.78, 0.46) for tenfold CV on the eukaryotic and prokaryotic training datasets, respectively. In addition, pHisPred significantly outperforms other tools on testing datasets, in particular on the eukaryotic one.
Conclusion: We implemented a Python program of pHisPred, which is freely available for non-commercial use at https://github.com/xiaofengsong/pHisPred. Moreover, users can use it to train new models with their own data.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04938-x.
Affiliation(s)
- Jian Zhao
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
- Minhui Zhuang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
- Jingjing Liu
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
- Meng Zhang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
- Cong Zeng
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
- Bin Jiang
  - College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
- Jing Wu
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Xiaofeng Song
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
9
Mini-review: Recent advances in post-translational modification site prediction based on deep learning. Comput Struct Biotechnol J 2022; 20:3522-3532. [PMID: 35860402 PMCID: PMC9284371 DOI: 10.1016/j.csbj.2022.06.045] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/11/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 11/23/2022] Open
Abstract
Post-translational modifications (PTMs) are closely linked to numerous diseases and play a significant role in regulating protein structures, activities, and functions. Therefore, the identification of PTMs is crucial for understanding the mechanisms of cell biology and disease therapy. Compared to traditional machine learning methods, deep learning approaches for PTM prediction provide accurate and rapid screening, guiding downstream wet-lab experiments to use the screened information for focused studies. In this paper, we review recent work in deep learning to identify phosphorylation, acetylation, ubiquitination, and other PTM types. In addition, we summarize PTM databases and discuss future directions with critical insights.
Key Words
- AAindex, Amino acid index
- ATP, Adenosine triphosphate
- AUC, Area under curve
- Ac, Acetylation
- BE, Binary encoding
- BLOSUM, Blocks substitution matrix
- Bi-LSTM, Bidirectional LSTM
- CKSAAP, Composition of k-spaced amino acid Pairs
- CNN, Convolutional neural network
- CNNOH, CNN with the one-hot encoding
- CNNWE, CNN with the word-embedding encoding
- CNNrgb, CNN red green blue
- CV, Cross-validation
- DC-CNN, Densely connected convolutional neural network
- DL, Deep learning
- DNNs, Deep neural networks
- Deep learning
- E. coli, Escherichia coli
- EBGW, Encoding based on grouped weight
- EGAAC, Enhanced grouped amino acids content
- IG, Information gain
- K, Lysine
- KNN, k nearest neighbor
- LASSO, Least absolute shrinkage and selection operator
- LSTM, Long short-term memory
- LSTMWE, LSTM with the word-embedding encoding
- M.musculus, Mus musculus
- MDC, Modular densely connected convolutional networks
- MDCAN, Multilane dense convolutional attention network
- ML, Machine learning
- MLP, Multilayer perceptron
- MMI, Multivariate mutual information
- Machine learning
- Mass spectrometry
- NMBroto, Normalized Moreau-Broto autocorrelation
- P, Proline
- PSP, PhosphoSitePlus
- PSSM, Position-specific scoring matrix
- PTM, Post-translational modifications
- Ph, Phosphorylation
- Post-translational modification
- Prediction
- PseAAC, Pseudo-amino acid composition
- R, Arginine
- RF, Random forest
- RNN, Recurrent neural network
- ROC, Receiver operating characteristic
- S, Serine
- S. typhimurium, Salmonella typhimurium
- S.cerevisiae, Saccharomyces cerevisiae
- SE, Squeeze and excitation
- SEV, Split to Equal Validation
- ST, Source and target
- SUMO, Small ubiquitin-like modifier
- SVM, Support vector machines
- T, Threonine
- Ub, Ubiquitination
- Y, Tyrosine
- ZSL, Zero-shot learning
10
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. Methods Mol Biol 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes that gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification of the polypeptide chain and proteolytic cleavage. Understanding and characterizing PTMs is a fundamental step toward understanding the underpinnings of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Recently, due to various advancements in the field of deep learning (DL), along with the explosion of applications of DL to various fields, the computational prediction of PTMs has also witnessed the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we review recent advances in a less-studied PTM, namely proteolytic cleavage prediction. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.
11
Bhat GR, Sethi I, Rah B, Kumar R, Afroze D. Innovative in Silico Approaches for Characterization of Genes and Proteins. Front Genet 2022; 13:865182. [PMID: 35664302 PMCID: PMC9159363 DOI: 10.3389/fgene.2022.865182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022] Open
Abstract
Bioinformatics is an amalgamation of biology, mathematics and computer science. It is a science that gathers information from biology at the molecular level and applies informatic techniques to that information to understand and organize the data in a useful manner. With the help of bioinformatics, the experimental data generated is stored in several databases available online, such as nucleotide databases, protein databases, GENBANK and others. The data stored in these databases is used as a reference for experimental evaluation and validation. To date, several online tools have been developed to analyze genomic, transcriptomic, proteomic, epigenomic and metabolomic data. Some of them include Human Splicing Finder (HSF), Exonic Splicing Enhancer Mutation taster, and others. A number of SNPs are observed in non-coding, intronic regions and play a role in the regulation of genes, which may or may not directly affect protein expression. Many mutations are thought to influence the splicing mechanism by affecting existing splice sites or creating new sites. HSF was developed to predict the effect of a mutation (SNP) on the splicing mechanism/signal. Thus, the tool is helpful in predicting the effect of mutations on splicing signals and can provide data for a better understanding of intronic mutations, which can be further validated experimentally. Additionally, rapid advances in proteomics have steered researchers to organize the study of protein structure, function, relationships, and dynamics in space and time. Thus, the effective integration of all of these technological interventions will eventually drive next-generation systems biology, which will provide valuable biological insights for research, diagnostics, therapeutics and the development of personalized medicine.
Affiliation(s)
- Gh. Rasool Bhat
  - Advanced Centre for Human Genetics, Sher-I- Kashmir Institute of Medical Sciences, Soura, India
- Itty Sethi
  - Institute of Human Genetics, University of Jammu, Jammu, India
- Bilal Rah
  - Advanced Centre for Human Genetics, Sher-I- Kashmir Institute of Medical Sciences, Soura, India
- Rakesh Kumar
  - School of Biotechnology, Shri Mata Vaishno Devi University, Katra, India
- Dil Afroze
  - Advanced Centre for Human Genetics, Sher-I- Kashmir Institute of Medical Sciences, Soura, India
  - *Correspondence: Dil Afroze,
12
Wang X, Zhang Z, Zhang C, Meng X, Shi X, Qu P. TransPhos: A Deep-Learning Model for General Phosphorylation Site Prediction Based on Transformer-Encoder Architecture. Int J Mol Sci 2022; 23:ijms23084263. [PMID: 35457080 PMCID: PMC9029334 DOI: 10.3390/ijms23084263] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/10/2022] [Revised: 04/04/2022] [Accepted: 04/09/2022] [Indexed: 02/06/2023] Open
Abstract
Protein phosphorylation is one of the most critical post-translational modifications of proteins in eukaryotes and is essential for a variety of biological processes. Many attempts have been made to improve the performance of computational predictors for phosphorylation site prediction. However, most of them are based on extra domain knowledge or feature selection. In this article, we present a novel deep learning-based predictor, named TransPhos, which is constructed using a transformer encoder and densely connected convolutional neural network blocks, for predicting phosphorylation sites. Experiments are conducted on the datasets of PPA (version 3.0) and Phospho.ELM. The experimental results show that TransPhos performs better than several deep learning models, including convolutional neural networks (CNN), long short-term memory networks (LSTM), recurrent neural networks (RNN) and fully connected neural networks (FCNN), and some state-of-the-art deep learning-based prediction tools, including GPS2.1, NetPhos, PPRED, Musite, PhosphoSVM, SKIPHOS, and DeepPhos. Our model achieves good performance on the training datasets of serine (S), threonine (T), and tyrosine (Y), with AUC values of 0.8579, 0.8335, and 0.6953, respectively, using 10-fold cross-validation tests, and demonstrates that the presented TransPhos tool considerably outperforms competing predictors in general protein phosphorylation site prediction.
Collapse
Affiliation(s)
- Xun Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
| | - Zhiyuan Zhang
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
| | - Chaogang Zhang
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
| | - Xiangyu Meng
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
| | - Xin Shi
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
| | - Peng Qu
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
| |
|
13
|
Khalili E, Ramazi S, Ghanati F, Kouchaki S. Predicting protein phosphorylation sites in soybean using interpretable deep tabular learning network. Brief Bioinform 2022; 23:bbac015. [PMID: 35152280 DOI: 10.1093/bib/bbac015] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/28/2021] [Revised: 12/17/2021] [Accepted: 01/12/2022] [Indexed: 12/17/2023] Open
Abstract
Phosphorylation of proteins is one of the most significant post-translational modifications (PTMs) and plays a crucial role in plant functionality due to its impact on signaling, gene expression, enzyme kinetics, protein stability and interactions. Accurate prediction of plant phosphorylation sites (p-sites) is vital as abnormal regulation of phosphorylation usually leads to plant diseases. However, current experimental methods for PTM prediction suffer from high computational cost and are error-prone. The present study develops machine learning-based prediction techniques, including a high-performance interpretable deep tabular learning network (TabNet), to improve the prediction of protein p-sites in soybean. Moreover, we use a hybrid feature set of sequence-based features, physicochemical properties and position-specific scoring matrices to predict serine (Ser/S), threonine (Thr/T) and tyrosine (Tyr/Y) p-sites in soybean for the first time. The experimentally verified p-site data of soybean proteins are collected from the Eukaryotic Phosphorylation Sites Database and the database of post-translational modifications. We then remove the redundant set of positive and negative samples by dropping protein sequences with >40% similarity. It is found that the developed techniques achieve >70% accuracy. The results demonstrate that the TabNet model is the best-performing classifier using hybrid features and a window size of 13, resulting in 78.96% sensitivity and 77.24% specificity. The results indicate that the TabNet method has advantages in terms of high performance and interpretability. The proposed technique can automatically analyze the data without any measurement errors or human intervention. Furthermore, it can be used to predict putative protein p-sites in plants effectively. The collected dataset and source code are publicly deposited at https://github.com/Elham-khalili/Soybean-P-sites-Prediction.
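The "window size of 13" mentioned above refers to the sequence fragment centred on each candidate S/T/Y residue. The following Python is a minimal sketch of that kind of window extraction, assuming a padding character `X` for positions that fall beyond the protein termini. The function name `extract_windows`, the padding choice and the toy sequence are assumptions for illustration and are not taken from the authors' repository.

```python
def extract_windows(sequence, window_size=13, targets="STY", pad="X"):
    """Return (1-based position, residue, window) for each candidate residue.

    Each window is centred on the candidate; positions outside the sequence
    are filled with `pad` so every window has exactly `window_size` residues.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the site is centred")
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Index i in `sequence` is index i + half in `padded`,
            # so the slice below is centred on the candidate residue.
            windows.append((i + 1, residue, padded[i:i + window_size]))
    return windows


if __name__ == "__main__":
    # Toy peptide, invented for illustration only.
    for position, residue, window in extract_windows("MKSAY"):
        print(position, residue, window)
```

Each window would then be encoded (for example with physicochemical properties or a position-specific scoring matrix, as the abstract describes) before being passed to a classifier.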
Affiliation(s)
- Elham Khalili
- Department of Plant Science, Faculty of Science, Tarbiat Modares University, Tehran, Iran
| | - Shahin Ramazi
- Department of Biophysics, Faculty of Biological Science, Tarbiat Modares University, Tehran, Iran
| | - Faezeh Ghanati
- Department of Plant Science, Faculty of Science, Tarbiat Modares University, Tehran, Iran
| | - Samaneh Kouchaki
- Department of Electrical and Electronic Engineering, Faculty of Engineering and Physical Sciences, Centre for Vision, Speech, and Signal Processing, University of Surrey, Guildford, UK
| |
|
14
|
A Transfer-Learning-Based Deep Convolutional Neural Network for Predicting Leukemia-Related Phosphorylation Sites from Protein Primary Sequences. Int J Mol Sci 2022; 23:ijms23031741. [PMID: 35163663 PMCID: PMC8915183 DOI: 10.3390/ijms23031741] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/14/2022] [Revised: 01/27/2022] [Accepted: 01/29/2022] [Indexed: 12/27/2022] Open
Abstract
As one of the most important post-translational modifications (PTMs), phosphorylation refers to the binding of a phosphate group with amino acid residues like Ser (S), Thr (T) and Tyr (Y), thus resulting in diverse functions at the molecular level. Abnormal phosphorylation has been proved to be closely related with human diseases. To our knowledge, no research has been reported describing specific disease-associated phosphorylation site prediction, which is of great significance for a comprehensive understanding of disease mechanisms. In this work, focusing on three types of leukemia, we aim to develop reliable leukemia-related phosphorylation site prediction models by combining a deep convolutional neural network (CNN) with transfer learning. CNN could automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of leukemia-related phosphorylation site prediction. With the largest dataset of myelogenous leukemia, the optimal models for S/T/Y phosphorylation sites give AUC values of 0.8784, 0.8328 and 0.7716, respectively. When transfer learning is applied to the small datasets, the models for T-cell and lymphoid leukemia also give promising performance by sharing the optimal parameters. Compared with five other machine-learning methods, our CNN models show superior performance. Finally, the leukemia-related pathogenesis analysis and distribution analysis on phosphorylated proteins, along with K-means clustering analysis and position-specific conservation profiles on the phosphorylation sites, all indicate the strong practical feasibility of our easy-to-use CNN models.
|
15
|
In Silico Prediction of the Phosphorylation of NS3 as an Essential Mechanism for Dengue Virus Replication and the Antiviral Activity of Quercetin. BIOLOGY 2021; 10:biology10101067. [PMID: 34681164 PMCID: PMC8570334 DOI: 10.3390/biology10101067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/30/2021] [Revised: 10/09/2021] [Accepted: 10/11/2021] [Indexed: 11/25/2022]
Abstract
Simple Summary: Dengue is a mosquito-borne virus that infects up to 400 million people worldwide annually. Dengue infection triggers high fever, severe body aches, rash, and low platelet count, and can lead to Dengue hemorrhagic fever (DHF) in some cases. There is currently no cure, nor a broadly effective vaccine. The interaction of two viral proteins, nonstructural proteins 3 and 5 (NS3 and NS5), is required for viral replication in the infected host’s cells. Our computational modeling of NS3 suggested that phosphorylation of a serine residue at position 137 of NS3 by a specific c-Jun N-terminal kinase (JNK) enhances viral replication by increasing the interaction of NS3 and NS5 through structural changes in amino acid residues 49–95. Experimental studies have shown that inhibition of JNK prevents viral replication and have suggested that the plant flavonoids Quercetin, Agathisflavone, and Myricetin inhibit Dengue infection. Our molecular simulations revealed that Quercetin binds NS3 and obstructs serine 137 phosphorylation, which may decrease viral replication. This work offers a molecular mechanism that can be used for anti-Dengue drug development.
Abstract: Dengue virus infection is a global health problem for which there have been challenges to obtaining a cure. Current vaccines and anti-viral drugs can only be narrowly applied in ongoing clinical trials. We employed computational methods based on structure-function relationships between human host kinases and viral nonstructural protein 3 (NS3) to understand viral replication inhibitors’ therapeutic effect. Phosphorylation at each of the two most evolutionarily conserved sites of NS3, serine 137 and threonine 189, was compared with the unphosphorylated state using molecular dynamics and docking simulations. The simulations suggested that phosphorylation at serine 137 caused a more remarkable structural change than phosphorylation at threonine 189, specifically located at amino acid residues 49–95. Docking studies supported the idea that phosphorylation at serine 137 increased the binding affinity between NS3 and nonstructural protein 5 (NS5), whereas phosphorylation at threonine 189 decreased it. The interaction between NS3 and NS5 is essential for viral replication. Docking studies of the antiviral plant flavonoid Quercetin with NS3 indicated that Quercetin physically occluded the serine 137 phosphorylation site. Taken together, these findings suggested a specific site and mechanism by which Quercetin inhibits dengue and possibly other flaviviruses.
|
16
|
Yang H, Wang M, Liu X, Zhao XM, Li A. PhosIDN: an integrated deep neural network for improving protein phosphorylation site prediction by combining sequence and protein-protein interaction information. Bioinformatics 2021; 37:4668-4676. [PMID: 34320631 PMCID: PMC8665744 DOI: 10.1093/bioinformatics/btab551] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 11/23/2020] [Revised: 06/22/2021] [Accepted: 07/27/2021] [Indexed: 11/29/2022] Open
Abstract
Motivation: Phosphorylation is one of the most studied post-translational modifications, which plays a pivotal role in various cellular processes. Recently, deep learning methods have achieved great success in prediction of phosphorylation sites, but most of them are based on convolutional neural networks that may not capture enough information about long-range dependencies between residues in a protein sequence. In addition, existing deep learning methods only make use of sequence information for predicting phosphorylation sites, and it is highly desirable to develop a deep learning architecture that can combine heterogeneous sequence and protein–protein interaction (PPI) information for more accurate phosphorylation site prediction. Results: We present a novel integrated deep neural network named PhosIDN, for phosphorylation site prediction by extracting and combining sequence and PPI information. In PhosIDN, a sequence feature encoding sub-network is proposed to capture not only local patterns but also long-range dependencies from protein sequences. Meanwhile, useful PPI features are also extracted in PhosIDN by a PPI feature encoding sub-network adopting a multi-layer deep neural network. Moreover, to effectively combine sequence and PPI information, a heterogeneous feature combination sub-network is introduced to fully exploit the complex associations between sequence and PPI features, and their combined features are used for final prediction. Comprehensive experiment results demonstrate that the proposed PhosIDN significantly improves the prediction performance of phosphorylation sites and compares favorably with existing general and kinase-specific phosphorylation site prediction methods. Availability and implementation: PhosIDN is freely available at https://github.com/ustchangyuanyang/PhosIDN. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hangyuan Yang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
| | - Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, China
| | - Xia Liu
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence and Frontiers Center for Brain Science, China; Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, China
| |
|
17
|
Computational Phosphorylation Network Reconstruction: An Update on Methods and Resources. Methods Mol Biol 2021. [PMID: 34270057 DOI: 10.1007/978-1-0716-1625-3_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2023]
Abstract
Most proteins undergo some form of modification after translation, and phosphorylation is one of the most relevant and ubiquitous post-translational modifications. The succession of protein phosphorylation and dephosphorylation catalyzed by protein kinase and phosphatase, respectively, constitutes a key mechanism of molecular information flow in cellular systems. The protein interactions of kinases, phosphatases, and their regulatory subunits and substrates are the main part of phosphorylation networks. To elucidate the landscape of phosphorylation events has been a central goal pursued by both experimental and computational approaches. Substrate specificity (e.g., sequence, structure) or the phosphoproteome has been utilized in an array of different statistical learning methods to infer phosphorylation networks. In this chapter, different computational phosphorylation network inference-related methods and resources are summarized and discussed.
|
18
|
Cabral AD, Radu TB, de Araujo ED, Gunning PT. Optical chemosensors for the detection of proximally phosphorylated peptides and proteins. RSC Chem Biol 2021; 2:815-829. [PMID: 34458812 PMCID: PMC8341930 DOI: 10.1039/d1cb00055a] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/16/2021] [Accepted: 04/19/2021] [Indexed: 12/31/2022] Open
Abstract
Proximal multi-site phosphorylation is a critical post-translational modification in protein biology. The additive effects of multiple phosphosite clusters in close spatial proximity trigger integrative and cooperative effects on protein conformation and activity. Proximal phosphorylation has been shown to modulate signal transduction pathways and gene expression and, as a result, is implicated in a broad range of disease states through altered protein function and/or localization, including enzyme overactivation or protein aggregation. The role of proximal multi-phosphorylation events is becoming increasingly recognized as mechanistically important, although breakthroughs are limited due to a lack of detection technologies. To date, there is a limited selection of facile and robust sensing tools for proximal phosphorylation. Nonetheless, there have been considerable efforts in developing optical chemosensors for the detection of proximal phosphorylation motifs on peptides and proteins in recent years. This review provides a comprehensive overview of optical chemosensors for proximal phosphorylation, with the majority of work being reported in the past two decades. Optical sensors, in the form of fluorescent and luminescent chemosensors, hybrid biosensors, and inorganic nanoparticles, are described. Emphasis is placed on the rationale behind sensor scaffolds, relevant protein motifs, and applications in protein biology.
Affiliation(s)
- Aaron D Cabral
- Department of Chemical and Physical Sciences, University of Toronto Mississauga 3359 Mississauga Road Mississauga Ontario L5L 1C6 Canada
- Department of Chemistry, University of Toronto 80 St George Street Toronto Ontario M5S 3H6 Canada
| | - Tudor B Radu
- Department of Chemical and Physical Sciences, University of Toronto Mississauga 3359 Mississauga Road Mississauga Ontario L5L 1C6 Canada
- Department of Chemistry, University of Toronto 80 St George Street Toronto Ontario M5S 3H6 Canada
| | - Elvin D de Araujo
- Department of Chemical and Physical Sciences, University of Toronto Mississauga 3359 Mississauga Road Mississauga Ontario L5L 1C6 Canada
| | - Patrick T Gunning
- Department of Chemical and Physical Sciences, University of Toronto Mississauga 3359 Mississauga Road Mississauga Ontario L5L 1C6 Canada
- Department of Chemistry, University of Toronto 80 St George Street Toronto Ontario M5S 3H6 Canada
| |
|
19
|
Jamal S, Ali W, Nagpal P, Grover A, Grover S. Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins. J Transl Med 2021; 19:218. [PMID: 34030700 PMCID: PMC8142496 DOI: 10.1186/s12967-021-02851-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/01/2020] [Accepted: 04/18/2021] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases (most commonly neurological disorders, Alzheimer's disease, and Parkinson's disease), thus necessitating the prediction of S/T/Y residues that can be phosphorylated in an uncharacterized amino acid sequence. Despite a surplus of sequencing data, current experimental methods of PTM prediction are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited, owing to substrate specificity, performance, and the diversity of its features. METHODS: In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, the minimum redundancy/maximum relevance approach, and the symmetrical uncertainty method were employed to extract the most informative features to train the models. RESULTS: The RF and SVM models generated using diverse feature types in the present study were highly accurate, as is evident from good values for different statistical measures. Moreover, independent test sets and benchmark validations indicated that the proposed method clearly outperformed the existing methods, demonstrating its ability to accurately predict protein phosphorylation. CONCLUSIONS: The results obtained in the present work indicate that the proposed computational methodology can be effectively used for predicting putative phosphorylation sites, further facilitating discovery of the mechanisms of various biological processes.
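The symmetrical uncertainty method named in the METHODS part above scores a feature against the class label using the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information. The Python below is a minimal sketch for discrete features only. The function names and toy data are assumptions for illustration, and nothing here is taken from the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is no information to share.
        return 0.0
    joint = entropy(list(zip(x, y)))
    mutual_info = hx + hy - joint
    return 2.0 * mutual_info / (hx + hy)


if __name__ == "__main__":
    # Toy binary feature and class label, invented for illustration only.
    label = [0, 0, 1, 1]
    print(symmetrical_uncertainty([0, 0, 1, 1], label))  # feature mirrors the label
    print(symmetrical_uncertainty([0, 1, 0, 1], label))  # feature independent of the label
```

Features would typically be ranked by this score and the lowest-ranked ones discarded. Continuous features would need to be discretized first, which this sketch does not cover.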
Affiliation(s)
- Salma Jamal
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India
| | - Waseem Ali
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India
| | - Priya Nagpal
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
| | - Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, India.
| | - Sonam Grover
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India.
| |
|
20
|
Ramazi S, Zahiri J. Posttranslational modifications in proteins: resources, tools and prediction methods. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2021:6214407. [PMID: 33826699 DOI: 10.1093/database/baab012] [Citation(s) in RCA: 262] [Impact Index Per Article: 87.3] [Received: 07/12/2020] [Revised: 02/20/2021] [Indexed: 12/21/2022]
Abstract
Posttranslational modifications (PTMs) refer to amino acid side chain modification in some proteins after their biosynthesis. There are more than 400 different types of PTMs affecting many aspects of protein functions. Such modifications happen as crucial molecular regulatory mechanisms to regulate diverse cellular processes. These processes have a significant impact on the structure and function of proteins. Disruption in PTMs can lead to the dysfunction of vital biological processes and hence to various diseases. High-throughput experimental methods for discovery of PTMs are very laborious and time-consuming. Therefore, there is an urgent need for computational methods and powerful tools to predict PTMs. There are vast amounts of PTMs data, which are publicly accessible through many online databases. In this survey, we comprehensively reviewed the major online databases and related tools. The current challenges of computational methods were reviewed in detail as well.
Affiliation(s)
- Shahin Ramazi
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences Tarbiat Modares University, Jalal Ale Ahmad Highway, P.O. Box: 14115-111, Tehran, Iran
| | - Javad Zahiri
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences Tarbiat Modares University, Jalal Ale Ahmad Highway, P.O. Box: 14115-111, Tehran, Iran
- Department of Neuroscience, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| |
|
21
|
PhosR enables processing and functional analysis of phosphoproteomic data. Cell Rep 2021; 34:108771. [PMID: 33626354 DOI: 10.1016/j.celrep.2021.108771] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 06/01/2020] [Revised: 12/07/2020] [Accepted: 01/28/2021] [Indexed: 02/08/2023] Open
Abstract
Mass spectrometry (MS)-based phosphoproteomics has revolutionized our ability to profile phosphorylation-based signaling in cells and tissues on a global scale. To infer the action of kinases and signaling pathways in phosphoproteomic experiments, we present PhosR, a set of tools and methodologies implemented in a suite of R packages facilitating comprehensive analysis of phosphoproteomic data. By applying PhosR to both published and new phosphoproteomic datasets, we demonstrate capabilities in data imputation and normalization by using a set of "stably phosphorylated sites" and in functional analysis for inferring active kinases and signaling pathways. In particular, we introduce a "signalome" construction method for identifying a collection of signaling modules to summarize and visualize the interaction of kinases and their collective actions on signal transduction. Together, our data and findings demonstrate the utility of PhosR in processing and generating biological knowledge from MS-based phosphoproteomic data.
|
22
|
Guo L, Wang Y, Xu X, Cheng KK, Long Y, Xu J, Li S, Dong J. DeepPSP: A Global-Local Information-Based Deep Neural Network for the Prediction of Protein Phosphorylation Sites. J Proteome Res 2020; 20:346-356. [PMID: 33241931 DOI: 10.1021/acs.jproteome.0c00431] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]
Abstract
Identification of phosphorylation sites is an important step in the function study and drug design of proteins. In recent years, there have been increasing applications of computational methods in the identification of phosphorylation sites because of their low cost and high speed. Most of the currently available methods focus on using local information around potential phosphorylation sites for prediction and do not take the global information of the protein sequence into consideration. Here, we demonstrated that the global information of protein sequences may also be critical for phosphorylation site prediction. In this paper, a new deep neural network model, called DeepPSP, was proposed for the prediction of protein phosphorylation sites. In the DeepPSP model, two parallel modules were introduced to extract both local and global features from protein sequences. Two squeeze-and-excitation blocks and one bidirectional long short-term memory block were introduced into each module to capture effective representations of the sequences. Comparative studies were carried out to evaluate the performance of DeepPSP and four other prediction methods using public data sets. The F1-score, area under receiver operating characteristic curves (AUROC), and area under precision-recall curves (AUPRC) of DeepPSP were found to be 0.4819, 0.82, and 0.50, respectively, for S/T general site prediction and 0.4206, 0.73, and 0.39, respectively, for Y general site prediction. Compared with the MusiteDeep method, the F1-score, AUROC, and AUPRC of DeepPSP were found to increase by 8.6, 2.5, and 8.7%, respectively, for S/T general site prediction and by 20.6, 5.8, and 18.2%, respectively, for Y general site prediction. Among the tested methods, the developed DeepPSP method was also found to produce the best results for different kinase-specific site predictions including CDK, mitogen-activated protein kinase, CAMK, AGC, and CMGC.
Taken together, the developed DeepPSP method may offer a more accurate phosphorylation site prediction by including global information. It may serve as an alternative model with better performance and interpretability for protein phosphorylation site prediction.
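The F1-score reported above is the harmonic mean of precision and recall. It is often quoted alongside AUROC for phosphorylation site prediction because negative sites usually far outnumber positive ones. The Python below is a minimal sketch of the standard definition computed from confusion-matrix counts. The function name `f1_from_counts` and the example counts are hypothetical and are not drawn from the paper.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall).

    tp: predicted sites that are true phosphorylation sites
    fp: predicted sites that are not phosphorylated
    fn: true phosphorylation sites that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented counts: 40 correct predictions, 10 false alarms, 10 misses.
    print(f1_from_counts(tp=40, fp=10, fn=10))
```

True negatives do not enter the formula, which is why F1 stays informative when the negative class dominates the data set.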
Affiliation(s)
- Lei Guo
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Yongpei Wang
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Xiangnan Xu
- School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales 2006, Australia
| | - Kian-Kai Cheng
- Innovation Centre in Agritechnology, Universiti Teknologi Malaysia, Muar, Johor 84600, Malaysia
| | - Yichi Long
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Jingjing Xu
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Sanshu Li
- Institute of Genomics, Medical School, Huaqiao University, Xiamen 361021, China
| | - Jiyang Dong
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| |
|
23
|
GasPhos: Protein Phosphorylation Site Prediction Using a New Feature Selection Approach with a GA-Aided Ant Colony System. Int J Mol Sci 2020; 21:ijms21217891. [PMID: 33114312 PMCID: PMC7660635 DOI: 10.3390/ijms21217891] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2020] [Revised: 10/20/2020] [Accepted: 10/20/2020] [Indexed: 02/06/2023] Open
Abstract
Protein phosphorylation is one of the most important post-translational modifications, and many biological processes are related to phosphorylation, such as DNA repair, transcriptional regulation and signal transduction; abnormal regulation of phosphorylation therefore usually causes diseases. Accurately predicting human phosphorylation sites could help to address human diseases. We therefore developed a kinase-specific phosphorylation prediction system, GasPhos, and proposed a new feature selection approach, called Gas, based on the ant colony system and a genetic algorithm, and used performance evaluation strategies focused on different kinases to choose the best learning model. Gas uses the mean decrease Gini index (MDGI) as a heuristic value for path selection and adopts binary transformation strategies and new state transition rules. GasPhos can predict phosphorylation sites for six kinases and showed better performance than other phosphorylation prediction tools. The disease-related phosphorylated proteins that were predicted with GasPhos are also discussed. Finally, Gas can be applied to other issues that require feature selection, which could help to improve prediction performance.
|
24
|
Zhou XX, Bracken CJ, Zhang K, Zhou J, Mou Y, Wang L, Cheng Y, Leung KK, Wells JA. Targeting Phosphotyrosine in Native Proteins with Conditional, Bispecific Antibody Traps. J Am Chem Soc 2020; 142:17703-17713. [PMID: 32924468 DOI: 10.1021/jacs.0c08458] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/07/2023]
Abstract
Engineering sequence-specific antibodies (Abs) against phosphotyrosine (pY) motifs embedded in folded polypeptides remains highly challenging because of the stringent requirement for simultaneous recognition of the pY motif and the surrounding folded protein epitope. Here, we present a method named phosphotyrosine Targeting by Recombinant Ab Pair, or pY-TRAP, for in vitro engineering of binders for native pY proteins. Specifically, we create the pY protein by unnatural amino acid misincorporation, mutagenize a universal pY-binding Ab to create a first binder B1 for the pY motif on the pY protein, and then select against the B1-pY protein complex for a second binder B2 that recognizes the composite epitope of B1 and the pY-containing protein complex. We applied pY-TRAP to create highly specific binders to folded Ub-pY59, a rarely studied Ub phosphoform exclusively observed in cancerous tissues, and ZAP70-pY248, a kinase phosphoform regulated in feedback signaling pathways in T cells. The pY-TRAPs do not have detectable binding to wild-type proteins or to other pY peptides or proteins tested. This pY-TRAP approach serves as a generalizable method for engineering sequence-specific Ab binders to native pY proteins.
Affiliation(s)
- Xin X Zhou
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158, United States
| | - Colton J Bracken
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158, United States
| | - Kaihua Zhang
- Department of Biochemistry and Biophysics, University of California, San Francisco, California 94158, United States
| | - Jie Zhou
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158, United States
| | - Yun Mou
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158, United States
| | - Lei Wang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158, United States
| | - Yifan Cheng
- Department of Biochemistry and Biophysics, University of California, San Francisco, California 94158, United States; Howard Hughes Medical Institute, University of California, San Francisco, California 94158, United States
| | - Kevin K Leung
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158, United States
| | - James A Wells
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158, United States; Chan Zuckerberg Biohub, San Francisco, California 94158, United States; Department of Cellular and Molecular Pharmacology, University of California, San Francisco, California 94158, United States
|
25
|
Ahmed S, Kabir M, Arif M, Khan ZU, Yu DJ. DeepPPSite: A deep learning-based model for analysis and prediction of phosphorylation sites using efficient sequence information. Anal Biochem 2020; 612:113955. [PMID: 32949607 DOI: 10.1016/j.ab.2020.113955] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 02/20/2020] [Revised: 08/30/2020] [Accepted: 09/11/2020] [Indexed: 12/29/2022]
Abstract
Phosphorylation is a ubiquitous type of post-translational modification (PTM) that occurs in both eukaryotic and prokaryotic cells, wherein a phosphate group binds to amino acid residues. These specific residues, i.e., serine (S), threonine (T), and tyrosine (Y), exhibit diverse functions at the molecular level. Recent studies have determined that some diseases such as cancer, diabetes, and neurodegenerative diseases are caused by abnormal phosphorylation. Based on its potential applications in biological research and drug development, the large-scale identification of phosphorylation sites has attracted interest. Existing wet-lab technologies for targeting phosphorylation sites are expensive and time consuming. Thus, computational algorithms that can efficiently accelerate the annotation of phosphorylation sites from massive protein sequences are needed. Numerous machine learning-based methods have been implemented for phosphorylation site prediction. However, despite extensive efforts, existing computational approaches continue to have inadequate performance, particularly in terms of overall accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC). In this paper, we report a novel deep learning-based predictor to overcome these performance hurdles, DeepPPSite, which was constructed using a stacked long short-term memory recurrent network for predicting phosphorylation sites. The proposed technique expediently learns the protein representations from conjoint protein descriptors. The experimental results indicated that our model achieved superior performance on the training dataset for S, T and Y, with MCC values of 0.608, 0.602, and 0.558, respectively, using a 10-fold cross-validation test. We further determined the generalization efficacy of the proposed predictor DeepPPSite by conducting a rigorous independent test. The predictive MCC values were 0.358, 0.356, and 0.350 for the S, T, and Y phosphorylation sites, respectively. Rigorous cross-validation and independent validation tests for the three types of phosphorylation sites demonstrated that the designed DeepPPSite tool significantly outperforms state-of-the-art methods.
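For readers unfamiliar with the MCC values quoted in this abstract, the following is a minimal sketch of how the Matthews correlation coefficient is computed from a binary confusion matrix. The function name `mcc_from_counts` and the example counts are illustrative assumptions and are not taken from DeepPPSite.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for a site predictor (not from the paper).
example_mcc = mcc_from_counts(tp=80, tn=70, fp=30, fn=20)
print(f"MCC = {example_mcc:.3f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when positive and negative sites are imbalanced.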
Affiliation(s)
- Saeed Ahmed
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
| | - Muhammad Kabir
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
| | - Muhammad Arif
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
| | - Zaheer Ullah Khan
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China.
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
|
26
|
Savage SR, Zhang B. Using phosphoproteomics data to understand cellular signaling: a comprehensive guide to bioinformatics resources. Clin Proteomics 2020; 17:27. [PMID: 32676006 PMCID: PMC7353784 DOI: 10.1186/s12014-020-09290-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 12/01/2019] [Accepted: 07/04/2020] [Indexed: 12/19/2022] Open
Abstract
Mass spectrometry-based phosphoproteomics is becoming an essential methodology for the study of global cellular signaling. Numerous bioinformatics resources are available to facilitate the translation of phosphopeptide identification and quantification results into novel biological and clinical insights, a critical step in phosphoproteomics data analysis. These resources include knowledge bases of kinases and phosphatases, phosphorylation sites, kinase inhibitors, and sequence variants affecting kinase function, and bioinformatics tools that can predict phosphorylation sites in addition to the kinase that phosphorylates them, infer kinase activity, and predict the effect of mutations on kinase signaling. However, these resources exist in silos and it is challenging to select among multiple resources with similar functions. Therefore, we put together a comprehensive collection of resources related to phosphoproteomics data interpretation, compared the use of tools with similar functions, and assessed the usability from the standpoint of typical biologists or clinicians. Overall, tools could be improved by standardization of enzyme names, flexibility of data input and output format, consistent maintenance, and detailed manuals.
Affiliation(s)
- Sara R. Savage
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN USA
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX USA
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX USA
|
27
|
Xu ZC, Feng PM, Yang H, Qiu WR, Chen W, Lin H. iRNAD: a computational tool for identifying D modification sites in RNA sequence. Bioinformatics 2020; 35:4922-4929. [PMID: 31077296 DOI: 10.1093/bioinformatics/btz358] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 11/28/2018] [Revised: 03/01/2019] [Accepted: 04/27/2019] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION: Dihydrouridine (D) is a common RNA post-transcriptional modification found in eukaryotes, bacteria and a few archaea. The modification can promote the conformational flexibility of individual nucleotide bases, and its levels are increased in cancerous tissues. Therefore, it is necessary to detect D in RNA to further understand its functional roles. Since wet-lab experimental techniques for this purpose are time-consuming and laborious, computational models are urgently needed to identify D modification sites in RNA. RESULTS: We constructed a predictor, called iRNAD, for identifying D modification sites in RNA sequences. In this predictor, the RNA samples derived from five species were encoded by nucleotide chemical property and nucleotide density. A support vector machine was used to perform the classification. The final model achieved an overall accuracy of 96.18% with an area under the receiver operating characteristic curve of 0.9839 in a jackknife cross-validation test. Furthermore, we performed a series of validations from several aspects and demonstrated the robustness and reliability of the proposed model. AVAILABILITY AND IMPLEMENTATION: A user-friendly web server called iRNAD is freely accessible at http://lin-group.cn/server/iRNAD, providing a convenient guide for users further studying D modification.
Affiliation(s)
- Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Peng-Mian Feng
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
28
|
Luo F, Wang M, Liu Y, Zhao XM, Li A. DeepPhos: prediction of protein phosphorylation sites with deep learning. Bioinformatics 2020; 35:2766-2773. [PMID: 30601936 PMCID: PMC6691328 DOI: 10.1093/bioinformatics/bty1051] [Citation(s) in RCA: 105] [Impact Index Per Article: 26.3] [Received: 09/14/2018] [Revised: 11/19/2018] [Accepted: 12/12/2018] [Indexed: 11/28/2022] Open
Abstract
MOTIVATION: Phosphorylation is the most studied post-translational modification, which is crucial for multiple biological processes. Recently, much effort has been devoted to developing computational predictors for phosphorylation site prediction, but most of them are based on feature selection and discriminative classification. Thus, it is useful to develop a novel and highly accurate predictor that can unveil intricate patterns automatically for protein phosphorylation sites. RESULTS: In this study we present DeepPhos, a novel deep learning architecture for prediction of protein phosphorylation. Unlike multi-layer convolutional neural networks, DeepPhos consists of densely connected convolutional neural network blocks, which capture multiple representations of sequences and make the final phosphorylation prediction through intra-block and inter-block concatenation layers. DeepPhos can also be used for kinase-specific prediction at the group, family, subfamily and individual kinase levels. The experimental results demonstrated that DeepPhos outperforms competitive predictors in general and kinase-specific phosphorylation site prediction. AVAILABILITY AND IMPLEMENTATION: The source code of DeepPhos is publicly deposited at https://github.com/USTCHIlab/DeepPhos. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fenglin Luo
- School of Information Science and Technology
| | - Minghui Wang
- School of Information Science and Technology; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH, China
| | - Yu Liu
- School of Information Science and Technology
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Ao Li
- School of Information Science and Technology; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH, China
|
29
|
Deznabi I, Arabaci B, Koyutürk M, Tastan O. DeepKinZero: zero-shot learning for predicting kinase-phosphosite associations involving understudied kinases. Bioinformatics 2020; 36:3652-3661. [PMID: 32044914 PMCID: PMC7320620 DOI: 10.1093/bioinformatics/btaa013] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/05/2019] [Revised: 12/17/2019] [Accepted: 01/06/2020] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Protein phosphorylation is a key regulator of protein function in signal transduction pathways. Kinases are the enzymes that catalyze the phosphorylation of other proteins in a target-specific manner. The dysregulation of phosphorylation is associated with many diseases including cancer. Although the advances in phosphoproteomics enable the identification of phosphosites at the proteome level, most of the phosphoproteome is still in the dark: more than 95% of the reported human phosphosites have no known kinases. Determining which kinase is responsible for phosphorylating a site remains an experimental challenge. Existing computational methods require several examples of known targets of a kinase to make accurate kinase-specific predictions, yet for a large body of kinases, only a few or no target sites are reported. RESULTS: We present DeepKinZero, the first zero-shot learning approach to predict the kinase acting on a phosphosite for kinases with no known phosphosite information. DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model. The kinase-specific positional amino acid preferences are learned using a bidirectional recurrent neural network. We show that DeepKinZero achieves significant improvement in accuracy for kinases with no known phosphosites in comparison to the baseline model and other methods available. By expanding our knowledge on understudied kinases, DeepKinZero can help to chart the phosphoproteome atlas. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/Tastanlab/DeepKinZero. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Iman Deznabi
- Computer Engineering Department, Bilkent University, Ankara 06800, Turkey
- College of Information and Computer Sciences, University of Massachusetts, Amherst, MA 01003, USA
| | - Busra Arabaci
- Computer Engineering Department, Bilkent University, Ankara 06800, Turkey
| | - Mehmet Koyutürk
- Department of Computer and Data Sciences
- Center for Proteomics & Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Oznur Tastan
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
|
30
|
Watson NA, Cartwright TN, Lawless C, Cámara-Donoso M, Sen O, Sako K, Hirota T, Kimura H, Higgins JMG. Kinase inhibition profiles as a tool to identify kinases for specific phosphorylation sites. Nat Commun 2020; 11:1684. [PMID: 32245944 PMCID: PMC7125195 DOI: 10.1038/s41467-020-15428-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/18/2019] [Accepted: 03/06/2020] [Indexed: 01/08/2023] Open
Abstract
There are thousands of known cellular phosphorylation sites, but the paucity of ways to identify kinases for particular phosphorylation events remains a major roadblock for understanding kinase signaling. To address this, we here develop a generally applicable method that exploits the large number of kinase inhibitors that have been profiled on near-kinome-wide panels of protein kinases. The inhibition profile for each kinase provides a fingerprint that allows identification of unknown kinases acting on target phosphosites in cell extracts. We validate the method on diverse known kinase-phosphosite pairs, including histone kinases, EGFR autophosphorylation, and Integrin β1 phosphorylation by Src-family kinases. We also use our approach to identify the previously unknown kinases responsible for phosphorylation of INCENP at a site within a commonly phosphorylated motif in mitosis (a non-canonical target of Cyclin B-Cdk1), and of BCL9L at S915 (PKA). We show that the method has clear advantages over in silico and genetic screening.
Affiliation(s)
- Nikolaus A Watson
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Tyrell N Cartwright
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Conor Lawless
- Wellcome Centre for Mitochondrial Research, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Marcos Cámara-Donoso
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Onur Sen
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Kosuke Sako
- The Cancer Institute, Japanese Foundation for Cancer Research, Koto, Tokyo, 135-8550, Japan
| | - Toru Hirota
- The Cancer Institute, Japanese Foundation for Cancer Research, Koto, Tokyo, 135-8550, Japan
| | - Hiroshi Kimura
- Cell Biology Center, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Kanagawa, 226-8503, Japan
| | - Jonathan M G Higgins
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK.
|
31
|
Rashid MM, Shatabda S, Hasan MM, Kurata H. Recent Development of Machine Learning Methods in Microbial Phosphorylation Sites. Curr Genomics 2020; 21:194-203. [PMID: 33071613 PMCID: PMC7521030 DOI: 10.2174/1389202921666200427210833] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/16/2020] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 01/10/2023] Open
Abstract
A variety of protein post-translational modifications has been identified that control many cellular functions. Phosphorylation studies in mycobacterial organisms have shown critical importance in diverse biological processes, such as intercellular communication and cell division. Recent technical advances in high-precision mass spectrometry have determined a large number of microbial phosphorylated proteins and phosphorylation sites throughout the proteome analysis. Identification of phosphorylated proteins with specific modified residues through experimentation is often labor-intensive, costly and time-consuming. All these limitations could be overcome through the application of machine learning (ML) approaches. However, only a limited number of computational phosphorylation site prediction tools have been developed so far. This work aims to present a complete survey of the existing ML-predictors for microbial phosphorylation. We cover a variety of important aspects for developing a successful predictor, including operating ML algorithms, feature selection methods, window size, and software utility. Initially, we review the currently available phosphorylation site databases of the microbiome, the state-of-the-art ML approaches, working principles, and their performances. Lastly, we discuss the limitations and future directions of the computational ML methods for the prediction of phosphorylation.
Affiliation(s)
- Md. Mehedi Hasan
- Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan, and the Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
| | - Hiroyuki Kurata
- Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan, and the Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
|
32
|
Veredas FJ, Urda D, Subirats JL, Cantón FR, Aledo JC. Combining feature engineering and feature selection to improve the prediction of methionine oxidation sites in proteins. Neural Comput Appl 2020. [DOI: 10.1007/s00521-018-3655-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
|
33
|
Wang T, Ma G, Ang CS, Korhonen PK, Stroehlein AJ, Young ND, Hofmann A, Chang BCH, Williamson NA, Gasser RB. The developmental phosphoproteome of Haemonchus contortus. J Proteomics 2019; 213:103615. [PMID: 31846766 DOI: 10.1016/j.jprot.2019.103615] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/24/2019] [Revised: 11/22/2019] [Accepted: 12/13/2019] [Indexed: 12/22/2022]
Abstract
Protein phosphorylation plays essential roles in many cellular processes. Despite recent progress in the genomics, transcriptomics and proteomics of socioeconomically important parasitic nematodes, there are scant phosphoproteomic data to underpin molecular biological discovery. Here, using phosphopeptide enrichment-based LC-MS/MS and data-independent acquisition (DIA) quantitation, we characterised the first developmental phosphoproteome of the parasitic nematode Haemonchus contortus, one of the most pathogenic parasites of ruminant livestock. In total, 1804 phosphorylated proteins with 4406 phosphorylation sites ('phosphosites') from different developmental stages/sexes were identified. Bioinformatic analyses of quantified 'phosphosites' revealed distinctive stage- and sex-specific patterns during development, and identified a subset of phosphoproteins proposed to play crucial roles in processes such as spindle positioning, signal transduction and kinase activity. A sequence-based comparison of the phosphoproteome of H. contortus with those of two free-living nematode species (Caenorhabditis elegans and Pristionchus pacificus) suggested a limited number of common protein phosphorylation events among these species. Our findings imply active roles for protein phosphorylation in the adaptation of a parasitic nematode to a constantly changing external environment. The phosphoproteomic data set for H. contortus provides a basis to better understand phosphorylation and associated biological processes (e.g., regulation of signal transduction), and might enable the discovery of novel anthelmintic targets. SIGNIFICANCE: Here, we report the first phosphoproteome for a socioeconomically important parasitic nematode (Haemonchus contortus). This phosphoproteome exhibits distinctive patterns during development, suggesting active roles of post-translational modification in the parasite's adaptation to changing environments within and outside of the host animal. This work sheds light on developmental phosphorylation in a parasitic nematode, and could enable the discovery of novel interventions against major pathogens.
Affiliation(s)
- Tao Wang
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Guangxu Ma
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Ching-Seng Ang
- Bio21 Mass Spectrometry and Proteomics Facility, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Pasi K Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Andreas J Stroehlein
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Neil D Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Andreas Hofmann
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Bill C H Chang
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Nicholas A Williamson
- Bio21 Mass Spectrometry and Proteomics Facility, The University of Melbourne, Parkville, Victoria 3010, Australia.
| | - Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
|
34
|
Abstract
Proteomics and phosphoproteomics have been emerging as new dimensions of omics. Phosphorylation has a profound impact on the biological functions and applications of proteins. It influences everything from intrinsic activity and extrinsic executions to cellular localization. This post-translational modification has been subjected to detailed study and has been an object of analytical curiosity with the advent of faster instrumentation. The major strength of phosphoproteomic research lies in the fact that it gives an overall picture of the workforce of the cell. Phosphoproteomics gives deeper insights into understanding the mechanism behind development and progression of a disease. This review for the first time consolidates the list of existing bioinformatics tools developed for phosphoproteomics. The gap between development of bioinformatics tools and their implementation in clinical research is highlighted. The challenge facing progress is ideally believed to be the interdisciplinary arena this field of research is associated with. For meaningful solutions and deliverables, these tools need to be implemented in clinical studies for obtaining answers to pharmacodynamic questions, saving time, costs and energy. This review hopes to invoke some thought in this direction.
|
35
|
Cheng G, Chen Q, Zhang R. Prediction of phosphorylation sites based on granular support vector machine. Granular Computing 2019. [DOI: 10.1007/s41066-019-00202-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
|
36
|
Maiti S, Hassan A, Mitra P. Boosting phosphorylation site prediction with sequence feature-based machine learning. Proteins 2019; 88:284-291. [PMID: 31412138 DOI: 10.1002/prot.25801] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/07/2019] [Revised: 07/13/2019] [Accepted: 08/08/2019] [Indexed: 12/13/2022]
Abstract
Protein phosphorylation is one of the essential post-translational modifications, playing a vital role in the regulation of many fundamental cellular processes. We propose a LightGBM-based computational approach that uses evolutionary, geometric, sequence environment, and amino acid-specific features to decipher phosphate binding sites from a protein sequence. When compared with other existing methods on 2429 protein sequences taken from the standard Phospho.ELM (P.ELM) benchmark data set featuring 11 organisms, our method reports a higher F1 score of 0.504 (the harmonic mean of precision and recall) and a ROC AUC of 0.836 (area under the receiver operating characteristic curve). The computation time of our proposed approach is much less than that of the recently developed deep learning-based framework. Structural analysis of selected protein sequences indicates that our predictions are a superset of the phosphorylation sites annotated in the P.ELM data set. The foundation of our scheme is manual feature engineering and decision tree-based classification. Hence, it is intuitive, and one can interpret the final tree as a set of rules, resulting in a deeper understanding of the relationships between biophysical features and phosphorylation sites. Our innovative problem transformation method permits more control over precision and recall, as demonstrated by the fact that if we incorporate the output probability of the existing deep learning framework as an additional feature, then our prediction improves (F1 score = 0.546; ROC AUC = 0.849). The implementation of our method can be accessed at http://cse.iitkgp.ac.in/~pralay/resources/PPSBoost/ and is mirrored at https://cosmos.iitkgp.ac.in/PPSBoost.
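The F1 score mentioned in this abstract is the harmonic mean of precision and recall. As a minimal sketch of that definition only, the snippet below computes all three from confusion-matrix counts; the function name `precision_recall_f1` and the counts are hypothetical and are not drawn from the PPSBoost evaluation.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 40 correctly predicted sites, 60 false alarms, 20 missed sites.
p, r, f1 = precision_recall_f1(tp=40, fp=60, fn=20)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a predictor cannot reach a high F1 by maximizing recall alone, which matters when true phosphosites are rare relative to candidate residues.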
Affiliation(s)
- Shyantani Maiti
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India
| | - Atif Hassan
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India
| | - Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India
|
37
|
Li H, Guan Y. Machine learning empowers phosphoproteome prediction in cancers. Bioinformatics 2019; 36:859-864. [PMID: 31410451 PMCID: PMC7868059 DOI: 10.1093/bioinformatics/btz639] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/21/2019] [Revised: 07/25/2019] [Accepted: 08/12/2019] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: Reversible protein phosphorylation is an essential post-translational modification regulating protein functions and signaling pathways in many cellular processes. Aberrant activation of signaling pathways often contributes to cancer development and progression. The mass spectrometry-based phosphoproteomics technique is a powerful tool to investigate the site-level phosphorylation of the proteome in a global fashion, paving the way for understanding the regulatory mechanisms underlying cancers. However, this approach is time-consuming and requires expensive instruments, specialized expertise and a large amount of starting material. An alternative in silico approach is predicting the phosphoproteomic profiles of cancer patients from the available proteomic, transcriptomic and genomic data. RESULTS: Here, we present a winning algorithm in the 2017 NCI-CPTAC DREAM Proteogenomics Challenge for predicting phosphorylation levels of the proteome across cancer patients. We integrate four components into our algorithm, including (i) baseline correlations between protein and phosphoprotein abundances, (ii) universal protein-protein interactions, (iii) shareable regulatory information across cancer tissues and (iv) associations among multi-phosphorylation sites of the same protein. When tested on a large held-out testing dataset of 108 breast and 62 ovarian cancer samples, our method ranked first in both cancer tissues, demonstrating its robustness and generalization ability. AVAILABILITY AND IMPLEMENTATION: Our code and reproducible results are freely available on GitHub: https://github.com/GuanLab/phosphoproteome_prediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hongyang Li
- To whom correspondence should be addressed.
|
38
|
Lumbanraja FR, Mahesworo B, Cenggoro TW, Budiarto A, Pardamean B. An Evaluation of Deep Neural Network Performance on Limited Protein Phosphorylation Site Prediction Data. Procedia Comput Sci 2019. [DOI: 10.1016/j.procs.2019.08.137] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022]
|
39
|
Cao M, Chen G, Yu J, Shi S. Computational prediction and analysis of species-specific fungi phosphorylation via feature optimization strategy. Brief Bioinform 2018; 21:595-608. [DOI: 10.1093/bib/bby122] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/02/2018] [Revised: 11/16/2018] [Accepted: 11/22/2018] [Indexed: 11/12/2022] Open
Abstract
Protein phosphorylation is a reversible and ubiquitous post-translational modification that primarily occurs at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarize current progress in the computational prediction of eukaryotic protein phosphorylation sites, which has mainly focused on animals and plants, especially humans, and to a lesser extent on fungi. Since the number of identified fungi phosphorylation sites has greatly increased in a wide variety of organisms and their roles in pathological physiology remain largely unknown, more attention has been paid to the identification of fungi-specific phosphorylation. Here, experimental fungi phosphorylation site data were collected, and most of the sites were classified into different types to be encoded with various features and trained via a two-step feature optimization method. A novel method for the prediction of species-specific fungi phosphorylation, PreSSFP, was developed, which can identify fungi phosphorylation in seven species for specific serine, threonine and tyrosine residues (http://computbiol.ncu.edu.cn/PreSSFP). We also critically evaluated the performance of PreSSFP and compared it with other existing tools. The satisfactory results showed that PreSSFP is a robust predictor. Feature analyses revealed some significant differences among the seven species. Species-specific prediction, using the two-step feature optimization method to mine important features for training, could considerably improve prediction performance. We anticipate that our study provides a new lead for future computational analysis of fungi phosphorylation.
Affiliation(s)
- Man Cao
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Guodong Chen
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
|
40
|
Amano M, Nishioka T, Tsuboi D, Kuroda K, Funahashi Y, Yamahashi Y, Kaibuchi K. Comprehensive analysis of kinase-oriented phospho-signalling pathways. J Biochem 2018; 165:301-307. [DOI: 10.1093/jb/mvy115] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/23/2018] [Accepted: 12/15/2018] [Indexed: 02/01/2023]
Affiliation(s)
- Mutsuki Amano
- Department of Cell Pharmacology, Graduate School of Medicine, Nagoya University, 65 Tsurumai, Showa-ku, Nagoya, Aichi, Japan
- Tomoki Nishioka
- Department of Cell Pharmacology, Graduate School of Medicine, Nagoya University, 65 Tsurumai, Showa-ku, Nagoya, Aichi, Japan
- Daisuke Tsuboi
- Department of Cell Pharmacology, Graduate School of Medicine, Nagoya University, 65 Tsurumai, Showa-ku, Nagoya, Aichi, Japan
- Keisuke Kuroda
- Department of Cell Pharmacology, Graduate School of Medicine, Nagoya University, 65 Tsurumai, Showa-ku, Nagoya, Aichi, Japan
- Yasuhiro Funahashi
- Department of Cell Pharmacology, Graduate School of Medicine, Nagoya University, 65 Tsurumai, Showa-ku, Nagoya, Aichi, Japan
- Yukie Yamahashi
- Department of Cell Pharmacology, Graduate School of Medicine, Nagoya University, 65 Tsurumai, Showa-ku, Nagoya, Aichi, Japan
- Kozo Kaibuchi
- Department of Cell Pharmacology, Graduate School of Medicine, Nagoya University, 65 Tsurumai, Showa-ku, Nagoya, Aichi, Japan
|
41
|
Zhou Y, Mkrtchian S, Kumondai M, Hiratsuka M, Lauschke VM. An optimized prediction framework to assess the functional impact of pharmacogenetic variants. Pharmacogenomics J 2018; 19:115-126. [PMID: 30206299 PMCID: PMC6462826 DOI: 10.1038/s41397-018-0044-2] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 10/30/2017] [Revised: 06/27/2018] [Accepted: 08/10/2018] [Indexed: 01/25/2023]
Abstract
Prediction of phenotypic consequences of mutations constitutes an important aspect of precision medicine. Current computational tools mostly rely on evolutionary conservation and have been calibrated on variants associated with disease, which poses conceptual problems for assessment of variants in poorly conserved pharmacogenes. Here, we evaluated the performance of 18 current functionality prediction methods leveraging experimental high-quality activity data from 337 variants in genes involved in drug metabolism and transport and found that these models only achieved probabilities of 0.1–50.6% to make informed conclusions. We therefore developed a functionality prediction framework optimized for pharmacogenetic assessments that significantly outperformed current algorithms. Our model achieved 93% for both sensitivity and specificity for both loss-of-function and functionally neutral variants, and we confirmed its superior performance using cross validation analyses. This novel model holds promise to improve the translation of personal genetic information into biological conclusions and pharmacogenetic recommendations, thereby facilitating the implementation of Next-Generation Sequencing data into clinical diagnostics.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Karolinska Institutet, SE-171 77, Stockholm, Sweden
- Souren Mkrtchian
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Karolinska Institutet, SE-171 77, Stockholm, Sweden
- Masaki Kumondai
- Laboratory of Pharmacotherapy of Life-Style Related Diseases, Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Japan
- Masahiro Hiratsuka
- Laboratory of Pharmacotherapy of Life-Style Related Diseases, Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Japan
- Volker M Lauschke
- Department of Physiology and Pharmacology, Section of Pharmacogenetics, Karolinska Institutet, SE-171 77, Stockholm, Sweden
|
42
|
Kaur G, Pati PK. In silico insights on diverse interacting partners and phosphorylation sites of respiratory burst oxidase homolog (Rbohs) gene families from Arabidopsis and rice. BMC Plant Biol 2018; 18:161. [PMID: 30097007 PMCID: PMC6086027 DOI: 10.1186/s12870-018-1378-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/13/2017] [Accepted: 07/30/2018] [Indexed: 05/14/2023]
Abstract
BACKGROUND: NADPH oxidase (Nox) is a critical enzyme involved in the generation of apoplastic superoxide (O2-), a type of reactive oxygen species (ROS), and hence regulates a wide range of biological functions in many organisms. Plant Noxes are the homologs of the catalytic subunit of mammalian NADPH oxidases and are known as respiratory burst oxidase homologs (Rbohs). Previous studies have highlighted their versatile roles in tackling different kinds of stress and in plant growth and development. In the current study, potential interacting partners and phosphorylation sites were predicted for Rboh proteins from two model species (10 Rbohs from Arabidopsis thaliana and 9 from Oryza sativa japonica). The present work is the first step towards in silico prediction of interacting partners and phosphorylation sites for Rboh proteins from two plant species.
RESULTS: In this work, an extensive range of potential partners (unique and common), leading to diverse functions, was revealed from interaction networks and gene ontology classifications, where the majority of AtRbohs and OsRbohs play a role in stress-related activities, followed by cellular development. Further, 68 and 38 potential phosphorylation sites were identified in AtRbohs and OsRbohs, respectively. Their distribution, location and kinase specificities were also predicted, correlated with experimental data, and verified against the other EF-hand containing proteins within both genomes.
CONCLUSIONS: Analysis of regulatory mechanisms, including interaction with diverse partners and post-translational modifications such as phosphorylation, has provided insights into the functional multiplicity of Rbohs. The bioinformatics-based workflow in the current study can be used to gain insights into interacting partners and phosphorylation sites of Rbohs from other plant species.
Affiliation(s)
- Gurpreet Kaur
- Department of Biotechnology, Guru Nanak Dev University (GNDU), Amritsar, Punjab, 143005, India
- Present Address: Quantitative Biology Center (QBiC), University of Tuebingen, 72076, Tuebingen, Germany
- Pratap Kumar Pati
- Department of Biotechnology, Guru Nanak Dev University (GNDU), Amritsar, Punjab, 143005, India
|
43
|
Shi S, Wang L, Cao M, Chen G, Yu J. Proteomic analysis and prediction of amino acid variations that influence protein posttranslational modifications. Brief Bioinform 2018; 20:1597-1606. [DOI: 10.1093/bib/bby036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/15/2018] [Revised: 03/07/2018] [Indexed: 12/18/2022]
Abstract
Accumulating studies have indicated that amino acid variations that change the residue type at target sites or key flanking residues can directly or indirectly influence protein posttranslational modifications (PTMs) and have a detrimental effect on protein function. Computational mutation analysis can greatly reduce the experimental effort required. To increase the utilization of current computational resources, we first provide an overview of the computational prediction of amino acid variations that influence protein PTMs and of their functional analysis. We also discuss the challenges to be faced in developing novel in silico approaches in the future. The development of better methods for mutation analysis related to protein PTMs will help to facilitate the development of personalized precision medicine.
Affiliation(s)
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Lina Wang
- Department of Science, Nanchang Institute of Technology, Nanchang, Jiangxi 330031, China
- Man Cao
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Guodong Chen
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Jialin Yu
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
|
44
|
Zhou HX, Pang X. Electrostatic Interactions in Protein Structure, Folding, Binding, and Condensation. Chem Rev 2018; 118:1691-1741. [PMID: 29319301 DOI: 10.1021/acs.chemrev.7b00305] [Citation(s) in RCA: 454] [Impact Index Per Article: 75.7] [Indexed: 02/07/2023]
Abstract
Charged and polar groups, through forming ion pairs, hydrogen bonds, and other less specific electrostatic interactions, impart important properties to proteins. Modulation of the charges on the amino acids, e.g., by pH and by phosphorylation and dephosphorylation, has significant effects such as protein denaturation and switch-like response of signal transduction networks. This review aims to present a unifying theme among the various effects of protein charges and polar groups. Simple models will be used to illustrate basic ideas about electrostatic interactions in proteins, and these ideas in turn will be used to elucidate the roles of electrostatic interactions in protein structure, folding, binding, condensation, and related biological functions. In particular, we will examine how charged side chains are spatially distributed in various types of proteins and how electrostatic interactions affect thermodynamic and kinetic properties of proteins. Our hope is to capture both important historical developments and recent experimental and theoretical advances in quantifying electrostatic contributions of proteins.
Affiliation(s)
- Huan-Xiang Zhou
- Department of Chemistry and Department of Physics, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, United States
- Xiaodong Pang
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, United States
|
45
|
Prediction of Protein Phosphorylation Sites by Integrating Secondary Structure Information and Other One-Dimensional Structural Properties. Methods Mol Biol 2018; 1484:265-274. [PMID: 27787832 DOI: 10.1007/978-1-4939-6406-2_18] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 02/07/2023]
Abstract
Studies on phosphorylation are important but challenging for both wet-bench experiments and computational studies, and accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation in a wide variety of species. Here, we describe a phosphorylation site prediction webserver, PhosphoSVM, that employs a support vector machine to combine protein secondary structure information with seven other one-dimensional structural properties: Shannon entropy, relative entropy, predicted protein disorder information, predicted solvent accessible area, amino acid overlapping properties, averaged cumulative hydrophobicity, and subsequence k-nearest neighbor profiles. This method achieved AUC values of 0.8405/0.8183/0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively, in animals with tenfold cross-validation. The model trained on the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test. The AUC values for the independent test dataset were 0.7761/0.6652/0.5958 for S/T/Y phosphorylation sites, respectively. This algorithm with the optimally trained model was implemented as a webserver. The webserver, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.
|
46
|
Lumbanraja FR, Nguyen NG, Phan D, Faisal MR, Abapihi B, Purnama B, Delimayanti MK, Kubo M, Satou K. Improved Protein Phosphorylation Site Prediction by a New Combination of Feature Set and Feature Selection. J Biomed Sci Eng 2018. [DOI: 10.4236/jbise.2018.116013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
|
47
|
Li GXH, Vogel C, Choi H. PTMscape: an open source tool to predict generic post-translational modifications and map modification crosstalk in protein domains and biological processes. Mol Omics 2018; 14:197-209. [PMID: 29876573 PMCID: PMC6115748 DOI: 10.1039/c8mo00027a] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
PTMscape predicts PTM sites using descriptors of sequence and physico-chemical microenvironment, and tests enrichment of single or pairs of PTMs in protein domains.
While tandem mass spectrometry can detect post-translational modifications (PTMs) at the proteome scale, reported PTM sites are often incomplete and include false positives. Computational approaches can complement these datasets with additional predictions, but most available tools use prediction models pre-trained by the developers for a single PTM type, and it remains difficult to perform large-scale batch prediction for multiple PTMs with flexible user control, including the choice of training data. We developed an R package called PTMscape, which predicts PTM sites across the proteome based on a unified and comprehensive set of descriptors of the physico-chemical microenvironment of modified sites, with additional downstream analysis modules to test enrichment of individual PTMs or pairs of PTMs in protein domains. PTMscape can process any major modification, such as phosphorylation and ubiquitination, while achieving sensitivity and specificity comparable to single-PTM methods and outperforming other multi-PTM tools. Applying this framework, we expanded proteome-wide coverage of five major PTMs affecting different residues by prediction, especially for lysine and arginine modifications. Using a combination of experimentally acquired sites (PSP) and newly predicted sites, we discovered that crosstalk among multiple PTMs occurs more frequently than expected by chance in key protein domains such as histone, protein kinase, and RNA recognition motifs, spanning various biological processes such as RNA processing, DNA damage response, signal transduction, and regulation of the cell cycle. These results provide a proteome-scale analysis of crosstalk among major PTMs and can be easily extended to other types of PTM.
Affiliation(s)
- Ginny X H Li
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
|
48
|
Wang M, Wang T, Li A. ksrMKL: a novel method for identification of kinase-substrate relationships using multiple kernel learning. PeerJ 2017; 5:e4182. [PMID: 29340231 PMCID: PMC5741978 DOI: 10.7717/peerj.4182] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/13/2017] [Accepted: 12/01/2017] [Indexed: 01/24/2023]
Abstract
Phosphorylation, which is catalyzed by protein kinases, plays a crucial role in multiple cellular processes and is closely related to many diseases. Identification of kinase-substrate relationships is important for understanding phosphorylation and provides a fundamental basis for further disease-related research and drug design. In this study, we develop a novel computational method to identify kinase-substrate relationships based on multiple kernel learning. The comparative analysis is based on a 10-fold cross-validation process and a dataset collected from the Phospho.ELM database. The results show that ksrMKL greatly improves on the single kernel support vector machine across various measures. Furthermore, with an independent test dataset extracted from the PhosphoSitePlus database, we compare ksrMKL with two existing kinase-substrate relationship prediction tools, namely iGPS and PKIS. The experimental results show that ksrMKL has better prediction performance than these existing tools.
Affiliation(s)
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, China
- Tao Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, China
|
49
|
Wilson LJ, Linley A, Hammond DE, Hood FE, Coulson JM, MacEwan DJ, Ross SJ, Slupsky JR, Smith PD, Eyers PA, Prior IA. New Perspectives, Opportunities, and Challenges in Exploring the Human Protein Kinome. Cancer Res 2017; 78:15-29. [DOI: 10.1158/0008-5472.can-17-2291] [Citation(s) in RCA: 97] [Impact Index Per Article: 13.9] [Received: 07/28/2017] [Revised: 09/22/2017] [Accepted: 10/31/2017] [Indexed: 11/16/2022]
|
50
|
Liu J, Cai W, Fang X, Wang X, Li G. Virus-induced apoptosis and phosphorylation form of metacaspase in the marine coccolithophorid Emiliania huxleyi. Arch Microbiol 2017; 200:413-422. [DOI: 10.1007/s00203-017-1460-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/10/2017] [Revised: 09/01/2017] [Accepted: 11/17/2017] [Indexed: 12/12/2022]
|