1
Hong X, Lv J, Li Z, Xiong Y, Zhang J, Chen HF. Sequence-based machine learning method for predicting the effects of phosphorylation on protein-protein interactions. Int J Biol Macromol 2023; 243:125233. [PMID: 37290543 DOI: 10.1016/j.ijbiomac.2023.125233] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/07/2023] [Revised: 06/02/2023] [Accepted: 06/03/2023] [Indexed: 06/10/2023]
Abstract
Protein phosphorylation, catalyzed by kinases, is an important biochemical process that plays an essential role in multiple cell signaling pathways, and protein-protein interactions (PPIs) constitute those pathways. Abnormal phosphorylation status of a protein can alter its function through PPIs and lead to severe diseases such as cancer and Alzheimer's disease. Because experimental evidence is limited and identifying new cases of phosphorylation regulating PPIs experimentally is costly, a high-accuracy, user-friendly artificial intelligence method is needed to predict the effect of phosphorylation on PPIs. Here, we propose a novel sequence-based machine learning method named PhosPPI, which achieved better identification performance (accuracy and AUC) than the competing predictive methods of Betts, HawkDock and FoldX. PhosPPI is freely available as a web server (https://phosppi.sjtu.edu.cn/). This tool can help users identify functional phosphorylation sites that affect PPIs and explore phosphorylation-associated disease mechanisms and drug development.
Affiliation(s)
- Xiaokun Hong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jiyang Lv
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhengxin Li
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jian Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200025, China.
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
2
Rahman A, Ahmed S, Al Mehedi Hasan M, Ahmad S, Dehzangi I. Accurately predicting nitrosylated tyrosine sites using probabilistic sequence information. Gene 2022; 826:146445. [PMID: 35358650 DOI: 10.1016/j.gene.2022.146445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Revised: 02/16/2022] [Accepted: 03/18/2022] [Indexed: 11/04/2022]
Abstract
Post-translational modification (PTM) is defined as the enzymatic change of proteins after the translation process in protein biosynthesis. Nitrotyrosine, one of the most important protein modifications, is mediated by the active nitrogen molecule. It is known to be associated with different diseases, including autoimmune diseases characterized by chronic inflammation and cell damage. Currently, nitrotyrosine sites are identified using experimental approaches, which are laborious and costly. In this study, we propose a new machine learning method called PredNitro to accurately predict nitrotyrosine sites. To build PredNitro, we use sequence coupling information from the neighboring amino acids of tyrosine residues along with a support vector machine as our classification technique. Our results demonstrate that PredNitro achieves 98.0% accuracy with more than 0.96 MCC and 0.99 AUC in both 5-fold cross-validation and jackknife cross-validation tests, which is significantly better than the values reported in previous studies. PredNitro is publicly available as an online predictor at: http://103.99.176.239/PredNitro.
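To make the quantities in this abstract concrete, the following is a minimal sketch of two generic steps: collecting the residues that neighbor each tyrosine, and computing accuracy and MCC from confusion-matrix counts. It is not PredNitro's feature encoding or code; the function names, the flank size and the padding character `X` are assumptions chosen for illustration.

```python
import math


def tyrosine_windows(seq, flank=3, pad="X"):
    """Return (1-based position, window) for every tyrosine (Y) in seq.

    Each window holds `flank` residues on either side of the tyrosine;
    positions past the sequence ends are filled with `pad` (an assumption).
    """
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "Y":
            # Index i in seq corresponds to index i + flank in padded.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(tyrosine_windows("MKYAAY", flank=2))
    print(accuracy(49, 49, 1, 1), mcc(49, 49, 1, 1))
```

The windows would then be turned into numeric features and passed to a classifier; that encoding step is deliberately left out because the abstract does not specify it.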
Affiliation(s)
- Afrida Rahman
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Sabit Ahmed
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md Al Mehedi Hasan
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Shamim Ahmad
- Department of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh
- Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ 08102, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA.
3
Ao C, Yu L, Zou Q. Prediction of bio-sequence modifications and the associations with diseases. Brief Funct Genomics 2020; 20:1-18. [PMID: 33313647 DOI: 10.1093/bfgp/elaa023] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 10/25/2020] [Revised: 11/09/2020] [Accepted: 11/10/2020] [Indexed: 12/22/2022]
Abstract
Modifications of protein, RNA and DNA play an important role in many biological processes and are related to some diseases. Therefore, accurate identification and comprehensive understanding of protein, RNA and DNA modification sites can promote research on disease treatment and prevention. With the development of sequencing technology, the number of known sequences has continued to increase. In the past decade, many computational tools that can be used to predict protein, RNA and DNA modification sites have been developed. In this review, we comprehensively summarize the modification site predictors for three different biological sequences and their association with diseases. The relevant web server is accessible at http://lab.malab.cn/~acy/PTM_data/; some sample data on protein, RNA and DNA modification can be downloaded from that website.
4
Piovesan D, Hatos A, Minervini G, Quaglia F, Monzon AM, Tosatto SCE. Assessing predictors for new post translational modification sites: A case study on hydroxylation. PLoS Comput Biol 2020; 16:e1007967. [PMID: 32569263 PMCID: PMC7332089 DOI: 10.1371/journal.pcbi.1007967] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/25/2019] [Revised: 07/02/2020] [Accepted: 05/19/2020] [Indexed: 12/15/2022]
Abstract
Post-translational modification (PTM) sites have become popular for predictor development. However, with the exception of phosphorylation and a handful of other examples, PTMs suffer from a limited number of available training examples and sparsity in protein sequences. Here, proline hydroxylation is taken as an example to compare different methods and evaluate their performance on new experimentally determined sites. As a guide for effective experimental design, predictors require both high specificity and sensitivity. However, the self-reported performance may often not be indicative of prediction quality and detection of new sites is not guaranteed. We have benchmarked seven published hydroxylation site predictors on two newly constructed independent datasets. The self-reported performance is found to widely overestimate the real accuracy measured on independent datasets. No predictor performs better than random on new examples, indicating the refined models do not sufficiently generalize to detect new sites. The number of false positives is high and precision low, in particular for non-collagen proteins whose motifs are not conserved. As hydroxylation site predictors do not generalize for new data, caution is advised when using PTM predictors in the absence of independent evaluations, in particular for highly specific sites involved in signalling. Machine learning methods are extensively used by biologists to design and interpret experiments. Predictors which take only the sequence as input are of particular interest due to the large amount of available sequence data and high self-reported performance. In this work, we evaluated post-translational modification (PTM) predictors for hydroxylation sites and found that they perform no better than random, in strong contrast to performances reported in their original publications.
PTMs are chemical amino acid alterations providing the cell with conditional mechanisms to fine tune protein function, regulating complex biological processes such as signalling and cell cycle. Hydroxylation sites are a good PTM test case due to the availability of a range of predictors and an abundance of newly experimentally detected modification sites. Poor performances in our results highlight the overlooked problem of predicting PTMs when best practices are not followed and training data are likely incomplete. Experimentalists should be careful when using PTM predictors blindly and more independent assessments are needed to establish their usefulness in practice.
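The evaluation terms used in this abstract (sensitivity, specificity, precision, false positives) can be computed from a predictor's calls on an independent dataset as sketched below. This is a generic illustration, not the authors' benchmarking pipeline; the function names and the 0/1 label convention are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 marks a modified site."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return sensitivity, specificity and precision as a dict.

    A metric whose denominator is zero is reported as 0.0 (an assumption).
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels for six candidate sites on an independent set.
    truth = [1, 1, 0, 0, 0, 0]
    calls = [1, 0, 1, 1, 0, 0]
    print(summarize(truth, calls))
```

Because true PTM sites are rare among candidate residues, precision can be low even when sensitivity and specificity look acceptable, which is one reason to report all three on data the predictor was not trained on.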
Affiliation(s)
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padua, Italy
- Andras Hatos
- Department of Biomedical Sciences, University of Padua, Padua, Italy
- Federica Quaglia
- Department of Biomedical Sciences, University of Padua, Padua, Italy
5
Tyshchuk O, Gstöttner C, Funk D, Nicolardi S, Frost S, Klostermann S, Becker T, Jolkver E, Schumacher F, Koller CF, Völger HR, Wuhrer M, Bulau P, Mølhøj M. Characterization and prediction of positional 4-hydroxyproline and sulfotyrosine, two post-translational modifications that can occur at substantial levels in CHO cells-expressed biotherapeutics. MAbs 2019; 11:1219-1232. [PMID: 31339437 PMCID: PMC6748591 DOI: 10.1080/19420862.2019.1635865] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/28/2019] [Revised: 06/07/2019] [Accepted: 06/21/2019] [Indexed: 02/06/2023]
Abstract
Biotherapeutics may contain a multitude of different post-translational modifications (PTMs) that need to be assessed and possibly monitored and controlled to ensure reproducible product quality. During early development of biotherapeutics, unexpected PTMs might be prevented by in silico identification and characterization together with further molecular engineering. Mass determinations of a human IgG1 (mAb1) and a bispecific IgG-ligand fusion protein (BsAbA) demonstrated the presence of unusual PTMs resulting in major +80 Da, and +16/+32 Da chain variants, respectively. For mAb1, analytical cation exchange chromatography demonstrated the presence of an acidic peak accounting for 20%. A +79.957 Da modification was localized within the light chain complementarity-determining region-2 and identified as a sulfation based on accurate mass, isotopic distribution, and a complete neutral loss reaction upon collision-induced dissociation. Top-down ultrahigh resolution MALDI-ISD FT-ICR MS of modified and unmodified Fabs allowed the allocation of the sulfation to a specific Tyr residue. An aspartate in amino-terminal position -3 relative to the affected Tyr was found to play a key role in determining the sulfation. For BsAbA, a +15.995 Da modification was observed and localized to three specific Pro residues explaining the +16 Da chain A, and +16 Da and +32 Da chain B variants. The BsAbA modifications were verified as 4-hydroxyproline and not 3-hydroxyproline in a tryptic peptide map via co-chromatography with synthetic peptides containing the two isomeric forms. Finally, our approach for an alert system based on in-house in silico predictors is presented. This system is designed to prevent these PTMs by molecular design and engineering during early biotherapeutic development.
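Two observations in this abstract lend themselves to a small illustration: matching a measured mass shift to the reported modification masses, and flagging tyrosines that have an aspartate three residues toward the amino terminus. The sketch below is not the authors' in-house predictor, which the abstract does not describe; the function names and the 0.005 Da tolerance are assumptions, and only the two mass values quoted in the abstract are used.

```python
# Mass shifts quoted in the abstract, in Da.
MASS_SHIFTS_DA = {
    "sulfation": 79.957,
    "hydroxylation": 15.995,
}


def annotate_shift(observed_da, tol_da=0.005):
    """Return the name of the modification whose mass shift lies within
    tol_da of the observed value, or None if nothing matches.

    The tolerance is a hypothetical default, not a value from the paper.
    """
    for name, shift in MASS_SHIFTS_DA.items():
        if abs(observed_da - shift) <= tol_da:
            return name
    return None


def tyr_with_asp_at_minus3(seq):
    """Return 1-based positions of Tyr residues with Asp at position -3."""
    return [
        i + 1
        for i, residue in enumerate(seq)
        if residue == "Y" and i >= 3 and seq[i - 3] == "D"
    ]


if __name__ == "__main__":
    print(annotate_shift(79.957))
    print(tyr_with_asp_at_minus3("ADGSYKY"))
```

A hit from the motif scan marks a candidate only. The abstract says the aspartate plays a key role in determining sulfation; it does not say the motif is sufficient on its own.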
Affiliation(s)
- Oksana Tyshchuk
- Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, Germany
- Christoph Gstöttner
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
- Dennis Funk
- Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, Germany
- Simone Nicolardi
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
- Stefan Frost
- Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, Germany
- Stefan Klostermann
- Roche Pharma Research and Early Development Informatics, Roche Innovation Center Munich, Penzberg, Germany
- Felix Schumacher
- Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, Germany
- Claudia Ferrara Koller
- Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Zurich, Schlieren, Switzerland
- Hans Rainer Völger
- Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, Germany
- Manfred Wuhrer
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
- Patrick Bulau
- Roche Pharma Technical Development Penzberg, Penzberg, Germany
- Michael Mølhøj
- Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, Germany
6
He W, Wei L, Zou Q. Research progress in protein posttranslational modification site prediction. Brief Funct Genomics 2018; 18:220-229. [DOI: 10.1093/bfgp/ely039] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/18/2018] [Revised: 11/15/2018] [Accepted: 11/22/2018] [Indexed: 01/24/2023]
Abstract
Posttranslational modifications (PTMs) play an important role in regulating protein folding, activity and function and are involved in almost all cellular processes. Identification of PTMs of proteins is the basis for elucidating the mechanisms of cell biology and disease treatments. Compared with the laboriousness of equivalent experimental work, PTM prediction using various machine-learning methods can provide accurate, simple and rapid research solutions and generate valuable information for further laboratory studies. In this review, we manually curate most of the bioinformatics tools published since 2008. We also summarize the approaches for predicting ubiquitination sites and glycosylation sites. Moreover, we discuss the challenges of current PTM bioinformatics tools and look forward to future research possibilities.
Affiliation(s)
- Wenying He
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
7
Xi L, Yao J, Wei Y, Wu X, Yao X, Liu H, Li S. The in silico identification of human bile salt export pump (ABCB11) inhibitors associated with cholestatic drug-induced liver injury. Mol Biosyst 2017; 13:417-424. [PMID: 28092392 DOI: 10.1039/c6mb00744a] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Drug-induced liver injury (DILI) is one of the major causes of drug attrition and failure. Currently, there is increasing evidence that direct inhibition of the human bile salt export pump (BSEP/ABCB11) by drugs and/or metabolites is one of the most important mechanisms of cholestatic DILI. In the present study, we employ two in silico methods, random forest (RF) and the pharmacophore method, to recognize potential BSEP inhibitors that could cause cholestatic DILI, with the aim of mitigating the risk of cholestatic DILI to some extent. The RF model achieved the best prediction performance, producing AUC (area under receiver operating characteristic curve) values of 0.901, 0.929 and 0.996 for leave-one-out cross-validation, the test set and the external test set, respectively, indicating that the built RF model has a satisfactory identification ability. As a complement to the RF model, the pharmacophore model was also built and was proved to be reliable with good predictive performance based on the internal and external validation results. Further analysis indicates that hydrophobicity, molecular size and polarity are important factors that influence the inhibitory activity of BSEP. Furthermore, the two models are applied to screen FDA-approved small molecule drugs, among which the drugs with the potential risk of cholestatic DILI are reported. In conclusion, the RF and pharmacophore models that we present can be considered as integrated screening tools to indicate the potential risk of cholestatic DILI by inhibition of BSEP.
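The AUC values quoted in this abstract summarize how well a model's scores rank inhibitors above non-inhibitors. A minimal sketch of that calculation is given below, using the pairwise-comparison definition of the area under the ROC curve. It does not reproduce the authors' random forest or their data; the function name and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores: 1 = inhibitor, 0 = non-inhibitor.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In leave-one-out cross-validation, each compound's score would come from a model trained on all the other compounds, and the AUC would then be computed once over the pooled held-out scores.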
Affiliation(s)
- Lili Xi
- Department of Pharmacy, The First Hospital of Lanzhou University, Lanzhou University, Lanzhou, 730000, China
- Jia Yao
- Department of Science and Technology, The First Hospital of Lanzhou University, Lanzhou University, Lanzhou, 730000, China
- Yuhui Wei
- Department of Pharmacy, The First Hospital of Lanzhou University, Lanzhou University, Lanzhou, 730000, China
- Xin'an Wu
- Department of Pharmacy, The First Hospital of Lanzhou University, Lanzhou University, Lanzhou, 730000, China
- Xiaojun Yao
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China.
- Huanxiang Liu
- School of Pharmacy, Lanzhou University, Lanzhou, 730000, China
- Shuyan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China.