1
Jorge GL, Kim D, Xu C, Cho SH, Su L, Xu D, Bartley LE, Stacey G, Thelen JJ. Unveiling orphan receptor-like kinases in plants: novel client discovery using high-confidence library predictions in the Kinase-Client (KiC) assay. Front Plant Sci 2024; 15:1372361. [PMID: 38633461 PMCID: PMC11021772 DOI: 10.3389/fpls.2024.1372361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024]
Abstract
Plants are remarkable in their ability to adapt to changing environments, with receptor-like kinases (RLKs) playing a pivotal role in perceiving and transmitting environmental cues into cellular responses. Despite extensive research on RLKs from the plant kingdom, the function and activity of many kinases, i.e., their substrates or "clients", remain uncharted. To validate a novel client prediction workflow and learn more about an important RLK, this study focuses on P2K1 (DORN1), which acts as a receptor for extracellular ATP (eATP), playing a crucial role in plant stress resistance and immunity. We designed a Kinase-Client (KiC) assay library of 225 synthetic peptides, incorporating previously identified P2K phosphorylated peptides and novel predictions from a deep-learning phosphorylation site prediction model (MUsite) and a trained hidden Markov model (HMM) based tool, HMMER. Screening the library against purified P2K1 cytosolic domain (CD), we identified 46 putative substrates, including 34 novel clients, 27 of which may be novel peptides, not previously identified experimentally. Gene Ontology (GO) analysis among phosphopeptide candidates revealed proteins associated with important biological processes in metabolism, structure development, and response to stress, as well as molecular functions of kinase activity, catalytic activity, and transferase activity. We offer selection criteria for efficient further in vivo experiments to confirm these discoveries. This approach not only expands our knowledge of P2K1's substrates and functions but also highlights effective prediction algorithms for identifying additional potential substrates. Overall, the results support use of the KiC assay as a valuable tool in unraveling the complexities of plant phosphorylation and provide a foundation for predicting the phosphorylation landscape of plant species based on peptide library results.
Affiliation(s)
- Gabriel Lemes Jorge
  - Division of Biochemistry, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Daewon Kim
  - Division of Plant Science & Technology, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Chunhui Xu
  - Institute for Data Science and Informatics, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Sung-Hwan Cho
  - Division of Plant Science & Technology, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Lingtao Su
  - Department of Electrical Engineering and Computer Science, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
  - Shandong University of Science and Technology, Qingdao, Shandong, China
- Dong Xu
  - Department of Electrical Engineering and Computer Science, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Laura E. Bartley
  - Institute of Biological Chemistry, Washington State University, Pullman, WA, United States
- Gary Stacey
  - Division of Plant Science & Technology, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Jay J. Thelen
  - Division of Biochemistry, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
2
Zahiri Z, Mehrshad N, Mehrshad M. DF-Phos: Prediction of Protein Phosphorylation Sites by Deep Forest. J Biochem 2024; 175:447-456. [PMID: 38153271 DOI: 10.1093/jb/mvad116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/29/2023]
Abstract
Phosphorylation is the most important and studied post-translational modification (PTM), which plays a crucial role in protein function studies and experimental design. Many significant studies have been performed to predict phosphorylation sites using various machine-learning methods. Recently, several studies have claimed that deep learning-based methods are the best way to predict phosphorylation sites, because deep learning, as an advanced machine learning method, can automatically detect complex representations of phosphorylation patterns from raw sequences and thus offers a powerful tool to improve phosphorylation site prediction. In this study, we report DF-Phos, a new phosphosite predictor based on the Deep Forest. In DF-Phos, the feature vector taken from the CkSAApair method is used as input to a Deep Forest framework for predicting phosphorylation sites. The results of 10-fold cross-validation show that the Deep Forest method achieves the highest performance among the available methods. We implemented a Python program of DF-Phos, which is freely available for non-commercial use at https://github.com/zahiriz/DF-Phos. Moreover, users can use it for various PTM predictions.
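The feature step mentioned in this abstract can be illustrated with a small sketch. It assumes that "CkSAApair" refers to the widely used composition of k-spaced amino acid pairs (CKSAAP) encoding; the function name `cksaap`, the `k_max` parameter, and the normalization by the number of counted pairs are illustrative choices, not taken from the DF-Phos code. The Deep Forest classifier itself is not shown.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window, k_max=2):
    """Encode a peptide window as k-spaced amino acid pair frequencies.

    For each gap k in 0..k_max, count every residue pair separated by
    exactly k positions and divide by the number of counted pairs.
    Hypothetical helper: the name and defaults are assumptions.
    Returns a flat list of 400 * (k_max + 1) floats.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip padding or non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


# Example: a short window centered on a candidate serine.
vector = cksaap("AASAA", k_max=1)
```

Each gap contributes one block of 400 values, so a window encoded with `k_max=1` yields an 800-dimensional vector that a downstream classifier could consume.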
Affiliation(s)
- Zeynab Zahiri
  - Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Nasser Mehrshad
  - Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Maliheh Mehrshad
  - Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, 750 07 Sweden
3
Xie J, Quan L, Wang X, Wu H, Jin Z, Pan D, Chen T, Wu T, Lyu Q. DeepMPSF: A Deep Learning Network for Predicting General Protein Phosphorylation Sites Based on Multiple Protein Sequence Features. J Chem Inf Model 2023; 63:7258-7271. [PMID: 37931253 DOI: 10.1021/acs.jcim.3c00996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
Phosphorylation, as one of the most important post-translational modifications, plays a key role in various cellular physiological processes and disease occurrences. In recent years, computer technology has been gradually applied to the prediction of protein phosphorylation sites. However, most existing methods rely on simple protein sequence features that provide limited contextual information. To overcome this limitation, we propose DeepMPSF, a phosphorylation site prediction model based on multiple protein sequence features. There are two types of features: sequence semantic features, which comprise protein residue type information and relative position information within the protein sequence, and protein background biophysical features, which include global semantic information containing more comprehensive protein background information obtained from pretrained models. To extract these features, DeepMPSF employs two separate subnetworks: the SSFE module and the BBFE module, which automatically extract high-level semantic features. Our model incorporates a learning strategy for handling imbalanced datasets through ensemble learning during training and prediction. DeepMPSF is trained and evaluated on a well-established dataset of human proteins. Comparison with other benchmark methods reveals that DeepMPSF outperforms them in predicting both S/T residues and Y residues. In particular, DeepMPSF showed excellent generalization performance in cross-species blind tests, with an average improvement of 5.63%/5.72%, 22.28%/25.94%, 20.11%/17.49%, and 26.40%/28.33% for Mus musculus/Rattus norvegicus test sets in area under the ROC curve (AUC), AUC of the PR curve, F1-score, and MCC metrics, respectively. Furthermore, it also shows excellent performance in the latest updated case of natural proteins with functional phosphorylation sites. Through an ablation study and visual analysis, we uncover that the design of different feature modules significantly contributes to the accurate classification of DeepMPSF, which provides valuable insights for predicting phosphorylation sites and offers effective support for future downstream research.
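Two of the metrics reported in this abstract, the F1-score and the Matthews correlation coefficient (MCC), follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the toy labels are illustrative and are not drawn from the DeepMPSF evaluation.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy, imbalanced example: 2 phosphosites among 6 candidate residues.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```

MCC uses all four cells of the confusion matrix, which is one reason it is commonly reported alongside F1 on imbalanced phosphosite datasets.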
Affiliation(s)
- Jingxin Xie
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Lijun Quan
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Xuejiao Wang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Hongjie Wu
  - Suzhou University of Science and Technology, Suzhou 215006, China
- Zhi Jin
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Deng Pan
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Taoning Chen
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Tingfang Wu
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Qiang Lyu
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
4
Pham NT, Phan LT, Seo J, Kim Y, Song M, Lee S, Jeon YJ, Manavalan B. Advancing the accuracy of SARS-CoV-2 phosphorylation site detection via meta-learning approach. Brief Bioinform 2023; 25:bbad433. [PMID: 38058187 PMCID: PMC10753650 DOI: 10.1093/bib/bbad433] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/10/2023] [Revised: 10/30/2023] [Accepted: 11/05/2023] [Indexed: 12/08/2023]
Abstract
The worldwide appearance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has generated significant concern and posed a considerable challenge to global health. Phosphorylation is a common post-translational modification that affects many vital cellular functions and is closely associated with SARS-CoV-2 infection. Precise identification of phosphorylation sites could provide more in-depth insight into the processes underlying SARS-CoV-2 infection and help alleviate the continuing COVID-19 crisis. Currently, available computational tools for predicting these sites lack accuracy and effectiveness. In this study, we designed an innovative meta-learning model, Meta-Learning for Serine/Threonine Phosphorylation (MeL-STPhos), to precisely identify protein phosphorylation sites. We initially performed a comprehensive assessment of 29 unique sequence-derived features, establishing prediction models for each using 14 renowned machine learning methods, ranging from traditional classifiers to advanced deep learning algorithms. We then selected the most effective model for each feature by integrating the predicted values. Rigorous feature selection strategies were employed to identify the optimal base models and classifier(s) for each cell-specific dataset. To the best of our knowledge, this is the first study to report two cell-specific models and a generic model for phosphorylation site prediction by utilizing an extensive range of sequence-derived features and machine learning algorithms. Extensive cross-validation and independent testing revealed that MeL-STPhos surpasses existing state-of-the-art tools for phosphorylation site prediction. We also developed a publicly accessible platform at https://balalab-skku.org/MeL-STPhos. We believe that MeL-STPhos will serve as a valuable tool for accelerating the discovery of serine/threonine phosphorylation sites and elucidating their role in post-translational regulation.
Affiliation(s)
- Nhat Truong Pham
  - Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Le Thi Phan
  - Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Jimin Seo
  - Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Yeonwoo Kim
  - Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Minkyung Song
  - Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Sukchan Lee
  - Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Young-Jun Jeon
  - Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Balachandran Manavalan
  - Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
5
Anandakrishnan M, Ross KE, Chen C, Shanker V, Cowart J, Wu CH. KSFinder-a knowledge graph model for link prediction of novel phosphorylated substrates of kinases. PeerJ 2023; 11:e16164. [PMID: 37818330 PMCID: PMC10561642 DOI: 10.7717/peerj.16164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 09/01/2023] [Indexed: 10/12/2023]
Abstract
Background: Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools predicting phosphorylation cover less than 50% of known human kinases. They utilize local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and do not consider the heterogeneous relationships of the proteins. In this work, we present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent association of proteins in a network comprising 85% of the known human kinases. We also postulate the potential role of two understudied kinases based on their substrate predictions from KSFinder.
Methods: KSFinder learns the semantic relationships in a phosphoproteome knowledge graph using a knowledge graph embedding algorithm and represents the nodes in low-dimensional vectors. A multilayer perceptron (MLP) classifier is trained to discern kinase-substrate links using the embedded vectors. KSFinder uses a strategic negative generation approach that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins from different subcellular locations, and random sampling. We assess KSFinder's generalization capability on four different datasets and compare its performance with other state-of-the-art prediction models. We employ KSFinder to predict substrates of 68 "dark" kinases considered understudied by the Illuminating the Druggable Genome program and use our text-mining tool, RLIMS-P, along with manual curation, to search for literature evidence for the predictions. In a case study, we performed functional enrichment analysis for two dark kinases, HIPK3 and CAMKK1, using their predicted substrates.
Results: KSFinder shows improved performance over other kinase-substrate prediction models and generalized prediction ability on different datasets. We identified literature evidence for 17 novel predictions involving an understudied kinase. All of these 17 predictions had a probability score ≥0.7 (nine at >0.9, six at 0.8-0.9, and two at 0.7-0.8). The evaluation of 93,593 negative predictions (probability ≤0.3) identified four false negatives. The top enriched biological processes of HIPK3 substrates relate to the regulation of extracellular matrix and epigenetic gene expression, while CAMKK1 substrates include lipid storage regulation and glucose homeostasis.
Conclusions: KSFinder outperforms the current kinase-substrate prediction tools with higher kinase coverage. The strategically developed negatives provide a superior generalization ability for KSFinder. We predicted substrates of 432 kinases, 68 of which are understudied, and hypothesized the potential functions of two dark kinases using their predicted substrates.
Affiliation(s)
- Manju Anandakrishnan
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Karen E. Ross
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America
- Chuming Chen
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Vijay Shanker
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Julie Cowart
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Cathy H. Wu
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America
6
Rainey C, Villikudathil AT, McConnell J, Hughes C, Bond R, McFadden S. An experimental machine learning study investigating the decision-making process of students and qualified radiographers when interpreting radiographic images. PLOS Digit Health 2023; 2:e0000229. [PMID: 37878569 PMCID: PMC10599497 DOI: 10.1371/journal.pdig.0000229] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/15/2023] [Accepted: 07/29/2023] [Indexed: 10/27/2023]
Abstract
AI is becoming more prevalent in healthcare and is predicted to be further integrated into workflows to ease the pressure on an already stretched service. The National Health Service in the UK has prioritised AI and digital health as part of its Long-Term Plan. Few studies have examined human interaction with such systems in healthcare, despite reports of biases being present with the use of AI in other technologically advanced fields, such as finance and aviation. An understanding is needed of how certain user characteristics may affect how radiographers engage with AI systems in the clinical setting, so that problems can be mitigated before they arise. The aim of this study is to determine correlations of skills, confidence in AI and perceived knowledge amongst student and qualified radiographers in the UK healthcare system. A machine learning based AI model was built to predict whether the interpreter was a student (n = 67) or a qualified radiographer (n = 39), using important variables from a feature selection technique named Boruta. A survey, which required the participant to interpret a series of plain radiographic examinations with and without AI assistance, was created on the Qualtrics survey platform and promoted via social media (Twitter/LinkedIn), therefore adopting convenience and snowball sampling. This survey was open to all UK radiographers, including students and retired radiographers. Pearson's correlation analysis revealed that males who were proficient in their profession were more likely than females to trust AI. Trust in AI was negatively correlated with age and with level of experience. The best machine learning model identified qualified radiographers with an area under the curve of 0.93 and a prediction accuracy of 93%. Further testing in prospective validation cohorts using a larger sample size is required to determine the clinical utility of the proposed machine learning model.
Affiliation(s)
- Clare Rainey
  - Faculty of Life and Health Sciences, School of Health Sciences, Ulster University, York Street, Belfast, Northern Ireland, United Kingdom
- Angelina T. Villikudathil
  - Faculty of Life and Health Sciences, School of Health Sciences, Ulster University, York Street, Belfast, Northern Ireland, United Kingdom
- Ciara Hughes
  - Faculty of Life and Health Sciences, School of Health Sciences, Ulster University, York Street, Belfast, Northern Ireland, United Kingdom
- Raymond Bond
  - Faculty of Computing, School of Computing, Engineering and the Built Environment, Ulster University, York Street, Belfast, Northern Ireland, United Kingdom
- Sonyia McFadden
  - Faculty of Life and Health Sciences, School of Health Sciences, Ulster University, York Street, Belfast, Northern Ireland, United Kingdom
7
Zhang G, Tang Q, Feng P, Chen W. IPs-GRUAtt: An attention-based bidirectional gated recurrent unit network for predicting phosphorylation sites of SARS-CoV-2 infection. Mol Ther Nucleic Acids 2023; 32:28-35. [PMID: 36908648 PMCID: PMC9968446 DOI: 10.1016/j.omtn.2023.02.027] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/03/2023] [Accepted: 02/22/2023] [Indexed: 02/27/2023]
Abstract
The global pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has generated tremendous concern and poses a serious threat to international public health. Phosphorylation is a common post-translational modification affecting many essential cellular processes and is inextricably linked to SARS-CoV-2 infection. Hence, accurate identification of phosphorylation sites will be helpful to understand the mechanisms of SARS-CoV-2 infection and mitigate the ongoing COVID-19 pandemic. In the present study, an attention-based bidirectional gated recurrent unit network, called IPs-GRUAtt, was proposed to identify phosphorylation sites in SARS-CoV-2-infected host cells. Comparative results demonstrated that IPs-GRUAtt surpassed both state-of-the-art machine-learning methods and existing models for identifying phosphorylation sites. Moreover, the attention mechanism made IPs-GRUAtt able to extract the key features from protein sequences. These results demonstrated that the IPs-GRUAtt is a powerful tool for identifying phosphorylation sites. For facilitating its academic use, a freely available online web server for IPs-GRUAtt is provided at http://cbcb.cdutcm.edu.cn/phosphory/.
Affiliation(s)
- Guiyang Zhang
  - State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Chengdu University of Traditional Chinese Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Qiang Tang
  - State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Chengdu University of Traditional Chinese Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Pengmian Feng
  - State Key Laboratory of Southwestern Chinese Medicine Resources, School of Basic Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Wei Chen
  - State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Chengdu University of Traditional Chinese Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - State Key Laboratory of Southwestern Chinese Medicine Resources, School of Basic Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
8
Park JW, Tyl MD, Cristea IM. Orchestration of Mitochondrial Function and Remodeling by Post-Translational Modifications Provide Insight into Mechanisms of Viral Infection. Biomolecules 2023; 13:biom13050869. [PMID: 37238738 DOI: 10.3390/biom13050869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 05/17/2023] [Accepted: 05/18/2023] [Indexed: 05/28/2023]
Abstract
The regulation of mitochondria structure and function is at the core of numerous viral infections. Acting in support of the host or of virus replication, mitochondria regulation facilitates control of energy metabolism, apoptosis, and immune signaling. Accumulating studies have pointed to post-translational modification (PTM) of mitochondrial proteins as a critical component of such regulatory mechanisms. Mitochondrial PTMs have been implicated in the pathology of several diseases and emerging evidence is starting to highlight essential roles in the context of viral infections. Here, we provide an overview of the growing arsenal of PTMs decorating mitochondrial proteins and their possible contribution to the infection-induced modulation of bioenergetics, apoptosis, and immune responses. We further consider links between PTM changes and mitochondrial structure remodeling, as well as the enzymatic and non-enzymatic mechanisms underlying mitochondrial PTM regulation. Finally, we highlight some of the methods, including mass spectrometry-based analyses, available for the identification, prioritization, and mechanistic interrogation of PTMs.
Affiliation(s)
- Ji Woo Park
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, USA
- Matthew D Tyl
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, USA
- Ileana M Cristea
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, USA
9
Ahmed F, Dehzangi I, Hasan MM, Shatabda S. Accurately predicting microbial phosphorylation sites using evolutionary and structural features. Gene 2023; 851:146993. [DOI: 10.1016/j.gene.2022.146993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 10/05/2022] [Accepted: 10/14/2022] [Indexed: 11/27/2022]
10
Newcombe EA, Delaforge E, Hartmann-Petersen R, Skriver K, Kragelund BB. How phosphorylation impacts intrinsically disordered proteins and their function. Essays Biochem 2022; 66:901-913. [PMID: 36350035 PMCID: PMC9760426 DOI: 10.1042/ebc20220060] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 09/02/2022] [Revised: 10/14/2022] [Accepted: 10/17/2022] [Indexed: 11/10/2022]
Abstract
Phosphorylation is the most common post-translational modification (PTM) in eukaryotes, occurring particularly frequently in intrinsically disordered proteins (IDPs). These proteins are highly flexible and dynamic by nature. Thus, it is intriguing that the addition of a single phosphoryl group to a disordered chain can impact its function so dramatically. Furthermore, as many IDPs carry multiple phosphorylation sites, the number of possible states increases, enabling larger complexities and novel mechanisms. Although a chemically simple and well-understood process, the impact of phosphorylation on the conformational ensemble and molecular function of IDPs, not to mention biological output, is highly complex and diverse. Since the discovery of the first phosphorylation site in proteins 75 years ago, we have come to a much better understanding of how this PTM works, but with the diversity of IDPs and their capacity for carrying multiple phosphoryl groups, the complexity grows. In this Essay, we highlight some of the basic effects of IDP phosphorylation, allowing it to serve as starting point when embarking on studies into this topic. We further describe how recent complex cases of multisite phosphorylation of IDPs have been instrumental in widening our view on the effect of protein phosphorylation. Finally, we put forward perspectives on the phosphorylation of IDPs, both in relation to disease and in context of other PTMs; areas where deep insight remains to be uncovered.
Affiliation(s)
- Estella A Newcombe
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
  - Department of Biology, Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
  - The Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
- Elise Delaforge
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
  - Department of Biology, Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
  - The Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
- Rasmus Hartmann-Petersen
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
  - Department of Biology, Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
- Karen Skriver
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
  - Department of Biology, Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
- Birthe B Kragelund
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
  - Department of Biology, Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
  - The Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes vej 5, DK-2200 Copenhagen N, Denmark
11
Weigle AT, Feng J, Shukla D. Thirty years of molecular dynamics simulations on posttranslational modifications of proteins. Phys Chem Chem Phys 2022; 24:26371-26397. [PMID: 36285789 PMCID: PMC9704509 DOI: 10.1039/d2cp02883b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/06/2023]
Abstract
Posttranslational modifications (PTMs) are an integral component to how cells respond to perturbation. While experimental advances have enabled improved PTM identification capabilities, the same throughput for characterizing how structural changes caused by PTMs equate to altered physiological function has not been maintained. In this Perspective, we cover the history of computational modeling and molecular dynamics simulations which have characterized the structural implications of PTMs. We distinguish results from different molecular dynamics studies based upon the timescales simulated and analysis approaches used for PTM characterization. Lastly, we offer insights into how opportunities for modern research efforts on in silico PTM characterization may proceed given current state-of-the-art computing capabilities and methodological advancements.
Affiliation(s)
- Austin T Weigle
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Jiangyan Feng
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA.
| |
|
12
|
Liu S, Cui C, Chen H, Liu T. Ensemble learning-based feature selection for phosphorylation site detection. Front Genet 2022; 13:984068. [PMID: 36338976 PMCID: PMC9634105 DOI: 10.3389/fgene.2022.984068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/01/2022] [Accepted: 10/05/2022] [Indexed: 11/18/2022]
Abstract
SARS-COV-2 is prevalent all over the world, causing more than six million deaths and seriously affecting human health. At present, there is no specific drug against SARS-COV-2. Protein phosphorylation is an important way to understand the mechanism of SARS-COV-2 infection. It is often expensive and time-consuming to identify phosphorylation sites with specific modified residues through experiments. A method that uses machine learning to make predictions about them is proposed. As all the methods of extracting protein sequence features are knowledge-driven, these features may not be effective for detecting phosphorylation sites without a complete understanding of the mechanism of protein. Moreover, redundant features also have a great impact on the fitting degree of the model. To solve these problems, we propose a feature selection method based on ensemble learning, which firstly extracts protein sequence features based on knowledge, then quantifies the importance score of each feature based on data, and finally uses the subset of important features as the final features to predict phosphorylation sites.
Affiliation(s)
- Songbo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Chengmin Cui
- Beijing Institute of Control Engineering, China Academy of Space Technology, Beijing, China
| | - Huipeng Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Tong Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|
13
|
Malik A, Mahajan N, Dar TA, Kim CB. C10Pred: A First Machine Learning Based Tool to Predict C10 Family Cysteine Peptidases Using Sequence-Derived Features. Int J Mol Sci 2022; 23:ijms23179518. [PMID: 36076915 PMCID: PMC9455582 DOI: 10.3390/ijms23179518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/17/2022] [Accepted: 08/20/2022] [Indexed: 12/02/2022]
Abstract
Streptococcus pyogenes, or group A Streptococcus (GAS), a gram-positive bacterium, is implicated in a wide range of clinical manifestations and life-threatening diseases. One of the key virulence factors of GAS is streptopain, a C10 family cysteine peptidase. Since its discovery, various homologs of streptopain have been reported from other bacterial species. With the increased affordability of sequencing, a significant increase in the number of potential C10 family-like sequences in the public databases is anticipated, posing a challenge in classifying such sequences. Sequence-similarity-based tools are the methods of choice to identify such streptopain-like sequences. However, these methods depend on some level of sequence similarity between the existing C10 family and the target sequences. Therefore, in this work, we propose a novel predictor, C10Pred, for the prediction of C10 peptidases using sequence-derived optimal features. C10Pred is a support vector machine (SVM) based model which is efficient in predicting C10 enzymes with an overall accuracy of 92.7% and Matthews’ correlation coefficient (MCC) value of 0.855 when tested on an independent dataset. We anticipate that C10Pred will serve as a handy tool to classify novel streptopain-like proteins belonging to the C10 family and offer essential information.
Affiliation(s)
- Adeel Malik
- Institute of Intelligence Informatics Technology, Sangmyung University, Seoul 03016, Korea
- Correspondence: (A.M.); (C.-B.K.)
| | - Nitin Mahajan
- Department of Pediatrics, Washington University in St. Louis, St. Louis, MO 63110, USA
| | - Tanveer Ali Dar
- Department of Clinical Biochemistry, University of Kashmir, Srinagar 190006, India
| | - Chang-Bae Kim
- Department of Biotechnology, Sangmyung University, Seoul 03016, Korea
- Correspondence: (A.M.); (C.-B.K.)
| |
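The overall accuracy and Matthews' correlation coefficient (MCC) quoted for C10Pred in the abstract above are both computed from a binary confusion matrix. The sketch below shows the two standard formulas; the counts are hypothetical illustration values, not data from the paper, and the function names are this example's own.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts (assumed for illustration, not taken from C10Pred).
tp, tn, fp, fn = 45, 45, 5, 5
acc = accuracy(tp, tn, fp, fn)
score = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc:.3f}, MCC = {score:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the positive and negative classes are unbalanced.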
|
14
|
Ma R, Li S, Li W, Yao L, Huang HD, Lee TY. KinasePhos 3.0: Redesign and Expansion of the Prediction on Kinase-specific Phosphorylation Sites. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00081-X. [PMID: 35781048 PMCID: PMC10373160 DOI: 10.1016/j.gpb.2022.06.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/21/2021] [Revised: 05/30/2022] [Accepted: 06/27/2022] [Indexed: 06/04/2023]
Abstract
The purpose of this work is to enhance KinasePhos, a machine learning-based kinase-specific phosphorylation site prediction tool. Experimentally verified kinase-specific phosphorylation data were collected from PhosphoSitePlus, UniProtKB, the Group-based Prediction System 5.0, and Phospho.ELM. In total, 41,421 experimentally verified kinase-specific phosphorylation sites were identified. A total of 1380 unique kinases were identified, including 753 with existing classification information from KinBase and the remaining 627 annotated by building a phylogenetic tree. Based on this kinase classification, a total of 771 predictive models were built at the individual, family, and group levels, using at least 15 experimentally verified substrate sites in positive training datasets. The improved models demonstrated their effectiveness compared with other prediction tools. For example, the prediction of sites phosphorylated by the protein kinase B, casein kinase 2, and protein kinase A families had accuracies of 94.5%, 92.5%, and 90.0%, respectively. The average prediction accuracy for all 771 models was 87.2%. For enhancing interpretability, the SHapley Additive exPlanations (SHAP) method was employed to assess feature importance. The web interface of KinasePhos 3.0 has been redesigned to provide comprehensive annotations of kinase-specific phosphorylation sites on multiple proteins. Additionally, considering the large scale of phosphoproteomic data, a downloadable prediction tool is available at https://awi.cuhk.edu.cn/KinasePhos/download.html or https://github.com/tom-209/KinasePhos-3.0-executable-file.
Affiliation(s)
- Renfei Ma
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
| | - Shangfu Li
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
| | - Wenshuo Li
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
| | - Lantian Yao
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
| | - Hsien-Da Huang
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China.
| | - Tzong-Yi Lee
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China.
| |
|
15
|
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes which gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification to polypeptide chain and proteolytic cleavage. Understanding and characterization of PTM is a fundamental step toward understanding the underpinning of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs and complementary computational approaches are becoming popular. Recently, due to the various advancements in the field of Deep Learning (DL), along with the explosion of applications of DL to various fields, the field of computational prediction of PTM has also witnessed the development of a plethora of deep learning (DL)-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we also review the recent advances in the not-so-studied PTM, that is, proteolytic cleavage predictions. We describe advances in PTM prediction by highlighting the Deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.
|
16
|
Shining Light on Protein Kinase Biomarkers with Fluorescent Peptide Biosensors. Life (Basel) 2022; 12:life12040516. [PMID: 35455007 PMCID: PMC9026840 DOI: 10.3390/life12040516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Revised: 03/21/2022] [Accepted: 03/28/2022] [Indexed: 11/23/2022]
Abstract
Protein kinases (PKs) are established gameplayers in biological signalling pathways, and a large body of evidence points to their dysregulation in diseases, in particular cancer, where rewiring of PK networks occurs frequently. Fluorescent biosensors constitute attractive tools for probing biomolecules and monitoring dynamic processes in complex samples. A wide variety of genetically encoded and synthetic biosensors have been tailored to report on PK activities over the last decade, enabling interrogation of their function and insight into their behaviour in physiopathological settings. These optical tools can further be used to highlight enzymatic alterations associated with the disease, thereby providing precious functional information which cannot be obtained through conventional genetic, transcriptomic or proteomic approaches. This review focuses on fluorescent peptide biosensors, recent developments and strategies that make them attractive tools to profile PK activities for biomedical and diagnostic purposes, as well as insights into the challenges and opportunities brought by this unique toolbox of chemical probes.
|
17
|
Guo X, He H, Yu J, Shi S. PKSPS: a novel method for predicting kinase of specific phosphorylation sites based on maximum weighted bipartite matching algorithm and phosphorylation sequence enrichment analysis. Brief Bioinform 2021; 23:6398688. [PMID: 34661630 DOI: 10.1093/bib/bbab436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2021] [Revised: 09/10/2021] [Accepted: 09/21/2021] [Indexed: 11/14/2022]
Abstract
With the development of biotechnology, a large number of phosphorylation sites have been experimentally confirmed and collected, but only a few of them have kinase annotations. Since experimental methods to detect kinases at specific phosphorylation sites are expensive and accidental, some computational methods have been proposed to predict the kinase of these sites, but most methods only consider single sequence information or single functional network information. In this study, a new method Predicting Kinase of Specific Phosphorylation Sites (PKSPS) is developed to predict kinases of specific phosphorylation sites in human proteins by combining PKSPS-Net with PKSPS-Seq, which considers protein-protein interaction (PPI) network information and sequence information. For PKSPS-Net, kinase-kinase and substrate-substrate similarity are quantified based on the topological similarity of proteins in the PPI network, and maximum weighted bipartite matching algorithm is proposed to predict kinase-substrate relationship. In PKSPS-Seq, phosphorylation sequence enrichment analysis is used to analyze the similarity of local sequences around phosphorylation sites and predict the kinase of specific phosphorylation sites (KSP). PKSPS has been proved to be more effective than the PKSPS-Net or PKSPS-Seq on different sets of kinases. Further comparison results show that the PKSPS method performs better than existing methods. Finally, the case study demonstrates the effectiveness of the PKSPS in predicting kinases of specific phosphorylation sites. The open source code and data of the PKSPS can be obtained from https://github.com/guoxinyunncu/PKSPS.
Affiliation(s)
- Xinyun Guo
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| | - Huan He
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| | - Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| | - Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
| |
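The abstract above names maximum weighted bipartite matching as the step PKSPS-Net uses to pair substrates with kinases. As a rough illustration of that general technique only, the sketch below finds a maximum-weight one-to-one assignment by exhaustive search on a tiny score matrix. It is not the PKSPS implementation: the function name, the matrix values, and the brute-force strategy are assumptions chosen for readability, and a practical tool would use a polynomial-time method such as the Hungarian algorithm.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (best_total, pairs) for the best one-to-one row-to-column assignment.

    weights[i][j] is the score for pairing row i (e.g. a phosphorylation site)
    with column j (e.g. a candidate kinase). Every row is matched to a distinct
    column, so the matrix needs at least as many columns as rows.
    """
    n_rows = len(weights)
    n_cols = len(weights[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many columns as rows")

    best_total = float("-inf")
    best_pairs = []
    # Try every way of giving each row its own column (feasible only for tiny inputs).
    for columns in permutations(range(n_cols), n_rows):
        total = sum(weights[row][col] for row, col in enumerate(columns))
        if total > best_total:
            best_total = total
            best_pairs = list(enumerate(columns))
    return best_total, best_pairs


# Hypothetical similarity scores: rows are sites, columns are kinases.
scores = [
    [9, 8, 1],
    [8, 2, 1],
    [1, 6, 5],
]
total, pairs = max_weight_matching(scores)
print(total, pairs)
```

In this toy matrix, greedily giving the first site its top-scoring kinase (score 9) leaves the second site with a poor option, whereas the matching considers all sites jointly; that is the usual motivation for a global matching over per-site ranking.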
|
18
|
The many ways that nature has exploited the unusual structural and chemical properties of phosphohistidine for use in proteins. Biochem J 2021; 478:3575-3596. [PMID: 34624072 DOI: 10.1042/bcj20210533] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/13/2021] [Revised: 09/15/2021] [Accepted: 09/22/2021] [Indexed: 01/12/2023]
Abstract
Histidine phosphorylation is an important and ubiquitous post-translational modification. Histidine undergoes phosphorylation on either of the nitrogens in its imidazole side chain, giving rise to 1- and 3- phosphohistidine (pHis) isomers, each having a phosphoramidate linkage that is labile at high temperatures and low pH, in contrast with stable phosphomonoester protein modifications. While all organisms routinely use pHis as an enzyme intermediate, prokaryotes, lower eukaryotes and plants also use it for signal transduction. However, research to uncover additional roles for pHis in higher eukaryotes is still at a nascent stage. Since the discovery of pHis in 1962, progress in this field has been relatively slow, in part due to a lack of the tools and techniques necessary to study this labile modification. However, in the past ten years the development of phosphoproteomic techniques to detect phosphohistidine (pHis), and methods to synthesize stable pHis analogues, which enabled the development of anti-phosphohistidine (pHis) antibodies, have accelerated our understanding. Recent studies that employed anti-pHis antibodies and other advanced techniques have contributed to a rapid expansion in our knowledge of histidine phosphorylation. In this review, we examine the varied roles of pHis-containing proteins from a chemical and structural perspective, and present an overview of recent developments in pHis proteomics and antibody development.
|
19
|
Dwivedy A, Mariadasse R, Ahmad M, Chakraborty S, Kar D, Tiwari S, Bhattacharyya S, Sonar S, Mani S, Tailor P, Majumdar T, Jeyakanthan J, Biswal BK. Characterization of the NiRAN domain from RNA-dependent RNA polymerase provides insights into a potential therapeutic target against SARS-CoV-2. PLoS Comput Biol 2021; 17:e1009384. [PMID: 34516563 PMCID: PMC8478224 DOI: 10.1371/journal.pcbi.1009384] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/25/2020] [Revised: 09/28/2021] [Accepted: 08/26/2021] [Indexed: 12/14/2022]
Abstract
Apart from the canonical fingers, palm and thumb domains, the RNA dependent RNA polymerases (RdRp) from the viral order Nidovirales possess two additional domains. Of these, the function of the Nidovirus RdRp associated nucleotidyl transferase domain (NiRAN) remains unanswered. The elucidation of the 3D structure of RdRp from the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), provided the first ever insights into the domain organisation and possible functional characteristics of the NiRAN domain. Using in silico tools, we predict that the NiRAN domain assumes a kinase or phosphotransferase like fold and binds nucleoside triphosphates at its proposed active site. Additionally, using molecular docking we have predicted the binding of three widely used kinase inhibitors and five well characterized anti-microbial compounds at the NiRAN domain active site along with their drug-likeliness. For the first time ever, using basic biochemical tools, this study shows the presence of a kinase like activity exhibited by the SARS-CoV-2 RdRp. Interestingly, a well-known kinase inhibitor- Sorafenib showed a significant inhibition and dampened viral load in SARS-CoV-2 infected cells. In line with the current global COVID-19 pandemic urgency and the emergence of newer strains with significantly higher infectivity, this study provides a new anti-SARS-CoV-2 drug target and potential lead compounds for drug repurposing against SARS-CoV-2. The on-going coronavirus disease 2019 (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) is significantly affecting the world health. Unfortunately, over 180 million cases of COVID-19 resulting in nearly 4 million deaths have been reported till June, 2021. 
In this study, using a combination of bioinformatics, biochemical and mass spectrometry methods, we show that the Nidovirus RdRp associated Nucleotidyl transferase (NiRAN) domain of the RNA-dependent RNA polymerase (RdRp) of SARS-CoV-2 exhibits a kinase-like activity. Additionally, we show that a few broad-spectrum anti-cancer and anti-microbial drugs dampen this kinase-like activity. Of note, Sorafenib, an FDA-approved anti-cancer kinase-inhibiting drug, significantly reduces the SARS-CoV-2 load in cell lines. Our study suggests that the NiRAN domain of the SARS-CoV-2 RdRp is indispensable for a successful viral life cycle and shows that abolishing this enzymatic function of RdRp by small molecule inhibitors may open novel avenues for COVID-19 therapeutics.
Affiliation(s)
| | | | | | | | | | | | | | - Sudipta Sonar
- Translational Health Science and Technology Institute, Faridabad, India
| | - Shailendra Mani
- Translational Health Science and Technology Institute, Faridabad, India
| | | | - Tanmay Majumdar
- National Institute of Immunology, New Delhi, India
- * E-mail: (TM); (JJ); (BKB)
| | - Jeyaraman Jeyakanthan
- Department of Bioinformatics, Alagappa University, Tamil Nadu, India
- * E-mail: (TM); (JJ); (BKB)
| | | |
|
20
|
Yang H, Wang M, Liu X, Zhao XM, Li A. PhosIDN: an integrated deep neural network for improving protein phosphorylation site prediction by combining sequence and protein-protein interaction information. Bioinformatics 2021; 37:4668-4676. [PMID: 34320631 PMCID: PMC8665744 DOI: 10.1093/bioinformatics/btab551] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 11/23/2020] [Revised: 06/22/2021] [Accepted: 07/27/2021] [Indexed: 11/29/2022]
Abstract
Motivation: Phosphorylation is one of the most studied post-translational modifications, which plays a pivotal role in various cellular processes. Recently, deep learning methods have achieved great success in prediction of phosphorylation sites, but most of them are based on convolutional neural network that may not capture enough information about long-range dependencies between residues in a protein sequence. In addition, existing deep learning methods only make use of sequence information for predicting phosphorylation sites, and it is highly desirable to develop a deep learning architecture that can combine heterogeneous sequence and protein–protein interaction (PPI) information for more accurate phosphorylation site prediction. Results: We present a novel integrated deep neural network named PhosIDN, for phosphorylation site prediction by extracting and combining sequence and PPI information. In PhosIDN, a sequence feature encoding sub-network is proposed to capture not only local patterns but also long-range dependencies from protein sequences. Meanwhile, useful PPI features are also extracted in PhosIDN by a PPI feature encoding sub-network adopting a multi-layer deep neural network. Moreover, to effectively combine sequence and PPI information, a heterogeneous feature combination sub-network is introduced to fully exploit the complex associations between sequence and PPI features, and their combined features are used for final prediction. Comprehensive experiment results demonstrate that the proposed PhosIDN significantly improves the prediction performance of phosphorylation sites and compares favorably with existing general and kinase-specific phosphorylation site prediction methods. Availability and implementation: PhosIDN is freely available at https://github.com/ustchangyuanyang/PhosIDN. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hangyuan Yang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
| | - Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, China
| | - Xia Liu
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence and Frontiers Center for Brain Science, China; Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH230027, China; Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230027, China
| |
|
21
|
Thapa N, Chaudhari M, Iannetta AA, White C, Roy K, Newman RH, Hicks LM, Kc DB. A deep learning based approach for prediction of Chlamydomonas reinhardtii phosphorylation sites. Sci Rep 2021; 11:12550. [PMID: 34131195 PMCID: PMC8206365 DOI: 10.1038/s41598-021-91840-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/01/2021] [Accepted: 05/28/2021] [Indexed: 11/23/2022]
Abstract
Protein phosphorylation, which is one of the most important post-translational modifications (PTMs), is involved in regulating myriad cellular processes. Herein, we present a novel deep learning based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. An ensemble model combining convolutional neural networks and long short-term memory (LSTM) achieves the best performance in predicting phosphorylation sites in C. reinhardtii. For this model, named Chlamy-EnPhosSite, the best measured AUC and MCC in independent testing are 0.90 and 0.64, respectively, for a combined dataset of serine (S) and threonine (T) sites, higher than the corresponding measures for other predictors. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite yielded 499,411 phosphorylated sites with a cut-off value of 0.5 and 237,949 phosphorylated sites with a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC–MS/MS) in a blinded study and approximately 89.69% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5 and 76.83% of sites were successfully identified at a more stringent 0.7 cut-off. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) which did not appear in the training dataset, highlighting prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii. It has potential to serve as a useful tool to the community. Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga.
Affiliation(s)
- Niraj Thapa
- Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Meenal Chaudhari
- Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Anthony A Iannetta
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Clarence White
- Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Kaushik Roy
- Department of Computer Science, North Carolina A&T State University, Greensboro, NC, USA
| | - Robert H Newman
- Department of Biology, North Carolina A&T State University, Greensboro, NC, USA
| | - Leslie M Hicks
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Dukka B Kc
- Electrical Engineering and Computer Science Department, Wichita State University, Wichita, KS, USA.
| |
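The Chlamy-EnPhosSite abstract above reports site counts at two probability cut-offs. The short calculation below turns those reported figures into proportions, to make the trade-off between the 0.5 and 0.7 cut-offs easier to see. All input numbers come from the abstract; the variable names are this example's own, and the recovered-site counts are rounded estimates derived from the reported percentages, not values stated in the paper.

```python
# Figures quoted in the abstract.
TOTAL_ST_SITES = 1_809_304                     # S and T sites in the proteome
PREDICTED_SITES = {0.5: 499_411, 0.7: 237_949}  # predicted phosphosites per cut-off

EXPERIMENTAL_SITES = 2_663                     # LC-MS/MS phosphosites in the blinded set
RECOVERED_FRACTION = {0.5: 0.8969, 0.7: 0.7683}  # share recovered per cut-off

# Share of all S/T sites called phosphorylated at each cut-off.
fraction_called = {
    cutoff: count / TOTAL_ST_SITES for cutoff, count in PREDICTED_SITES.items()
}

# Approximate number of experimental sites recovered (derived, rounded).
approx_recovered = {
    cutoff: round(share * EXPERIMENTAL_SITES)
    for cutoff, share in RECOVERED_FRACTION.items()
}

for cutoff in (0.5, 0.7):
    print(
        f"cut-off {cutoff}: {fraction_called[cutoff]:.2%} of S/T sites called, "
        f"about {approx_recovered[cutoff]} of {EXPERIMENTAL_SITES} experimental sites recovered"
    )
```

Raising the cut-off from 0.5 to 0.7 roughly halves the number of sites called across the proteome while lowering the share of experimental sites recovered from about 90% to about 77%.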
|
22
|
Jamal S, Ali W, Nagpal P, Grover A, Grover S. Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins. J Transl Med 2021; 19:218. [PMID: 34030700 PMCID: PMC8142496 DOI: 10.1186/s12967-021-02851-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/01/2020] [Accepted: 04/18/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases (most commonly neurological disorders, Alzheimer's disease, and Parkinson's disease), thus necessitating the prediction of S/T/Y residues that can be phosphorylated in an uncharacterized amino acid sequence. Despite a surplus of sequencing data, current experimental methods of PTM prediction are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited, owing to substrate specificity, performance, and the diversity of its features. METHODS: In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, the minimum redundancy/maximum relevance approach, and the symmetrical uncertainty method were employed to extract the most informative features to train the models. RESULTS: The RF and SVM models generated using diverse feature types in the present study were highly accurate as is evident from good values for different statistical measures. Moreover, independent test sets and benchmark validations indicated that the proposed method clearly outperformed the existing methods, demonstrating its ability to accurately predict protein phosphorylation. CONCLUSIONS: The results obtained in the present work indicate that the proposed computational methodology can be effectively used for predicting putative phosphorylation sites, further facilitating discovery of the mechanisms of various biological processes.
Affiliation(s)
- Salma Jamal
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India
| | - Waseem Ali
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India
| | - Priya Nagpal
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
| | - Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, India.
| | - Sonam Grover
- JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India.
| |
|
23
|
Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM. FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences. BMC Bioinformatics 2021; 22:205. [PMID: 33879057 PMCID: PMC8056616 DOI: 10.1186/s12859-021-04120-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/05/2021] [Accepted: 04/07/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these sequences are often uncertain, making it difficult to identify and rectify errors in them. This hinders the creation of an accurate and holistic representation of the transcriptomic landscape across multiple tissue types and experimental conditions. Therefore, to gauge the extent of diversity in gene structures, a comprehensive analysis of genome-wide expression data is imperative. RESULTS We present FINDER, a fully automated computational tool that optimizes the entire process of annotating genes and transcript structures. Unlike current state-of-the-art pipelines, FINDER automates the RNA-Seq pre-processing step by working directly with raw sequence reads and optimizes gene prediction from BRAKER2 by supplementing these reads with associated proteins. The FINDER pipeline (1) reports transcripts and recognizes genes that are expressed under specific conditions, (2) generates all possible alternatively spliced transcripts from expressed RNA-Seq data, (3) analyzes read coverage patterns to modify existing transcript models and create new ones, and (4) scores genes as high- or low-confidence based on the available evidence across multiple datasets. We demonstrate the ability of FINDER to automatically annotate a diverse pool of genomes from eight species. CONCLUSIONS FINDER takes a completely automated approach to annotate genes directly from raw expression data. 
It is capable of processing eukaryotic genomes of all sizes and requires no manual supervision, making it ideal for bench researchers with limited experience in handling computational tools.
Affiliation(s)
- Sagnik Banerjee
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
  - Department of Statistics, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
  - Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Margaret Woodhouse
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Taner Z Sen
  - Crop Improvement and Genetics Research Unit, USDA-Agricultural Research Service, Albany, CA, 94710, USA
- Roger P Wise
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Carson M Andorf
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
  - Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
24
Ong JY, Bradley MC, Torres JZ. Phospho-regulation of mitotic spindle assembly. Cytoskeleton (Hoboken) 2020; 77:558-578. [PMID: 33280275 PMCID: PMC7898546 DOI: 10.1002/cm.21649] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/06/2020] [Revised: 10/08/2020] [Accepted: 12/02/2020] [Indexed: 12/23/2022]
Abstract
The assembly of the bipolar mitotic spindle requires the careful orchestration of a myriad of enzyme activities, such as protein posttranslational modifications. Among these, phosphorylation has arisen as the principal mode for spatially and temporally activating the proteins involved in early mitotic spindle assembly processes. Here, we review key kinases, phosphatases, and phosphorylation events that regulate critical aspects of these processes. We highlight key phosphorylation substrates that are important for ensuring the fidelity of centriole duplication, centrosome maturation, and the establishment of the bipolar spindle. We also highlight techniques used to understand kinase-substrate relationships and to study phosphorylation events. We conclude with perspectives on the field of posttranslational modifications in early mitotic spindle assembly.
Affiliation(s)
- Joseph Y Ong
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, California, USA
- Michelle C Bradley
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, California, USA
- Jorge Z Torres
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, California, USA
  - Molecular Biology Institute, University of California, Los Angeles, California, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, California, USA
25
Nováček V, McGauran G, Matallanas D, Vallejo Blanco A, Conca P, Muñoz E, Costabello L, Kanakaraj K, Nawaz Z, Walsh B, Mohamed SK, Vandenbussche PY, Ryan CJ, Kolch W, Fey D. Accurate prediction of kinase-substrate networks using knowledge graphs. PLoS Comput Biol 2020; 16:e1007578. [PMID: 33270624 PMCID: PMC7738173 DOI: 10.1371/journal.pcbi.1007578] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/29/2019] [Revised: 12/15/2020] [Accepted: 08/10/2020] [Indexed: 12/19/2022] Open
Abstract
Phosphorylation of specific substrates by protein kinases is a key control mechanism for vital cell-fate decisions and other cellular processes. However, discovering specific kinase-substrate relationships is time-consuming and often rather serendipitous. Computational predictions alleviate these challenges, but the current approaches suffer from limitations like restricted kinome coverage and inaccuracy. They also typically utilise only local features without reflecting broader interaction context. To address these limitations, we have developed an alternative predictive model. It uses statistical relational learning on top of phosphorylation networks interpreted as knowledge graphs, a simple yet robust model for representing networked knowledge. Compared to a representative selection of six existing systems, our model has the highest kinome coverage and produces biologically valid high-confidence predictions not possible with the other tools. Specifically, we have experimentally validated predictions of previously unknown phosphorylations by the LATS1, AKT1, PKA and MST2 kinases in human. Thus, our tool is useful for focusing phosphoproteomic experiments, and facilitates the discovery of new phosphorylation reactions. Our model can be accessed publicly via an easy-to-use web interface (LinkPhinder). LinkPhinder is a new approach to prediction of protein signalling networks based on kinase-substrate relationships that outperforms existing approaches. Phosphorylation networks govern virtually all fundamental biochemical processes in cells, and thus have moved into the centre of interest in biology, medicine and drug development. Fundamentally different from current approaches, LinkPhinder is inherently network-based and makes use of the most recent AI developments. We represent existing phosphorylation data as knowledge graphs, a format for large-scale and robust knowledge representation. 
Training a link prediction model on such a structure leads to novel, biologically valid phosphorylation network predictions that cannot be made with competing tools. Thus our new conceptual approach can lead to establishing a new niche of AI applications in computational biology.
Affiliation(s)
- Vít Nováček
  - Data Science Institute, National University of Ireland Galway, Ireland
  - Faculty of Informatics, Masaryk University, Brno, Czech Republic
  - Corresponding authors: VN, DF
- Gavin McGauran
  - Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
- David Matallanas
  - Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
- Adrián Vallejo Blanco
  - Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
  - Department of Oncology, Universidad de Navarra, Pamplona, Spain
- Emir Muñoz
  - Data Science Institute, National University of Ireland Galway, Ireland
  - Fujitsu Ireland Ltd., Co. Dublin, Ireland
- Zeeshan Nawaz
  - Data Science Institute, National University of Ireland Galway, Ireland
- Brian Walsh
  - Data Science Institute, National University of Ireland Galway, Ireland
- Sameh K. Mohamed
  - Data Science Institute, National University of Ireland Galway, Ireland
- Colm J. Ryan
  - Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
- Walter Kolch
  - Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
  - Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
  - School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
- Dirk Fey
  - Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
  - School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
  - Corresponding authors: VN, DF
26
Chen CW, Huang LY, Liao CF, Chang KP, Chu YW. GasPhos: Protein Phosphorylation Site Prediction Using a New Feature Selection Approach with a GA-Aided Ant Colony System. Int J Mol Sci 2020; 21:E7891. [PMID: 33114312 PMCID: PMC7660635 DOI: 10.3390/ijms21217891] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2020] [Revised: 10/20/2020] [Accepted: 10/20/2020] [Indexed: 02/06/2023] Open
Abstract
Protein phosphorylation is one of the most important post-translational modifications, and many biological processes are related to phosphorylation, such as DNA repair, transcriptional regulation and signal transduction and, therefore, abnormal regulation of phosphorylation usually causes diseases. If we can accurately predict human phosphorylation sites, this could help to solve human diseases. Therefore, we developed a kinase-specific phosphorylation prediction system, GasPhos, and proposed a new feature selection approach, called Gas, based on the ant colony system and a genetic algorithm and used performance evaluation strategies focused on different kinases to choose the best learning model. Gas uses the mean decrease Gini index (MDGI) as a heuristic value for path selection and adopts binary transformation strategies and new state transition rules. GasPhos can predict phosphorylation sites for six kinases and showed better performance than other phosphorylation prediction tools. The disease-related phosphorylated proteins that were predicted with GasPhos are also discussed. Finally, Gas can be applied to other issues that require feature selection, which could help to improve prediction performance. GasPhos is available at http://predictor.nchu.edu.tw/GasPhos.
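The abstract names the mean decrease Gini index (MDGI) as the heuristic value but does not spell out how it is computed. As a rough illustration only, the sketch below computes the Gini impurity decrease for a single split, which is the quantity that random-forest style importance measures average over many splits and trees. The function names and the toy labels are hypothetical and are not taken from GasPhos.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Toy example (hypothetical labels): a split that separates the classes perfectly.
parent = [1, 1, 0, 0]
left, right = [1, 1], [0, 0]
decrease = gini_decrease(parent, left, right)
```

A feature whose splits produce a larger average decrease would rank higher; how GasPhos feeds that value into the ant colony path selection is not described in the abstract.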
Affiliation(s)
- Chi-Wei Chen
  - Department of Computer Science and Engineering, National Chung-Hsing University, Taichung City 402, Taiwan
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan
- Lan-Ying Huang
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan
- Chia-Feng Liao
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan
- Kai-Po Chang
  - Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung City 402, Taiwan
  - Department of Pathology, China Medical University Hospital, Taichung 404, Taiwan
- Yen-Wei Chu
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City 402, Taiwan
  - Institute of Molecular Biology, National Chung Hsing University, Taichung City 402, Taiwan
  - Agricultural Biotechnology Center, National Chung Hsing University, Taichung City 402, Taiwan
  - Biotechnology Center, National Chung Hsing University, Taichung City 402, Taiwan
  - Program in Translational Medicine, National Chung Hsing University, Taichung City 402, Taiwan
  - Rong Hsing Research Center for Translational Medicine, National Chung Hsing University, Taichung City 402, Taiwan
27
Ma H, Li G, Su Z. KSP: an integrated method for predicting catalyzing kinases of phosphorylation sites in proteins. BMC Genomics 2020; 21:537. [PMID: 32753030 PMCID: PMC7646512 DOI: 10.1186/s12864-020-06895-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/14/2019] [Accepted: 07/08/2020] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Protein phosphorylation by kinases plays crucial roles in various biological processes including signal transduction and tumorigenesis, thus a better understanding of protein phosphorylation events in cells is fundamental for studying protein functions and designing drugs to treat diseases caused by the malfunction of phosphorylation. Although a large number of phosphorylation sites in proteins have been identified using high-throughput phosphoproteomic technologies, their specific catalyzing kinases remain largely unknown. Therefore, computational methods are urgently needed to predict the kinases that catalyze the phosphorylation of these sites. RESULTS We developed KSP, a new algorithm for predicting catalyzing kinases for experimentally identified phosphorylation sites in human proteins. KSP constructs a network based on known protein-protein interactions and kinase-substrate relationships. Based on the network, it computes an affinity score between a phosphorylation site and kinases, and returns the top-ranked kinases of the score as candidate catalyzing kinases. When tested on known kinase-substrate pairs, KSP outperforms existing methods including NetworKIN, iGPS, and PKIS. CONCLUSIONS We developed a novel accurate tool for predicting catalyzing kinases of known phosphorylation sites. It can work as a complementary network approach for sequence-based phosphorylation site predictors.
Affiliation(s)
- Hongli Ma
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Guojun Li
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Zhengchang Su
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
28
Luo F, Wang M, Liu Y, Zhao XM, Li A. DeepPhos: prediction of protein phosphorylation sites with deep learning. Bioinformatics 2020; 35:2766-2773. [PMID: 30601936 PMCID: PMC6691328 DOI: 10.1093/bioinformatics/bty1051] [Citation(s) in RCA: 105] [Impact Index Per Article: 26.3] [Received: 09/14/2018] [Revised: 11/19/2018] [Accepted: 12/12/2018] [Indexed: 11/28/2022] Open
Abstract
Motivation: Phosphorylation is the most studied post-translational modification and is crucial for multiple biological processes. Recently, many efforts have been made to develop computational predictors for phosphorylation site prediction, but most of them are based on feature selection and discriminative classification. Thus, it is useful to develop a novel and highly accurate predictor that can unveil intricate patterns automatically for protein phosphorylation sites. Results: In this study we present DeepPhos, a novel deep learning architecture for prediction of protein phosphorylation. Unlike multi-layer convolutional neural networks, DeepPhos consists of densely connected convolutional neural network blocks, which capture multiple representations of sequences and make the final phosphorylation prediction through intra-block and inter-block concatenation layers. DeepPhos can also be used for kinase-specific prediction at the group, family, subfamily and individual kinase levels. The experimental results demonstrated that DeepPhos outperforms competitive predictors in general and kinase-specific phosphorylation site prediction. Availability and implementation: The source code of DeepPhos is publicly deposited at https://github.com/USTCHIlab/DeepPhos. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fenglin Luo
  - School of Information Science and Technology
- Minghui Wang
  - School of Information Science and Technology
  - Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH, China
- Yu Liu
  - School of Information Science and Technology
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Ao Li
  - School of Information Science and Technology
  - Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH, China
29
Meng C, Guo F, Zou Q. CWLy-SVM: A support vector machine-based tool for identifying cell wall lytic enzymes. Comput Biol Chem 2020; 87:107304. [PMID: 32580129 DOI: 10.1016/j.compbiolchem.2020.107304] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/27/2019] [Revised: 06/07/2020] [Accepted: 06/08/2020] [Indexed: 12/21/2022]
Abstract
Cell wall lytic enzymes, as an important biotechnical tool in drug development, agriculture and the food industry, have attracted more research attention. In this research, the accurate identification of cell wall lytic enzymes is one of the key and fundamental tasks. In this study, in order to eliminate the inefficiency of in vitro experiments, a support vector machine-based cell wall lytic enzyme identification model was constructed using bioinformatics. This machine learning process includes feature extraction, feature selection, model training and optimization. According to the jackknife cross validation test, this model obtained a sensitivity of 0.853, a specificity of 0.977, an MCC of 0.845 and an AUC of 0.915. These benchmark results demonstrate that the proposed model outperforms the state-of-the-art method and that it has powerful cell wall lytic enzyme identification ability. Furthermore, we comprehensively analyzed the selected optimal features and used the proposed model to construct a user friendly web server called the CWLy-SVM to identify cell wall lytic enzymes, which is available at http://server.malab.cn/CWLy-SVM/index.jsp.
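The sensitivity, specificity and MCC figures quoted above are all derived from the four confusion-matrix counts of a binary classifier. A minimal sketch of those standard formulas follows; the function name and the example counts are hypothetical and do not reproduce the CWLy-SVM evaluation.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts chosen only to exercise the formulas.
sens, spec, mcc = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

In a jackknife (leave-one-out) test, each sample would be predicted by a model trained on all the others, and the counts would be accumulated over those predictions before applying the formulas. AUC is not covered here because it needs ranked scores rather than counts.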
Affiliation(s)
- Chaolu Meng
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
30
Chen Z, Zhao P, Li F, Leier A, Marquez-Lago TT, Webb GI, Baggag A, Bensmail H, Song J. PROSPECT: A web server for predicting protein histidine phosphorylation sites. J Bioinform Comput Biol 2020; 18:2050018. [DOI: 10.1142/s0219720020500183] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022]
Abstract
Background: Phosphorylation of histidine residues plays crucial roles in signaling pathways and cell metabolism in prokaryotes such as bacteria. While evidence has emerged that protein histidine phosphorylation also occurs in more complex organisms, its role in mammalian cells has remained largely uncharted. Thus, it is highly desirable to develop computational tools that are able to identify histidine phosphorylation sites. Result: Here, we introduce PROSPECT, which enables fast and accurate prediction of proteome-wide histidine phosphorylation substrates and sites. Our tool is based on a hybrid method that integrates the outputs of two convolutional neural network (CNN)-based classifiers and a random forest-based classifier. Three features, namely the one-of-K coding, enhanced grouped amino acids content (EGAAC) and composition of k-spaced amino acid group pairs (CKSAAGP) encodings, were taken as the inputs to the three classifiers, respectively. Our results show that it is able to accurately predict histidine phosphorylation sites from sequence information. Our PROSPECT web server is user-friendly and publicly available at http://PROSPECT.erc.monash.edu/. Conclusions: PROSPECT is superior to other pHis predictors in both running speed and prediction accuracy, and we anticipate that the PROSPECT webserver will become a popular tool for identifying pHis sites in bacteria.
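Of the three encodings listed, one-of-K coding is the simplest to illustrate: each residue in a sequence window becomes a 20-element indicator block. The sketch below is a generic version of that idea, not PROSPECT's implementation; the alphabet ordering, the use of "X" as padding, and the three-residue window are assumptions made for the example.

```python
# Assumed ordering of the 20 standard amino acids; any fixed order works.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_of_k(window):
    """Encode a peptide window as a flat 0/1 list, 20 entries per residue.

    Symbols outside the alphabet (e.g. a padding 'X') map to an all-zero block.
    """
    vector = []
    for residue in window:
        block = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            block[idx] = 1
        vector.extend(block)
    return vector


# Hypothetical window: alanine, a central histidine, and one padding symbol.
encoded = one_of_k("AHX")
```

A real predictor would use a longer window centered on the candidate histidine; the window length used by PROSPECT is not stated in the abstract.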
Affiliation(s)
- Zhen Chen
  - School of Basic Medical Science, Qingdao University, Qingdao, P. R. China
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang, P. R. China
- Pei Zhao
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang, P. R. China
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, USA
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, USA
- Tatiana T. Marquez-Lago
  - School of Basic Medical Science, Qingdao University, Qingdao, P. R. China
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, USA
- Geoffrey I. Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
- Abdelkader Baggag
  - Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Halima Bensmail
  - Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
31
Deznabi I, Arabaci B, Koyutürk M, Tastan O. DeepKinZero: zero-shot learning for predicting kinase-phosphosite associations involving understudied kinases. Bioinformatics 2020; 36:3652-3661. [PMID: 32044914 PMCID: PMC7320620 DOI: 10.1093/bioinformatics/btaa013] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/05/2019] [Revised: 12/17/2019] [Accepted: 01/06/2020] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Protein phosphorylation is a key regulator of protein function in signal transduction pathways. Kinases are the enzymes that catalyze the phosphorylation of other proteins in a target-specific manner. The dysregulation of phosphorylation is associated with many diseases including cancer. Although the advances in phosphoproteomics enable the identification of phosphosites at the proteome level, most of the phosphoproteome is still in the dark: more than 95% of the reported human phosphosites have no known kinases. Determining which kinase is responsible for phosphorylating a site remains an experimental challenge. Existing computational methods require several examples of known targets of a kinase to make accurate kinase-specific predictions, yet for a large body of kinases, only a few or no target sites are reported. RESULTS We present DeepKinZero, the first zero-shot learning approach to predict the kinase acting on a phosphosite for kinases with no known phosphosite information. DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model. The kinase-specific positional amino acid preferences are learned using a bidirectional recurrent neural network. We show that DeepKinZero achieves significant improvement in accuracy for kinases with no known phosphosites in comparison to the baseline model and other methods available. By expanding our knowledge on understudied kinases, DeepKinZero can help to chart the phosphoproteome atlas. AVAILABILITY AND IMPLEMENTATION The source codes are available at https://github.com/Tastanlab/DeepKinZero. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Iman Deznabi
  - Computer Engineering Department, Bilkent University, Ankara 06800, Turkey
  - College of Information and Computer Sciences, University of Massachusetts, Amherst, MA 01003, USA
- Busra Arabaci
  - Computer Engineering Department, Bilkent University, Ankara 06800, Turkey
- Mehmet Koyutürk
  - Department of Computer and Data Sciences
  - Center for Proteomics & Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
- Oznur Tastan
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
32
Rashid MM, Shatabda S, Hasan MM, Kurata H. Recent Development of Machine Learning Methods in Microbial Phosphorylation Sites. Curr Genomics 2020; 21:194-203. [PMID: 33071613 PMCID: PMC7521030 DOI: 10.2174/1389202921666200427210833] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/16/2020] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 01/10/2023] Open
Abstract
A variety of protein post-translational modifications has been identified that control many cellular functions. Phosphorylation studies in mycobacterial organisms have shown critical importance in diverse biological processes, such as intercellular communication and cell division. Recent technical advances in high-precision mass spectrometry have determined a large number of microbial phosphorylated proteins and phosphorylation sites throughout the proteome analysis. Identification of phosphorylated proteins with specific modified residues through experimentation is often labor-intensive, costly and time-consuming. All these limitations could be overcome through the application of machine learning (ML) approaches. However, only a limited number of computational phosphorylation site prediction tools have been developed so far. This work aims to present a complete survey of the existing ML-predictors for microbial phosphorylation. We cover a variety of important aspects for developing a successful predictor, including operating ML algorithms, feature selection methods, window size, and software utility. Initially, we review the currently available phosphorylation site databases of the microbiome, the state-of-the-art ML approaches, working principles, and their performances. Lastly, we discuss the limitations and future directions of the computational ML methods for the prediction of phosphorylation.
Affiliation(s)
- Md. Mehedi Hasan
  - Address correspondence to these authors at the Department of Bioscience and Bioinformatics and the Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
- Hiroyuki Kurata
  - Address correspondence to these authors at the Department of Bioscience and Bioinformatics and the Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
33
Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, Chou KC. iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform 2020; 20:638-658. [PMID: 29897410 PMCID: PMC6556904 DOI: 10.1093/bib/bby028] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Received: 01/02/2018] [Revised: 03/02/2018] [Indexed: 01/03/2023] Open
Abstract
Regulation of proteolysis plays a critical role in a myriad of important cellular processes. The key to better understanding the mechanisms that control this process is to identify the specific substrates that each protease targets. To address this, we have developed iProt-Sub, a powerful bioinformatics tool for the accurate prediction of protease-specific substrates and their cleavage sites. Importantly, iProt-Sub represents a significantly advanced version of its successful predecessor, PROSPER. It provides optimized cleavage site prediction models with better prediction performance and coverage for more species-specific proteases (4 major protease families and 38 different proteases). iProt-Sub integrates heterogeneous sequence and structural features and uses a two-step feature selection procedure to further remove redundant and irrelevant features in an effort to improve the cleavage site prediction accuracy. Features used by iProt-Sub are encoded by 11 different sequence encoding schemes, including local amino acid sequence profile, secondary structure, solvent accessibility and native disorder, which will allow a more accurate representation of the protease specificity of approximately 38 proteases and training of the prediction models. Benchmarking experiments using cross-validation and independent tests showed that iProt-Sub is able to achieve a better performance than several existing generic tools. We anticipate that iProt-Sub will be a powerful tool for proteome-wide prediction of protease-specific substrates and their cleavage sites, and will facilitate hypothesis-driven functional interrogation of protease-specific substrate cleavage and proteolytic events.
Affiliation(s)
- Jiangning Song
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Yanan Wang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
- Neil D Rawlings
  - EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, USA
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
34
Zhao X, Jiao Q, Li H, Wu Y, Wang H, Huang S, Wang G. ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles. BMC Bioinformatics 2020; 21:43. [PMID: 32024464 PMCID: PMC7003361 DOI: 10.1186/s12859-020-3388-y] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 12/03/2019] [Accepted: 01/27/2020] [Indexed: 11/27/2022] Open
Abstract
Background Various methods for differential expression analysis have been widely used to identify features which best distinguish between different categories of samples. Multiple hypothesis testing may leave out explanatory features, each of which may be composed of individually insignificant variables. Multivariate hypothesis testing holds a non-mainstream position, considering the large computation overhead of large-scale matrix operation. Random forest provides a classification strategy for calculation of variable importance. However, it may be unsuitable for different distributions of samples. Results Based on the thought of using an ensemble classifier, we develop a feature selection tool for differential expression analysis on expression profiles (i.e., ECFS-DEA for short). Considering the differences in sample distribution, a graphical user interface is designed to allow the selection of different base classifiers. Inspired by random forest, a common measure which is applicable to any base classifier is proposed for calculation of variable importance. After an interactive selection of a feature on sorted individual variables, a projection heatmap is presented using k-means clustering. ROC curve is also provided, both of which can intuitively demonstrate the effectiveness of the selected feature. Conclusions Feature selection through ensemble classifiers helps to select important variables and thus is applicable for different sample distributions. Experiments on simulation and realistic data demonstrate the effectiveness of ECFS-DEA for differential expression analysis on expression profiles. The software is available at http://bio-nefu.com/resource/ecfs-dea.
Affiliation(s)
- Xudong Zhao
  - College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
- Qing Jiao
  - College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
- Hangyu Li
  - College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
- Yiming Wu
  - College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
- Hanxu Wang
  - College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
- Shan Huang
  - Department of Neurology, The 2nd Affiliated Hospital of Harbin Medical University, No. 246 Xuefu Road, Harbin, 150086, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, No.26 Hexing Road, Harbin, 150040, China
|
35
|
Zhang Y, Xie R, Wang J, Leier A, Marquez-Lago TT, Akutsu T, Webb GI, Chou KC, Song J. Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework. Brief Bioinform 2019; 20:2185-2199. [PMID: 30351377 PMCID: PMC6954445 DOI: 10.1093/bib/bby079] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 06/29/2018] [Revised: 07/28/2018] [Accepted: 08/01/2018] [Indexed: 11/15/2022]
Abstract
As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. 
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
Affiliation(s)
- Yanju Zhang
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Ruopeng Xie
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Jiawei Wang
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatiana T Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, USA
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jiangning Song
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia
|
36
|
Chen Z, Liu X, Li F, Li C, Marquez-Lago T, Leier A, Akutsu T, Webb GI, Xu D, Smith AI, Li L, Chou KC, Song J. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinform 2019; 20:2267-2290. [PMID: 30285084 PMCID: PMC6954452 DOI: 10.1093/bib/bby089] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 06/14/2018] [Revised: 08/17/2018] [Accepted: 08/18/2018] [Indexed: 12/22/2022]
Abstract
Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. 
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate this comprehensive review and the application of deep learning will provide practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in the related fields.
Affiliation(s)
- Zhen Chen
  - School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
- Xuhan Liu
  - Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg, Leiden, The Netherlands
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Chen Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
  - Institute of Molecular Systems Biology, ETH Zürich, Auguste-Piccard-Hof, Zürich, Switzerland
- Tatiana Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Dakang Xu
  - Faculty of Medical Laboratory Science, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
  - Department of Molecular and Translational Science, Faculty of Medicine, Hudson Institute of Medical Research, Monash University, Melbourne, VIC, Australia
- Alexander Ian Smith
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Lei Li
  - School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA, USA
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
|
37
|
Abstract
Proteomics and phosphoproteomics have been emerging as new dimensions of omics. Phosphorylation has a profound impact on the biological functions and applications of proteins. It influences everything from intrinsic activity and extrinsic executions to cellular localization. This post-translational modification has been subjected to detailed study and has been an object of analytical curiosity with the advent of faster instrumentation. The major strength of phosphoproteomic research lies in the fact that it gives an overall picture of the workforce of the cell. Phosphoproteomics gives deeper insights into understanding the mechanism behind development and progression of a disease. This review for the first time consolidates the list of existing bioinformatics tools developed for phosphoproteomics. The gap between development of bioinformatics tools and their implementation in clinical research is highlighted. The challenge facing progress is ideally believed to be the interdisciplinary arena this field of research is associated with. For meaningful solutions and deliverables, these tools need to be implemented in clinical studies for obtaining answers to pharmacodynamic questions, saving time, costs and energy. This review hopes to invoke some thought in this direction.
|
38
|
4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N4-methylcytosine Sites in the Mouse Genome. Cells 2019; 8:cells8111332. [PMID: 31661923 PMCID: PMC6912380 DOI: 10.3390/cells8111332] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 08/21/2019] [Revised: 10/21/2019] [Accepted: 10/24/2019] [Indexed: 12/24/2022]
Abstract
DNA N4-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand 4mC biological functions, it is crucial to gain knowledge on its genomic distribution. Recently, a few computational methods, in particular machine learning (ML) approaches, have been applied to the prediction of 4mC sites. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mCs in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome, in which four different ML algorithms are combined with seven feature encodings. Subsequently, the probabilistic values predicted from those feature encodings are used as a feature vector and fed once more into ML algorithms, whose corresponding models are integrated through ensemble learning. Our benchmarking results demonstrated that 4mCpred-EL achieved accuracy and MCC values of 0.795 and 0.591, significantly outperforming seven other classifiers by 1.5–5.9% and 3.2–11.7%, respectively. Additionally, 4mCpred-EL attained an overall accuracy of 79.80% in the independent evaluation, which is 1.8–5.1% higher than that yielded by the seven other classifiers. We provide a user-friendly web server, also named 4mCpred-EL, which could be used as a pre-screening tool for the identification of potential 4mC sites in the mouse genome.
|
39
|
Li F, Li C, Marquez-Lago TT, Leier A, Akutsu T, Purcell AW, Ian Smith A, Lithgow T, Daly RJ, Song J, Chou KC. Quokka: a comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome. Bioinformatics 2019; 34:4223-4231. [PMID: 29947803 DOI: 10.1093/bioinformatics/bty522] [Citation(s) in RCA: 120] [Impact Index Per Article: 24.0] [Received: 02/12/2018] [Accepted: 06/26/2018] [Indexed: 01/28/2023]
Abstract
Motivation Kinase-regulated phosphorylation is a ubiquitous type of post-translational modification (PTM) in both eukaryotic and prokaryotic cells. Phosphorylation plays fundamental roles in many signalling pathways and biological processes, such as protein degradation and protein-protein interactions. Experimental studies have revealed that signalling defects caused by aberrant phosphorylation are highly associated with a variety of human diseases, especially cancers. In light of this, a number of computational methods aiming to accurately predict protein kinase family-specific or kinase-specific phosphorylation sites have been established, thereby facilitating phosphoproteomic data analysis. Results In this work, we present Quokka, a novel bioinformatics tool that allows users to rapidly and accurately identify human kinase family-regulated phosphorylation sites. Quokka was developed by using a variety of sequence scoring functions combined with an optimized logistic regression algorithm. We evaluated Quokka based on well-prepared up-to-date benchmark and independent test datasets, curated from the Phospho.ELM and UniProt databases, respectively. The independent test demonstrates that Quokka improves the prediction performance compared with state-of-the-art computational tools for phosphorylation prediction. In summary, our tool provides users with high-quality predicted human phosphorylation sites for hypothesis generation and biological validation. Availability and implementation The Quokka webserver and datasets are freely available at http://quokka.erc.monash.edu/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
- Chen Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Tatiana T Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Anthony W Purcell
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
- A Ian Smith
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Trevor Lithgow
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC, Australia
- Roger J Daly
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, Australia
  - Monash Centre for Data Science, Monash University, Clayton, VIC, Australia
|
40
|
Abstract
Cell division is a highly regulated and carefully orchestrated process. Understanding the mechanisms that promote proper cell division is an important step toward unraveling important questions in cell biology and human health. Early studies seeking to dissect the mechanisms of cell division used classical genetics approaches to identify genes involved in mitosis and deployed biochemical approaches to isolate and identify proteins critical for cell division. These studies underscored that post-translational modifications and cyclin-kinase complexes play roles at the heart of the cell division program. Modern approaches for examining the mechanisms of cell division, including the use of high-throughput methods to study the effects of RNAi, cDNA, and chemical libraries, have evolved to encompass a larger biological and chemical space. Here, we outline some of the classical studies that established a foundation for the field and provide an overview of recent approaches that have advanced the study of cell division.
Affiliation(s)
- Joseph Y Ong
  - Department of Chemistry and Biochemistry, UCLA, Los Angeles, California 90095
- Jorge Z Torres
  - Department of Chemistry and Biochemistry, UCLA, Los Angeles, California 90095
  - The Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, California 90095
  - Molecular Biology Institute, UCLA, Los Angeles, California 90095
|
41
|
Hasan MM, Khatun MS, Kurata H. Large-Scale Assessment of Bioinformatics Tools for Lysine Succinylation Sites. Cells 2019; 8:cells8020095. [PMID: 30696115 PMCID: PMC6406724 DOI: 10.3390/cells8020095] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 12/26/2018] [Revised: 01/24/2019] [Accepted: 01/24/2019] [Indexed: 12/19/2022]
Abstract
Lysine succinylation is a form of posttranslational modification of proteins that plays an essential functional role in every aspect of cell metabolism in both prokaryotes and eukaryotes. Aside from experimental identification of succinylation sites, there has been an intense effort geared towards the development of sequence-based prediction through machine learning, due to its promising and essential properties of being highly accurate, robust and cost-effective. In spite of these advantages, several problems need attention in the design and development of succinylation site predictors. Notwithstanding the many studies employing machine learning approaches, few articles have examined this bioinformatics field in a systematic manner. Thus, we review the advancements regarding the current state-of-the-art prediction models, datasets, and online resources and illustrate the challenges and limitations to present a useful guideline for developing powerful succinylation site prediction tools.
Affiliation(s)
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Mst Shamima Khatun
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
  - Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
|
42
|
KSIMC: Predicting Kinase–Substrate Interactions Based on Matrix Completion. Int J Mol Sci 2019; 20:ijms20020302. [PMID: 30646505 PMCID: PMC6358935 DOI: 10.3390/ijms20020302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/04/2018] [Revised: 12/31/2018] [Accepted: 01/07/2019] [Indexed: 12/17/2022]
Abstract
Protein phosphorylation is an important chemical modification catalyzed by kinases. It plays important roles in many cellular processes. Predicting kinase–substrate interactions is vital to understanding the mechanism of many diseases. Many computational methods have been proposed to identify kinase–substrate interactions. However, the prediction accuracy still needs to be improved. Therefore, it is necessary to develop an efficient computational method to predict kinase–substrate interactions. In this paper, we propose a novel computational approach, KSIMC, to identify kinase–substrate interactions based on matrix completion. Firstly, the kinase similarity and substrate similarity are calculated by aligning sequence of kinase–kinase and substrate–substrate, respectively. Then, the original association network is adjusted based on the similarities. Finally, the matrix completion is used to predict potential kinase–substrate interactions. The experiment results show that our method outperforms other state-of-the-art algorithms in performance. Furthermore, the relevant databases and scientific literature verify the effectiveness of our algorithm for new kinase–substrate interaction identification.
|
43
|
Cao M, Chen G, Yu J, Shi S. Computational prediction and analysis of species-specific fungi phosphorylation via feature optimization strategy. Brief Bioinform 2018; 21:595-608. [DOI: 10.1093/bib/bby122] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/02/2018] [Revised: 11/16/2018] [Accepted: 11/22/2018] [Indexed: 11/12/2022]
Abstract
Protein phosphorylation is a reversible and ubiquitous post-translational modification that primarily occurs at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarize current progress in the computational prediction of eukaryotic protein phosphorylation sites, which has mainly focused on animals and plants, especially human, and to a lesser extent on fungi. Since the number of identified fungi phosphorylation sites has greatly increased in a wide variety of organisms and their roles in pathological physiology still remain largely unknown, more attention has been paid to the identification of fungi-specific phosphorylation. Here, experimental fungi phosphorylation site data were collected and most of the sites were classified into different types to be encoded with various features and trained via a two-step feature optimization method. A novel method for the prediction of species-specific fungi phosphorylation, PreSSFP, was developed, which can identify fungi phosphorylation in seven species for specific serine, threonine and tyrosine residues (http://computbiol.ncu.edu.cn/PreSSFP). Meanwhile, we critically evaluated the performance of PreSSFP and compared it with other existing tools. The satisfactory results showed that PreSSFP is a robust predictor. Feature analyses showed that there are some significant differences among the seven species. Species-specific prediction via the two-step feature optimization method to mine important features for training could considerably improve the prediction performance. We anticipate that our study provides a new lead for future computational analysis of fungi phosphorylation.
Affiliation(s)
- Man Cao
  - Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Guodong Chen
  - Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Jialin Yu
  - Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Shaoping Shi
  - Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
|
44
|
He W, Wei L, Zou Q. Research progress in protein posttranslational modification site prediction. Brief Funct Genomics 2018; 18:220-229. [DOI: 10.1093/bfgp/ely039] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/18/2018] [Revised: 11/15/2018] [Accepted: 11/22/2018] [Indexed: 01/24/2023]
Abstract
Posttranslational modifications (PTMs) play an important role in regulating protein folding, activity and function and are involved in almost all cellular processes. Identification of PTMs of proteins is the basis for elucidating the mechanisms of cell biology and disease treatments. Compared with the laboriousness of equivalent experimental work, PTM prediction using various machine-learning methods can provide accurate, simple and rapid research solutions and generate valuable information for further laboratory studies. In this review, we manually curate most of the bioinformatics tools published since 2008. We also summarize the approaches for predicting ubiquitination sites and glycosylation sites. Moreover, we discuss the challenges of current PTM bioinformatics tools and look forward to future research possibilities.
Affiliation(s)
- Wenying He
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
- Leyi Wei
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
45
|
Chandra A, Sharma A, Dehzangi A, Ranganathan S, Jokhan A, Chou KC, Tsunoda T. PhoglyStruct: Prediction of phosphoglycerylated lysine residues using structural properties of amino acids. Sci Rep 2018; 8:17923. [PMID: 30560923 PMCID: PMC6299098 DOI: 10.1038/s41598-018-36203-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 04/04/2018] [Accepted: 11/16/2018] [Indexed: 12/22/2022]
Abstract
The biological process known as post-translational modification (PTM) contributes to diversifying the proteome, hence affecting many aspects of normal cell biology and pathogenesis. There have been many recently reported PTMs, but lysine phosphoglycerylation has emerged as the most recent subject of interest. Despite a large number of proteins being sequenced, the experimental method for detection of phosphoglycerylated residues remains an expensive, time-consuming and inefficient endeavor in the post-genomic era. Instead, computational methods are being proposed for accurately predicting phosphoglycerylated lysines. Though a number of predictors are available, performance in detecting phosphoglycerylated lysine residues is still limited. In this paper, we propose a new predictor called PhoglyStruct that utilizes structural information of amino acids alongside a multilayer perceptron classifier for predicting phosphoglycerylated and non-phosphoglycerylated lysine residues. For the experiment, we located phosphoglycerylated and non-phosphoglycerylated lysines in our employed benchmark. We then derived and integrated properties such as accessible surface area, backbone torsion angles, and local structure conformations. PhoglyStruct showed significant improvement in the ability to distinguish phosphoglycerylated residues from non-phosphoglycerylated ones when compared to previous predictors. The sensitivity, specificity, accuracy, Matthews correlation coefficient and AUC were 0.8542, 0.7597, 0.7834, 0.5468 and 0.8077, respectively. The data and Matlab/Octave software packages are available at https://github.com/abelavit/PhoglyStruct.
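For readers less familiar with the threshold-based measures reported in this abstract, the following minimal sketch shows how sensitivity, specificity, accuracy and the Matthews correlation coefficient follow from a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the PhoglyStruct paper; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for imbalanced PTM datasets.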
Affiliation(s)
- Abel Chandra
  - School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
- Alok Sharma
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD-4111, Australia
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Kanagawa, Japan
  - School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
  - CREST, JST, Tokyo, 113-8510, Japan
- Abdollah Dehzangi
  - Department of Computer Science, Morgan State University, Baltimore, Maryland, USA
- Shoba Ranganathan
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, 2109, Australia
- Anjeela Jokhan
  - Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
- Kuo-Chen Chou
  - The Gordon Life Science Institute, Boston, MA, 02478, USA
- Tatsuhiko Tsunoda
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Kanagawa, Japan
  - CREST, JST, Tokyo, 113-8510, Japan
46
Lu B, Li C, Chen Q, Song J. ProBAPred: Inferring protein–protein binding affinity by incorporating protein sequence and structural features. J Bioinform Comput Biol 2018; 16:1850011. [PMID: 29954286 DOI: 10.1142/s0219720018500117] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
Abstract
Protein–protein binding interaction is the most prevalent biological activity that mediates a great variety of biological processes. The increasing availability of experimental data on protein–protein interactions allows a systematic construction of protein–protein interaction networks, significantly contributing to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared to the well-established classification of protein–protein interactions (PPIs), limited work has been conducted on estimating protein–protein binding free energy, which can provide informative real-value regression models for characterizing protein–protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein–protein Binding Affinity Predictor), for quantitative estimation of protein–protein binding affinity. A large number of sequence and structural features, including physical–chemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative and contributing feature subsets. Experiments on the independent test showed that our ensemble method achieved the lowest Mean Absolute Error (MAE; 1.657[Formula: see text]kcal/mol) and the second highest correlation coefficient ([Formula: see text]), compared with the existing methods. The datasets and source codes of ProBAPred, and the supplementary materials in this study, can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein–protein binding affinity.
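The two evaluation measures named in this abstract, mean absolute error and the correlation coefficient, can be illustrated with a minimal Python sketch. The affinity values are made-up illustrative numbers, not ProBAPred data; the function names are hypothetical, and the correlation is assumed here to be Pearson's r, which the abstract does not specify.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(x, y):
    """Pearson correlation coefficient (assumed; the abstract only says 'correlation')."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Illustrative binding free energies in kcal/mol (invented for this sketch).
measured = [-9.0, -11.5, -7.2, -13.1]
predicted = [-8.0, -12.0, -9.0, -12.5]

print("MAE:", mean_absolute_error(measured, predicted))
print("r:", pearson_r(measured, predicted))
```

A lower MAE means predictions sit closer to the measured affinities on average, while r captures how well the ranking and linear trend are preserved, which is why a method can lead on one measure and not the other.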
Affiliation(s)
- Bangli Lu
  - School of Computer, Electronic and Information, and State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, 100 Daxue Road, 530004 Nanning, P. R. China
- Chen Li
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Qingfeng Chen
  - School of Computer, Electronic and Information, and State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, 100 Daxue Road, 530004 Nanning, P. R. China
- Jiangning Song
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
  - ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, VIC 3800, Australia
47
Cai L, Huang T, Su J, Zhang X, Chen W, Zhang F, He L, Chou KC. Implications of Newly Identified Brain eQTL Genes and Their Interactors in Schizophrenia. MOLECULAR THERAPY. NUCLEIC ACIDS 2018; 12:433-442. [PMID: 30195780 PMCID: PMC6041437 DOI: 10.1016/j.omtn.2018.05.026] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 03/11/2018] [Revised: 05/19/2018] [Accepted: 05/30/2018] [Indexed: 12/21/2022]
Abstract
Schizophrenia (SCZ) is a devastating genetic mental disorder. Identifying SCZ risk genes in the brain helps in understanding this disease. Thus, we first used the minimum Redundancy-Maximum Relevance (mRMR) approach to integrate the genome-wide sequence analysis results on SCZ and the expression quantitative trait locus (eQTL) data from ten brain tissues to identify the genes related to SCZ. Second, we adopted the variance inflation factor regression algorithm to identify their interacting genes in the brain. Third, using multiple analysis methods, we explored and validated their roles. By means of these procedures, we found that (1) the cerebellum may play a crucial role in the pathogenesis of SCZ and (2) ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ. These findings may stimulate novel strategies for developing new drugs against SCZ. It has not escaped our notice that the approach reported here is of use for studying many other genome diseases as well.
Affiliation(s)
- Lei Cai
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China; Gordon Life Science Institute, Boston, MA 02478, USA; Shanghai Center for Women and Children's Health, Shanghai 200062, China
- Tao Huang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China; Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jingjing Su
  - Department of Neurology, Shanghai Ninth People's Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200011, China
- Xinxin Zhang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China
- Wenzhong Chen
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China
- Fuquan Zhang
  - Department of Psychiatry, Wuxi Mental Health Center, Nanjing Medical University, Wuxi 214015, China
- Lin He
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China; Shanghai Center for Women and Children's Health, Shanghai 200062, China
- Kuo-Chen Chou
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China; Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah 21589, Saudi Arabia
48
Li J, Lan CN, Kong Y, Feng SS, Huang T. Identification and Analysis of Blood Gene Expression Signature for Osteoarthritis With Advanced Feature Selection Methods. Front Genet 2018; 9:246. [PMID: 30214455 PMCID: PMC6125376 DOI: 10.3389/fgene.2018.00246] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/03/2018] [Accepted: 06/22/2018] [Indexed: 12/15/2022]
Abstract
Osteoarthritis (OA) is a complex disease that affects articular joints and may cause disability. The incidence of OA is extremely high. Most elderly people have the symptoms of osteoarthritis. Physiotherapy for OA is time-consuming, and the chances of full recovery from OA are minimal. The most effective way of fighting OA is early diagnosis and early intervention. Liquid biopsy has become a popular noninvasive test. To find the blood gene expression signature for OA, we reanalyzed the publicly available blood gene expression profiles of 106 patients with OA and 33 control samples using an automatic computational pipeline based on advanced feature selection methods. Finally, a compact 23-gene set was identified. On the basis of these 23 genes, we constructed a Support Vector Machine (SVM) classifier and evaluated it with leave-one-out cross-validation. Its sensitivity (Sn), specificity (Sp), accuracy (ACC), and Matthews correlation coefficient (MCC) were 0.991, 0.909, 0.971, and 0.920, respectively. The performance still needs to be validated in a large independent dataset, but the in-depth biological analysis of the 23 biomarkers showed great promise and suggested that the mRNA surveillance pathway and multicellular organism growth played important roles in OA. Our results shed light on OA diagnosis through liquid biopsy.
Affiliation(s)
- Jing Li
  - Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Chun-Na Lan
  - Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Ying Kong
  - Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Song-Shan Feng
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
49
Manavalan B, Govindaraj RG, Shin TH, Kim MO, Lee G. iBCE-EL: A New Ensemble Learning Framework for Improved Linear B-Cell Epitope Prediction. Front Immunol 2018; 9:1695. [PMID: 30100904 PMCID: PMC6072840 DOI: 10.3389/fimmu.2018.01695] [Citation(s) in RCA: 113] [Impact Index Per Article: 18.8] [Received: 04/23/2018] [Accepted: 07/10/2018] [Indexed: 11/13/2022]
Abstract
Identification of B-cell epitopes (BCEs) is a fundamental step for epitope-based vaccine development, antibody production, and disease prevention and diagnosis. Due to the avalanche of protein sequence data discovered in the postgenomic age, it is essential to develop an automated computational method to enable fast and accurate identification of novel BCEs within a vast number of candidate proteins and peptides. Although several computational methods have been developed, their accuracy is unreliable. Thus, developing a reliable model with significant prediction improvements is highly desirable. In this study, we first constructed a non-redundant data set of 5,550 experimentally validated BCEs and 6,893 non-BCEs from the Immune Epitope Database. We then developed a novel ensemble learning framework for improved linear BCE prediction called iBCE-EL, a fusion of two independent predictors, namely, extremely randomized tree (ERT) and gradient boosting (GB) classifiers, which, respectively, use a combination of physicochemical properties (PCP) and amino acid composition and a combination of dipeptide and PCP as input features. Cross-validation analysis on a benchmarking data set showed that iBCE-EL performed better than individual classifiers (ERT and GB), with a Matthews correlation coefficient (MCC) of 0.454. Furthermore, we evaluated the performance of iBCE-EL on the independent data set. Results show that iBCE-EL significantly outperformed the state-of-the-art method with an MCC of 0.463. To the best of our knowledge, iBCE-EL is the first ensemble method for linear BCE prediction. iBCE-EL was implemented in a web-based platform, which is available at http://thegleelab.org/iBCE-EL. iBCE-EL contains two prediction modes. The first identifies peptide sequences as BCEs or non-BCEs, while the second provides users with the option of mining potential BCEs from protein sequences.
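Two of the input feature types named in this abstract, amino acid composition and dipeptide composition, have standard definitions that a short Python sketch can show. This is a generic illustration under stated assumptions, not the iBCE-EL code: the function names are hypothetical, only the 20 standard residues are counted, and the physicochemical properties and the ERT/GB fusion step are omitted because the abstract does not describe them in detail.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by letter

def amino_acid_composition(peptide):
    """20-dimensional vector: fraction of the peptide made up by each residue."""
    n = len(peptide)
    return [peptide.count(aa) / n for aa in AMINO_ACIDS]

def dipeptide_composition(peptide):
    """400-dimensional vector: fraction of overlapping residue pairs of each type.

    Assumes the peptide has at least two residues.
    """
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    total = len(pairs)
    return [pairs.count(a + b) / total for a, b in product(AMINO_ACIDS, repeat=2)]

# Illustrative peptide, invented for this sketch.
peptide = "ACDKKLY"
aac = amino_acid_composition(peptide)
dpc = dipeptide_composition(peptide)
print(len(aac), len(dpc))
```

Both encodings turn variable-length peptides into fixed-length vectors, which is what tree-based classifiers such as ERT and GB require as input.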
Affiliation(s)
- Rajiv Gandhi Govindaraj
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, United States
- Tae Hwan Shin
  - Department of Physiology, Ajou University School of Medicine, Suwon, South Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea
- Myeong Ok Kim
  - Division of Life Science and Applied Life Science (BK21 Plus), College of Natural Sciences, Gyeongsang National University, Jinju, South Korea
- Gwang Lee
  - Department of Physiology, Ajou University School of Medicine, Suwon, South Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea
50
PhosContext2vec: a distributed representation of residue-level sequence contexts and its application to general and kinase-specific phosphorylation site prediction. Sci Rep 2018; 8:8240. [PMID: 29844483 PMCID: PMC5974293 DOI: 10.1038/s41598-018-26392-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/14/2017] [Accepted: 05/10/2018] [Indexed: 11/28/2022]
Abstract
Phosphorylation is the most important type of protein post-translational modification. Accordingly, reliable identification of kinase-mediated phosphorylation has important implications for functional annotation of phosphorylated substrates and characterization of cellular signalling pathways. The local sequence context surrounding potential phosphorylation sites is considered to harbour the most relevant information for phosphorylation site prediction models. However, currently there is a lack of condensed vector representation for this important contextual information, despite the presence of varying residue-level features that can be constructed from sequence homology profiles, structural information, and physicochemical properties. To address this issue, we present PhosContext2vec which is a distributed representation of residue-level sequence contexts for potential phosphorylation sites and demonstrate its application in both general and kinase-specific phosphorylation site predictions. Benchmarking experiments indicate that PhosContext2vec could achieve promising predictive performance compared with several other existing methods for phosphorylation site prediction. We envisage that PhosContext2vec, as a new sequence context representation, can be used in combination with other informative residue-level features to improve the classification performance in a number of related bioinformatics tasks that require appropriate residue-level feature vector representation and extraction. The web server of PhosContext2vec is publicly available at http://phoscontext2vec.erc.monash.edu/.
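The "local sequence context surrounding potential phosphorylation sites" can be made concrete with a minimal Python sketch that extracts fixed-width windows around serine, threonine and tyrosine residues. This covers only the window-extraction step that would precede any embedding; it is not the PhosContext2vec method. The function name, the flank size of 7 residues, and the `-` padding character are all assumptions for illustration.

```python
def context_windows(sequence, flank=7, pad="-"):
    """Return (1-based position, residue, window) for every S, T or Y in the sequence.

    Each window spans `flank` residues on either side of the candidate site;
    positions beyond the sequence ends are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in "STY":
            # In `padded`, the site sits at index i + flank, so the window starts at i.
            windows.append((i + 1, residue, padded[i:i + 2 * flank + 1]))
    return windows

# Illustrative sequence, invented for this sketch.
for position, residue, window in context_windows("MKRSPEATLLYG", flank=3):
    print(position, residue, window)
```

Padding keeps every window the same length, so sites near the termini can be encoded with the same fixed-size representation as internal sites.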