1
Pratyush P, Pokharel S, Ismail HD, Bahmani S, Kc DB. LMPTMSite: A Platform for PTM Site Prediction in Proteins Leveraging Transformer-Based Protein Language Models. Methods Mol Biol 2025; 2867:261-297. [PMID: 39576587 DOI: 10.1007/978-1-0716-4196-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
Protein post-translational modifications (PTMs) introduce new functionalities and play a critical role in the regulation of protein functions. Characterizing these modifications, especially PTM sites, is essential for unraveling complex biological systems. However, traditional experimental approaches, such as mass spectrometry, are time-consuming and expensive. Machine learning and deep learning techniques offer promising alternatives for predicting PTM sites. In this chapter, we introduce our LMPTMSite (language model-based post-translational modification site predictor) platform, which emphasizes two transformer-based protein language model (pLM) approaches, pLMSNOSite and LMSuccSite, for the prediction of S-nitrosylation sites and succinylation sites in proteins, respectively. We highlight the various methods of using pLM-based sequence encoding, explain the underlying deep learning architectures, and discuss the superior efficacy of these tools compared to other state-of-the-art tools. Subsequently, we present an analysis of runtime and memory usage for pLMSNOSite, with a focus on CPU and RAM usage as the input sequence length is scaled up. Finally, we showcase a case study predicting succinylation sites in proteins active within the tricarboxylic acid (TCA) cycle pathway using LMSuccSite, demonstrating its potential utility and efficiency in real-world biological contexts. The LMPTMSite platform, inclusive of pLMSNOSite and LMSuccSite, is freely available both as a web server (http://kcdukkalab.org/pLMSNOSite/ and http://kcdukkalab.org/LMSuccSite/) and as standalone packages (https://github.com/KCLabMTU/pLMSNOSite and https://github.com/KCLabMTU/LMSuccSite), providing valuable tools for researchers in the field.
Affiliation(s)
- Pawel Pratyush
  - Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA
- Suresh Pokharel
  - Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA
- Hamid D Ismail
  - Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA
  - North Carolina A&T State University, Computational Data Science and Engineering, Greensboro, NC, USA
- Soufia Bahmani
  - Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA
  - Michigan Technological University, Computer Science Department, Houghton, MI, USA
- Dukka B Kc
  - Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA
2
Gao M, Song C, Liu T. PLM-T3SE: Accurate Prediction of Type III Secretion Effectors Using Protein Language Model Embeddings. J Cell Biochem 2025; 126:e30642. [PMID: 39164870 DOI: 10.1002/jcb.30642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Revised: 08/04/2024] [Accepted: 08/07/2024] [Indexed: 08/22/2024]
Abstract
The Type III secretion effectors (T3SEs) are bacterial proteins synthesized by Gram-negative pathogens and delivered into host cells via the Type III secretion system (T3SS). These effectors usually play a pivotal role in the interactions between bacteria and hosts. Hence, the precise identification of T3SEs aids researchers in exploring the pathogenic mechanisms of bacterial infections. Since the diversity and complexity of T3SE sequences often make traditional experimental methods time-consuming, it is imperative to explore more efficient and convenient computational approaches for T3SE prediction. Inspired by the promising potential exhibited by pre-trained language models in protein recognition tasks, we proposed a method called PLM-T3SE that utilizes protein language models (PLMs) for effective recognition of T3SEs. First, we utilized PLM embeddings and evolutionary features from the position-specific scoring matrix (PSSM) profiles to transform protein sequences into fixed-length vectors for model training. Second, we employed the extreme gradient boosting (XGBoost) algorithm to rank these features based on their importance. Finally, an MLP neural network model was used to predict T3SEs based on the selected optimal feature set. Experimental results from the cross-validation and independent test demonstrated that our model exhibited superior performance compared to the existing models. Specifically, our model achieved an accuracy of 98.1%, which is 1.8%-42.4% higher than the state-of-the-art predictors on the same independent test data set. These findings highlight the superiority of PLM-T3SE and the remarkable characterization ability of PLM embeddings for T3SE prediction.
Affiliation(s)
- Mengru Gao
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
- Chen Song
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
- Taigang Liu
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
3
Pratyush P, Kc DB. Advances in Prediction of Posttranslational Modification Sites Known to Localize in Protein Supersecondary Structures. Methods Mol Biol 2025; 2870:117-151. [PMID: 39543034 DOI: 10.1007/978-1-0716-4213-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
Posttranslational modifications (PTMs) play a crucial role in modulating the structure, function, localization, and interactions of proteins, with many PTMs being localized within supersecondary structures, such as helical pairs. These modifications can significantly influence the conformation and stability of these structures. For instance, phosphorylation introduces negative charges that alter electrostatic interactions, while acetylation or methylation of lysine residues affects the stability and interactions of alpha helices or beta strands. Given the pivotal role of supersecondary structures in the overall protein architecture, their modulation by PTMs is essential for protein functionality. This chapter explores the latest advancements in predicting sites for the five PTMs (phosphorylation, acetylation, glycosylation, methylation, and ubiquitination) known to be localized within supersecondary structures. The chapter highlights the recent advances in the prediction of these PTM sites, including the use of global contextualized embeddings from protein language models, integration of structural information, utilization of reliable positive and negative sites, and application of contrastive learning. These methodologies and emerging trends offer a roadmap for novel innovations in addressing PTM prediction challenges, particularly those linked to supersecondary structures.
Affiliation(s)
- Pawel Pratyush
  - Computer Science Department, Michigan Technological University, Houghton, MI, USA
  - Computer Science Department, Rochester Institute of Technology, Henrietta, NY, USA
- Dukka B Kc
  - Computer Science Department, Michigan Technological University, Houghton, MI, USA
  - Computer Science Department, Rochester Institute of Technology, Henrietta, NY, USA
4
Yan ZN, Liu PR, Zhou H, Zhang JY, Liu SX, Xie Y, Wang HL, Yu JB, Zhou Y, Ni CM, Huang L, Ye ZW. Brain-computer Interaction in the Smart Era. Curr Med Sci 2024; 44:1123-1131. [PMID: 39347924 DOI: 10.1007/s11596-024-2927-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Accepted: 08/18/2024] [Indexed: 10/01/2024]
Abstract
The brain-computer interface (BCI) system serves as a critical link between external output devices and the human brain. A monitored object's mental state, sensory cognition, and even higher cognition are reflected in its electroencephalography (EEG) signal. Nevertheless, unprocessed EEG signals are frequently contaminated with a variety of artifacts, rendering the analysis and elimination of impurities from the collected EEG data exceedingly challenging, not to mention the manual adjustment thereof. Over the last few decades, the rapid advancement of artificial intelligence (AI) technology has contributed to the development of BCI technology. Algorithms derived from AI and machine learning have significantly enhanced the ability to analyze and process EEG electrical signals, thereby expanding the range of potential interactions between the human brain and computers. As a result, the present BCI technology with the help of AI can assist physicians in gaining a more comprehensive understanding of their patients' physical and psychological status, thereby contributing to improvements in their health and quality of life.
Affiliation(s)
- Zi-Neng Yan
  - Intelligent Medical Laboratory, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
  - Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Peng-Ran Liu
  - Intelligent Medical Laboratory, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
  - Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Hong Zhou
  - Intelligent Medical Laboratory, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
  - Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Jia-Yao Zhang
  - Intelligent Medical Laboratory, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
  - Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Song-Xiang Liu
  - Intelligent Medical Laboratory, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
  - Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Yi Xie
  - Intelligent Medical Laboratory, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
  - Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Hong-Lin Wang
  - Intelligent Medical Laboratory, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
  - Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Jin-Bo Yu
  - Wuhan Neuracom Technology Development Co., Ltd, Wuhan, 430200, China
- Yu Zhou
  - Wuhan Neuracom Technology Development Co., Ltd, Wuhan, 430200, China
- Chang-Mao Ni
  - Wuhan Neuracom Technology Development Co., Ltd, Wuhan, 430200, China
- Li Huang
  - Wuhan Neuracom Technology Development Co., Ltd, Wuhan, 430200, China
- Zhe-Wei Ye
  - Intelligent Medical Laboratory, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
  - Department of Orthopedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
5
Shrestha P, Kandel J, Tayara H, Chong KT. Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model. Nat Commun 2024; 15:6699. [PMID: 39107330 PMCID: PMC11303401 DOI: 10.1038/s41467-024-51071-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/29/2024] [Indexed: 08/10/2024] Open
Abstract
Post-translational modifications (PTMs) are pivotal in modulating protein functions and influencing cellular processes like signaling, localization, and degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that utilizes prompt-based fine-tuning to improve its accuracy in precisely predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts unsupervised learning to identify PTMs. It utilizes a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model's final decoder layer to elucidate sequence motifs essential for molecular recognition and analyze the effects of mutations at or near PTM sites to offer deeper insights into protein functionality. Comparative assessments reveal that PTMGPT2 outperforms existing methods across 19 PTM types, underscoring its potential in identifying disease associations and drug targets.
Affiliation(s)
- Palistha Shrestha
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
- Jeevan Kandel
  - Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea
6
Pratyush P, Bahmani S, Pokharel S, Ismail HD, KC DB. LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model. Bioinformatics 2024; 40:btae290. [PMID: 38662579 PMCID: PMC11088740 DOI: 10.1093/bioinformatics/btae290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 02/13/2024] [Accepted: 04/24/2024] [Indexed: 05/13/2024] Open
Abstract
MOTIVATION: Recent advancements in natural language processing have highlighted the effectiveness of global contextualized representations from protein language models (pLMs) in numerous downstream tasks. Nonetheless, strategies to encode the site-of-interest leveraging pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely uncharted.
RESULTS: Herein, we adopt a range of approaches for utilizing pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence), assessing the implications of utilizing the per-residue embedding of the site-of-interest as well as embeddings of window residues centered around it. Building upon these insights, we developed a novel residual ConvBiLSTM network designed to process window-level embeddings of the site-of-interest generated by the ProtT5-XL-UniRef50 pLM using full-length sequences as input. This model, termed T5ResConvBiLSTM, surpasses existing state-of-the-art Kcr predictors in performance across three diverse datasets. To validate our approach of utilizing full sequence-based window-level embeddings, we also delved into the interpretability of ProtT5-derived embedding tensors in two ways: firstly, by scrutinizing the attention weights obtained from the transformer's encoder block; and secondly, by computing SHAP values for these tensors, providing a model-agnostic interpretation of the prediction results. Additionally, we enhance the latent representation of ProtT5 by incorporating two additional local representations, one derived from amino acid properties and the other from a supervised embedding layer, through an intermediate fusion stacked generalization approach, using an n-mer window sequence (or peptide/fragment). The resultant stacked model, dubbed LMCrot, exhibits a more pronounced improvement in predictive performance across the tested datasets.
AVAILABILITY AND IMPLEMENTATION: LMCrot is publicly available at https://github.com/KCLabMTU/LMCrot.
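The idea of a "window-level embedding" taken from a full-length sequence embedding can be illustrated with a minimal sketch. This is not the LMCrot code: the function name `window_embedding`, the zero-padding at the termini, and the toy two-dimensional vectors standing in for ProtT5 per-residue embeddings are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def window_embedding(per_residue: List[Vector], site: int, flank: int) -> List[Vector]:
    """Return the 2*flank+1 embedding rows centred on `site`.

    `per_residue` holds one vector per residue of the FULL-length protein,
    so each row already carries global context. Positions that fall outside
    the sequence are zero-padded (an assumption, not taken from the paper).
    """
    if not 0 <= site < len(per_residue):
        raise IndexError("site is outside the sequence")
    dim = len(per_residue[0])
    window = []
    for pos in range(site - flank, site + flank + 1):
        if 0 <= pos < len(per_residue):
            window.append(list(per_residue[pos]))
        else:
            window.append([0.0] * dim)  # padding row beyond a terminus
    return window


# Toy stand-in for a per-residue embedding matrix: 6 residues, 2 dimensions.
toy = [[float(i), float(i) + 0.5] for i in range(6)]

# Window of 5 residues around position 1; the first row is padding.
win = window_embedding(toy, site=1, flank=2)
```

The resulting matrix always has a fixed number of rows, which is what allows a downstream convolutional or recurrent network to consume sites near the sequence ends in the same way as interior sites.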
Affiliation(s)
- Pawel Pratyush
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Soufia Bahmani
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Suresh Pokharel
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Hamid D Ismail
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Dukka B KC
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
7
Liu T, Song C, Wang C. NCSP-PLM: An ensemble learning framework for predicting non-classical secreted proteins based on protein language models and deep learning. Math Biosci Eng 2024; 21:1472-1488. [PMID: 38303473 DOI: 10.3934/mbe.2024063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Non-classical secreted proteins (NCSPs) refer to a group of proteins that are located in the extracellular environment despite the absence of signal peptides and motifs. They usually play different roles in intercellular communication. Therefore, the accurate prediction of NCSPs is a critical step toward an in-depth understanding of their associated secretion mechanisms. Since the experimental recognition of NCSPs is often costly and time-consuming, computational methods are desired. In this study, we proposed an ensemble learning framework, termed NCSP-PLM, for the identification of NCSPs by extracting feature embeddings from pre-trained protein language models (PLMs) as input to several fine-tuned deep learning models. First, we compared the performance of nine PLM embeddings by training three neural networks, namely a multi-layer perceptron (MLP), an attention mechanism, and a bidirectional long short-term memory network (BiLSTM), and selected the best network model for each PLM embedding. Then, four models were excluded due to their below-average accuracies, and the remaining five models were integrated to perform the prediction of NCSPs based on weighted voting. Finally, 5-fold cross-validation and an independent test were conducted to evaluate the performance of NCSP-PLM on the benchmark datasets. On the same independent dataset, the sensitivity and specificity of NCSP-PLM were 91.18% and 97.06%, respectively. In particular, the overall accuracy of our model reached 94.12%, which was 7-16% higher than that of the existing state-of-the-art predictors. This indicates that NCSP-PLM could serve as a useful tool for the annotation of NCSPs.
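The two ensemble steps described above (dropping models with below-average accuracy, then combining the rest by weighted voting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the NCSP-PLM implementation: the abstract does not say how the voting weights are chosen, so using each model's accuracy as its weight, the 0.5 decision threshold, the function names, and the toy numbers are all hypothetical.

```python
from typing import Dict


def select_models(accuracies: Dict[str, float]) -> Dict[str, float]:
    """Keep only the models whose accuracy is at least the mean accuracy."""
    mean_acc = sum(accuracies.values()) / len(accuracies)
    return {name: acc for name, acc in accuracies.items() if acc >= mean_acc}


def weighted_vote(probabilities: Dict[str, float],
                  weights: Dict[str, float],
                  threshold: float = 0.5) -> int:
    """Combine per-model positive-class probabilities into one 0/1 decision.

    Each model's probability is weighted by `weights[name]`; using the
    validation accuracy as the weight is an assumption for this sketch.
    """
    total = sum(weights[name] for name in probabilities)
    score = sum(weights[name] * p for name, p in probabilities.items()) / total
    return int(score >= threshold)


# Hypothetical validation accuracies for four base models.
accuracies = {"model_a": 0.90, "model_b": 0.80, "model_c": 0.70, "model_d": 0.60}
kept = select_models(accuracies)  # models at or above the mean accuracy

# Hypothetical positive-class probabilities from the retained models.
probs = {"model_a": 0.8, "model_b": 0.3}
decision = weighted_vote(probs, kept)
```

In this toy case the more accurate model outweighs the dissenting one, so the ensemble predicts the positive class even though the two base models disagree.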
Affiliation(s)
- Taigang Liu
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Chen Song
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Chunhua Wang
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
8
Pokharel S, Pratyush P, Ismail HD, Ma J, KC DB. Integrating Embeddings from Multiple Protein Language Models to Improve Protein O-GlcNAc Site Prediction. Int J Mol Sci 2023; 24:16000. [PMID: 37958983 PMCID: PMC10650050 DOI: 10.3390/ijms242116000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 11/02/2023] [Accepted: 11/04/2023] [Indexed: 11/15/2023] Open
Abstract
O-linked β-N-acetylglucosamine (O-GlcNAc) is a distinct monosaccharide modification of serine (S) or threonine (T) residues of nucleocytoplasmic and mitochondrial proteins. O-GlcNAc modification (i.e., O-GlcNAcylation) is involved in the regulation of diverse cellular processes, including transcription, epigenetic modifications, and cell signaling. Despite the great progress in experimentally mapping O-GlcNAc sites, there is an unmet need to develop robust prediction tools that can effectively locate the presence of O-GlcNAc sites in protein sequences of interest. In this work, we performed a comprehensive evaluation of a framework for prediction of protein O-GlcNAc sites using embeddings from pre-trained protein language models. In particular, we compared the performance of three protein sequence-based large protein language models (pLMs), Ankh, ESM-2, and ProtT5, for prediction of O-GlcNAc sites and also evaluated various ensemble strategies to integrate embeddings from these protein language models. Upon investigation, the decision-level fusion approach that integrates the decisions of the three embedding models, which we call LM-OGlcNAc-Site, outperformed the models trained on these individual language models as well as other fusion approaches and other existing predictors in almost all of the parameters evaluated. The precise prediction of O-GlcNAc sites will facilitate the probing of O-GlcNAc site-specific functions of proteins in physiology and diseases. Moreover, these findings also indicate the effectiveness of combined uses of multiple protein language models in post-translational modification prediction and open exciting avenues for further research and exploration in other protein downstream tasks. LM-OGlcNAc-Site's web server and source code are publicly available to the community.
Affiliation(s)
- Suresh Pokharel
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, USA
- Pawel Pratyush
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, USA
- Hamid D. Ismail
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, USA
- Junfeng Ma
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Georgetown University, Washington, DC 20057, USA
- Dukka B. KC
  - Department of Computer Science, Michigan Technological University, Houghton, MI 49931, USA
9
Kumari S, Gupta R, Ambasta RK, Kumar P. Emerging trends in post-translational modification: Shedding light on Glioblastoma multiforme. Biochim Biophys Acta Rev Cancer 2023; 1878:188999. [PMID: 37858622 DOI: 10.1016/j.bbcan.2023.188999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
Recent multi-omics studies, including proteomics, transcriptomics, genomics, and metabolomics, have revealed the critical role of post-translational modifications (PTMs) in the progression and pathogenesis of Glioblastoma multiforme (GBM). Further, PTMs alter oncogenic signaling events and offer a novel avenue in GBM therapeutics research through PTM enzymes as potential biomarkers for drug targeting. In addition, PTMs are critical regulators of chromatin architecture, gene expression, and the tumor microenvironment (TME), all of which play a crucial role in tumorigenesis. Moreover, the implementation of artificial intelligence and machine learning algorithms enhances GBM therapeutics research through the identification of novel PTM enzymes and residues. Herein, we briefly explain the mechanism of protein modifications in GBM etiology, and in altering the biologics of GBM cells through chromatin remodeling, modulation of the TME, and signaling pathways. In addition, we highlight the importance of PTM enzymes as therapeutic biomarkers and the role of artificial intelligence and machine learning in protein PTM prediction.
Affiliation(s)
- Smita Kumari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; School of Medicine, University of South Carolina, Columbia, SC, United States of America
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
10
Badgandi HB, Weichsel A, Montfort WR. Nitric oxide delivery and heme-assisted S-nitrosation by the bedbug nitrophorin. J Inorg Biochem 2023; 246:112263. [PMID: 37290359 PMCID: PMC10332259 DOI: 10.1016/j.jinorgbio.2023.112263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/10/2023] [Accepted: 05/20/2023] [Indexed: 06/10/2023]
Abstract
Nitrophorins are heme proteins used by blood feeding insects to deliver nitric oxide (NO) to a victim, leading to vasodilation and antiplatelet activity. Cimex lectularius (bedbug) nitrophorin (cNP) accomplishes this with a cysteine ligated ferric (Fe(III)) heme. In the acidic environment of the insect's salivary glands, NO binds tightly to cNP. During a blood meal, cNP-NO is delivered to the feeding site where dilution and increased pH lead to NO release. In a previous study, cNP was shown to not only bind heme, but to also nitrosate the proximal cysteine, leading to Cys-NO (SNO) formation. SNO formation requires oxidation of the proximal cysteine, which was proposed to be metal-assisted through accompanying reduction of ferric heme and formation of Fe(II)-NO. Here, we report the 1.6 Å crystal structure of cNP first chemically reduced and then exposed to NO, and show that Fe(II)-NO is formed but SNO is not, supporting a metal-assisted SNO formation mechanism. Crystallographic and spectroscopic studies of mutated cNP show that steric crowding of the proximal site inhibits SNO formation while a sterically relaxed proximal site enhances SNO formation, providing insight into specificity for this poorly understood modification. Experiments examining the pH dependence for NO implicate direct protonation of the proximal cysteine as the underlying mechanism. At lower pH, thiol heme ligation predominates, leading to a smaller trans effect and 60-fold enhanced NO affinity (Kd = 70 nM). Unexpectedly, we find that thiol formation interferes with SNO formation, suggesting cNP-SNO is unlikely to form in the insect salivary glands.
Affiliation(s)
- Hemant B Badgandi
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721, United States of America
- Andrzej Weichsel
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721, United States of America
- William R Montfort
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721, United States of America
11
Pakhrin SC, Pokharel S, Pratyush P, Chaudhari M, Ismail HD, Kc DB. LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model. J Proteome Res 2023; 22:2548-2557. [PMID: 37459437 DOI: 10.1021/acs.jproteome.2c00667] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 08/05/2023]
Abstract
Phosphorylation is one of the most important post-translational modifications and plays a pivotal role in various cellular processes. Although there exist several computational tools to predict phosphorylation sites, existing tools have not yet harnessed the knowledge distilled by pretrained protein language models. Herein, we present a novel deep learning-based approach called LMPhosSite for general phosphorylation site prediction that integrates embeddings from the local window sequence and the contextualized embedding obtained using the global (overall) protein sequence from a pretrained protein language model to improve the prediction performance. Thus, LMPhosSite consists of two base models: one for capturing an effective local representation and the other for capturing the global per-residue contextualized embedding from a pretrained protein language model. The outputs of these base models are integrated using a score-level fusion approach. LMPhosSite achieves a precision, recall, Matthews correlation coefficient, and F1-score of 38.78%, 67.12%, 0.390, and 49.15%, respectively, for the combined serine and threonine independent test data set and 34.90%, 62.03%, 0.298, and 44.67%, respectively, for the tyrosine independent test data set, which is better than the compared approaches. These results demonstrate that LMPhosSite is a robust computational tool for the prediction of general phosphorylation sites in proteins.
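As a quick consistency check on the reported metrics, the F1-score can be recomputed from precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The sketch below only uses the percentages quoted in the abstract; the function and variable names are illustrative, and small differences are expected because the published precision and recall are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the abstract, in percent.
f1_ser_thr = f1_score(38.78, 67.12)  # combined serine/threonine test set
f1_tyr = f1_score(34.90, 62.03)      # tyrosine test set

print(f"S/T F1: {f1_ser_thr:.2f}%  Y F1: {f1_tyr:.2f}%")
```

Both recomputed values agree with the reported 49.15% and 44.67% to within about 0.01 percentage points, which is consistent with rounding of the inputs.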
Affiliation(s)
- Subash C Pakhrin
  - School of Computing, Wichita State University, 1845 Fairmount St., Wichita, Kansas 67260, United States
  - Department of Computer Science & Engineering Technology, University of Houston-Downtown, 1 Main St., Houston, Texas 77002, United States
- Suresh Pokharel
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States
- Pawel Pratyush
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States
- Meenal Chaudhari
  - Department of Biology, North Carolina A&T State University, Greensboro, North Carolina 27411, United States
- Hamid D Ismail
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States
- Dukka B Kc
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan 49931, United States