1
Pham NT, Zhang Y, Rakkiyappan R, Manavalan B. HOTGpred: Enhancing human O-linked threonine glycosylation prediction using integrated pretrained protein language model-based features and multi-stage feature selection approach. Comput Biol Med 2024; 179:108859. [PMID: 39029431 DOI: 10.1016/j.compbiomed.2024.108859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 06/19/2024] [Accepted: 07/06/2024] [Indexed: 07/21/2024]
Abstract
O-linked glycosylation is a complex post-translational modification (PTM) in human proteins that plays a critical role in regulating various cellular metabolic and signaling pathways. In contrast to N-linked glycosylation, O-linked glycosylation lacks specific sequence features and maintains an unstable core structure. Identifying O-linked threonine glycosylation sites (OTGs) remains challenging, requiring extensive experimental tests. While bioinformatics tools have emerged for predicting OTGs, their reliance on limited conventional features and absence of well-defined feature selection strategies limit their effectiveness. To address these limitations, we introduced HOTGpred (Human O-linked Threonine Glycosylation predictor), employing a multi-stage feature selection process to identify the optimal feature set for accurately identifying OTGs. Initially, we assessed 25 different feature sets derived from various pretrained protein language model (PLM)-based embeddings and conventional feature descriptors using nine classifiers. Subsequently, we integrated the top five embeddings linearly and determined the most effective scoring function for ranking hybrid features, identifying the optimal feature set through a process of sequential forward search. Among the classifiers, the extreme gradient boosting (XGBT)-based model, using the optimal feature set (HOTGpred), achieved 92.03 % accuracy on the training dataset and 88.25 % on the balanced independent dataset. Notably, HOTGpred significantly outperformed the current state-of-the-art methods on both the balanced and imbalanced independent datasets, demonstrating its superior prediction capabilities. Additionally, SHapley Additive exPlanations (SHAP) and ablation analyses were conducted to identify the features contributing most significantly to HOTGpred. Finally, we developed an easy-to-navigate web server, accessible at https://balalab-skku.org/HOTGpred/, to support glycobiologists in their research on glycosylation structure and function.
Affiliation(s)
- Nhat Truong Pham
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Ying Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Rajan Rakkiyappan
  - Department of Mathematics, Bharathiar University, Coimbatore, 641046, Tamil Nadu, India
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
2
Ding X, Wang M, Chang R, Su M, Wang J, Li X. Longer fatty acid-protected GalNAz enables efficient labeling of proteins in living cells with minimized S-glyco modification. Org Biomol Chem 2024; 22:4574-4579. [PMID: 38775030 DOI: 10.1039/d4ob00486h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
Metabolic glycoengineering provides a powerful tool to label proteins with chemical tags for cell imaging and protein enrichment. The structures of per-O-acetylation on unnatural sugars facilitate membrane permeability and increase cellular uptake and are widely used for metabolic glycan labeling. However, unexpected S-glyco modification was discovered via a non-enzymatic reaction with protein cysteines, which was initially conducted with the hydrolysis of anomeric acetate by esterase. Herein, we synthesized a series of GalNAz derivatives that were protected with various lengths of short-chain fatty acid, including acetate, propionate, butyrate, valerate and pivalate, to detect differences in labeling efficiencies and occurrence of S-glyco modification. Our results demonstrate that all the GalNAz derivatives could effectively label proteins in HeLa cells, except the pivalate group. Of note, But4GalNAz exhibited excellent labeling abilities compared with Ac4GalNAz from the results for western blot, flow cytometry and confocal laser scanning microscopy. Moreover, the results for the S-glyco-modification assay by western blot and chemoproteomic analysis indicated that But4GalNAz generated negligible unexpected labeling signals compared to Ac4GalNAz. Our study uncovers the distinct labeling efficiency of different protected groups on unnatural sugars, which provides an alternative strategy to explore novel glycan probes.
Affiliation(s)
- Xin Ding
  - Joint National Laboratory for Antibody Drug Engineering, The First Affiliated Hospital of Henan University, Henan University, Kaifeng 475000, China
- Menghe Wang
  - Joint National Laboratory for Antibody Drug Engineering, The First Affiliated Hospital of Henan University, Henan University, Kaifeng 475000, China
- Renhao Chang
  - Joint National Laboratory for Antibody Drug Engineering, The First Affiliated Hospital of Henan University, Henan University, Kaifeng 475000, China
- Miaomiao Su
  - Joint National Laboratory for Antibody Drug Engineering, The First Affiliated Hospital of Henan University, Henan University, Kaifeng 475000, China
- Jiajia Wang
  - Joint National Laboratory for Antibody Drug Engineering, The First Affiliated Hospital of Henan University, Henan University, Kaifeng 475000, China
- Xia Li
  - Joint National Laboratory for Antibody Drug Engineering, The First Affiliated Hospital of Henan University, Henan University, Kaifeng 475000, China
3
Murphy LD, Huxley KE, Wilding A, Robinson C, Foucart QPO, Willems LI. Synthesis of biolabile thioalkyl-protected phosphates from an easily accessible phosphotriester precursor. Chem Sci 2023; 14:5062-5068. [PMID: 37206382 PMCID: PMC10189884 DOI: 10.1039/d3sc00693j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 04/19/2023] [Indexed: 05/21/2023] Open
Abstract
Robust methods for the synthesis of mixed phosphotriesters are essential to accelerate the development of novel phosphate-containing bioactive molecules. To enable efficient cellular uptake, phosphate groups are commonly masked with biolabile protecting groups, such as S-acyl-2-thioethyl (SATE) esters, that are removed once the molecule is inside the cell. Typically, bis-SATE-protected phosphates are synthesised through phosphoramidite chemistry. This approach, however, suffers from issues with hazardous reagents and can give unreliable yields, especially when applied to the synthesis of sugar-1-phosphate derivatives as tools for metabolic oligosaccharide engineering. Here, we report the development of an alternative approach that gives access to bis-SATE phosphotriesters in two steps from an easy to synthesise tri(2-bromoethyl)phosphotriester precursor. We demonstrate the viability of this strategy using glucose as a model substrate, onto which a bis-SATE-protected phosphate is introduced either at the anomeric position or at C6. We show compatibility with various protecting groups and further explore the scope and limitations of the methodology on different substrates, including N-acetylhexosamine and amino acid derivatives. The new approach facilitates the synthesis of bis-SATE-protected phosphoprobes and prodrugs and provides a platform that can boost further studies aimed at exploring the unique potential of sugar phosphates as research tools.
Affiliation(s)
- Lloyd D Murphy
  - York Structural Biology Laboratory and York Biomedical Research Institute, Department of Chemistry, University of York, York YO10 5DD, UK
- Kathryn E Huxley
  - York Structural Biology Laboratory and York Biomedical Research Institute, Department of Chemistry, University of York, York YO10 5DD, UK
- Ava Wilding
  - Department of Chemistry, University of York, York YO10 5DD, UK
- Cyane Robinson
  - Department of Chemistry, University of York, York YO10 5DD, UK
- Quentin P O Foucart
  - York Structural Biology Laboratory and York Biomedical Research Institute, Department of Chemistry, University of York, York YO10 5DD, UK
- Lianne I Willems
  - York Structural Biology Laboratory and York Biomedical Research Institute, Department of Chemistry, University of York, York YO10 5DD, UK
4
Tang H, Tang Q, Zhang Q, Feng P. O-GlyThr: Prediction of human O-linked threonine glycosites using multi-feature fusion. Int J Biol Macromol 2023; 242:124761. [PMID: 37156312 DOI: 10.1016/j.ijbiomac.2023.124761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 05/01/2023] [Accepted: 05/02/2023] [Indexed: 05/10/2023]
Abstract
O-linked glycosylation is one of the most complex post-translational modifications (PTM) of human proteins modulating various cellular metabolic and signaling pathways. Unlike N-glycosylation, the O-glycosylation has nonspecific sequence features and nonstable glycan core structure, which makes identification of O-glycosites more challenging either by experimental or computational methods. Biochemical experiments to identify O-glycosites in batches are technically and economically demanding. Therefore, development of computation-based methods is greatly warranted. This study constructed a prediction model based on feature fusion for O-glycosites linked to the threonine residues in Homo sapiens. In the training model, we collected and sorted out high-quality human protein data with O-linked threonine glycosites. Seven feature coding methods were fused to represent the sample sequence. By comparison of different algorithms, random forest was selected as the final classifier to construct the classification model. Through 5-fold cross-validation, the proposed model, namely O-GlyThr, performed satisfactorily on both training set (AUC: 0.9308) and independent validation dataset (AUC: 0.9323). Compared with previously published predictors, O-GlyThr achieved the highest ACC of 0.8475 on the independent test dataset. These results demonstrated the high competency of our predictor in identifying O-glycosites on threonine residues. Furthermore, a user-friendly webserver named O-GlyThr (http://cbcb.cdutcm.edu.cn/O-GlyThr/) was developed to assist glycobiologists in the research associated with glycosylation structure and function.
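The abstract above reports AUC values obtained through 5-fold cross-validation. The following is a minimal, library-free sketch of those two pieces, not the O-GlyThr code: a contiguous k-fold index splitter and an AUC computed as the fraction of positive/negative pairs ranked correctly. The names `k_fold_indices` and `auc`, and the toy labels and scores, are illustrative assumptions.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. Shuffling and stratification
    are omitted here for brevity.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


folds = list(k_fold_indices(12, k=5))
fold_sizes = [len(test) for _, test in folds]
toy_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
print(fold_sizes, toy_auc)
```

In a real evaluation, a classifier would be trained on each `train` split, scored on the matching `test` split, and the per-fold AUC values averaged.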
Affiliation(s)
- Hua Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Qiang Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Qian Zhang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Pengmian Feng
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
5
Park S, Chin-Hun Kuo J, Reesink HL, Paszek MJ. Recombinant mucin biotechnology and engineering. Adv Drug Deliv Rev 2023; 193:114618. [PMID: 36375719 PMCID: PMC10253230 DOI: 10.1016/j.addr.2022.114618] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/05/2022] [Revised: 10/14/2022] [Accepted: 11/04/2022] [Indexed: 11/13/2022]
Abstract
Mucins represent a largely untapped class of polymeric building block for biomaterials, therapeutics, and other biotechnology. Because the mucin polymer backbone is genetically encoded, sequence-specific mucins with defined physical and biochemical properties can be fabricated using recombinant technologies. The pendent O-glycans of mucins are increasingly implicated in immunomodulation, suppression of pathogen virulence, and other biochemical activities. Recent advances in engineered cell production systems are enabling the scalable synthesis of recombinant mucins with precisely tuned glycan side chains, offering exciting possibilities to tune the biological functionality of mucin-based products. New metabolic and chemoenzymatic strategies enable further tuning and functionalization of mucin O-glycans, opening new possibilities to expand the chemical diversity and functionality of mucin building blocks. In this review, we discuss these advances, and the opportunities for engineered mucins in biomedical applications ranging from in vitro models to therapeutics.
Affiliation(s)
- Sangwoo Park
  - Field of Biophysics, Cornell University, Ithaca, NY 14853, USA
- Joe Chin-Hun Kuo
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA
- Heidi L Reesink
  - Department of Clinical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Matthew J Paszek
  - Field of Biophysics, Cornell University, Ithaca, NY 14853, USA; Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA; Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY 14853, USA
6
Abstract
Mucin-domain glycoproteins comprise a class of proteins whose densely O-glycosylated mucin domains adopt a secondary structure with unique biophysical and biochemical properties. The canonical family of mucins is well-known to be involved in various diseases, especially cancer. Despite this, very little is known about the site-specific molecular structures and biological activities of mucins, in part because they are extremely challenging to study by mass spectrometry (MS). Here, we summarize recent advancements toward this goal, with a particular focus on mucin-domain glycoproteins as opposed to general O-glycoproteins. We summarize proteolytic digestion techniques, enrichment strategies, MS fragmentation, and intact analysis, as well as new bioinformatic platforms. In particular, we highlight mucin directed technologies such as mucin-selective proteases, tunable mucin platforms, and a mucinomics strategy to enrich mucin-domain glycoproteins from complex samples. Finally, we provide examples of targeted mucin-domain glycoproteomics that combine these techniques in comprehensive site-specific analyses of proteins. Overall, this Review summarizes the methods, challenges, and new opportunities associated with studying enigmatic mucin domains.
Affiliation(s)
- Valentina Rangel-Angarita
  - Department of Chemistry, Yale University, 275 Prospect Street, New Haven, Connecticut 06511, United States
- Stacy A. Malaker
  - Department of Chemistry, Yale University, 275 Prospect Street, New Haven, Connecticut 06511, United States