1
Lee KW, Pham NT, Min HJ, Park HW, Lee JW, Lo HE, Kwon NY, Seo J, Shaginyan I, Cho H, Wei L, Manavalan B, Jeon YJ. DOGpred: A Novel Deep Learning Framework for Accurate Identification of Human O-linked Threonine Glycosylation Sites. J Mol Biol 2025:168977. [PMID: 39900285 DOI: 10.1016/j.jmb.2025.168977] [Received: 07/31/2024] [Revised: 01/06/2025] [Accepted: 01/28/2025] [Indexed: 02/05/2025]
Abstract
O-linked glycosylation is a crucial post-translational modification that regulates protein function and biological processes. Dysregulation of this process is associated with various diseases, underscoring the need to accurately identify O-linked glycosylation sites on proteins. Current experimental methods for identifying O-linked threonine glycosylation (OTG) sites are often complex and costly. Consequently, developing computational tools that predict these sites based on protein features is crucial. Such tools can complement experimental approaches, enhancing our understanding of the role of OTG dysregulation in diseases and uncovering potential therapeutic targets. In this study, we developed DOGpred, a deep learning-based predictor for precisely identifying human OTGs using high-latent feature representations. Initially, we extracted nine different conventional feature descriptors (CFDs) and nine pre-trained protein language model (PLM)-based embeddings. Notably, each feature was encoded as a 2D tensor, capturing both the sequential and inherent feature characteristics. Subsequently, we designed a stacked convolutional neural network (CNN) module to learn spatial feature representations from CFDs and a stacked recurrent neural network (RNN) module to learn temporal feature representations from PLM-based embeddings. These features were integrated using attention-based fusion mechanisms to generate high-level feature representations for final classification. Ablation analysis and independent tests demonstrated that the optimal model (DOGpred), employing a stacked 1D CNN and a stacked attention-based RNN module with cross-attention feature fusion, achieved the best performance on the training dataset and significantly outperformed machine learning-based single-feature models and state-of-the-art methods on independent datasets. DOGpred is freely available at https://github.com/JeonRPM/DOGpred/.
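The cross-attention fusion mentioned in the abstract can be illustrated with a generic scaled dot-product attention. This is a minimal sketch and not DOGpred's actual layer: the function names, the toy feature vectors, and the omission of learned projection matrices are all assumptions made for illustration. The authors' implementation is in the repository linked above.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Each query row attends over all key rows and returns a weighted mix of value rows."""
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(feature dimension).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused

# Hypothetical toy features: 2 positions from a CNN branch, 3 from an RNN branch.
cnn_features = [[1.0, 0.0], [0.0, 1.0]]
rnn_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# CNN features act as queries; RNN features supply keys and values.
fused = cross_attention(cnn_features, rnn_features, rnn_features)
print(fused)
```

Each fused row is a convex combination of the RNN-branch rows, so one branch is summarized from the point of view of the other. A real model would typically add learned query, key, and value projections and several attention heads.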
Affiliation(s)
- Ki Wook Lee
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Nhat Truong Pham
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Hye Jung Min
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Hyun Woo Park
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Ji Won Lee
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Han-En Lo
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Na Young Kwon
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Jimin Seo
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Illia Shaginyan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Heeje Cho
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Leyi Wei
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macau
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
- Young-Jun Jeon
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Republic of Korea
2
Raju C, Sankaranarayanan K. Insights on post-translational modifications in fatty liver and fibrosis progression. Biochim Biophys Acta Mol Basis Dis 2025; 1871:167659. [PMID: 39788217 DOI: 10.1016/j.bbadis.2025.167659] [Received: 09/06/2024] [Revised: 12/20/2024] [Accepted: 01/02/2025] [Indexed: 01/12/2025]
Abstract
Metabolic dysfunction-associated steatotic liver disease [MASLD] is a pervasive multifactorial health burden. Post-translational modifications [PTMs] of amino acid residues in protein domains play pivotal roles in imparting dynamic alterations to the cellular micro milieu. Identifying novel druggable targets relies on comprehensively studying the etiology of metabolic disorders. This review article presents how the different chemical moieties of various PTMs, such as phosphorylation, methylation, ubiquitination, glutathionylation, neddylation, acetylation, SUMOylation, lactylation, crotonylation, hydroxylation, glycosylation, citrullination, S-sulfhydration and succinylation, contribute to the MASLD spectrum. Additionally, the therapeutic prospects for managing liver steatosis and hepatic fibrosis by targeting PTMs and their regulatory enzymes are summarized. This review seeks to understand the function of protein modifications in disease progression and to promote the discovery of diagnostic and prognostic markers and drug targets for MASLD management, which could also halt the progression of a catalogue of related diseases.
Affiliation(s)
- Chithra Raju
  - Ion Channel Biology Laboratory, AU-KBC Research Centre, Madras Institute of Technology Campus, Anna University, Chrompet, Chennai 600 044, Tamil Nadu, India
- Kavitha Sankaranarayanan
  - Ion Channel Biology Laboratory, AU-KBC Research Centre, Madras Institute of Technology Campus, Anna University, Chrompet, Chennai 600 044, Tamil Nadu, India
3
Pakhrin SC, Chauhan N, Khan S, Upadhyaya J, Beck MR, Blanco E. Prediction of human O-linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model. Bioinformatics 2024; 40:btae643. [PMID: 39447059 PMCID: PMC11552629 DOI: 10.1093/bioinformatics/btae643] [Received: 05/13/2024] [Revised: 10/02/2024] [Accepted: 10/23/2024] [Indexed: 10/26/2024]
Abstract
MOTIVATION O-linked glycosylation, an essential post-translational modification process in Homo sapiens, involves attaching sugar moieties to the oxygen atoms of serine and/or threonine residues. It influences various biological and cellular functions. While threonine or serine residues within protein sequences are potential sites for O-linked glycosylation, not all serine and/or threonine residues undergo this modification, underscoring the importance of characterizing its occurrence. This study presents a novel approach for predicting intracellular and extracellular O-linked glycosylation events on proteins, which are crucial for comprehending cellular processes. Two base multi-layer perceptron models were trained by leveraging a stacked generalization framework. These base models respectively use ProtT5 and Ankh O-linked glycosylation site-specific embeddings whose combined predictions are used to train the meta-multi-layer perceptron model. Trained on extensive O-linked glycosylation datasets, the stacked-generalization model demonstrated high predictive performance on independent test datasets. Furthermore, the study emphasizes the distinction between nucleocytoplasmic and extracellular O-linked glycosylation, offering insights into their functional implications that were overlooked in previous studies. By integrating the protein language model's embedding with stacked generalization techniques, this approach enhances predictive accuracy of O-linked glycosylation events and illuminates the intricate roles of O-linked glycosylation in proteomics, potentially accelerating the discovery of novel glycosylation sites. RESULTS Stack-OglyPred-PLM produces Sensitivity, Specificity, Matthews Correlation Coefficient, and Accuracy of 90.50%, 89.60%, 0.464, and 89.70%, respectively on a benchmark NetOGlyc-4.0 independent test dataset. These results demonstrate that Stack-OglyPred-PLM is a robust computational tool to predict O-linked glycosylation sites in proteins. 
AVAILABILITY AND IMPLEMENTATION The developed tool, programs, training, and test dataset are available at https://github.com/PakhrinLab/Stack-OglyPred-PLM.
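The stacked generalization idea described above (two base models, one per embedding, whose predictions train a meta-model) can be sketched as follows. This is an illustrative toy, not the Stack-OglyPred-PLM code: plain logistic regression stands in for the multi-layer perceptrons, the two random "views" stand in for ProtT5 and Ankh embeddings, and the use of out-of-fold predictions is an assumption about how leakage would commonly be avoided. All names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit a tiny logistic regression by batch gradient descent; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def out_of_fold(X, y, k=3):
    """Base-model probabilities for each sample, predicted by a model that never saw it."""
    oof = [0.0] * len(X)
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        model = train_logreg([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(test_idx, predict_proba(model, [X[i] for i in test_idx])):
            oof[i] = p
    return oof

# Hypothetical toy data: two feature "views" of the same 60 labelled sites.
random.seed(0)
y = [i % 2 for i in range(60)]
view_a = [[label + random.gauss(0, 0.3)] for label in y]
view_b = [[random.gauss(0, 1.0), label + random.gauss(0, 0.3)] for label in y]

# Level 0: one base model per view; level 1: meta-model on their stacked predictions.
meta_X = [[pa, pb] for pa, pb in zip(out_of_fold(view_a, y), out_of_fold(view_b, y))]
meta_model = train_logreg(meta_X, y)
meta_probs = predict_proba(meta_model, meta_X)
print(meta_probs[:4])
```

The meta-model only ever sees two numbers per site (one probability from each base model), which is what lets it learn how much to trust each embedding.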
Affiliation(s)
- Subash Chandra Pakhrin
  - Department of Computer Science and Engineering Technology, University of Houston-Downtown, Houston, TX 77002, United States
- Neha Chauhan
  - School of Computing, Wichita State University, Wichita, KS 67260, United States
- Salman Khan
  - Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, United States
- Jamie Upadhyaya
  - Department of Computer Science and Engineering Technology, University of Houston-Downtown, Houston, TX 77002, United States
- Moriah Rene Beck
  - Department of Chemistry and Biochemistry, Wichita State University, Wichita, KS 67260, United States
- Eduardo Blanco
  - Department of Computer Science, University of Arizona, Tucson, AZ 85721, United States
4
Pham NT, Zhang Y, Rakkiyappan R, Manavalan B. HOTGpred: Enhancing human O-linked threonine glycosylation prediction using integrated pretrained protein language model-based features and multi-stage feature selection approach. Comput Biol Med 2024; 179:108859. [PMID: 39029431 DOI: 10.1016/j.compbiomed.2024.108859] [Received: 03/20/2024] [Revised: 06/19/2024] [Accepted: 07/06/2024] [Indexed: 07/21/2024]
Abstract
O-linked glycosylation is a complex post-translational modification (PTM) in human proteins that plays a critical role in regulating various cellular metabolic and signaling pathways. In contrast to N-linked glycosylation, O-linked glycosylation lacks specific sequence features and maintains an unstable core structure. Identifying O-linked threonine glycosylation sites (OTGs) remains challenging, requiring extensive experimental tests. While bioinformatics tools have emerged for predicting OTGs, their reliance on limited conventional features and absence of well-defined feature selection strategies limit their effectiveness. To address these limitations, we introduced HOTGpred (Human O-linked Threonine Glycosylation predictor), employing a multi-stage feature selection process to identify the optimal feature set for accurately identifying OTGs. Initially, we assessed 25 different feature sets derived from various pretrained protein language model (PLM)-based embeddings and conventional feature descriptors using nine classifiers. Subsequently, we integrated the top five embeddings linearly and determined the most effective scoring function for ranking hybrid features, identifying the optimal feature set through a process of sequential forward search. Among the classifiers, the extreme gradient boosting (XGBT)-based model, using the optimal feature set (HOTGpred), achieved 92.03 % accuracy on the training dataset and 88.25 % on the balanced independent dataset. Notably, HOTGpred significantly outperformed the current state-of-the-art methods on both the balanced and imbalanced independent datasets, demonstrating its superior prediction capabilities. Additionally, SHapley Additive exPlanations (SHAP) and ablation analyses were conducted to identify the features contributing most significantly to HOTGpred. 
Finally, we developed an easy-to-navigate web server, accessible at https://balalab-skku.org/HOTGpred/, to support glycobiologists in their research on glycosylation structure and function.
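The final selection step in the abstract can be pictured as a sequential forward search over a ranked feature list. The sketch below is a simplified illustration rather than HOTGpred's pipeline: the function name, the `step` parameter, and the toy scoring function are hypothetical, and in practice the score would come from cross-validated classifier performance.

```python
def sequential_forward_search(ranked_features, evaluate, step=1):
    """Grow the feature set in ranked order and keep the best-scoring prefix."""
    best_score = float("-inf")
    best_subset = []
    for end in range(step, len(ranked_features) + 1, step):
        subset = ranked_features[:end]
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, list(subset)
    return best_subset, best_score

# Hypothetical scoring: reward informative features, penalise set size.
USEFUL = {"f1", "f3", "f4"}

def toy_score(subset):
    return sum(f in USEFUL for f in subset) - 0.1 * len(subset)

# Features are assumed to be pre-sorted by some importance score, best first.
ranked = ["f3", "f1", "f4", "f2", "f5"]
best_subset, best_score = sequential_forward_search(ranked, toy_score)
print(best_subset)
```

Because candidates are only ever prefixes of the ranked list, the quality of the ranking function determines which subsets are reachable, which is why the abstract treats the choice of scoring function as a separate step.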
Affiliation(s)
- Nhat Truong Pham
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Ying Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Rajan Rakkiyappan
  - Department of Mathematics, Bharathiar University, Coimbatore, 641046, Tamil Nadu, India
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
5
Hu F, Gao J, Zheng J, Kwoh C, Jia C. N-GlycoPred: A hybrid deep learning model for accurate identification of N-glycosylation sites. Methods 2024; 227:48-57. [PMID: 38734394 DOI: 10.1016/j.ymeth.2024.05.002] [Received: 02/28/2024] [Revised: 04/16/2024] [Accepted: 05/03/2024] [Indexed: 05/13/2024]
Abstract
Studies have shown that protein glycosylation in cells reflects the real-time dynamics of biological processes, and the occurrence and development of many diseases are closely related to protein glycosylation. Abnormal protein glycosylation can be used as a potential diagnostic and prognostic marker of a disease, as well as a therapeutic target and a new breakthrough point for exploring pathogenesis. To address the issue of significant differences in the prediction results of previous models for different species, we constructed a hybrid deep learning model N-GlycoPred on the basis of dual-layer convolution, a paired attention mechanism and BiLSTM for accurate identification of N-glycosylation sites. By adopting one-hot encoding or the AAindex, we specifically selected the optimum combination of features and deep learning frameworks for human and mouse to refine the models. Based on six independent test datasets, our N-GlycoPred model achieved an average AUC of 0.9553, which is 0.23% higher than MusiteDeep. The comparison results indicate that our model can serve as a powerful tool for N-glycosylation site prescreening for biological researchers.
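One-hot encoding, one of the two input encodings named in the abstract, turns a sequence window around a candidate site into a binary matrix. The following is a minimal sketch under stated assumptions: the window size, the all-zero treatment of padding and non-standard residues, and the example peptide are hypothetical and not taken from N-GlycoPred.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot_window(sequence, center, flank=3):
    """Return a (2*flank+1) x 20 matrix for the window centred on `center` (0-based)."""
    rows = []
    for pos in range(center - flank, center + flank + 1):
        row = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                row[idx] = 1
        # Positions past the sequence ends or non-standard residues stay all-zero.
        rows.append(row)
    return rows

# Hypothetical peptide with an N-X-S/T sequon; the candidate Asn is at index 4.
peptide = "MKTANGSLV"
window = one_hot_window(peptide, center=4, flank=3)
for row in window:
    print("".join(str(bit) for bit in row))
```

The resulting matrix has one row per window position, which is the shape a convolutional or recurrent layer would consume.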
Affiliation(s)
- Fengzhu Hu
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Jie Gao
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Jia Zheng
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Cheekeong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian 116026, China
6
Kellman BP, Mariethoz J, Zhang Y, Shaul S, Alteri M, Sandoval D, Jeffris M, Armingol E, Bao B, Lisacek F, Bojar D, Lewis NE. Decoding glycosylation potential from protein structure across human glycoproteins with a multi-view recurrent neural network. bioRxiv 2024:2024.05.15.594334. [PMID: 38798633 PMCID: PMC11118808 DOI: 10.1101/2024.05.15.594334] [Indexed: 05/29/2024]
Abstract
Glycosylation is described as a non-templated biosynthesis. Yet, the template-free premise is antithetical to the observation that different N-glycans are consistently placed at specific sites. It has been proposed that glycosite-proximal protein structures could constrain glycosylation and explain the observed microheterogeneity. Using site-specific glycosylation data, we trained a hybrid neural network to parse glycosites (recurrent neural network) and match them to feasible N-glycosylation events (graph neural network). From glycosite-flanking sequences, the algorithm predicts most human N-glycosylation events documented in the GlyConnect database and proposes structures corresponding to the observed monosaccharide composition of the glycans at these sites. The algorithm also recapitulated glycosylation in Enhanced Aromatic Sequons, SARS-CoV-2 spike, and IgG3 variants, demonstrating its ability to predict both glycan structure and abundance. Thus, protein structure constrains glycosylation, and the neural network enables predictive in silico glycosylation of uncharacterized or novel protein sequences and genetic variants.
Affiliation(s)
- Benjamin P. Kellman
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
  - Augment Biologics, La Jolla, CA 92092
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
- Julien Mariethoz
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
- Yujie Zhang
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Sigal Shaul
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Mia Alteri
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Daniel Sandoval
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Mia Jeffris
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Erick Armingol
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Bokan Bao
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Frederique Lisacek
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
  - Computer Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
- Daniel Bojar
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Nathan E. Lewis
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
7
Nunes MJ, Carvalho AN, Rosa AI, Videira PA, Gama MJ, Rodrigues E, Castro-Caldas M. Altered expression of Sialyl Lewis X in experimental models of Parkinson's disease. J Mol Med (Berl) 2024; 102:365-377. [PMID: 38197965 PMCID: PMC10879467 DOI: 10.1007/s00109-023-02415-3] [Received: 05/16/2023] [Revised: 12/07/2023] [Accepted: 12/22/2023] [Indexed: 01/11/2024]
Abstract
The mechanisms underlying neurodegeneration in Parkinson's disease (PD) are still not fully understood. Glycosylation is an important post-translational modification that affects protein function, cell-cell contacts and inflammation and can be modified in pathologic conditions. Although the involvement of aberrant glycosylation has been proposed for PD, the knowledge of the diversity of glycans and their role in PD is still minimal. Sialyl Lewis X (sLeX) is a sialylated and fucosylated tetrasaccharide with essential roles in cell-to-cell recognition processes. Pathological conditions and pro-inflammatory mediators can up-regulate sLeX expression on cell surfaces, which has important consequences in intracellular signalling and immune function. Here, we investigated the expression of this glycan using in vivo and in vitro models of PD. We show the activation of deleterious glycation-related pathways in mouse striatum upon treatment with 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP), a toxin-based model of PD. Importantly, our results show that MPTP triggers the presentation of more proteins decorated with sLeX in mouse cortex and striatum in a time-dependent manner, as well as increased mRNA expression of its rate-limiting enzyme fucosyltransferase 7. sLeX is expressed in neurons, including dopaminergic neurons, and microglia. Although the underlying mechanism that drives increased sLeX epitopes, the nature of the protein scaffolds and their functional importance in PD remain unknown, our data suggest for the first time that sLeX in the brain may have a role in neuronal signalling and immunomodulation in pathological conditions. KEY MESSAGES: MPTP triggers the presentation of proteins decorated with sLeX in mouse brain. MPTP triggers the expression of sLeX rate-limiting enzyme FUT 7 in striatum. sLeX is expressed in neurons, including dopaminergic neurons, and microglia. sLeX in the brain may have a role in neuronal signalling and immunomodulation.
Affiliation(s)
- Maria João Nunes
  - Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003, Lisbon, Portugal
- Andreia Neves Carvalho
  - Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003, Lisbon, Portugal
- Alexandra I Rosa
  - Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003, Lisbon, Portugal
- Paula A Videira
  - Department of Life Sciences, UCIBIO, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal
  - CDG & Allies - Professionals and Patient Associations International Network (CDG & Allies - PPAIN), NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal
- Maria João Gama
  - Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003, Lisbon, Portugal
- Elsa Rodrigues
  - Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003, Lisbon, Portugal
- Margarida Castro-Caldas
  - Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003, Lisbon, Portugal
  - Department of Life Sciences, UCIBIO, NOVA School of Science and Technology, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal
8
Ertelt M, Mulligan VK, Maguire JB, Lyskov S, Moretti R, Schiffner T, Meiler J, Schoeder CT. Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of proteins. PLoS Comput Biol 2024; 20:e1011939. [PMID: 38484014 PMCID: PMC10965067 DOI: 10.1371/journal.pcbi.1011939] [Received: 06/18/2023] [Revised: 03/26/2024] [Accepted: 02/20/2024] [Indexed: 03/27/2024]
Abstract
Post-translational modifications (PTMs) of proteins play a vital role in their function and stability. These modifications influence protein folding, signaling, protein-protein interactions, enzyme activity, binding affinity, aggregation, degradation, and much more. To date, over 400 types of PTMs have been described, representing chemical diversity well beyond the genetically encoded amino acids. Such modifications pose a challenge to the successful design of proteins, but also represent a major opportunity to diversify the protein engineering toolbox. To this end, we first trained artificial neural networks (ANNs) to predict eighteen of the most abundant PTMs, including protein glycosylation, phosphorylation, methylation, and deamidation. In a second step, these models were implemented inside the computational protein modeling suite Rosetta, which allows flexible combination with existing protocols to model the modified sites and understand their impact on protein stability as well as function. Lastly, we developed a new design protocol that either maximizes or minimizes the predicted probability of a particular site being modified. We find that this combination of ANN prediction and structure-based design can enable the modification of existing, as well as the introduction of novel, PTMs. The potential applications of our work include, but are not limited to, glycan masking of epitopes, strengthening protein-protein interactions through phosphorylation, as well as protecting proteins from deamidation liabilities. These applications are especially important for the design of new protein therapeutics where PTMs can drastically change the therapeutic properties of a protein. Our work adds novel tools to Rosetta's protein engineering toolbox that allow for the rational design of PTMs.
Affiliation(s)
- Moritz Ertelt
  - Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany
- Vikram Khipple Mulligan
  - Center for Computational Biology, Flatiron Institute, New York, New York, United States of America
- Jack B. Maguire
  - Program in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Sergey Lyskov
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Rocco Moretti
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
  - Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Torben Schiffner
  - Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany
- Jens Meiler
  - Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
  - Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Clara T. Schoeder
  - Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany
9
Santamaria S. Web-Based Resources to Investigate Protease Function. Methods Mol Biol 2024; 2747:1-18. [PMID: 38038927 DOI: 10.1007/978-1-0716-3589-6_1] [Indexed: 12/02/2023]
Abstract
In 2001, the release of the first draft of the human genome marked the beginning of the Big Data era for biological sciences. Since then, the complexity of datasets generated by laboratories worldwide has increased exponentially. Public repositories such as the Protein Data Bank, which exceeded 200,000 entries in 2023, have been instrumental not only in collecting, organizing, and distilling this enormous research output but also in promoting further research enterprises. The achievements of artificial intelligence programs such as AlphaFold would not have been possible without the collective efforts of countless researchers who made their work publicly available. Here, I provide a practical, but far from exhaustive, list of resources useful to investigate protease function.
Affiliation(s)
- Salvatore Santamaria
  - Department of Biochemical Sciences, School of Biosciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, UK
10
Hou X, Wang Y, Bu D, Wang Y, Sun S. EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction. Bioinformatics 2023; 39:btad650. [PMID: 37930896 PMCID: PMC10627407 DOI: 10.1093/bioinformatics/btad650] [Received: 07/10/2023] [Revised: 09/14/2023] [Indexed: 11/08/2023]
Abstract
MOTIVATION N-linked glycosylation is a frequently occurring post-translational protein modification that serves critical functions in protein folding, stability, trafficking, and recognition. It is involved in multiple biological processes, and alterations to it can result in various diseases. Therefore, identifying N-linked glycosylation sites is imperative for comprehending the mechanisms and systems underlying glycosylation. Due to the inherent experimental complexities, machine learning and deep learning have become indispensable tools for predicting these sites. RESULTS In this context, a new approach called EMNGly has been proposed. The EMNGly approach uses a pretrained protein language model (Evolutionary Scale Modeling) and a pretrained protein structure model (Inverse Folding Model) for feature extraction and a support vector machine for classification. Ten-fold cross-validation and independent tests show that this approach outperforms existing techniques. It achieves a Matthews Correlation Coefficient, sensitivity, specificity, and accuracy of 0.8282, 0.9343, 0.8934, and 0.9143, respectively, on a benchmark independent test set.
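The four metrics reported above all derive from a binary confusion matrix. The sketch below uses the standard definitions. The function name and the example counts are hypothetical and are not the counts behind the reported EMNGly figures.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when positive and negative sites are unevenly represented in a test set.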
Affiliation(s)
- Xiaoyang Hou
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yu Wang
  - Syneron Technology, Guangzhou 510000, China
- Dongbo Bu
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yaojun Wang
  - College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Shiwei Sun
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
11
|
Li C, Dan W, Li P, Xin M, Lan R, Zhu B, Chen Z, Dong W, Dang L, Zhang X, Sun S. Site-specific N-glycan changes during semen liquefaction. Anal Biochem 2023; 680:115318. [PMID: 37696464 DOI: 10.1016/j.ab.2023.115318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 07/16/2023] [Accepted: 09/07/2023] [Indexed: 09/13/2023]
Abstract
Normal liquefaction of semen is one of the key steps to ensure the smooth progress of fertilization, and glycosylation has been reported to be involved in the whole process of fertilization. Till now, it is still unclear whether and how glycosylation changes during the liquefaction process of semen. In this study, by performing a glycoproteomic analysis of human semen with the liquefaction process (liquefaction time of semen: 0 min vs 30 min) using our recently developed StrucGP software combined with the Tandem Mass Tags (TMT) based quantification, we identified 25 intact glycopeptides (IGPs) from 10 glycoproteins in semen that were significantly changed during liquefaction, including 23 up-regulated and two down-regulated. Among the 23 up-regulated glycopeptides, half were modified with sialylated glycans, suggesting that sialylated glycans may play a key role in the semen liquefaction process. The data provide an invaluable resource for further studies on the role of glycosylation during semen liquefaction.
Affiliation(s)
- Cheng Li
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Wei Dan
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Pengfei Li
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Miaomiao Xin
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Rongxia Lan
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Bojing Zhu
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Zexuan Chen
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Wenbo Dong
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Liuyi Dang
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China
- Xinwen Zhang
  - Center of Medical Genetics, Xi'an People's Hospital (Xi'an Fourth Hospital), Xi'an, Shaanxi, 710004, PR China
- Shisheng Sun
  - College of Life Sciences, Northwest University, Xi'an, Shaanxi Province, 710069, PR China.
|
12
|
Zeng Y, Yuan Z, Chen Y, Hu Y. CBDT-Oglyc: Prediction of O-glycosylation sites using ChiMIC-based balanced decision table and feature selection. J Bioinform Comput Biol 2023; 21:2350024. [PMID: 37899352 DOI: 10.1142/s0219720023500245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2023]
Abstract
O-glycosylation (Oglyc) plays an important role in various biological processes. The key to understanding the mechanisms of Oglyc is identifying the corresponding glycosylation sites. Two critical steps, feature selection and classifier design, greatly affect the accuracy of computational methods for predicting Oglyc sites. Based on an efficient feature selection algorithm and a classifier capable of handling imbalanced datasets, a new computational method, ChiMIC-based balanced decision table for O-glycosylation (CBDT-Oglyc), is proposed to predict Oglyc sites in proteins. Sequence characterization is performed by combining amino acid composition (AAC), undirected composition of k-spaced amino acid pairs (undirected-CKSAAP) and pseudo-position-specific scoring matrix (PsePSSM). The Chi-MIC-share algorithm is used for feature selection, which simplifies the model and improves predictive accuracy. For imbalanced classification, a backtracking method based on a local chi-square test is designed, and then cost-sensitive learning is incorporated to construct a novel classifier named ChiMIC-based balanced decision table (CBDT). Based on a 1:49 (positives:negatives) training set, the CBDT classifier achieves significantly better prediction performance than traditional classifiers. Moreover, the independent test results on separate human and mouse glycoproteins show that CBDT-Oglyc outperforms previous methods in global accuracy. CBDT-Oglyc shows great promise in predicting Oglyc sites and is expected to facilitate further experimental studies on protein glycosylation.
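As a rough illustration of two of the sequence encodings named in this abstract, the sketch below computes amino acid composition (AAC) and a k-spaced amino acid pair composition for a peptide window. It assumes that "undirected" means the pairs XY and YX are counted together; that reading, the function names, and the example peptide are illustrative assumptions rather than details from the CBDT-Oglyc paper.

```python
from collections import Counter
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical
# 210 unordered residue pairs, e.g. "AA", "AC", ..., "YY"
PAIRS = ["".join(p) for p in combinations_with_replacement(AMINO_ACIDS, 2)]


def aac(seq):
    """Fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def undirected_cksaap(seq, k):
    """Composition of residue pairs separated by k positions (210 values).

    Assumption: XY and YX are merged into one unordered pair. Windows that
    contain a non-standard residue are skipped.
    """
    counts = Counter()
    for i in range(len(seq) - k - 1):
        a, b = seq[i], seq[i + k + 1]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts["".join(sorted((a, b)))] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in PAIRS]


if __name__ == "__main__":
    peptide = "PSTTPAPTS"  # hypothetical window around a threonine
    features = aac(peptide) + undirected_cksaap(peptide, k=1)
    print(len(features))
```

Concatenating such vectors for several values of k grows the feature count quickly, which is the usual motivation for a feature selection step like the one the abstract describes.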
Affiliation(s)
- Ying Zeng
  - School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, Hunan, P. R. China
- Zheming Yuan
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, Hunan, P. R. China
- Yuan Chen
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, Hunan, P. R. China
- Ying Hu
  - School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, Hunan, P. R. China
|
13
|
Toul M, Slonkova V, Mican J, Urminsky A, Tomkova M, Sedlak E, Bednar D, Damborsky J, Hernychova L, Prokop Z. Identification, characterization, and engineering of glycosylation in thrombolytics. Biotechnol Adv 2023; 66:108174. [PMID: 37182613 DOI: 10.1016/j.biotechadv.2023.108174] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/14/2023] [Revised: 05/09/2023] [Accepted: 05/09/2023] [Indexed: 05/16/2023]
Abstract
Cardiovascular diseases, such as myocardial infarction, ischemic stroke, and pulmonary embolism, are the most common causes of disability and death worldwide. Blood clot hydrolysis by thrombolytic enzymes and thrombectomy are key clinical interventions. The most widely used thrombolytic enzyme is alteplase, which has been used in clinical practice since 1986. Another clinically used thrombolytic protein is tenecteplase, which has modified epitopes and engineered glycosylation sites, suggesting that carbohydrate modification in thrombolytic enzymes is a viable strategy for their improvement. This comprehensive review summarizes current knowledge on computational and experimental identification of glycosylation sites and glycan identity, together with methods used for their reengineering. Practical examples from previous studies focus on modification of glycosylations in thrombolytics, e.g., alteplase, tenecteplase, reteplase, urokinase, saruplase, and desmoteplase. Collected clinical data on these glycoproteins demonstrate the great potential of this engineering strategy. Outstanding combinatorics originating from multiple glycosylation sites and the vast variety of covalently attached glycan species can be addressed by directed evolution or rational design. Directed evolution pipelines would benefit from more efficient cell-free expression and high-throughput screening assays, while rational design must employ structure prediction by machine learning and in silico characterization by supercomputing. Perspectives on challenges and opportunities for improvement of thrombolytic enzymes by engineering and evolution of protein glycosylation are provided.
Affiliation(s)
- Martin Toul
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Veronika Slonkova
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jan Mican
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Adam Urminsky
  - Research Centre for Applied Molecular Oncology, Masaryk Memorial Cancer Institute, Zluty kopec 7, 656 53 Brno, Czech Republic
- Maria Tomkova
  - Center for Interdisciplinary Biosciences, P. J. Safarik University in Kosice, Jesenna 5, 04154 Kosice, Slovakia
- Erik Sedlak
  - Center for Interdisciplinary Biosciences, P. J. Safarik University in Kosice, Jesenna 5, 04154 Kosice, Slovakia
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Lenka Hernychova
  - Research Centre for Applied Molecular Oncology, Masaryk Memorial Cancer Institute, Zluty kopec 7, 656 53 Brno, Czech Republic.
- Zbynek Prokop
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital, Pekarska 53, 656 91 Brno, Czech Republic.
|
14
|
Tang H, Tang Q, Zhang Q, Feng P. O-GlyThr: Prediction of human O-linked threonine glycosites using multi-feature fusion. Int J Biol Macromol 2023; 242:124761. [PMID: 37156312 DOI: 10.1016/j.ijbiomac.2023.124761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 05/01/2023] [Accepted: 05/02/2023] [Indexed: 05/10/2023]
Abstract
O-linked glycosylation is one of the most complex post-translational modifications (PTM) of human proteins modulating various cellular metabolic and signaling pathways. Unlike N-glycosylation, the O-glycosylation has nonspecific sequence features and nonstable glycan core structure, which makes identification of O-glycosites more challenging either by experimental or computational methods. Biochemical experiments to identify O-glycosites in batches are technically and economically demanding. Therefore, development of computation-based methods is greatly warranted. This study constructed a prediction model based on feature fusion for O-glycosites linked to the threonine residues in Homo sapiens. In the training model, we collected and sorted out high-quality human protein data with O-linked threonine glycosites. Seven feature coding methods were fused to represent the sample sequence. By comparison of different algorithms, random forest was selected as the final classifier to construct the classification model. Through 5-fold cross-validation, the proposed model, namely O-GlyThr, performed satisfactorily on both training set (AUC: 0.9308) and independent validation dataset (AUC: 0.9323). Compared with previously published predictors, O-GlyThr achieved the highest ACC of 0.8475 on the independent test dataset. These results demonstrated the high competency of our predictor in identifying O-glycosites on threonine residues. Furthermore, a user-friendly webserver named O-GlyThr (http://cbcb.cdutcm.edu.cn/O-GlyThr/) was developed to assist glycobiologists in the research associated with glycosylation structure and function.
Affiliation(s)
- Hua Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Qiang Tang
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Qian Zhang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Pengmian Feng
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.
|
15
|
Yin X, Wang W, Seah SYK, Mine Y, Fan MZ. Deglycosylation Differentially Regulates Weaned Porcine Gut Alkaline Phosphatase Isoform Functionality along the Longitudinal Axis. Pathogens 2023; 12:pathogens12030407. [PMID: 36986329 PMCID: PMC10053101 DOI: 10.3390/pathogens12030407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 02/27/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023] Open
Abstract
Gut alkaline phosphatases (AP) dephosphorylate the lipid moiety of endotoxin and other pathogen-associated-molecular patterns members, thus maintaining gut eubiosis and preventing metabolic endotoxemia. Early weaned pigs experience gut dysbiosis, enteric diseases and growth retardation in association with decreased intestinal AP functionality. However, the role of glycosylation in modulation of the weaned porcine gut AP functionality is unclear. Herein three different research approaches were taken to investigate how deglycosylation affected weaned porcine gut AP activity kinetics. In the first approach, weaned porcine jejunal AP isoform (IAP) was fractionated by the fast protein-liquid chromatography and purified IAP fractions were kinetically characterized to be the higher-affinity and lower-capacity glycosylated mature IAP (p < 0.05) in comparison with the lower-affinity and higher-capacity non-glycosylated pre-mature IAP. The second approach enzyme activity kinetic analyses showed that N-deglycosylation of AP by the peptide N-glycosidase-F enzyme reduced (p < 0.05) the IAP maximal activity in the jejunum and ileum and decreased AP affinity (p < 0.05) in the large intestine. In the third approach, the porcine IAP isoform-X1 (IAPX1) gene was overexpressed in the prokaryotic ClearColiBL21 (DE3) cell and the recombinant porcine IAPX1 was associated with reduced (p < 0.05) enzyme affinity and maximal enzyme activity. Therefore, levels of glycosylation can modulate plasticity of weaned porcine gut AP functionality towards maintaining gut microbiome and the whole-body physiological status.
Affiliation(s)
- Xindi Yin
  - Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
  - Key Laboratory of Precision Nutrition and Food Quality, Department of Nutrition and Health, China Agricultural University, Beijing 100083, China
- Weijun Wang
  - Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
  - Canadian Food Inspection Agency (CFIA)-Ontario Operation, Guelph, ON N1G 4S9, Canada
- Stephen Y. K. Seah
  - Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada
- Yoshinori Mine
  - Department of Food Science, University of Guelph, Guelph, ON N1G 2W1, Canada
- Ming Z. Fan
  - Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
  - One Health Institute, University of Guelph, Guelph, ON N1G 2W1, Canada
|
16
|
Yoodee S, Thongboonkerd V. Bioinformatics and computational analyses of kidney stone modulatory proteins lead to solid experimental evidence and therapeutic potential. Biomed Pharmacother 2023; 159:114217. [PMID: 36623450 DOI: 10.1016/j.biopha.2023.114217] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/12/2022] [Revised: 12/26/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023] Open
Abstract
In recent biomedical research, bioinformatics and computational analyses have played essential roles for examining experimental findings and database information. Several bioinformatic tools have been developed and made publicly available for analyzing protein sequence, structure, functional motif/domain, and interactions network. Such properties are very helpful to define biochemical and functional roles of the protein(s) of interest. During the past few decades, bioinformatics and computational biotechnology have been widely applied to kidney stone research. This review summarizes commonly used tools and evidence of bioinformatics and computational biotechnology applied to kidney stone disease (KSD) with special emphasis on analyses of the stone modulatory proteins that play critical roles in kidney stone formation. Such analyses lead to solid experimental evidence to demonstrate mechanisms underlying their stone modulatory activities. The findings obtained from such analyses may also lead to better understanding of KSD pathogenesis and to further development of new therapeutic and preventive strategies.
Affiliation(s)
- Sunisa Yoodee
  - Medical Proteomics Unit, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Visith Thongboonkerd
  - Medical Proteomics Unit, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand.
|
17
|
Taylor MK, Williams EP, Xue Y, Jenjaroenpun P, Wongsurawat T, Smith AP, Smith AM, Parvathareddy J, Kong Y, Vogel P, Cao X, Reichard W, Spruill-Harrell B, Samarasinghe AE, Nookaew I, Fitzpatrick EA, Smith MD, Aranha M, Smith JC, Jonsson CB. Dissecting Phenotype from Genotype with Clinical Isolates of SARS-CoV-2 First Wave Variants. Viruses 2023; 15:611. [PMID: 36992320 PMCID: PMC10059853 DOI: 10.3390/v15030611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 02/06/2023] [Accepted: 02/10/2023] [Indexed: 02/25/2023] Open
Abstract
The emergence and availability of closely related clinical isolates of SARS-CoV-2 offers a unique opportunity to identify novel nonsynonymous mutations that may impact phenotype. Global sequencing efforts show that SARS-CoV-2 variants have emerged and then been replaced since the beginning of the pandemic, yet we have limited information regarding the breadth of variant-specific host responses. Using primary cell cultures and the K18-hACE2 mouse, we investigated the replication, innate immune response, and pathology of closely related, clinical variants circulating during the first wave of the pandemic. Mathematical modeling of the lung viral replication of four clinical isolates showed a dichotomy between two B.1 isolates with significantly faster and slower infected cell clearance rates, respectively. While isolates induced several common immune host responses to infection, one B.1 isolate was unique in the promotion of eosinophil-associated proteins IL-5 and CCL11. Moreover, its mortality rate was significantly slower. Lung microscopic histopathology suggested further phenotypic divergence among the five isolates showing three distinct sets of phenotypes: (i) consolidation, alveolar hemorrhage, and inflammation, (ii) interstitial inflammation/septal thickening and peribronchiolar/perivascular lymphoid cells, and (iii) consolidation, alveolar involvement, and endothelial hypertrophy/margination. Together these findings show divergence in the phenotypic outcomes of these clinical isolates and reveal the potential importance of nonsynonymous mutations in nsp2 and ORF8.
Affiliation(s)
- Mariah K. Taylor
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Evan P. Williams
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Yi Xue
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Piroon Jenjaroenpun
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Thidathip Wongsurawat
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Amanda P. Smith
  - Department of Pediatrics, The University of Tennessee Health Science Center, Memphis, TN 38103, USA
- Amber M. Smith
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - Department of Pediatrics, The University of Tennessee Health Science Center, Memphis, TN 38103, USA
  - Institute for the Study of Host-Pathogen Systems, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Jyothi Parvathareddy
  - Regional Biocontainment Laboratory, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Ying Kong
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Peter Vogel
  - Veterinary Pathology Core Laboratory, St Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xueyuan Cao
  - Department of Health Promotion and Disease Prevention, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Walter Reichard
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Briana Spruill-Harrell
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Amali E. Samarasinghe
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - Department of Pediatrics, The University of Tennessee Health Science Center, Memphis, TN 38103, USA
- Intawat Nookaew
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Elizabeth A. Fitzpatrick
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - Institute for the Study of Host-Pathogen Systems, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Micholas Dean Smith
  - Center for Molecular Biophysics, University of Tennessee-Oak Ridge National Laboratory, Knoxville, TN 37996, USA
  - Department of Biochemistry and Cellular and Molecular Biology, The University of Tennessee-Knoxville, Knoxville, TN 37996, USA
- Michelle Aranha
  - Department of Biochemistry and Cellular and Molecular Biology, The University of Tennessee-Knoxville, Knoxville, TN 37996, USA
- Jeremy C. Smith
  - Center for Molecular Biophysics, University of Tennessee-Oak Ridge National Laboratory, Knoxville, TN 37996, USA
  - Department of Biochemistry and Cellular and Molecular Biology, The University of Tennessee-Knoxville, Knoxville, TN 37996, USA
- Colleen B. Jonsson
  - Department of Microbiology, Immunology and Biochemistry, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - Institute for the Study of Host-Pathogen Systems, University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - Regional Biocontainment Laboratory, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
|
18
|
Vilen Z, Reeves AE, Huang ML. (Glycan Binding) Activity‐Based Protein Profiling in Cells Enabled by Mass Spectrometry‐Based Proteomics. Isr J Chem 2023; 63. [PMID: 37131487 PMCID: PMC10150848 DOI: 10.1002/ijch.202200097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Abstract
The presence of glycan modifications at the cell surface and other locales positions them as key regulators of cell recognition and function. However, due to the complexity of glycosylation, the annotation of which proteins bear glycan modifications, which glycan patterns are present, and which proteins are capable of binding glycans is incomplete. Inspired by activity-based protein profiling to enrich for proteins in cells based on select characteristics, these endeavors have been greatly advanced by the development of appropriate glycan-binding and glycan-based probes. Here, we provide context for these three problems and describe how the capability of molecules to interact with glycans has enabled the assignment of proteins with specific glycan modifications or of proteins that bind glycans. Furthermore, we discuss how the integration of these probes with high resolution mass spectrometry-based technologies has greatly advanced glycoscience.
Affiliation(s)
- Zak Vilen
  - Skaggs Graduate School of Chemical and Biological Sciences, Scripps Research, 10550 N. Torrey Pines Rd., La Jolla, CA 92037, USA
  - Department of Molecular Medicine, Scripps Research, 10550 N. Torrey Pines Rd., La Jolla, CA 92037, USA
- Abigail E. Reeves
  - Skaggs Graduate School of Chemical and Biological Sciences, Scripps Research, 10550 N. Torrey Pines Rd., La Jolla, CA 92037, USA
  - Department of Molecular Medicine, Scripps Research, 10550 N. Torrey Pines Rd., La Jolla, CA 92037, USA
- Mia L. Huang
  - Skaggs Graduate School of Chemical and Biological Sciences, Scripps Research, 10550 N. Torrey Pines Rd., La Jolla, CA 92037, USA
  - Department of Molecular Medicine, Scripps Research, 10550 N. Torrey Pines Rd., La Jolla, CA 92037, USA
|
19
|
Weigle AT, Feng J, Shukla D. Thirty years of molecular dynamics simulations on posttranslational modifications of proteins. Phys Chem Chem Phys 2022; 24:26371-26397. [PMID: 36285789 PMCID: PMC9704509 DOI: 10.1039/d2cp02883b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/06/2023]
Abstract
Posttranslational modifications (PTMs) are an integral component to how cells respond to perturbation. While experimental advances have enabled improved PTM identification capabilities, the same throughput for characterizing how structural changes caused by PTMs equate to altered physiological function has not been maintained. In this Perspective, we cover the history of computational modeling and molecular dynamics simulations which have characterized the structural implications of PTMs. We distinguish results from different molecular dynamics studies based upon the timescales simulated and analysis approaches used for PTM characterization. Lastly, we offer insights into how opportunities for modern research efforts on in silico PTM characterization may proceed given current state-of-the-art computing capabilities and methodological advancements.
Affiliation(s)
- Austin T Weigle
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Jiangyan Feng
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Diwakar Shukla
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA.
|
20
|
Li H, Chiang AWT, Lewis NE. Artificial intelligence in the analysis of glycosylation data. Biotechnol Adv 2022; 60:108008. [PMID: 35738510 PMCID: PMC11157671 DOI: 10.1016/j.biotechadv.2022.108008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/18/2022]
Abstract
Glycans are complex, yet ubiquitous across biological systems. They are involved in diverse essential organismal functions. Aberrant glycosylation may lead to disease development, such as cancer, autoimmune diseases, and inflammatory diseases. Glycans, both normal and aberrant, are synthesized using extensive glycosylation machinery, and understanding this machinery can provide invaluable insights for diagnosis, prognosis, and treatment of various diseases. Increasing amounts of glycomics data are being generated thanks to advances in glycoanalytics technologies, but to maximize the value of such data, innovations are needed for analyzing and interpreting large-scale glycomics data. Artificial intelligence (AI) provides a powerful analysis toolbox in many scientific fields, and here we review state-of-the-art AI approaches on glycosylation analysis. We further discuss how models can be analyzed to gain mechanistic insights into glycosylation machinery and how the machinery shapes glycans under different scenarios. Finally, we propose how to leverage the gained knowledge for developing predictive AI-based models of glycosylation. Thus, guiding future research of AI-based glycosylation model development will provide valuable insights into glycosylation and glycan machinery.
Affiliation(s)
- Haining Li
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Austin W T Chiang
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA.
- Nathan E Lewis
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA.
|
21
|
Abstract
Artificial intelligence (AI) methods have been and are now being increasingly integrated in prediction software implemented in bioinformatics and its glycoscience branch known as glycoinformatics. AI techniques have evolved in the past decades, and their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data that are notoriously hard to produce and analyze. Nonetheless, as time goes on, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on shining a light on challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending on the discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including developments that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.
Collapse
Affiliation(s)
- Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| | - Frederique Lisacek
- Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
- Computer Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
|
22
|
Akmal MA, Hassan MA, Muhammad S, Khurshid KS, Mohamed A. An analytical study on the identification of N-linked glycosylation sites using machine learning model. PeerJ Comput Sci 2022; 8:e1069. [PMID: 36262138 PMCID: PMC9575850 DOI: 10.7717/peerj-cs.1069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
N-linked glycosylation is the most common type of glycosylation; it plays a significant role in identifying various diseases such as type I diabetes and cancer and helps in drug development. Most proteins cannot perform their biological and psychological functionalities without undergoing such modification. Therefore, it is essential to identify such sites by computational techniques because of experimental limitations. This study aims to analyze and synthesize the progress made in discovering N-linked sites using machine learning methods. It also explores the performance of currently available tools for predicting such sites. Almost seventy research articles published in recognized journals in the N-linked glycosylation field were shortlisted after a rigorous filtering process. The findings of the studies are reported based on multiple aspects: publication channel, feature set construction method, training algorithm, and performance evaluation. Moreover, the literature survey has developed a taxonomy of N-linked sequence identification. Our study focuses on the performance evaluation criteria, and the importance of N-linked glycosylation motivates us to discover resources that use computational methods instead of experimental methods, given the limitations of the latter.
Affiliation(s)
- Muhammad Aizaz Akmal
- Department of Computer Science, University of Engineering and Technology, KSK, Lahore, Punjab, Pakistan
| | - Muhammad Awais Hassan
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
| | - Shoaib Muhammad
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
| | - Khaldoon S. Khurshid
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
|
23
|
Pajic P, Shen S, Qu J, May AJ, Knox S, Ruhl S, Gokcumen O. A mechanism of gene evolution generating mucin function. Sci Adv 2022; 8:eabm8757. [PMID: 36026444 PMCID: PMC9417175 DOI: 10.1126/sciadv.abm8757] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/18/2021] [Accepted: 07/12/2022] [Indexed: 05/12/2023]
Abstract
How novel gene functions evolve is a fundamental question in biology. Mucin proteins, a functionally but not evolutionarily defined group of proteins, allow the study of convergent evolution of gene function. By analyzing the genomic variation of mucins across a wide range of mammalian genomes, we propose that exonic repeats and their copy number variation contribute substantially to the de novo evolution of new gene functions. By integrating bioinformatic, phylogenetic, proteomic, and immunohistochemical approaches, we identified 15 undescribed instances of evolutionary convergence, where novel mucins originated by gaining densely O-glycosylated exonic repeat domains. Our results suggest that secreted proteins rich in proline are natural precursors for acquiring mucin function. Our findings have broad implications for understanding the role of exonic repeats in the parallel evolution of new gene functions, especially those involving protein glycosylation.
Affiliation(s)
- Petar Pajic
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14260, USA
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
| | - Shichen Shen
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Center of Excellence in Bioinformatics and Life Science, Buffalo, NY 14203, USA
| | - Jun Qu
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Center of Excellence in Bioinformatics and Life Science, Buffalo, NY 14203, USA
| | - Alison J. May
- Program in Craniofacial Biology, Department of Cell and Tissue Biology, School of Dentistry, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Sarah Knox
- Program in Craniofacial Biology, Department of Cell and Tissue Biology, School of Dentistry, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Stefan Ruhl
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
| | - Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14260, USA
|
24
|
Puranik A, Dandekar P, Jain R. Exploring the potential of machine learning for more efficient development and production of biopharmaceuticals. Biotechnol Prog 2022; 38:e3291. [PMID: 35918873 DOI: 10.1002/btpr.3291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2022] [Revised: 06/20/2022] [Accepted: 07/31/2022] [Indexed: 11/10/2022]
Abstract
Principles of Industry 4.0 direct us to predict how pharmaceutical operations and regulations may exist with automation, digitization, artificial intelligence (AI), and real time data acquisition. Machine learning (ML), a sub-discipline of AI, involves the use of statistical tools to extract the desired information either through understanding the underlying patterns in the information or by development of mathematical relationships among the critical process parameters (CPPs) and critical quality attributes (CQAs) of biopharmaceuticals. ML is still in its infancy for directly supporting the quality-by-design based development and manufacturing of biopharmaceuticals. However, adoption of ML-based models in place of conventional multi-variate-data-analysis (MVDA) is increasing with the accumulation of large-scale data. This has been majorly contributed by the real-time monitoring of process variables and quality attributes of products through the implementation of process analytical technology in biopharmaceutical manufacturing. All aspects of healthcare, from drug design to product distribution, are complex and multidimensional. Thus, ML-based approaches are being applied to achieve sophistication, accuracy, flexibility and agility in all these areas. This review discusses the potential of ML for addressing the complex issues in diverse areas of biopharmaceutical development, such as biopharmaceuticals design and assessment of early stage development, upstream and downstream process development, analysis, characterization and prediction of post translational modifications (PTMs), formulation and stability studies. Moreover, the challenges in acquisition, cleaning and structuring the bioprocess data, which is one of the major hurdles in implementation of ML in biopharma industry, have also been discussed. Regulatory perspectives on implementation of AI/ML in the biopharma sector have also been briefly discussed. 
This article offers a bird's-eye view of recent developments and applications of ML in overcoming the challenges of adopting "Industry 4.0" in the biopharma industry.
Affiliation(s)
- Amita Puranik
- Department of Chemical Engineering, Institute of Chemical Technology, Matunga, Mumbai, India
| | - Prajakta Dandekar
- Department of Pharmaceutical Sciences and Technology, Institute of Chemical Technology, Matunga, Mumbai, India
| | - Ratnesh Jain
- Department of Chemical Engineering, Institute of Chemical Technology, Matunga, Mumbai, India
|
25
|
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
|
26
|
A convolutional neural network based tool for predicting protein AMPylation sites from binary profile representation. Sci Rep 2022; 12:11451. [PMID: 35794165 PMCID: PMC9259580 DOI: 10.1038/s41598-022-15403-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2021] [Accepted: 06/23/2022] [Indexed: 11/09/2022] Open
Abstract
AMPylation is an emerging post-translational modification that occurs on the hydroxyl group of threonine, serine, or tyrosine via a phosphodiester bond. AMPylators catalyze this process as the covalent attachment of adenosine monophosphate to the amino acid side chain of a peptide. Recent studies have shown that this post-translational modification is directly responsible for the regulation of neurodevelopment and neurodegeneration and is also involved in many physiological processes. Despite the importance of this post-translational modification, there is no peptide sequence dataset available for conducting computational analysis. Therefore, so far, no computational approach has been proposed for predicting AMPylation. In this study, we introduce a new dataset of this distinct post-translational modification and develop a new machine learning tool, called DeepAmp, that uses a deep convolutional neural network to predict AMPylation sites in proteins. DeepAmp achieves 77.7%, 79.1%, 76.8%, 0.55, and 0.85 in terms of Accuracy, Sensitivity, Specificity, Matthews Correlation Coefficient, and Area Under Curve, respectively, for the AMPylation site prediction task. As the first machine learning model for this task, DeepAmp demonstrates promising results that highlight its potential to solve this problem. Our presented dataset and DeepAmp as a standalone predictor are publicly available at https://github.com/MehediAzim/DeepAmp.
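For readers comparing the figures quoted in these abstracts, the threshold-based measures (Accuracy, Sensitivity, Specificity, Matthews Correlation Coefficient) can all be derived from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not code from DeepAmp or any other cited tool; the function name `binary_metrics` and the example counts are hypothetical. Area Under Curve is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard site-prediction metrics from confusion-matrix counts.

    tp/fn: modified sites predicted as positive/negative.
    tn/fp: unmodified sites predicted as negative/positive.
    """
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, chosen only for illustration.
metrics = binary_metrics(tp=80, fp=25, tn=75, fn=20)
```

Unlike accuracy, the Matthews Correlation Coefficient stays informative when positive and negative sites are imbalanced, which is presumably why several of the listed predictors report it.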
|
27
|
Puranik A, Saldanha M, Chirmule N, Dandekar P, Jain R. Advanced strategies in glycosylation prediction and control during biopharmaceutical development: Avenues toward Industry 4.0. Biotechnol Prog 2022; 38:e3283. [PMID: 35752935 DOI: 10.1002/btpr.3283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Revised: 05/31/2022] [Accepted: 06/17/2022] [Indexed: 11/09/2022]
Abstract
Glycosylation has been shown to define the safety and efficacy of biopharmaceuticals and is thus classified as a critical quality attribute. However, controlling glycan heterogeneity has always been a major challenge owing to the multivariate factors that govern the glycosylation process. Conventional approaches for controlling glycosylation, such as gene editing and metabolic control, have succeeded in obtaining desired glycan profiles in accordance with the Quality by Design paradigm. Moreover, the development of smart algorithms and omics-enabled complete cell characterization has made it possible to predict glycan profiles beforehand and manipulate process variables accordingly. This review thus discusses the various approaches available for the control and prediction of glycosylation in biopharmaceuticals. Further, the futuristic goal of integrating such technologies is discussed in order to attain an automated and digitized continuous bioprocess for the control of glycosylation. Given that control of a process as complex as glycosylation requires intensive monitoring and intervention, we examine the current technologies that enable automation. Finally, we discuss the challenges and the technological gap that currently limit the incorporation of an automated process in routine bio-manufacturing, with a glimpse into the economic bearing.
Affiliation(s)
- Amita Puranik
- Department of Chemical Engineering, Institute of Chemical Technology, Matunga, Mumbai, India
| | - Marianne Saldanha
- Department of Chemical Engineering, Institute of Chemical Technology, Matunga, Mumbai, India
| | - Prajakta Dandekar
- Department of Pharmaceutical Sciences and Technology, Institute of Chemical Technology, Matunga, Mumbai, India
| | - Ratnesh Jain
- Department of Chemical Engineering, Institute of Chemical Technology, Matunga, Mumbai, India
|
28
|
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. Methods Mol Biol 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes that gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification of the polypeptide chain and proteolytic cleavage. Understanding and characterizing PTMs is a fundamental step toward understanding the underpinnings of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Recently, owing to advances in the field of deep learning (DL) and the explosion of DL applications in various fields, the computational prediction of PTMs has also witnessed the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we review recent advances in a less-studied PTM, namely proteolytic cleavage prediction. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches to PTM prediction.
|
29
|
Flevaris K, Kontoravdi C. Immunoglobulin G N-glycan Biomarkers for Autoimmune Diseases: Current State and a Glycoinformatics Perspective. Int J Mol Sci 2022; 23:5180. [PMID: 35563570 PMCID: PMC9100869 DOI: 10.3390/ijms23095180] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2022] [Revised: 05/02/2022] [Accepted: 05/04/2022] [Indexed: 02/04/2023] Open
Abstract
The effective treatment of autoimmune disorders can greatly benefit from disease-specific biomarkers that are functionally involved in immune system regulation and can be collected through minimally invasive procedures. In this regard, human serum IgG N-glycans are promising for uncovering disease predisposition and monitoring progression, and for the identification of specific molecular targets for advanced therapies. In particular, the IgG N-glycome in diseased tissues is considered to be disease-dependent; thus, specific glycan structures may be involved in the pathophysiology of autoimmune diseases. This study provides a critical overview of the literature on human IgG N-glycomics, with a focus on the identification of disease-specific glycan alterations. In order to expedite the establishment of clinically-relevant N-glycan biomarkers, the employment of advanced computational tools for the interpretation of clinical data and their relationship with the underlying molecular mechanisms may be critical. Glycoinformatics tools, including artificial intelligence and systems glycobiology approaches, are reviewed for their potential to provide insight into patient stratification and disease etiology. Challenges in the integration of such glycoinformatics approaches in N-glycan biomarker research are critically discussed.
Affiliation(s)
| | - Cleo Kontoravdi
- Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
|
30
|
Tang W, Liu D, Nie SP. Food glycomics in food science: recent advances and future perspectives. Curr Opin Food Sci 2022. [DOI: 10.1016/j.cofs.2022.100850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022]
|
31
|
Taherzadeh G, Campbell M, Zhou Y. Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins. Methods Mol Biol 2022; 2499:177-186. [PMID: 35696081 DOI: 10.1007/978-1-0716-2317-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Protein glycosylation is one of the most complex posttranslational modifications (PTMs) and plays a fundamental role in protein function. Identification and annotation of glycosylation sites using experimental approaches are challenging and time consuming. Hence, there is a demand for fast and efficient computational methods to address this problem. Here, we present the SPRINT-Gly framework, containing the largest dataset and a prediction model of glycosylation sites for a given protein sequence. In this framework, we construct a large dataset containing N- and O-linked glycosylation sites of human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of a number of trained predictive models for glycosylation sites in human and mouse proteins, separately. The method is built by incorporating sequence-based, predicted structural, and physicochemical information of the neighboring residues of each N- and O-linked glycosylation site and by training a deep learning neural network and a support vector machine as classifiers. SPRINT-Gly outperformed other existing methods by achieving 18% and 50% higher Matthews correlation coefficients for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/.
Affiliation(s)
- Ghazaleh Taherzadeh
- Department of Mathematics and Computer Science, Wilkes University, Wilkes-Barre, PA, USA.
| | - Matthew Campbell
- Institute for Glycomics, Griffith University, Southport, QLD, Australia
| | - Yaoqi Zhou
- Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China
|
32
|
Dehzangi I, Sharma A, Shatabda S. iProtGly-SS: A Tool to Accurately Predict Protein Glycation Site Using Structural-Based Features. Methods Mol Biol 2022; 2499:125-134. [PMID: 35696077 DOI: 10.1007/978-1-0716-2317-6_5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Posttranslational modification (PTM) is an important biological mechanism that promotes functional diversity among proteins. So far, a wide range of PTMs has been identified. Among them, glycation is considered one of the most important. Glycation is associated with different neurological disorders, including Parkinson's and Alzheimer's diseases. It has also been shown to be responsible for other diseases, including vascular complications of diabetes mellitus. Despite all the efforts made so far, the prediction performance of computational methods for glycation sites remains limited. Here we present a newly developed machine learning tool called iProtGly-SS that utilizes sequential and structural information together with a Support Vector Machine (SVM) classifier to enhance lysine glycation site prediction accuracy. The performance of iProtGly-SS was investigated using the three most popular benchmarks used for this task. Our results demonstrate that iProtGly-SS achieves 81.61%, 93.62%, and 92.95% prediction accuracies on these benchmarks, which are significantly better than the results reported in previous studies. iProtGly-SS is implemented as a web-based tool that is publicly available at http://brl.uiu.ac.bd/iprotgly-ss/.
Affiliation(s)
- Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, USA.
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA.
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia.
- Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo, Japan.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
|
33
|
Aoki-Kinoshita KF. Functions of Glycosylation and Related Web Resources for Its Prediction. Methods Mol Biol 2022; 2499:135-144. [PMID: 35696078 DOI: 10.1007/978-1-0716-2317-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Glycosylation involves the attachment of carbohydrate sugar chains, or glycans, onto an amino acid residue of a protein. These glycans are often branched structures and serve to modulate the function of proteins. Glycans are synthesized through a complex process of enzymatic reactions that occur in the Golgi apparatus in mammalian systems. Because there is currently no sequencer for glycans, technologies such as mass spectrometry are used to characterize the glycans in a biological sample to ascertain its glycome. This is a tedious process that requires high levels of expertise and equipment. Thus, the enzymes that work on glycans, called glycogenes or glycoenzymes, have been studied to better understand glycan function. With the development of glycan-related databases and a glycan repository, bioinformatics approaches have attempted to predict the glycosylation pathway and the glycosylation sites on proteins. This chapter introduces these methods and related Web resources for understanding glycan function.
|
34
|
Sobitan A, Mahase V, Rhoades R, Williams D, Liu D, Xie Y, Li L, Tang Q, Teng S. Computational Saturation Mutagenesis of SARS-CoV-1 Spike Glycoprotein: Stability, Binding Affinity, and Comparison With SARS-CoV-2. Front Mol Biosci 2021; 8:784303. [PMID: 34957216 PMCID: PMC8696472 DOI: 10.3389/fmolb.2021.784303] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Accepted: 11/18/2021] [Indexed: 12/24/2022] Open
Abstract
Severe Acute respiratory syndrome coronavirus (SARS-CoV-1) attaches to the host cell surface to initiate the interaction between the receptor-binding domain (RBD) of its spike glycoprotein (S) and the human Angiotensin-converting enzyme (hACE2) receptor. SARS-CoV-1 mutates frequently because of its RNA genome, which challenges the antiviral development. Here, we per-formed computational saturation mutagenesis of the S protein of SARS-CoV-1 to identify the residues crucial for its functions. We used the structure-based energy calculations to analyze the effects of the missense mutations on the SARS-CoV-1 S stability and the binding affinity with hACE2. The sequence and structure alignment showed similarities between the S proteins of SARS-CoV-1 and SARS-CoV-2. Interestingly, we found that target mutations of S protein amino acids generate similar effects on their stabilities between SARS-CoV-1 and SARS-CoV-2. For example, G839W of SARS-CoV-1 corresponds to G857W of SARS-CoV-2, which decrease the stability of their S glycoproteins. The viral mutation analysis of the two different SARS-CoV-1 isolates showed that mutations, T487S and L472P, weakened the S-hACE2 binding of the 2003–2004 SARS-CoV-1 isolate. In addition, the mutations of L472P and F360S destabilized the 2003–2004 viral isolate. We further predicted that many mutations on N-linked glycosylation sites would increase the stability of the S glycoprotein. Our results can be of therapeutic importance in the design of antivirals or vaccines against SARS-CoV-1 and SARS-CoV-2.
Affiliation(s)
- Adebiyi Sobitan
- Department of Biology, Howard University, Washington, DC, United States
| | - Vidhyanand Mahase
- Department of Biology, Howard University, Washington, DC, United States
| | - Raina Rhoades
- Department of Biology, Howard University, Washington, DC, United States
| | - Dejaun Williams
- Department of Biology, Howard University, Washington, DC, United States
| | - Dongxiao Liu
- Howard University College of Medicine, Washington, DC, United States
| | - Yixin Xie
- Computational Science Program, University of Texas at El Paso, El Paso, TX, United States
| | - Lin Li
- Computational Science Program, University of Texas at El Paso, El Paso, TX, United States.,Physics Department, University of Texas at El Paso, El Paso, TX, United States
| | - Qiyi Tang
- Howard University College of Medicine, Washington, DC, United States
| | - Shaolei Teng
- Department of Biology, Howard University, Washington, DC, United States
| |
|
35
|
Pakhrin SC, Aoki-Kinoshita KF, Caragea D, KC DB. DeepNGlyPred: A Deep Neural Network-Based Approach for Human N-Linked Glycosylation Site Prediction. Molecules 2021; 26:molecules26237314. [PMID: 34885895 PMCID: PMC8658957 DOI: 10.3390/molecules26237314] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022] Open
Abstract
Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most of the existing predictors for N-linked glycosylation utilize the information that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon. DeepNGlyPred will be a useful resource for the glycobiology community.
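The sequon rule stated in this abstract (N-X-[S/T], with X any residue except proline) can be illustrated with a short scan over a protein sequence. This is only a minimal sketch of candidate-site enumeration, not DeepNGlyPred's code: the function name `find_sequons` and the example sequence are hypothetical, and a match only marks a candidate asparagine, since the abstract notes that the sequon is necessary but not sufficient for glycosylation.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
# [^P] encodes "X is any residue except proline"; [ST] is serine or threonine.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-[S/T] sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence: N2 (NGT), N9 (NNT) and N10 (NTS) match; N6 (NPS) does not.
print(find_sequons("MNGTANPSNNTS"))  # [2, 9, 10]
```

A predictor restricted to sequons would then score only the positions returned here, rather than every asparagine in the sequence.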
Affiliation(s)
- Subash C. Pakhrin
- School of Computing, Wichita State University, 1845 Fairmount St., Wichita, KS 67260, USA;
| | | | - Doina Caragea
- Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA;
| | - Dukka B. KC
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, USA
- Correspondence: ; Tel.: +1-906-487-1657
| |
|
36
|
Modeling coronavirus spike protein dynamics: implications for immunogenicity and immune escape. Biophys J 2021; 120:5592-5618. [PMID: 34767789 PMCID: PMC8577870 DOI: 10.1016/j.bpj.2021.11.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Revised: 03/19/2021] [Accepted: 11/04/2021] [Indexed: 12/23/2022] Open
Abstract
The ongoing COVID-19 pandemic is a global public health emergency requiring urgent development of efficacious vaccines. While concentrated research efforts have focused primarily on antibody-based vaccines that neutralize SARS-CoV-2, and several first-generation vaccines have either been approved or received emergency use authorization, it is forecasted that COVID-19 will become an endemic disease requiring updated second-generation vaccines. The SARS-CoV-2 surface spike (S) glycoprotein represents a prime target for vaccine development because antibodies that block viral attachment and entry, i.e., neutralizing antibodies, bind almost exclusively to the receptor-binding domain. Here, we develop computational models for a large subset of S proteins associated with SARS-CoV-2, implemented through coarse-grained elastic network models and normal mode analysis. We then analyze local protein domain dynamics of the S protein systems and their thermal stability to characterize structural and dynamical variability among them. These results are compared against existing experimental data and used to elucidate the impact and mechanisms of SARS-CoV-2 S protein mutations and their associated antibody binding behavior. We construct a SARS-CoV-2 antigenic map and offer predictions about the neutralization capabilities of antibody and S mutant combinations based on protein dynamic signatures. We then compare SARS-CoV-2 S protein dynamics to SARS-CoV and MERS-CoV S proteins to investigate differing antibody binding and cellular fusion mechanisms that may explain the high transmissibility of SARS-CoV-2. The outbreaks associated with SARS-CoV, MERS-CoV, and SARS-CoV-2 over the last two decades suggest that the threat presented by coronaviruses is ever-changing and long term. 
Our results provide insights into the dynamics-driven mechanisms of immunogenicity associated with coronavirus S proteins and present a new, to our knowledge, approach to characterize and screen potential mutant candidates for immunogen design, as well as to characterize emerging natural variants that may escape vaccine-induced antibody responses.
|
37
|
Characterization of M116.1p, a murine cytomegalovirus protein required for efficient infection of mononuclear phagocytes. J Virol 2021; 96:e0087621. [PMID: 34705561 DOI: 10.1128/jvi.00876-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Broad tissue tropism of cytomegaloviruses (CMVs) is facilitated by different glycoprotein entry complexes, which are conserved between human CMV (HCMV) and murine CMV (MCMV). Among the wide array of cell types susceptible to the infection, mononuclear phagocytes (MNPs) play a unique role in the pathogenesis of the infection as they contribute both to the virus spread and immune control. CMVs have dedicated numerous genes for the efficient infection and evasion of macrophages and dendritic cells. In this study, we have characterized the properties and function of M116, a previously poorly described but highly transcribed MCMV gene region which encodes M116.1p, a novel protein necessary for the efficient infection of MNPs and viral spread in vivo. Our study further revealed that M116.1p shares similarities with its positional homologs in HCMV and RCMV, UL116 and R116, respectively, such as late kinetics of expression, N-glycosylation, localization to the virion assembly compartment, and interaction with gH, a member of the CMV fusion complex. This study, therefore, expands our knowledge about virally encoded glycoproteins that play important roles in viral infectivity and tropism. Importance: Human cytomegalovirus (HCMV) is a species-specific herpesvirus that causes severe disease in immunocompromised individuals and immunologically immature neonates. Murine cytomegalovirus (MCMV) is biologically similar to HCMV, and it serves as a widely used model for studying the infection, pathogenesis, and immune responses to HCMV. In our previous work, we have identified the M116 ORF as one of the most extensively transcribed regions of the MCMV genome without an assigned function. This study shows that the M116 locus codes for a novel protein, M116.1p, which shares similarities with UL116 and R116 in HCMV and RCMV, respectively, and is required for the efficient infection of mononuclear phagocytes and virus spread in vivo.
Furthermore, this study establishes the α-M116 monoclonal antibody and MCMV mutants lacking M116, generated in this work, as valuable tools for studying the role of macrophages and dendritic cells in limiting CMV infection following different MCMV administration routes.
|
38
|
Nguyen TTD, Le NQK, Tran TA, Pham DM, Ou YY. Incorporating a transfer learning technique with amino acid embeddings to efficiently predict N-linked glycosylation sites in ion channels. Comput Biol Med 2021; 130:104212. [PMID: 33454535 DOI: 10.1016/j.compbiomed.2021.104212] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2020] [Revised: 12/21/2020] [Accepted: 01/04/2021] [Indexed: 11/27/2022]
Abstract
Glycosylation is a dynamic enzymatic process that attaches glycans to proteins or other organic molecules such as lipoproteins. Research has shown that this process plays a fundamental role in modulating the functions of ion channel proteins. This study used a computational method to predict N-linked glycosylation sites, the most common type, in ion channel proteins. From segments of ion channel proteins centered around N-linked glycosylation sites, the amino acid embedding vectors of each residue were concatenated to create features for prediction. We experimented with two different models for converting amino acids to their corresponding embeddings: one was fed with ion channel sequences and the other with a large dataset composed of more than one million protein sequences. The latter model stemmed from the idea of transfer learning and emerged as a more efficient feature extractor. Our best model was obtained from this transfer learning approach and a hyperparameter tuning process with a random search on 5-fold cross-validation data. It achieved an accuracy, specificity, sensitivity, and Matthews correlation coefficient of 93.4%, 92.8%, 98.6%, and 0.726, respectively. Corresponding scores on an independent test were 92.9%, 92.2%, 99%, and 0.717. These results outperform the position-specific scoring matrix features that are predominantly employed in post-translational modification site predictions. Furthermore, compared to N-GlyDE, GlycoEP, and SPRINT-Gly, the most recent N-linked glycosylation site predictors, our model yields higher scores on the above 4 metrics, thus further demonstrating the efficiency of our approach.
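The four metrics reported in this abstract (accuracy, specificity, sensitivity, and Matthews correlation coefficient) all derive from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are illustrative assumptions, not values from the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Product of the four marginal sums; zero when any row or column is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # By convention, MCC is reported as 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts chosen for easy arithmetic.
print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

Because MCC uses all four cells of the confusion matrix, it can stay moderate on imbalanced data even when accuracy and sensitivity are both high.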
Affiliation(s)
| | - Nguyen-Quoc-Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 106, Taiwan
| | | | - Dinh-Minh Pham
- Institute of Biotechnology, Vietnam Academy of Science and Technology, Hanoi, Viet Nam
| | - Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan.
| |
|
39
|
Calmodulin Supports TRPA1 Channel Association with Opioid Receptors and Glutamate NMDA Receptors in the Nervous Tissue. Int J Mol Sci 2020; 22:ijms22010229. [PMID: 33379368 PMCID: PMC7795679 DOI: 10.3390/ijms22010229] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2020] [Revised: 12/21/2020] [Accepted: 12/24/2020] [Indexed: 12/24/2022] Open
Abstract
Transient receptor potential ankyrin member 1 (TRPA1) belongs to the family of thermo TRP cation channels that detect harmful temperatures, acids and numerous chemical pollutants. TRPA1 is expressed in nervous tissue, where it participates in the genesis of nociceptive signals in response to noxious stimuli and mediates mechanical hyperalgesia and allodynia associated with different neuropathies. The glutamate N-methyl-d-aspartate receptor (NMDAR), which plays a relevant role in allodynia to mechanical stimuli, is connected via histidine triad nucleotide-binding protein 1 (HINT1) and type 1 sigma receptor (σ1R) to mu-opioid receptors (MORs), which mediate the most potent pain relief. Notably, neuropathic pain causes a reduction in MOR antinociceptive efficacy, which can be reversed by blocking spinal NMDARs and TRPA1 channels. Thus, we studied whether TRPA1 channels form complexes with MORs and NMDARs that may be implicated in the aforementioned nociceptive signals. Our data suggest that TRPA1 channels functionally associate with MORs, delta opioid receptors and NMDARs in the dorsal root ganglia, the spinal cord and brain areas. These associations were altered in response to pharmacological interventions and the induction of inflammatory and also neuropathic pain. The MOR-TRPA1 and NMDAR-TRPA1 associations do not require HINT1 or σ1R but appear to be mediated by calcium-activated calmodulin. Thus, TRPA1 channels may associate with NMDARs to promote ascending acute and chronic pain signals and to control MOR antinociception.
|
40
|
Insights into Bioinformatic Applications for Glycosylation: Instigating an Awakening towards Applying Glycoinformatic Resources for Cancer Diagnosis and Therapy. Int J Mol Sci 2020; 21:ijms21249336. [PMID: 33302373 PMCID: PMC7762546 DOI: 10.3390/ijms21249336] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Revised: 11/26/2020] [Accepted: 12/01/2020] [Indexed: 01/10/2023] Open
Abstract
Glycosylation plays a crucial role in various diseases and their etiology. This has led to a clear understanding of the functions of carbohydrates in cell communication, which eventually will result in novel therapeutic approaches for the treatment of various diseases. Glycomics has now become one of the top ten technologies that will change the future. The direct implication of glycosylation as a hallmark of cancer and for cancer therapy is well established. As in proteomics, where bioinformatics tools have led to revolutionary achievements, bioinformatics resources for glycosylation have improved its practical implication. Bioinformatics tools, algorithms and databases are a mandatory requirement to manage and successfully analyze the large amounts of glycobiological data generated from glycosylation studies. This review consolidates all the available tools and their applications in glycosylation research. The achievements made through the use of bioinformatics in glycosylation studies are also presented. The importance of glycosylation in cancer diagnosis and therapy is discussed and the gap in the application of widely available glyco-informatic tools for cancer research is highlighted. This review is expected to bring an awakening amongst glyco-informaticians as well as cancer biologists to bridge this gap and to exploit the available glyco-informatic tools for cancer.
|
41
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen‐Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
| | - Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
42
|
Affiliation(s)
- Hayden Wilkinson
- NIBRT GlycoScience Group, National Institute for Bioprocessing, Research and Training, Blackrock, Dublin, Ireland
- CÚRAM, SFI Research Centre for Medical Devices, National University of Ireland, Galway, Ireland
- UCD School of Medicine, College of Health and Agricultural Science, University College Dublin, Dublin, Ireland
| | - Radka Saldova
- NIBRT GlycoScience Group, National Institute for Bioprocessing, Research and Training, Blackrock, Dublin, Ireland
- CÚRAM, SFI Research Centre for Medical Devices, National University of Ireland, Galway, Ireland
- UCD School of Medicine, College of Health and Agricultural Science, University College Dublin, Dublin, Ireland
| |
|
43
|
Copoiu L, Malhotra S. The current structural glycome landscape and emerging technologies. Curr Opin Struct Biol 2020; 62:132-139. [PMID: 32006784 DOI: 10.1016/j.sbi.2019.12.020] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2019] [Revised: 12/23/2019] [Accepted: 12/24/2019] [Indexed: 11/19/2022]
Abstract
Carbohydrates represent one of the building blocks of life, along with nucleic acids, proteins and lipids. Although glycans are involved in a wide range of processes from embryogenesis to protein trafficking and pathogen infection, we are still a long way from deciphering the glycocode. In this review, we aim to present a few of the challenges that researchers working in the area of glycobiology can encounter and what strategies can be utilised to overcome them. Our goal is to paint a comprehensive picture of the current saccharide landscape available in the Protein Data Bank (PDB). We also review recently updated repositories relevant to the topic proposed, the impact of software development on strategies to structurally solve carbohydrate moieties, and state-of-the-art molecular and cellular biology methods that can shed some light on the function and structure of glycans.
Affiliation(s)
- Liviu Copoiu
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, United Kingdom
| | - Sony Malhotra
- Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom.
| |
|
44
|
Abrahams JL, Taherzadeh G, Jarvas G, Guttman A, Zhou Y, Campbell MP. Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr Opin Struct Biol 2019; 62:56-69. [PMID: 31874386 DOI: 10.1016/j.sbi.2019.11.009] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2019] [Revised: 11/05/2019] [Accepted: 11/15/2019] [Indexed: 12/16/2022]
Abstract
Protein glycosylation is the most complex and prevalent post-translational modification in terms of the number of proteins modified and the diversity generated. To understand the functional roles of glycoproteins, it is important to gain an insight into the repertoire of oligosaccharides present. The comparison and relative quantitation of glycoforms combined with site-specific identification and occupancy are necessary steps in this direction. Computational platforms have continued to mature, assisting researchers with the interpretation of such glycomics and glycoproteomics data sets, but they frequently support only dedicated workflows, and users rely on the manual interpretation of data to gain insights into the glycoproteome. The growth of site-specific knowledge has also led to the implementation of machine-learning algorithms to predict glycosylation, which is now being integrated into glycoproteomics pipelines. This short review describes commercial and open-access databases and software with an emphasis on those that are actively maintained and designed to support current analytical workflows.
Affiliation(s)
- Jodie L Abrahams
- Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Gabor Jarvas
- Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary; Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
| | - Andras Guttman
- Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary; Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary; SCIEX, Brea, CA, USA
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Matthew P Campbell
- Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia.
| |
|
45
|
N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding. Sci Rep 2019; 9:15975. [PMID: 31685900 PMCID: PMC6828726 DOI: 10.1038/s41598-019-52341-z] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2019] [Accepted: 10/15/2019] [Indexed: 01/23/2023] Open
Abstract
N-linked glycosylation is one of the predominant post-translational modifications involved in a number of biological functions. Since experimental characterization of glycosites is challenging, glycosite prediction is crucial. Several predictors have been made available and report high performance. Most of them evaluate their performance at every asparagine in protein sequences, not confined to asparagine in the N-X-S/T sequon. In this paper, we present N-GlyDE, a two-stage prediction tool trained on rigorously-constructed non-redundant datasets to predict N-linked glycosites in the human proteome. The first stage uses a protein similarity voting algorithm trained on both glycoproteins and non-glycoproteins to predict a score for a protein to improve glycosite prediction. The second stage uses a support vector machine to predict N-linked glycosites by utilizing features of gapped dipeptides, pattern-based predicted surface accessibility, and predicted secondary structure. N-GlyDE's final predictions are derived from a weight adjustment of the second-stage prediction results based on the first-stage prediction score. Evaluated on N-X-S/T sequons of an independent dataset comprised of 53 glycoproteins and 33 non-glycoproteins, N-GlyDE achieves an accuracy and MCC of 0.740 and 0.499, respectively, outperforming the compared tools. The N-GlyDE web server is available at http://bioapp.iis.sinica.edu.tw/N-GlyDE/ .
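Gapped dipeptides, one of the feature families named in this abstract, are residue pairs separated by a fixed number of intervening positions. The sketch below counts them in a sequence window as a rough illustration of the idea. The function name `gapped_dipeptide_counts`, the `max_gap` parameter, and the raw-count representation are assumptions made for this example; the abstract does not specify N-GlyDE's actual encoding, which may differ.

```python
from collections import Counter


def gapped_dipeptide_counts(segment: str, max_gap: int = 2) -> Counter:
    """Count residue pairs (a, gap, b), where b follows a after `gap` skipped residues."""
    counts: Counter = Counter()
    for gap in range(max_gap + 1):
        # The second residue sits gap + 1 positions after the first.
        for i in range(len(segment) - gap - 1):
            counts[(segment[i], gap, segment[i + gap + 1])] += 1
    return counts


# Hypothetical 4-residue window: gap 0 yields NG, GT, TN; gap 1 yields N.T and G.N.
features = gapped_dipeptide_counts("NGTN", max_gap=1)
print(features[("N", 1, "T")])  # 1
```

In practice, such counts would typically be mapped onto a fixed-length vector (one slot per residue pair and gap size) before being passed to a classifier such as the support vector machine mentioned in the abstract.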
|