1
Urban J, Jin C, Thomsson KA, Karlsson NG, Ives CM, Fadda E, Bojar D. Predicting glycan structure from tandem mass spectrometry via deep learning. Nat Methods 2024; 21:1206-1215. [PMID: 38951670 PMCID: PMC11239490 DOI: 10.1038/s41592-024-02314-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 05/17/2024] [Indexed: 07/03/2024]
Abstract
Glycans constitute the most complicated post-translational modification, modulating protein activity in health and disease. However, structural annotation from tandem mass spectrometry (MS/MS) data is a bottleneck in glycomics, preventing high-throughput endeavors and relegating glycomics to a few experts. Trained on a newly curated set of 500,000 annotated MS/MS spectra, here we present CandyCrunch, a dilated residual neural network predicting glycan structure from raw liquid chromatography-MS/MS data in seconds (top-1 accuracy: 90.3%). We developed an open-access Python-based workflow of raw data conversion and prediction, followed by automated curation and fragment annotation, with predictions recapitulating and extending expert annotation. We demonstrate that this can be used for de novo annotation, diagnostic fragment identification and high-throughput glycomics. For maximum impact, this entire pipeline is tightly interlaced with our glycowork platform and can be easily tested at https://colab.research.google.com/github/BojarLab/CandyCrunch/blob/main/CandyCrunch.ipynb. We envision that CandyCrunch will democratize structural glycomics and the elucidation of biological roles of glycans.
Affiliation(s)
- James Urban
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Chunsheng Jin
  - Proteomics Core Facility at Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Kristina A Thomsson
  - Proteomics Core Facility at Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Niclas G Karlsson
  - Section of Pharmacy, Department of Life Sciences and Health, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
- Callum M Ives
  - Department of Chemistry and Hamilton Institute, Maynooth University, Maynooth, Ireland
- Elisa Fadda
  - School of Biological Sciences, University of Southampton, Southampton, UK
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden.
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden.
2
Akune-Taylor Y, Kon A, Aoki-Kinoshita KF. In silico simulation of glycosylation and related pathways. Anal Bioanal Chem 2024; 416:3687-3696. [PMID: 38748247 PMCID: PMC11180631 DOI: 10.1007/s00216-024-05331-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 04/30/2024] [Accepted: 05/02/2024] [Indexed: 06/18/2024]
Abstract
Glycans participate in a vast number of recognition systems in diverse organisms in health and in disease. However, glycans cannot be sequenced because there is no sequencer technology that can fully characterize them. There is no "template" for replicating glycans as there is for amino acids and nucleic acids. Instead, glycans are synthesized by a complicated orchestration of multitudes of glycosyltransferases and glycosidases. Thus glycans can vary greatly in structure, but they are not genetically reproducible and are usually isolated in minute amounts. To characterize (sequence) the glycome (defined as the glycans in a particular organism, tissue, cell, or protein), glycosylation pathway prediction using in silico methods based on glycogene expression data and glycosylation simulations have been attempted. Since many of the mammalian glycogenes have been identified and cloned, it has become possible to predict the glycan biosynthesis pathway in these systems. By then incorporating systems biology and bioprocessing technologies into these pathway models, given the right enzymatic parameters including enzyme and substrate concentrations and kinetic reaction parameters, it is possible to predict the potentially synthesized glycans in the pathway. This review presents information on the data resources that are currently available to enable in silico simulations of glycosylation and related pathways. Then some of the software tools that have been developed in the past to simulate and analyze glycosylation pathways will be described, followed by a summary and vision for the future developments and research directions in this area.
Affiliation(s)
- Yukie Akune-Taylor
  - Glycan and Life Systems Integration Center, Soka University, Tokyo, Japan
- Akane Kon
  - Graduate School of Science and Engineering, Soka University, Tokyo, Japan
- Kiyoko F Aoki-Kinoshita
  - Glycan and Life Systems Integration Center, Soka University, Tokyo, Japan.
  - Graduate School of Science and Engineering, Soka University, Tokyo, Japan.
  - iGCORE, Nagoya University, Nagoya, Japan.
3
Xu T, Wang YC, Ma J, Cui Y, Wang L. In silico discovery and anti-tumor bioactivities validation of an algal lectin from Kappaphycus alvarezii genome. Int J Biol Macromol 2024; 275:133311. [PMID: 38909728 DOI: 10.1016/j.ijbiomac.2024.133311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 05/24/2024] [Accepted: 06/13/2024] [Indexed: 06/25/2024]
Abstract
Lectins are proteins that bind specifically and reversibly to carbohydrates, and some of them have significant anti-tumor activities. Compared to those of lectins from land plants, there are far fewer studies on algal lectins, despite the high biodiversity of algae. However, canonical strategies based on chromatographic feature-oriented screening cannot satisfy the requirement for algal lectin discovery. In this study, prospecting for novel OAAH family lectins throughout 358 genomes of red algae and cyanobacteria was conducted. Then 35 candidate lectins and 1843 of their simulated mutated forms were virtually screened based on predicted binding specificities to characteristic carbohydrates on cancer cells inferred by a deep learning model. A new lectin, named Siye, was discovered in the Kappaphycus alvarezii genome and further verified on different cancer cells. Without causing agglutination of erythrocytes, Siye showed significant cytotoxicity to four human cancer cell lines (IC50 values ranging from 0.11 to 3.95 μg/mL), including breast adenocarcinoma HCC1937, lung carcinoma A549, liver cancer HepG2 and promyelocytic leukemia HL60. The cytotoxicity was induced through promoting apoptosis by regulating the caspase and the p53 pathway within 24 h. This study demonstrates the feasibility and efficiency of genome mining guided by evolutionary theory and artificial intelligence in the discovery of algal lectins.
Affiliation(s)
- Tongli Xu
  - Key Laboratory of Coastal Biology and Biological Resource Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China; Qingdao Academy of Chinese Medical Sciences, Shandong University of Traditional Chinese Medicine, Qingdao 266071, China
- Yin-Chu Wang
  - Key Laboratory of Coastal Biology and Biological Resource Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China; Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China; National Basic Science Data Center, Beijing 100190, China.
- Jiahao Ma
  - Hong Kong University of Science and Technology, Clear Water Bay, 999077, Hong Kong
- Yulin Cui
  - Binzhou Medical University, Yantai 264003, China.
- Lu Wang
  - School of Pharmacy, Yantai University, Yantai 264005, China.
4
Lundstrøm J, Thomès L, Bojar D. Protocol for constructing glycan biosynthetic networks using glycowork. STAR Protoc 2024; 5:102937. [PMID: 38630592 PMCID: PMC11036093 DOI: 10.1016/j.xpro.2024.102937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 01/09/2024] [Accepted: 02/19/2024] [Indexed: 04/19/2024]
Abstract
Glycans, present across all domains of life, comprise a wide range of monosaccharides assembled into complex, branching structures. Here, we present an in silico protocol to construct biosynthetic networks from a list of observed glycans using the Python package glycowork. We describe steps for data preparation, network construction, feature analysis, and data export. This protocol is implemented in Python using example data and can be adapted for use with customized datasets. For complete details on the use and execution of this protocol, please refer to Thomès et al.1.
Affiliation(s)
- Jon Lundstrøm
  - Department of Chemistry and Molecular Biology, University of Gothenburg, 41390 Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 41390 Gothenburg, Sweden.
- Luc Thomès
  - University Lille, CHU Lille, ULR 7364 - RADEME - Maladies RAres du DÉveloppement embryonnaire et du Métabolisme, 59000 Lille, France
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, 41390 Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 41390 Gothenburg, Sweden.
5
Kellman BP, Mariethoz J, Zhang Y, Shaul S, Alteri M, Sandoval D, Jeffris M, Armingol E, Bao B, Lisacek F, Bojar D, Lewis NE. Decoding glycosylation potential from protein structure across human glycoproteins with a multi-view recurrent neural network. bioRxiv 2024:2024.05.15.594334. [PMID: 38798633 PMCID: PMC11118808 DOI: 10.1101/2024.05.15.594334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Glycosylation is described as a non-templated biosynthesis. Yet, the template-free premise is antithetical to the observation that different N-glycans are consistently placed at specific sites. It has been proposed that glycosite-proximal protein structures could constrain glycosylation and explain the observed microheterogeneity. Using site-specific glycosylation data, we trained a hybrid neural network to parse glycosites (recurrent neural network) and match them to feasible N-glycosylation events (graph neural network). From glycosite-flanking sequences, the algorithm predicts most human N-glycosylation events documented in the GlyConnect database and proposed structures corresponding to observed monosaccharide composition of the glycans at these sites. The algorithm also recapitulated glycosylation in Enhanced Aromatic Sequons, SARS-CoV-2 spike, and IgG3 variants, thus demonstrating the ability of the algorithm to predict both glycan structure and abundance. Thus, protein structure constrains glycosylation, and the neural network enables predictive in silico glycosylation of uncharacterized or novel protein sequences and genetic variants.
Affiliation(s)
- Benjamin P. Kellman
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
  - Augment Biologics, La Jolla, CA 92092
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
- Julien Mariethoz
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
- Yujie Zhang
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Sigal Shaul
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Mia Alteri
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Daniel Sandoval
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Mia Jeffris
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Erick Armingol
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Bokan Bao
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Frederique Lisacek
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
  - Computer Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
- Daniel Bojar
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Nathan E. Lewis
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
6
Bennett AR, Bojar D. Syntactic sugars: crafting a regular expression framework for glycan structures. Bioinformatics Advances 2024; 4:vbae059. [PMID: 38708029 PMCID: PMC11069104 DOI: 10.1093/bioadv/vbae059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 03/15/2024] [Accepted: 04/17/2024] [Indexed: 05/07/2024]
Abstract
Motivation: Structural analysis of glycans poses significant challenges in glycobiology due to their complex sequences. Research questions, such as analyzing the sequence content of the α1-6 branch in N-glycans, are biologically meaningful yet can be hard to automate. Results: Here, we introduce a regular expression system, designed for glycans, feature-complete, and closely aligned with regular expression formatting. We use this to annotate glycan motifs of arbitrary complexity, perform differential expression analysis on designated sequence stretches, or elucidate branch-specific binding specificities of lectins in an automated manner. We are confident that glycan regular expressions will empower computational analyses of these sequences. Availability and implementation: Our regular expression framework for glycans is implemented in Python and is incorporated into the open-source glycowork package (version 1.1+). Code and documentation are available at https://github.com/BojarLab/glycowork/blob/master/glycowork/motif/regex.py.
Affiliation(s)
- Alexander R Bennett
  - Department of Medical Biochemistry, Institute of Biomedicine, University of Gothenburg, 41390 Gothenburg, Sweden
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, 41390 Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 41390 Gothenburg, Sweden
7
Lundstrøm J, Gillon E, Chazalet V, Kerekes N, Di Maio A, Feizi T, Liu Y, Varrot A, Bojar D. Elucidating the glycan-binding specificity and structure of Cucumis melo agglutinin, a new R-type lectin. Beilstein J Org Chem 2024; 20:306-320. [PMID: 38410776 PMCID: PMC10896221 DOI: 10.3762/bjoc.20.31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 02/09/2024] [Indexed: 02/28/2024]
Abstract
Plant lectins have garnered attention for their roles as laboratory probes and potential therapeutics. Here, we report the discovery and characterization of Cucumis melo agglutinin (CMA1), a new R-type lectin from melon. Our findings reveal CMA1's unique glycan-binding profile, mechanistically explained by its 3D structure, augmenting our understanding of R-type lectins. We expressed CMA1 recombinantly and assessed its binding specificity using multiple glycan arrays, covering 1,046 unique sequences. This resulted in a complex binding profile, strongly preferring C2-substituted, beta-linked galactose (both GalNAc and Fuca1-2Gal), which we contrasted with the established R-type lectin Ricinus communis agglutinin 1 (RCA1). We also report binding of specific glycosaminoglycan subtypes and a general enhancement of binding by sulfation. Further validation using agglutination, thermal shift assays, and surface plasmon resonance confirmed and quantified this binding specificity in solution. Finally, we solved the high-resolution structure of the CMA1 N-terminal domain using X-ray crystallography, supporting our functional findings at the molecular level. Our study provides a comprehensive understanding of CMA1, laying the groundwork for further exploration of its biological and therapeutic potential.
Affiliation(s)
- Jon Lundstrøm
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 7B, 413 90 Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 413 90 Gothenburg, Sweden
- Emilie Gillon
  - Univ. Grenoble Alpes, CNRS, CERMAV, 601 Rue de la Chimie, 38610 Gières, France
- Valérie Chazalet
  - Univ. Grenoble Alpes, CNRS, CERMAV, 601 Rue de la Chimie, 38610 Gières, France
- Nicole Kerekes
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 7B, 413 90 Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 413 90 Gothenburg, Sweden
- Antonio Di Maio
  - Glycosciences Laboratory, Faculty of Medicine, Imperial College London, Du Cane Rd, London W12 0NN, United Kingdom
- Ten Feizi
  - Glycosciences Laboratory, Faculty of Medicine, Imperial College London, Du Cane Rd, London W12 0NN, United Kingdom
- Yan Liu
  - Glycosciences Laboratory, Faculty of Medicine, Imperial College London, Du Cane Rd, London W12 0NN, United Kingdom
- Annabelle Varrot
  - Univ. Grenoble Alpes, CNRS, CERMAV, 601 Rue de la Chimie, 38610 Gières, France
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 7B, 413 90 Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 413 90 Gothenburg, Sweden
8
Lundstrøm J, Urban J, Thomès L, Bojar D. GlycoDraw: a python implementation for generating high-quality glycan figures. Glycobiology 2023; 33:927-934. [PMID: 37498172 PMCID: PMC10859633 DOI: 10.1093/glycob/cwad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 07/14/2023] [Accepted: 07/26/2023] [Indexed: 07/28/2023]
Abstract
Glycans are essential to all scales of biology, with their intricate structures being crucial for their biological functions. The structural complexity of glycans is communicated through simplified and unified visual representations according to the Symbol Nomenclature for Glycans (SNFGs) guidelines adopted by the community. Here, we introduce GlycoDraw, a Python-native implementation for high-throughput generation of high-quality, SNFG-compliant glycan figures with flexible display options. GlycoDraw is released as part of our glycan analysis ecosystem, glycowork, facilitating integration into existing workflows by enabling fully automated annotation of glycan-related figures and thus assisting the analysis of e.g. differential abundance data or glycomics mass spectra.
Affiliation(s)
- Jon Lundstrøm
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 9C, 41390 Gothenburg, Västra Götaland, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Medicinaregatan 9C, 41390 Gothenburg, Västra Götaland, Sweden
- James Urban
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 9C, 41390 Gothenburg, Västra Götaland, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Medicinaregatan 9C, 41390 Gothenburg, Västra Götaland, Sweden
- Luc Thomès
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 9C, 41390 Gothenburg, Västra Götaland, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Medicinaregatan 9C, 41390 Gothenburg, Västra Götaland, Sweden
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 9C, 41390 Gothenburg, Västra Götaland, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Medicinaregatan 9C, 41390 Gothenburg, Västra Götaland, Sweden
9
Lundstrøm J, Urban J, Bojar D. Decoding glycomics with a suite of methods for differential expression analysis. Cell Reports Methods 2023; 3:100652. [PMID: 37992708 PMCID: PMC10753297 DOI: 10.1016/j.crmeth.2023.100652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 10/04/2023] [Accepted: 10/30/2023] [Indexed: 11/24/2023]
Abstract
Glycomics, the comprehensive profiling of all glycan structures in samples, is rapidly expanding to enable insights into physiology and disease mechanisms. However, glycan structure complexity and glycomics data interpretation present challenges, especially for differential expression analysis. Here, we present a framework for differential glycomics expression analysis. Our methodology encompasses specialized and domain-informed methods for data normalization and imputation, glycan motif extraction and quantification, differential expression analysis, motif enrichment analysis, time series analysis, and meta-analytic capabilities, synthesizing results across multiple studies. All methods are integrated into our open-source glycowork package, facilitating performant workflows and user-friendly access. We demonstrate these methods using dedicated simulations and glycomics datasets of N-, O-, lipid-linked, and free glycans. Differential expression tests here focus on human datasets and cancer vs. healthy tissue comparisons. Our rigorous approach allows for robust, reliable, and comprehensive differential expression analyses in glycomics, contributing to advancing glycomics research and its translation to clinical and diagnostic applications.
Affiliation(s)
- Jon Lundstrøm
  - Department of Chemistry and Molecular Biology, University of Gothenburg, 41390 Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 41390 Gothenburg, Sweden
- James Urban
  - Department of Chemistry and Molecular Biology, University of Gothenburg, 41390 Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 41390 Gothenburg, Sweden
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, 41390 Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 41390 Gothenburg, Sweden.
10
Krishna Perumal P, Dong CD, Chauhan AS, Anisha GS, Kadri MS, Chen CW, Singhania RR, Patel AK. Advances in oligosaccharides production from algal sources and potential applications. Biotechnol Adv 2023; 67:108195. [PMID: 37315876 DOI: 10.1016/j.biotechadv.2023.108195] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/28/2022] [Revised: 06/02/2023] [Accepted: 06/05/2023] [Indexed: 06/16/2023]
Abstract
In recent years, algal-derived glycans and oligosaccharides have become increasingly important in health applications due to higher bioactivities than plant-derived oligosaccharides. Marine organisms have complex, highly branched glycans with more reactive groups, which elicit greater bioactivities. However, complex and large molecules have limited use in broad commercial applications due to dissolution limitations. In comparison, oligosaccharides show better solubility and retain their bioactivities, hence offering better application opportunities. Accordingly, efforts are being made to develop a cost-effective method for enzymatic extraction of oligosaccharides from algal polysaccharides and algal biomass. Yet detailed structural characterization of algal-derived glycans is required to produce and characterize the potential biomolecules for improved bioactivity and commercial applications. Some macroalgae and microalgae are being evaluated as in vivo biofactories for efficient clinical trials, which could be very helpful in understanding the therapeutic responses. This review discusses the recent advancements in the production of oligosaccharides from microalgae. It also discusses the bottlenecks of oligosaccharide research, technological limitations, and probable solutions to these problems. Furthermore, it presents the emerging bioactivities of algal oligosaccharides and their promising potential for possible biotherapeutic application.
Affiliation(s)
- Pitchurajan Krishna Perumal
  - Institute of Aquatic Science and Technology, National Kaohsiung University of Science and Technology, Kaohsiung City 81157, Taiwan
- Cheng-Di Dong
  - Institute of Aquatic Science and Technology, National Kaohsiung University of Science and Technology, Kaohsiung City 81157, Taiwan; Sustainable Environment Research Centre, National Kaohsiung University of Science and Technology, Kaohsiung City 81157, Taiwan; Department of Marine Environmental Engineering, National Kaohsiung University of Science and Technology, Kaohsiung City, Taiwan
- Ajeet Singh Chauhan
  - Institute of Aquatic Science and Technology, National Kaohsiung University of Science and Technology, Kaohsiung City 81157, Taiwan
- Grace Sathyanesan Anisha
  - Post-Graduate and Research Department of Zoology, Government College for Women, Thiruvananthapuram 695014, Kerala, India
- Mohammad Sibtain Kadri
  - Department of Marine Biotechnology and Resources, National Sun Yat-Sen University, Kaohsiung City-804201, Taiwan
- Chiu-Wen Chen
  - Institute of Aquatic Science and Technology, National Kaohsiung University of Science and Technology, Kaohsiung City 81157, Taiwan; Sustainable Environment Research Centre, National Kaohsiung University of Science and Technology, Kaohsiung City 81157, Taiwan; Department of Marine Environmental Engineering, National Kaohsiung University of Science and Technology, Kaohsiung City, Taiwan
- Reeta Rani Singhania
  - Institute of Aquatic Science and Technology, National Kaohsiung University of Science and Technology, Kaohsiung City 81157, Taiwan; Centre for Energy and Environmental Sustainability, Lucknow 226 029, Uttar Pradesh, India
- Anil Kumar Patel
  - Institute of Aquatic Science and Technology, National Kaohsiung University of Science and Technology, Kaohsiung City 81157, Taiwan; Centre for Energy and Environmental Sustainability, Lucknow 226 029, Uttar Pradesh, India.
11
Jin C, Lundstrøm J, Korhonen E, Luis AS, Bojar D. Breast Milk Oligosaccharides Contain Immunomodulatory Glucuronic Acid and LacdiNAc. Mol Cell Proteomics 2023; 22:100635. [PMID: 37597722 PMCID: PMC10509713 DOI: 10.1016/j.mcpro.2023.100635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Revised: 07/31/2023] [Accepted: 08/16/2023] [Indexed: 08/21/2023]
Abstract
Breast milk is abundant with functionalized milk oligosaccharides (MOs) to nourish and protect the neonate. Yet we lack a comprehensive understanding of the repertoire and evolution of MOs across Mammalia. We report ∼400 MO-species associations (>100 novel structures) from milk glycomics of nine mostly understudied species: alpaca, beluga whale, black rhinoceros, bottlenose dolphin, impala, L'Hoest's monkey, pygmy hippopotamus, domestic sheep, and striped dolphin. This revealed the hitherto unknown existence of the LacdiNAc motif (GalNAcβ1-4GlcNAc) in MOs of all species except alpaca, sheep, and striped dolphin, indicating the widespread occurrence of this potentially antimicrobial motif in MOs. We also characterize glucuronic acid-containing MOs in the milk of impala, dolphins, sheep, and rhinoceros, previously only reported in cows. We demonstrate that these GlcA-MOs exhibit potent immunomodulatory effects. Our study extends the number of known MOs by >15%. Combined with >1900 curated MO-species associations, we characterize MO motif distributions, presenting an exhaustive overview of MO biodiversity.
Affiliation(s)
- Chunsheng Jin
  - Proteomics Core Facility at Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Jon Lundstrøm
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Emma Korhonen
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Ana S Luis
  - Department of Medical Biochemistry and Cell Biology, University of Gothenburg, Gothenburg, Sweden
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden.
12
Thomès L, Karlsson V, Lundstrøm J, Bojar D. Mammalian milk glycomes: Connecting the dots between evolutionary conservation and biosynthetic pathways. Cell Rep 2023; 42:112710. [PMID: 37379211 DOI: 10.1016/j.celrep.2023.112710] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/22/2023] [Revised: 05/09/2023] [Accepted: 06/12/2023] [Indexed: 06/30/2023]
Abstract
Milk oligosaccharides (MOs) are among the most abundant constituents of breast milk and are essential for health and development. Biosynthesized from monosaccharides into complex sequences, MOs differ considerably between taxonomic groups. Even human MO biosynthesis is insufficiently understood, hampering evolutionary and functional analyses. Using a comprehensive resource of all published MOs from >100 mammals, we develop a pipeline for generating and analyzing MO biosynthetic networks. We then use evolutionary relationships and inferred intermediates of these networks to discover (1) systematic glycome biases, (2) biosynthetic restrictions, such as reaction path preference, and (3) conserved biosynthetic modules. This allows us to prune and pinpoint biosynthetic pathways despite missing information. Machine learning and network analysis cluster species by their milk glycome, identifying characteristic sequence relationships and evolutionary gains/losses of motifs, MOs, and biosynthetic modules. These resources and analyses will advance our understanding of glycan biosynthesis and the evolution of breast milk.
Affiliation(s)
- Luc Thomès
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
| | - Viktoria Karlsson
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
| | - Jon Lundstrøm
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden.
| |
|
13
|
Perez S, Makshakova O, Angulo J, Bedini E, Bisio A, de Paz JL, Fadda E, Guerrini M, Hricovini M, Hricovini M, Lisacek F, Nieto PM, Pagel K, Paiardi G, Richter R, Samsonov SA, Vivès RR, Nikitovic D, Ricard Blum S. Glycosaminoglycans: What Remains To Be Deciphered? JACS Au 2023; 3:628-656. [PMID: 37006755 PMCID: PMC10052243 DOI: 10.1021/jacsau.2c00569] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 10/15/2022] [Revised: 12/05/2022] [Accepted: 12/07/2022] [Indexed: 06/19/2023]
Abstract
Glycosaminoglycans (GAGs) are complex polysaccharides exhibiting a vast structural diversity and fulfilling various functions mediated by thousands of interactions in the extracellular matrix, at the cell surface, and within the cells where they have been detected in the nucleus. It is known that the chemical groups attached to GAGs and GAG conformations comprise "glycocodes" that are not yet fully deciphered. The molecular context also matters for GAG structures and functions, and the influence of the structure and functions of the proteoglycan core proteins on sulfated GAGs and vice versa warrants further investigation. The lack of dedicated bioinformatic tools for mining GAG data sets contributes to a partial characterization of the structural and functional landscape and interactions of GAGs. These pending issues will benefit from the development of new approaches reviewed here, namely (i) the synthesis of GAG oligosaccharides to build large and diverse GAG libraries, (ii) GAG analysis and sequencing by mass spectrometry (e.g., ion mobility-mass spectrometry), gas-phase infrared spectroscopy, recognition tunnelling nanopores, and molecular modeling to identify bioactive GAG sequences, biophysical methods to investigate binding interfaces, and to expand our knowledge and understanding of glycocodes governing GAG molecular recognition, and (iii) artificial intelligence for in-depth investigation of GAGomic data sets and their integration with proteomics.
Affiliation(s)
- Serge Perez
- Centre de Recherche sur les Macromolecules Vegetales, University of Grenoble-Alpes, Centre National de la Recherche Scientifique, Grenoble F-38041, France
| | - Olga Makshakova
- FRC Kazan Scientific Center of Russian Academy of Sciences, Kazan Institute of Biochemistry and Biophysics, Kazan 420111, Russia
| | - Jesus Angulo
- Instituto de Investigaciones Quimicas, CIC Cartuja, CSIC and Universidad de Sevilla, Sevilla, SP 41092, Spain
| | - Emiliano Bedini
- Department of Chemical Sciences, University of Naples Federico II, Naples, I-80126, Italy
| | - Antonella Bisio
- Istituto di Ricerche Chimiche e Biochimiche, G. Ronzoni, Milan I-20133, Italy
| | - Jose Luis de Paz
- Instituto de Investigaciones Quimicas, CIC Cartuja, CSIC and Universidad de Sevilla, Sevilla, SP 41092, Spain
| | - Elisa Fadda
- Department of Chemistry and Hamilton Institute, Maynooth University, Maynooth W23 F2H6, Ireland
| | - Marco Guerrini
- Istituto di Ricerche Chimiche e Biochimiche, G. Ronzoni, Milan I-20133, Italy
| | - Michal Hricovini
- Institute of Chemistry, Slovak Academy of Sciences, Bratislava SK-845 38, Slovakia
| | - Milos Hricovini
- Institute of Chemistry, Slovak Academy of Sciences, Bratislava SK-845 38, Slovakia
| | - Frederique Lisacek
- Computer Science Department & Section of Biology, University of Geneva & Swiss Institute of Bioinformatics, Geneva CH-1227, Switzerland
| | - Pedro M. Nieto
- Instituto de Investigaciones Quimicas, CIC Cartuja, CSIC and Universidad de Sevilla, Sevilla, SP 41092, Spain
| | - Kevin Pagel
- Institut für Chemie und Biochemie Organische Chemie, Freie Universität Berlin, Berlin 14195, Germany
| | - Giulia Paiardi
- Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies, Heidelberg University, Heidelberg 69118, Germany
| | - Ralf Richter
- School of Biomedical Sciences, Faculty of Biological Sciences, School of Physics and Astronomy, Faculty of Engineering and Physical Sciences, Astbury Centre for Structural Molecular Biology and Bragg Centre for Materials Research, University of Leeds, Leeds LS2 9JT, United Kingdom
| | - Sergey A. Samsonov
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Gdansk, Gdansk 80-309, Poland
| | - Romain R. Vivès
- Univ. Grenoble Alpes, CNRS, CEA, IBS, Grenoble F-38044, France
| | - Dragana Nikitovic
- School of Histology-Embryology, Medical School, University of Crete, Heraklion 71003, Greece
| | - Sylvie Ricard Blum
- University Claude Bernard Lyon 1, CNRS, INSA Lyon, CPE, Institute of Molecular and Supramolecular Chemistry and Biochemistry, UMR 5246, Villeurbanne F 69622 Cedex, France
| |
|
14
|
Joeres R, Bojar D, Kalinina OV. GlyLES: Grammar-based Parsing of Glycans from IUPAC-condensed to SMILES. J Cheminform 2023; 15:37. [PMID: 36959676 PMCID: PMC10035253 DOI: 10.1186/s13321-023-00704-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 02/18/2023] [Indexed: 03/25/2023]
Abstract
Glycans are important polysaccharides on cellular surfaces that are bound to glycoproteins and glycolipids. These are one of the most common post-translational modifications of proteins in eukaryotic cells. They play important roles in protein folding, cell-cell interactions, and other extracellular processes. Changes in glycan structures may influence the course of different diseases, such as infections or cancer. Glycans are commonly represented using the IUPAC-condensed notation. IUPAC-condensed is a textual representation of glycans operating on the same topological level as the Symbol Nomenclature for Glycans (SNFG) that assigns colored, geometrical shapes to the main monomers. These symbols are then connected in tree-like structures, visualizing the glycan structure on a topological level. Yet for a representation on the atomic level, notations such as SMILES should be used. To our knowledge, there is no easy-to-use, general, open-source, and offline tool to convert the IUPAC-condensed notation to SMILES. Here, we present the open-access Python package GlyLES for the generalizable generation of SMILES representations out of IUPAC-condensed representations. GlyLES uses a grammar to read in the monomer tree from the IUPAC-condensed notation. From this tree, the tool can compute the atomic structures of each monomer based on their IUPAC-condensed descriptions. In the last step, it merges all monomers into the atomic structure of a glycan in the SMILES notation. GlyLES is the first package that allows conversion from the IUPAC-condensed notation of glycans to SMILES strings. This may have multiple applications, including straightforward visualization, substructure search, molecular modeling and docking, and a new featurization strategy for machine-learning algorithms. GlyLES is available at https://github.com/kalininalab/GlyLES.
Affiliation(s)
- Roman Joeres
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbruecken, Germany
- Center for Bioinformatics, Saarland University, Saarbruecken, Germany
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
| | - Olga V. Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbruecken, Germany
- Center for Bioinformatics, Saarland University, Saarbruecken, Germany
- Faculty of Medicine, Saarland University, Homburg, Germany
| |
|
15
|
Li H, Chiang AWT, Lewis NE. Artificial intelligence in the analysis of glycosylation data. Biotechnol Adv 2022; 60:108008. [PMID: 35738510 PMCID: PMC11157671 DOI: 10.1016/j.biotechadv.2022.108008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/18/2022]
Abstract
Glycans are complex, yet ubiquitous across biological systems. They are involved in diverse essential organismal functions. Aberrant glycosylation may lead to disease development, such as cancer, autoimmune diseases, and inflammatory diseases. Glycans, both normal and aberrant, are synthesized using extensive glycosylation machinery, and understanding this machinery can provide invaluable insights for diagnosis, prognosis, and treatment of various diseases. Increasing amounts of glycomics data are being generated thanks to advances in glycoanalytics technologies, but to maximize the value of such data, innovations are needed for analyzing and interpreting large-scale glycomics data. Artificial intelligence (AI) provides a powerful analysis toolbox in many scientific fields, and here we review state-of-the-art AI approaches on glycosylation analysis. We further discuss how models can be analyzed to gain mechanistic insights into glycosylation machinery and how the machinery shapes glycans under different scenarios. Finally, we propose how to leverage the gained knowledge for developing predictive AI-based models of glycosylation. Thus, guiding future research of AI-based glycosylation model development will provide valuable insights into glycosylation and glycan machinery.
Affiliation(s)
- Haining Li
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Austin W T Chiang
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA.
| | - Nathan E Lewis
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA.
| |
|
16
|
Abstract
Artificial intelligence (AI) methods have been and are now being increasingly integrated in prediction software implemented in bioinformatics and its glycoscience branch known as glycoinformatics. AI techniques have evolved in the past decades, and their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data that are notoriously hard to produce and analyze. Nonetheless, as time goes on, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on shining a light on challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending on the discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including developments that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.
Affiliation(s)
- Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| | - Frederique Lisacek
- Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
- Computer Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
| |
|
17
|
Akmal MA, Hassan MA, Muhammad S, Khurshid KS, Mohamed A. An analytical study on the identification of N-linked glycosylation sites using machine learning model. PeerJ Comput Sci 2022; 8:e1069. [PMID: 36262138 PMCID: PMC9575850 DOI: 10.7717/peerj-cs.1069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
N-linked is the most common type of glycosylation which plays a significant role in identifying various diseases such as type I diabetes and cancer and helps in drug development. Most of the proteins cannot perform their biological and psychological functionalities without undergoing such modification. Therefore, it is essential to identify such sites by computational techniques because of experimental limitations. This study aims to analyze and synthesize the progress to discover N-linked places using machine learning methods. It also explores the performance of currently available tools to predict such sites. Almost seventy research articles published in recognized journals of the N-linked glycosylation field have been shortlisted after the rigorous filtering process. The findings of the studies have been reported based on multiple aspects: publication channel, feature set construction method, training algorithm, and performance evaluation. Moreover, a literature survey has developed a taxonomy of N-linked sequence identification. Our study focuses on the performance evaluation criteria, and the importance of N-linked glycosylation motivates us to discover resources that use computational methods instead of the experimental method due to its limitations.
Affiliation(s)
- Muhammad Aizaz Akmal
- Department of Computer Science, University of Engineering and Technology, KSK, Lahore, Punjab, Pakistan
| | - Muhammad Awais Hassan
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
| | - Shoaib Muhammad
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
| | - Khaldoon S. Khurshid
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
| | | |
|
18
|
Flevaris K, Kontoravdi C. Immunoglobulin G N-glycan Biomarkers for Autoimmune Diseases: Current State and a Glycoinformatics Perspective. Int J Mol Sci 2022; 23:ijms23095180. [PMID: 35563570 PMCID: PMC9100869 DOI: 10.3390/ijms23095180] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/28/2022] [Revised: 05/02/2022] [Accepted: 05/04/2022] [Indexed: 02/04/2023]
Abstract
The effective treatment of autoimmune disorders can greatly benefit from disease-specific biomarkers that are functionally involved in immune system regulation and can be collected through minimally invasive procedures. In this regard, human serum IgG N-glycans are promising for uncovering disease predisposition and monitoring progression, and for the identification of specific molecular targets for advanced therapies. In particular, the IgG N-glycome in diseased tissues is considered to be disease-dependent; thus, specific glycan structures may be involved in the pathophysiology of autoimmune diseases. This study provides a critical overview of the literature on human IgG N-glycomics, with a focus on the identification of disease-specific glycan alterations. In order to expedite the establishment of clinically-relevant N-glycan biomarkers, the employment of advanced computational tools for the interpretation of clinical data and their relationship with the underlying molecular mechanisms may be critical. Glycoinformatics tools, including artificial intelligence and systems glycobiology approaches, are reviewed for their potential to provide insight into patient stratification and disease etiology. Challenges in the integration of such glycoinformatics approaches in N-glycan biomarker research are critically discussed.
|
19
|
Lundstrøm J, Korhonen E, Lisacek F, Bojar D. LectinOracle: A Generalizable Deep Learning Model for Lectin-Glycan Binding Prediction. Adv Sci (Weinh) 2022; 9:e2103807. [PMID: 34862760 PMCID: PMC8728848 DOI: 10.1002/advs.202103807] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/30/2021] [Revised: 11/03/2021] [Indexed: 05/07/2023]
Abstract
Ranging from bacterial cell adhesion over viral cell entry to human innate immunity, glycan-binding proteins or lectins abound in nature. Widely used as staining and characterization reagents in cell biology and crucial for understanding the interactions in biological systems, lectins are a focal point of study in glycobiology. Yet the sheer breadth and depth of specificity for diverse oligosaccharide motifs has made studying lectins a largely piecemeal approach, with few options to generalize. Here, LectinOracle, a model combining transformer-based representations for proteins and graph convolutional neural networks for glycans to predict their interaction, is presented. Using a curated data set of 564,647 unique protein-glycan interactions, it is shown that LectinOracle predictions agree with literature-annotated specificities for a wide range of lectins. Using a range of specialized glycan arrays, it is shown that LectinOracle predictions generalize to new glycans and lectins, with qualitative and quantitative agreement with experimental data. It is further demonstrated that LectinOracle can be used to improve lectin classification, accelerate lectin directed evolution, predict epidemiological outcomes in the context of influenza virus, and analyze whole lectomes in host-microbe interactions. It is envisioned that the herein presented platform will advance both the study of lectins and their role in (glyco)biology.
Affiliation(s)
- Jon Lundstrøm
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| | - Emma Korhonen
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| | - Frédérique Lisacek
- Swiss Institute of Bioinformatics, Geneva 1227, Switzerland
- Computer Science Department, UniGe, Geneva 1227, Switzerland
- Section of Biology, UniGe, Geneva 1205, Switzerland
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| |
|
20
|
Dealing with the Ambiguity of Glycan Substructure Search. Molecules 2021; 27:molecules27010065. [PMID: 35011294 PMCID: PMC8746581 DOI: 10.3390/molecules27010065] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 12/17/2021] [Accepted: 12/17/2021] [Indexed: 01/15/2023]
Abstract
The level of ambiguity in describing glycan structure has significantly increased with the upsurge of large-scale glycomics and glycoproteomics experiments. Consequently, an ontology-based model appears as an appropriate solution for navigating these data. However, navigation is not sufficient and the model should also enable advanced search and comparison. A new ontology with a tree logical structure is introduced to represent glycan structures irrespective of the precision of molecular details. The model heavily relies on the GlycoCT encoding of glycan structures. Its implementation in the GlySTreeM knowledge base was validated with GlyConnect data and benchmarked with the Glycowork library. GlySTreeM is shown to be fast, consistent, reliable and more flexible than existing solutions for matching parts of or whole glycan structures. The model is also well suited for painless future expansion.
|
21
|
Thomès L, Bojar D. The Role of Fucose-Containing Glycan Motifs Across Taxonomic Kingdoms. Front Mol Biosci 2021; 8:755577. [PMID: 34631801 PMCID: PMC8492980 DOI: 10.3389/fmolb.2021.755577] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/09/2021] [Accepted: 09/10/2021] [Indexed: 11/13/2022]
Abstract
The extraordinary diversity of glycans leads to large differences in the glycomes of different kingdoms of life. Yet, while most monosaccharides are solely found in certain taxonomic groups, there is a small set of monosaccharides with widespread distribution across nearly all domains of life. These general monosaccharides are particularly relevant for glycan motifs, as they can readily be used by commensals and pathogens to mimic host glycans or hijack existing glycan recognition systems. Among these, the monosaccharide fucose is especially interesting, as it frequently presents itself as a terminal monosaccharide, primed for interaction with proteins. Here, we analyze fucose-containing glycan motifs across all taxonomic kingdoms. Using a hereby presented large species-specific glycan dataset and a plethora of methods for glycan-focused bioinformatics and machine learning, we identify characteristic as well as shared fucose-containing glycan motifs for various taxonomic groups, demonstrating clear differences in fucose usage. Even within domains, fucose is used differentially based on an organism’s physiology and habitat. We particularly highlight differences in fucose-containing motifs between vertebrates and invertebrates. With the example of pathogenic and non-pathogenic Escherichia coli strains, we also demonstrate the importance of fucose-containing motifs in molecular mimicry and thereby pathogenic potential. We envision that this study will shed light on an important class of glycan motifs, with potential new insights into the role of fucosylated glycans in symbiosis, pathogenicity, and immunity.
Affiliation(s)
- Luc Thomès
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
| |
|