1
Hou X, Wang Y, Bu D, Wang Y, Sun S. EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction. Bioinformatics 2023; 39:btad650. [PMID: 37930896 PMCID: PMC10627407 DOI: 10.1093/bioinformatics/btad650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 09/14/2023] [Indexed: 11/08/2023]
Abstract
MOTIVATION: N-linked glycosylation is a frequently occurring post-translational protein modification that serves critical functions in protein folding, stability, trafficking, and recognition. It is involved in multiple biological processes, and alterations to it can result in various diseases. Identifying N-linked glycosylation sites is therefore essential for understanding the mechanisms and systems underlying glycosylation. Owing to the inherent experimental complexities, machine learning and deep learning have become indispensable tools for predicting these sites.
RESULTS: In this context, a new approach called EMNGly is proposed. EMNGly uses a pretrained protein language model (Evolutionary Scale Modeling) and a pretrained protein structure model (Inverse Folding Model) for feature extraction, and a support vector machine for classification. Ten-fold cross-validation and independent tests show that this approach outperforms existing techniques. It achieves a Matthews correlation coefficient, sensitivity, specificity, and accuracy of 0.8282, 0.9343, 0.8934, and 0.9143, respectively, on a benchmark independent test set.
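The four scores quoted in this abstract are all functions of a binary confusion matrix. As a minimal sketch of their standard definitions (not the authors' evaluation code), the function below computes them from counts of true/false positives and negatives. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity, accuracy and MCC for a binary classifier.

    tp/fp/tn/fn are confusion-matrix counts; here a "positive" would be a
    predicted glycosylation site. The counts passed in below are illustrative only.
    """
    sensitivity = tp / (tp + fn)          # fraction of true sites recovered
    specificity = tn / (tn + fp)          # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient stays informative when positive and negative classes are imbalanced, which is a common reason for reporting it alongside the other three scores.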
Affiliation(s)
- Xiaoyang Hou
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yu Wang
  - Syneron Technology, Guangzhou 510000, China
- Dongbo Bu
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yaojun Wang
  - College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Shiwei Sun
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
2
Harvey DJ. Analysis of carbohydrates and glycoconjugates by matrix-assisted laser desorption/ionization mass spectrometry: An update for 2019-2020. Mass Spectrometry Reviews 2022:e21806. [PMID: 36468275 DOI: 10.1002/mas.21806] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/17/2023]
Abstract
This review is the tenth update of the original article published in 1999 on the application of matrix-assisted laser desorption/ionization (MALDI) mass spectrometry to the analysis of carbohydrates and glycoconjugates and brings coverage of the literature to the end of 2020. Also included are papers that describe methods appropriate to analysis by MALDI, such as sample preparation techniques, even though the ionization method is not MALDI. The review is divided into three sections: (1) general aspects such as theory of the MALDI process, matrices, derivatization, MALDI imaging, fragmentation, quantification and the use of arrays; (2) applications to various structural types such as oligo- and polysaccharides, glycoproteins, glycolipids, glycosides and biopharmaceuticals; and (3) other areas such as medicine, industrial processes and glycan synthesis where MALDI is extensively used. Much of the material relating to applications is presented in tabular form. The reported work shows increasing incorporation of new techniques such as ion mobility and the enormous impact that MALDI imaging is having. MALDI, although invented nearly 40 years ago, is still an ideal technique for carbohydrate analysis, and advancements in the technique and its range of applications show little sign of diminishing.
Affiliation(s)
- David J Harvey
  - Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, UK
  - Department of Chemistry, University of Oxford, Oxford, Oxfordshire, United Kingdom
5
Wang H, Zhang J, Dong J, Hou M, Pan W, Bu D, Zhou J, Zhang Q, Wang Y, Zhao K, Li Y, Huang C, Sun S. Identification of glycan branching patterns using multistage mass spectrometry with spectra tree analysis. J Proteomics 2020; 217:103649. [PMID: 31978548 DOI: 10.1016/j.jprot.2020.103649] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/25/2019] [Revised: 01/02/2020] [Accepted: 01/16/2020] [Indexed: 12/11/2022]
Abstract
Glycans are crucial to a wide range of biological processes, and their biological activities are closely related to the branching patterns of their structures. Unlike the simple linear chains of proteins, the branching patterns of glycans are complicated, making their identification extremely challenging. Tandem mass spectrometry (MS2) cannot provide sufficient structural information to deduce glycan branching patterns, even with the assistance of various bioinformatic tools and algorithms. A promising technology for identifying glycan branching patterns is multistage mass spectrometry (MSn). The production-relationship among the MSn spectra of a glycan is essentially a tree, making the deduction of glycan structures from MSn spectra a great challenge. In the present study, we report an approach called glyBranch (glycan Branching pattern identification based on spectra tree) to fully exploit the information contained in the MSn spectra tree for glycan identification. Using 14 glycan standards, including 2 pairs with isomeric sequence, and 16 complex N-glycans isolated from RNase B and IgG, we demonstrated the successful application of glyBranch to branching pattern analysis. The source code of glyBranch is available at https://github.com/bigict/glyBranch/. We have also developed a web server, which is freely accessible at http://glycan.ict.ac.cn/glyBranch/.
SIGNIFICANCE: Glycans are crucial in various biological processes and their functions are closely related to the details of their structures; thus, the identification of glycan branching patterns is of great significance to biological studies. Multistage mass spectrometry (MSn) can provide detailed structural information by generating multiple-level fragments through consecutive fragmentation; however, the interpretation of numerous MSn spectra is extremely challenging. In this study, we present an approach called glyBranch (glycan Branching pattern identification based on spectra tree) to exploit the information contained in the MSn spectra tree for glycan identification. This approach will greatly facilitate the automated identification of glycan structures and related biological studies.
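The abstract's observation that MSn spectra of one glycan form a tree can be illustrated with a small data structure: each spectrum is a node, and each further fragmentation of one of its peaks produces a child spectrum one MS level deeper. The sketch below is only an illustration of that idea and is not taken from the glyBranch source code; the class name `SpectrumNode`, its fields, and the m/z values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpectrumNode:
    """One spectrum in a hypothetical MSn spectra tree."""
    precursor_mz: float               # m/z of the ion that was fragmented
    ms_level: int                     # 2 for MS2, 3 for MS3, and so on
    children: List["SpectrumNode"] = field(default_factory=list)

    def add_child(self, precursor_mz: float) -> "SpectrumNode":
        """Attach the spectrum obtained by fragmenting one peak of this spectrum."""
        child = SpectrumNode(precursor_mz, self.ms_level + 1)
        self.children.append(child)
        return child


def count_spectra(node: SpectrumNode) -> int:
    """Total number of spectra in the tree rooted at `node`."""
    return 1 + sum(count_spectra(c) for c in node.children)


def deepest_level(node: SpectrumNode) -> int:
    """Highest MS level reached anywhere in the tree."""
    return max([node.ms_level] + [deepest_level(c) for c in node.children])


# Placeholder m/z values, not measurements from the paper.
root = SpectrumNode(precursor_mz=1000.0, ms_level=2)
branch = root.add_child(800.0)        # MS3 spectrum of one MS2 peak
branch.add_child(500.0)               # MS4 spectrum of one MS3 peak
root.add_child(600.0)                 # a second MS3 spectrum
```

Any method that scores candidate structures against such data has to combine evidence across all nodes rather than from a single spectrum, which is the difficulty the abstract describes.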
Affiliation(s)
- Hui Wang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jingwei Zhang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
- Junchuan Dong
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Meijie Hou
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Weiyi Pan
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Dongbo Bu
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jinyu Zhou
  - Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Qi Zhang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yaojun Wang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; College of Information and Electrical Engineering, China Agricultural University, 100083, China
- Keli Zhao
  - Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yan Li
  - Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Chuncui Huang
  - Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Shiwei Sun
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China