1
He X, Zhao L, Tian Y, Li R, Chu Q, Gu Z, Zheng M, Wang Y, Li S, Jiang H, Jiang Y, Wen L, Wang D, Cheng X. Highly accurate carbohydrate-binding site prediction with DeepGlycanSite. Nat Commun 2024; 15:5163. [PMID: 38886381 PMCID: PMC11183243 DOI: 10.1038/s41467-024-49516-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 06/10/2024] [Indexed: 06/20/2024] Open
Abstract
As the most abundant organic substances in nature, carbohydrates are essential for life. Understanding how carbohydrates regulate proteins in the physiological and pathological processes presents opportunities to address crucial biological problems and develop new therapeutics. However, the diversity and complexity of carbohydrates pose a challenge in experimentally identifying the sites where carbohydrates bind to and act on proteins. Here, we introduce a deep learning model, DeepGlycanSite, capable of accurately predicting carbohydrate-binding sites on a given protein structure. Incorporating geometric and evolutionary features of proteins into a deep equivariant graph neural network with the transformer architecture, DeepGlycanSite remarkably outperforms previous state-of-the-art methods and effectively predicts binding sites for diverse carbohydrates. Integrating with a mutagenesis study, DeepGlycanSite reveals the guanosine-5'-diphosphate-sugar-recognition site of an important G-protein coupled receptor. These findings demonstrate DeepGlycanSite is invaluable for carbohydrate-binding site prediction and could provide insights into molecular mechanisms underlying carbohydrate-regulation of therapeutically important proteins.
Affiliation(s)
- Xinheng He
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Lifen Zhao
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Yinping Tian
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Rui Li
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Zhiyong Gu
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Mingyue Zheng
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Yusong Wang
- National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China
- Shaoning Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Hualiang Jiang
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Lingang Laboratory, Shanghai, China
- Yi Jiang
- Lingang Laboratory, Shanghai, China
- Liuqing Wen
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Xi Cheng
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
2
Canner SW, Shanker S, Gray JJ. Structure-based neural network protein-carbohydrate interaction predictions at the residue level. Frontiers in Bioinformatics 2023; 3:1186531. [PMID: 37409346 PMCID: PMC10318439 DOI: 10.3389/fbinf.2023.1186531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 05/31/2023] [Indexed: 07/07/2023] Open
Abstract
Carbohydrates dynamically and transiently interact with proteins for cell-cell recognition, cellular differentiation, immune response, and many other cellular processes. Despite the molecular importance of these interactions, there are currently few reliable computational tools to predict potential carbohydrate-binding sites on any given protein. Here, we present two deep learning (DL) models named CArbohydrate-Protein interaction Site IdentiFier (CAPSIF) that predicts non-covalent carbohydrate-binding sites on proteins: (1) a 3D-UNet voxel-based neural network model (CAPSIF:V) and (2) an equivariant graph neural network model (CAPSIF:G). While both models outperform previous surrogate methods used for carbohydrate-binding site prediction, CAPSIF:V performs better than CAPSIF:G, achieving test Dice scores of 0.597 and 0.543 and test set Matthews correlation coefficients (MCCs) of 0.599 and 0.538, respectively. We further tested CAPSIF:V on AlphaFold2-predicted protein structures. CAPSIF:V performed equivalently on both experimentally determined structures and AlphaFold2-predicted structures. Finally, we demonstrate how CAPSIF models can be used in conjunction with local glycan-docking protocols, such as GlycanDock, to predict bound protein-carbohydrate structures.
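As an illustration of the two metrics quoted in this abstract, the sketch below computes a Dice score and a Matthews correlation coefficient (MCC) from per-residue binary labels (1 = binding residue, 0 = non-binding). It is a minimal sketch, not the CAPSIF evaluation code: the function names, the toy labels, and the conventions chosen for empty denominators are all assumptions made here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for two equal-length sequences of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def dice_score(y_true, y_pred):
    """Dice = 2*TP / (2*TP + FP + FN).

    Assumption: returns 1.0 when there are no true or predicted positives.
    """
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mcc(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Assumption: returns 0.0 when the denominator is zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (hypothetical labels for an 8-residue protein).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(dice_score(y_true, y_pred))  # 4/6
print(mcc(y_true, y_pred))         # 7/15
```

Unlike Dice, MCC also uses true negatives, so the two scores can rank models differently when binding residues are a small fraction of the protein.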
Affiliation(s)
- Samuel W. Canner
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States
- Sudhanshu Shanker
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States
- Jeffrey J. Gray
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States
3
Canner SW, Shanker S, Gray JJ. Structure-Based Neural Network Protein-Carbohydrate Interaction Predictions at the Residue Level. bioRxiv 2023:2023.03.14.531382. [PMID: 36993750 PMCID: PMC10054975 DOI: 10.1101/2023.03.14.531382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2023]
Abstract
Carbohydrates dynamically and transiently interact with proteins for cell-cell recognition, cellular differentiation, immune response, and many other cellular processes. Despite the molecular importance of these interactions, there are currently few reliable computational tools to predict potential carbohydrate binding sites on any given protein. Here, we present two deep learning models named CArbohydrate-Protein interaction Site IdentiFier (CAPSIF) that predict carbohydrate binding sites on proteins: (1) a 3D-UNet voxel-based neural network model (CAPSIF:V) and (2) an equivariant graph neural network model (CAPSIF:G). While both models outperform previous surrogate methods used for carbohydrate binding site prediction, CAPSIF:V performs better than CAPSIF:G, achieving test Dice scores of 0.597 and 0.543 and test set Matthews correlation coefficients (MCCs) of 0.599 and 0.538, respectively. We further tested CAPSIF:V on AlphaFold2-predicted protein structures. CAPSIF:V performed equivalently on both experimentally determined structures and AlphaFold2 predicted structures. Finally, we demonstrate how CAPSIF models can be used in conjunction with local glycan-docking protocols, such as GlycanDock, to predict bound protein-carbohydrate structures.
Affiliation(s)
- Samuel W Canner
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States of America
- Sudhanshu Shanker
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Jeffrey J Gray
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States of America
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States of America
4
Dixit R, Khambhati K, Supraja KV, Singh V, Lederer F, Show PL, Awasthi MK, Sharma A, Jain R. Application of machine learning on understanding biomolecule interactions in cellular machinery. Bioresource Technology 2023; 370:128522. [PMID: 36565819 DOI: 10.1016/j.biortech.2022.128522] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/29/2022] [Revised: 12/17/2022] [Accepted: 12/20/2022] [Indexed: 06/17/2023]
Abstract
Machine learning (ML) applications have become ubiquitous in all fields of research including protein science and engineering. Apart from protein structure and mutation prediction, scientists are focusing on knowledge gaps with respect to the molecular mechanisms involved in protein binding and interactions with other components in the experimental setups or the human body. Researchers are working on several wet-lab techniques and generating data for a better understanding of concepts and mechanics involved. The information like biomolecular structure, binding affinities, structure fluctuations and movements are enormous which can be handled and analyzed by ML. Therefore, this review highlights the significance of ML in understanding the biomolecular interactions while assisting in various fields of research such as drug discovery, nanomedicine, nanotoxicity and material science. Hence, the way ahead would be to force hand-in hand of laboratory work and computational techniques.
Affiliation(s)
- Rewati Dixit
- Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Haus-khas, New Delhi 110016, India
- Khushal Khambhati
- Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Kolli Venkata Supraja
- Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Haus-khas, New Delhi 110016, India
- Vijai Singh
- Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Franziska Lederer
- Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner landstrasse 400, 01328 Dresden, Germany
- Pau-Loke Show
- Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China; Department of Sustainable Engineering, Saveetha School of Engineering, SIMATS, Chennai 602105, India; Department of Chemical and Environmental Engineering, University of Nottingham, Malaysia, 43500 Semenyih, Selangor Darul Ehsan, Malaysia
- Mukesh Kumar Awasthi
- College of Natural Resources and Environment, Northwest A&F University, Yangling 712100, China
- Abhinav Sharma
- Institute Theory of Polymers, Leibniz Institute for Polymer Research, Hohe Strasse 6, 01069 Dresden, Germany
- Rohan Jain
- Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner landstrasse 400, 01328 Dresden, Germany
5
Peng HP, Yang AS. Computational Analysis of Antibody Paratopes for Antibody Sequences in Antibody Libraries. Methods Mol Biol 2023; 2552:437-445. [PMID: 36346607 DOI: 10.1007/978-1-0716-2609-2_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
To ensure the functionalities of the antibodies in phage-displayed synthetic antibody libraries, we use computational methods to evaluate the designs of the antibody libraries. The computational methodologies developed in our lab for designing antibody libraries provide rich information on the function of the designed antibody sequences: adequate antibody designs for a specific antigen type should have predicted paratopes for the antigen type. This computational assessment of the designed antibody sequences helps eliminate non-functional designs before proceeding to construct the library designs in the wet lab. As such, only reasonable antibody designs are constructed for antibody discoveries.
Affiliation(s)
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan.
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan.
6
Abstract
Glycoscience assembles all the scientific disciplines involved in studying various molecules and macromolecules containing carbohydrates and complex glycans. Such an ensemble involves one of the most extensive sets of molecules in quantity and occurrence since they occur in all microorganisms and higher organisms. Once the compositions and sequences of these molecules are established, the determination of their three-dimensional structural and dynamical features is a step toward understanding the molecular basis underlying their properties and functions. The range of the relevant computational methods capable of addressing such issues is anchored by the specificity of stereoelectronic effects from quantum chemistry to mesoscale modeling throughout molecular dynamics and mechanics and coarse-grained and docking calculations. The Review leads the reader through the detailed presentations of the applications of computational modeling. The illustrations cover carbohydrate-carbohydrate interactions, glycolipids, and N- and O-linked glycans, emphasizing their role in SARS-CoV-2. The presentation continues with the structure of polysaccharides in solution and solid-state and lipopolysaccharides in membranes. The full range of protein-carbohydrate interactions is presented, as exemplified by carbohydrate-active enzymes, transporters, lectins, antibodies, and glycosaminoglycan binding proteins. A final section features a list of 150 tools and databases to help address the many issues of structural glycobioinformatics.
Affiliation(s)
- Serge Perez
- Centre de Recherche sur les Macromolecules Vegetales, University of Grenoble-Alpes, Centre National de la Recherche Scientifique, Grenoble F-38041, France
- Olga Makshakova
- FRC Kazan Scientific Center of Russian Academy of Sciences, Kazan Institute of Biochemistry and Biophysics, Kazan 420111, Russia
7
Franke B, Veses-Garcia M, Diederichs K, Allison H, Rigden DJ, Mayans O. Structural annotation of the conserved carbohydrate esterase vb_24B_21 from Shiga toxin-encoding bacteriophage Φ24B. J Struct Biol 2020; 212:107596. [DOI: 10.1016/j.jsb.2020.107596] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/28/2020] [Revised: 07/21/2020] [Accepted: 07/30/2020] [Indexed: 12/24/2022]
8
Gattani S, Mishra A, Hoque MT. StackCBPred: A stacking based prediction of protein-carbohydrate binding sites from sequence. Carbohydr Res 2019; 486:107857. [DOI: 10.1016/j.carres.2019.107857] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/25/2019] [Revised: 10/05/2019] [Accepted: 10/23/2019] [Indexed: 11/26/2022]
9
In vivo cancer targeting via glycopolyester nanoparticle mediated metabolic cell labeling followed by click reaction. Biomaterials 2019; 218:119305. [DOI: 10.1016/j.biomaterials.2019.119305] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 03/05/2019] [Revised: 06/21/2019] [Accepted: 06/24/2019] [Indexed: 01/18/2023]
10
Jian JW, Chen HS, Chiu YK, Peng HP, Tung CP, Chen IC, Yu CM, Tsou YL, Kuo WY, Hsu HJ, Yang AS. Effective binding to protein antigens by antibodies from antibody libraries designed with enhanced protein recognition propensities. MAbs 2019; 11:373-387. [PMID: 30526270 PMCID: PMC6380391 DOI: 10.1080/19420862.2018.1550320] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/23/2022] Open
Abstract
Antibodies provide immune protection by recognizing antigens of diverse chemical properties, but elucidating the amino acid sequence-function relationships underlying the specificity and affinity of antibody-antigen interactions remains challenging. We designed and constructed phage-displayed synthetic antibody libraries with enriched protein antigen-recognition propensities calculated with machine learning predictors, which indicated that the designed single-chain variable fragment variants were encoded with enhanced distributions of complementarity-determining region (CDR) hot spot residues with high protein antigen recognition propensities in comparison with those in the human antibody germline sequences. Antibodies derived directly from the synthetic antibody libraries, without affinity maturation cycles comparable to those in in vivo immune systems, bound to the corresponding protein antigen through diverse conformational or linear epitopes with specificity and affinity comparable to those of the affinity-matured antibodies from in vivo immune systems. The results indicated that more densely populated CDR hot spot residues were sustainable by the antibody structural frameworks and could be accompanied by enhanced functionalities in recognizing protein antigens. Our study results suggest that synthetic antibody libraries, which are not limited by the sequences found in antibodies in nature, could be designed with the guidance of the computational machine learning algorithms that are programmed to predict interaction propensities to molecules of diverse chemical properties, leading to antibodies with optimal characteristics pertinent to their medical applications.
Affiliation(s)
- Jhih-Wei Jian
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Hong-Sen Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Yi-Kai Chiu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Chao-Ping Tung
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Ing-Chien Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Chung-Ming Yu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Yueh-Liang Tsou
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Wei-Ying Kuo
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Hung-Ju Hsu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
11
Neagu AN. Proteome Imaging: From Classic to Modern Mass Spectrometry-Based Molecular Histology. Advances in Experimental Medicine and Biology 2019; 1140:55-98. [PMID: 31347042 DOI: 10.1007/978-3-030-15950-4_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
In order to overcome the limitations of classic imaging in Histology during the actually era of multiomics, the multi-color "molecular microscope" by its emerging "molecular pictures" offers quantitative and spatial information about thousands of molecular profiles without labeling of potential targets. Healthy and diseased human tissues, as well as those of diverse invertebrate and vertebrate animal models, including genetically engineered species and cultured cells, can be easily analyzed by histology-directed MALDI imaging mass spectrometry. The aims of this review are to discuss a range of proteomic information emerging from MALDI mass spectrometry imaging comparative to classic histology, histochemistry and immunohistochemistry, with applications in biology and medicine, concerning the detection and distribution of structural proteins and biological active molecules, such as antimicrobial peptides and proteins, allergens, neurotransmitters and hormones, enzymes, growth factors, toxins and others. The molecular imaging is very well suited for discovery and validation of candidate protein biomarkers in neuroproteomics, oncoproteomics, aging and age-related diseases, parasitoproteomics, forensic, and ecotoxicology. Additionally, in situ proteome imaging may help to elucidate the physiological and pathological mechanisms involved in developmental biology, reproductive research, amyloidogenesis, tumorigenesis, wound healing, neural network regeneration, matrix mineralization, apoptosis and oxidative stress, pain tolerance, cell cycle and transformation under oncogenic stress, tumor heterogeneity, behavior and aggressiveness, drugs bioaccumulation and biotransformation, organism's reaction against environmental penetrating xenobiotics, immune signaling, assessment of integrity and functionality of tissue barriers, behavioral biology, and molecular origins of diseases. 
MALDI MSI is certainly a valuable tool for personalized medicine and "Eco-Evo-Devo" integrative biology in the current context of global environmental challenges.
Affiliation(s)
- Anca-Narcisa Neagu
- Laboratory of Animal Histology, Faculty of Biology, "Alexandru Ioan Cuza" University of Iasi, Iasi, Romania.
12
Zhao H, Taherzadeh G, Zhou Y, Yang Y. Computational Prediction of Carbohydrate-Binding Proteins and Binding Sites. Curr Protoc Protein Sci 2018; 94:e75. [PMID: 30106511 DOI: 10.1002/cpps.75] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
Protein-carbohydrate interaction is essential for biological systems, and carbohydrate-binding proteins (CBPs) are important targets when designing antiviral and anticancer drugs. Due to the high cost and difficulty associated with experimental approaches, many computational methods have been developed as complementary approaches to predict CBPs or carbohydrate-binding sites. However, most of these computational methods are not publicly available. Here, we provide a comprehensive review of related studies and demonstrate our two recently developed bioinformatics methods. The method SPOT-CBP is a template-based method for detecting CBPs based on structure through structural homology search combined with a knowledge-based scoring function. This method can yield model complex structure in addition to accurate prediction of CBPs. Furthermore, it has been observed that similarly accurate predictions can be made using structures from homology modeling, which has significantly expanded its applicability. The other method, SPRINT-CBH, is a de novo approach that predicts binding residues directly from protein sequences by using sequence information and predicted structural properties. This approach does not need structurally similar templates and thus is not limited by the current database of known protein-carbohydrate complex structures. These two complementary methods are available at https://sparks-lab.org. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Huiying Zhao
- Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
- Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- Yuedong Yang
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
13
Insights into the effects of glycosylation and the monosaccharide-binding activity of the plant lectin CrataBL. Glycoconj J 2017; 34:515-522. [PMID: 28299519 DOI: 10.1007/s10719-017-9766-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/30/2016] [Revised: 03/03/2017] [Accepted: 03/07/2017] [Indexed: 10/20/2022]
Abstract
CrataBL is a glycoprotein isolated from Crataeva tapia bark, containing two N-glycosylation sites. It has been identified to present lectin activity with some specificity for binding glucose over galactose. However, to date, no information on the effects of glycosylation or CrataBL monosaccharide-binding sites and monosaccharide specificity has been obtained. Thus, molecular docking and molecular dynamics simulations were employed to characterize the glycosylated CrataBL conformation and dynamics in aqueous solutions, as well as the molecular basis for its binding specificity. The obtained results indicate both local and distant conformational stabilization effects of N-linked glycans over CrataBL protein moiety. Regarding its lectin activity, molecular docking calculations were performed in two possible binding sites, identified through sequence-based, structure-based and evolutionary information, using α- and β-anomeric states of the monosaccharides. The obtained poses were further refined through molecular dynamics simulations, suggesting that positively-charged amino acids dictate the binding preference for glucose over galactose in both sites. In addition, a possible preference for β-monosaccharides was proposed. Such data are expected to contribute to a better comprehension of the lectins monosaccharide-binding activities and carbohydrate-binding site structures.
14
Banno M, Komiyama Y, Cao W, Oku Y, Ueki K, Sumikoshi K, Nakamura S, Terada T, Shimizu K. Development of a sugar-binding residue prediction system from protein sequences using support vector machine. Comput Biol Chem 2016; 66:36-43. [PMID: 27889654 DOI: 10.1016/j.compbiolchem.2016.10.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/08/2016] [Revised: 10/05/2016] [Accepted: 10/23/2016] [Indexed: 11/16/2022]
Abstract
Several methods have been proposed for protein-sugar binding site prediction using machine learning algorithms. However, they are not effective to learn various properties of binding site residues caused by various interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. By using this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acid sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor which combines the results of the two predictors. We showed that when a sugar is known to be an acidic sugar, the acidic sugar binding predictor achieves the best performance, and showed that when a sugar is known to be a nonacidic sugar or is not known to be either of the two classes, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. Support vector machine was used as a machine learning algorithm and the position-specific scoring matrix created by the position-specific iterative basic local alignment search tool was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have launched our system, as an open source freeware tool on the GitHub repository (https://doi.org/10.5281/zenodo.61513).
Affiliation(s)
- Masaki Banno
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Yusuke Komiyama
- Digital Content and Media Sciences Research Division, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-Ward, Tokyo 101-8430, Japan
- Wei Cao
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Yuya Oku
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Kokoro Ueki
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Kazuya Sumikoshi
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Shugo Nakamura
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Tohru Terada
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Kentaro Shimizu
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
15
Taherzadeh G, Zhou Y, Liew AWC, Yang Y. Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines. J Chem Inf Model 2016; 56:2115-2122. [PMID: 27623166 DOI: 10.1021/acs.jcim.6b00320] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 01/09/2023]
Abstract
Carbohydrate-binding proteins play significant roles in many diseases including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with area under receiver operating characteristic curve (AUC) of 0.78 and 0.77 and Matthew's correlation coefficient of 0.34 and 0.29, respectively for 10-fold cross validation and independent test without balancing binding and nonbinding residues. The quality of the method is further demonstrated by having statistically significantly more binding residues predicted for carbohydrate-binding proteins than presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 genome project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH .
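The AUC figures reported in this abstract can be read as the probability that a randomly chosen binding residue receives a higher score than a randomly chosen non-binding residue. The sketch below computes that quantity directly from pairwise comparisons. It is a minimal illustration, not the SPRINT-CBH evaluation code: the function name, the toy labels, and the toy scores are assumptions made here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: sequence of 0/1 (1 = binding residue); scores: predicted scores.
    Ties between a positive and a negative score count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores for five residues).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 5 of 6 positive/negative pairs are ordered correctly
```

Because this definition depends only on how positives rank against negatives, it does not change when non-binding residues are resampled at a different ratio, which is one reason AUC is commonly reported alongside MCC on unbalanced residue sets.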
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Yaoqi Zhou
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Alan Wee-Chung Liew
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Yuedong Yang
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
|
16
|
Jian JW, Elumalai P, Pitti T, Wu CY, Tsai KC, Chang JY, Peng HP, Yang AS. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms. PLoS One 2016; 11:e0160315. [PMID: 27513851 PMCID: PMC4981321 DOI: 10.1371/journal.pone.0160315] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/31/2015] [Accepted: 07/18/2016] [Indexed: 11/18/2022] Open
Abstract
Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites.
Affiliation(s)
- Jhih-Wei Jian
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan 11221
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan 115
- Thejkiran Pitti
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan 115
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan 30013
- Chih Yuan Wu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Keng-Chang Tsai
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Jeng-Yih Chang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
|
17
|
Pai PP, Mondal S. MOWGLI: prediction of protein-MannOse interacting residues With ensemble classifiers usinG evoLutionary Information. J Biomol Struct Dyn 2015; 34:2069-83. [PMID: 26457920 DOI: 10.1080/07391102.2015.1106978] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
Abstract
Proteins interact with carbohydrates to perform various cellular interactions. Of the many carbohydrate ligands that proteins bind, mannose constitutes an important class, playing important roles in host defense mechanisms. Accurate identification of mannose-interacting residues (MIR) may provide important clues to decipher the underlying mechanisms of protein-mannose interactions during infections. This study proposes an approach using an ensemble of base classifiers for prediction of MIR using their evolutionary information in the form of a position-specific scoring matrix. The base classifiers are random forests trained on different subsets of the training data set Dset128 using 10-fold cross-validation. The optimized ensemble of base classifiers, MOWGLI, is then used to predict MIR on protein chains of the test data set Dtestset29, where it showed promising performance with 92.0% accurate prediction. An overall improvement of 26.6% in precision was observed upon comparison with the state of the art. It is hoped that this approach, yielding enhanced predictions, could eventually be used for applications in drug design and vaccine development.
Affiliation(s)
- Priyadarshini P Pai
- Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa 403726, India
- Sukanta Mondal
- Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa 403726, India
|
18
|
Crystal structure of Streptococcus pneumoniae pneumolysin provides key insights into early steps of pore formation. Sci Rep 2015; 5:14352. [PMID: 26403197 PMCID: PMC4585913 DOI: 10.1038/srep14352] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 05/10/2015] [Accepted: 08/26/2015] [Indexed: 11/16/2022] Open
Abstract
Pore-forming proteins are weapons often used by bacterial pathogens to breach the membrane barrier of target cells. Despite their critical role in infection, important structural aspects of how these proteins assemble into pores remain unknown. Streptococcus pneumoniae is the world's leading cause of pneumonia, meningitis, bacteremia and otitis media. Pneumolysin (PLY) is a major virulence factor of S. pneumoniae and a target for both small-molecule drug development and vaccines. PLY is a member of the cholesterol-dependent cytolysins (CDCs), a family of pore-forming toxins that form gigantic pores in cell membranes. Here we present the structure of PLY determined by X-ray crystallography and, in solution, by small-angle X-ray scattering. The crystal structure reveals that PLY assembles as a linear oligomer, which provides key structural insights into the poorly understood early monomer-monomer interactions of CDCs at the membrane surface.
|
19
|
The cholesterol-dependent cytolysins pneumolysin and streptolysin O require binding to red blood cell glycans for hemolytic activity. Proc Natl Acad Sci U S A 2014; 111:E5312-20. [PMID: 25422425 DOI: 10.1073/pnas.1412703111] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Indexed: 11/18/2022] Open
Abstract
The cholesterol-dependent cytolysin (CDC) pneumolysin (Ply) is a key virulence factor of Streptococcus pneumoniae. Membrane cholesterol is required for the cytolytic activity of this toxin, but it is not clear whether cholesterol is the only cellular receptor. Analysis of Ply binding to a glycan microarray revealed that Ply has lectin activity and binds glycans, including the Lewis histo-blood group antigens. Surface plasmon resonance analysis showed that Ply has the highest affinity for the sialyl LewisX (sLeX) structure, with a Kd of 1.88 × 10⁻⁵ M. Ply hemolytic activity against human RBCs showed dose-dependent inhibition by sLeX. Flow cytometric analysis and Western blots showed that blocking binding of Ply to the sLeX glycolipid on RBCs prevents deposition of the toxin in the membrane. The lectin domain responsible for sLeX binding is in domain 4 of Ply, which contains candidate carbohydrate-binding sites. Mutagenesis of these predicted carbohydrate-binding residues of Ply resulted in a decrease in hemolytic activity and a reduced affinity for sLeX. This study reveals that this archetypal CDC requires interaction with the sLeX glycolipid cellular receptor as an essential step before membrane insertion. A similar analysis conducted on streptolysin O from Streptococcus pyogenes revealed that this CDC also has glycan-binding properties and that hemolytic activity against RBCs can be blocked with the glycan lacto-N-neotetraose by inhibiting binding to the cell surface. Together, these data support the emerging paradigm shift that pore-forming toxins, including CDCs, have cellular receptors other than cholesterol that define target cell tropism.
|
20
|
Zhao H, Yang Y, von Itzstein M, Zhou Y. Carbohydrate-binding protein identification by coupling structural similarity searching with binding affinity prediction. J Comput Chem 2014; 35:2177-83. [PMID: 25220682 DOI: 10.1002/jcc.23730] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/27/2014] [Revised: 05/27/2014] [Accepted: 08/25/2014] [Indexed: 02/03/2023]
Abstract
Carbohydrate-binding proteins (CBPs) are potential biomarkers and drug targets. However, the interactions between carbohydrates and proteins are challenging to study experimentally and computationally because of their low binding affinity, high flexibility, and the lack of a linear sequence in carbohydrates as exists in RNA, DNA, and proteins. Here, we describe a structure-based function-prediction technique called SPOT-Struc that identifies carbohydrate-recognizing proteins and their binding amino acid residues using the structural alignment program SPalign and binding affinity scoring according to a knowledge-based statistical potential based on the distance-scaled finite-ideal gas reference state (DFIRE). The leave-one-out cross-validation of the method on 113 carbohydrate-binding domains and 3442 noncarbohydrate-binding proteins yields a Matthews correlation coefficient of 0.56 for SPalign alone and 0.63 for SPOT-Struc (SPalign + binding affinity scoring) for CBP prediction. SPOT-Struc is a technique with a high positive predictive value (79% correct predictions in all positive CBP predictions) and a reasonable sensitivity (52% positive predictions in all CBPs). The sensitivity of the method changed only slightly when applied to 31 APO (unbound) structures found in the Protein Data Bank (14/31 for APO versus 15/31 for HOLO). The result of SPOT-Struc will not change significantly if highly homologous templates were used. SPOT-Struc predicted 19 out of 2076 structural genome targets as CBPs. In particular, one uncharacterized protein in Bacillus subtilis (1oq1A) was matched to galectin-9 from Mus musculus. Thus, SPOT-Struc is useful for uncovering novel carbohydrate-binding proteins. SPOT-Struc is available at http://sparks-lab.org.
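The three figures quoted for SPOT-Struc in this abstract (sensitivity 52%, positive predictive value 79%, Matthews correlation coefficient 0.63) can be cross-checked against one another. The sketch below is only an illustration: the confusion-matrix counts are back-calculated from the rounded percentages and the stated set sizes (113 binding domains, 3442 nonbinding proteins), so they are assumptions rather than values reported by the authors, and the variable names are hypothetical.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Set sizes and rates as stated in the abstract.
positives = 113       # carbohydrate-binding domains
negatives = 3442      # noncarbohydrate-binding proteins
sensitivity = 0.52    # fraction of true CBPs that were predicted as CBPs
ppv = 0.79            # fraction of positive predictions that were correct

# Assumed counts, reconstructed from the rounded percentages above.
tp = round(sensitivity * positives)   # true positives
predicted_positive = round(tp / ppv)  # all positive predictions
fp = predicted_positive - tp          # false positives
fn = positives - tp                   # false negatives
tn = negatives - fp                   # true negatives

value = mcc(tp, fp, fn, tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} MCC={value:.2f}")
```

Under these assumed counts the coefficient comes out close to the reported 0.63, which suggests the three quoted statistics are mutually consistent. Because the inputs are rounded, the reconstruction cannot recover the exact counts used in the paper.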
Affiliation(s)
- Huiying Zhao
- Indiana University School of Informatics, Indiana University Purdue University, Indianapolis, 719 Indiana Ave, Suite 319, Indianapolis, Indiana, 46202; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, 46202
|
21
|
Mahalingam R, Peng HP, Yang AS. Prediction of fatty acid-binding residues on protein surfaces with three-dimensional probability distributions of interacting atoms. Biophys Chem 2014; 192:10-9. [PMID: 24934883 DOI: 10.1016/j.bpc.2014.05.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/08/2014] [Revised: 05/22/2014] [Accepted: 05/22/2014] [Indexed: 10/25/2022]
Abstract
Protein-fatty acid interaction is vital for many cellular processes, and understanding this interaction is important for functional annotation as well as drug discovery. In this work, we present a method for predicting fatty acid (FA)-binding residues by using three-dimensional probability density distributions of interacting atoms of FAs on protein surfaces, which are derived from known protein-FA complex structures. A machine learning algorithm was established to learn the characteristic patterns of the probability density maps specific to the FA-binding sites. The predictor was trained with five-fold cross-validation on a non-redundant training set and then evaluated on an independent test set as well as on a dataset of holo-apo pairs. The results showed good accuracy in predicting the FA-binding residues. Further, the predictor developed in this study is implemented as an online server, which is freely accessible at http://ismblab.genomics.sinica.edu.tw/.
Affiliation(s)
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan; Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan.
|
22
|
Mahalingam R, Peng HP, Yang AS. Prediction of FMN-binding residues with three-dimensional probability distributions of interacting atoms on protein surfaces. J Theor Biol 2013; 343:154-61. [PMID: 24211525 DOI: 10.1016/j.jtbi.2013.10.020] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/26/2013] [Revised: 10/29/2013] [Accepted: 10/30/2013] [Indexed: 12/12/2022]
Abstract
Flavin mononucleotide (FMN) is a cofactor involved in many biological reactions. Insights into protein-FMN interactions aid protein functional annotation and also facilitate drug design. In this study, we have established a new method, making use of an encoding scheme of the three-dimensional probability density maps that describe the distributions of 40 non-covalent interacting atom types around protein surfaces, to predict FMN-binding sites on protein surfaces. One machine learning model was trained for each of the 30 protein atom types to predict tentative FMN-binding sites on protein structures. The method's capability was evaluated by five-fold cross-validation on a dataset containing 81 non-redundant FMN-binding protein structures and further tested on independent datasets of 30 and 15 non-redundant protein structures, respectively. These predictions achieved accuracies of 0.94, 0.94 and 0.96, with Matthews correlation coefficients (MCC) of 0.53, 0.53 and 0.65, respectively, for the three protein structure sets. The prediction capability is superior to the existing method. This is the first structure-based approach that does not rely on evolutionary information for predicting FMN-interacting residues. The webserver for the prediction is available at http://ismblab.genomics.sinica.edu.tw/.
Affiliation(s)
- Rajasekaran Mahalingam
- Genomics Research Center, Academia Sinica, 128 Academia Rd., Sec. 2, Nankang Dist., Taipei 115, Taiwan.
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, 128 Academia Rd., Sec. 2, Nankang Dist., Taipei 115, Taiwan; Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- An-Suei Yang
- Genomics Research Center, Academia Sinica, 128 Academia Rd., Sec. 2, Nankang Dist., Taipei 115, Taiwan.
|