1
He X, Zhao L, Tian Y, Li R, Chu Q, Gu Z, Zheng M, Wang Y, Li S, Jiang H, Jiang Y, Wen L, Wang D, Cheng X. Highly accurate carbohydrate-binding site prediction with DeepGlycanSite. Nat Commun 2024; 15:5163. [PMID: 38886381 PMCID: PMC11183243 DOI: 10.1038/s41467-024-49516-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 06/10/2024] [Indexed: 06/20/2024] Open
Abstract
As the most abundant organic substances in nature, carbohydrates are essential for life. Understanding how carbohydrates regulate proteins in the physiological and pathological processes presents opportunities to address crucial biological problems and develop new therapeutics. However, the diversity and complexity of carbohydrates pose a challenge in experimentally identifying the sites where carbohydrates bind to and act on proteins. Here, we introduce a deep learning model, DeepGlycanSite, capable of accurately predicting carbohydrate-binding sites on a given protein structure. Incorporating geometric and evolutionary features of proteins into a deep equivariant graph neural network with the transformer architecture, DeepGlycanSite remarkably outperforms previous state-of-the-art methods and effectively predicts binding sites for diverse carbohydrates. Integrating with a mutagenesis study, DeepGlycanSite reveals the guanosine-5'-diphosphate-sugar-recognition site of an important G-protein coupled receptor. These findings demonstrate DeepGlycanSite is invaluable for carbohydrate-binding site prediction and could provide insights into molecular mechanisms underlying carbohydrate-regulation of therapeutically important proteins.
Affiliation(s)
- Xinheng He
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Lifen Zhao
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Yinping Tian
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Rui Li
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Zhiyong Gu
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Mingyue Zheng
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Yusong Wang
- National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China
- Shaoning Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Hualiang Jiang
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Lingang Laboratory, Shanghai, China
- Yi Jiang
- Lingang Laboratory, Shanghai, China
- Liuqing Wen
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.
- University of Chinese Academy of Sciences, Beijing, China.
- Xi Cheng
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.
- University of Chinese Academy of Sciences, Beijing, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China.
2
Canner SW, Shanker S, Gray JJ. Structure-based neural network protein-carbohydrate interaction predictions at the residue level. Frontiers in Bioinformatics 2023; 3:1186531. [PMID: 37409346 PMCID: PMC10318439 DOI: 10.3389/fbinf.2023.1186531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 05/31/2023] [Indexed: 07/07/2023] Open
Abstract
Carbohydrates dynamically and transiently interact with proteins for cell-cell recognition, cellular differentiation, immune response, and many other cellular processes. Despite the molecular importance of these interactions, there are currently few reliable computational tools to predict potential carbohydrate-binding sites on any given protein. Here, we present two deep learning (DL) models named CArbohydrate-Protein interaction Site IdentiFier (CAPSIF) that predict non-covalent carbohydrate-binding sites on proteins: (1) a 3D-UNet voxel-based neural network model (CAPSIF:V) and (2) an equivariant graph neural network model (CAPSIF:G). While both models outperform previous surrogate methods used for carbohydrate-binding site prediction, CAPSIF:V performs better than CAPSIF:G, achieving test Dice scores of 0.597 and 0.543 and test set Matthews correlation coefficients (MCCs) of 0.599 and 0.538, respectively. We further tested CAPSIF:V on AlphaFold2-predicted protein structures. CAPSIF:V performed equivalently on both experimentally determined structures and AlphaFold2-predicted structures. Finally, we demonstrate how CAPSIF models can be used in conjunction with local glycan-docking protocols, such as GlycanDock, to predict bound protein-carbohydrate structures.
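The Dice score and Matthews correlation coefficient quoted in this abstract are standard measures of agreement between predicted and true binary labels. The following is a minimal sketch of how they could be computed for residue-level binding labels; the function names and the example labels are hypothetical illustrations, not taken from the CAPSIF code or data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for two equal-length sequences of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def dice_score(y_true, y_pred):
    """Dice = 2*TP / (2*TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    # Assumption: if nothing is labelled or predicted as binding, treat as perfect agreement.
    return 2 * tp / denom if denom else 1.0


def mcc(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical per-residue labels (1 = carbohydrate-binding residue).
true_labels = [1, 1, 0, 0, 1, 0, 0, 0]
pred_labels = [1, 0, 0, 0, 1, 1, 0, 0]
print(round(dice_score(true_labels, pred_labels), 3))
print(round(mcc(true_labels, pred_labels), 3))
```

Because binding residues are usually a small minority of a protein, both measures are more informative than plain accuracy, which is presumably why the abstract reports them.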
Affiliation(s)
- Samuel W. Canner
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States
- Sudhanshu Shanker
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States
- Jeffrey J. Gray
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States
3
Canner SW, Shanker S, Gray JJ. Structure-Based Neural Network Protein-Carbohydrate Interaction Predictions at the Residue Level. bioRxiv 2023:2023.03.14.531382. [PMID: 36993750 PMCID: PMC10054975 DOI: 10.1101/2023.03.14.531382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2023]
Abstract
Carbohydrates dynamically and transiently interact with proteins for cell-cell recognition, cellular differentiation, immune response, and many other cellular processes. Despite the molecular importance of these interactions, there are currently few reliable computational tools to predict potential carbohydrate binding sites on any given protein. Here, we present two deep learning models named CArbohydrate-Protein interaction Site IdentiFier (CAPSIF) that predict carbohydrate binding sites on proteins: (1) a 3D-UNet voxel-based neural network model (CAPSIF:V) and (2) an equivariant graph neural network model (CAPSIF:G). While both models outperform previous surrogate methods used for carbohydrate binding site prediction, CAPSIF:V performs better than CAPSIF:G, achieving test Dice scores of 0.597 and 0.543 and test set Matthews correlation coefficients (MCCs) of 0.599 and 0.538, respectively. We further tested CAPSIF:V on AlphaFold2-predicted protein structures. CAPSIF:V performed equivalently on both experimentally determined structures and AlphaFold2-predicted structures. Finally, we demonstrate how CAPSIF models can be used in conjunction with local glycan-docking protocols, such as GlycanDock, to predict bound protein-carbohydrate structures.
Affiliation(s)
- Samuel W. Canner
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States of America
- Sudhanshu Shanker
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Jeffrey J. Gray
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States of America
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Correspondence: Jeffrey J. Gray,
4
Sun Z, Zheng S, Zhao H, Niu Z, Lu Y, Pan Y, Yang Y. To Improve Prediction of Binding Residues With DNA, RNA, Carbohydrate, and Peptide Via Multi-Task Deep Neural Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3735-3743. [PMID: 34637380 DOI: 10.1109/tcbb.2021.3118916] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/13/2023]
Abstract
MOTIVATION: The interactions of proteins with DNA, RNA, peptides, and carbohydrates play key roles in various biological processes. Studies of uncharacterized protein-molecule interactions could be aided by accurate predictions of the residues that bind partner molecules. However, existing methods for predicting binding residues on proteins still have relatively low accuracy because of the limited number of complex structures in databases. As different types of molecules partially share chemical mechanisms, the predictions for each molecular type should benefit from the binding information of other molecule types. RESULTS: In this study, we employed a multi-task deep learning strategy to develop a new sequence-based method, named MTDsite, for simultaneously predicting binding residues/sites for multiple important molecule types. By combining four training sets for DNA-, RNA-, peptide-, and carbohydrate-binding proteins, our method yielded accurate and robust predictions with AUC values of 0.852, 0.836, 0.758, and 0.776 on their respective independent test sets, which are 0.52 to 6.6% better than other state-of-the-art methods. To the best of our knowledge, this is the first method to use a multi-task framework to predict multiple molecular binding sites simultaneously.
5
Abstract
Many important interactions between bacterial pathogens and their hosts are highly specific binding events that involve host or pathogen carbohydrate structures (glycans). Glycan interactions can mediate adhesion, invasion and immune evasion and can act as receptors for toxins. Several bacterial pathogens can also enzymatically alter host glycans to reveal binding targets, degrade the host cell glycans or alter the function of host glycoproteins. In recent years, high-throughput screening technologies, such as lectin, glycan and mucin microarrays, have transformed the field by identifying new bacterial-host glycointeractions, which are crucial for colonization, persistence and disease. In this Review, we discuss interactions involving both host and bacterial glycans that have a role in bacterial pathogenesis. We also highlight recent technological advances that have illuminated the glycoscience of microbial pathogenesis.
6
DLIGAND2: an improved knowledge-based energy function for protein-ligand interactions using the distance-scaled, finite, ideal-gas reference state. J Cheminform 2019; 11:52. [PMID: 31392430 PMCID: PMC6686496 DOI: 10.1186/s13321-019-0373-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/02/2019] [Accepted: 07/27/2019] [Indexed: 12/14/2022] Open
Abstract
Performance of structure-based molecular docking largely depends on the accuracy of scoring functions. One important type of scoring function is the knowledge-based potential derived from known three-dimensional structures of proteins and/or protein–ligand complexes. This study seeks to improve a knowledge-based protein–ligand potential based on the distance-scaled, finite, ideal-gas reference (DFIRE) state (DLIGAND) by expanding the representation of protein atoms from 13 mol2 atom types to 167 residue-specific atom types, and by employing a recently updated dataset containing 12,450 monomer protein chains for training. We found that the updated version, DLIGAND2, consistently improves over DLIGAND in predicting binding affinities for either native complex structures or docking-generated poses. More importantly, DLIGAND2 has a 52% increase over DLIGAND in enrichment factors in the top 1% of predictions based on the DUD-E decoy set, and consistently improves over Autodock Vina and other statistical energy functions in all three benchmark tests. We further found that DLIGAND2 outperforms the empirical and machine-learning methods compared for virtual screening on new targets that are not homologous to the DUD-E training set. Given its best performance as a parameter-free statistical potential and its ranking among the best in all performance measures, DLIGAND2 should be useful for re-assessing the poses generated by docking software, or for acting as one term in other scoring functions. The program is available at https://github.com/sysu-yanglab/DLIGAND2.
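The enrichment factor mentioned in this abstract is a common virtual-screening measure: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The sketch below shows one conventional way to compute it; the function name, its parameters, and the toy library are hypothetical, and the exact protocol used in the DLIGAND2 benchmark is not described here.

```python
def enrichment_factor(scores, is_active, fraction=0.01, higher_is_better=True):
    """Enrichment factor in the top `fraction` of a ranked compound list.

    scores           -- one score per compound
    is_active        -- 1 for a known active, 0 for a decoy
    higher_is_better -- set to False for energy-like scores (lower = better)
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal in length")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the top slice; always keep at least one compound.
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0],
                    reverse=higher_is_better)
    hits = sum(active for _, active in ranked[:n_top])
    return (hits / n_top) / (total_actives / n)


# Hypothetical library: 200 compounds, 10 actives, one active in the top 1%.
toy_scores = list(range(200))
toy_active = [1] * 9 + [0] * 190 + [1]
print(enrichment_factor(toy_scores, toy_active, fraction=0.01))
```

For a statistical energy function, where a lower score indicates a better pose, the same function would be called with `higher_is_better=False`.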
7
Litfin T, Yang Y, Zhou Y. SPOT-Peptide: Template-Based Prediction of Peptide-Binding Proteins and Peptide-Binding Sites. J Chem Inf Model 2019; 59:924-930. [DOI: 10.1021/acs.jcim.8b00777] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/01/2023]
Affiliation(s)
- Thomas Litfin
- School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
- Yuedong Yang
- School of Data and Computer Science, Sun-Yat Sen University, Guangzhou, Guangdong 510006, China
- Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
- Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
8
Tiralongo J, Cooper O, Litfin T, Yang Y, King R, Zhan J, Zhao H, Bovin N, Day CJ, Zhou Y. YesU from Bacillus subtilis preferentially binds fucosylated glycans. Sci Rep 2018; 8:13139. [PMID: 30177739 PMCID: PMC6120924 DOI: 10.1038/s41598-018-31241-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/07/2018] [Accepted: 08/10/2018] [Indexed: 12/25/2022] Open
Abstract
The interaction of carbohydrate-binding proteins (CBPs) with their corresponding glycan ligands is challenging to study both experimentally and computationally. This is in part due to their low binding affinity, high flexibility, and the lack of a linear sequence in carbohydrates, as exists in nucleic acids and proteins. We recently described a function-prediction technique called SPOT-Struc that identifies CBPs by global structural alignment and binding-affinity prediction. Here we experimentally determined the carbohydrate specificity and binding affinity of YesU (RCSB PDB ID: 1oq1), an uncharacterized protein from Bacillus subtilis that SPOT-Struc predicted would bind high mannose-type glycans. Glycan array analyses, however, revealed glycan binding patterns similar to those exhibited by fucose (Fuc)-binding lectins, with SPR analysis revealing high affinity binding to Lewis x and lacto-N-fucopentaose III. Structure-based alignment of YesU revealed high similarity to the legume lectins UEA-I and GS-IV, and docking of Lewis x into YesU revealed a complex structure model with predicted binding affinity of −4.3 kcal/mol. Moreover, the adherence of B. subtilis to intestinal cells was significantly inhibited by Lewis x (Lex) and Lewis y (Ley) but not by non-fucosylated glycans, suggesting the interaction of YesU with fucosylated glycans may be involved in the adhesion of B. subtilis to the gastrointestinal tract of mammals.
Affiliation(s)
- Joe Tiralongo
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia.
- Oren Cooper
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
- Tom Litfin
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
- Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, People's Republic of China
- Rebecca King
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
- Jian Zhan
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
- Huiying Zhao
- Queensland Institute of Medical Research, Brisbane, Queensland, Australia
- Nicolai Bovin
- Shemyakin Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Christopher J Day
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
- Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia.
9
Zhao H, Taherzadeh G, Zhou Y, Yang Y. Computational Prediction of Carbohydrate-Binding Proteins and Binding Sites. Curr Protoc Protein Sci 2018; 94:e75. [PMID: 30106511 DOI: 10.1002/cpps.75] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
Protein-carbohydrate interaction is essential for biological systems, and carbohydrate-binding proteins (CBPs) are important targets when designing antiviral and anticancer drugs. Due to the high cost and difficulty associated with experimental approaches, many computational methods have been developed as complementary approaches to predict CBPs or carbohydrate-binding sites. However, most of these computational methods are not publicly available. Here, we provide a comprehensive review of related studies and demonstrate our two recently developed bioinformatics methods. The method SPOT-CBP is a template-based method for detecting CBPs based on structure through structural homology search combined with a knowledge-based scoring function. This method can yield model complex structure in addition to accurate prediction of CBPs. Furthermore, it has been observed that similarly accurate predictions can be made using structures from homology modeling, which has significantly expanded its applicability. The other method, SPRINT-CBH, is a de novo approach that predicts binding residues directly from protein sequences by using sequence information and predicted structural properties. This approach does not need structurally similar templates and thus is not limited by the current database of known protein-carbohydrate complex structures. These two complementary methods are available at https://sparks-lab.org. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Huiying Zhao
- Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
- Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- Yuedong Yang
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
10
Jin T, Brefo-Mensah E, Fan W, Zeng W, Li Y, Zhang Y, Palmer M. Crystal structure of the Streptococcus agalactiae CAMP factor provides insights into its membrane-permeabilizing activity. J Biol Chem 2018; 293:11867-11877. [PMID: 29884770 DOI: 10.1074/jbc.ra118.002336] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/08/2018] [Revised: 05/30/2018] [Indexed: 11/06/2022] Open
Abstract
Streptococcus agalactiae is an important human opportunistic pathogen that can cause serious health problems, particularly among newborns and older individuals. S. agalactiae contains the CAMP factor, a pore-forming toxin first identified in this bacterium. The CAMP reaction is based on the co-hemolytic activity of the CAMP factor and is commonly used to identify S. agalactiae in the clinic. Closely related proteins are present also in other Gram-positive pathogens. Although the CAMP toxin was discovered more than a half century ago, no structure from this toxin family has been reported, and the mechanism of action of this toxin remains unclear. Here, we report the first structure of this toxin family, revealing a structural fold composed of 5 + 3-helix bundles. Further analysis by protein truncation and site-directed mutagenesis indicated that the N-terminal 5-helix bundle is responsible for membrane permeabilization, whereas the C-terminal 3-helix bundle is likely responsible for host receptor binding. Interestingly, the C-terminal domain inhibited the activity of both full-length toxin and its N-terminal domain. Moreover, we observed that the linker region is highly conserved and has a conserved DLXXXDXAT sequence motif. Structurally, this linker region extensively interacted with both terminal CAMP factor domains, and mutagenesis disclosed that the conserved sequence motif is required for CAMP factor's co-hemolytic activity. In conclusion, our results reveal a unique structure of this bacterial toxin and help clarify the molecular mechanism of its co-hemolytic activity.
Affiliation(s)
- Tengchuan Jin
- Hefei National Laboratory for Physical Sciences at Microscale, CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Life Sciences and Medical Center, University of Science and Technology of China, Hefei, Anhui 230027, China
- Eric Brefo-Mensah
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Weirong Fan
- Shanghai Jiao Tong University Affiliated Sixth People's Hospital South Campus, Shanghai 201400, China
- Weihong Zeng
- Hefei National Laboratory for Physical Sciences at Microscale, CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Life Sciences and Medical Center, University of Science and Technology of China, Hefei, Anhui 230027, China
- Yajuan Li
- Hefei National Laboratory for Physical Sciences at Microscale, CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Life Sciences and Medical Center, University of Science and Technology of China, Hefei, Anhui 230027, China
- Yuzhu Zhang
- Healthy Processed Foods Research Unit, United States Department of Agriculture Agricultural Research Service, Western Regional Research Center, Albany, California 94706
- Michael Palmer
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
11
Taherzadeh G, Zhou Y, Liew AWC, Yang Y. Structure-based prediction of protein–peptide binding regions using Random Forest. Bioinformatics 2017; 34:477-484. [DOI: 10.1093/bioinformatics/btx614] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 04/19/2017] [Accepted: 09/25/2017] [Indexed: 11/12/2022] Open
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD, Australia
- Alan Wee-Chung Liew
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Yuedong Yang
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD, Australia
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
12
Insights into the effects of glycosylation and the monosaccharide-binding activity of the plant lectin CrataBL. Glycoconj J 2017; 34:515-522. [PMID: 28299519 DOI: 10.1007/s10719-017-9766-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/30/2016] [Revised: 03/03/2017] [Accepted: 03/07/2017] [Indexed: 10/20/2022]
Abstract
CrataBL is a glycoprotein isolated from Crataeva tapia bark, containing two N-glycosylation sites. It has been identified to present lectin activity with some specificity for binding glucose over galactose. However, to date, no information on the effects of glycosylation or CrataBL monosaccharide-binding sites and monosaccharide specificity has been obtained. Thus, molecular docking and molecular dynamics simulations were employed to characterize the glycosylated CrataBL conformation and dynamics in aqueous solutions, as well as the molecular basis for its binding specificity. The obtained results indicate both local and distant conformational stabilization effects of N-linked glycans over CrataBL protein moiety. Regarding its lectin activity, molecular docking calculations were performed in two possible binding sites, identified through sequence-based, structure-based and evolutionary information, using α- and β-anomeric states of the monosaccharides. The obtained poses were further refined through molecular dynamics simulations, suggesting that positively-charged amino acids dictate the binding preference for glucose over galactose in both sites. In addition, a possible preference for β-monosaccharides was proposed. Such data are expected to contribute to a better comprehension of the lectins monosaccharide-binding activities and carbohydrate-binding site structures.
13
Cao H, Wei D, Yang Y, Shang Y, Li G, Zhou Y, Ma Q, Xu Y. Systems-level understanding of ethanol-induced stresses and adaptation in E. coli. Sci Rep 2017; 7:44150. [PMID: 28300180 PMCID: PMC5353561 DOI: 10.1038/srep44150] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 10/27/2016] [Accepted: 02/02/2017] [Indexed: 01/10/2023] Open
Abstract
Understanding ethanol-induced stresses and responses in biofuel-producing bacteria at the systems level has significant implications for engineering more efficient biofuel producers. We present a computational study of transcriptomic and genomic data of both ethanol-stressed and ethanol-adapted E. coli cells with computationally predicted ethanol-binding proteins and experimentally identified ethanol tolerance genes. Our analysis suggests: (1) ethanol damages cell wall and membrane integrity, causing increased stresses, particularly reactive oxygen species, which damage DNA and reduce the O2 level; (2) decreased cross-membrane proton gradient from membrane damage, coupled with hypoxia, leads to reduced ATP production by aerobic respiration, driving cells to rely more on fatty acid oxidation, anaerobic respiration and fermentation for ATP production; (3) the reduced ATP generation results in substantially decreased synthesis of macromolecules; (4) ethanol can directly bind 213 proteins including transcription factors, altering their functions; (5) all these changes together induce multiple stress responses and reduce biosynthesis, cell viability and growth; and (6) ethanol-adapted E. coli cells restore the majority of these reduced activities through selection of specific genomic mutations and alteration of stress responses, ultimately restoring normal ATP production, macromolecule biosynthesis, and growth. These new insights into the energy and mass balance will inform design of more ethanol-tolerant strains.
Affiliation(s)
- Huansheng Cao
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- BioEnergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Du Wei
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD 4222, Australia
- Yu Shang
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Gaoyang Li
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD 4222, Australia
- Qin Ma
- Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD 57007, USA
- BioSNTR, Brookings, SD, 57007, USA
- Ying Xu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- BioEnergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
14
Yang Y, Heffernan R, Paliwal K, Lyons J, Dehzangi A, Sharma A, Wang J, Sattar A, Zhou Y. SPIDER2: A Package to Predict Secondary Structure, Accessible Surface Area, and Main-Chain Torsional Angles by Deep Neural Networks. Methods Mol Biol 2017; 1484:55-63. [PMID: 27787820 DOI: 10.1007/978-1-4939-6406-2_6] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Indexed: 05/13/2023]
Abstract
Predicting one-dimensional structure properties has played an important role in improving the prediction of protein three-dimensional structures and functions. The most commonly predicted properties are secondary structure and accessible surface area (ASA), representing local and nonlocal structural characteristics, respectively. Secondary structure prediction is further complemented by prediction of continuous main-chain torsional angles. Here we describe a newly developed method, SPIDER2, that utilizes three iterations of deep learning neural networks to improve the prediction accuracy of several structural properties simultaneously. For an independent test set of 1199 proteins, SPIDER2 achieves 82% accuracy for secondary structure prediction, 0.76 for the correlation coefficient between predicted and actual solvent accessible surface area, 19° and 30° for mean absolute errors of backbone φ and ψ angles, respectively, and 8° and 32° for mean absolute errors of Cα-based θ and τ angles, respectively. The method provides state-of-the-art, all-in-one accurate prediction of local structure and solvent accessible surface area. The method is implemented as a web server along with a standalone package, both of which are available at our website (http://sparks-lab.org).
Affiliation(s)
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast Campus, Science 1 (G24) 2.10, Parklands Drive, Southport, QLD, 4222, Australia
| | - Rhys Heffernan
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, QLD, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, QLD, Australia
| | - James Lyons
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, QLD, Australia
| | - Abdollah Dehzangi
- Department of Psychiatry, Medical Research Center, University of Iowa, Iowa City, IA, USA
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia
- School of Engineering and Physics, University of the South Pacific, Private Mail Bag, Laucala Campus, Suva, Fiji
| | - Jihua Wang
- Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China
| | - Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia
- National ICT Australia (NICTA), Brisbane, QLD, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast Campus, Science 1 (G24) 2.10, Parklands Drive, Southport, QLD, 4222, Australia.
|
15
|
Banno M, Komiyama Y, Cao W, Oku Y, Ueki K, Sumikoshi K, Nakamura S, Terada T, Shimizu K. Development of a sugar-binding residue prediction system from protein sequences using support vector machine. Comput Biol Chem 2016; 66:36-43. [PMID: 27889654 DOI: 10.1016/j.compbiolchem.2016.10.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/08/2016] [Revised: 10/05/2016] [Accepted: 10/23/2016] [Indexed: 11/16/2022]
Abstract
Several methods have been proposed for protein-sugar binding site prediction using machine learning algorithms. However, they are not effective at learning the varied properties of binding site residues that arise from the different interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. Using this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acidic sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor which combines the results of the two predictors. We showed that when a sugar is known to be an acidic sugar, the acidic sugar binding predictor achieves the best performance, and that when a sugar is known to be a nonacidic sugar or is not known to belong to either class, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. A support vector machine was used as the machine learning algorithm, and the position-specific scoring matrix created by the position-specific iterative basic local alignment search tool was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have released our system as an open-source freeware tool on GitHub (https://doi.org/10.5281/zenodo.61513).
Affiliation(s)
- Masaki Banno
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
| | - Yusuke Komiyama
- Digital Content and Media Sciences Research Division, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-Ward, Tokyo 101-8430, Japan
| | - Wei Cao
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
| | - Yuya Oku
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
| | - Kokoro Ueki
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
| | - Kazuya Sumikoshi
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
| | - Shugo Nakamura
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
| | - Tohru Terada
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
| | - Kentaro Shimizu
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan.
|
16
|
Taherzadeh G, Zhou Y, Liew AWC, Yang Y. Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines. J Chem Inf Model 2016; 56:2115-2122. [PMID: 27623166 DOI: 10.1021/acs.jcim.6b00320] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 01/09/2023]
Abstract
Carbohydrate-binding proteins play significant roles in many diseases including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with area under the receiver operating characteristic curve (AUC) of 0.78 and 0.77 and Matthews correlation coefficient of 0.34 and 0.29, respectively, for 10-fold cross validation and independent test without balancing binding and nonbinding residues. The quality of the method is further demonstrated by having statistically significantly more binding residues predicted for carbohydrate-binding proteins than presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 Genomes Project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH.
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
| | - Yaoqi Zhou
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
| | - Alan Wee-Chung Liew
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
| | - Yuedong Yang
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
|
17
|
Yang Y, Zhan J, Zhou Y. SPOT‐Ligand: Fast and effective structure‐based virtual screening by binding homology search according to ligand and receptor similarity. J Comput Chem 2016; 37:1734-9. [DOI: 10.1002/jcc.24380] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/22/2015] [Revised: 01/12/2016] [Accepted: 03/05/2016] [Indexed: 12/11/2022]
Affiliation(s)
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD 4222, Australia
| | - Jian Zhan
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD 4222, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD 4222, Australia
|
18
|
Nahalka J, Hrabarova E, Talafova K. Protein-RNA and protein-glycan recognitions in light of amino acid codes. Biochim Biophys Acta Gen Subj 2015; 1850:1942-52. [PMID: 26145579 DOI: 10.1016/j.bbagen.2015.06.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/17/2015] [Revised: 06/18/2015] [Accepted: 06/22/2015] [Indexed: 12/13/2022]
Abstract
BACKGROUND: RNA-binding proteins, in cooperation with non-coding RNAs, play important roles in post-transcriptional regulation. Non-coding micro-RNAs control information flow from the genome to the glycome by interacting with glycan-synthesis enzymes. Glycan-binding proteins read the cell surface and cytoplasmic glycome and transfer signals back to the nucleus. The profiling of the protein-RNA and protein-glycan interactomes is of significant medicinal importance. SCOPE OF REVIEW: This review discusses the state-of-the-art research in the protein-RNA and protein-glycan recognition fields and proposes the application of amino acid codes in profiling and programming the interactomes. MAJOR CONCLUSIONS: The deciphered PUF-RNA and PPR-RNA amino acid recognition codes can be explained by the protein-RNA amino acid recognition hypothesis based on the genetic code. The tripartite amino acid code is also involved in protein-glycan interactions. At present, the results indicate that a system of four codons ("gnc", where n=g - guanine, c - cytosine, u - uracil or a - adenine) and four amino acids (G - glycine, A - alanine, V - valine, D - aspartic acid) could be the original genetic code that imprinted "rules" into both recognition processes. GENERAL SIGNIFICANCE: Amino acid recognition codes have provocative potential in the profiling and programming of the protein-RNA and protein-glycan interactomes. The profiling and even programming of the interactomes will play significant roles in diagnostics and the development of therapeutic procedures against cancer and neurodegenerative, developmental and other diseases.
Affiliation(s)
- Jozef Nahalka
- Institute of Chemistry, Centre for Glycomics, Slovak Academy of Sciences, Dubravska cesta 9, SK-84538 Bratislava, Slovak Republic; Institute of Chemistry, Centre of Excellence for White-green Biotechnology, Slovak Academy of Sciences, Trieda Andreja Hlinku 2, SK-94976 Nitra, Slovak Republic.
| | - Eva Hrabarova
- Institute of Chemistry, Centre for Glycomics, Slovak Academy of Sciences, Dubravska cesta 9, SK-84538 Bratislava, Slovak Republic; Institute of Chemistry, Centre of Excellence for White-green Biotechnology, Slovak Academy of Sciences, Trieda Andreja Hlinku 2, SK-94976 Nitra, Slovak Republic
| | - Klaudia Talafova
- Institute of Chemistry, Centre for Glycomics, Slovak Academy of Sciences, Dubravska cesta 9, SK-84538 Bratislava, Slovak Republic; Institute of Chemistry, Centre of Excellence for White-green Biotechnology, Slovak Academy of Sciences, Trieda Andreja Hlinku 2, SK-94976 Nitra, Slovak Republic
|