1
Dixit R, Khambhati K, Supraja KV, Singh V, Lederer F, Show PL, Awasthi MK, Sharma A, Jain R. Application of machine learning on understanding biomolecule interactions in cellular machinery. Bioresource Technology 2023; 370:128522. [PMID: 36565819 DOI: 10.1016/j.biortech.2022.128522] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/29/2022] [Revised: 12/17/2022] [Accepted: 12/20/2022] [Indexed: 06/17/2023]
Abstract
Machine learning (ML) applications have become ubiquitous in all fields of research, including protein science and engineering. Apart from protein structure and mutation prediction, scientists are focusing on knowledge gaps with respect to the molecular mechanisms involved in protein binding and interactions with other components in experimental setups or the human body. Researchers are working on several wet-lab techniques and generating data for a better understanding of the concepts and mechanics involved. The resulting information, such as biomolecular structures, binding affinities, and structural fluctuations and movements, is enormous in volume and can be handled and analyzed by ML. Therefore, this review highlights the significance of ML in understanding biomolecular interactions while assisting in various fields of research such as drug discovery, nanomedicine, nanotoxicity and material science. Hence, the way ahead would be for laboratory work and computational techniques to go hand in hand.
Affiliation(s)
- Rewati Dixit
  - Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Khushal Khambhati
  - Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Kolli Venkata Supraja
  - Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Vijai Singh
  - Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Franziska Lederer
  - Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner Landstrasse 400, 01328 Dresden, Germany
- Pau-Loke Show
  - Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China; Department of Sustainable Engineering, Saveetha School of Engineering, SIMATS, Chennai 602105, India; Department of Chemical and Environmental Engineering, University of Nottingham, Malaysia, 43500 Semenyih, Selangor Darul Ehsan, Malaysia
- Mukesh Kumar Awasthi
  - College of Natural Resources and Environment, Northwest A&F University, Yangling 712100, China
- Abhinav Sharma
  - Institute Theory of Polymers, Leibniz Institute for Polymer Research, Hohe Strasse 6, 01069 Dresden, Germany
- Rohan Jain
  - Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner Landstrasse 400, 01328 Dresden, Germany
2
Saeed A, Rafiq Z, Imran M, Saeed Q, Saeed MQ, Ali Z, Iqbal RK, Hussain S, Khaliq B, Mehmood S, Akrem A. In-silico Studies Calculated a New Chitin Oligomer Binding Site Inside Vicilin: A Potent Antifungal and Insecticidal Agent. Dose Response 2022; 20:15593258221108280. [PMID: 35734395 PMCID: PMC9208065 DOI: 10.1177/15593258221108280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Accepted: 06/01/2022] [Indexed: 11/20/2022]
Abstract
Vicilins are major seed storage proteins and show differential binding affinities toward sugar moieties of the fungal cell wall and insect gut epithelium. Hence, the purpose of this study is the thorough in-silico characterization of interactions between vicilin and chitin oligomer, followed by fungal and insecticidal bioassays. This work covers the molecular simulation studies explaining the interactions between Pisum sativum vicilin (PsV) and chitin oligomer, followed by protein bioassays against different pathogens. LC-MS/MS of purified PsV (∼50 kDa) generated residual data with high pea vicilin homology (UniProtKB ID: P13918). The predicted model (PsV) indicated the characteristic homotrimer joined through head-to-tail association, with each monomer containing a bicupin domain. PsV site map analysis showed a new site (Site 4) into which molecular docking confirmed the strong binding of chitin oligomer (GlcNAc)4. Molecular dynamics simulation data (50 ns) indicated that the chitin-binding site comprised 8 residues (DKEDRNEN), of which aspartate and glutamate contributed significantly to the stability of ligand binding. Computational findings were further verified via significant growth inhibition of Aspergillus flavus, A. niger, and Fusarium oxysporum by PsV. Additionally, the adult population of Brevicoryne brassicae was substantially reduced, and different life stages of Tribolium castaneum also showed significant mortality.
Affiliation(s)
- Ahsan Saeed
  - Botany Division, Institute of Pure and Applied Biology, Bahauddin Zakariya University, Multan, Pakistan
- Zahra Rafiq
  - Botany Division, Institute of Pure and Applied Biology, Bahauddin Zakariya University, Multan, Pakistan
- Muhammad Imran
  - Forman Christian College (A Chartered University), Lahore, Pakistan
- Qamar Saeed
  - Department of Entomology, Bahauddin Zakariya University, Multan, Pakistan
- Muhammad Q Saeed
  - Department of Microbiology, Institute of Pure and Applied Biology, Bahauddin Zakariya University, Multan, Pakistan
- Zahid Ali
  - Department of Biosciences, Plant Biotechnology and Molecular Pharming Lab, COMSATS University, Islamabad, Pakistan
- Rana K Iqbal
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Saber Hussain
  - Botany Division, Institute of Pure and Applied Biology, Bahauddin Zakariya University, Multan, Pakistan
- Binish Khaliq
  - Department of Botany, University of Okara, Okara, Pakistan
- Sohaib Mehmood
  - Botany Division, Institute of Pure and Applied Biology, Bahauddin Zakariya University, Multan, Pakistan
- Ahmed Akrem
  - Botany Division, Institute of Pure and Applied Biology, Bahauddin Zakariya University, Multan, Pakistan
3
André C, Guillaume YC. Development of nano affinity columns for the study of ligand (including SARS-CoV-2 related proteins) binding to heparan sulfate proteoglycans. Analytical Methods: Advancing Methods and Applications 2021; 13:3050-3058. [PMID: 34132262 DOI: 10.1039/d1ay00506e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
The interactions of heparan sulfate proteoglycans (HSPGs) present on the cell surface with target proteins lead to cell signaling and they are considered as viral receptors. The analysis of the recognition mechanism between HSPG and its potential ligands and high-throughput screening in drug discovery thus remain important challenges. Glycidyl methacrylate-based monoliths were thus prepared in situ in miniaturized capillary columns (internal diameter 75 μm) and HSPG was grafted onto them by the use of the Schiff base method. The quantity of grafted HSPG was in the nanogram range (11 nanograms per cm of capillary length). This is of significant importance when working with less available or expensive biological material. Other advantages of our miniaturized capillary column are as follows: (i) the immobilization process of HSPG onto the organic monolithic support was reliable and reproducible. (ii) The resultant affinity capillary column showed a strong resistance to changes in temperature and pH and a negligible non-specific interaction. So as to confirm the proper functioning of our miniaturized capillary column, the molecular recognition by HSPG of five selected compounds including three ligands of interest related to SARS-CoV-2 was studied.
Affiliation(s)
- Claire André
  - Univ Franche-Comté, F-25000 Besançon, France; EA481 Neurosciences Intégratives et Cliniques/Pôle Chimie Analytique Bioanalytique et Physique (PCABP), F-25000 Besançon, France; CHRU Besançon, Pôle Pharmaceutique, F-25000 Besançon, France
- Yves Claude Guillaume
  - Univ Franche-Comté, F-25000 Besançon, France; EA481 Neurosciences Intégratives et Cliniques/Pôle Chimie Analytique Bioanalytique et Physique (PCABP), F-25000 Besançon, France; CHRU Besançon, Pôle Pharmaceutique, F-25000 Besançon, France
4
David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform 2020; 12:56. [PMID: 33431035 PMCID: PMC7495975 DOI: 10.1186/s13321-020-00460-5] [Citation(s) in RCA: 165] [Impact Index Per Article: 33.0] [Received: 01/11/2020] [Accepted: 09/05/2020] [Indexed: 02/08/2023]
Abstract
The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.
Affiliation(s)
- Laurianne David
  - Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg, Sweden
- Amol Thakkar
  - Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg, Sweden
  - Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
- Rocío Mercado
  - Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg, Sweden
- Ola Engkvist
  - Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg, Sweden
5
Harvey CM, O'Toole KH, Liu C, Mariano P, Dunaway-Mariano D, Allen KN. Structural Analysis of Binding Determinants of Salmonella typhimurium Trehalose-6-phosphate Phosphatase Using Ground-State Complexes. Biochemistry 2020; 59:3247-3257. [PMID: 32786412 DOI: 10.1021/acs.biochem.0c00317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/05/2023]
Abstract
Trehalose-6-phosphate phosphatase (T6PP) catalyzes the dephosphorylation of trehalose 6-phosphate (T6P) to the disaccharide trehalose. The enzyme is not present in mammals but is essential to the viability of multiple lower organisms as trehalose is a critical metabolite, and T6P accumulation is toxic. Hence, T6PP is a target for therapeutics of human pathologies caused by bacteria, fungi, and parasitic nematodes. Here, we report the X-ray crystal structures of Salmonella typhimurium T6PP (StT6PP) in its apo form and in complex with the cofactor Mg2+ and the substrate analogue trehalose 6-sulfate (T6S), the product trehalose, or the competitive inhibitor 4-n-octylphenyl α-d-glucopyranoside 6-sulfate (OGS). OGS replaces the substrate phosphoryl group with a sulfate group and the glucosyl ring distal to the sulfate group with an octylphenyl moiety. The structures of these substrate-analogue and product complexes with T6PP show that specificity is conferred via hydrogen bonds to the glucosyl group proximal to the phosphoryl moiety through Glu123, Lys125, and Glu167, conserved in T6PPs from multiple species. The structure of the first-generation inhibitor OGS shows that it retains the substrate-binding interactions observed for the sulfate group and the proximal glucosyl ring. The OGS octylphenyl moiety binds in a unique manner, indicating that this subsite can tolerate various chemotypes. Together, these findings show that these conserved interactions at the proximal glucosyl ring binding site could provide the basis for the development of broad-spectrum therapeutics, whereas variable interactions at the divergent distal subsite could present an opportunity for the design of potent organism-specific therapeutics.
Affiliation(s)
- Christine M Harvey
  - Department of Chemistry, Boston University, Boston, Massachusetts 02215, United States
- Katherine H O'Toole
  - Department of Chemistry, Boston University, Boston, Massachusetts 02215, United States
- Chunliang Liu
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, New Mexico 87131, United States
- Patrick Mariano
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, New Mexico 87131, United States
- Debra Dunaway-Mariano
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, New Mexico 87131, United States
- Karen N Allen
  - Department of Chemistry, Boston University, Boston, Massachusetts 02215, United States
6
Miranda MRA, Uchôa AF, Ferreira SR, Ventury KE, Costa EP, Carmo PRL, Machado OLT, Fernandes KVS, Amancio Oliveira AE. Chemical Modifications of Vicilins Interfere with Chitin-Binding Affinity and Toxicity to Callosobruchus maculatus (Coleoptera: Chrysomelidae) Insect: A Combined In Vitro and In Silico Analysis. Journal of Agricultural and Food Chemistry 2020; 68:5596-5605. [PMID: 32343573 DOI: 10.1021/acs.jafc.9b08034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
Vicilins are related to cowpea seed resistance toward Callosobruchus maculatus due to their ability to bind to chitinous structures lining larval midgut. However, this binding mechanism is not fully understood. Here, we identified chitin binding sites and investigated how in vitro and in silico chemical modifications interfere with vicilin chitin binding and insect toxicity. In vitro assays showed that unmodified vicilin strongly binds to chitin matrices, mainly with acetylated chitin. Chemical modifications of specific amino acids (tryptophan, lysine, tyrosine), as well as glutaraldehyde cross-linking, decreased the evaluated parameters. In silico analyses identified at least one chitin binding site in vicilin monomer, the region between Arg208 and Lys216, which bears the sequence REGIRELMK and forms an α helix, exposed in the 3D structure. In silico modifications of Lys223 (acetylated at its terminal nitrogen) and Trp316 (iodinated to 7-iodine-L-tryptophan or oxidized to β-oxy-indolylalanine) decreased vicilin chitin binding affinity. Glucose, sucrose, and N-acetylglucosamine also interfered with vicilin chitin binding affinity.
Affiliation(s)
- Maria Raquel A Miranda
  - Departamento de Bioquímica, Centro de Ciências, Universidade Federal do Ceará (UFC), Fortaleza Ceará 60440554, Brazil
- Adriana F Uchôa
  - Departamento de Biologia Celular e Genética, Centro de Biociências, Universidade Federal do Rio Grande do Norte, Natal, Rio Grande do Norte 59072970, Brazil
- Sarah R Ferreira
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Kayan E Ventury
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Evenilton P Costa
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Paulo R Leitão Carmo
  - NUPEN, Universidade Federal do Rio de Janeiro (UFRJ) Macaé, Rio de Janeiro 27965-045, Brazil
- Olga L T Machado
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Katia V S Fernandes
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
- Antonia Elenir Amancio Oliveira
  - Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Rio de Janeiro 28013-602, Brazil
7
Copoiu L, Malhotra S. The current structural glycome landscape and emerging technologies. Curr Opin Struct Biol 2020; 62:132-139. [PMID: 32006784 DOI: 10.1016/j.sbi.2019.12.020] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/01/2019] [Revised: 12/23/2019] [Accepted: 12/24/2019] [Indexed: 11/19/2022]
Abstract
Carbohydrates represent one of the building blocks of life, along with nucleic acids, proteins and lipids. Although glycans are involved in a wide range of processes from embryogenesis to protein trafficking and pathogen infection, we are still a long way from deciphering the glycocode. In this review, we aim to present a few of the challenges that researchers working in the area of glycobiology can encounter and what strategies can be utilised to overcome them. Our goal is to paint a comprehensive picture of the current saccharide landscape available in the Protein Data Bank (PDB). We also review recently updated repositories relevant to the topic proposed, the impact of software development on strategies to structurally solve carbohydrate moieties, and state-of-the-art molecular and cellular biology methods that can shed some light on the function and structure of glycans.
Affiliation(s)
- Liviu Copoiu
  - Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, United Kingdom
- Sony Malhotra
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom
8
Glucose-induced structural changes and anomalous diffusion of elastin. Colloids Surf B Biointerfaces 2020; 188:110776. [PMID: 31945631 DOI: 10.1016/j.colsurfb.2020.110776] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/30/2019] [Revised: 12/18/2019] [Accepted: 01/04/2020] [Indexed: 01/31/2023]
Abstract
Elastin is the principal protein component of elastic fiber, which renders essential elasticity to connective tissues and organs. Here, we adopted a multi-technique approach to study the transport, viscoelastic, and structural properties of elastin exposed to various glucose concentrations (X = [gluc]/[elastin]). Laser light scattering experiments revealed an anomalous behavior (anomaly exponent, β < 0.6) of elastin. In this regime (β < 0.6), the diffusion constant decreases by 40% in the presence of glucose (X > 10), which suggests a structural change in elastin. We have observed a peculiar inverse temperature transition of elastin protein, which is a measure of structural change, at 40 °C through rheology experiments. Moreover, we observe its shift towards lower temperature with a higher X. FTIR revealed that the presence of glucose (X < 10) favors the formation of β-sheet structure in elastin. However, for X > 10, a dominant crowding effect reduces the mobility of the protein and favors an increase in β-turns and γ-turns by 25 ± 1% over the β-sheet (β-sheet decreases by 12 ± 0.8%) and α-helix (α-helix decreases by 13 ± 0.8%). The stiffness of the protein is estimated through the Flory characteristic ratio, C∞, and found to increase with X. These glucose-based structural changes in elastin may explain the role of glucose in age-related issues of the skin.
9
Gattani S, Mishra A, Hoque MT. StackCBPred: A stacking based prediction of protein-carbohydrate binding sites from sequence. Carbohydr Res 2019; 486:107857. [DOI: 10.1016/j.carres.2019.107857] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/25/2019] [Revised: 10/05/2019] [Accepted: 10/23/2019] [Indexed: 11/26/2022]
10
Zhao H, Taherzadeh G, Zhou Y, Yang Y. Computational Prediction of Carbohydrate-Binding Proteins and Binding Sites. Curr Protoc Protein Sci 2018; 94:e75. [PMID: 30106511 DOI: 10.1002/cpps.75] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
Protein-carbohydrate interaction is essential for biological systems, and carbohydrate-binding proteins (CBPs) are important targets when designing antiviral and anticancer drugs. Due to the high cost and difficulty associated with experimental approaches, many computational methods have been developed as complementary approaches to predict CBPs or carbohydrate-binding sites. However, most of these computational methods are not publicly available. Here, we provide a comprehensive review of related studies and demonstrate our two recently developed bioinformatics methods. The method SPOT-CBP is a template-based method for detecting CBPs based on structure through structural homology search combined with a knowledge-based scoring function. This method can yield model complex structure in addition to accurate prediction of CBPs. Furthermore, it has been observed that similarly accurate predictions can be made using structures from homology modeling, which has significantly expanded its applicability. The other method, SPRINT-CBH, is a de novo approach that predicts binding residues directly from protein sequences by using sequence information and predicted structural properties. This approach does not need structurally similar templates and thus is not limited by the current database of known protein-carbohydrate complex structures. These two complementary methods are available at https://sparks-lab.org. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Huiying Zhao
  - Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Ghazaleh Taherzadeh
  - School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
- Yaoqi Zhou
  - School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia; Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- Yuedong Yang
  - School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia; Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia; School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
11
Banno M, Komiyama Y, Cao W, Oku Y, Ueki K, Sumikoshi K, Nakamura S, Terada T, Shimizu K. Development of a sugar-binding residue prediction system from protein sequences using support vector machine. Comput Biol Chem 2016; 66:36-43. [PMID: 27889654 DOI: 10.1016/j.compbiolchem.2016.10.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/08/2016] [Revised: 10/05/2016] [Accepted: 10/23/2016] [Indexed: 11/16/2022]
Abstract
Several methods have been proposed for protein-sugar binding site prediction using machine learning algorithms. However, they are not effective at learning the various properties of binding site residues that arise from the various interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. By using this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acidic sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor which combines the results of the two predictors. We showed that when a sugar is known to be an acidic sugar, the acidic sugar binding predictor achieves the best performance, and that when a sugar is known to be a nonacidic sugar or is not known to be either of the two classes, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. A support vector machine was used as the machine learning algorithm, and the position-specific scoring matrix created by the position-specific iterative basic local alignment search tool was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have launched our system as an open source freeware tool on the GitHub repository (https://doi.org/10.5281/zenodo.61513).
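The five-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. The function name `k_fold_indices` and the contiguous, unshuffled split are assumptions made for illustration only; the abstract does not say how the folds were built, and real pipelines usually shuffle or group residues by protein chain.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper: fold construction here is an assumption, not the
    procedure used by the cited study.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Every sample is held out exactly once across the five folds.
for train, test in k_fold_indices(12, k=5):
    print(len(train), len(test))
```

Each model is trained on the `train` indices and scored on the held-out `test` indices, and the five scores are then averaged.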
Affiliation(s)
- Masaki Banno
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Yusuke Komiyama
  - Digital Content and Media Sciences Research Division, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-Ward, Tokyo 101-8430, Japan
- Wei Cao
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Yuya Oku
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Kokoro Ueki
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Kazuya Sumikoshi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Shugo Nakamura
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Tohru Terada
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
- Kentaro Shimizu
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan
12
Taherzadeh G, Zhou Y, Liew AWC, Yang Y. Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines. J Chem Inf Model 2016; 56:2115-2122. [PMID: 27623166 DOI: 10.1021/acs.jcim.6b00320] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 01/09/2023]
Abstract
Carbohydrate-binding proteins play significant roles in many diseases including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with area under the receiver operating characteristic curve (AUC) of 0.78 and 0.77 and Matthews correlation coefficient of 0.34 and 0.29, respectively, for 10-fold cross validation and independent test without balancing binding and nonbinding residues. The quality of the method is further demonstrated by having statistically significantly more binding residues predicted for carbohydrate-binding proteins than presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 genome project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH.
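The Matthews correlation coefficient reported in this abstract is a standard metric for imbalanced data such as binding versus nonbinding residues. The sketch below computes it from confusion-matrix counts; the counts in the usage example are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention
    for the undefined case).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical, imbalanced example: 100 binding and 900 nonbinding residues.
tp, fn, fp, tn = 30, 70, 50, 850
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy={accuracy:.2f} mcc={mcc:.2f}")
```

In this made-up case the accuracy looks high because nonbinding residues dominate, while the coefficient stays modest, which is why it is a useful companion to AUC when the classes are not balanced.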
Affiliation(s)
- Ghazaleh Taherzadeh
  - School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Yaoqi Zhou
  - School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Alan Wee-Chung Liew
  - School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Yuedong Yang
  - School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
13
Ranganarayanan P, Thanigesan N, Ananth V, Jayaraman VK, Ramakrishnan V. Identification of Glucose-Binding Pockets in Human Serum Albumin Using Support Vector Machine and Molecular Dynamics Simulations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:148-157. [PMID: 26886739 DOI: 10.1109/tcbb.2015.2415806] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
Human Serum Albumin (HSA) has been suggested to be an alternate biomarker to the existing Hemoglobin-A1c (HbA1c) marker for glycemic monitoring. Development and usage of HSA as an alternate biomarker requires the identification of glycation sites, or equivalently, glucose-binding pockets. In this work, we combine molecular dynamics simulations of HSA and the state-of-the-art machine learning method Support Vector Machine (SVM) to predict glucose-binding pockets in HSA. SVM uses the three-dimensional arrangement of atoms and their chemical properties to predict the glucose-binding ability of a pocket. Feature selection reveals that the arrangement of atoms and their chemical properties within the first 4 Å from the centroid of the pocket play an important role in the binding of glucose. With a 10-fold cross validation accuracy of 84 percent, our SVM model reveals seven new potential glucose-binding sites in HSA, of which two are exposed only during the dynamics of HSA. The predictions are further corroborated using docking studies. These findings can complement studies directed towards the development of HSA as an alternate biomarker for glycemic monitoring.
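The idea of describing a pocket by the atoms within 4 Å of its centroid can be sketched as follows. The helper names `centroid` and `atoms_within` and the toy coordinates are hypothetical; the abstract does not describe how the authors built their feature vectors, so this only illustrates the geometric selection step.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def atoms_within(atoms, center, cutoff=4.0):
    """Names of atoms whose distance to `center` is at most `cutoff` (in Å)."""
    return [name for name, xyz in atoms if math.dist(xyz, center) <= cutoff]


# Toy pocket with made-up coordinates in Å (not from any real structure).
pocket = [
    ("N1", (0.0, 0.0, 0.0)),
    ("O1", (1.0, 0.0, 0.0)),
    ("C1", (2.0, 0.0, 0.0)),
    ("C2", (9.0, 0.0, 0.0)),
]
center = centroid([xyz for _, xyz in pocket])
print(center, atoms_within(pocket, center))
```

The selected atoms, together with their chemical properties, would then be encoded as the input features for a classifier.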
14
Al-Ali H. The evolution of drug discovery: from phenotypes to targets, and back. MedChemComm 2016. [DOI: 10.1039/c6md00129g] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 12/15/2022]
Abstract
Cumulative scientific and technological advances over the past two centuries have transformed drug discovery from a largely serendipitous process into the high tech pipelines of today.
Affiliation(s)
- Hassan Al-Ali
- Miami Project to Cure Paralysis, University of Miami Miller School of Medicine, Miami, FL 33136, USA
|
15
|
Pai PP, Mondal S. MOWGLI: prediction of protein-MannOse interacting residues With ensemble classifiers usinG evoLutionary Information. J Biomol Struct Dyn 2015; 34:2069-83. [PMID: 26457920 DOI: 10.1080/07391102.2015.1106978] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 10/23/2022]
Abstract
Proteins interact with carbohydrates to perform various cellular interactions. Of the many carbohydrate ligands that proteins bind, mannose constitutes an important class, playing important roles in host defense mechanisms. Accurate identification of mannose-interacting residues (MIR) may provide important clues to decipher the underlying mechanisms of protein-mannose interactions during infections. This study proposes an approach using an ensemble of base classifiers for prediction of MIR using their evolutionary information in the form of a position-specific scoring matrix. The base classifiers are random forests trained on different subsets of the training data set Dset128 using 10-fold cross-validation. The optimized ensemble of base classifiers, MOWGLI, is then used to predict MIR on protein chains of the test data set Dtestset29, where it showed a promising performance with 92.0% accurate prediction. An overall improvement of 26.6% in precision was observed upon comparison with the state of the art. It is hoped that this approach, yielding enhanced predictions, could eventually be used for applications in drug design and vaccine development.
Affiliation(s)
- Priyadarshini P Pai
- Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa 403726, India
- Sukanta Mondal
- Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa 403726, India
|
16
|
Zhao H, Yang Y, von Itzstein M, Zhou Y. Carbohydrate-binding protein identification by coupling structural similarity searching with binding affinity prediction. J Comput Chem 2014; 35:2177-83. [PMID: 25220682 DOI: 10.1002/jcc.23730] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 02/27/2014] [Revised: 05/27/2014] [Accepted: 08/25/2014] [Indexed: 02/03/2023]
Abstract
Carbohydrate-binding proteins (CBPs) are potential biomarkers and drug targets. However, the interactions between carbohydrates and proteins are challenging to study experimentally and computationally because of their low binding affinity, high flexibility, and the lack of a linear sequence in carbohydrates as exists in RNA, DNA, and proteins. Here, we describe a structure-based function-prediction technique called SPOT-Struc that identifies carbohydrate-recognizing proteins and their binding amino acid residues by structural alignment program SPalign and binding affinity scoring according to a knowledge-based statistical potential based on the distance-scaled finite-ideal gas reference state (DFIRE). The leave-one-out cross-validation of the method on 113 carbohydrate-binding domains and 3442 noncarbohydrate binding proteins yields a Matthews correlation coefficient of 0.56 for SPalign alone and 0.63 for SPOT-Struc (SPalign + binding affinity scoring) for CBP prediction. SPOT-Struc is a technique with high positive predictive value (79% correct predictions in all positive CBP predictions) with a reasonable sensitivity (52% positive predictions in all CBPs). The sensitivity of the method was changed slightly when applied to 31 APO (unbound) structures found in the protein databank (14/31 for APO versus 15/31 for HOLO). The result of SPOT-Struc will not change significantly if highly homologous templates were used. SPOT-Struc predicted 19 out of 2076 structural genome targets as CBPs. In particular, one uncharacterized protein in Bacillus subtilis (1oq1A) was matched to galectin-9 from Mus musculus. Thus, SPOT-Struc is useful for uncovering novel carbohydrate-binding proteins. SPOT-Struc is available at http://sparks-lab.org.
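The evaluation measures quoted in this abstract (positive predictive value, sensitivity, and Matthews correlation coefficient) all derive from a binary confusion matrix. The following is a minimal sketch of that relationship, not the authors' code. The counts `tp`, `fp`, `fn`, `tn` are hypothetical values chosen only to be roughly consistent with the reported 79% and 52% rates on 113 binding and 3442 non-binding proteins; the paper's actual confusion matrix is not given in the abstract.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (precision, sensitivity, mcc) for a binary confusion matrix."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, sensitivity, mcc


# Hypothetical counts (an assumption for illustration, not reported data):
# 113 positives = tp + fn, 3442 negatives = fp + tn.
tp, fp, fn, tn = 59, 16, 54, 3426
precision, sensitivity, mcc = binary_metrics(tp, fp, fn, tn)
print(f"PPV={precision:.2f} sensitivity={sensitivity:.2f} MCC={mcc:.2f}")
```

Because negatives heavily outnumber positives here, MCC is a more informative single number than accuracy, which would be dominated by the true negatives.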
Affiliation(s)
- Huiying Zhao
- Indiana University School of Informatics, Indiana University Purdue University, Indianapolis, 719 Indiana Ave, Suite 319, Indianapolis, Indiana, 46202; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, 46202
|
17
|
Malik A, Lee J, Lee J. Community-based network study of protein-carbohydrate interactions in plant lectins using glycan array data. PLoS One 2014; 9:e95480. [PMID: 24755681 PMCID: PMC3995809 DOI: 10.1371/journal.pone.0095480] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 12/10/2013] [Accepted: 03/27/2014] [Indexed: 12/14/2022] Open
Abstract
Lectins play major roles in biological processes such as immune recognition and regulation, inflammatory responses, cytokine signaling, and cell adhesion. Recently, glycan microarrays have been shown to play key roles in understanding glycobiology, allowing us to study the relationship between the specificities of glycan-binding proteins and their natural ligands at the omics scale. However, one of the drawbacks in utilizing glycan microarray data is the lack of systematic analysis tools to extract information. In this work, we attempt to group various lectins and their interacting carbohydrates by using community-based analysis of a lectin-carbohydrate network. The network consists of 1119 nodes and 16769 edges, and we have identified 3 lectins with large degrees of connectivity that play the roles of hubs. The community-based network analysis provides an easy way to obtain a general picture of the lectin-glycan interaction and many statistically significant functional groups.
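Identifying hubs by degree of connectivity, as mentioned in this abstract, amounts to counting the edges incident to each node. The sketch below illustrates only that step on a toy edge list; the node names and the `top_hubs` helper are hypothetical, and the community detection used in the study is not reproduced here.

```python
from collections import Counter


def top_hubs(edges, k=3):
    """Return the k nodes with the highest degree as (node, degree) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree.most_common(k)


# Toy lectin-glycan edges (hypothetical names, not data from the study).
edges = [
    ("lectinA", "glycan1"),
    ("lectinA", "glycan2"),
    ("lectinA", "glycan3"),
    ("lectinB", "glycan1"),
    ("lectinB", "glycan2"),
    ("lectinC", "glycan3"),
]
hubs = top_hubs(edges, k=1)
print(hubs)
```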
Affiliation(s)
- Adeel Malik
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Juyong Lee
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Jooyoung Lee
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
|
18
|
Panwar B, Gupta S, Raghava GPS. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information. BMC Bioinformatics 2013; 14:44. [PMID: 23387468 PMCID: PMC3577447 DOI: 10.1186/1471-2105-14-44] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 07/18/2012] [Accepted: 01/31/2013] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin-binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein if its tertiary structure is known. Unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in a protein from its primary structure. RESULTS In this study, we first compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with pattern specificity was also observed within different vitamin classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. This suggested that protein-binding patterns of vitamins are different from other ligands, and motivated us to develop separate predictors for vitamins and their sub-classes. Four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs), have been developed. We applied various classifiers (SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest, IBk, etc.) as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences.
Finally, we selected best performing SVM modules and obtained highest MCC of 0.53, 0.48, 0.61, 0.81 for VIRs, VAIRs, VBIRs, PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study have been trained and tested on non-redundant datasets and evaluated using five-fold cross-validation technique. The performances were also evaluated on the balanced and different independent datasets. CONCLUSIONS This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from evolutionary information of protein sequence. In order to provide service to the scientific community, we have developed web-server and standalone software VitaPred (http://crdd.osdd.net/raghava/vitapred/).
Affiliation(s)
- Bharat Panwar
- Bioinformatics Centre, Institute of Microbial Technology (CSIR), Chandigarh, India
|
19
|
Khare H, Ratnaparkhi V, Chavan S, Jayraman V. Prediction of protein-mannose binding sites using random forest. Bioinformation 2012; 8:1202-5. [PMID: 23275720 PMCID: PMC3530872 DOI: 10.6026/97320630081202] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/16/2012] [Accepted: 11/19/2012] [Indexed: 11/23/2022] Open
Abstract
Mannose is an abundant cell surface monosaccharide and has an important role in many biochemical processes. It binds to a great diversity of receptor proteins. In this study we have employed Random Forest for prediction of mannose-binding sites. The mannose-binding site is taken to be a sphere around the centroid of the ligand; the sphere is subdivided into different layers, and atom-wise and residue-wise features are extracted for each layer. The method achieves an accuracy of 95.59% using Random Forest with 10-fold cross-validation. Prediction of mannose-binding sites should be useful in drug design.
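The layered-sphere representation described in this abstract can be illustrated by counting atoms in concentric shells around a centroid. This is a minimal sketch under stated assumptions: the `layer_counts` helper, the 2 Å layer width, the four layers, and the per-element counting are hypothetical choices for illustration, since the abstract does not specify the actual layer geometry or feature definitions.

```python
import math


def layer_counts(centroid, atoms, layer_width=2.0, n_layers=4):
    """Count atoms per element in concentric shells around a centroid.

    atoms is an iterable of (element, (x, y, z)) pairs. Atoms beyond the
    outermost shell are ignored. Returns one dict per layer, innermost first.
    """
    layers = [dict() for _ in range(n_layers)]
    for element, coord in atoms:
        distance = math.dist(centroid, coord)
        index = int(distance // layer_width)
        if index < n_layers:
            layers[index][element] = layers[index].get(element, 0) + 1
    return layers


# Toy coordinates in angstroms (hypothetical, not from any structure file).
atoms = [
    ("C", (1.0, 0.0, 0.0)),
    ("O", (0.0, 3.0, 0.0)),
    ("N", (0.0, 0.0, 5.0)),
    ("C", (9.0, 0.0, 0.0)),  # outside the 8 A sphere, so it is skipped
]
features = layer_counts((0.0, 0.0, 0.0), atoms)
print(features)
```

Flattening the per-layer dictionaries into a fixed-length vector would give the kind of input a random forest classifier expects.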
Affiliation(s)
- Sonali Chavan
- Bioinformatics centre, University of Pune, Pune, India
- Valadi Jayraman
- Centre for Development of Advanced Computing (C-DAC), Pune, India
|
20
|
Local functional descriptors for surface comparison based binding prediction. BMC Bioinformatics 2012; 13:314. [PMID: 23176080 PMCID: PMC3585919 DOI: 10.1186/1471-2105-13-314] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/17/2012] [Accepted: 10/10/2012] [Indexed: 11/10/2022] Open
Abstract
Background Molecular recognition in proteins occurs due to appropriate arrangements of physical, chemical, and geometric properties of an atomic surface. Similar surface regions should create similar binding interfaces. Effective methods for comparing surface regions can be used in identifying similar regions, and to predict interactions without regard to the underlying structural scaffold that creates the surface. Results We present a new descriptor for protein functional surfaces and algorithms for using these descriptors to compare protein surface regions to identify ligand binding interfaces. Our approach uses descriptors of local regions of the surface, and assembles collections of matches to compare larger regions. Our approach uses a variety of physical, chemical, and geometric properties, adaptively weighting these properties as appropriate for different regions of the interface. Our approach builds a classifier based on a training corpus of examples of binding sites of the target ligand. The constructed classifiers can be applied to a query protein providing a probability for each position on the protein that the position is part of a binding interface. We demonstrate the effectiveness of the approach on a number of benchmarks, demonstrating performance that is comparable to the state-of-the-art, with an approach with more generality than these prior methods. Conclusions Local functional descriptors offer a new method for protein surface comparison that is sufficiently flexible to serve in a variety of applications.
|
21
|
Tsai KC, Jian JW, Yang EW, Hsu PC, Peng HP, Chen CT, Chen JB, Chang JY, Hsu WL, Yang AS. Prediction of carbohydrate binding sites on protein surfaces with 3-dimensional probability density distributions of interacting atoms. PLoS One 2012; 7:e40846. [PMID: 22848404 PMCID: PMC3405063 DOI: 10.1371/journal.pone.0040846] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 12/09/2011] [Accepted: 06/13/2012] [Indexed: 11/22/2022] Open
Abstract
Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. 
The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date.
Affiliation(s)
- Jhih-Wei Jian
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Ei-Wen Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Information Sciences, Academia Sinica, Taipei, Taiwan
- Po-Chiang Hsu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Ching-Tai Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
- Jun-Bo Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
- Jeng-Yih Chang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Wen-Lian Hsu
- Institute of Information Sciences, Academia Sinica, Taipei, Taiwan
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
|
22
|
Santos JCA, Nassif H, Page D, Muggleton SH, Sternberg MJE. Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study. BMC Bioinformatics 2012; 13:162. [PMID: 22783946 PMCID: PMC3458898 DOI: 10.1186/1471-2105-13-162] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/08/2011] [Accepted: 06/15/2012] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. RESULTS The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. CONCLUSIONS In addition to confirming literature results, ProGolem's model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.
Affiliation(s)
- Jose C A Santos
- Computational Bioinformatics Laboratory, Department of Computer Science, Imperial College London, London, SW7 2BZ, UK
- Houssam Nassif
- Department of Computer Sciences, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI-53706, USA
- David Page
- Department of Computer Sciences, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI-53706, USA
- Stephen H Muggleton
- Computational Bioinformatics Laboratory, Department of Computer Science, Imperial College London, London, SW7 2BZ, UK
- Michael J E Sternberg
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
|
23
|
Identification of mannose interacting residues using local composition. PLoS One 2011; 6:e24039. [PMID: 21931639 PMCID: PMC3172211 DOI: 10.1371/journal.pone.0024039] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 03/26/2011] [Accepted: 07/29/2011] [Indexed: 01/24/2023] Open
Abstract
Background Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help in eliminating these pathogens from our body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand mechanism of recognition of pathogens by MBPs. Results This paper describes modules developed for predicting MIRs in a protein. Support vector machine (SVM) based models have been developed on 120 mannose binding protein chains, where no two chains have more than 25% sequence similarity. SVM models were developed on two types of datasets: 1) main dataset consists of 1029 mannose interacting and 1029 non-interacting residues, 2) realistic dataset consists of 1029 mannose interacting and 10320 non-interacting residues. In this study, firstly, we developed standard modules using binary and PSSM profile of patterns and got maximum MCC around 0.32. Secondly, we developed SVM modules using composition profile of patterns and achieved maximum MCC around 0.74 with accuracy 86.64% on main dataset. Thirdly, we developed a model on a realistic dataset and achieved maximum MCC of 0.62 with accuracy 93.08%. Based on this study, a standalone program and web server have been developed for predicting mannose interacting residues in proteins (http://www.imtech.res.in/raghava/premier/). Conclusions Compositional analysis of mannose interacting and non-interacting residues shows that certain types of residues are preferred in mannose interaction. It was also observed that residues around mannose interacting residues have a preference for certain types of residues. Composition of patterns/peptide/segment has been used for predicting MIRs and achieved reasonable high accuracy. It is possible that this novel strategy may be effective to predict other types of interacting residues. 
This study will be useful in annotating the function of protein as well as in understanding the role of mannose in the immune system.
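One way to read the "composition profile of patterns" mentioned in this abstract is as the amino-acid composition of a sequence window centred on each residue. The sketch below shows that interpretation only; the `window_composition` helper, the window size of 5, and the toy sequence are assumptions for illustration and are not taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_composition(sequence, center, window=5):
    """Return the 20-dimensional amino-acid composition of a window.

    The window is centred on index `center` and truncated at the sequence
    ends, so terminal residues get a shorter segment.
    """
    half = window // 2
    start = max(0, center - half)
    end = min(len(sequence), center + half + 1)
    segment = sequence[start:end]
    return [segment.count(aa) / len(segment) for aa in AMINO_ACIDS]


# Toy sequence (hypothetical); the window around index 4 is "WYDNE".
vec = window_composition("MKWYDNEAG", center=4, window=5)
print(vec)
```

Each residue's vector could then be fed to a classifier such as an SVM, with the label indicating whether the central residue interacts with mannose.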
|
24
|
Doxey AC, Cheng Z, Moffatt BA, McConkey BJ. Structural motif screening reveals a novel, conserved carbohydrate-binding surface in the pathogenesis-related protein PR-5d. BMC STRUCTURAL BIOLOGY 2010; 10:23. [PMID: 20678238 PMCID: PMC2924342 DOI: 10.1186/1472-6807-10-23] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 04/25/2010] [Accepted: 08/03/2010] [Indexed: 01/10/2023]
Abstract
BACKGROUND Aromatic amino acids play a critical role in protein-glycan interactions. Clusters of surface aromatic residues and their features may therefore be useful in distinguishing glycan-binding sites as well as predicting novel glycan-binding proteins. In this work, a structural bioinformatics approach was used to screen the Protein Data Bank (PDB) for coplanar aromatic motifs similar to those found in known glycan-binding proteins. RESULTS The proteins identified in the screen were significantly associated with carbohydrate-related functions according to gene ontology (GO) enrichment analysis, and predicted motifs were found frequently within novel folds and glycan-binding sites not included in the training set. In addition to numerous binding sites predicted in structural genomics proteins of unknown function, one novel prediction was a surface motif (W34/W36/W192) in the tobacco pathogenesis-related protein, PR-5d. Phylogenetic analysis revealed that the surface motif is exclusive to a subfamily of PR-5 proteins from the Solanaceae family of plants, and is absent completely in more distant homologs. To confirm PR-5d's insoluble-polysaccharide binding activity, a cellulose-pulldown assay of tobacco proteins was performed and PR-5d was identified in the cellulose-binding fraction by mass spectrometry. CONCLUSIONS Based on the combined results, we propose that the putative binding site in PR-5d may be an evolutionary adaptation of Solanaceae plants including potato, tomato, and tobacco, towards defense against cellulose-containing pathogens such as species of the deadly oomycete genus, Phytophthora. More generally, the results demonstrate that coplanar aromatic clusters on protein surfaces are a structural signature of glycan-binding proteins, and can be used to computationally predict novel glycan-binding proteins from 3 D structure.
Affiliation(s)
- Andrew C Doxey
- Department of Biology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada
- Department of Developmental Biology, Stanford University, Stanford, CA, 94305, USA
- Zhenyu Cheng
- Department of Biology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada
- Barbara A Moffatt
- Department of Biology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada
- Brendan J McConkey
- Department of Biology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada
|
25
|
Nassif H, Al-Ali H, Khuri S, Keirouz W, Page D. An Inductive Logic Programming Approach to Validate Hexose Binding Biochemical Knowledge. INDUCTIVE LOGIC PROGRAMMING. ILP 2010; 5989:149-165. [PMID: 25309972 PMCID: PMC4190110 DOI: 10.1007/978-3-642-13840-9_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
Hexoses are simple sugars that play a key role in many cellular pathways, and in the regulation of development and disease mechanisms. Current protein-sugar computational models are based, at least partially, on prior biochemical findings and knowledge. They incorporate different parts of these findings in predictive black-box models. We investigate the empirical support for biochemical findings by comparing Inductive Logic Programming (ILP) induced rules to actual biochemical results. We mine the Protein Data Bank for a representative data set of hexose binding sites, non-hexose binding sites and surface grooves. We build an ILP model of hexose-binding sites and evaluate our results against several baseline machine learning classifiers. Our method achieves an accuracy similar to that of other black-box classifiers while providing insight into the discriminating process. In addition, it confirms wet-lab findings and reveals a previously unreported Trp-Glu amino acids dependency.
Affiliation(s)
- Houssam Nassif
- Department of Computer Sciences, University of Wisconsin-Madison, USA
- Hassan Al-Ali
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, USA
- Sawsan Khuri
- Department of Biochemistry and Molecular Biology, University of Miami, Florida, USA
- Walid Keirouz
- Center for Computational Science, University of Miami, Florida, USA
- David Page
- The Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Florida, USA
|
26
|
Transcriptome analysis of agmatine and putrescine catabolism in Pseudomonas aeruginosa PAO1. J Bacteriol 2008; 192:4317-26. [PMID: 18192388 DOI: 10.1128/jb.00335-10] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 01/28/2023] Open
Abstract
Polyamines (putrescine, spermidine, and spermine) are major organic polycations essential for a wide spectrum of cellular processes. The cells require mechanisms to maintain homeostasis of intracellular polyamines to prevent otherwise severe adverse effects. We performed a detailed transcriptome profile analysis of Pseudomonas aeruginosa in response to agmatine and putrescine with an emphasis in polyamine catabolism. Agmatine serves as the precursor compound for putrescine (and hence spermidine and spermine), which was proposed to convert into 4-aminobutyrate (GABA) and succinate before entering the tricarboxylic acid cycle in support of cell growth, as the sole source of carbon and nitrogen. Two acetylpolyamine amidohydrolases, AphA and AphB, were found to be involved in the conversion of agmatine into putrescine. Enzymatic products of AphA were confirmed by mass spectrometry analysis. Interestingly, the alanine-pyruvate cycle was shown to be indispensable for polyamine utilization. The newly identified dadRAX locus encoding the regulator alanine transaminase and racemase coupled with SpuC, the major putrescine-pyruvate transaminase, were key components to maintaining alanine homeostasis. Corresponding mutant strains were severely hampered in polyamine utilization. On the other hand, an alternative gamma-glutamylation pathway for the conversion of putrescine into GABA is present in some organisms. Subsequently, GabD, GabT, and PA5313 were identified for GABA utilization. The growth defect of the PA5313 gabT double mutant in GABA suggested the importance of these two transaminases. The succinic-semialdehyde dehydrogenase activity of GabD and its induction by GABA were also demonstrated in vitro. Polyamine utilization in general was proven to be independent of the PhoPQ two-component system, even though a modest induction of this operon was induced by polyamines. 
Multiple potent catabolic pathways, as depicted in this study, could serve pivotal roles in the control of intracellular polyamine levels.
|