1
|
Yamaguchi S, Nakashima H, Moriwaki Y, Terada T, Shimizu K. Prediction of protein mononucleotide binding sites using AlphaFold2 and machine learning. Comput Biol Chem 2022; 100:107744. [DOI: 10.1016/j.compbiolchem.2022.107744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 07/12/2022] [Accepted: 07/22/2022] [Indexed: 11/26/2022]
|
2
|
Yang YH, Wang JS, Yuan SS, Liu ML, Su W, Lin H, Zhang ZY. A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods. Curr Med Chem 2021; 29:789-806. [PMID: 34514982 DOI: 10.2174/0929867328666210910125802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]
Abstract
Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5'-triphosphate (ATP) is one such ligand; it plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP-binding residues of proteins is helpful for protein function annotation and drug design. However, because of the huge influx of protein sequences into databases in the post-genome era, identifying ATP-binding residues experimentally is costly and time-consuming. To address this problem, computational methods have been developed to predict ATP-binding residues. In this review, we briefly summarize the application of machine learning methods to detecting ATP-binding residues of proteins. We expect this review will be helpful for further research.
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jia-Shu Wang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Meng-Lu Liu
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Su
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
3
|
Santhosh R, Satheesh SN, Gurusaran M, Michael D, Sekar K, Jeyakanthan J. NIMS: a database on nucleobase compounds and their interactions in macromolecular structures. J Appl Crystallogr 2016. [DOI: 10.1107/s1600576716006208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
The intense exploration of nucleotide-binding protein structures has created a whirlwind in the field of structural biology and bioinformatics. This has led to the conception and birth of NIMS. This database is a collection of detailed data on the nucleobases, nucleosides and nucleotides, along with their analogues as well as the protein structures to which they bind. Interaction details such as the interacting residues and all associated values have been made available. As a pioneering step, the diffraction precision index for protein structures, the atomic uncertainty for each atom, and the computed errors on the interatomic distances and angles are available in the database. Apart from the above, provision has been made to visualize the three-dimensional structures of both ligands and protein–ligand structures and their interactions in Jmol as well as JSmol. One of the salient features of NIMS is that it has been interfaced with a user-friendly and query-based efficient search engine. It was conceived and developed with the aim of serving a significant section of researchers working in the area of protein and nucleobase complexes. NIMS is freely available online at http://iris.physics.iisc.ernet.in/nims and it is hoped that it will prove to be an invaluable asset.
|
4
|
Predicting flavin and nicotinamide adenine dinucleotide-binding sites in proteins using the fragment transformation method. BIOMED RESEARCH INTERNATIONAL 2015; 2015:402536. [PMID: 26000290 PMCID: PMC4426894 DOI: 10.1155/2015/402536] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/16/2014] [Accepted: 07/21/2014] [Indexed: 11/18/2022]
Abstract
We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank the structures of proteins that bind at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. The fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified and a ligand-binding potential for each residue was calculated. Setting the false positive rate at 5%, our method predicted NAD- and FAD-binding sites at true positive rates of 67.1% and 68.4%, respectively. Our method provides excellent results for identifying FAD- and NAD-binding sites in proteins; most importantly, the requirement of conservation of residue types and local structures in the FAD- and NAD-binding sites can be verified.
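Reporting a true positive rate at a fixed false positive rate, as the abstract above does, amounts to choosing a cutoff on the per-residue score. The following is a minimal sketch of that evaluation step only, assuming each residue has a numeric score and a 0/1 binding label; the function name `tpr_at_fpr` and the toy data are hypothetical and are not taken from the paper.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.05):
    """Return (threshold, tpr, fpr) for the most permissive score cutoff
    whose false positive rate does not exceed max_fpr, or None if even the
    strictest cutoff exceeds it. Residues scoring >= threshold are
    predicted as binding."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binding and one non-binding residue")
    best = None
    # Try every observed score as a cutoff, from strictest to most permissive.
    for t in sorted(set(scores), reverse=True):
        fpr = sum(s >= t for s in neg) / len(neg)
        if fpr > max_fpr:
            break  # FPR only grows as the cutoff is lowered
        tpr = sum(s >= t for s in pos) / len(pos)
        best = (t, tpr, fpr)
    return best


# Hypothetical toy data: 3 binding residues and 5 non-binding residues.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
result = tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.2)
```

With only five negatives the toy example uses a 20% ceiling; on a realistic dataset the default of 5% matches the operating point quoted in the abstract.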
|
5
|
Usha S, Selvaraj S. Structure-wise discrimination of adenine and guanine by proteins on the basis of their nonbonded interactions. J Biomol Struct Dyn 2014; 33:1474-92. [DOI: 10.1080/07391102.2014.958759] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/22/2022]
|
6
|
Hu J, He X, Yu DJ, Yang XB, Yang JY, Shen HB. A new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction. PLoS One 2014; 9:e107676. [PMID: 25229688 PMCID: PMC4168127 DOI: 10.1371/journal.pone.0107676] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 05/09/2014] [Accepted: 08/09/2014] [Indexed: 12/21/2022]
Abstract
Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, where binding residues are far fewer in number than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of a machine-learning-based predictor for class imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. The experimental results from protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web-server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
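To make the idea of synthesizing minority-class samples concrete, here is a minimal sketch of a generic interpolation-based over-sampler. It is explicitly not the supervised algorithm behind TargetSOS, whose details the abstract does not give; it is a simplified SMOTE-style scheme without the nearest-neighbour step, and the function name `oversample_minority` and the toy vectors are hypothetical.

```python
import random


def oversample_minority(minority, n_new, seed=0):
    """Create n_new synthetic feature vectors by interpolating between
    randomly chosen pairs of minority-class (binding residue) vectors.
    Generic illustration only; not the TargetSOS algorithm."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct minority samples
        t = rng.random()                # position along the segment a -> b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Hypothetical 2-D feature vectors for three binding residues.
binding = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
extra = oversample_minority(binding, n_new=5)
```

Each synthetic point lies on a line segment between two real minority samples, so the added data stay inside the region already occupied by binding residues.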
Affiliation(s)
- Jun Hu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Xue He
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Changshu Institute, Nanjing University of Science and Technology, Changshu, Jiangsu, China
- * E-mail: (DJY); (HBS)
- Xi-Bei Yang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
- Jing-Yu Yang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- * E-mail: (DJY); (HBS)
|
7
|
Abstract
The ATP binding proteins exist as a hybrid of proteins with the Walker A motif and universal stress proteins (USPs) that have an alternative motif for binding ATP. There is an urgent need to find a reliable and comprehensive hybrid predictor for ATP binding proteins using whole sequence information. In this paper, the open source LIBSVM toolbox was used to build a classifier, which was evaluated with 10-fold cross-validation. The best hybrid model was the combination of amino acid and dipeptide composition, with an accuracy of 84.57% and a Matthews correlation coefficient (MCC) value of 0.693. This classifier proves to be better than many classical ATP binding protein predictors. The general trend observed is that combinations of descriptors performed better and improved the overall performances of individual descriptors, particularly when combined with amino acid composition. The work developed a comprehensive model for predicting ATP binding proteins irrespective of their functional motifs. This model provides a high probability of success for molecular biologists in predicting and selecting diverse groups of ATP binding proteins irrespective of their functional motifs.
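The combined descriptor mentioned above can be sketched as follows: amino acid composition gives 20 frequencies and dipeptide composition gives 400, so the hybrid vector has 420 entries. This is an illustrative sketch only; the function names, the handling of non-standard residues, and the example sequence are assumptions, and the classifier training step (LIBSVM in the paper) is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered pair of adjacent residues (400 values).
    Pairs containing a non-standard residue are skipped (an assumption)."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts.values()]


def hybrid_features(seq):
    """Concatenate both descriptors into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


features = hybrid_features("ACAC")  # hypothetical toy sequence
```

Because both descriptors are length-normalized, proteins of different sizes map to vectors of the same dimension, which is what a whole-sequence classifier needs.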
|
8
|
Parca L, Ferré F, Ausiello G, Helmer-Citterich M. Nucleos: a web server for the identification of nucleotide-binding sites in protein structures. Nucleic Acids Res 2013; 41:W281-5. [PMID: 23703207 PMCID: PMC3692072 DOI: 10.1093/nar/gkt390] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
Nucleos is a web server for the identification of nucleotide-binding sites in protein structures. Nucleos compares the structure of a query protein against a set of known template 3D binding sites representing nucleotide modules, namely the nucleobase, carbohydrate and phosphate. Structural features, clustering and conservation are used to filter and score the predictions. The predicted nucleotide modules are then joined to build whole nucleotide-binding sites, which are ranked by their score. The server takes as input either the PDB code of the query protein structure or a user-submitted structure in PDB format. The output of Nucleos is composed of ranked lists of predicted nucleotide-binding sites divided by nucleotide type (e.g. ATP-like). For each ranked prediction, Nucleos provides detailed information about the score, the template structure and the structural match for each nucleotide module composing the nucleotide-binding site. The predictions on the query structure and the template-binding sites can be viewed directly on the web through a graphical applet. In 98% of the cases, the modules composing correct predictions belong to proteins with no homology relationship between each other, meaning that the identification of brand-new nucleotide-binding sites is possible using information from non-homologous proteins. Nucleos is available at http://nucleos.bio.uniroma2.it/nucleos/.
Affiliation(s)
- Luca Parca
- Department of Biology, Centre for Molecular Bioinformatics, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
|
9
|
Parca L, Gherardini PF, Truglio M, Mangone I, Ferrè F, Helmer-Citterich M, Ausiello G. Identification of nucleotide-binding sites in protein structures: a novel approach based on nucleotide modularity. PLoS One 2012; 7:e50240. [PMID: 23209685 PMCID: PMC3507729 DOI: 10.1371/journal.pone.0050240] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/04/2012] [Accepted: 10/22/2012] [Indexed: 01/30/2023]
Abstract
Nucleotides are involved in several cellular processes, ranging from the transmission of genetic information to energy transfer and storage. Both sequence- and structure-based methods have been developed to predict the location of nucleotide-binding sites in proteins. Here we propose a novel methodology that leverages the observation that nucleotide-binding sites have a modular structure. Nucleotides are composed of identifiable fragments, i.e. the phosphate, the nucleobase and the carbohydrate moieties. These fragments are bound by specific structural motifs that recur in proteins of different folds. Moreover, these motifs behave as modules and are found in different combinations across fold space. Our method predicts binding sites for each nucleotide fragment by comparing a query protein with a database of templates extracted from proteins of known structure. Whenever a similarity is found, the fragment bound by the template is transferred onto the query protein, thus identifying a putative binding site. Predictions falling inside the surface of the protein are discarded, and the remaining ones are scored using clustering and conservation. The method ranks a correct prediction first in 48%, 48% and 68% of the analyzed proteins for the nucleobase, carbohydrate and phosphate, respectively; when the first five predictions are considered, these figures rise to 71%, 65% and 86%, respectively. Furthermore, we attempted to reconstruct the full structure of the binding site, starting from the predicted positions of the fragments. We calculated that in 59% of the analyzed proteins the method ranks a reconstructed binding site, or a part of it, first. Finally, we tested the reliability of our method in a real-world case in which it has to predict nucleotide-binding sites in unbound proteins. We analyzed proteins whose structure has been solved with and without the nucleotide and observed only small variations in the method's performance.
Affiliation(s)
- Luca Parca
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Mauro Truglio
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Iolanda Mangone
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Fabrizio Ferrè
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Gabriele Ausiello
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
|
10
|
Wang X, Mi G, Wang C, Zhang Y, Li J, Guo Y, Pu X, Li M. Prediction of flavin mono-nucleotide binding sites using modified PSSM profile and ensemble support vector machine. Comput Biol Med 2012; 42:1053-9. [PMID: 22985817 DOI: 10.1016/j.compbiomed.2012.08.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/18/2012] [Revised: 07/12/2012] [Accepted: 08/13/2012] [Indexed: 11/25/2022]
Abstract
Flavin mono-nucleotide (FMN) is closely involved in many biological processes. In this study, a computational method was proposed to identify FMN binding sites based on amino acid sequences of proteins only. A modified Position Specific Score Matrix was used to characterize the local environmental sequence information, and a visible improvement in performance was obtained. Also, an ensemble SVM was applied to address the imbalanced data problem. Additionally, an independent dataset was built to evaluate the practical performance of the method, and a satisfactory accuracy of 87.87% was achieved. This demonstrates that the method is effective in predicting FMN-binding sites.
Affiliation(s)
- Xia Wang
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
|
11
|
Coughlin JE, Pandey RK, Padmanabhan S, O'Loughlin KG, Marquis J, Green CE, Mirsalis JC, Iyer RP. Metabolism, pharmacokinetics, tissue distribution, and stability studies of the prodrug analog of an anti-hepatitis B virus dinucleoside phosphorothioate. Drug Metab Dispos 2012; 40:970-81. [PMID: 22328581 DOI: 10.1124/dmd.111.044446] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 12/23/2022]
Abstract
The alkoxycarbonyloxy dinucleotide prodrug R(p), S(p)-2 is an orally bioavailable anti-hepatitis B virus agent. The compound is efficiently metabolized to the active dinucleoside phosphorothioate R(p), S(p)-1 by human liver microsomes and S9 fraction without cytochrome P450-mediated oxidation or conjugation. The conversion of R(p), S(p)-2 to R(p), S(p)-1 appears to be mediated by liver esterases, occurs in a stereospecific manner, and is consistent with our earlier reported studies of serum-mediated hydrolytic conversion of R(p), S(p)-2 to R(p), S(p)-1. However, further metabolism of R(p), S(p)-1 does not occur. The presence of a minor metabolite, the desulfurized product 10 was noted. The prodrug R(p), S(p)-2 was quite stable in simulated gastric fluid, whereas the active R(p), S(p)-1 had a half-life of <15 min. In simulated intestinal fluid, the prodrug 2 was fully converted to 1 in approximately 3 h, whereas 1 remained stable. To ascertain the tissue distribution of the prodrug 2 in rats, the synthesis of (35)S-labeled R(p), S(p)-2 was undertaken. Tissue distribution studies of orally and intravenously administered radiolabeled [(35)S]2 demonstrated that the radioactivity concentrates in the liver, with the highest liver/plasma ratio in the intravenous group at 1 h being 3.89 (females) and in the oral group at 1 h being 2.86 (males). The preferential distribution of the dinucleotide 1 and its prodrug 2 into liver may be attributed to the presence of nucleoside phosphorothioate backbone because phosphorothioate oligonucleotides also reveal a similar tissue distribution profile upon intravenous administration.
Affiliation(s)
- John E Coughlin
- Spring Bank Pharmaceuticals, Inc., S-7, 113 Cedar St., Milford, MA 01757, USA
|
12
|
Chen K, Mizianty MJ, Kurgan L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics 2011; 28:331-41. [PMID: 22130595 DOI: 10.1093/bioinformatics/btr657] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Indexed: 11/14/2022]
Abstract
Motivation: Nucleotides are multifunctional molecules that are essential for numerous biological processes. They serve as sources of chemical energy, participate in cellular signaling and are involved in enzymatic reactions. Knowledge of nucleotide-protein interactions helps with the annotation of protein functions and finds applications in drug design. Results: We propose a novel ensemble of accurate high-throughput predictors of binding residues from the protein sequence for ATP, ADP, AMP, GTP and GDP. Empirical tests show that our NsitePred method significantly outperforms existing predictors and approaches based on sequence alignment and residue conservation scoring. NsitePred accurately finds more binding residues and binding sites, and it performs particularly well for sites with residues that are clustered close together in the sequence. The high predictive quality stems from the use of novel, comprehensive and custom-designed inputs that utilize information extracted from the sequence, evolutionary profiles, several sequence-predicted structural descriptors and sequence alignment. Analysis of the predictive model reveals several sequence-derived hallmarks of nucleotide-binding residues: they are usually conserved and flanked by less conserved residues, and they are associated with certain arrangements of secondary structures and amino acid pairs in specific neighboring positions in the sequence. Availability: http://biomine.ece.ualberta.ca/nSITEpred/ Contact: lkurgan@ece.ualberta.ca Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ke Chen
- School of Computer Science and Software Engineering, Tianjin Polytechnic University, Hedong District, Tianjin 300160, PR of China
|
13
|
Residue propensities, discrimination and binding site prediction of adenine and guanine phosphates. BMC BIOCHEMISTRY 2011; 12:20. [PMID: 21569447 PMCID: PMC3113737 DOI: 10.1186/1471-2091-12-20] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/02/2010] [Accepted: 05/13/2011] [Indexed: 11/15/2022]
Abstract
Background: Adenine and guanine phosphates are involved in a number of biological processes such as cell signaling, metabolism and enzymatic cofactor functions. Binding sites in proteins for these ligands are often detected by looking for a previously known motif using an alignment-based search. This approach is likely to miss sites for which a similar binding site has not been previously characterized, or which do not follow the rule described by a predefined motif. Also, it is intriguing how proteins select between adenine and guanine derivatives with high specificity. Results: Residue preferences for AMP, GMP, ADP, GDP, ATP and GTP have been investigated in detail, with additional comparison with the cyclic variants cAMP and cGMP. We also attempt to predict residues interacting with these nucleotides using information derived from local sequence and evolutionary profiles. Results indicate that subtle differences exist between single residue preferences for specific nucleotides and that, by taking neighbor environment and evolutionary context into account, successful models for predicting their binding sites can be developed. Conclusion: In this work, we explore how single amino acid propensities for these nucleotides play a role in the affinity and specificity of this set of nucleotides. This is expected to be helpful in identifying novel binding sites for adenine and guanine phosphates, especially when a known binding motif is not detectable.
|
14
|
The food colorant erythrosine is a promiscuous protein–protein interaction inhibitor. Biochem Pharmacol 2011; 81:810-8. [DOI: 10.1016/j.bcp.2010.12.020] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 11/12/2010] [Revised: 12/23/2010] [Accepted: 12/27/2010] [Indexed: 11/22/2022]
|
15
|
Pyrkov TV, Ozerov IV, Blitskaia ED, Efremov RG. [Molecular docking: role of intermolecular contacts in formation of complexes of proteins with nucleotides and peptides]. RUSSIAN JOURNAL OF BIOORGANIC CHEMISTRY 2010; 36:482-92. [PMID: 20823916 DOI: 10.1134/s1068162010040023] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]
Abstract
Knowledge of the 3D structure of a protein-ligand complex is a major prerequisite for understanding the functional mechanisms of cellular proteins and membrane receptors. It is also of great help in rational drug design projects. In the present paper we briefly review the molecular docking approaches used to predict the possible orientation of a ligand in the protein binding site. Recent trends in improving the accuracy and efficiency of docking algorithms are demonstrated with results obtained in the Laboratory of Biomolecular Modeling. Particular attention is paid to the protein-ligand hydrophobic and stacking interactions responsible for molecular recognition of ligand fragments. Such interactions are not always adequately represented in the scoring criteria of docking applications, which leads to mismatches in predicted 3D structures of complexes. Methods that account for these interactions are therefore an area of active research.
|
16
|
Chauhan JS, Mishra NK, Raghava GPS. Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. BMC Bioinformatics 2010; 11:301. [PMID: 20525281 PMCID: PMC3098072 DOI: 10.1186/1471-2105-11-301] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 02/11/2010] [Accepted: 06/03/2010] [Indexed: 11/17/2022]
Abstract
Background: Guanosine triphosphate (GTP)-binding proteins play an important role in the regulation of G-protein. Thus prediction of GTP interacting residues in a protein is one of the major challenges in the field of computational biology. In this study, an attempt has been made to develop a computational method for predicting GTP interacting residues in a protein with high accuracy (Acc), precision (Prec) and recall (Rc). Results: All the models developed in this study were trained and tested on a non-redundant (40% similarity) dataset using five-fold cross-validation. First, we developed neural network based models using single sequence and PSSM profile, achieving maximum Matthews Correlation Coefficient (MCC) values of 0.24 (Acc 61.30%) and 0.39 (Acc 68.88%), respectively. Second, we developed support vector machine (SVM) based models using single sequence and PSSM profile, achieving maximum MCC values of 0.37 (Prec 0.73, Rc 0.57, Acc 67.98%) and 0.55 (Prec 0.80, Rc 0.73, Acc 77.17%), respectively. In this work, we introduce for the first time the concept of predicting GTP interacting dipeptides (two consecutive GTP interacting residues) and tripeptides (three consecutive GTP interacting residues). We developed an SVM based model for predicting GTP interacting dipeptides using PSSM profile and achieved MCC 0.64 with precision 0.87, recall 0.74 and accuracy 81.37%. Similarly, an SVM based model was developed for predicting GTP interacting tripeptides using PSSM profile and achieved MCC 0.70 with precision 0.93, recall 0.73 and accuracy 83.98%. Conclusion: These results show that the PSSM based method performs better than the single sequence based method. The prediction models based on dipeptides or tripeptides are more accurate than the traditional model based on single residues. A web server, "GTPBinder" (http://www.imtech.res.in/raghava/gtpbinder/), based on the above models has been developed for predicting GTP interacting residues in a protein.
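The four measures quoted throughout this abstract (Acc, Prec, Rc and MCC) all derive from the confusion matrix of a binary classifier. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and Matthews correlation coefficient
    from confusion-matrix counts. Undefined ratios are reported as 0.0."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc}


# Hypothetical counts: 3 true positives, 1 false positive,
# 3 true negatives, 1 false negative.
metrics = binary_metrics(tp=3, fp=1, tn=3, fn=1)
```

MCC ranges from -1 to 1 and stays informative when binding residues are rare, which is one reason it is reported alongside accuracy in binding-residue studies such as those listed here.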
Affiliation(s)
- Jagat S Chauhan
- Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh, India
|
17
|
Kasahara K, Kinoshita K, Takagi T. Ligand-binding site prediction of proteins based on known fragment-fragment interactions. Bioinformatics 2010; 26:1493-9. [PMID: 20472546 PMCID: PMC2881410 DOI: 10.1093/bioinformatics/btq232] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]
Abstract
Motivation: The identification of putative ligand-binding sites on proteins is important for the prediction of protein function. Knowledge-based approaches using structure databases have become interesting, because of the recent increase in structural information. Approaches using binding motif information are particularly effective. However, they can only be applied to well-known ligands that frequently appear in the structure databases. Results: We have developed a new method for predicting the binding sites of chemically diverse ligands, by using information about the interactions between fragments. The selection of the fragment size is important. If the fragments are too small, then the patterns derived from the binding motifs cannot be used, since they are many-body interactions, while using larger fragments limits the application to well-known ligands. In our method, we used the main and side chains for proteins, and three successive atoms for ligands, as fragments. After superposition of the fragments, our method builds the conformations of ligands and predicts the binding sites. As a result, our method could accurately predict the binding sites of chemically diverse ligands, even though the Protein Data Bank currently contains a large number of nucleotides. Moreover, a further evaluation for the unbound forms of proteins revealed that our building up procedure was robust to conformational changes induced by ligand binding. Availability: Our method, named ‘BUMBLE’, is available at http://bumble.hgc.jp/ Contact: kasahara@cb.k.u-tokyo.ac.jp Supplementary information: Supplementary Material is available at Bioinformatics online.
Affiliation(s)
- Kota Kasahara
- Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan.
|
18
|
Ansari HR, Raghava GPS. Identification of NAD interacting residues in proteins. BMC Bioinformatics 2010; 11:160. [PMID: 20353553 PMCID: PMC2853471 DOI: 10.1186/1471-2105-11-160] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Received: 06/21/2009] [Accepted: 03/30/2010] [Indexed: 01/08/2023]
Abstract
Background: Small molecular cofactors or ligands play a crucial role in the proper functioning of cells. Accurate annotation of their target proteins and binding sites is required for the complete understanding of reaction mechanisms. Nicotinamide adenine dinucleotide (NAD+ or NAD) is one of the most commonly used organic cofactors in living cells and plays a critical role in cellular metabolism, storage and regulatory processes. In the past, several NAD binding proteins (NADBP), which are responsible for a wide range of activities in the cell, have been reported in the literature. Attempts have been made to derive a rule for the binding of NAD+ to its target proteins. However, so far an efficient model could not be derived because of the time-consuming process of structure determination and the limitations of similarity-based approaches. Thus a sequence-based method that does not rely on similarity is needed to characterize NAD binding sites and help in annotation. In this study, attempts have been made to predict NAD binding proteins and their interacting residues (NIRs) from amino acid sequence using bioinformatics tools. Results: We extracted 1556 protein chains from 555 NAD binding proteins whose structures are available in the Protein Data Bank. We then removed all redundant protein chains and finally obtained 195 non-redundant NAD binding protein chains, where no two chains have more than 40% sequence identity. In this study all models were developed and evaluated using the five-fold cross-validation technique on the above dataset of 195 NAD binding proteins. While certain types of residues are preferred (e.g. Gly, Tyr, Thr, His) in NAD interaction, residues like Ala, Glu, Leu, Lys are not preferred. A support vector machine (SVM) based method was developed using various window lengths of amino acid sequence for predicting NAD interacting residues and obtained a maximum Matthews correlation coefficient (MCC) of 0.47 with accuracy 74.13% at window length 17. We also developed an SVM based method using evolutionary information in the form of a position specific scoring matrix (PSSM) and obtained a maximum MCC of 0.75 with accuracy 87.25%. Conclusion: For the first time a sequence-based method has been developed for the prediction of NAD binding proteins and their interacting residues, in the absence of any prior structural information. The present model will aid in the understanding of NAD+ dependent mechanisms of action in the cell. To provide service to the scientific community, we have developed a user-friendly web server, which is available from URL http://www.imtech.res.in/raghava/nadbinder/.
Affiliation(s)
- Hifzur R Ansari
- Institute of Microbial Technology, Sector 39A, Chandigarh, 160036, India
|
19
|
Mishra NK, Raghava GPS. Prediction of FAD interacting residues in a protein from its primary sequence using evolutionary information. BMC Bioinformatics 2010; 11 Suppl 1:S48. [PMID: 20122222 PMCID: PMC3009520 DOI: 10.1186/1471-2105-11-s1-s48] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Abstract
Background: Flavin binding proteins (FBPs) play a critical role in several biological functions such as the electron transport system (ETS). These flavoproteins contain very tightly bound, sometimes covalently bound, flavin adenine dinucleotide (FAD) or flavin mononucleotide (FMN). The interaction between the flavin nucleotide and the amino acids of a flavoprotein is essential for its functionality. Thus identification of FAD interacting residues in an FBP is an important step for understanding its function and mechanism. Results: In this study, we describe models developed for predicting FAD interacting residues using window patterns of 15, 17 and 19 residues. Support vector machine (SVM) based models were developed using the binary pattern of the amino acid sequence and achieved a maximum accuracy of 69.65% with Matthews correlation coefficient (MCC) 0.39 and area under the curve (AUC) 0.773. The performance of these models improved significantly, from 69.65% to 82.86% accuracy with MCC 0.66 and AUC 0.904, when evolutionary information was used as input to the SVM. The evolutionary information was generated in the form of a position specific scoring matrix (PSSM) profile by using PSI-BLAST at e-value 0.001. All models were developed on 198 non-redundant FAD binding protein chains containing 5172 FAD interacting residues and evaluated using five-fold cross-validation. Conclusion: This study suggests that evolutionary information from 17-residue patterns performs best for the prediction of FAD interacting residues. We also developed a web server that predicts FAD interacting residues in a protein and is freely available for academics.
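The "binary pattern" over a fixed window described in this abstract can be illustrated with a short sketch. It is not the authors' code: the 21-symbol alphabet, the padding symbol `X`, and the function names `one_hot` and `window_patterns` are assumptions made for illustration, and the abstract does not state how sequence termini were handled.

```python
# Illustrative sketch only; alphabet, padding and names are assumptions,
# not details taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical dummy residue used to pad the sequence termini
ALPHABET = AMINO_ACIDS + PAD


def one_hot(residue):
    """Return a 21-dimensional binary vector with a single 1 for `residue`."""
    vec = [0] * len(ALPHABET)
    # Symbols outside the alphabet are mapped to the padding symbol.
    idx = ALPHABET.index(residue) if residue in ALPHABET else ALPHABET.index(PAD)
    vec[idx] = 1
    return vec


def window_patterns(sequence, window=17):
    """Yield (centre_index, binary_features) for every residue in `sequence`."""
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    for i in range(len(sequence)):
        segment = padded[i:i + window]  # window centred on residue i
        features = [bit for res in segment for bit in one_hot(res)]
        yield i, features


patterns = list(window_patterns("MKTAYIAKQR", window=17))
```

Each residue yields a vector of length window x 21 (357 for a window of 17), which could then be passed, together with a binding or non-binding label for the central residue, to a classifier such as an SVM.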
Affiliation(s)
- Nitish K Mishra
- Institute of Microbial Technology, Sector 39A, Chandigarh, India.
|
20
|
Chauhan JS, Mishra NK, Raghava GPS. Identification of ATP binding residues of a protein from its primary sequence. BMC Bioinformatics 2009; 10:434. [PMID: 20021687 PMCID: PMC2803200 DOI: 10.1186/1471-2105-10-434] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Received: 08/06/2009] [Accepted: 12/19/2009] [Indexed: 11/13/2022]
Abstract
BACKGROUND: One of the major challenges in the post-genomic era is to provide functional annotations for the large number of proteins arising from genome sequencing projects. The function of many proteins depends on their interaction with small molecules or ligands. ATP is one such important ligand and plays a critical role as a coenzyme in the functionality of many proteins. There is a need to develop methods for identifying ATP interacting residues in ATP binding proteins (ABPs) in order to understand the mechanism of protein-ligand interaction. RESULTS: We compared the amino acid composition of ATP interacting and non-interacting regions of proteins and observed that certain residues are preferred for interaction with ATP. This study describes a few models that have been developed for identifying ATP interacting residues in a protein. All these models were trained and tested on 168 non-redundant ABP chains. First, we developed a support vector machine (SVM) based model using the primary sequence of proteins and obtained a maximum MCC of 0.33 with accuracy of 66.25%. Second, another SVM based model was developed using the position specific scoring matrix (PSSM) generated by PSI-BLAST. The performance of this model (MCC 0.5) was significantly better than that of the previous one, where only the primary sequence of the proteins was used. CONCLUSION: This study demonstrates that it is possible to predict 'ATP interacting residues' in a protein with moderate accuracy using its sequence. The evolutionary information is important for the identification of 'ATP interacting residues', as it provides more information than the primary sequence. This method will be useful for researchers studying ATP-binding proteins. Based on this study, a web server has been developed for predicting 'ATP interacting residues' in a protein: http://www.imtech.res.in/raghava/atpint/.
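For readers comparing the MCC and accuracy figures quoted in these abstracts, the following minimal sketch shows how both are computed from a binary confusion matrix. The function names and the example counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up counts for illustration: 40 binding and 40 non-binding residues
# predicted correctly, 10 false positives and 10 false negatives.
example_mcc = matthews_corrcoef(tp=40, tn=40, fp=10, fn=10)
example_acc = accuracy(tp=40, tn=40, fp=10, fn=10)
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when binding residues are much rarer than non-binding ones, which is presumably why these studies report both.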
|
21
|
Song WQ, Qin YM, Saito M, Shirai T, Pujol FM, Kastaniotis AJ, Hiltunen JK, Zhu YX. Characterization of two cotton cDNAs encoding trans-2-enoyl-CoA reductase reveals a putative novel NADPH-binding motif. J Exp Bot 2009; 60:1839-48. [PMID: 19286916 PMCID: PMC2671629 DOI: 10.1093/jxb/erp057] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 11/10/2008] [Revised: 02/05/2009] [Accepted: 02/13/2009] [Indexed: 05/19/2023]
Abstract
Very long chain fatty acids are important components of plant lipids, suberins, and cuticular waxes. Trans-2-enoyl-CoA reductase (ECR) catalyses the fourth reaction of fatty acid elongation, which is NADPH dependent. In the present study, the expression of two cotton ECR (GhECR) genes revealed by quantitative RT-PCR analysis was up-regulated during cotton fibre elongation. GhECR1 and 2 each contain open reading frames of 933 bp in length, both encoding proteins consisting of 310 amino acid residues. GhECRs show 32% identity to Saccharomyces cerevisiae Tsc13p at the deduced amino acid level, and the GhECR genes were able to restore the viability of the S. cerevisiae haploid tsc13-deletion strain. A putative non-classical NADPH-binding site in GhECR was predicted by an empirical approach. Site-directed mutagenesis in combination with gas chromatography-mass spectrometry analysis suggests that G(5X)IPXG presents a putative novel NADPH-binding motif of the plant ECR family. The data suggest that both GhECR genes encode functional enzymes harbouring non-classical NADPH-binding sites at their C-termini, and are involved in fatty acid elongation during cotton fibre development.
Affiliation(s)
- Wen-Qiang Song
- National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, 100871, China
- Department of Biochemistry and Molecular Biology, College of Life Sciences, Peking University, Beijing, 100871, China
- Yong-Mei Qin
- National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, 100871, China
- Department of Biochemistry and Molecular Biology, College of Life Sciences, Peking University, Beijing, 100871, China
- To whom correspondence should be addressed. E-mail:
- Mihoko Saito
- Department of Bioscience, Nagahama Institute of Bioscience and Technology, 1266 Tamura, Nagahama 526-0829, Japan
- Tsuyoshi Shirai
- Department of Bioscience, Nagahama Institute of Bioscience and Technology, 1266 Tamura, Nagahama 526-0829, Japan
- François M. Pujol
- Biocenter Oulu and Department of Biochemistry, University of Oulu, PO Box 3000, FI-90014 Oulu, Finland
- Alexander J. Kastaniotis
- Biocenter Oulu and Department of Biochemistry, University of Oulu, PO Box 3000, FI-90014 Oulu, Finland
- J. Kalervo Hiltunen
- Biocenter Oulu and Department of Biochemistry, University of Oulu, PO Box 3000, FI-90014 Oulu, Finland
- Yu-Xian Zhu
- National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, 100871, China
- Department of Biochemistry and Molecular Biology, College of Life Sciences, Peking University, Beijing, 100871, China
|
22
|
Gou Z, Kuznetsov IB. On the Accuracy of Sequence-Based Computational Inference of Protein Residues Involved in Interactions with DNA. Trends in Applied Sciences Research 2008; 3:285-291. [PMID: 20209034 DOI: 10.3923/tasr.2008.285.291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
Methods for computational inference of DNA-binding residues in DNA-binding proteins are usually developed using classification techniques trained to distinguish between binding and non-binding residues on the basis of known examples observed in experimentally determined high-resolution structures of protein-DNA complexes. What degree of accuracy can be expected when a computational method is applied to a particular novel protein remains largely unknown. We test the utility of classification methods on the example of Kernel Logistic Regression (KLR) predictors of DNA-binding residues. We show that predictors that utilize sequence properties of proteins can successfully predict DNA-binding residues in proteins from a novel structural class. We use Multiple Linear Regression (MLR) to establish a quantitative relationship between protein properties and the expected accuracy of KLR predictors. The present results indicate that, in the case of novel proteins, the expected accuracy provided by an MLR model is close to the actual accuracy and can be used to assess the overall confidence of the prediction.
Affiliation(s)
- Zhenkun Gou
- Gen NY sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, One Discovery Drive Rensselaer, 12144 New York, USA
|
23
|
|
24
|
Sasaki K, Ose T, Okamoto N, Maenaka K, Tanaka T, Masai H, Saito M, Shirai T, Kohda D. Structural basis of the 3'-end recognition of a leading strand in stalled replication forks by PriA. EMBO J 2007; 26:2584-93. [PMID: 17464287 PMCID: PMC1868909 DOI: 10.1038/sj.emboj.7601697] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 11/17/2006] [Accepted: 03/29/2007] [Indexed: 11/08/2022]
Abstract
In eubacteria, PriA helicase detects stalled DNA replication forks. This critical role of PriA is ascribed to its ability to bind to the 3' end of a nascent leading DNA strand in the stalled replication forks. The crystal structures in complexes with oligonucleotides and the combination of fluorescence correlation spectroscopy and mutagenesis reveal that the N-terminal domain of PriA possesses a binding pocket for the 3'-terminal nucleotide residue of DNA. The interaction with the deoxyribose 3'-OH is essential for the 3'-terminal recognition. In contrast, the direct interaction with the 3'-end nucleobase is unexpected, considering the same affinity for oligonucleotides carrying the four bases at the 3' end. Thus, the N-terminal domain of PriA recognizes the 3'-end base in a base-non-selective manner, in addition to the deoxyribose and 5'-side phosphodiester group of the 3'-terminal nucleotide, to acquire both sufficient affinity and non-selectivity to find all of the stalled replication forks generated during DNA duplication. This unique feature is a prerequisite for the proper positioning of the helicase domain of PriA on the unreplicated double-stranded DNA.
MESH Headings
- Amino Acid Sequence
- Base Sequence
- Binding Sites
- Buffers
- Crystallography, X-Ray
- DNA Helicases/chemistry
- DNA Helicases/genetics
- DNA Helicases/isolation & purification
- DNA Helicases/metabolism
- DNA Helicases/physiology
- DNA Replication/physiology
- DNA, Bacterial/physiology
- Databases, Protein
- Escherichia coli/physiology
- Escherichia coli Proteins/chemistry
- Escherichia coli Proteins/genetics
- Escherichia coli Proteins/isolation & purification
- Escherichia coli Proteins/metabolism
- Escherichia coli Proteins/physiology
- Histidine/chemistry
- Hydrogen Bonding
- Hydrogen-Ion Concentration
- Hydrophobic and Hydrophilic Interactions
- Ligands
- Models, Chemical
- Models, Molecular
- Molecular Sequence Data
- Oligonucleotides/analysis
- Oligonucleotides/chemistry
- Phosphates/chemistry
- Point Mutation
- Protein Binding
- Protein Structure, Secondary
- Protein Structure, Tertiary
- Rhodamines/metabolism
- Sequence Homology, Amino Acid
- Spectrometry, Fluorescence
- Spectrum Analysis, Raman
- Thrombin/pharmacology
Affiliation(s)
- Kaori Sasaki
- Division of Structural Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Toyoyuki Ose
- Division of Structural Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Katsumi Maenaka
- Division of Structural Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Taku Tanaka
- Genome Dynamics Project, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Hisao Masai
- Genome Dynamics Project, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Mihoko Saito
- Department of Bioscience, Nagahama Institute of Bioscience and Technology, and JST-BIRD, Siga, Japan
- Tsuyoshi Shirai
- Department of Bioscience, Nagahama Institute of Bioscience and Technology, and JST-BIRD, Siga, Japan
- Daisuke Kohda
- Division of Structural Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Division of Structural Biology, Medical Institute of Bioregulation, Kyushu University, Maidashi 3-1-1, Higashi-ku, Fukuoka 812-8582, Japan. Tel.: +81 92 642 6968; Fax: +81 92 642 6764; E-mail:
|
25
|
Pyrkov TV, Kosinsky YA, Arseniev AS, Priestle JP, Jacoby E, Efremov RG. Complementarity of hydrophobic properties in ATP-protein binding: a new criterion to rank docking solutions. Proteins 2007; 66:388-98. [PMID: 17094116 DOI: 10.1002/prot.21122] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 12/11/2022]
Abstract
ATP is an important substrate of numerous biochemical reactions in living cells. Molecular recognition of this ligand by proteins is very important for understanding enzymatic mechanisms. Considerable insight into the problem may be gained via molecular docking simulations. At the same time, standard docking protocols are often insufficient to predict correct conformations for protein-ATP complexes. Thus, in most cases the native-like solutions can be found among the docking poses, but current scoring functions have only limited ability to discriminate them from false positives. To improve the selection of correct docking solutions obtained with the GOLD software, we developed a new ranking criterion specific for ATP-protein binding. The method is based on detailed analysis of the intermolecular interactions in 40 high-resolution 3D structures of ATP-protein complexes (the training set). We found that the most important factors governing this recognition are hydrogen-bonding, stacking between adenine and aromatic protein residues, and hydrophobic contacts between adenine and protein residues. To address the latter, we applied the formalism of 3D molecular hydrophobicity potential. The results obtained were used to construct an ATP-oriented scoring criterion as a linear combination of the terms describing these intermolecular interactions. The criterion was then validated using the test set of 10 additional ATP-protein complexes. As compared with the standard scoring functions, the new ranking criterion significantly improved the selection of correct docking solutions in both sets and allowed considerable enrichment at the top of the list containing docking poses with correct solutions.
Affiliation(s)
- Timothy V Pyrkov
- M.M. Shemyakin & Yu. A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Ul. Miklukho-Maklaya, 16/10, 117997 GSP, Moscow V-437, Russia.
|