1
Atas H, Tuncbag N, Doğan T. Phylogenetic and Other Conservation-Based Approaches to Predict Protein Functional Sites. Methods Mol Biol 2018; 1762:51-69. [PMID: 29594767 DOI: 10.1007/978-1-4939-7756-7_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 03/05/2023]
Abstract
Proteins use their functional regions to exploit various activities, including binding to other proteins, nucleic acids, or drugs. Functional sites of the proteins have a tendency to be more conserved than the rest of the protein surface. Therefore, detection of the conserved residues using phylogenetic analysis is a general approach to predict functionally critical residues. In this chapter, we describe some of the available methods to predict functional sites and demonstrate a complete pipeline with tool alternatives at several steps. We explain the standard procedure and all intermediate stages including homology detection with BLAST search, multiple sequence alignment (MSA) and the construction of a phylogenetic tree for a given query sequence. Additionally, we demonstrate the prediction results of these methods on a case study. Finally, we discuss the possible challenges and bottlenecks throughout the pipeline. Our step-by-step description about the functional site prediction could be a helpful resource for the researchers interested in finding protein functional sites, to be used in drug discovery research.
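The conservation idea in this abstract can be illustrated with a minimal sketch that scores each column of a multiple sequence alignment by normalized Shannon entropy. This is not the chapter's pipeline: the toy alignment, the function name `column_conservation`, and the choice of entropy as the conservation measure are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column: 1.0 = fully conserved, lower = more variable.

    Hypothetical helper; gaps ('-') are ignored when counting residues.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # assumes the 20 standard amino acids
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no conservation signal
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment (invented): columns 0, 2 and 4 are invariant.
toy_msa = ["MKHLA", "MKHVA", "MRHIA", "MKH-A"]
scores = column_conservation(toy_msa)
for position, score in enumerate(scores):
    print(position, round(score, 3))
```

In a real analysis the alignment would come from homolog retrieval and an MSA tool, and high-scoring columns would only be candidates for functional sites, to be checked against structure or experiment.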
Affiliation(s)
- Heval Atas
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey; Cancer Systems Biology Laboratory (CanSyL), METU, Ankara, 06800, Turkey
- Nurcan Tuncbag
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey; Cancer Systems Biology Laboratory (CanSyL), METU, Ankara, 06800, Turkey
- Tunca Doğan
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey; Cancer Systems Biology Laboratory (CanSyL), METU, Ankara, 06800, Turkey; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK
2
Jian JW, Elumalai P, Pitti T, Wu CY, Tsai KC, Chang JY, Peng HP, Yang AS. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms. PLoS One 2016; 11:e0160315. [PMID: 27513851 PMCID: PMC4981321 DOI: 10.1371/journal.pone.0160315] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/31/2015] [Accepted: 07/18/2016] [Indexed: 11/18/2022]
Abstract
Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites.
Affiliation(s)
- Jhih-Wei Jian
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan 11221
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan 115
- Thejkiran Pitti
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan 115
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan 30013
- Chih Yuan Wu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Keng-Chang Tsai
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Jeng-Yih Chang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
3
Zeng P, Li J, Ma W, Cui Q. Rsite: a computational method to identify the functional sites of noncoding RNAs. Sci Rep 2015; 5:9179. [PMID: 25776805 PMCID: PMC4361870 DOI: 10.1038/srep09179] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 10/27/2014] [Accepted: 02/18/2015] [Indexed: 01/01/2023]
Abstract
There is an increasing demand for identifying the functional sites of noncoding RNAs (ncRNAs). Here we introduce a tertiary-structure based computational approach, Rsite, which first calculates the Euclidean distances between each nucleotide and all the other nucleotides in a RNA molecule and then determines the nucleotides that are the extreme points in the distance curve as the functional sites. By analyzing two ncRNAs, tRNA (Lys) and Diels-Alder ribozyme, we demonstrated the efficiency of Rsite. As a result, Rsite recognized all of the known functional sites of the two ncRNAs, suggesting that Rsite could be a potentially useful tool for discovering the functional sites of ncRNAs. The source codes and data sets of Rsite are available at http://www.cuilab.cn/rsite.
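The two steps named in this abstract (a per-nucleotide distance curve, then its extreme points) can be sketched roughly as follows. This is not the published Rsite code: the use of one representative 3D point per nucleotide, the mean (rather than sum) of distances, the strict local-extremum rule without smoothing, and the names `distance_curve` and `local_extrema` are assumptions for illustration only.

```python
import math


def distance_curve(coords):
    """Mean Euclidean distance from each nucleotide to all other nucleotides.

    `coords` holds one (x, y, z) tuple per nucleotide (assumed representation).
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two nucleotides")
    curve = []
    for i, p in enumerate(coords):
        total = sum(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        curve.append(total / (n - 1))
    return curve


def local_extrema(curve):
    """Indices of interior points that are strict local maxima or minima."""
    extrema = []
    for i in range(1, len(curve) - 1):
        prev, cur, nxt = curve[i - 1], curve[i], curve[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            extrema.append(i)
    return extrema


# Toy input (invented): five points on a straight line, 1 unit apart.
toy_coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
curve = distance_curve(toy_coords)
candidate_sites = local_extrema(curve)
print(curve)            # [2.5, 1.75, 1.5, 1.75, 2.5]
print(candidate_sites)  # [2]
```

On real tertiary structures the coordinates would be parsed from a structure file, and the distance curve would likely need smoothing before extrema are called; neither step is shown here.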
Affiliation(s)
- Pan Zeng
- Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Jianwei Li
- Lab of Translational Biomedicine Informatics, School of Computer Science and Engineering, Hebei University of Technology, 5340 Xiping Rd, Tianjin 300401, China
- Wei Ma
- Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Qinghua Cui
- Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
4
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 251] [Impact Index Per Article: 27.9] [Received: 10/20/2014] [Indexed: 12/21/2022]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
Affiliation(s)
- Andrew Currin
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Neil Swainston
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Philip J. Day
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, UK
- Douglas B. Kell
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
5
Mills CL, Beuning PJ, Ondrechen MJ. Biochemical functional predictions for protein structures of unknown or uncertain function. Comput Struct Biotechnol J 2015; 13:182-91. [PMID: 25848497 PMCID: PMC4372640 DOI: 10.1016/j.csbj.2015.02.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 12/03/2014] [Revised: 02/06/2015] [Accepted: 02/11/2015] [Indexed: 01/07/2023]
Abstract
With the exponential growth in the determination of protein sequences and structures via genome sequencing and structural genomics efforts, there is a growing need for reliable computational methods to determine the biochemical function of these proteins. This paper reviews the efforts to address the challenge of annotating the function at the molecular level of uncharacterized proteins. While sequence- and three-dimensional-structure-based methods for protein function prediction have been reviewed previously, the recent trends in local structure-based methods have received less attention. These local structure-based methods are the primary focus of this review. Computational methods have been developed to predict the residues important for catalysis and the local spatial arrangements of these residues can be used to identify protein function. In addition, the combination of different types of methods can help obtain more information and better predictions of function for proteins of unknown function. Global initiatives, including the Enzyme Function Initiative (EFI), COMputational BRidges to EXperiments (COMBREX), and the Critical Assessment of Function Annotation (CAFA), are evaluating and testing the different approaches to predicting the function of proteins of unknown function. These initiatives and global collaborations will increase the capability and reliability of methods to predict biochemical function computationally and will add substantial value to the current volume of structural genomics data by reducing the number of absent or inaccurate functional annotations.
Affiliation(s)
- Caitlyn L Mills
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
- Penny J Beuning
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
- Mary Jo Ondrechen
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
6
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022]
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Affiliation(s)
- Wataru Nemoto
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Akira Saito
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Hayato Oikawa
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
7
Costa EP, Vens C, Blockeel H. Top-down clustering for protein subfamily identification. Evol Bioinform Online 2013; 9:185-202. [PMID: 23700359 PMCID: PMC3653887 DOI: 10.4137/ebo.s11609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
We propose a novel method for the task of protein subfamily identification; that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree using as input a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Differently from existing methods, it constructs the hierarchical tree top-down, rather than bottom-up and associates particular mutations with each division into subclusters. The motivating hypothesis for this method is that it may yield a better tree topology with more accurate subfamily identification as a result and additionally indicates functionally important sites and allows for easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis. The novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow for classifying new sequences with an accuracy approaching that of hidden Markov models.
8
Relationship between global structural parameters and Enzyme Commission hierarchy: Implications for function prediction. Comput Biol Chem 2012; 40:15-9. [DOI: 10.1016/j.compbiolchem.2012.06.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/12/2012] [Revised: 05/06/2012] [Accepted: 06/22/2012] [Indexed: 11/18/2022]
9
A holistic in silico approach to predict functional sites in protein structures. Bioinformatics 2012; 28:1845-50. [DOI: 10.1093/bioinformatics/bts269] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
10
Affiliation(s)
- Maria Kontoyianni
- Department of Pharmaceutical Sciences and Department of Psychology, Southern Illinois University Edwardsville, Edwardsville, Illinois 62026, United States
- Christopher B. Rosnick
- Department of Pharmaceutical Sciences and Department of Psychology, Southern Illinois University Edwardsville, Edwardsville, Illinois 62026, United States
11
Abstract
As the field of synthetic biology is developing, the prospects for de novo design of biosynthetic pathways are becoming more and more realistic. Hence, there is an increasing need for computational tools that can support these efforts. A range of algorithms has been developed that can be used to identify all possible metabolic pathways and their corresponding enzymatic parts. These can then be ranked according to various properties and modelled in an organism-specific context. Finally, design software can aid the biologist in the integration of a selected pathway into smartly regulated transcriptional units. Here, we review key existing tools and offer suggestions for how informatics can help to shape the future of synthetic microbiology.
12
Kochańczyk M. Prediction of functionally important residues in globular proteins from unusual central distances of amino acids. BMC Struct Biol 2011; 11:34. [PMID: 21923943 PMCID: PMC3188475 DOI: 10.1186/1472-6807-11-34] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/22/2011] [Accepted: 09/18/2011] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Beside constructing better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues.
RESULTS: Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in the information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites achieving success rates comparable to several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi.
CONCLUSIONS: Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. Such residues turn out to be often directly involved in binding ligands or interfacing with other proteins.
Affiliation(s)
- Marek Kochańczyk
- Faculty of Physics, Jagiellonian University, ul. Reymonta 4, 30-059 Krakow, Poland
13
Somarowthu S, Yang H, Hildebrand DG, Ondrechen MJ. High-performance prediction of functional residues in proteins with machine learning and computed input features. Biopolymers 2011; 95:390-400. [DOI: 10.1002/bip.21589] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
14
Mitternacht S, Berezovsky IN. A geometry-based generic predictor for catalytic and allosteric sites. Protein Eng Des Sel 2010; 24:405-9. [PMID: 21159618 DOI: 10.1093/protein/gzq115] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
An important aspect of understanding protein allostery, and of artificial effector design, is the characterization and prediction of substrate- and effector-binding sites. To find binding sites in allosteric enzymes, many of which are oligomeric with allosteric sites at domain interfaces, we devise a local centrality measure for residue interaction graphs, which behaves well for both small/monomeric and large/multimeric proteins. The measure is purely structure based and has a clear geometrical interpretation and no free parameters. It is not biased towards typically catalytic residues, a property that is crucial when looking for non-catalytic effector sites, which are potent drug targets.
Affiliation(s)
- Simon Mitternacht
- Computational Biology Unit, Bergen Center for Computational Science, Bergen, Norway
15
Volkamer A, Griewel A, Grombacher T, Rarey M. Analyzing the Topology of Active Sites: On the Prediction of Pockets and Subpockets. J Chem Inf Model 2010; 50:2041-52. [DOI: 10.1021/ci100241y] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.7] [Indexed: 11/28/2022]
Affiliation(s)
- Andrea Volkamer
- Research Group for Computational Molecular Design, Bundesstr. 43, 20146 Hamburg, Germany, and Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany
- Axel Griewel
- Research Group for Computational Molecular Design, Bundesstr. 43, 20146 Hamburg, Germany, and Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany
- Thomas Grombacher
- Research Group for Computational Molecular Design, Bundesstr. 43, 20146 Hamburg, Germany, and Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany
- Matthias Rarey
- Research Group for Computational Molecular Design, Bundesstr. 43, 20146 Hamburg, Germany, and Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany
16
Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T. Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010; 11:365. [PMID: 20594344 PMCID: PMC2909224 DOI: 10.1186/1471-2105-11-365] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/16/2010] [Accepted: 07/01/2010] [Indexed: 11/16/2022]
Abstract
Background: The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity.
Results: We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones.
Conclusions: A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
Affiliation(s)
- Ratna R Thangudu
- National Center for Biotechnology Information, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA