Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Stark A, Russell RB. Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Res 2003;31:3341-4. [PMID: 12824322 PMCID: PMC168913 DOI: 10.1093/nar/gkg506] [Citation(s) in RCA: 139] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

For:	Stark A, Russell RB. Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Res 2003;31:3341-4. [PMID: 12824322 PMCID: PMC168913 DOI: 10.1093/nar/gkg506] [Citation(s) in RCA: 139] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Number

Cited by Other Article(s)

Riziotis IG, Ribeiro AJM, Borkakoti N, Thornton JM. The 3D Modules of Enzyme Catalysis: Deconstructing Active Sites into Distinct Functional Entities. J Mol Biol 2023;435:168254. [PMID: 37652131 DOI: 10.1016/j.jmb.2023.168254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Revised: 08/20/2023] [Accepted: 08/22/2023] [Indexed: 09/02/2023]

Riziotis IG, Thornton JM. Capturing the geometry, function, and evolution of enzymes with 3D templates. Protein Sci 2022;31:e4363. [PMID: 35762726 PMCID: PMC9207746 DOI: 10.1002/pro.4363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2022] [Revised: 05/06/2022] [Accepted: 05/14/2022] [Indexed: 11/05/2022]

Tsoy O, Mushegian A. Florigen and its homologs of FT/CETS/PEBP/RKIP/YbhB family may be the enzymes of small molecule metabolism: review of the evidence. BMC PLANT BIOLOGY 2022;22:56. [PMID: 35086479 PMCID: PMC8793217 DOI: 10.1186/s12870-022-03432-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/25/2021] [Accepted: 01/07/2022] [Indexed: 06/14/2023]

Pan X, Kortemme T. De novo protein fold families expand the designable ligand binding site space. PLoS Comput Biol 2021;17:e1009620. [PMID: 34807909 PMCID: PMC8648124 DOI: 10.1371/journal.pcbi.1009620] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2021] [Revised: 12/06/2021] [Accepted: 11/08/2021] [Indexed: 11/19/2022] Open

Abstract

A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbones structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extend de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place (“match”) these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossman and NTF2. Each matching step involves engineering new binding site residues into each protein “scaffold”, which is distinct from the problem of comparing already existing binding pockets. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossman and NTF2 protein families can support 1,791 and 678 binding sites that cannot be matched to naturally existing structures with the same topologies, respectively. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The number of matched binding sites are power law functions of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.

De novo design of proteins that can bind to novel and highly diverse user-defined small molecule ligands could have broad biomedical and synthetic biology applications. Because ligand binding site geometries need to be accommodated by protein backbone scaffolds at high accuracy, the diversity of scaffolds is a major limitation for designing new ligand binding functions. Advances in computational protein structure design methods have significantly increased the number of accessible stable scaffold structures. Understanding how many new ligand binding sites can be designed into the de novo scaffolds is important for engineering novel ligand binding proteins. To answer this question, we constructed a large library of ligand binding sites from the Protein Data Bank (PDB). We tested the number of ligand binding sites that can be designed into de novo scaffolds and naturally existing scaffolds with the same fold topologies. The results showed that de novo scaffolds significantly expanded the potential ligand binding space of their respective fold topologies. We also identified factors that affect difficulties of binding site accommodation, as well as the relationship between the number of scaffolds and the accessible ligand binding site space. We believe our findings will benefit future method development and applications of ligand binding protein design.

Collapse

Moraes JPA, Pappa GL, Pires DEV, Izidoro SC. GASS-WEB: a web server for identifying enzyme active sites based on genetic algorithms. Nucleic Acids Res 2019;45:W315-W319. [PMID: 28459991 PMCID: PMC5570142 DOI: 10.1093/nar/gkx337] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2017] [Accepted: 04/27/2017] [Indexed: 02/01/2023] Open

3D-PP: A Tool for Discovering Conserved Three-Dimensional Protein Patterns. Int J Mol Sci 2019;20:ijms20133174. [PMID: 31261733 PMCID: PMC6651053 DOI: 10.3390/ijms20133174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2019] [Revised: 06/19/2019] [Accepted: 06/20/2019] [Indexed: 01/25/2023] Open

Abstract

Discovering conserved three-dimensional (3D) patterns among protein structures may provide valuable insights into protein classification, functional annotations or the rational design of multi-target drugs. Thus, several computational tools have been developed to discover and compare protein 3D-patterns. However, most of them only consider previously known 3D-patterns such as orthosteric binding sites or structural motifs. This fact makes necessary the development of new methods for the identification of all possible 3D-patterns that exist in protein structures (allosteric sites, enzyme-cofactor interaction motifs, among others). In this work, we present 3D-PP, a new free access web server for the discovery and recognition all similar 3D amino acid patterns among a set of proteins structures (independent of their sequence similarity). This new tool does not require any previous structural knowledge about ligands, and all data are organized in a high-performance graph database. The input can be a text file with the PDB access codes or a zip file of PDB coordinates regardless of the origin of the structural data: X-ray crystallographic experiments or in silico homology modeling. The results are presented as lists of sequence patterns that can be further analyzed within the web page. We tested the accuracy and suitability of 3D-PP using two sets of proteins coming from the Protein Data Bank: (a) Zinc finger containing and (b) Serotonin target proteins. We also evaluated its usefulness for the discovering of new 3D-patterns, using a set of protein structures coming from in silico homology modeling methodologies, all of which are overexpressed in different types of cancer. Results indicate that 3D-PP is a reliable, flexible and friendly-user tool to identify conserved structural motifs, which could be relevant to improve the knowledge about protein function or classification. The web server can be freely utilized at https://appsbio.utalca.cl/3d-pp/.

Collapse

Kaiser F, Labudde D. Unsupervised Discovery of Geometrically Common Structural Motifs and Long-Range Contacts in Protein 3D Structures. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019;16:671-680. [PMID: 29990265 DOI: 10.1109/tcbb.2017.2786250] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Fetrow JS, Babbitt PC. New computational approaches to understanding molecular protein function. PLoS Comput Biol 2018;14:e1005756. [PMID: 29621256 PMCID: PMC5886384 DOI: 10.1371/journal.pcbi.1005756] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open

Žváček C, Friedrichs G, Heizinger L, Merkl R. An assessment of catalytic residue 3D ensembles for the prediction of enzyme function. BMC Bioinformatics 2015;16:359. [PMID: 26538500 PMCID: PMC4634577 DOI: 10.1186/s12859-015-0807-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2015] [Accepted: 10/29/2015] [Indexed: 12/03/2022] Open

Abstract

Background

The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i. e. non-homologous protein folds and how similar their 3D poses are. Both similarities are key criteria for assessing the usability of pose comparison for function prediction.

Results

We have analyzed the SCOP database on the superfamily level in order to estimate the number of non-homologous enzymes possessing the same function according to their EC number. 89 % of the 873 substrate-specific functions (four digit EC number) assigned to mono-functional, single-domain enzymes were only found in one superfamily. For a reaction-specific grouping (three digit EC number), this value dropped to 35 %, indicating that in approximately 65 % of all enzymes the same function evolved in two or more non-homologous proteins.

For these isofunctional enzymes, structural similarity of the catalytic sites may help to predict function, because neither high sequence similarity nor identical folds are required for a comparison. To assess the specificity of catalytic 3D poses, we compiled the redundancy-free set ENZ_SITES, which comprises 695 sites, whose composition and function are well-defined. We compared their poses with the help of the program Superpose3D and determined classification performance. If the sites were from different superfamilies, the number of true and false positive predictions was similarly high, both for a coarse and a detailed grouping of enzyme function. Moreover, classification performance did not improve drastically, if we additionally used homologous sites to predict function.

Conclusions

For a large number of enzymatic functions, dissimilar sites evolved that catalyze the same reaction and it is the individual substrate that determines the arrangement of the catalytic site and its local environment. These substrate-specific requirements turn the comparison of catalytic residues into a weak classifier for the prediction of enzyme function.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0807-6) contains supplementary material, which is available to authorized users.

Collapse

Krotzky T, Grunwald C, Egerland U, Klebe G. Large-scale mining for similar protein binding pockets: with RAPMAD retrieval on the fly becomes real. J Chem Inf Model 2014;55:165-79. [PMID: 25474400 DOI: 10.1021/ci5005898] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023]

Izidoro SC, de Melo-Minardi RC, Pappa GL. GASS: identifying enzyme active sites with genetic algorithms. ACTA ACUST UNITED AC 2014;31:864-70. [PMID: 25388152 DOI: 10.1093/bioinformatics/btu746] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]

Flores DI, Sotelo-Mundo RR, Brizuela CA. A simple extension to the CMASA method for the prediction of catalytic residues in the presence of single point mutations. PLoS One 2014;9:e108513. [PMID: 25268770 PMCID: PMC4182483 DOI: 10.1371/journal.pone.0108513] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2014] [Accepted: 08/31/2014] [Indexed: 11/23/2022] Open

Micale G, Pulvirenti A, Giugno R, Ferro A. Proteins comparison through probabilistic optimal structure local alignment. Front Genet 2014;5:302. [PMID: 25228906 PMCID: PMC4151033 DOI: 10.3389/fgene.2014.00302] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2014] [Accepted: 08/12/2014] [Indexed: 11/13/2022] Open

Krotzky T, Fober T, Hüllermeier E, Klebe G. Extended Graph-Based Models for Enhanced Similarity Search in Cavbase. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014;11:878-890. [PMID: 26356860 DOI: 10.1109/tcbb.2014.2325020] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Rigden DJ, Eberhardt RY, Gilbert HJ, Xu Q, Chang Y, Godzik A. Structure- and context-based analysis of the GxGYxYP family reveals a new putative class of glycoside hydrolase. BMC Bioinformatics 2014;15:196. [PMID: 24938123 PMCID: PMC4071793 DOI: 10.1186/1471-2105-15-196] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2014] [Accepted: 06/10/2014] [Indexed: 01/24/2023] Open

Feinstein WP, Brylinski M. eFindSite: Enhanced Fingerprint-Based Virtual Screening Against Predicted Ligand Binding Sites in Protein Models. Mol Inform 2014;33:135-50. [PMID: 27485570 DOI: 10.1002/minf.201300143] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2013] [Accepted: 12/06/2013] [Indexed: 12/26/2022]

Structure-based function prediction of uncharacterized protein using binding sites comparison. PLoS Comput Biol 2013;9:e1003341. [PMID: 24244144 PMCID: PMC3828134 DOI: 10.1371/journal.pcbi.1003341] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2013] [Accepted: 10/01/2013] [Indexed: 11/25/2022] Open

Abstract

A challenge in structural genomics is prediction of the function of uncharacterized proteins. When proteins cannot be related to other proteins of known activity, identification of function based on sequence or structural homology is impossible and in such cases it would be useful to assess structurally conserved binding sites in connection with the protein's function. In this paper, we propose the function of a protein of unknown activity, the Tm1631 protein from Thermotoga maritima, by comparing its predicted binding site to a library containing thousands of candidate structures. The comparison revealed numerous similarities with nucleotide binding sites including specifically, a DNA-binding site of endonuclease IV. We constructed a model of this Tm1631 protein with a DNA-ligand from the newly found similar binding site using ProBiS, and validated this model by molecular dynamics. The interactions predicted by the Tm1631-DNA model corresponded to those known to be important in endonuclease IV-DNA complex model and the corresponding binding free energies, calculated from these models were in close agreement. We thus propose that Tm1631 is a DNA binding enzyme with endonuclease activity that recognizes DNA lesions in which at least two consecutive nucleotides are unpaired. Our approach is general, and can be applied to any protein of unknown function. It might also be useful to guide experimental determination of function of uncharacterized proteins.

For a substantial proportion of proteins, their functions are not known since these proteins are not related in sequence to any other known proteins. Binding sites are evolutionarily conserved across very distant protein families, and finding similar binding sites between known and unknown proteins can provide clues as to functions of the unknown proteins. We choose one of the “unknown function” proteins, and found, using a novel strategy of binding site comparison to construct a hypothetical protein-ligand complex, subsequently validated by molecular dynamics that this protein most likely binds and repairs the damaged DNA similar to known DNA-repair enzymes. Our methodology is general and enables one to determine functions of other proteins currently labelled as “unknown function”. We envision that the methodology presented herein, the binding sites comparisons enhanced by molecular dynamics, will stimulate the function prediction of other uncharacterized proteins with structures in the Protein Data Bank and boost experimental functional studies of proteins of unknown functions.

Collapse

Jalencas X, Mestres J. Identification of Similar Binding Sites to Detect Distant Polypharmacology. Mol Inform 2013;32:976-90. [PMID: 27481143 DOI: 10.1002/minf.201300082] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2013] [Accepted: 07/29/2013] [Indexed: 01/19/2023]

Van Voorst JR, Finzel BC. Searching for likeness in a database of macromolecular complexes. J Chem Inf Model 2013;53:2634-47. [PMID: 24047445 DOI: 10.1021/ci4002537] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Janda JO, Meier A, Merkl R. CLIPS-4D: a classifier that distinguishes structurally and functionally important residue-positions based on sequence and 3D data. ACTA ACUST UNITED AC 2013;29:3029-35. [PMID: 24048358 DOI: 10.1093/bioinformatics/btt519] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Kirshner DA, Nilmeier JP, Lightstone FC. Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB. Nucleic Acids Res 2013;41:W256-65. [PMID: 23680785 PMCID: PMC3692059 DOI: 10.1093/nar/gkt403] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open

Nilmeier JP, Kirshner DA, Wong SE, Lightstone FC. Rapid catalytic template searching as an enzyme function prediction procedure. PLoS One 2013;8:e62535. [PMID: 23675414 PMCID: PMC3651201 DOI: 10.1371/journal.pone.0062535] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2013] [Accepted: 03/22/2013] [Indexed: 11/18/2022] Open

Küçükural A, Szilagyi A, Sezerman OU, Zhang Y. Protein Homology Analysis for Function Prediction with Parallel Sub-Graph Isomorphism. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open

Fober T, Mernberger M, Klebe G, Hüllermeier E. Fingerprint Kernels for Protein Structure Comparison. Mol Inform 2012;31:443-52. [PMID: 27477463 DOI: 10.1002/minf.201100149] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2011] [Accepted: 04/03/2012] [Indexed: 11/06/2022]

Structure-based computational analysis of protein binding sites for function and druggability prediction. J Biotechnol 2012;159:123-34. [DOI: 10.1016/j.jbiotec.2011.12.005] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2011] [Revised: 12/02/2011] [Accepted: 12/06/2011] [Indexed: 11/19/2022]

Wilkins AD, Bachman BJ, Erdin S, Lichtarge O. The use of evolutionary patterns in protein annotation. Curr Opin Struct Biol 2012;22:316-25. [PMID: 22633559 DOI: 10.1016/j.sbi.2012.05.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2012] [Accepted: 05/01/2012] [Indexed: 01/13/2023]

Anand P, Yeturu K, Chandra N. PocketAnnotate: towards site-based function annotation. Nucleic Acids Res 2012;40:W400-8. [PMID: 22618878 PMCID: PMC3394344 DOI: 10.1093/nar/gks421] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Trabuco LG, Lise S, Petsalaki E, Russell RB. PepSite: prediction of peptide-binding sites from protein surfaces. Nucleic Acids Res 2012;40:W423-7. [PMID: 22600738 PMCID: PMC3394340 DOI: 10.1093/nar/gks398] [Citation(s) in RCA: 131] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023] Open

Konc J, Janezic D. ProBiS-2012: web server and web services for detection of structurally similar binding sites in proteins. Nucleic Acids Res 2012;40:W214-21. [PMID: 22600737 PMCID: PMC3394329 DOI: 10.1093/nar/gks435] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Ueno K, Mineta K, Ito K, Endo T. Exploring functionally related enzymes using radially distributed properties of active sites around the reacting points of bound ligands. BMC STRUCTURAL BIOLOGY 2012;12:5. [PMID: 22536854 PMCID: PMC3408369 DOI: 10.1186/1472-6807-12-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/14/2011] [Accepted: 04/26/2012] [Indexed: 11/10/2022]

Ming D, Wall ME. Predicting binding sites by analyzing allosteric effects. Methods Mol Biol 2012;796:423-436. [PMID: 22052504 DOI: 10.1007/978-1-61779-334-9_23] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

Pires DEV, de Melo-Minardi RC, dos Santos MA, da Silveira CH, Santoro MM, Meira W. Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns. BMC Genomics 2011;12 Suppl 4:S12. [PMID: 22369665 PMCID: PMC3287581 DOI: 10.1186/1471-2164-12-s4-s12] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

Abstract

Background

The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method: Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies as well as other datasets derived from SCOP release 1.75.

Results

CSM was able to achieve a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in last SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a compatible precision level.

Conclusions

We showed that the patterns derived from CSMs could effectively be used to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data.

Collapse

Current methods in structural proteomics and its applications in biological sciences. 3 Biotech 2011. [PMCID: PMC3376864 DOI: 10.1007/s13205-011-0037-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/29/2022] Open

Stegemann B, Klebe G. Cofactor-binding sites in proteins of deviating sequence: comparative analysis and clustering in torsion angle, cavity, and fold space. Proteins 2011;80:626-48. [PMID: 22095739 DOI: 10.1002/prot.23226] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2011] [Revised: 09/29/2011] [Accepted: 10/10/2011] [Indexed: 12/13/2022]

Mernberger M, Klebe G, Hüllermeier E. SEGA: semiglobal graph alignment for structure-based protein comparison. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;8:1330-1343. [PMID: 21339532 DOI: 10.1109/tcbb.2011.35] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]

Yeturu K, Chandra N. PocketAlign a novel algorithm for aligning binding sites in protein structures. J Chem Inf Model 2011;51:1725-36. [PMID: 21662242 DOI: 10.1021/ci200132z] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Abstract

A fundamental task in bioinformatics involves a transfer of knowledge from one protein molecule onto another by way of recognizing similarities. Such similarities are obtained at different levels, that of sequence, whole fold, or important substructures. Comparison of binding sites is important to understand functional similarities among the proteins and also to understand drug cross-reactivities. Current methods in literature have their own merits and demerits, warranting exploration of newer concepts and algorithms, especially for large-scale comparisons and for obtaining accurate residue-wise mappings. Here, we report the development of a new algorithm, PocketAlign, for obtaining structural superpositions of binding sites. The software is available as a web-service at http://proline.physics.iisc.ernet.in/pocketalign/. The algorithm encodes shape descriptors in the form of geometric perspectives, supplemented by chemical group classification. The shape descriptor considers several perspectives with each residue as the focus and captures relative distribution of residues around it in a given site. Residue-wise pairings are computed by comparing the set of perspectives of the first site with that of the second, followed by a greedy approach that incrementally combines residue pairings into a mapping. The mappings in different frames are then evaluated by different metrics encoding the extent of alignment of individual geometric perspectives. Different initial seed alignments are computed, each subsequently extended by detecting consequential atomic alignments in a three-dimensional grid, and the best 500 stored in a database. Alignments are then ranked, and the top scoring alignments reported, which are then streamed into Pymol for visualization and analyses. The method is validated for accuracy and sensitivity and benchmarked against existing methods. An advantage of PocketAlign, as compared to some of the existing tools available for binding site comparison in literature, is that it explores different schemes for identifying an alignment thus has a better potential to capture similarities in ligand recognition abilities. PocketAlign, by finding a detailed alignment of a pair of sites, provides insights as to why two sites are similar and which set of residues and atoms contribute to the similarity.

Collapse

Moll M, Bryant DH, Kavraki LE. The LabelHash Server and Tools for substructure-based functional annotation. Bioinformatics 2011;27:2161-2. [DOI: 10.1093/bioinformatics/btr343] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Kato T, Nagano N. Discriminative structural approaches for enzyme active-site prediction. BMC Bioinformatics 2011;12 Suppl 1:S49. [PMID: 21342581 PMCID: PMC3044306 DOI: 10.1186/1471-2105-12-s1-s49] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Searching the protein structure database for ligand-binding site similarities using CPASS v.2. BMC Res Notes 2011;4:17. [PMID: 21269480 PMCID: PMC3057182 DOI: 10.1186/1756-0500-4-17] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2010] [Accepted: 01/26/2011] [Indexed: 11/17/2022] Open

Abstract

Background

A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" protein. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function.

Findings

We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated.

Conclusions

CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~ 38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores for false positives by ~30%, while leaving true positives unaffected. Importantly, receiver operating characteristics (ROC) curves demonstrate the high correlation between CPASS similarity scores and an accurate functional assignment. As indicated by distribution curves, scores ≥ 30% infer a functional similarity. Software URL: http://cpass.unl.edu.

Collapse

Grant MA. INTEGRATING COMPUTATIONAL PROTEIN FUNCTION PREDICTION INTO DRUG DISCOVERY INITIATIVES. Drug Dev Res 2010;72:4-16. [PMID: 25530654 DOI: 10.1002/ddr.20397] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

Sharma A, Malakar P. Structure modeling and comparative genomics for epimerase enzyme (Gal10p). Bioinformation 2010;5:266-70. [PMID: 21364830 PMCID: PMC3055706 DOI: 10.6026/97320630005266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2010] [Accepted: 10/20/2010] [Indexed: 11/23/2022] Open

Prymula K, Jadczyk T, Roterman I. Catalytic residues in hydrolases: analysis of methods designed for ligand-binding site prediction. J Comput Aided Mol Des 2010;25:117-33. [PMID: 21104192 PMCID: PMC3032897 DOI: 10.1007/s10822-010-9402-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2010] [Accepted: 11/08/2010] [Indexed: 11/26/2022]

Moll M, Bryant DH, Kavraki LE. The LabelHash algorithm for substructure matching. BMC Bioinformatics 2010;11:555. [PMID: 21070651 PMCID: PMC2996407 DOI: 10.1186/1471-2105-11-555] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2010] [Accepted: 11/11/2010] [Indexed: 08/30/2023] Open

Abstract

Background

There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity.

Results

We present LabelHash, a novel algorithm for matching substructural motifs to large collections of protein structures. The algorithm consists of two phases. In the first phase the proteins are preprocessed in a fashion that allows for instant lookup of partial matches to any motif. In the second phase, partial matches for a given motif are expanded to complete matches. The general applicability of the algorithm is demonstrated with three different case studies. First, we show that we can accurately identify members of the enolase superfamily with a single motif. Next, we demonstrate how LabelHash can complement SOIPPA, an algorithm for motif identification and pairwise substructure alignment. Finally, a large collection of Catalytic Site Atlas motifs is used to benchmark the performance of the algorithm. LabelHash runs very efficiently in parallel; matching a motif against all proteins in the 95% sequence identity filtered non-redundant Protein Data Bank typically takes no more than a few minutes. The LabelHash algorithm is available through a web server and as a suite of standalone programs at http://labelhash.kavrakilab.org. The output of the LabelHash algorithm can be further analyzed with Chimera through a plugin that we developed for this purpose.

Conclusions

LabelHash is an efficient, versatile algorithm for large-scale substructure matching. When LabelHash is running in parallel, motifs can typically be matched against the entire PDB on the order of minutes. The algorithm is able to identify functional homologs beyond the twilight zone of sequence identity and even beyond fold similarity. The three case studies presented in this paper illustrate the versatility of the algorithm.

Collapse

Wohlkönig A, Huet J, Looze Y, Wintjens R. Structural relationships in the lysozyme superfamily: significant evidence for glycoside hydrolase signature motifs. PLoS One 2010;5:e15388. [PMID: 21085702 PMCID: PMC2976769 DOI: 10.1371/journal.pone.0015388] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2010] [Accepted: 08/31/2010] [Indexed: 11/19/2022] Open

Abstract

BACKGROUND

Chitin is a polysaccharide that forms the hard, outer shell of arthropods and the cell walls of fungi and some algae. Peptidoglycan is a polymer of sugars and amino acids constituting the cell walls of most bacteria. Enzymes that are able to hydrolyze these cell membrane polymers generally play important roles for protecting plants and animals against infection with insects and pathogens. A particular group of such glycoside hydrolase enzymes share some common features in their three-dimensional structure and in their molecular mechanism, forming the lysozyme superfamily.

RESULTS

Besides having a similar fold, all known catalytic domains of glycoside hydrolase proteins of lysozyme superfamily (families and subfamilies GH19, GH22, GH23, GH24 and GH46) share in common two structural elements: the central helix of the all-α domain, which invariably contains the catalytic glutamate residue acting as general-acid catalyst, and a β-hairpin pointed towards the substrate binding cleft. The invariant β-hairpin structure is interestingly found to display the highest amino acid conservation in aligned sequences of a given family, thereby allowing to define signature motifs for each GH family. Most of such signature motifs are found to have promising performances for searching sequence databases. Our structural analysis further indicates that the GH motifs participate in enzymatic catalysis essentially by containing the catalytic water positioning residue of inverting mechanism.

CONCLUSIONS

The seven families and subfamilies of the lysozyme superfamily all have in common a β-hairpin structure which displays a family-specific sequence motif. These GH β-hairpin motifs contain potentially important residues for the catalytic activity, thereby suggesting the participation of the GH motif to catalysis and also revealing a common catalytic scheme utilized by enzymes of the lysozyme superfamily.

Collapse

Kato T, Nagano N. Metric learning for enzyme active-site search. ACTA ACUST UNITED AC 2010;26:2698-704. [PMID: 20870642 PMCID: PMC2958746 DOI: 10.1093/bioinformatics/btq519] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Sharma A, Nigam A. Structure modeling of novel DNA glycosylase enzyme from oral pathogen Streptococcus sanguinis. Bioinformation 2010;5:136-40. [PMID: 21364794 PMCID: PMC3040489 DOI: 10.6026/97320630005136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2010] [Accepted: 08/26/2010] [Indexed: 11/23/2022] Open

Li GH, Huang JF. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation. BMC Bioinformatics 2010;11:439. [PMID: 20796320 PMCID: PMC2936402 DOI: 10.1186/1471-2105-11-439] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2009] [Accepted: 08/27/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in Protein Data Bank (PDB), thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need to be improved further, such as, enhancing the accuracy, sensitivity, and the computational speed. Here, an accurate algorithm, the CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown functions of proteins based on the local protein structural similarity. This algorithm has been evaluated by building a test set including 164 enzyme families, and also been compared to other methods.

RESULTS

The evaluation of CMASA shows that the CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in the large-scale functional annotation. Comparing to both sequence-based and global structure-based methods, not only the CMASA can find remote homologous proteins, but also can find the active site convergence. Comparing to other local structure comparison-based methods, the CMASA can obtain the better performance than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); and the CMASA is more sensitive than PINTS and is more accurate than JESS (both are local structure alignment methods). The CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites have been suggested, these sites can not be observed by the Catalytic Site Atlas (CSA).

CONCLUSIONS

The CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. The CMASA can be used in large-scale enzyme active site annotation. The CMASA can be available by the mail-based server (http://159.226.149.45/other1/CMASA/CMASA.htm).

Collapse

Gherardini PF, Ausiello G, Helmer-Citterich M. Superpose3D: a local structural comparison program that allows for user-defined structure representations. PLoS One 2010;5:e11988. [PMID: 20700534 PMCID: PMC2916828 DOI: 10.1371/journal.pone.0011988] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2010] [Accepted: 07/08/2010] [Indexed: 11/19/2022] Open

Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. Functional neighbors: inferring relationships between nonhomologous protein families using family-specific packing motifs. ACTA ACUST UNITED AC 2010;14:1137-43. [PMID: 20570776 DOI: 10.1109/titb.2010.2053550] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Konc J, Janezic D. ProBiS: a web server for detection of structurally similar protein binding sites. Nucleic Acids Res 2010;38:W436-40. [PMID: 20504855 PMCID: PMC2896105 DOI: 10.1093/nar/gkq479] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open