1
Riziotis IG, Ribeiro AJM, Borkakoti N, Thornton JM. The 3D Modules of Enzyme Catalysis: Deconstructing Active Sites into Distinct Functional Entities. J Mol Biol 2023; 435:168254. [PMID: 37652131 DOI: 10.1016/j.jmb.2023.168254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 08/20/2023] [Accepted: 08/22/2023] [Indexed: 09/02/2023]
Abstract
Enzyme catalysis is governed by a limited toolkit of residues and organic or inorganic co-factors. Therefore, it is expected that recurring residue arrangements will be found across the enzyme space, which perform a defined catalytic function, are structurally similar and occur in unrelated enzymes. Leveraging the integrated information in the Mechanism and Catalytic Site Atlas (M-CSA) (enzyme structure, sequence, catalytic residue annotations, catalysed reaction, detailed mechanism description), 3D templates were derived to represent compact groups of catalytic residues. A fuzzy template-template search allowed us to identify those recurring motifs, which are conserved or convergent, that we define as the "modules of enzyme catalysis". We show that a large fraction of these modules facilitate binding of metal ions, co-factors and substrates, and are frequently the result of convergent evolution. A smaller number of convergent modules perform a well-defined catalytic role, such as the variants of the catalytic triad (i.e. Ser-His-Asp/Cys-His-Asp) and the saccharide-cleaving Asp/Glu triad. It is also shown that enzymes whose functions have diverged during evolution preserve regions of their active site unaltered, as shown by modules performing similar or identical steps of the catalytic mechanism. We have compiled a comprehensive library of catalytic modules that characterise a broad spectrum of enzymes. These modules can be used as templates in enzyme design and for better understanding catalysis in 3D.
Affiliation(s)
- Ioannis G Riziotis
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- António J M Ribeiro
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Neera Borkakoti
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Janet M Thornton
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
2
Shen X, Zhang S, Long J, Chen C, Wang M, Cui Z, Chen B, Tan T. A Highly Sensitive Model Based on Graph Neural Networks for Enzyme Key Catalytic Residue Prediction. J Chem Inf Model 2023; 63:4277-4290. [PMID: 37399293 DOI: 10.1021/acs.jcim.3c00273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2023]
Abstract
Determining the catalytic site of enzymes is a great help for understanding the relationship between protein sequence, structure, and function, which provides the basis and targets for designing, modifying, and enhancing enzyme activity. The unique local spatial configuration bound to the substrate at the active center of the enzyme determines the catalytic ability of enzymes and plays an important role in the catalytic site prediction. As a suitable tool, the graph neural network can better understand and identify the residue sites with unique local spatial configurations due to its remarkable ability to characterize the three-dimensional structural features of proteins. Consequently, a novel model for predicting enzyme catalytic sites has been developed, which incorporates a uniquely designed adaptive edge-gated graph attention neural network (AEGAN). This model is capable of effectively handling sequential and structural characteristics of proteins at various levels, and the extracted features enable an accurate description of the local spatial configuration of the enzyme active site by sampling the local space around candidate residues and special design of amino acid physical and chemical properties. To evaluate its performance, the model was compared with existing catalytic site prediction models using different benchmark datasets and achieved the best results on each benchmark dataset. The model exhibited a sensitivity of 0.9659, accuracy of 0.9226, and area under the precision-recall curve (AUPRC) of 0.9241 on the independent test set constructed for evaluation. Furthermore, the F1-score of this model is nearly four times higher than that of the best-performing similar model in previous studies. This research can serve as a valuable tool to help researchers understand protein sequence-structure-function relationships while facilitating the characterization of novel enzymes of unknown function.
Affiliation(s)
- Xiaowei Shen
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Shiding Zhang
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Jianyu Long
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Changjing Chen
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Meng Wang
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Ziheng Cui
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Biqiang Chen
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Tianwei Tan
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
3
Eguida M, Rognan D. Estimating the Similarity between Protein Pockets. Int J Mol Sci 2022; 23:12462. [PMID: 36293316 PMCID: PMC9604425 DOI: 10.3390/ijms232012462] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Revised: 10/15/2022] [Accepted: 10/16/2022] [Indexed: 10/28/2023]
Abstract
With the exponential increase in publicly available protein structures, the comparison of protein binding sites naturally emerged as a scientific topic to explain observations or generate hypotheses for ligand design, notably to predict ligand selectivity for on- and off-targets, explain polypharmacology, and design target-focused libraries. The current review summarizes the state-of-the-art computational methods applied to pocket detection and comparison as well as structural druggability estimates. The major strengths and weaknesses of current pocket descriptors, alignment methods, and similarity search algorithms are presented. Lastly, an exhaustive survey of both retrospective and prospective applications in diverse medicinal chemistry scenarios illustrates the capability of the existing methods and the hurdle that still needs to be overcome for more accurate predictions.
Affiliation(s)
- Didier Rognan
  - Laboratoire d’Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, 67400 Illkirch, France
4
Riziotis IG, Thornton JM. Capturing the geometry, function, and evolution of enzymes with 3D templates. Protein Sci 2022; 31:e4363. [PMID: 35762726 PMCID: PMC9207746 DOI: 10.1002/pro.4363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 05/06/2022] [Accepted: 05/14/2022] [Indexed: 11/05/2022]
Abstract
Structural templates are 3D signatures representing protein functional sites, such as ligand binding cavities, metal coordination motifs, or catalytic sites. Here we explore methods to generate template libraries and algorithms to query structures for conserved 3D motifs. Applications of templates are discussed, as well as some exemplar cases for examining evolutionary links in enzymes. We also introduce the concept of using more than one template per structure to represent flexible sites, as an approach to better understand catalysis through snapshots captured in enzyme structures. Functional annotation from structure is an important topic that has recently resurfaced due to the new more accurate methods of protein structure prediction. Therefore, we anticipate that template-based functional site detection will be a powerful tool in the task of characterizing a vast number of new protein models.
5
Lu Q. Identifying molecular structural features by pattern recognition methods. RSC Adv 2022; 12:17559-17569. [PMID: 35765452 PMCID: PMC9192268 DOI: 10.1039/d2ra00764a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2022] [Accepted: 06/06/2022] [Indexed: 11/21/2022]
Abstract
Identification of molecular structural features is a central part of computational chemistry. It would be beneficial if pattern recognition techniques could be incorporated to facilitate the identification. Currently, the quantification of the structural dissimilarity is mainly carried out by root-mean-square-deviation (RMSD) calculations such as in molecular dynamics simulations. However, the RMSD calculation underperforms for large molecules, showing the so-called "curse of dimensionality" problem. Also, it requires consistent ordering of atoms in two comparing structures, which needs nontrivial effort to fulfill. In this work, we propose to take advantage of the point cloud recognition using convex hulls as the basis to recognize molecular structural features. Two advantages of the method can be highlighted. First, the dimension of the input data structure is largely reduced from the number of atoms of molecules to the number of atoms of convex hulls. Therefore, the dimensionality curse problem is avoided, and the atom ordering process is saved. Second, the construction of convex hulls can be used to define new molecular descriptors, such as the contact area of molecular interactions. These new molecular descriptors have different properties from existing ones, therefore they are expected to exhibit different behaviors for certain machine learning studies. Several illustrative applications have been carried out, which provide promising results for structure-activity studies.
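The atom-ordering requirement mentioned in this abstract can be illustrated with a minimal sketch. It assumes both structures are already expressed in the same coordinate frame, so no superposition step is performed; the function name `rmsd` and the toy coordinates are illustrative and are not taken from the paper.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally long lists of (x, y, z) tuples.

    Atom i of coords_a is compared with atom i of coords_b, so the result
    depends on both lists using the same atom ordering.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return sqrt(squared / len(coords_a))

# Toy three-atom "molecule" (hypothetical coordinates).
mol = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Same geometry, but the first two atoms are listed in swapped order.
shuffled = [mol[1], mol[0], mol[2]]

print(rmsd(mol, mol))                 # 0.0
print(round(rmsd(mol, shuffled), 4))  # 0.8165
```

The second value is non-zero even though the two inputs describe the same set of points, which is the sensitivity to atom ordering that order-independent descriptors aim to avoid.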
Affiliation(s)
- Qing Lu
  - Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences Beijing 100190 China
6
Riziotis IG, Ribeiro AJ, Borkakoti N, Thornton JM. Conformational variation in enzyme catalysis: A structural study on catalytic residues. J Mol Biol 2022; 434:167517. [PMID: 35240125 PMCID: PMC9005782 DOI: 10.1016/j.jmb.2022.167517] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/12/2021] [Revised: 02/21/2022] [Accepted: 02/23/2022] [Indexed: 11/26/2022]
Abstract
We introduce a pipeline to compare and contrast active sites from homologous enzymes in 3D. Comprehensive structural study covering enzymes from a large functional space. High heterogeneity in magnitude of active site flexibility between enzyme families. Different catalytic residue types and functions relate to different degrees of flexibility. Four paradigms classify enzymes according to the structural behaviour during catalysis.
Conformational variation in catalytic residues can be captured as alternative snapshots in enzyme crystal structures. Addressing the question of whether active site flexibility is an intrinsic and essential property of enzymes for catalysis, we present a comprehensive study on the 3D variation of active sites of 925 enzyme families, using explicit catalytic residue annotations from the Mechanism and Catalytic Site Atlas and structural data from the Protein Data Bank. Through weighted pairwise superposition of the functional atoms of active sites, we captured structural variability at single-residue level and examined the geometrical changes as ligands bind or as mutations occur. We demonstrate that catalytic centres of enzymes can be inherently rigid or flexible to various degrees according to the function they perform, and structural variability most often involves a subset of the catalytic residues, usually those not directly involved in the formation or cleavage of bonds. Moreover, data suggest that 2/3 of active sites are flexible, and in half of those, flexibility is only observed in the side chain. The goal of this work is to characterise our current knowledge of the extent of flexibility at the heart of catalysis and ultimately place our findings in the context of the evolution of catalysis as enzymes evolve new functions and bind different substrates.
7
Lu Q. Molecular structure recognition by blob detection. RSC Adv 2021; 11:35879-35886. [PMID: 35492772 PMCID: PMC9043223 DOI: 10.1039/d1ra05752a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/28/2021] [Accepted: 10/31/2021] [Indexed: 11/23/2022]
Abstract
Molecular structure recognition is fundamental in computational chemistry. The most common approach is to calculate the root mean square deviation (RMSD) between two sets of molecular coordinates. However, this method does not perform well for large molecules. In this work, a new method is proposed for structure comparison. Blob detection is used for recognizing structural features. Fragmentation of molecules is proposed as the pre-treatment. Mapping between blobs and atoms is developed as the post-treatment. A set of key parameters important for blob detection is determined. The dissimilarity is quantified by calculating the Euclidean metric of the blob vectors. The overall algorithm is found to be accurate in distinguishing structural dissimilarity. The method has potential to be combined with other pattern recognition techniques for new chemistry discoveries.
Affiliation(s)
- Qing Lu
  - Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences Beijing 100190 China
8
Das S, Scholes HM, Sen N, Orengo C. CATH functional families predict functional sites in proteins. Bioinformatics 2021; 37:1099-1106. [PMID: 33135053 PMCID: PMC8150129 DOI: 10.1093/bioinformatics/btaa937] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/05/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION: https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
  - PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK
- Harry M Scholes
  - Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Neeladri Sen
  - Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
9
A mathematical representation of protein binding sites using structural dispersion of atoms from principal axes for classification of binding ligands. PLoS One 2021; 16:e0244905. [PMID: 33831020 PMCID: PMC8031081 DOI: 10.1371/journal.pone.0244905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2020] [Accepted: 03/09/2021] [Indexed: 11/23/2022]
Abstract
Many researchers have studied the relationship between the biological functions of proteins and the structures of both their overall backbones of amino acids and their binding sites. A large amount of the work has focused on summarizing structural features of binding sites as scalar quantities, which can result in a great deal of information loss since the structures are three-dimensional. Additionally, a common way of comparing binding sites is via aligning their atoms, which is a computationally intensive procedure that substantially limits the types of analysis and modeling that can be done. In this work, we develop a novel encoding of binding sites as covariance matrices of the distances of atoms to the principal axes of the structures. This representation is invariant to the chosen coordinate system for the atoms in the binding sites, which removes the need to align the sites to a common coordinate system, is computationally efficient, and permits the development of probability models. These can then be used to both better understand groups of binding sites that bind to the same ligand and perform classification for these ligand groups. We demonstrate the utility of our method for discrimination of binding ligand through classification studies with two benchmark datasets using nearest mean and polytomous logistic regression classifiers.
10
Ireland SM, Martin ACR. Zincbindpredict-Prediction of Zinc Binding Sites in Proteins. Molecules 2021; 26:molecules26040966. [PMID: 33673040 PMCID: PMC7918553 DOI: 10.3390/molecules26040966] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/27/2020] [Revised: 01/26/2021] [Accepted: 02/09/2021] [Indexed: 11/21/2022]
Abstract
Background: Zinc binding proteins make up a significant proportion of the proteomes of most organisms and, within those proteins, zinc performs rôles in catalysis and structure stabilisation. Identifying the ability to bind zinc in a novel protein can offer insights into its functions and the mechanism by which it carries out those functions. Computational means of doing so are faster than spectroscopic means, allowing for searching at much greater speeds and scales, and thereby guiding complementary experimental approaches. Typically, computational models of zinc binding predict zinc binding for individual residues rather than as a single binding site, and typically do not distinguish between different classes of binding site, missing crucial properties indicative of zinc binding. Methods: Previously, we created ZincBindDB, a continuously updated database of known zinc binding sites, categorised by family (the set of liganding residues). Here, we use this dataset to create ZincBindPredict, a set of machine learning methods to predict the most common zinc binding site families for both structure and sequence. Results: The models all achieve an MCC ≥ 0.88, recall ≥ 0.93 and precision ≥ 0.91 for the structural models (mean MCC = 0.97), while the sequence models have MCC ≥ 0.64, recall ≥ 0.80 and precision ≥ 0.83 (mean MCC = 0.87), with the models for binding sites containing four liganding residues performing much better than this. Conclusions: The predictors outperform competing zinc binding site predictors and are available online via a web interface and a GraphQL API.
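For readers less familiar with the metrics quoted in this abstract, the following sketch shows how precision, recall and the Matthews correlation coefficient (MCC) follow from a binary confusion matrix. The function name `binary_metrics` and the counts in the example are hypothetical and were chosen only for illustration; they are not results from the paper.

```python
from math import sqrt

def binary_metrics(tp, fp, tn, fn):
    """Return precision, recall and MCC from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}

# Hypothetical counts: 100 true sites, 900 non-sites.
example = binary_metrics(tp=90, fp=10, tn=890, fn=10)
print({name: round(value, 3) for name, value in example.items()})
# {'precision': 0.9, 'recall': 0.9, 'mcc': 0.889}
```

Unlike precision and recall, MCC also uses the true-negative count, which is one reason it is often reported for imbalanced problems such as binding-site prediction.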
11
Bittrich S, Burley SK, Rose AS. Real-time structural motif searching in proteins using an inverted index strategy. PLoS Comput Biol 2020; 16:e1008502. [PMID: 33284792 PMCID: PMC7746303 DOI: 10.1371/journal.pcbi.1008502] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/30/2020] [Revised: 12/17/2020] [Accepted: 11/09/2020] [Indexed: 12/30/2022]
Abstract
Biochemical and biological functions of proteins are the product of both the overall fold of the polypeptide chain, and, typically, structural motifs made up of smaller numbers of amino acids constituting a catalytic center or a binding site that may be remote from one another in amino acid sequence. Detection of such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, this remains an extremely challenging problem because of the size of the Protein Data Bank (PDB) archive. Existing methods depend on a clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing >170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures containing the query motif and ignoring most of the structures that are irrelevant. Our approach (implemented at motif.rcsb.org) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed for analyzing 3D structures of both proteins and nucleic acids. The Protein Data Bank (PDB) provides open access to more than 170,000 three-dimensional structures of proteins, nucleic acids, and biological complexes. Similarities between PDB structures give valuable functional and evolutionary insights but such resemblance may not be evident at sequence or global structure level. Throughout the database, there are recurring structural motifs—groups of modest numbers of residues in proximity that, for example, support catalytic activity. 
Identification of common structural motifs can reveal similarities between proteins and serve as fingerprints for spatial configurations of amino acids, such as the His-Asp-Ser catalytic triad found in serine proteases or the zinc coordination site found in Zinc Finger DNA-binding domains. We present a highly efficient yet flexible strategy that allows users for the first time to search for arbitrary structural motifs across the entire PDB archive in real-time. Our approach scales favorably with the increasing number and complexity of deposited structures, and, also, has the potential to be adapted for other applications in a macromolecular context.
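The inverted index idea described above can be sketched in a few lines. This is a deliberately simplified illustration and not the implementation behind motif.rcsb.org: it assumes each residue is reduced to a residue type and one representative coordinate, and that a residue pair is described by its two types plus a binned distance. The names `pair_keys`, `build_index` and `search`, the bin width and the cutoff are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

def pair_keys(residues, bin_width=1.0, cutoff=12.0):
    """Yield one descriptor (type_a, type_b, distance bin) per residue pair."""
    for (type_a, xyz_a), (type_b, xyz_b) in combinations(residues, 2):
        d = dist(xyz_a, xyz_b)
        if d <= cutoff:
            a, b = sorted((type_a, type_b))
            yield (a, b, int(d // bin_width))

def build_index(structures):
    """Map every pair descriptor to the set of structure ids containing it."""
    index = defaultdict(set)
    for structure_id, residues in structures.items():
        for key in pair_keys(residues):
            index[key].add(structure_id)
    return index

def search(index, motif):
    """Return ids of structures that contain every pair descriptor of the motif."""
    keys = set(pair_keys(motif))
    if not keys:
        return set()
    return set.intersection(*(index.get(key, set()) for key in keys))

# Toy data with made-up coordinates (in angstroms).
structures = {
    "A": [("HIS", (0.0, 0.0, 0.0)), ("ASP", (3.5, 0.0, 0.0)),
          ("SER", (0.0, 6.5, 0.0)), ("GLY", (30.0, 30.0, 30.0))],
    "B": [("HIS", (0.0, 0.0, 0.0)), ("ASP", (3.5, 0.0, 0.0)),
          ("GLY", (0.0, 6.5, 0.0))],
}
index = build_index(structures)

# The query motif is a translated copy of the His-Asp-Ser arrangement in "A".
motif = [("HIS", (10.0, 10.0, 10.0)), ("ASP", (13.5, 10.0, 10.0)),
         ("SER", (10.0, 16.5, 10.0))]
print(search(index, motif))  # {'A'}
```

The lookup only touches the posting sets of the query descriptors, so structures that lack any of them are never inspected. In this sketch a hit is only a candidate: the matching pairs are not guaranteed to involve the same residues, and exact bin matching is brittle near bin boundaries, so a real system would need a tolerance and a geometric verification step.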
Affiliation(s)
- Sebastian Bittrich
  - RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
- Stephen K. Burley
  - RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
  - RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
  - Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California, USA
- Alexander S. Rose
  - RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
12
Torng W, Altman RB. High precision protein functional site detection using 3D convolutional neural networks. Bioinformatics 2020; 35:1503-1512. [PMID: 31051039 PMCID: PMC6499237 DOI: 10.1093/bioinformatics/bty813] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 01/31/2018] [Revised: 08/14/2018] [Accepted: 09/19/2018] [Indexed: 12/02/2022]
Abstract
Motivation: Accurate annotation of protein functions is fundamental for understanding molecular and cellular physiology. Data-driven methods hold promise for systematically deriving rules underlying the relationship between protein structure and function. However, the choice of protein structural representation is critical. Pre-defined biochemical features emphasize certain aspects of protein properties while ignoring others, and therefore may fail to capture critical information in complex protein sites. Results: In this paper, we present a general framework that applies 3D convolutional neural networks (3DCNNs) to structure-based protein functional site detection. The framework can extract task-dependent features automatically from the raw atom distributions. We benchmarked our method against other methods and demonstrate better or comparable performance for site detection. Our deep 3DCNNs achieved an average recall of 0.955 at a precision threshold of 0.99 on PROSITE families, detected 98.89 and 92.88% of nitric oxide synthase and TRYPSIN-like enzyme sites in Catalytic Site Atlas, and showed good performance on challenging cases where sequence motifs are absent but a function is known to exist. Finally, we inspected the individual contributions of each atom to the classification decisions and show that our models successfully recapitulate known 3D features within protein functional sites. Availability and implementation: The 3DCNN models described in this paper are available at https://simtk.org/projects/fscnn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wen Torng
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Russ B Altman
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
13
Rey J, Rasolohery I, Tufféry P, Guyon F, Moroy G. PatchSearch: a web server for off-target protein identification. Nucleic Acids Res 2020; 47:W365-W372. [PMID: 31131411 PMCID: PMC6602448 DOI: 10.1093/nar/gkz478] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/22/2019] [Revised: 04/26/2019] [Accepted: 05/21/2019] [Indexed: 01/17/2023]
Abstract
The large number of proteins found in the human body implies that a drug may interact with many proteins, called off-target proteins, besides its intended target. The PatchSearch web server provides an automated workflow that allows users to identify structurally conserved binding sites at the protein surfaces in a set of user-supplied protein structures. Thus, this web server may help to detect potential off-target proteins. It takes as input a protein complexed with a ligand and identifies, within user-defined or predefined collections of protein structures, those having a binding site compatible with this ligand in terms of geometry and physicochemical properties. It is based on a non-sequential local alignment of the patch over the entire protein surface. Then the PatchSearch web server proposes a ligand binding mode for the potential off-target, as well as an estimated affinity calculated by the Vinardo scoring function. This novel tool is able to efficiently detect potential interactions of ligands with distant off-target proteins. Furthermore, by facilitating the discovery of unexpected off-targets, PatchSearch could contribute to the repurposing of existing drugs. The server is freely available at http://bioserv.rpbs.univ-paris-diderot.fr/services/PatchSearch.
Affiliation(s)
- Julien Rey
  - Université Paris Diderot, Sorbonne Paris Cité, INSERM UMRS-973, Molécules Thérapeutiques in silico (MTi), F-75205 Paris, France
  - Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Inès Rasolohery
  - Université Paris Diderot, Sorbonne Paris Cité, INSERM UMRS-973, Molécules Thérapeutiques in silico (MTi), F-75205 Paris, France
- Pierre Tufféry
  - Université Paris Diderot, Sorbonne Paris Cité, INSERM UMRS-973, Molécules Thérapeutiques in silico (MTi), F-75205 Paris, France
  - Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Frédéric Guyon
  - Université Paris Diderot, Sorbonne Paris Cité, INSERM UMRS-973, Molécules Thérapeutiques in silico (MTi), F-75205 Paris, France
- Gautier Moroy
  - Université Paris Diderot, Sorbonne Paris Cité, INSERM UMRS-973, Molécules Thérapeutiques in silico (MTi), F-75205 Paris, France
14
Ribeiro AJM, Holliday GL, Furnham N, Tyzack JD, Ferris K, Thornton JM. Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites. Nucleic Acids Res 2019; 46:D618-D623. [PMID: 29106569 PMCID: PMC5753290 DOI: 10.1093/nar/gkx1012] [Citation(s) in RCA: 120] [Impact Index Per Article: 24.0] [Received: 09/15/2017] [Accepted: 10/13/2017] [Indexed: 12/28/2022]
Abstract
M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases, MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. At the moment, M-CSA contains 961 entries, 423 of these with detailed mechanism information, and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third level EC numbers with a PDB structure, and 30% (840/2793) of fourth level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site.
Affiliation(s)
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Gemma L Holliday
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 1HT, UK
| | - Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Katherine Ferris
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
15
|
Kaiser F, Labudde D. Unsupervised Discovery of Geometrically Common Structural Motifs and Long-Range Contacts in Protein 3D Structures. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:671-680. [PMID: 29990265 DOI: 10.1109/tcbb.2017.2786250] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
The essential role of small evolutionarily conserved structural units in proteins has been extensively researched and validated. A popular example is the serine proteases, where the peptide cleavage reaction is realized by a configuration of only three residues. Brought into spatial proximity during the protein folding process, such structural motifs are often long-range contacts and usually hard to detect at the sequence level. Due to the constantly increasing resource of protein 3D structure data, the computational identification of structural motifs can contribute significantly to the understanding of protein fold and function. Thus, we propose a method to discover structural motifs of high geometrical similarity and desired sequence separation in protein 3D structure data. By utilizing methods originating from data mining, no a priori knowledge is required. The applicability of the method is demonstrated by the identification of the catalytic unit of serine proteases and the ion-coordination center of cupredoxins. Furthermore, large-scale analysis of the entire Protein Data Bank points towards the presence of ubiquitous structural motifs, independent of any specific fold or function. We envision that our method is suitable to uncover functional mechanisms and to derive fingerprint libraries of structural motifs, which could be used to assess protein family association.
|
16
|
Jiang T, Renfrew PD, Drew K, Youngs N, Butterfoss GL, Bonneau R, Shasha DN. An adaptive geometric search algorithm for macromolecular scaffold selection. Protein Eng Des Sel 2018; 31:345-354. [PMID: 30407584 PMCID: PMC6373690 DOI: 10.1093/protein/gzy028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2018] [Accepted: 10/05/2018] [Indexed: 11/14/2022]
Abstract
A wide variety of protein and peptidomimetic design tasks require matching functional 3D motifs to potential oligomeric scaffolds. For example, during enzyme design, one aims to graft active-site patterns (typically consisting of 3-15 residues) onto new protein surfaces. Identifying protein scaffolds suitable for such active-site engraftment requires costly searches for protein folds that provide the correct side chain positioning to host the desired active site. Other examples of biodesign tasks that require similar fast exact geometric searches of potential side chain positioning include mimicking binding hotspots, design of metal binding clusters and the design of modular hydrogen binding networks for specificity. In these applications, the speed and scaling of geometric searches limit the scope of downstream design to small patterns. Here, we present an adaptive algorithm capable of searching for side chain take-off angles, which is compatible with an arbitrarily specified functional pattern and which enjoys substantive performance improvements over previous methods. We demonstrate this method in both genetically encoded (protein) and synthetic (peptidomimetic) design scenarios. Examples of using this method with the Rosetta framework for protein design are provided. Our implementation is compatible with multiple protein design frameworks and is freely available as a set of python scripts (https://github.com/JiangTian/adaptive-geometric-search-for-protein-design).
Affiliation(s)
- Tian Jiang
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
| | - P Douglas Renfrew
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
| | - Kevin Drew
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA
| | - Noah Youngs
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Glenn L Butterfoss
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE
| | - Richard Bonneau
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
| | - Dennis Shasha
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
| |
|
17
|
Muthu Krishnan S. Using Chou's general PseAAC to analyze the evolutionary relationship of receptor associated proteins (RAP) with various folding patterns of protein domains. J Theor Biol 2018; 445:62-74. [DOI: 10.1016/j.jtbi.2018.02.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 11/29/2017] [Revised: 01/24/2018] [Accepted: 02/12/2018] [Indexed: 01/31/2023]
|
18
|
Anders G, Hassiepen U, Theisgen S, Heymann S, Muller L, Panigada T, Huster D, Samsonov SA. The Intrinsic Pepsin Resistance of Interleukin-8 Can Be Explained from a Combined Bioinformatical and Experimental Approach. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:300-308. [PMID: 28113517 DOI: 10.1109/tcbb.2016.2614821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Interleukin-8 (IL-8, CXCL8) is a neutrophil chemotactic factor belonging to the family of chemokines. IL-8 has been shown to resist pepsin cleavage, indicating high resistance to this protease. However, the molecular mechanisms underlying this resistance are not fully understood. Using our in-house database containing the data on three-dimensional arrangements of secondary structure elements from the whole Protein Data Bank, we found a striking structural similarity between IL-8 and pepsin inhibitor-3. Such similarity could play a key role in understanding IL-8 resistance to the protease pepsin. To support this hypothesis, we applied pepsin assays confirming that intact IL-8 is not degraded by pepsin, in contrast to IL-8 in a denatured state. Applying 1H-15N Heteronuclear Single Quantum Coherence NMR measurements, we determined the putative regions of IL-8 that are potentially responsible for interactions with pepsin. The results obtained in this work contribute to the understanding of the resistance of IL-8 to pepsin proteolysis in terms of its structural properties.
|
19
|
Sánchez DA, Tonetto GM, Ferreira ML. Burkholderia cepacia lipase: A versatile catalyst in synthesis reactions. Biotechnol Bioeng 2017; 115:6-24. [DOI: 10.1002/bit.26458] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 02/17/2017] [Revised: 07/14/2017] [Accepted: 09/21/2017] [Indexed: 12/19/2022]
Affiliation(s)
- Daniel A. Sánchez
- Planta Piloto de Ingeniería Química (PLAPIQUI), Universidad Nacional del Sur; CONICET; Bahía Blanca Argentina
| | - Gabriela M. Tonetto
- Planta Piloto de Ingeniería Química (PLAPIQUI), Universidad Nacional del Sur; CONICET; Bahía Blanca Argentina
| | - María L. Ferreira
- Planta Piloto de Ingeniería Química (PLAPIQUI), Universidad Nacional del Sur; CONICET; Bahía Blanca Argentina
| |
|
20
|
Rasolohery I, Moroy G, Guyon F. PatchSearch: A Fast Computational Method for Off-Target Detection. J Chem Inf Model 2017; 57:769-777. [PMID: 28282119 DOI: 10.1021/acs.jcim.6b00529] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Many therapeutic molecules are known to bind several proteins, which can be different from the initially targeted one. Such unexpected interactions with proteins called off-targets can lead to adverse effects. Identifying potential off-targets is important for avoiding drug side effects and for discovering new targets for existing drugs. We propose a new program named PatchSearch that implements local nonsequential searching for similar binding sites on protein surfaces with a controlled amount of flexibility. It is based on detection of quasi-cliques in product graphs representing all the possible matchings between two compared structures. This method has been benchmarked on a large diversity of ligands and on five data sets ranging from 12 to more than 7000 protein structures. The experiments conducted in this study show that the PatchSearch method could be useful in the early identification of off-targets. The program and the benchmarks presented in this paper are available as an R package at https://github.com/MTiPatchSearch.
Affiliation(s)
- Inès Rasolohery
- Molécules Thérapeutiques in Silico, UMRS 973, Université Paris Diderot, INSERM, F-75013 Paris, France
| | - Gautier Moroy
- Molécules Thérapeutiques in Silico, UMRS 973, Université Paris Diderot, INSERM, F-75013 Paris, France
| | - Frédéric Guyon
- Molécules Thérapeutiques in Silico, UMRS 973, Université Paris Diderot, INSERM, F-75013 Paris, France
| |
|
21
|
Ahmed HR, Glasgow JI. The Agile particle swarm optimizer applied to proteomic pattern matching and discovery. Soft Comput 2016. [DOI: 10.1007/s00500-015-1769-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
22
|
CRHunter: integrating multifaceted information to predict catalytic residues in enzymes. Sci Rep 2016; 6:34044. [PMID: 27665935 PMCID: PMC5036049 DOI: 10.1038/srep34044] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/06/2016] [Accepted: 09/07/2016] [Indexed: 11/08/2022]
Abstract
A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies and further considered whether their combination could improve the prediction performance. Herein, we developed an integrative algorithm named CRHunter by simultaneously using the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we invented two support vector machine feature predictors based on both structural and sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homological relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves optimal performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.
|
23
|
Ruiz-Gómez G, Hawkins JC, Philipp J, Künze G, Wodtke R, Löser R, Fahmy K, Pisabarro MT. Rational Structure-Based Rescaffolding Approach to De Novo Design of Interleukin 10 (IL-10) Receptor-1 Mimetics. PLoS One 2016; 11:e0154046. [PMID: 27123592 PMCID: PMC4849758 DOI: 10.1371/journal.pone.0154046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/08/2015] [Accepted: 04/07/2016] [Indexed: 12/25/2022]
Abstract
Tackling protein interfaces with small molecules capable of modulating protein-protein interactions remains a challenge in structure-based ligand design. Particularly arduous are cases in which the epitopes involved in molecular recognition have a non-structured and discontinuous nature. Here, the basic strategy of translating continuous binding epitopes into mimetic scaffolds cannot be applied, and other innovative approaches are therefore required. We present a structure-based rational approach involving the use of a regular expression syntax inspired by the well-established PROSITE to define minimal descriptors of geometric and functional constraints signifying relevant functionalities for recognition in protein interfaces of non-continuous and unstructured nature. These descriptors feed a search engine that explores the currently available three-dimensional chemical space of the Protein Data Bank (PDB) in order to identify in a straightforward manner regular architectures containing the desired functionalities, which could be used as templates to guide the rational design of small natural-like scaffolds mimicking the targeted recognition site. The application of this rescaffolding strategy to the discovery of natural scaffolds incorporating a selection of functionalities of interleukin-10 receptor-1 (IL-10R1), which are relevant for its interaction with interleukin-10 (IL-10), has resulted in the de novo design of a new class of potent IL-10 peptidomimetic ligands.
Affiliation(s)
- Gloria Ruiz-Gómez
- Structural Bioinformatics, BIOTEC TU Dresden, Tatzberg, Dresden, Germany
| | - John C. Hawkins
- Structural Bioinformatics, BIOTEC TU Dresden, Tatzberg, Dresden, Germany
| | - Jenny Philipp
- Helmholtz-Zentrum Dresden Rossendorf, Institute of Resource Ecology, Dresden, Germany
| | - Georg Künze
- Institute of Medical Physics and Biophysics, University of Leipzig, Leipzig, Germany
| | - Robert Wodtke
- Helmholtz-Zentrum Dresden Rossendorf, Institute of Radiopharmaceutical Cancer Research, Dresden, Germany
| | - Reik Löser
- Helmholtz-Zentrum Dresden Rossendorf, Institute of Radiopharmaceutical Cancer Research, Dresden, Germany
| | - Karim Fahmy
- Helmholtz-Zentrum Dresden Rossendorf, Institute of Resource Ecology, Dresden, Germany
| | - M. Teresa Pisabarro
- Structural Bioinformatics, BIOTEC TU Dresden, Tatzberg, Dresden, Germany
| |
|
24
|
Sáenz-Suárez H, Poutou-Piñales RA, González-Santos J, Barreto GE, Rieto-Navarrera LP, Sáenz-Moreno JA, Landázuri P, Barrera-Avellaneda LA. Prediction of glycation sites: new insights from protein structural analysis. Turk J Biol 2016. [DOI: 10.3906/biy-1501-71] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/03/2022]
|
25
|
Ramirez-Manzanares A, Peña J, Azpiroz JM, Merino G. A hierarchical algorithm for molecular similarity (H-FORMS). J Comput Chem 2015; 36:1456-66. [DOI: 10.1002/jcc.23947] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 02/16/2015] [Revised: 04/24/2015] [Accepted: 04/27/2015] [Indexed: 11/10/2022]
Affiliation(s)
- Alonso Ramirez-Manzanares
- Computer Science Department; Centro de Investigación en Matemáticas, A.C. Callejón Jalisco S/N; Guanajuato Gto Mexico
| | - Joaquin Peña
- Computer Science Department; Centro de Investigación en Matemáticas, A.C. Callejón Jalisco S/N; Guanajuato Gto Mexico
| | - Jon M. Azpiroz
- Kimika Fakultatea, Euskal Herriko Unibertsitatea (UPV/EHU); Donostia International Physics Center (DIPC); P. K. 1072 20080 Donostia Euskadi Spain
- Computational Laboratory for Hybrid/Organic Photovoltaics (CLHYO); CNR-ISTM; Via Elce di Sotto 8 06123 Perugia Italy
| | - Gabriel Merino
- Departamento de Física Aplicada; Centro de Investigación y de Estudios Avanzados Unidad Mérida. Km 6 Antigua carretera a Progreso. Apdo. Postal 73; Cordemex 97310 Mérida Yuc México
| |
|
26
|
Abu Seman N, He B, Ojala JRM, Wan Mohamud WN, Östenson CG, Brismar K, Gu HF. Genetic and biological effects of sodium-chloride cotransporter (SLC12A3) in diabetic nephropathy. Am J Nephrol 2014; 40:408-16. [PMID: 25401745 DOI: 10.1159/000368916] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/19/2014] [Accepted: 10/06/2014] [Indexed: 11/19/2022]
Abstract
BACKGROUND/AIMS: Solute carrier family 12 member 3 (SLC12A3) encodes a sodium/chloride transporter in kidneys. Previous reports suggest that the Arg913Gln polymorphism in this gene is associated with diabetic nephropathy (DN), but the data appear to be inconsistent. Up to now, there is no biological evidence concerning the effects of SLC12A3 in DN. In this study, we aim to evaluate the genetic effects of the SLC12A3 gene and its Arg913Gln polymorphism with genetic and functional analyses. METHODS: We genotyped SLC12A3 genetic polymorphisms including Arg913Gln in 784 non-diabetes controls and 633 type 2 diabetes (T2D) subjects with or without DN in a Malaysian population and performed a meta-analysis of the present and previous studies. We further analyzed the role of slc12a3 in kidney development and progress of DN in zebrafish and db/db mice. RESULTS: We found that the SLC12A3 Arg913Gln polymorphism was associated with T2D (p = 0.028, OR = 0.772, 95% CI = 0.612-0.973) and DN (p = 0.038, OR = 0.547, 95% CI = 0.308-0.973) in the Malaysian cohort. The meta-analysis confirmed the protective effects of the SLC12A3 913Gln allele in DN (Z-value = -1.992, p = 0.046, OR = 0.792). Furthermore, knockdown of the zebrafish ortholog slc12a3 at the 1-cell stage led to structural abnormality of the kidney pronephric distal duct. Slc12a3 mRNA and protein expression levels were upregulated in kidneys of db/db mice at 6, 12, and 26 weeks of age. CONCLUSION: The present study provided the first biological and further genetic evidence that SLC12A3 contributes to genetic susceptibility in the development of DN, while the minor 913Gln allele in this gene confers a protective effect in the disease.
Affiliation(s)
- Norhashimah Abu Seman
- Rolf Luft Research Center for Diabetes and Endocrinology, Department of Molecular Medicine and Surgery, Karolinska University Hospital, Karolinska Institutet, Stockholm, Sweden
| |
|
27
|
Izidoro SC, de Melo-Minardi RC, Pappa GL. GASS: identifying enzyme active sites with genetic algorithms. Bioinformatics 2014; 31:864-70. [PMID: 25388152 DOI: 10.1093/bioinformatics/btu746] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/31/2023]
Abstract
MOTIVATION: Currently, 25% of proteins annotated in Pfam are of unknown function. One way of predicting protein function is by looking at the active site, which has two main parts: the catalytic site and the substrate binding site. The active site is more conserved than the other residues of the protein and can be a rich source of information for protein function prediction. This article presents a new heuristic method, named genetic active site search (GASS), which searches for given active site 3D templates in unknown proteins. The method can perform non-exact amino acid matches (conservative mutations), is able to find amino acids in different chains and does not impose any restrictions on the active site size. RESULTS: GASS results were compared with those catalogued in the catalytic site atlas (CSA) in four different datasets and compared with two other methods: "amino acid pattern search for substructures and motif" and "catalytic site identification". The results show GASS can correctly identify >90% of the templates searched. Experiments were also run using data from the substrate binding sites prediction competition CASP 10, and GASS is ranked fourth among the 18 methods considered.
Affiliation(s)
- Sandro C Izidoro
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
| | - Raquel C de Melo-Minardi
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
| | - Gisele L Pappa
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
| |
|
28
|
Flores DI, Sotelo-Mundo RR, Brizuela CA. A simple extension to the CMASA method for the prediction of catalytic residues in the presence of single point mutations. PLoS One 2014; 9:e108513. [PMID: 25268770 PMCID: PMC4182483 DOI: 10.1371/journal.pone.0108513] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/19/2014] [Accepted: 08/31/2014] [Indexed: 11/23/2022]
Abstract
The automatic identification of catalytic residues still remains an important challenge in structural bioinformatics. Sequence-based methods are good alternatives when the query shares a high percentage of identity with a well-annotated enzyme. However, when the homology is not apparent, which occurs with many structures from the structural genome initiative, structural information should be exploited. A local structural comparison is preferred to a global structural comparison when predicting functional residues. CMASA is a recently proposed method for predicting catalytic residues based on a local structure comparison. The method achieves high accuracy and a high value for the Matthews correlation coefficient. However, point substitutions or a lack of relevant data strongly affect the performance of the method. In the present study, we propose a simple extension to the CMASA method to overcome this difficulty. Extensive computational experiments are shown as proof of concept instances, as well as for a few real cases. The results show that the extension performs well when the catalytic site contains mutated residues or when some residues are missing. The proposed modification could correctly predict the catalytic residues of a mutant thymidylate synthase, 1EVF. It also successfully predicted the catalytic residues for 3HRC despite the lack of information for a relevant side chain atom in the PDB file.
Affiliation(s)
- David I. Flores
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, México
| | - Carlos A. Brizuela
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, México
| |
|
29
|
Brylinski M. eMatchSite: sequence order-independent structure alignments of ligand binding pockets in protein models. PLoS Comput Biol 2014; 10:e1003829. [PMID: 25232727 PMCID: PMC4168975 DOI: 10.1371/journal.pcbi.1003829] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 04/14/2014] [Accepted: 07/24/2014] [Indexed: 01/26/2023]
Abstract
Detecting similarities between ligand binding sites in the absence of global homology between target proteins has been recognized as one of the critical components of modern drug discovery. Local binding site alignments can be constructed using sequence order-independent techniques, however, to achieve a high accuracy, many current algorithms for binding site comparison require high-quality experimental protein structures, preferably in the bound conformational state. This, in turn, complicates proteome scale applications, where only various quality structure models are available for the majority of gene products. To improve the state-of-the-art, we developed eMatchSite, a new method for constructing sequence order-independent alignments of ligand binding sites in protein models. Large-scale benchmarking calculations using adenine-binding pockets in crystal structures demonstrate that eMatchSite generates accurate alignments for almost three times more protein pairs than SOIPPA. More importantly, eMatchSite offers a high tolerance to structural distortions in ligand binding regions in protein models. For example, the percentage of correctly aligned pairs of adenine-binding sites in weakly homologous protein models is only 4–9% lower than those aligned using crystal structures. This represents a significant improvement over other algorithms, e.g. the performance of eMatchSite in recognizing similar binding sites is 6% and 13% higher than that of SiteEngine using high- and moderate-quality protein models, respectively. Constructing biologically correct alignments using predicted ligand binding sites in protein models opens up the possibility to investigate drug-protein interaction networks for complete proteomes with prospective systems-level applications in polypharmacology and rational drug repositioning. eMatchSite is freely available to the academic community as a web-server and a stand-alone software distribution at http://www.brylinski.org/ematchsite.
Affiliation(s)
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Center for Computation & Technology, Louisiana State University, Baton Rouge, Louisiana, United States of America
| |
|
30
|
Micale G, Pulvirenti A, Giugno R, Ferro A. Proteins comparison through probabilistic optimal structure local alignment. Front Genet 2014; 5:302. [PMID: 25228906 PMCID: PMC4151033 DOI: 10.3389/fgene.2014.00302] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/31/2014] [Accepted: 08/12/2014] [Indexed: 11/13/2022]
Abstract
Multiple local structure comparison helps to identify common structural motifs or conserved binding sites in 3D structures in distantly related proteins. Since there is no best way to compare structures and evaluate the alignment, a wide variety of techniques and different similarity scoring schemes have been proposed. Existing algorithms usually compute the best superposition of two structures or attempt to solve it as an optimization problem in a simpler setting (e.g., considering contact maps or distance matrices). Here, we present PROPOSAL (PROteins comparison through Probabilistic Optimal Structure local ALignment), a stochastic algorithm based on iterative sampling for multiple local alignment of protein structures. Our method can efficiently find conserved motifs across a set of protein structures. Only the distances between all pairs of residues in the structures are computed. To show the accuracy and the effectiveness of PROPOSAL we tested it on a few families of protein structures. We also compared PROPOSAL with two state-of-the-art tools for pairwise local alignment on a dataset of manually annotated motifs. PROPOSAL is available as a Java 2D standalone application or a command line program at http://ferrolab.dmi.unict.it/proposal/proposal.html.
Affiliation(s)
- Giovanni Micale
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Alfredo Pulvirenti
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
| | - Rosalba Giugno
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
| | - Alfredo Ferro
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
| |
|
31
|
Oldfield T. Mathematical Data Mining. Crystallogr Rev 2014. [DOI: 10.1080/08893110410001664909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
32
|
Saberi Fathi SM, Tuszynski JA. A simple method for finding a protein's ligand-binding pockets. BMC Struct Biol 2014; 14:18. [PMID: 25038637 PMCID: PMC4112621 DOI: 10.1186/1472-6807-14-18] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 01/09/2014] [Accepted: 07/11/2014] [Indexed: 12/03/2022]
Abstract
BACKGROUND This paper provides a simple and rapid method for a protein-clustering strategy. The basic idea implemented here is to use computational geometry methods to predict and characterize ligand-binding pockets of a given protein structure. In addition to geometrical characteristics of the protein structure, we consider some simple biochemical properties that help recognize the best candidates for pockets in a protein's active site. RESULTS Our results are shown to produce good agreement with known empirical results. CONCLUSIONS The method presented in this paper is a low-cost rapid computational method that could be used to classify proteins and other biomolecules, and furthermore could be useful in reducing the cost and time of drug discovery.
Affiliation(s)
- Jack A Tuszynski
  - Department of Physics, University of Alberta, Edmonton, Alberta, Canada
|
33
|
Steinkellner G, Gruber CC, Pavkov-Keller T, Binter A, Steiner K, Winkler C, Łyskowski A, Schwamberger O, Oberer M, Schwab H, Faber K, Macheroux P, Gruber K. Identification of promiscuous ene-reductase activity by mining structural databases using active site constellations. Nat Commun 2014; 5:4150. [PMID: 24954722 PMCID: PMC4083419 DOI: 10.1038/ncomms5150] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 03/07/2014] [Accepted: 05/15/2014] [Indexed: 12/03/2022]
Abstract
The exploitation of catalytic promiscuity and the application of de novo design have recently opened the access to novel, non-natural enzymatic activities. Here we describe a structural bioinformatic method for predicting catalytic activities of enzymes based on three-dimensional constellations of functional groups in active sites ('catalophores'). As a proof-of-concept we identify two enzymes with predicted promiscuous ene-reductase activity (reduction of activated C-C double bonds) and compare them with known ene-reductases, that is, members of the Old Yellow Enzyme family. Despite completely different amino acid sequences, overall structures and protein folds, high-resolution crystal structures reveal equivalent binding modes of typical Old Yellow Enzyme substrates and ligands. Biochemical and biocatalytic data show that the two enzymes indeed possess ene-reductase activity and reveal an inverted stereopreference compared with Old Yellow Enzymes for some substrates. This method could thus be a tool for the identification of viable starting points for the development and engineering of novel biocatalysts.
Affiliation(s)
- Georg Steinkellner
  - ACIB GmbH, Petersgasse 14, 8010 Graz, Austria
  - These authors contributed equally to this work
- Christian C. Gruber
  - ACIB GmbH, Petersgasse 14, 8010 Graz, Austria
  - These authors contributed equally to this work
- Christoph Winkler
  - Department of Chemistry, University of Graz, Heinrichstrasse 28, 8010 Graz, Austria
- Orsolya Schwamberger
  - ACIB GmbH, Petersgasse 14, 8010 Graz, Austria
  - Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
- Monika Oberer
  - Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
- Helmut Schwab
  - ACIB GmbH, Petersgasse 14, 8010 Graz, Austria
  - Institute of Molecular Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria
- Kurt Faber
  - ACIB GmbH, Petersgasse 14, 8010 Graz, Austria
  - Department of Chemistry, University of Graz, Heinrichstrasse 28, 8010 Graz, Austria
- Peter Macheroux
  - ACIB GmbH, Petersgasse 14, 8010 Graz, Austria
  - Institute of Biochemistry, Graz University of Technology, Petersgasse 12, 8010 Graz, Austria
- Karl Gruber
  - ACIB GmbH, Petersgasse 14, 8010 Graz, Austria
  - Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
|
34
|
Buturovic L, Wong M, Tang GW, Altman RB, Petkovic D. High precision prediction of functional sites in protein structures. PLoS One 2014; 9:e91240. [PMID: 24632601 PMCID: PMC3954699 DOI: 10.1371/journal.pone.0091240] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 09/26/2013] [Accepted: 02/11/2014] [Indexed: 11/29/2022]
Abstract
We address the problem of assigning biological function to solved protein structures. Computational tools play a critical role in identifying potential active sites and informing screening decisions for further lab analysis. A critical parameter in the practical application of computational methods is the precision, or positive predictive value. Precision measures the level of confidence the user should have in a particular computed functional assignment. Low precision annotations lead to futile laboratory investigations and waste scarce research resources. In this paper we describe an advanced version of the protein function annotation system FEATURE, which achieved 99% precision and average recall of 95% across 20 representative functional sites. The system uses a Support Vector Machine classifier operating on the microenvironment of physicochemical features around an amino acid. We also compared performance of our method with state-of-the-art sequence-level annotator Pfam in terms of precision, recall and localization. To our knowledge, no other functional site annotator has been rigorously evaluated against these key criteria. The software and predictive models are incorporated into the WebFEATURE service at http://feature.stanford.edu/wf4.0-beta.
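The abstract above reports results as precision (positive predictive value) and recall. A minimal sketch of how these two quantities follow from prediction counts is given below; the function name `precision_recall` and the example counts are hypothetical and are not taken from the FEATURE evaluation.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) computed from prediction counts.

    precision = TP / (TP + FP): the fraction of predicted sites that are real.
    recall    = TP / (TP + FN): the fraction of real sites that were predicted.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for one functional-site model.
precision, recall = precision_recall(true_pos=95, false_pos=1, false_neg=5)
```

High precision means few false positives, and false positives are what the abstract ties to futile laboratory follow-up.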
Affiliation(s)
- Ljubomir Buturovic
  - Department of Computer Science, San Francisco State University, San Francisco, California, United States of America
- Mike Wong
  - Center for Computing for Life Sciences, San Francisco State University, San Francisco, California, United States of America
- Grace W. Tang
  - Department of Bioengineering, Stanford University, Stanford, California, United States of America
- Russ B. Altman
  - Department of Bioengineering, Stanford University, Stanford, California, United States of America
- Dragutin Petkovic
  - Department of Computer Science, San Francisco State University, San Francisco, California, United States of America
  - Center for Computing for Life Sciences, San Francisco State University, San Francisco, California, United States of America
|
35
|
Abstract
The number of known protein structures is continuously growing, with over 95,000 3D structures freely available via the PDB. Over the last decade, pharmaceutical research has sparked interest in computationally extracting information from this large data pool, resulting in a homology-driven knowledge transfer from annotated to new structures. Studying protein structures with respect to understanding and modulating their functional behavior means analyzing their centers of action. Therefore, the detection and description of potential binding sites on the protein surface is a major step towards protein classification and assessment. Subsequently, these representations can be incorporated to compare proteins, and to predict their druggability or function. Especially in the context of target identification and polypharmacology, automated tools for large-scale target comparisons are highly needed. In this article, developments for automated structure-based target assessment are reviewed and remaining challenges as well as future perspectives are discussed.
|
36
|
Abstract
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures, however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
Affiliation(s)
- Abel Rodriguez
  - University of California, Santa Cruz and Duke University
|
37
|
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022]
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Affiliation(s)
- Wataru Nemoto
  - Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Akira Saito
  - Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Hayato Oikawa
  - Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
|
38
|
Caprari S, Toti D, Viet Hung L, Di Stefano M, Polticelli F. ASSIST: a fast versatile local structural comparison tool. Bioinformatics 2013; 30:1022-4. [DOI: 10.1093/bioinformatics/btt664] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
|
39
|
Prediction and experimental validation of enzyme substrate specificity in protein structures. Proc Natl Acad Sci U S A 2013; 110:E4195-202. [PMID: 24145433 DOI: 10.1073/pnas.1305162110] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/18/2022]
Abstract
Structural Genomics aims to elucidate protein structures to identify their functions. Unfortunately, the variation of just a few residues can be enough to alter activity or binding specificity and limit the functional resolution of annotations based on sequence and structure; in enzymes, substrates are especially difficult to predict. Here, large-scale controls and direct experiments show that the local similarity of five or six residues selected because they are evolutionarily important and on the protein surface can suffice to identify an enzyme activity and substrate. A motif of five residues predicted that a previously uncharacterized Silicibacter sp. protein was a carboxylesterase for short fatty acyl chains, similar to hormone-sensitive-lipase-like proteins that share less than 20% sequence identity. Assays and directed mutations confirmed this activity and showed that the motif was essential for catalysis and substrate specificity. We conclude that evolutionary and structural information may be combined on a Structural Genomics scale to create motifs of mixed catalytic and noncatalytic residues that identify enzyme activity and substrate specificity.
|
40
|
Jalencas X, Mestres J. Identification of Similar Binding Sites to Detect Distant Polypharmacology. Mol Inform 2013; 32:976-90. [PMID: 27481143 DOI: 10.1002/minf.201300082] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 05/01/2013] [Accepted: 07/29/2013] [Indexed: 01/19/2023]
Abstract
The ability of small molecules to interact with multiple proteins is referred to as polypharmacology. This property is often linked to the therapeutic action of drugs but it is known also to be responsible for many of their side effects. Because of its importance, the development of computational methods that can predict drug polypharmacology has become an important line of research that led recently to the identification of many novel targets for known drugs. Nowadays, the majority of these methods are based on measuring the similarity of a query molecule against the hundreds of thousands of molecules for which pharmacological data on thousands of proteins are available in public sources. However, similarity-based methods are inherently biased by the chemical coverage offered by the active molecules present in those public repositories, which limits significantly their capacity to predict interactions with proteins structurally and functionally unrelated to any of the already known targets for drugs. It is in this respect that structure-based methods aiming at identifying similar binding sites may offer an alternative complementary means to ligand-based methods for detecting distant polypharmacology. The different existing approaches to binding site detection, representation, comparison, and fragmentation are reviewed and recent successful applications presented.
Affiliation(s)
- Xavier Jalencas
  - Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Research Institute & University Pompeu Fabra, Parc de Recerca Biomèdica, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Jordi Mestres
  - Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Research Institute & University Pompeu Fabra, Parc de Recerca Biomèdica, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
|
41
|
He L, Vandin F, Pandurangan G, Bailey-Kellogg C. Ballast: a ball-based algorithm for structural motifs. J Comput Biol 2013; 20:137-51. [PMID: 23383999 DOI: 10.1089/cmb.2012.0246] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
Structural motifs encapsulate local sequence-structure-function relationships characteristic of related proteins, enabling the prediction of functional characteristics of new proteins, providing molecular-level insights into how those functions are performed, and supporting the development of variants specifically maintaining or perturbing function in concert with other properties. Numerous computational methods have been developed to search through databases of structures for instances of specified motifs. However, it remains an open problem how best to leverage the local geometric and chemical constraints underlying structural motifs in order to develop motif-finding algorithms that are both theoretically and practically efficient. We present a simple, general, efficient approach, called Ballast (ball-based algorithm for structural motifs), to match given structural motifs to given structures. Ballast combines the best properties of previously developed methods, exploiting the composition and local geometry of a structural motif and its possible instances in order to effectively filter candidate matches. We show that on a wide range of motif-matching problems, Ballast efficiently and effectively finds good matches, and we provide theoretical insights into why it works well. By supporting generic measures of compositional and geometric similarity, Ballast provides a powerful substrate for the development of motif-matching algorithms.
Affiliation(s)
- Lu He
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
|
42
|
Nilmeier JP, Kirshner DA, Wong SE, Lightstone FC. Rapid catalytic template searching as an enzyme function prediction procedure. PLoS One 2013; 8:e62535. [PMID: 23675414 PMCID: PMC3651201 DOI: 10.1371/journal.pone.0062535] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 01/31/2013] [Accepted: 03/22/2013] [Indexed: 11/18/2022]
Abstract
We present an enzyme protein function identification algorithm, Catalytic Site Identification (CatSId), based on identification of catalytic residues. The method is optimized for highly accurate template identification across a diverse template library and is also very efficient with regard to time and scalability of comparisons. The algorithm matches three-dimensional residue arrangements in a query protein to a library of manually annotated catalytic residues, the Catalytic Site Atlas (CSA). Two main processes are involved. The first process is a rapid protein-to-template matching algorithm that scales quadratically with target protein size and linearly with template size. The second process incorporates a number of physical descriptors, including binding site predictions, in a logistic scoring procedure to re-score matches found in Process 1. This approach shows very good performance overall, with a Receiver-Operator-Characteristic Area Under Curve (AUC) of 0.971 for the training set evaluated. The procedure is able to process cofactors, ions, nonstandard residues, and point substitutions for residues and ions in a robust and integrated fashion. Sites with only two critical (catalytic) residues are challenging cases, resulting in AUCs of 0.9411 and 0.5413 for the training and test sets, respectively. The remaining sites show excellent performance with AUCs greater than 0.90 for both the training and test data on templates of size greater than two critical (catalytic) residues. The procedure has considerable promise for larger scale searches.
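The performance figures in the abstract above are ROC AUC values. As a hedged illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen true match scores higher than a randomly chosen false match, with ties counted as one half. The function name `roc_auc` and the scores are hypothetical; this is not CatSId's scoring code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical match scores; label 1 marks a true catalytic-site match.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

An AUC near 0.5, like the test-set value the abstract reports for two-residue sites, corresponds to a ranking that is close to random.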
Affiliation(s)
- Jerome P. Nilmeier
  - Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Daniel A. Kirshner
  - Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Sergio E. Wong
  - Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Felice C. Lightstone
  - Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
|
43
|
Volkamer A, Kuhn D, Rippmann F, Rarey M. Predicting enzymatic function from global binding site descriptors. Proteins 2012; 81:479-89. [DOI: 10.1002/prot.24205] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/25/2012] [Revised: 09/21/2012] [Accepted: 10/11/2012] [Indexed: 11/09/2022]
|
44
|
Ellingson L, Zhang J. Protein surface matching by combining local and global geometric information. PLoS One 2012; 7:e40540. [PMID: 22815760 PMCID: PMC3398928 DOI: 10.1371/journal.pone.0040540] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/30/2011] [Accepted: 06/12/2012] [Indexed: 01/01/2023]
Abstract
Comparison of the binding sites of proteins is an effective means for predicting protein functions based on their structure information. Despite the importance of this problem and much research in the past, it is still very challenging to predict the binding ligands from the atomic structures of protein binding sites. Here, we designed a new algorithm, TIPSA (Triangulation-based Iterative-closest-point for Protein Surface Alignment), based on the iterative closest point (ICP) algorithm. TIPSA aims to find the maximum number of atoms that can be superposed between two protein binding sites, where any pair of superposed atoms has a distance smaller than a given threshold. The search starts from similar tetrahedra between two binding sites obtained from 3D Delaunay triangulation and uses the Hungarian algorithm to find additional matched atoms. We found that, due to the plasticity of protein binding sites, matching the rigid body of point clouds of protein binding sites is not adequate for satisfactory binding ligand prediction. We further incorporated global geometric information, the radius of gyration of binding site atoms, and used nearest neighbor classification for binding site prediction. Tested on benchmark data, our method achieved a performance comparable to the best methods in the literature, while simultaneously providing the common atom set and atom correspondences.
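The abstract above names the radius of gyration of binding site atoms as its global geometric descriptor. A minimal sketch of that quantity follows, assuming the unweighted form (every atom counts equally rather than by mass); the function name `radius_of_gyration` and the coordinates are hypothetical, and this is not TIPSA's own code.

```python
import math


def radius_of_gyration(atoms):
    """Root-mean-square distance of the atoms from their centroid (unweighted)."""
    n = len(atoms)
    centroid = tuple(sum(a[k] for a in atoms) / n for k in range(3))
    mean_sq = sum(math.dist(a, centroid) ** 2 for a in atoms) / n
    return math.sqrt(mean_sq)


# Hypothetical binding-site atom coordinates (in angstroms).
site_atoms = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
rg = radius_of_gyration(site_atoms)
```

Because it condenses the whole site into one size-like number, such a descriptor is less sensitive to local atom displacements than a rigid point-cloud superposition, which fits the abstract's remark about binding site plasticity.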
Affiliation(s)
- Leif Ellingson
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas, United States of America
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
|
45
|
Wu CY, Hwa YH, Chen YC, Lim C. Hidden relationship between conserved residues and locally conserved phosphate-binding structures in NAD(P)-binding proteins. J Phys Chem B 2012; 116:5644-52. [PMID: 22530587 DOI: 10.1021/jp3014332] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 02/06/2023]
Abstract
A one-dimensional (1D) motif usually comprises conserved essential residues involved in catalysis, ligand binding, or maintaining a specific structure. However, it cannot be easily detected in proteins with low sequence identity because it is difficult to (1) identify protein sequences suspected to contain the motif, and (2) align sequences with little sequence identity to spot the conserved residues. Here, we present a strategy for discovering phosphate-binding 1D motifs in NAD(P)-binding proteins sharing low sequence identity that overcomes these two hurdles by determining all distinct locally conserved pyrophosphate-binding structures and aligning the same-length sequences comprising each of these structures to identify the conserved residues. We show that the sequence motifs derived from the distinct pyrophosphate-binding structures yield different numbers/spacing of conserved Gly residues. We also show that they depend on the side chain orientations and cofactor type (NAD or NADP). Thus, sequence motifs derived from local similarity of backbone structures without consideration of the cofactor type and/or side chain orientations would reduce their reliability in annotating protein function from sequence alone. The three-dimensional (3D) and 1D motifs comprising conserved residues in nonredundant proteins reveal hidden relationships between the protein structure/function and sequence as well as protein-cofactor interactions.
Affiliation(s)
- Chih Yuan Wu
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
|
46
|
Ueno K, Mineta K, Ito K, Endo T. Exploring functionally related enzymes using radially distributed properties of active sites around the reacting points of bound ligands. BMC Structural Biology 2012; 12:5. [PMID: 22536854 PMCID: PMC3408369 DOI: 10.1186/1472-6807-12-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/14/2011] [Accepted: 04/26/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Structural genomics approaches, particularly those solving the 3D structures of many proteins with unknown functions, have increased the desire for structure-based function predictions. However, prediction of enzyme function is difficult because one member of a superfamily may catalyze a different reaction than other members, whereas members of different superfamilies can catalyze the same reaction. In addition, conformational changes, mutations or the absence of a particular catalytic residue can prevent inference of the mechanism by which catalytic residues stabilize and promote the elementary reaction. A major hurdle for alignment-based methods for prediction of function is the absence (despite its importance) of a measure of similarity of the physicochemical properties of catalytic sites. To solve this problem, the physicochemical features radially distributed around catalytic sites should be considered in addition to structural and sequence similarities. RESULTS We showed that radial distribution functions (RDFs), which are associated with the local structural and physicochemical properties of catalytic active sites, are capable of clustering oxidoreductases and transferases by function. The catalytic sites of these enzymes were also characterized using the RDFs. The RDFs provided a measure of the similarity among the catalytic sites, detecting conformational changes caused by mutation of catalytic residues. Furthermore, the RDFs reinforced the classification of enzyme functions based on conventional sequence and structural alignments. CONCLUSIONS Our results demonstrate that the application of RDFs provides advantages in the functional classification of enzymes by providing information about catalytic sites.
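The abstract above describes properties distributed radially around a catalytic site. As a simplified sketch of the idea only, the code below counts atoms in concentric shells around a reference point. The paper's RDFs cover physicochemical properties, whereas this version counts atoms alone, and the function name `radial_counts`, the bin settings and the coordinates are all hypothetical.

```python
import math


def radial_counts(center, atoms, bin_width, n_bins):
    """Count atoms in concentric shells of width bin_width around center."""
    counts = [0] * n_bins
    for atom in atoms:
        shell = int(math.dist(center, atom) // bin_width)
        if shell < n_bins:  # atoms beyond the outermost shell are ignored
            counts[shell] += 1
    return counts


# Hypothetical reacting point and surrounding atom coordinates (in angstroms).
reacting_point = (0.0, 0.0, 0.0)
atoms = [(0.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.7), (2.5, 0.0, 0.0), (9.0, 0.0, 0.0)]
profile = radial_counts(reacting_point, atoms, bin_width=1.0, n_bins=3)
```

Two sites can then be compared through their profiles, for example with a vector distance, without first superposing the structures.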
Affiliation(s)
- Keisuke Ueno
  - Division of Bioinformatics, Hokkaido University Research Center for Zoonosis Control, North 20 West 10, Sapporo, Hokkaido 001-0020, Japan
|
47
|
Kinjo AR, Nakamura H. Composite structural motifs of binding sites for delineating biological functions of proteins. PLoS One 2012; 7:e31437. [PMID: 22347478 PMCID: PMC3275580 DOI: 10.1371/journal.pone.0031437] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/14/2011] [Accepted: 01/08/2012] [Indexed: 11/19/2022]
Abstract
Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.
Affiliation(s)
- Akira R Kinjo
  - Institute for Protein Research, Osaka University, Suita, Osaka, Japan
|
48
|
Sacan A, Ekins S, Kortagere S. Applications and limitations of in silico models in drug discovery. Methods Mol Biol 2012; 910:87-124. [PMID: 22821594 DOI: 10.1007/978-1-61779-965-5_6] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 02/07/2023]
Abstract
Drug discovery in the late twentieth and early twenty-first century has witnessed a myriad of changes that were adopted to predict whether a compound is likely to be successful, or conversely enable identification of molecules with liabilities as early as possible. These changes include integration of in silico strategies for lead design and optimization that perform complementary roles to that of the traditional in vitro and in vivo approaches. The in silico models are facilitated by the availability of large datasets associated with high-throughput screening, bioinformatics algorithms to mine and annotate the data from a target perspective, and chemoinformatics methods to integrate chemistry methods into the lead design process. This chapter highlights the applications of some of these methods and their limitations. We hope this serves as an introduction to in silico drug discovery.
Affiliation(s)
- Ahmet Sacan
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
|
49
|
Abstract
This chapter describes a method for analyzing the allosteric influence of molecular interactions on protein conformational distributions. The method, called Dynamics Perturbation Analysis (DPA), generally yields insights into allosteric effects in proteins and is especially useful for predicting ligand-binding sites. The use of DPA for binding site prediction is motivated by the following allosteric regulation hypothesis: interactions in native binding sites cause a large change in protein conformational distributions. Here, we review the reasoning behind this hypothesis, describe the math behind the method, and present a recipe for predicting binding sites using DPA.
Affiliation(s)
- Dengming Ming
  - Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, China
|
50
|
Stegemann B, Klebe G. Cofactor-binding sites in proteins of deviating sequence: comparative analysis and clustering in torsion angle, cavity, and fold space. Proteins 2011; 80:626-48. [PMID: 22095739 DOI: 10.1002/prot.23226] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 07/13/2011] [Revised: 09/29/2011] [Accepted: 10/10/2011] [Indexed: 12/13/2022]
Abstract
Small molecules are recognized in protein-binding pockets through surface-exposed physicochemical properties. To optimize binding, they have to adopt a conformation corresponding to a local energy minimum within the formed protein-ligand complex. However, their conformational flexibility makes them competent to bind not only to homologous proteins of the same family but also to proteins of remote similarity with respect to the shape of the binding pockets and folding pattern. Considering drug action, such observations can give rise to unexpected and undesired cross reactivity. In this study, datasets of six different cofactors (ADP, ATP, NAD(P)(H), FAD, and acetyl CoA, sharing an adenosine diphosphate moiety as common substructure), observed in multiple crystal structures of protein-cofactor complexes exhibiting sequence identity below 25%, have been analyzed for the conformational properties of the bound ligands, the distribution of physicochemical properties in the accommodating protein-binding pockets, and the local folding patterns next to the cofactor-binding site. State-of-the-art clustering techniques have been applied to group the different protein-cofactor complexes in the different spaces. Interestingly, clustering in cavity (Cavbase) and fold space (DALI) reveals virtually the same data structuring. Remarkable relationships can be found among the different spaces. They provide information on how conformations are conserved across the host proteins and which distinct local cavity and fold motifs recognize the different portions of the cofactors. In those cases, where different cofactors are found to be accommodated in a similar fashion to the same fold motifs, only a commonly shared substructure of the cofactors is used for the recognition process.
Affiliation(s)
- Björn Stegemann
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany
|