1
Zemla AT, Allen JE, Kirshner D, Lightstone FC. PDBspheres: a method for finding 3D similarities in local regions in proteins. NAR Genom Bioinform 2022; 4:lqac078. [PMID: 36225529 PMCID: PMC9549786 DOI: 10.1093/nargab/lqac078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 08/06/2022] [Accepted: 09/29/2022] [Indexed: 11/05/2022] Open
Abstract
We present a structure-based method for finding and evaluating structural similarities in protein regions relevant to ligand binding. PDBspheres comprises an exhaustive library of protein structure regions ('spheres') adjacent to complexed ligands derived from the Protein Data Bank (PDB), along with methods to find and evaluate structural matches between a protein of interest and spheres in the library. PDBspheres uses the LGA (Local-Global Alignment) structure alignment algorithm as the main engine for detecting structural similarities between the protein of interest and template spheres from the library, which currently contains >2 million spheres. To assess confidence in structural matches, an all-atom-based similarity metric takes side chain placement into account. Here, we describe the PDBspheres method, demonstrate its ability to detect and characterize binding sites in protein structures, show how PDBspheres, a strictly structure-based method, performs on a curated dataset of 2528 ligand-bound and ligand-free crystal structures, and use PDBspheres to cluster pockets and assess structural similarities among protein binding sites of 4876 structures in the 'refined set' of the PDBbind 2019 dataset.
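As a rough illustration of what a ligand-centred 'sphere' could look like, the sketch below collects the residues that have at least one atom within a distance cutoff of any ligand atom. This is only a sketch under stated assumptions: the function name `residues_near_ligand`, the `(residue_id, (x, y, z))` input layout, and the cutoff values are hypothetical and are not taken from the abstract, which does not give the sphere radius or the library construction details.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=8.0):
    """Return sorted residue ids with at least one atom within `cutoff` of a ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples (hypothetical layout)
    ligand_atoms:  iterable of (x, y, z) tuples
    cutoff:        distance threshold in angstroms (illustrative value, an assumption)
    """
    ligand_atoms = list(ligand_atoms)
    selected = set()
    for residue_id, coord in protein_atoms:
        if residue_id in selected:
            continue  # residue already included via another atom
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            selected.add(residue_id)
    return sorted(selected)


# Toy coordinates, invented for the example.
protein = [
    (10, (0.0, 0.0, 0.0)),
    (10, (1.5, 0.0, 0.0)),
    (11, (6.0, 0.0, 0.0)),
    (42, (30.0, 0.0, 0.0)),
]
ligand = [(2.0, 0.0, 0.0), (3.0, 1.0, 0.0)]

sphere = residues_near_ligand(protein, ligand, cutoff=5.0)
print(sphere)
```

With these toy coordinates, residues 10 and 11 fall inside the 5 Å cutoff and residue 42 does not. A real pipeline would read atoms from PDB files and then align such regions with a structure alignment program, which this sketch does not attempt.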
Affiliation(s)
- Adam T Zemla: To whom correspondence should be addressed. Tel: +1 925 423 5571; Fax: +1 925 423 6437
- Jonathan E Allen: Global Security Computing Applications, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Dan Kirshner: Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Felice C Lightstone: Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
2
Jones D, Kim H, Zhang X, Zemla A, Stevenson G, Bennett WFD, Kirshner D, Wong SE, Lightstone FC, Allen JE. Improved Protein-Ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference. J Chem Inf Model 2021; 61:1583-1592. [PMID: 33754707 DOI: 10.1021/acs.jcim.0c01306] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Indexed: 11/28/2022]
Abstract
Predicting accurate protein-ligand binding affinities is an important task in drug discovery but remains a challenge even with computationally expensive biophysics-based energy scoring methods and state-of-the-art deep learning approaches. Despite the recent advances in the application of deep convolutional and graph neural network-based approaches, it remains unclear what the relative advantages of each approach are and how they compare with physics-based methodologies that have found more mainstream success in virtual screening pipelines. We present fusion models that combine features and inference from complementary representations to improve binding affinity prediction. This, to our knowledge, is the first comprehensive study that uses a common series of evaluations to directly compare the performance of three-dimensional (3D)-convolutional neural networks (3D-CNNs), spatial graph neural networks (SG-CNNs), and their fusion. We use temporal and structure-based splits to assess performance on novel protein targets. To test the practical applicability of our models, we examine their performance in cases that assume that the crystal structure is not available. In these cases, binding free energies are predicted using docking pose coordinates as the inputs to each model. In addition, we compare these deep learning approaches to predictions based on docking scores and molecular mechanic/generalized Born surface area (MM/GBSA) calculations. Our results show that the fusion models make more accurate predictions than their constituent neural network models as well as docking scoring and MM/GBSA rescoring, with the benefit of greater computational efficiency than the MM/GBSA method. Finally, we provide the code to reproduce our results and the parameter files of the trained models used in this work. The software is available as open source at https://github.com/llnl/fast. 
Model parameter files are available at ftp://gdo-bioinformatics.ucllnl.org/fast/pdbbind2016_model_checkpoints/.
Affiliation(s)
- Derek Jones: Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Hyojin Kim: Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Xiaohua Zhang: Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Adam Zemla: Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Garrett Stevenson: Computational Engineering Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- W F Drew Bennett: Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Daniel Kirshner: Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Sergio E Wong: Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Felice C Lightstone: Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Jonathan E Allen: Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
3
Darwiche R, Lugo F, Drurey C, Varossieau K, Smant G, Wilbers RHP, Maizels RM, Schneiter R, Asojo OA. Crystal structure of Brugia malayi venom allergen-like protein-1 (BmVAL-1), a vaccine candidate for lymphatic filariasis. Int J Parasitol 2018; 48:371-378. [PMID: 29501266 PMCID: PMC5893361 DOI: 10.1016/j.ijpara.2017.12.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/06/2017] [Revised: 12/04/2017] [Accepted: 12/19/2017] [Indexed: 12/11/2022]
Abstract
The vaccine candidate Brugia malayi venom allergen-like 1 protein (BmVAL-1) has three distinct binding cavities. The cavities are the central cavity; the sterol-binding caveolin-binding motif (CBM); and the palmitate-binding cavity. These cavities are connected by channels, which can accommodate water molecules, ions and small ligands. The channels explain how blocking divalent ions in the central cavity affects sterol binding in the distinct CBM cavity. BmVAL-1 has a glycosylated CBM, is an effective sterol transporter in vivo and binds cholesterol and palmitate in vitro.
Brugia malayi is a causative agent of lymphatic filariasis, a major tropical disease. The infective L3 parasite stage releases immunomodulatory proteins including the venom allergen-like proteins (VALs), which are members of the SCP/TAPS (Sperm-coating protein/Tpx/antigen 5/pathogenesis related-1/Sc7) superfamily. BmVAL-1 is a major target of host immunity with >90% of infected B. malayi microfilaraemic cases being seropositive for antibodies to BmVAL-1. This study is part of ongoing efforts to characterize the structures and functions of important B. malayi proteins. Recombinant BmVAL-1 was produced using a plant expression system, crystallized and the structure was solved by molecular replacement and refined to 2.1 Å, revealing the characteristic alpha/beta/alpha sandwich topology of eukaryotic SCP/TAPS proteins. The protein has more than 45% loop regions and these flexible loops connect the helices and strands, which are longer than predicted based on other parasite SCP/TAPS protein structures. The large central cavity of BmVAL-1 is a prototypical CRISP cavity with two histidines required to bind divalent cations. The caveolin-binding motif (CBM) that mediates sterol binding in SCP/TAPS proteins is large and open in BmVAL-1 and is N-glycosylated. N-glycosylation of the CBM does not affect the ability of BmVAL-1 to bind sterol in vitro. BmVAL-1 complements the in vivo sterol export phenotype of yeast mutants lacking their endogenous SCP/TAPS proteins. The in vitro sterol-binding affinity of BmVAL-1 is comparable with Pry1, a yeast sterol transporting SCP/TAPS protein. Sterol binding of BmVAL-1 is dependent on divalent cations. BmVAL-1 also has a large open palmitate-binding cavity, which binds palmitate comparably to tablysin-15, a lipid-binding SCP/TAPS protein. The central cavity, CBM and palmitate-binding cavity of BmVAL-1 are interconnected within the monomer with channels that can serve as pathways for water molecules, cations and small molecules.
Affiliation(s)
- Rabih Darwiche: Division of Biochemistry, Department of Biology, University of Fribourg, Chemin du Musée 10, CH 1700 Fribourg, Switzerland
- Fernanda Lugo: National School of Tropical Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Claire Drurey: Wellcome Centre for Molecular Parasitology, Institute for Infection, Immunity and Inflammation, University of Glasgow, Sir Graeme Davies Building, 120 University Place, Glasgow G12 8TA, UK
- Koen Varossieau: Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Geert Smant: Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Ruud H P Wilbers: Laboratory of Nematology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Rick M Maizels: Wellcome Centre for Molecular Parasitology, Institute for Infection, Immunity and Inflammation, University of Glasgow, Sir Graeme Davies Building, 120 University Place, Glasgow G12 8TA, UK
- Roger Schneiter: Division of Biochemistry, Department of Biology, University of Fribourg, Chemin du Musée 10, CH 1700 Fribourg, Switzerland
- Oluwatoyin A Asojo: National School of Tropical Medicine, Baylor College of Medicine, Houston, TX 77030, USA
4
Ehrt C, Brinkjost T, Koch O. Impact of Binding Site Comparisons on Medicinal Chemistry and Rational Molecular Design. J Med Chem 2016; 59:4121-51. [PMID: 27046190 DOI: 10.1021/acs.jmedchem.6b00078] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Indexed: 12/27/2022]
Abstract
Modern rational drug design not only deals with the search for ligands binding to interesting and promising validated targets but also aims to identify the function and ligands of yet uncharacterized proteins having impact on different diseases. Additionally, it contributes to the design of inhibitors with distinct selectivity patterns and the prediction of possible off-target effects. The identification of similarities between binding sites of various proteins is a useful approach to cope with those challenges. The main scope of this perspective is to describe applications of different protein binding site comparison approaches to outline their applicability and impact on molecular design. The article deals with various substantial application domains and provides some outstanding examples to show how various binding site comparison methods can be applied to promote in silico drug design workflows. In addition, we will also briefly introduce the fundamental principles of different protein binding site comparison methods.
Affiliation(s)
- Christiane Ehrt: Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
- Tobias Brinkjost: Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany; Department of Computer Science, TU Dortmund University, Otto-Hahn-Straße 14, 44224 Dortmund, Germany
- Oliver Koch: Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
5
Lee S, Min H, Yoon S. Will solid-state drives accelerate your bioinformatics? In-depth profiling, performance analysis and beyond. Brief Bioinform 2015; 17:713-27. [PMID: 26330577 DOI: 10.1093/bib/bbv073] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/27/2015] [Indexed: 11/12/2022] Open
Abstract
A wide variety of large-scale data have been produced in bioinformatics. In response, the need for efficient handling of biomedical big data has been partly met by parallel computing. However, the time demand of many bioinformatics programs still remains high for large-scale practical uses because of factors that hinder acceleration by parallelization. Recently, new generations of storage devices have emerged, such as NAND flash-based solid-state drives (SSDs), and with the renewed interest in near-data processing, they are increasingly becoming acceleration methods that can accompany parallel processing. In certain cases, a simple drop-in replacement of hard disk drives by SSDs results in dramatic speedup. Despite the various advantages and continuous cost reduction of SSDs, there has been little review of SSD-based profiling and performance exploration of important but time-consuming bioinformatics programs. For an informative review, we perform in-depth profiling and analysis of 23 key bioinformatics programs using multiple types of devices. Based on the insight we obtain from this research, we further discuss issues related to design and optimize bioinformatics algorithms and pipelines to fully exploit SSDs. The programs we profile cover traditional and emerging areas of importance, such as alignment, assembly, mapping, expression analysis, variant calling and metagenomics. We explain how acceleration by parallelization can be combined with SSDs for improved performance and also how using SSDs can expedite important bioinformatics pipelines, such as variant calling by the Genome Analysis Toolkit and transcriptome analysis using RNA sequencing. We hope that this review can provide useful directions and tips to accompany future bioinformatics algorithm design procedures that properly consider new generations of powerful storage devices.
6
Kubrycht J, Sigler K, Souček P, Hudeček J. Structures composing protein domains. Biochimie 2013; 95:1511-24. [DOI: 10.1016/j.biochi.2013.04.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/22/2013] [Accepted: 04/02/2013] [Indexed: 12/21/2022]
7
Cadag E, Vitalis E, Lennox KP, Zhou CLE, Zemla AT. Computational analysis of pathogen-borne metallo β-lactamases reveals discriminating structural features between B1 types. BMC Res Notes 2012; 5:96. [PMID: 22333139 PMCID: PMC3293060 DOI: 10.1186/1756-0500-5-96] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/21/2011] [Accepted: 02/14/2012] [Indexed: 01/25/2023] Open
Abstract
Background: Genes conferring antibiotic resistance to groups of bacterial pathogens are cause for considerable concern, as many once-reliable antibiotics continue to see a reduction in efficacy. The recent discovery of the metallo β-lactamase blaNDM-1 gene, which appears to grant antibiotic resistance to a variety of Enterobacteriaceae via a mobile plasmid, is one example of this distressing trend. The following work describes a computational analysis of pathogen-borne MBLs that focuses on the structural aspects of characterized proteins. Results: Using both sequence and structural analyses, we examine residues and structural features specific to various pathogen-borne MBL types. This analysis identifies a linker region within MBL-like folds that may act as a discriminating structural feature between these proteins, and specifically resistance-associated acquirable MBLs. Recently released crystal structures of the newly emerged NDM-1 protein were aligned against related MBL structures using a variety of global and local structural alignment methods, and the overall fold conformation is examined for structural conservation. Conservation appears to be present in most areas of the protein, yet is strikingly absent within a linker region, making NDM-1 unique with respect to a linker-based classification scheme. Variability analysis of the NDM-1 crystal structure highlights unique residues in key regions as well as identifying several characteristics shared with other transferable MBLs. Conclusions: A discriminating linker region identified in MBL proteins is highlighted and examined in the context of NDM-1 and primarily three other MBL types: IMP-1, VIM-2 and ccrA. The presence of an unusual linker region variant and uncommon amino acid composition at specific structurally important sites may help to explain the unusually broad kinetic profile of NDM-1 and may aid in directing research attention to areas of this protein, and possibly other MBLs, that may be targeted for inactivation or attenuation of enzymatic activity.
Affiliation(s)
- Eithon Cadag: Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, 94550 CA, USA
8
Liu T, Altman RB. Using multiple microenvironments to find similar ligand-binding sites: application to kinase inhibitor binding. PLoS Comput Biol 2011; 7:e1002326. [PMID: 22219723 PMCID: PMC3248393 DOI: 10.1371/journal.pcbi.1002326] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 06/11/2011] [Accepted: 11/10/2011] [Indexed: 11/20/2022] Open
Abstract
The recognition of cryptic small-molecular binding sites in protein structures is important for understanding off-target side effects and for recognizing potential new indications for existing drugs. Current methods focus on the geometry and detailed chemical interactions within putative binding pockets, but may not recognize distant similarities where dynamics or modified interactions allow one ligand to bind apparently divergent binding pockets. In this paper, we introduce an algorithm that seeks similar microenvironments within two binding sites, and assesses overall binding site similarity by the presence of multiple shared microenvironments. The method has relatively weak geometric requirements (to allow for conformational change or dynamics in both the ligand and the pocket) and uses multiple biophysical and biochemical measures to characterize the microenvironments (to allow for diverse modes of ligand binding). We term the algorithm PocketFEATURE, since it focuses on pockets using the FEATURE system for characterizing microenvironments. We validate PocketFEATURE first by showing that it can better discriminate sites that bind similar ligands from those that do not, and by showing that we can recognize FAD-binding sites on a proteome scale with Area Under the Curve (AUC) of 92%. We then apply PocketFEATURE to evolutionarily distant kinases, for which the method recognizes several proven distant relationships, and predicts unexpected shared ligand binding. Using experimental data from ChEMBL and Ambit, we show that at high significance level, 40 kinase pairs are predicted to share ligands. Some of these pairs offer new opportunities for inhibiting two proteins in a single pathway. Small molecule drugs may interact with many proteins. Some of these interactions may cause unexpected effects, including side effects or potentially useful therapeutic effects. 
One way to predict these effects is to analyze the three-dimensional structure of target proteins, and identify new binding sites for small molecule drugs. Several methods have been proposed for predicting new binding sites, relying on geometric and functional complementarity of the sites and the small molecules. In this paper, we report on a new method for identifying novel protein-drug interactions by analyzing the similarity between binding sites in proteins. The method has relatively weak geometric requirements and allows for conformational change or dynamics in both the ligand and protein. Our results show that geometric flexibility is useful for effectively comparing sites. We have applied the method to evolutionarily distant kinases, and find unexpected shared inhibitor binding. Our results may be valuable for drug repurposing in order to find novel uses for existing kinase inhibitors.
Affiliation(s)
- Tianyun Liu: Department of Genetics, Stanford University, Stanford, California, United States of America
- Russ B. Altman: Department of Genetics, Stanford University, Stanford, California, United States of America; Department of Bioengineering, Stanford University, Stanford, California, United States of America
9
Doppelt-Azeroual O, Delfaud F, Moriaud F, de Brevern AG. Fast and automated functional classification with MED-SuMo: an application on purine-binding proteins. Protein Sci 2010; 19:847-67. [PMID: 20162627 DOI: 10.1002/pro.364] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures absent of cocrystallized ligands to the purine clusters enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects.
Affiliation(s)
- Olivia Doppelt-Azeroual: INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France
10
Min H, Yu S, Lee T, Yoon S. Support vector machine based classification of 3-dimensional protein physicochemical environments for automated function annotation. Arch Pharm Res 2010; 33:1451-9. [PMID: 20945145 DOI: 10.1007/s12272-010-0920-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/16/2010] [Revised: 08/10/2010] [Accepted: 08/15/2010] [Indexed: 10/19/2022]
Abstract
The knowledge of protein functions as well as structures is critical for drug discovery and development. The FEATURE system developed at Stanford is an effective tool for characterizing and classifying local environments in proteins. FEATURE utilizes vectors of a fixed dimension to represent the physicochemical properties around a residue. Functional sites and non-sites are identified by classifying such vectors using the Naïve Bayes classifier. In this paper, we improve the FEATURE framework in several ways so that it can be more flexible, robust and accurate. The new tool can handle vectors of a user-specified dimension and can suppress noise effectively, with little loss of important signals, by employing dimensionality reduction. Furthermore, our approach utilizes the support vector machine for a more accurate classification. According to the results of our thorough experiments, the proposed new approach outperformed the original tool by 20.13% and 13.42% with respect to true and false positive rates, respectively.
Affiliation(s)
- Hyeyoung Min: College of Pharmacy, Chung-Ang University, Seoul, 156-756, Korea
11
Lee T, Min H, Kim SJ, Yoon S. Application of maximin correlation analysis to classifying protein environments for function prediction. Biochem Biophys Res Commun 2010; 400:219-24. [DOI: 10.1016/j.bbrc.2010.08.042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/08/2010] [Accepted: 08/11/2010] [Indexed: 10/19/2022]
12
Xue Y, Liu Z, Gao X, Jin C, Wen L, Yao X, Ren J. GPS-SNO: computational prediction of protein S-nitrosylation sites with a modified GPS algorithm. PLoS One 2010; 5:e11290. [PMID: 20585580 PMCID: PMC2892008 DOI: 10.1371/journal.pone.0011290] [Citation(s) in RCA: 177] [Impact Index Per Article: 12.6] [Received: 11/24/2009] [Accepted: 06/04/2010] [Indexed: 11/18/2022] Open
Abstract
As one of the most important and ubiquitous post-translational modifications (PTMs) of proteins, S-nitrosylation plays important roles in a variety of biological processes, including the regulation of cellular dynamics and plasticity. Identification of S-nitrosylated substrates with their exact sites is crucial for understanding the molecular mechanisms of S-nitrosylation. In contrast with labor-intensive and time-consuming experimental approaches, prediction of S-nitrosylation sites using computational methods could provide convenience and increased speed. In this work, we developed a novel software of GPS-SNO 1.0 for the prediction of S-nitrosylation sites. We greatly improved our previously developed algorithm and released the GPS 3.0 algorithm for GPS-SNO. By comparison, the prediction performance of GPS 3.0 algorithm was better than other methods, with an accuracy of 75.80%, a sensitivity of 53.57% and a specificity of 80.14%. As an application of GPS-SNO 1.0, we predicted putative S-nitrosylation sites for hundreds of potentially S-nitrosylated substrates for which the exact S-nitrosylation sites had not been experimentally determined. In this regard, GPS-SNO 1.0 should prove to be a useful tool for experimentalists. The online service and local packages of GPS-SNO were implemented in JAVA and are freely available at: http://sno.biocuckoo.org/.
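For readers less familiar with the three performance figures quoted above, the following sketch shows how accuracy, sensitivity and specificity are conventionally computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the GPS-SNO benchmark data and do not reproduce the reported percentages.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive sites predicted correctly / missed
    tn/fp: negative sites predicted correctly / wrongly flagged as positive
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # fraction of all predictions that are correct
        "sensitivity": tp / (tp + fn),     # fraction of true sites that are recovered
        "specificity": tn / (tn + fp),     # fraction of non-sites correctly rejected
    }


# Hypothetical counts, invented for the example.
metrics = classification_metrics(tp=30, fp=20, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

When negatives greatly outnumber positives, as is common in site prediction, accuracy tracks specificity closely, which is why sensitivity and specificity are usually reported alongside it.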
Affiliation(s)
- Yu Xue: Hubei Bioinformatics and Molecular Imaging Key Laboratory, Department of Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Zexian Liu: Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China
- Xinjiao Gao: Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China
- Changjiang Jin: Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China
- Longping Wen: Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China
- Xuebiao Yao: Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China
- Jian Ren: Life Sciences School, Sun Yat-sen University (SYSU), Guangzhou, Guangdong, China
13
Wu S, Liu T, Altman RB. Identification of recurring protein structure microenvironments and discovery of novel functional sites around CYS residues. BMC Struct Biol 2010; 10:4. [PMID: 20122268 PMCID: PMC2833161 DOI: 10.1186/1472-6807-10-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/31/2009] [Accepted: 02/02/2010] [Indexed: 11/29/2022]
Abstract
Background: The emergence of structural genomics presents significant challenges in the annotation of biologically uncharacterized proteins. Unfortunately, our ability to analyze these proteins is restricted by the limited catalog of known molecular functions and their associated 3D motifs. Results: In order to identify novel 3D motifs that may be associated with molecular functions, we employ an unsupervised, two-phase clustering approach that combines k-means and hierarchical clustering with knowledge-informed cluster selection and annotation methods. We applied the approach to approximately 20,000 cysteine-based protein microenvironments (3D regions 7.5 Å in radius) and identified 70 interesting clusters, some of which represent known motifs (e.g. metal binding and phosphatase activity), and some of which are novel, including several zinc binding sites. Detailed annotation results are available online for all 70 clusters at http://feature.stanford.edu/clustering/cys. Conclusions: The use of microenvironments instead of backbone geometric criteria enables flexible exploration of protein function space, and detection of recurring motifs that are discontinuous in sequence and diverse in structure. Clustering microenvironments may thus help to functionally characterize novel proteins and better understand the protein structure-function relationship.
Affiliation(s)
- Shirley Wu
- 23andMe, 1390 Shorebird Way, Mountain View, CA, USA
|
14
|
Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 2009; 5:e1000585. [PMID: 19997483 PMCID: PMC2777313 DOI: 10.1371/journal.pcbi.1000585] [Citation(s) in RCA: 285] [Impact Index Per Article: 19.0] [Received: 05/11/2009] [Accepted: 10/30/2009] [Indexed: 11/20/2022] Open
Abstract
Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).

Protein molecules are ubiquitous in the cell; they perform thousands of functions crucial for life. Proteins accomplish nearly all of these functions by interacting with other molecules. These interactions are mediated by specific amino acid positions in the proteins. Knowledge of these “functional sites” is crucial for understanding the molecular mechanisms by which proteins carry out their functions; however, functional sites have not been identified in the vast majority of proteins. Here, we present ConCavity, a computational method that predicts small molecule binding sites in proteins by combining analysis of evolutionary sequence conservation and protein 3D structure. ConCavity provides significant improvement over previous approaches, especially on large, multi-chain proteins. In contrast to earlier methods which only predict entire binding sites, ConCavity makes specific predictions of positions in space that are likely to overlap ligand atoms and of residues that are likely to contact bound ligands. These predictions can be used to aid computational function prediction, to guide experimental protein analysis, and to focus computationally intensive techniques used in drug discovery.
Affiliation(s)
- John A. Capra
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Roman A. Laskowski
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- * E-mail: (MS); (TAF)
- Thomas A. Funkhouser
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- * E-mail: (MS); (TAF)
|
15
|
Nagel K, Jimeno-Yepes A, Rebholz-Schuhmann D. Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb. BMC Bioinformatics 2009; 10 Suppl 8:S4. [PMID: 19758468 PMCID: PMC2745586 DOI: 10.1186/1471-2105-10-s8-s4] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022] Open
Abstract
Background: A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focussed on point mutation extraction methods from biomedical literature which can be used to support the time-consuming work of manual database curation. However, these methods were limited to point mutation extraction and do not extract features for the annotation of proteins at the residue level.
Results: This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the surrounding text. MEDLINE abstract texts have been processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets have been validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Then, contextual features were extracted through shallow and deep parsing and the features have been classified into predefined categories (F1-measure ranges from 0.15 to 0.67). Furthermore, the feature sets have been aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations have been assessed automatically and manually against reference data resources.
Conclusion: This work proposes a solution for the automatic extraction of functional annotation for protein residues from biomedical articles. The presented approach extends existing systems in that a wider range of residue entities is considered and features of residues are extracted as annotations.
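For readers unfamiliar with the F1-measure reported in this abstract, it is the harmonic mean of precision and recall. The sketch below shows the standard calculation; the function name `f1_measure` and the counts passed to it are hypothetical and are not taken from the paper.

```python
def f1_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, from raw counts."""
    predicted = true_pos + false_pos   # everything the system extracted
    actual = true_pos + false_neg      # everything in the reference data
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 50 correct extractions, 50 spurious, 50 missed.
score = f1_measure(true_pos=50, false_pos=50, false_neg=50)
print(score)  # precision = recall = 0.5, so F1 = 0.5
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a system cannot score well by maximizing only precision or only recall.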
Affiliation(s)
- Kevin Nagel
- European Bioinformatics Institute, Hinxton, Cambridge, UK.
|
16
|
Halperin I, Glazer DS, Wu S, Altman RB. The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications. BMC Genomics 2008; 9 Suppl 2:S2. [PMID: 18831785 PMCID: PMC2559884 DOI: 10.1186/1471-2164-9-s2-s2] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 12/03/2022] Open
Abstract
Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence- and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognition of functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework, and describe the recent developments in its use. These include automating training set selection to increase functional coverage, coupling FEATURE to methods that generate structural diversity, such as molecular dynamics simulations and loop modeling, to improve performance, and using FEATURE in large-scale modeling and structure determination efforts.
Affiliation(s)
- Inbal Halperin
- Department of Genetics, 318 Campus Drive, Clark Center S240, Stanford, CA 94305, USA.
|
17
|
Rodrigues APC, Grant BJ, Godzik A, Friedberg I. The 2006 automated function prediction meeting. BMC Bioinformatics 2007; 8 Suppl 4:S1-4. [PMID: 17570143 PMCID: PMC1892079 DOI: 10.1186/1471-2105-8-s4-s1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Affiliation(s)
- Ana PC Rodrigues
- Burnham Institute for Medical Research, 10901 N. Torrey Pines Rd., La Jolla, CA 92037 USA
- Barry J Grant
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093, USA
- Adam Godzik
- Burnham Institute for Medical Research, 10901 N. Torrey Pines Rd., La Jolla, CA 92037 USA
- Center for Research in Biological Systems (CRBS), University of California, San Diego, 9500 Gilman Drive La Jolla, MC 0446 CA 92093, USA
- Iddo Friedberg
- Burnham Institute for Medical Research, 10901 N. Torrey Pines Rd., La Jolla, CA 92037 USA
|