1
Wehrspan ZJ, McDonnell RT, Elcock AH. Identification of Iron-Sulfur (Fe-S) Cluster and Zinc (Zn) Binding Sites Within Proteomes Predicted by DeepMind's AlphaFold2 Program Dramatically Expands the Metalloproteome. J Mol Biol 2022; 434:167377. [PMID: 34838520 PMCID: PMC8785651 DOI: 10.1016/j.jmb.2021.167377] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/12/2021] [Revised: 11/17/2021] [Accepted: 11/18/2021] [Indexed: 02/01/2023]
Abstract
DeepMind's AlphaFold2 software has ushered in a revolution in high quality, 3D protein structure prediction. In very recent work by the DeepMind team, structure predictions have been made for entire proteomes of twenty-one organisms, with >360,000 structures made available for download. Here we show that thousands of novel binding sites for iron-sulfur (Fe-S) clusters and zinc (Zn) ions can be identified within these predicted structures by exhaustive enumeration of all potential ligand-binding orientations. We demonstrate that AlphaFold2 routinely makes highly specific predictions of ligand binding sites: for example, binding sites that are comprised exclusively of four cysteine sidechains fall into three clusters, representing binding sites for 4Fe-4S clusters, 2Fe-2S clusters, or individual Zn ions. We show further: (a) that the majority of known Fe-S cluster and Zn binding sites documented in UniProt are recovered by the AlphaFold2 structures, (b) that there are occasional disputes between AlphaFold2 and UniProt with AlphaFold2 predicting highly plausible alternative binding sites, (c) that the Fe-S cluster binding sites that we identify in E. coli agree well with previous bioinformatics predictions, (d) that cysteines predicted here to be part of ligand binding sites show little overlap with those shown via chemoproteomics techniques to be highly reactive, and (e) that AlphaFold2 occasionally appears to build erroneous disulfide bonds between cysteines that should instead coordinate a ligand. These results suggest that AlphaFold2 could be an important tool for the functional annotation of proteomes, and the methodology presented here is likely to be useful for predicting other ligand-binding sites.
Affiliation(s)
- Adrian H Elcock
- Department of Biochemistry, University of Iowa, Iowa City, IA, USA.
2
Pazos F. Computational prediction of protein functional sites: Applications in biotechnology and biomedicine. Advances in Protein Chemistry and Structural Biology 2022; 130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.
Affiliation(s)
- Florencio Pazos
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Madrid, Spain.
3
Feehan R, Franklin MW, Slusky JSG. Machine learning differentiates enzymatic and non-enzymatic metals in proteins. Nat Commun 2021; 12:3712. [PMID: 34140507 PMCID: PMC8211803 DOI: 10.1038/s41467-021-24070-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 02/16/2021] [Accepted: 06/02/2021] [Indexed: 11/09/2022] Open
Abstract
Metalloenzymes are 40% of all enzymes and can perform all seven classes of enzyme reactions. Because of the physicochemical similarities between the active sites of metalloenzymes and inactive metal binding sites, it is challenging to differentiate between them. Yet distinguishing these two classes is critical for the identification of both native and designed enzymes. Because of similarities between catalytic and non-catalytic metal binding sites, finding physicochemical features that distinguish these two types of metal sites can indicate aspects that are critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metals bound to proteins as enzymatic or non-enzymatic with 92.2% precision and 90.1% recall. Our model scores electrostatic and pocket lining features as more important than pocket volume, despite the fact that volume is the most quantitatively different feature between enzyme and non-enzymatic sites. Finally, we find our model has overall better performance in a side-to-side comparison against other methods that differentiate enzymatic from non-enzymatic sequences. We anticipate that our model's ability to correctly identify which metal sites are responsible for enzymatic activity could enable identification of new enzymatic mechanisms and de novo enzyme design.
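The precision and recall figures quoted in this abstract follow the standard definitions. As a minimal sketch (the confusion-matrix counts and the function name below are hypothetical, chosen only for illustration, and are not taken from the paper), the two metrics can be computed as follows:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) for the positive class (here: 'enzymatic')."""
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of actual positives that the classifier recovers.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not the paper's data).
    p, r = precision_recall(tp=90, fp=10, fn=30)
    print(f"precision={p:.3f} recall={r:.3f}")
```

Reporting both numbers matters because a classifier can trade one for the other; a single accuracy value would hide that trade-off when the two classes are unevenly represented.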
Affiliation(s)
- Ryan Feehan
- Center for Computational Biology, The University of Kansas, Lawrence, KS, USA
- Meghan W Franklin
- Center for Computational Biology, The University of Kansas, Lawrence, KS, USA
- Joanna S G Slusky
- Center for Computational Biology, The University of Kansas, Lawrence, KS, USA.
- Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA.
4
Cao JR, Fan FF, Lv CJ, Wang HP, Li Y, Hu S, Zhao WR, Chen HB, Huang J, Mei LH. Improving the Thermostability and Activity of Transaminase From Aspergillus terreus by Charge-Charge Interaction. Front Chem 2021; 9:664156. [PMID: 33937200 PMCID: PMC8081293 DOI: 10.3389/fchem.2021.664156] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 02/05/2021] [Accepted: 03/23/2021] [Indexed: 11/13/2022] Open
Abstract
Transaminases that promote the amination of ketones into amines are an emerging class of biocatalysts for preparing a series of drugs and their intermediates. One of the main limitations of (R)-selective amine transaminase from Aspergillus terreus (At-ATA) is its weak thermostability, with a half-life (t1/2) of only 6.9 min at 40°C. To improve its thermostability, four important residue sites (E133, D224, E253, and E262) located on the surface of At-ATA were identified using the enzyme thermal stability system (ETSS). Subsequently, 13 mutants (E133A, E133H, E133K, E133R, E133Q, D224A, D224H, D224K, D224R, E253A, E253H, E253K, and E262A) were constructed by site-directed mutagenesis according to the principle of turning the residues into opposite charged ones. Among them, three substitutions, E133Q, D224K, and E253A, displayed higher thermal stability than the wild-type enzyme. Molecular dynamics simulations indicated that these three mutations limited the random vibration amplitude in the two α-helix regions of 130-135 and 148-158, thereby increasing the rigidity of the protein. Compared to the wild-type, the best mutant, D224K, showed improved thermostability with a 4.23-fold increase in t1/2 at 40°C, and 6.08°C increase in T50^10. Exploring the three-dimensional structure of D224K at the atomic level, three strong hydrogen bonds were added to form a special "claw structure" of the α-helix 8, and the residues located at 151-156 also stabilized the α-helix 9 by interacting with each other alternately.
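To put the half-life figures in this abstract on a common footing, here is a minimal sketch that assumes simple first-order thermal inactivation. That kinetic model is an assumption made for illustration; the abstract does not state it. The function and constant names are hypothetical, and only the 6.9 min wild-type half-life and the 4.23-fold factor come from the abstract.

```python
import math


def inactivation_rate(half_life_min: float) -> float:
    """First-order rate constant k (1/min) from a half-life: k = ln(2) / t_half."""
    return math.log(2) / half_life_min


def residual_activity(t_min: float, half_life_min: float) -> float:
    """Fraction of initial activity left after t_min minutes: exp(-k * t)."""
    return math.exp(-inactivation_rate(half_life_min) * t_min)


# Values taken from the abstract.
WT_HALF_LIFE_MIN = 6.9       # wild-type half-life at 40 °C
FOLD_INCREASE_D224K = 4.23   # reported fold increase for D224K

# Implied half-life of D224K under the first-order assumption (about 29.2 min).
D224K_HALF_LIFE_MIN = WT_HALF_LIFE_MIN * FOLD_INCREASE_D224K

if __name__ == "__main__":
    for name, t_half in [("wild-type", WT_HALF_LIFE_MIN), ("D224K", D224K_HALF_LIFE_MIN)]:
        left = residual_activity(30.0, t_half)
        print(f"{name}: t1/2 = {t_half:.1f} min, activity after 30 min = {left:.1%}")
```

Under this assumption, a fold change in half-life maps directly onto the same fold reduction in the inactivation rate constant, which is why the two variants can be compared with a single number.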
Affiliation(s)
- Jia-Ren Cao
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Fang-Fang Fan
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Chang-Jiang Lv
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Hong-Peng Wang
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Ye Li
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Sheng Hu
- School of Biotechnology and Chemical Engineering, NingboTech University, Ningbo, China
- Wei-Rui Zhao
- School of Biotechnology and Chemical Engineering, NingboTech University, Ningbo, China
- Hai-Bin Chen
- Enzymaster (Ningbo) Bio-Engineering Co., Ltd., Ningbo, China
- Jun Huang
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Le-He Mei
- School of Biotechnology and Chemical Engineering, NingboTech University, Ningbo, China; Jinhua Advanced Research Institute, Jinhua, China; Department of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China
5
Barile A, Nogués I, di Salvo ML, Bunik V, Contestabile R, Tramonti A. Molecular characterization of pyridoxine 5'-phosphate oxidase and its pathogenic forms associated with neonatal epileptic encephalopathy. Sci Rep 2020; 10:13621. [PMID: 32788630 PMCID: PMC7424515 DOI: 10.1038/s41598-020-70598-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/28/2020] [Accepted: 07/31/2020] [Indexed: 01/18/2023] Open
Abstract
Defects of vitamin B6 metabolism are responsible for severe neurological disorders, such as pyridoxamine 5'-phosphate oxidase deficiency (PNPOD; OMIM: 610090), an autosomal recessive inborn error of metabolism that usually manifests with neonatal-onset severe seizures and subsequent encephalopathy. At present, 27 pathogenic mutations of the gene encoding human PNPO are known, 13 of which are homozygous missense mutations; however, only 3 of them have been characterised with respect to the molecular and functional properties of the variant enzyme forms. Moreover, studies on wild type and variant human PNPOs have so far largely ignored the regulation properties of this enzyme. Here, we present a detailed characterisation of the inhibition mechanism of PNPO by pyridoxal 5'-phosphate (PLP), the reaction product of the enzyme. Our study reveals that human PNPO has an allosteric PLP binding site that plays a crucial role in the enzyme regulation and therefore in the regulation of vitamin B6 metabolism in humans. Furthermore, we have produced, recombinantly expressed and characterised several PNPO pathogenic variants responsible for PNPOD (G118R, R141C, R225H, R116Q/R225H, and X262Q). Such replacements mainly affect the catalytic activity of PNPO and binding of the enzyme substrate and FMN cofactor, leaving the allosteric properties unaltered.
Affiliation(s)
- Anna Barile
- Istituto di Biologia e Patologia Molecolari, Consiglio Nazionale delle Ricerche, Rome, Italy; Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, Rome, Italy
- Isabel Nogués
- Istituto di Ricerca sugli Ecosistemi Terrestri, Consiglio Nazionale delle Ricerche, 00015, Monterotondo, Rome, Italy
- Martino L di Salvo
- Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, Rome, Italy
- Victoria Bunik
- Belozersky Institute of Physico-Chemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119991, Russia; Department of Biochemistry, Sechenov University, Trubetskaya, 8/2, Moscow, 119991, Russia
- Roberto Contestabile
- Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, Rome, Italy.
- Angela Tramonti
- Istituto di Biologia e Patologia Molecolari, Consiglio Nazionale delle Ricerche, Rome, Italy; Dipartimento di Scienze Biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, Rome, Italy.
6
Kim DM, Yao X, Vanam RP, Marlow MS. Measuring the effects of macromolecular crowding on antibody function with biolayer interferometry. MAbs 2019; 11:1319-1330. [PMID: 31401928 PMCID: PMC6748605 DOI: 10.1080/19420862.2019.1647744] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022] Open
Abstract
Biotherapeutic proteins are commonly dosed at high concentrations into the blood, which is an inherently complex, crowded solution with substantial protein content. The effects of macromolecular crowding may lead to an appreciable level of non-specific hetero-association in this physiological environment. Therefore, developing a method to characterize the diverse consequences of non-specific interactions between proteins under such non-ideal, crowded conditions, which deviate substantially from those commonly employed for in vitro characterization, is vital to achieving a more complete picture of antibody function in a biological context. In this study, we investigated non-specific interactions between human serum albumin (HSA) and two monoclonal antibodies (mAbs) by static light scattering and determined these interactions are both ionic strength-dependent and mAb-dependent. Using biolayer interferometry (BLI), we assessed the effect of HSA on antigen binding by mAbs, demonstrating that these non-specific interactions have a functional impact on mAb:antigen interactions, particularly at low ionic strength. While this effect is mitigated at physiological ionic strength, our in vitro data support the notion that HSA in the blood may lead to non-specific interactions with mAbs in vivo, with a potential impact on their interactions with antigen. Furthermore, the BLI method offers a high-throughput advantage compared to orthogonal techniques such as analytical ultracentrifugation and is amenable to a greater variety of solution conditions compared to nuclear magnetic resonance spectroscopy. Our study demonstrates that BLI is a viable technology for examining the impact of non-specific interactions on specific biologically relevant interactions, providing a direct method to assess binding events in crowded conditions.
Affiliation(s)
- Dorothy M Kim
- Pre-Clinical Development and Protein Chemistry, Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA
- Xiao Yao
- Pre-Clinical Development and Protein Chemistry, Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA
- Ram P Vanam
- Pre-Clinical Development and Protein Chemistry, Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA
- Michael S Marlow
- Pre-Clinical Development and Protein Chemistry, Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA; Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, USA
7
Pahari S, Sun L, Alexov E. PKAD: a database of experimentally measured pKa values of ionizable groups in proteins. Database: The Journal of Biological Databases and Curation 2019; 2019:5359213. [PMID: 30805645 PMCID: PMC6389863 DOI: 10.1093/database/baz024] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Received: 10/26/2018] [Revised: 01/11/2019] [Accepted: 01/30/2019] [Indexed: 11/14/2022]
Abstract
Ionizable residues play key roles in many biological phenomena including protein folding, enzyme catalysis and binding. We present PKAD, a database of experimentally measured pKas of protein residues reported in the literature or taken from existing databases. The database contains pKa data for 1350 residues in 157 wild-type proteins and for 232 residues in 45 mutant proteins. Most of these values are for Asp, Glu, His and Lys amino acids. The database is available as a downloadable file as well as through a web server (http://compbio.clemson.edu/pkad). The PKAD database can be used as a benchmarking source for the development and improvement of pKa prediction methods. The web server provides additional information taken from the corresponding structures and amino acid sequences, which allows for easy search and grouping of the experimental pKas according to various biophysical characteristics, amino acid type and others.
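As background on why measured pKa values matter, the Henderson-Hasselbalch relation links a group's pKa to its protonation state at a given pH. The sketch below is illustrative only: the function name is hypothetical and the pKa values in the usage example are placeholders, not entries from PKAD.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of an ionizable group that is protonated at a given pH.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Placeholder pKa values for illustration; real values vary by residue
    # environment, which is what an experimental database records.
    for label, pka in [("site A", 4.0), ("site B", 6.5), ("site C", 10.5)]:
        print(f"{label}: pKa={pka}, protonated at pH 7.0 = {protonated_fraction(7.0, pka):.3f}")
```

A shift of one pKa unit changes the protonated-to-deprotonated ratio tenfold, so even modest prediction errors can flip the dominant charge state of a residue near physiological pH.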
Affiliation(s)
- Swagata Pahari
- Computational Biophysics and Bioinformatics, Department of Physics and Astronomy, Clemson University, Clemson, South Carolina, USA
- Lexuan Sun
- Computational Biophysics and Bioinformatics, Department of Physics and Astronomy, Clemson University, Clemson, South Carolina, USA
- Emil Alexov
- Computational Biophysics and Bioinformatics, Department of Physics and Astronomy, Clemson University, Clemson, South Carolina, USA
8
Upadhyay R, Kim JY, Hong EY, Lee SG, Seo JH, Kim BG. RiSLnet: Rapid identification of smart mutant libraries using protein structure network. Application to thermal stability enhancement. Biotechnol Bioeng 2018; 116:250-259. [PMID: 30414290 DOI: 10.1002/bit.26861] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/05/2018] [Revised: 10/24/2018] [Accepted: 11/07/2018] [Indexed: 01/22/2023]
Abstract
A key point of protein stability engineering is to identify specific target residues whose mutations can stabilize the protein structure without negatively affecting the function or activity of the protein. Here, we propose a method called RiSLnet (Rapid identification of Smart mutant Library using residue network) to identify such residues by combining network analysis for protein residue interactions, identification of conserved residues, and evaluation of relative solvent accessibility. To validate its performance, the method was applied to four proteins, that is, T4 lysozyme, ribonuclease H, barnase, and cold shock protein B. Our method predicted beneficial mutations in thermal stability with ~62% average accuracy when the thermal stability of the mutants was compared with the ones in the Protherm database. It was further applied to lysine decarboxylase (CadA) to experimentally confirm its accuracy and effectiveness. RiSLnet identified mutations increasing the thermal stability of CadA with the accuracy of ~60% and significantly reduced the number of candidate residues (~99%) for mutation. Finally, combinatorial mutations designed by RiSLnet and in silico saturation mutagenesis yielded a thermally stable triple mutant with the half-life (T1/2) of 114.9 min at 58°C, which is approximately twofold higher than that of the wild-type.
Affiliation(s)
- Roopali Upadhyay
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
- Jin Young Kim
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
- Eun Young Hong
- School of Chemical and Biological Engineering Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
- Sun-Gu Lee
- Department of Chemical and Biochemical Engineering, Pusan National University, Busan, Republic of Korea
- Joo-Hyun Seo
- Department of BT-Convergent Pharmaceutical Engineering, Sunmoon University, Asan, Republic of Korea
- Byung-Gee Kim
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea; School of Chemical and Biological Engineering Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea; Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
9
Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018; 19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023] Open
Abstract
Background: Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. Results: In this paper, we present a sequence-based method for the prediction of physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. This method was applied to 13 target proteins from the very recent Critical Assessment of Structure Prediction (CASP10/11), and our calculations gave a Matthews correlation coefficient (MCC) value of 0.66 for PFS prediction and an 80% recall in the prediction of the associated physicochemical interactions. Conclusions: Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved in the evolution of proteins. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49.
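The Matthews correlation coefficient reported in this abstract is a standard summary of a binary confusion matrix. As a minimal sketch (the function name and the counts are hypothetical and are not the paper's data), it can be computed as follows:

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal sum is zero, a common convention for
    the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical per-residue counts: functional-site residues are rare,
    # so true negatives dominate the matrix.
    print(f"MCC = {matthews_corrcoef(tp=20, tn=260, fp=10, fn=10):.3f}")
```

MCC is a natural choice for functional-site prediction because site residues are usually a small minority of a sequence, and MCC stays informative under that class imbalance where plain accuracy does not.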
Affiliation(s)
- Min Han
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Yifan Song
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Jiaqiang Qian
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangsu, 211816, Nanjing, People's Republic of China.
10
Kalaivani R, Reema R, Srinivasan N. Recognition of sites of functional specialisation in all known eukaryotic protein kinase families. PLoS Comput Biol 2018; 14:e1005975. [PMID: 29438395 PMCID: PMC5826538 DOI: 10.1371/journal.pcbi.1005975] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/05/2017] [Revised: 02/26/2018] [Accepted: 01/13/2018] [Indexed: 11/25/2022] Open
Abstract
The conserved function of protein phosphorylation, catalysed by members of protein kinase superfamily, is regulated in different ways in different kinase families. Further, differences in activating triggers, cellular localisation, domain architecture and substrate specificity between kinase families are also well known. While the transfer of γ-phosphate from ATP to the hydroxyl group of Ser/Thr/Tyr is mediated by a conserved Asp, the characteristic functional and regulatory sites are specialized at the level of families or sub-families. Such family-specific sites of functional specialization are unknown for most families of kinases. In this work, we systematically identify the family-specific residue features by comparing the extent of conservation of physicochemical properties, Shannon entropy and statistical probability of residue distributions between families of kinases. An integrated discriminatory score, which combines these three features, is developed to demarcate the functionally specialized sites in a kinase family from other sites. We achieved an area under ROC curve of 0.992 for the discrimination of kinase families. Our approach was extensively tested on well-studied families CDK and MAPK, wherein specific protein interaction sites and substrate recognition sites were successfully detected (p-value < 0.05). We also find that the known family-specific oncogenic driver mutation sites were scored high by our method. The method was applied to all known kinases encompassing 107 families from diverse eukaryotic organisms leading to a comprehensive list of family-specific functional sites. Apart from other uses, our method facilitates identification of specific protein interaction sites and drug target sites in a kinase family.
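One of the three features named in this abstract is the Shannon entropy of residue distributions. The sketch below shows only the standard per-column entropy for a multiple sequence alignment; how the authors combine it with the other two features into their integrated score is not described in the abstract and is not reproduced here. The function name and the example columns are hypothetical, and gap characters are treated like any other symbol.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column.

    H = -sum(p_i * log2(p_i)) over the distinct residues in the column.
    Low entropy means the position is conserved.
    """
    if not column:
        raise ValueError("column must contain at least one residue")
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical columns: the same alignment position in two kinase families.
    family_a = "KKKKKKKK"   # fully conserved within family A
    family_b = "KRQEKRQE"   # variable within family B
    print(f"family A: {abs(column_entropy(family_a)):.2f} bits")
    print(f"family B: {column_entropy(family_b):.2f} bits")
```

A position that is conserved within one family but variable, or conserved as a different residue, in the others is the kind of signal a family-specific score is meant to pick up.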
Affiliation(s)
- Raju Kalaivani
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Raju Reema
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
11
Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, Carter L, Ravichandran R, Mulligan VK, Chevalier A, Arrowsmith CH, Baker D. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 2017; 357:168-175. [PMID: 28706065 PMCID: PMC5568797 DOI: 10.1126/science.aan0693] [Citation(s) in RCA: 296] [Impact Index Per Article: 49.3] [Received: 02/28/2017] [Accepted: 06/09/2017] [Indexed: 12/18/2022]
Abstract
Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are "encoded" in the thousands of known protein structures, "decoding" them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2500 stable designed proteins in four basic folds, a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and experiment and has the potential to transform computational protein design into a data-driven science.
Affiliation(s)
- Gabriel J Rocklin
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Tamuka M Chidyausiku
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA; Graduate Program in Biological Physics, Structure, and Design, University of Washington, Seattle, WA 98195, USA
- Inna Goreshnik
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Alex Ford
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA; Graduate Program in Biological Physics, Structure, and Design, University of Washington, Seattle, WA 98195, USA
- Scott Houliston
- Princess Margaret Cancer Centre, Toronto, Ontario M5G 1L7, Canada; Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Alexander Lemak
- Princess Margaret Cancer Centre, Toronto, Ontario M5G 1L7, Canada
- Lauren Carter
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Rashmi Ravichandran
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Vikram K Mulligan
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Aaron Chevalier
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Cheryl H Arrowsmith
- Princess Margaret Cancer Centre, Toronto, Ontario M5G 1L7, Canada; Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
12
McCafferty CL, Sergeev YV. Global computational mutagenesis provides a critical stability framework in protein structures. PLoS One 2017; 12:e0189064. [PMID: 29216252 PMCID: PMC5720693 DOI: 10.1371/journal.pone.0189064] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/21/2017] [Accepted: 11/17/2017] [Indexed: 11/20/2022] Open
Abstract
A protein’s amino acid sequence dictates the folds and final structure the macromolecule will form. We propose that by identifying critical residues in a protein’s atomic structure, we can select a critical stability framework within the protein structure essential to proper protein folding. We use global computational mutagenesis based on the unfolding mutation screen to test the effect of every possible missense mutation on the protein structure to identify the residues that cannot tolerate a substitution without causing protein misfolding. This method was tested using molecular dynamics to simulate the stability effects of mutating critical residues in proteins involved in inherited disease, such as myoglobin, p53, and the 15th sushi domain of complement factor H. In addition, we prove that when the critical residues are in place, other residues may be changed within the structure without a stability loss. We validate that critical residues are conserved using myoglobin to show that critical residues are the same for crystal structures of 6 different species and comparing conservation indices to critical residues in 9 eye disease-related proteins. Our studies demonstrate that by using a selection of critical elements in a protein structure we can identify a critical protein stability framework. The frame of critical residues can be used in genetic engineering to improve small molecule binding for drug studies, identify loss-of-function disease-causing missense mutations in genetics studies, and aid in identifying templates for homology modeling.
Affiliation(s)
- Caitlyn L. McCafferty
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Yuri V. Sergeev
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
13
Glantz-Gashai Y, Meirson T, Samson AO. Normal Modes Expose Active Sites in Enzymes. PLoS Comput Biol 2016; 12:e1005293. [PMID: 28002427 PMCID: PMC5225006 DOI: 10.1371/journal.pcbi.1005293] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/27/2015] [Revised: 01/10/2017] [Accepted: 12/07/2016] [Indexed: 01/10/2023]
Abstract
Accurate prediction of active sites is an important tool in bioinformatics. Here we present an improved structure-based technique to expose active sites that is based on large changes of solvent accessibility accompanying normal mode dynamics. The technique, which detects EXPOsure of active SITes through normal modEs, is named EXPOSITE. The technique is trained using a small 133 enzyme dataset and tested using a large 845 enzyme dataset, both with known active site residues. EXPOSITE is also tested in a benchmark protein ligand dataset (PLD) comprising 48 proteins with and without bound ligands. EXPOSITE is shown to successfully locate the active site in most instances, and is found to be more accurate than other structure-based techniques. Interestingly, in several instances, the active site does not correspond to the largest pocket. EXPOSITE is advantageous due to its high precision and paves the way for structure-based prediction of active sites in enzymes.
In this paper, we present an improved technique to predict active sites in enzymes. Our technique is based on changes of solvent accessibility that accompany normal mode dynamics. We assess the strength of the technique using several enzyme datasets with known catalytic residues. We show the technique successfully locates the active site in most cases, and consistently surpasses the accuracy of other techniques. We show how the technique is advantageous and paves the way for high precision prediction of active sites.
Affiliation(s)
- Tomer Meirson
- Faculty of Medicine in the Galilee, Bar Ilan University, Safed, Israel
- Abraham O. Samson
- Faculty of Medicine in the Galilee, Bar Ilan University, Safed, Israel
14
Au L, Green DF. Direct Calculation of Protein Fitness Landscapes through Computational Protein Design. Biophys J 2016; 110:75-84. [PMID: 26745411 DOI: 10.1016/j.bpj.2015.11.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/25/2015] [Revised: 11/03/2015] [Accepted: 11/16/2015] [Indexed: 11/24/2022]
Abstract
Naturally selected amino-acid sequences or experimentally derived ones are often the basis for understanding how protein three-dimensional conformation and function are determined by primary structure. Such sequences for a protein family comprise only a small fraction of all possible variants, however, representing the fitness landscape with limited scope. Explicitly sampling and characterizing alternative, unexplored protein sequences would directly identify fundamental reasons for sequence robustness (or variability), and we demonstrate that computational methods offer an efficient mechanism toward this end, on a large scale. The dead-end elimination and A* search algorithms were used here to find all low-energy single mutant variants, and corresponding structures of a G-protein heterotrimer, to measure changes in structural stability and binding interactions to define a protein fitness landscape. We established consistency between these algorithms with known biophysical and evolutionary trends for amino-acid substitutions, and could thus recapitulate known protein side-chain interactions and predict novel ones.
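For readers unfamiliar with dead-end elimination, a minimal sketch of the classical single-rotamer criterion follows. It assumes the original formulation (a rotamer r at position i is discarded when its best-case energy is still worse than the worst-case energy of some competitor t at the same position); the abstract does not say which variant the authors used. The function names and the toy energies are hypothetical, and the sketch does a single pass rather than iterating to convergence.

```python
def pair(pair_energy, i, r, j, s):
    """Look up the pairwise energy of rotamer r at i and rotamer s at j, in either key order."""
    if (i, j) in pair_energy:
        return pair_energy[(i, j)][r][s]
    return pair_energy[(j, i)][s][r]


def dead_ending_rotamers(self_energy, pair_energy):
    """Return the (position, rotamer) pairs removed by one pass of the classical criterion."""
    positions = range(len(self_energy))
    eliminated = set()
    for i in positions:
        others = [j for j in positions if j != i]
        rotamers = range(len(self_energy[i]))
        for r in rotamers:
            # Best case for r: self energy plus the lowest pair energy at every other position.
            best_r = self_energy[i][r] + sum(
                min(pair(pair_energy, i, r, j, s) for s in range(len(self_energy[j])))
                for j in others
            )
            for t in rotamers:
                if t == r:
                    continue
                # Worst case for t: self energy plus the highest pair energy at every other position.
                worst_t = self_energy[i][t] + sum(
                    max(pair(pair_energy, i, t, j, s) for s in range(len(self_energy[j])))
                    for j in others
                )
                if best_r > worst_t:
                    eliminated.add((i, r))
                    break
    return eliminated


# Toy energies (arbitrary units) for two positions with two rotamers each.
self_energy = [[0.0, 10.0], [0.0, 1.0]]
pair_energy = {(0, 1): [[0.0, 1.0], [0.5, 2.0]]}
eliminated = dead_ending_rotamers(self_energy, pair_energy)
print(sorted(eliminated))
```

Whatever survives elimination is the reduced search space that an A* search would then explore for low-energy combinations.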
Affiliation(s)
- Loretta Au
- Department of Statistics, The University of Chicago, Chicago, Illinois.
- David F Green
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York
15
Oda H, Ota M, Toh H. Profile comparison revealed deviation from structural constraint at the positively selected sites. Biosystems 2016; 147:67-77. [PMID: 27443483 DOI: 10.1016/j.biosystems.2016.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2015] [Revised: 07/13/2016] [Accepted: 07/16/2016] [Indexed: 11/18/2022]
Abstract
The amino acid substitutions at a site are affected by a mixture of various constraints. It is also known that amino acid substitutions are accelerated at sites under positive selection. However, the relationship between the substitutions at positively selected sites and the constraints has not been thoroughly examined. Advances in computational biology have enabled us to divide the mixture of constraints into the structural constraint and the remainder by using the amino acid sequences and the tertiary structures; the remainder is expressed as the deviation of the mixture of constraints from the structural constraint. Here, two types of profiles, or matrices with the size of 20 x (site length), are compared. One of the profiles represents the mixture of constraints, and is generated from a multiple amino acid sequence alignment, whereas the other is designed to represent the structural constraints. We applied the profile comparison method to proteins under positive selection to examine the relationship between positive selection and constraints. The results suggested that the constraint at a site under positive selection tends to deviate from the structural constraint at the site.
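The sequence-derived profile mentioned in this abstract, a matrix of size 20 x (site length) built from a multiple alignment, can be sketched as a table of per-site residue frequencies. This is an illustration only: the function name `alignment_profile`, the pseudocount, the choice to ignore gap characters, and the toy alignment are all assumptions, and the structure-derived profile from the paper is not reproduced.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # row order of the profile


def alignment_profile(alignment, pseudocount=1.0):
    """Return a 20 x L matrix of residue frequencies (rows: amino acids, columns: sites)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = [[0.0] * length for _ in AMINO_ACIDS]
    for site in range(length):
        # Count only standard residues; gaps and unknown symbols are skipped.
        counts = Counter(seq[site] for seq in alignment if seq[site] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        for row, residue in enumerate(AMINO_ACIDS):
            profile[row][site] = (counts[residue] + pseudocount) / total
    return profile


msa = ["ACD", "ACE", "A-D"]  # toy alignment; '-' marks a gap
profile = alignment_profile(msa)
print(len(profile), len(profile[0]))  # rows x columns
```

Comparing two such matrices column by column (for example with a per-site distance) is one way to express how far the observed substitutions at a site deviate from a structural expectation.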
Affiliation(s)
- Hiroyuki Oda
- Graduate School of Systems Life Sciences, Kyushu University, 744 Motooka Nishi-ku, Fukuoka 819-0395, Japan.
- Motonori Ota
- Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya City, Aichi 464-8601, Japan
- Hiroyuki Toh
- Department of Biomedical Chemistry, School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo 669-1337, Japan
16
Arcus VL, Prentice EJ, Hobbs JK, Mulholland AJ, Van der Kamp MW, Pudney CR, Parker EJ, Schipper LA. On the Temperature Dependence of Enzyme-Catalyzed Rates. Biochemistry 2016; 55:1681-8. [DOI: 10.1021/acs.biochem.5b01094] [Citation(s) in RCA: 175] [Impact Index Per Article: 21.9] [Indexed: 02/02/2023]
Affiliation(s)
- Vickery L. Arcus
- School of Science, University of Waikato, Hamilton 3240, New Zealand
- Erica J. Prentice
- School of Science, University of Waikato, Hamilton 3240, New Zealand
- Joanne K. Hobbs
- School of Science, University of Waikato, Hamilton 3240, New Zealand
- Christopher R. Pudney
- Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
- Emily J. Parker
- Biomolecular Interaction Centre and Department of Chemistry, University of Canterbury, Christchurch 8041, New Zealand
- Louis A. Schipper
- School of Science, University of Waikato, Hamilton 3240, New Zealand
17
De Laet M, Gilis D, Rooman M. Stability strengths and weaknesses in protein structures detected by statistical potentials: Application to bovine seminal ribonuclease. Proteins 2015; 84:143-58. [DOI: 10.1002/prot.24962] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/15/2015] [Revised: 10/27/2015] [Accepted: 11/09/2015] [Indexed: 11/10/2022]
Affiliation(s)
- Marie De Laet
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
- Dimitri Gilis
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
- Marianne Rooman
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
18
Aubailly S, Piazza F. Cutoff lensing: predicting catalytic sites in enzymes. Sci Rep 2015; 5:14874. [PMID: 26445900 PMCID: PMC4597221 DOI: 10.1038/srep14874] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/28/2015] [Accepted: 09/10/2015] [Indexed: 01/12/2023]
Abstract
Predicting function-related amino acids in proteins with unknown function or unknown allosteric binding sites in drug-targeted proteins is a task of paramount importance in molecular biomedicine. In this paper we introduce a simple, light and computationally inexpensive structure-based method to identify catalytic sites in enzymes. Our method, termed cutoff lensing, is a general procedure consisting in letting the cutoff used to build an elastic network model increase to large values. A validation of our method against a large database of annotated enzymes shows that optimal values of the cutoff exist such that three different structure-based indicators allow one to recover a maximum of the known catalytic sites. Interestingly, we find that the larger the structures the greater the predictive power afforded by our method. Possible ways to combine the three indicators into a single figure of merit and into a specific sequential analysis are suggested and discussed with reference to the classic case of HIV-protease. Our method could be used as a complement to other sequence- and/or structure-based methods to narrow the results of large-scale screenings.
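The central idea above, letting the cutoff of an elastic network model grow, can be illustrated by rebuilding the network's connectivity matrix at several cutoffs. The sketch below assumes a Gaussian-network-style Kirchhoff matrix, which is one common elastic network variant; the abstract does not specify the model or the three indicators, so none of those are reproduced. The function name and the toy coordinates are hypothetical.

```python
import math


def kirchhoff_matrix(coords, cutoff):
    """Connectivity matrix: -1 for node pairs within `cutoff`, node degree on the diagonal."""
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Toy coordinates (in angstroms) standing in for C-alpha positions; not from a real structure.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

degrees_by_cutoff = {}
for cutoff in (4.0, 8.0, 12.0):
    gamma = kirchhoff_matrix(coords, cutoff)
    degrees_by_cutoff[cutoff] = [gamma[i][i] for i in range(len(coords))]
    print(cutoff, degrees_by_cutoff[cutoff])
```

As the cutoff increases, every node gains neighbours and the network approaches full connectivity; a structure-based indicator would be recomputed at each cutoff to see which residues stand out.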
Affiliation(s)
- Simon Aubailly
- Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
- Francesco Piazza
- Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
19
Ranganathan A, Dror RO, Carlsson J. Insights into the Role of Asp79(2.50) in β2 Adrenergic Receptor Activation from Molecular Dynamics Simulations. Biochemistry 2014; 53:7283-96. [DOI: 10.1021/bi5008723] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Indexed: 12/21/2022]
Affiliation(s)
- Anirudh Ranganathan
- Science for Life Laboratory, Box 1031, SE-171 21 Solna, Sweden
- Department of Biochemistry and Biophysics and Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden
- Ron O. Dror
- Department of Computer Science, Department of Molecular and Cellular Physiology, and Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
- Jens Carlsson
- Science for Life Laboratory, Box 1031, SE-171 21 Solna, Sweden
- Department of Biochemistry and Biophysics and Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden
20
Angelucci F, Morea V, Angelaccio S, Saccoccia F, Contestabile R, Ilari A. The crystal structure of archaeal serine hydroxymethyltransferase reveals idiosyncratic features likely required to withstand high temperatures. Proteins 2014; 82:3437-49. [PMID: 25257552 DOI: 10.1002/prot.24697] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/05/2014] [Revised: 09/09/2014] [Accepted: 09/10/2014] [Indexed: 01/19/2023]
Abstract
Serine hydroxymethyltransferases (SHMTs) play an essential role in one-carbon unit metabolism and are used in biomimetic reactions. We determined the crystal structure of free (apo) and pyridoxal-5'-phosphate-bound (holo) SHMT from Methanocaldococcus jannaschii, the first from a hyperthermophile, from the archaea domain of life and that uses H₄MPT as a cofactor, at 2.83 and 3.0 Å resolution, respectively. Idiosyncratic features were observed that are likely to contribute to structure stabilization. At the dimer interface, the C-terminal region folds in a unique fashion with respect to SHMTs from eubacteria and eukarya. At the active site, the conserved tyrosine does not make a cation-π interaction with an arginine like that observed in all other SHMT structures, but establishes an amide-aromatic interaction with Asn257, at a different sequence position. This asparagine residue is conserved and occurs almost exclusively in (hyper)thermophile SHMTs. This led us to formulate the hypothesis that removal of frustrated interactions (such as the Arg-Tyr cation-π interaction occurring in mesophile SHMTs) is an additional strategy of adaptation to high temperature. Both peculiar features may be tested by designing enzyme variants potentially endowed with improved stability for applications in biomimetic processes.
Affiliation(s)
- Francesco Angelucci
- Department of Life, Health and Environmental Sciences, University of L'Aquila, P.le Salvatore Tommasi 1, L'Aquila, Italy
21
Chen BY. VASP-E: specificity annotation with a volumetric analysis of electrostatic isopotentials. PLoS Comput Biol 2014; 10:e1003792. [PMID: 25166865 PMCID: PMC4148194 DOI: 10.1371/journal.pcbi.1003792] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 02/21/2014] [Accepted: 06/17/2014] [Indexed: 12/01/2022]
Abstract
Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins. 
Proteins, the ubiquitous worker molecules of the cell, are a diverse class of molecules that perform very specific tasks. Understanding how proteins achieve specificity is a critical step towards understanding biological systems and a key prerequisite for rationally engineering new proteins. To examine electrostatic influences on specificity in proteins, this paper presents VASP-E, a software tool that generates solid representations of the electrostatic potential fields that surround proteins. VASP-E compares solids with constructive solid geometry, a class of techniques developed first for modeling complex machine parts. We observed that solid representations could quantify the degree of charge complementarity in protein-protein interactions and identify key residues that strengthen or weaken them. VASP-E correctly identified amino acids with established experimental influences on protein-protein binding specificity. We also observed that solid representations of electrostatic fields could identify electrostatic conservations and variations that relate to similarities and differences in binding specificity between proteins and small molecules.
Affiliation(s)
- Brian Y. Chen
- Department of Computer Science and Engineering, P.C. Rossin College of Engineering and Applied Sciences, Lehigh University, Bethlehem, Pennsylvania, United States of America
22
Zhang L, Tang X, Cui D, Yao Z, Gao B, Jiang S, Yin B, Yuan YA, Wei D. A method to rationally increase protein stability based on the charge-charge interaction, with application to lipase LipK107. Protein Sci 2013; 23:110-6. [PMID: 24353171 DOI: 10.1002/pro.2388] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 08/22/2013] [Revised: 10/14/2013] [Accepted: 10/21/2013] [Indexed: 11/10/2022]
Abstract
We report an enzyme redesign protocol based on the calculation of surface charge-charge interactions, which can potentially be applied to improve the stability of an enzyme without compromising its catalytic activity. Together with the experimental validation, we have released an enzyme redesign algorithm, Enzyme Thermal Stability System, written based on our model, for open access to meet the needs of wet labs. LipK107, a lipase of versatile industrial use, was chosen to test our software. Our calculation determined that four residues, D113, D149, D213, and D253, located on the surface of LipK107 were critical to the stability of the enzyme. The model was validated with mutagenesis at these four residues followed by stability and activity tests. LipK107 mutants D113A and D149K were more resistant to thermal inactivation with ∼10°C higher half-inactivation temperature than wild-type LipK107. Moreover, mutant D149K exhibited significant retention in residual activity under constant heat, showing a 14-fold increase in the half-inactivation time at 50°C. Activity tests showed that these mutants retained equal or higher specific activity, among which noteworthy was the mutant D253A with as much as 20% higher activity. We suggest that our protocol could be used as a general guideline to redesign protein enzymes with increased stabilities and enhanced activities.
Affiliation(s)
- Lujia Zhang
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, People's Republic of China; State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing University of Technology, Nanjing, Jiangsu 211800, People's Republic of China
23
Jackson EL, Ollikainen N, Covert AW, Kortemme T, Wilke CO. Amino-acid site variability among natural and designed proteins. PeerJ 2013; 1:e211. [PMID: 24255821 PMCID: PMC3828621 DOI: 10.7717/peerj.211] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/03/2013] [Accepted: 10/24/2013] [Indexed: 11/20/2022]
Abstract
Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
Affiliation(s)
- Eleisha L. Jackson
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Noah Ollikainen
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
- Arthur W. Covert
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Tanja Kortemme
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
- California Institute for Quantitative Biosciences (QB3) and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Claus O. Wilke
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
24
Schwans JP, Sunden F, Gonzalez A, Tsai Y, Herschlag D. Uncovering the determinants of a highly perturbed tyrosine pKa in the active site of ketosteroid isomerase. Biochemistry 2013; 52:7840-55. [PMID: 24151972 PMCID: PMC3890242 DOI: 10.1021/bi401083b] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 01/03/2023]
Abstract
Within the idiosyncratic enzyme active-site environment, side chain and ligand pKa values can be profoundly perturbed relative to their values in aqueous solution. Whereas structural inspection of systems has often attributed perturbed pKa values to dominant contributions from placement near charged groups or within hydrophobic pockets, Tyr57 of a Pseudomonas putida ketosteroid isomerase (KSI) mutant, suggested to have a pKa perturbed by nearly 4 units to 6.3, is situated within a solvent-exposed active site devoid of cationic side chains, metal ions, or cofactors. Extensive comparisons among 45 variants with mutations in and around the KSI active site, along with protein semisynthesis, ¹³C NMR spectroscopy, absorbance spectroscopy, and X-ray crystallography, were used to unravel the basis for this perturbed Tyr pKa. The results suggest that the origins of large energetic perturbations are more complex than suggested by visual inspection. For example, the introduction of positively charged residues near Tyr57 raises its pKa rather than lowers it; this effect, and part of the increase in the Tyr pKa from the introduction of nearby anionic groups, arises from accompanying active-site structural rearrangements. Other mutations with large effects also cause structural perturbations or appear to displace a structured water molecule that is part of a stabilizing hydrogen-bond network. Our results lead to a model in which three hydrogen bonds are donated to the stabilized ionized Tyr, with these hydrogen-bond donors, two Tyr side chains, and a water molecule positioned by other side chains and by a water-mediated hydrogen-bond network. These results support the notion that large energetic effects are often the consequence of multiple stabilizing interactions rather than a single dominant interaction.
Most generally, this work provides a case study for how extensive and comprehensive comparisons via site-directed mutagenesis in a tight feedback loop with structural analysis can greatly facilitate our understanding of enzyme active-site energetics. The extensive data set provided may also be a valuable resource for those wishing to extensively test computational approaches for determining enzymatic pKa values and energetic effects.
Affiliation(s)
- Jason P. Schwans
- Department of Biochemistry, Stanford University, Stanford, California 94305
- Fanny Sunden
- Department of Biochemistry, Stanford University, Stanford, California 94305
- Ana Gonzalez
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, California 94025
- Yingssu Tsai
- Department of Chemistry, Stanford University, Stanford, California 94305
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, California 94025
- Daniel Herschlag
- Department of Biochemistry, Stanford University, Stanford, California 94305
- Department of Chemistry, Stanford University, Stanford, California 94305
25
An accurate method for prediction of protein-ligand binding site on protein surface using SVM and statistical depth function. BioMed Research International 2013; 2013:409658. [PMID: 24195070 PMCID: PMC3806129 DOI: 10.1155/2013/409658] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/17/2013] [Revised: 08/15/2013] [Accepted: 08/29/2013] [Indexed: 11/17/2022]
Abstract
Since proteins carry out their functions through interactions with other molecules, accurately identifying the protein-ligand binding site plays an important role in protein functional annotation and rational drug discovery. In the past two decades, many algorithms have been presented to predict the protein-ligand binding site. In this paper, we introduce a statistical depth function to define negative samples and propose an SVM-based method which integrates sequence and structural information to predict binding sites. The results show that the present method performs better than the existing ones. The accuracy, sensitivity, and specificity on the training set are 77.55%, 56.15%, and 87.96%, respectively; on the independent test set, the accuracy, sensitivity, and specificity are 80.36%, 53.53%, and 92.38%, respectively.
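The three figures quoted in this abstract follow the standard definitions based on a confusion matrix of binding (positive) and non-binding (negative) residues. The sketch below uses those textbook definitions; the function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow, they are not the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # fraction of all residues labelled correctly
        "sensitivity": tp / (tp + fn),    # fraction of true binding residues recovered
        "specificity": tn / (tn + fp),    # fraction of non-binding residues rejected
    }


# Hypothetical counts: 100 binding residues and 100 non-binding residues.
metrics = classification_metrics(tp=50, fp=10, tn=90, fn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A pattern like the one reported, with specificity well above sensitivity, means the classifier rarely mislabels non-binding residues but misses a sizeable share of true binding residues.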
26
Stefl S, Nishi H, Petukh M, Panchenko AR, Alexov E. Molecular mechanisms of disease-causing missense mutations. J Mol Biol 2013; 425:3919-36. [PMID: 23871686 DOI: 10.1016/j.jmb.2013.07.014] [Citation(s) in RCA: 187] [Impact Index Per Article: 17.0] [Received: 05/06/2013] [Revised: 07/04/2013] [Accepted: 07/10/2013] [Indexed: 12/23/2022]
Abstract
Genetic variations resulting in a change of amino acid sequence can have a dramatic effect on stability, hydrogen bond network, conformational dynamics, activity and many other physiologically important properties of proteins. The substitutions of only one residue in a protein sequence, so-called missense mutations, can be related to many pathological conditions and may influence susceptibility to disease and drug treatment. The plausible effects of missense mutations range from affecting the macromolecular stability to perturbing macromolecular interactions and cellular localization. Here we review the individual cases and genome-wide studies that illustrate the association between missense mutations and diseases. In addition, we emphasize that the molecular mechanisms of effects of mutations should be revealed in order to understand the disease origin. Finally, we report the current state-of-the-art methodologies that predict the effects of mutations on protein stability, the hydrogen bond network, pH dependence, conformational dynamics and protein function.
Affiliation(s)
- Shannon Stefl
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634, USA
27
Bianchi V, Mangone I, Ferrè F, Helmer-Citterich M, Ausiello G. webPDBinder: a server for the identification of ligand binding sites on protein structures. Nucleic Acids Res 2013; 41:W308-13. [PMID: 23737450 PMCID: PMC3692056 DOI: 10.1093/nar/gkt457] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
The webPDBinder (http://pdbinder.bio.uniroma2.it/PDBinder) is a web server for the identification of small ligand-binding sites in a protein structure. webPDBinder searches a protein structure against a library of known binding sites and a collection of control non-binding pockets. The number of similarities identified with the residues in the two sets is then used to derive a propensity value for each residue of the query protein, associated with the likelihood that the residue is part of a ligand binding site. The predicted binding residues can be further refined using conservation scores derived from the multiple alignment of the PFAM protein family. webPDBinder correctly identifies residues belonging to the binding site in 77% of the cases and is able to identify binding pockets starting from holo or apo structures with comparable performance. This is important for all the real-world cases where the query protein has been crystallized without a ligand and it is also difficult to obtain clear similarities with bound pockets from holo pocket libraries. The input is either a PDB code or a user-submitted structure. The output is a list of predicted binding pocket residues with propensity and conservation values both in text and graphical format.
Affiliation(s)
- Valerio Bianchi
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
28
Kanematsu Y, Koike R, Amemiya T, Ota M. Substrate-shielding and hydrolytic reaction in hydrolases. Proteins 2013; 81:926-32. [DOI: 10.1002/prot.24253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/05/2012] [Revised: 12/10/2012] [Accepted: 01/04/2013] [Indexed: 11/07/2022]
29
Brylinski M. The utility of artificially evolved sequences in protein threading and fold recognition. J Theor Biol 2013; 328:77-88. [PMID: 23542050 DOI: 10.1016/j.jtbi.2013.03.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/19/2012] [Revised: 01/24/2013] [Accepted: 03/18/2013] [Indexed: 12/23/2022]
Abstract
Template-based protein structure prediction plays an important role in Functional Genomics by providing structural models of gene products, which can be utilized by structure-based approaches to function inference. From a systems level perspective, the high structural coverage of gene products in a given organism is critical. Despite continuous efforts towards the development of more sensitive threading approaches, confident structural models cannot be constructed for a considerable fraction of proteins due to difficulties in recognizing low-sequence identity templates with a similar fold to the target. Here we introduce a new modeling stratagem, which employs a library of synthetic sequences to improve template ranking in fold recognition by sequence profile-based methods. We developed a new method for the optimization of generic protein-like amino acid sequences to stabilize the respective structures using a combined empirical scoring function, which is compatible with those commonly used in protein threading and fold recognition. We show that the artificially evolved sequences, whose average sequence identity to the wild-type sequences is as low as 13.8%, have significant capabilities to recognize the correct structures. Importantly, the quality of the corresponding threading alignments is comparable to those constructed using conventional wild-type approaches (the average TM-score is 0.48 and 0.54, respectively). Fold recognition that uses data fusion to combine ranks calculated for both wild-type and synthetic template libraries systematically improves the detection of structural analogs. Depending on the threading algorithm used, it yields on average 4-16% higher recognition rates than using the wild-type template library alone. Synthetic sequences artificially evolved for the template structures provide an orthogonal source of signal that could be exploited to detect those templates unrecognized by standard modeling techniques. It opens up new directions in the development of more sensitive threading methods with the enhanced capabilities of targeting difficult, midnight zone templates.
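The rank-fusion step mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain average-rank fusion, which is not necessarily the fusion rule used in the paper; the function name `fuse_ranks` and the template identifiers are hypothetical.

```python
def fuse_ranks(wild_type_ranks, synthetic_ranks):
    """Fuse two rankings of the same templates by averaging their ranks.

    Assumption: simple average-rank fusion; the paper's actual rule may differ.
    Each argument maps a template id to its rank (1 = best).
    """
    shared = set(wild_type_ranks) & set(synthetic_ranks)
    fused = {t: (wild_type_ranks[t] + synthetic_ranks[t]) / 2 for t in shared}
    # Sort by fused rank; ties are broken by template id for determinism.
    return sorted(fused, key=lambda t: (fused[t], t))


if __name__ == "__main__":
    wild_type = {"tmplA": 1, "tmplB": 2, "tmplC": 3}   # hypothetical ranks
    synthetic = {"tmplA": 3, "tmplB": 1, "tmplC": 2}   # hypothetical ranks
    print(fuse_ranks(wild_type, synthetic))
```

A template that ranks moderately well in both libraries can overtake one that ranks first in only one of them, which is the sense in which the two libraries provide complementary signal.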
Affiliation(s)
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
|
30
|
Chakraborty S. A quantitative measure of electrostatic perturbation in holo and apo enzymes induced by structural changes. PLoS One 2013; 8:e59352. [PMID: 23516628 PMCID: PMC3597595 DOI: 10.1371/journal.pone.0059352] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/12/2012] [Accepted: 02/13/2013] [Indexed: 11/19/2022]
Abstract
Biological pathways are subject to subtle manipulations that achieve a wide range of functional variation in differing physiological niches. In many instances, changes in the structure of an enzyme on ligand binding germinate electrostatic perturbations that form the basis of its changed catalytic or transcriptional efficiency. Computational methods that seek to gain insights into the electrostatic changes in enzymes require expertise to setup and computing prowess. In the current work, we present a fast, easy and reliable methodology to compute electrostatic perturbations induced by ligand binding (MEPP). The theoretical foundation of MEPP is the conserved electrostatic potential difference (EPD) in cognate pairs of active site residues in proteins with the same functionality. Previously, this invariance has been used to unravel promiscuous serine protease and metallo-β-lactamase scaffolds in alkaline phosphatases. Given that a similarity in EPD is significant, we expect differences in the EPD to be significant too. MEPP identifies residues or domains that undergo significant electrostatic perturbations, and also enumerates residue pairs that undergo significant polarity change. The gain in a certain polarity of a residue with respect to neighboring residues, or the reversal of polarity between two residues might indicate a change in the preferred ligand. The methodology of MEPP has been demonstrated on several enzymes that employ varying mechanisms to perform their roles. For example, we have attributed the change in polarity in residue pairs to be responsible for the loss of metal ion binding in fructose 1,6-bisphosphatases, and corroborated the pre-organized state of the active site of the enzyme with respect to functionally relevant changes in electric fields in ketosteroid isomerases.
Affiliation(s)
- Sandeep Chakraborty
- Department of Biological Sciences, Tata Institute of Fundamental Research, Mumbai, India.
|
31
|
Skolnick J, Zhou H, Gao M. Are predicted protein structures of any value for binding site prediction and virtual ligand screening? Curr Opin Struct Biol 2013; 23:191-7. [PMID: 23415854 DOI: 10.1016/j.sbi.2013.01.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/19/2012] [Revised: 01/04/2013] [Accepted: 01/23/2013] [Indexed: 01/03/2023]
Abstract
The recently developed field of ligand homology modeling (LHM) that extends the ideas of protein homology modeling to the prediction of ligand binding sites and for use in virtual ligand screening has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little if any diminution in performance; thereby enabling ≈ 75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA.
|
32
|
Gao YF, Li BQ, Cai YD, Feng KY, Li ZD, Jiang Y. Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection. Mol Biosyst 2013; 9:61-9. [DOI: 10.1039/c2mb25327e] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/14/2023]
|
33
|
Mallipeddi PL, Joshi M, Briggs JM. Pharmacophore-Based Virtual Screening to Aid in the Identification of Unknown Protein Function. Chem Biol Drug Des 2012; 80:828-42. [DOI: 10.1111/j.1747-0285.2012.01408.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
|
34
|
Ray A, Lindahl E, Wallner B. Improved model quality assessment using ProQ2. BMC Bioinformatics 2012; 13:224. [PMID: 22963006 PMCID: PMC3584948 DOI: 10.1186/1471-2105-13-224] [Citation(s) in RCA: 150] [Impact Index Per Article: 12.5] [Received: 05/11/2012] [Accepted: 09/07/2012] [Indexed: 11/19/2022]
Abstract
Background: Employing methods to assess the quality of modeled protein structures is now standard practice in bioinformatics. In a broad sense, the techniques can be divided into methods relying on consensus prediction on the one hand, and single-model methods on the other. Consensus methods frequently perform very well when there is a clear consensus, but this is not always the case. In particular, they frequently fail in selecting the best possible model in the hard cases (lacking consensus) or in the easy cases where models are very similar. In contrast, single-model methods do not suffer from these drawbacks and could potentially be applied on any protein of interest to assess quality or as a scoring function for sampling-based refinement. Results: Here, we present a new single-model method, ProQ2, based on ideas from its predecessor, ProQ. ProQ2 is a model quality assessment algorithm that uses support vector machines to predict local as well as global quality of protein models. Improved performance is obtained by combining previously used features with updated structural and predicted features. The most important contribution can be attributed to the use of profile weighting of the residue-specific features and the use of features averaged over the whole model even though the prediction is still local. Conclusions: ProQ2 is significantly better than its predecessors at detecting high quality models, improving the sum of Z-scores for the selected first-ranked models by 20% and 32% compared to the second-best single-model method in CASP8 and CASP9, respectively. The absolute quality assessment of the models at both local and global level is also improved. The Pearson's correlation between the correct and local predicted score is improved from 0.59 to 0.70 on CASP8 and from 0.62 to 0.68 on CASP9; for global score to the correct GDT_TS from 0.75 to 0.80 and from 0.77 to 0.80 again compared to the second-best single methods in CASP8 and CASP9, respectively. ProQ2 is available at http://proq2.wallnerlab.org.
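The Pearson's correlation figures quoted in this abstract compare predicted quality scores with the true ones. The following is a minimal sketch of that statistic using the standard formula; the function name `pearson_r` and the example scores are illustrative and are not taken from ProQ2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    predicted = [0.2, 0.5, 0.7, 0.9]   # hypothetical predicted local scores
    observed = [0.1, 0.6, 0.6, 0.8]    # hypothetical true local scores
    print(round(pearson_r(predicted, observed), 3))
```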
Affiliation(s)
- Arjun Ray
- Department of Theoretical Physics & Swedish eScience Research Center, Royal Institute of Technology, Stockholm, Sweden
|
35
|
Han L, Zhang YJ, Song J, Liu MS, Zhang Z. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One 2012; 7:e41370. [PMID: 22829945 PMCID: PMC3400608 DOI: 10.1371/journal.pone.0041370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/22/2012] [Accepted: 06/20/2012] [Indexed: 11/18/2022]
Abstract
Enzymes play a fundamental role in almost all biological processes and identification of catalytic residues is a crucial step for deciphering the biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrated the microenvironment (ME) and geometrical properties of amino acid residues. Firstly, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed as MEscore, by summing up the likelihood of all residue pairs. Secondly, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, provided that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature based on an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation tests, MEDscore achieved a robust performance in identifying catalytic residues with an AUC1.0 of 0.889. At a ≤ 10% false positive rate control, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved a competitive performance compared with the residue conservation score (e.g. CONscore), the most informative singular feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first singular structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary with CONscore and a significantly improved performance can be achieved by combining CONscore with MEDscore in a linear manner. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.
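The distance-to-center idea behind Dscore can be sketched as follows. This is only an illustration under stated assumptions: the protein center is taken as the centroid of the supplied coordinates and distances are normalized by the largest one. The exact Dscore definition is not given in the abstract, and the function name `relative_center_distance` is hypothetical.

```python
import math


def relative_center_distance(coords):
    """Distance of each residue from the centroid, scaled to the range [0, 1].

    Assumption: one (x, y, z) point per residue (for example the C-alpha atom);
    this is a stand-in for, not a reproduction of, the paper's Dscore.
    """
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    dists = [math.dist(c, centroid) for c in coords]
    max_d = max(dists)
    if max_d == 0:
        # All points coincide with the centroid.
        return [0.0] * n
    return [d / max_d for d in dists]


if __name__ == "__main__":
    toy = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # made-up coordinates
    print(relative_center_distance(toy))
```

Residues with small values lie close to the center, where the abstract states catalytic residues are typically located.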
Affiliation(s)
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
- Yong-Jun Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, People's Republic of China
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Ming S. Liu
- CSIRO - Mathematics, Informatics and Statistics, Clayton, Victoria, Australia
- * E-mail: (MSL); (ZZ)
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
- * E-mail: (MSL); (ZZ)
|
36
|
Abstract
This chapter describes a method for analyzing the allosteric influence of molecular interactions on protein conformational distributions. The method, called Dynamics Perturbation Analysis (DPA), generally yields insights into allosteric effects in proteins and is especially useful for predicting ligand-binding sites. The use of DPA for binding site prediction is motivated by the following allosteric regulation hypothesis: interactions in native binding sites cause a large change in protein conformational distributions. Here, we review the reasoning behind this hypothesis, describe the math behind the method, and present a recipe for predicting binding sites using DPA.
Affiliation(s)
- Dengming Ming
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, China
|
37
|
Chakraborty S, Minda R, Salaye L, Bhattacharjee SK, Rao BJ. Active site detection by spatial conformity and electrostatic analysis--unravelling a proteolytic function in shrimp alkaline phosphatase. PLoS One 2011; 6:e28470. [PMID: 22174814 PMCID: PMC3234256 DOI: 10.1371/journal.pone.0028470] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 07/27/2011] [Accepted: 11/08/2011] [Indexed: 11/30/2022]
Abstract
Computational methods are increasingly gaining importance as an aid in identifying active sites. Most of these methods use structural information to supplement sequence conservation-based analyses. Development of tools that compute electrostatic potentials has further improved our ability to characterize the active site residues in proteins. We have described a computational methodology for detecting active sites based on structural and electrostatic conformity - CataLytic Active Site Prediction (CLASP). In our pipelined model, the physical 3D signature of any particular enzymatic function, as defined by its active sites, is used to obtain spatially congruent matches. While previous work has revealed that catalytic residues have large pKa deviations from standard values, we show that for a given enzymatic activity, the electrostatic potential differences (PD) between analogous residue pairs in an active site taken from different proteins of the same family are similar. False positives in spatially congruent matches are further pruned by PD analysis where cognate pairs with large deviations are rejected. We first present the results of active site prediction by CLASP for two enzymatic activities - β-lactamases and serine proteases, two of the most extensively investigated enzymes. The results of CLASP analysis on motifs extracted from Catalytic Site Atlas (CSA) are also presented in order to demonstrate its ability to accurately classify any protein, putative or otherwise, with known structure. The source code and database are made available at www.sanchak.com/clasp/. Subsequently, we probed alkaline phosphatases (AP), one of the well known promiscuous enzymes, for additional activities. Such a search has led us to predict a hitherto unknown function of shrimp alkaline phosphatase (SAP), where the protein acts as a protease. Finally, we present experimental evidence of the prediction by CLASP by showing that SAP indeed has protease activity in vitro.
Affiliation(s)
- Sandeep Chakraborty
- Department of Biological Sciences, Tata Institute of Fundamental Research, Mumbai, India
|
38
|
Serine hydroxymethyltransferase: A model enzyme for mechanistic, structural, and evolutionary studies. Biochimica et Biophysica Acta - Proteins and Proteomics 2011; 1814:1489-96. [DOI: 10.1016/j.bbapap.2010.10.010] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 09/24/2010] [Revised: 10/25/2010] [Accepted: 10/29/2010] [Indexed: 11/18/2022]
|
39
|
Barrantes-Reynolds R, Wallace SS, Bond JP. Using shifts in amino acid frequency and substitution rate to identify latent structural characters in base-excision repair enzymes. PLoS One 2011; 6:e25246. [PMID: 21998646 PMCID: PMC3188539 DOI: 10.1371/journal.pone.0025246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/19/2010] [Accepted: 08/30/2011] [Indexed: 12/30/2022]
Abstract
Protein evolution includes the birth and death of structural motifs. For example, a zinc finger or a salt bridge may be present in some, but not all, members of a protein family. We propose that such transitions are manifest in sequence phylogenies as concerted shifts in substitution rates of amino acids that are neighbors in a representative structure. First, we identified rate shifts in a quartet from the Fpg/Nei family of base excision repair enzymes using a method developed by Xun Gu and coworkers. We found the shifts to be spatially correlated, more precisely, associated with a flexible loop involved in bacterial Fpg substrate specificity. Consistent with our result, sequences and structures provide convincing evidence that this loop plays a very different role in other family members. Second, then, we developed a method for identifying latent protein structural characters (LSC) given a set of homologous sequences based on Gu's method and proximity in a high-resolution structure. Third, we identified LSC and assigned states of LSC to clades within the Fpg/Nei family of base excision repair enzymes. We describe seven LSC; an accompanying Proteopedia page (http://proteopedia.org/wiki/index.php/Fpg_Nei_Protein_Family) describes these in greater detail and facilitates 3D viewing. The LSC we found provided a surprisingly complete picture of the interaction of the protein with the DNA capturing familiar examples, such as a Zn finger, as well as more subtle interactions. Their preponderance is consistent with an important role as phylogenetic characters. Phylogenetic inference based on LSC provided convincing evidence of independent losses of Zn fingers. Structural motifs may serve as important phylogenetic characters and modeling transitions involving structural motifs may provide a much deeper understanding of protein evolution.
Affiliation(s)
- Ramiro Barrantes-Reynolds
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
- Susan S. Wallace
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
- Jeffrey P. Bond
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
|
40
|
Kochańczyk M. Prediction of functionally important residues in globular proteins from unusual central distances of amino acids. BMC Structural Biology 2011; 11:34. [PMID: 21923943 PMCID: PMC3188475 DOI: 10.1186/1472-6807-11-34] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/22/2011] [Accepted: 09/18/2011] [Indexed: 12/12/2022]
Abstract
BACKGROUND Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Beside constructing better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues. RESULTS Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in the information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites achieving success rates comparable to several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi. CONCLUSIONS Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. 
Such residues turn out to be often directly involved in binding ligands or interfacing with other proteins.
Affiliation(s)
- Marek Kochańczyk
- Faculty of Physics, Jagiellonian University, ul. Reymonta 4, 30-059 Krakow, Poland.
|
41
|
Zhao J, Dundas J, Kachalo S, Ouyang Z, Liang J. Accuracy of functional surfaces on comparatively modeled protein structures. Journal of Structural and Functional Genomics 2011; 12:97-107. [PMID: 21541664 PMCID: PMC3415962 DOI: 10.1007/s10969-011-9109-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 11/24/2010] [Accepted: 04/20/2011] [Indexed: 12/18/2022]
Abstract
Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. As the rapid speed of accumulation of protein sequence information far exceeds that of structures, constructing accurate models of protein functional surfaces and identifying their key elements become increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genome projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using MODELLER, we determine how well functional surfaces can be accurately reproduced. We use an alpha shape based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured.
Affiliation(s)
- Jieling Zhao
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Sema Kachalo
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Zheng Ouyang
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
|
42
|
Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics 2011; 12:151. [PMID: 21569468 PMCID: PMC3113940 DOI: 10.1186/1471-2105-12-151] [Citation(s) in RCA: 381] [Impact Index Per Article: 29.3] [Received: 12/27/2010] [Accepted: 05/13/2011] [Indexed: 12/31/2022]
Abstract
Background The rational design of modified proteins with controlled stability is of extreme importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding of the effect of naturally occurring disease-causing mutations on the molecular level. Results PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC presents good prediction performances (correlation coefficient of 0.8 between predicted and measured stability changes, in cross validation, after exclusion of 10% outliers). It is moreover very fast, allowing the prediction of the stability changes resulting from all possible mutations in a medium size protein in less than a minute. This unique functionality is user-friendly implemented in PoPMuSiC and is particularly easy to exploit. Another new functionality of our server concerns the estimation of the optimality of each amino acid in the sequence, with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which represent particularly interesting sites for introducing targeted mutations. This sequence optimality data is also expected to have significant implications in the prediction and the analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites, and show that a much larger than average concentration of structural weaknesses is detected, quantifying how these sites have been optimized for function rather than stability. 
Conclusion The freely available PoPMuSiC-2.1 web server is highly useful for identifying very rapidly a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
Affiliation(s)
- Yves Dehouck
- Bioinformatique génomique et structurale, Université Libre de Bruxelles, Av. Fr. Roosevelt 50, CP165/61, 1050 Brussels, Belgium.
|
43
|
Yahalom R, Reshef D, Wiener A, Frankel S, Kalisman N, Lerner B, Keasar C. Structure-based identification of catalytic residues. Proteins 2011; 79:1952-63. [PMID: 21491495 DOI: 10.1002/prot.23020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 10/05/2010] [Revised: 01/14/2011] [Accepted: 01/28/2011] [Indexed: 11/10/2022]
Abstract
The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
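Step (2) of the class-imbalance handling described in this abstract, under-sampling the majority class before training, can be sketched as follows. This is an illustration only: the function name `undersample`, the `ratio` parameter, and the label encoding (1 = catalytic, 0 = non-catalytic) are assumptions, not details from the paper.

```python
import random


def undersample(labels, ratio=1.0, seed=0):
    """Return sorted indices keeping every positive and a random subset of negatives.

    Assumptions: labels are 1 (catalytic) or 0 (non-catalytic); `ratio` is the
    number of negatives kept per positive. Neither comes from the paper.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(negatives), int(round(ratio * len(positives))))
    return sorted(positives + rng.sample(negatives, n_keep))


if __name__ == "__main__":
    toy_labels = [1, 0, 0, 0, 0, 1, 0, 0]  # made-up residue labels
    kept = undersample(toy_labels)
    print(kept, [toy_labels[i] for i in kept])
```

The returned indices would then select the rows of the feature matrix passed to the classifier; step (3), penalizing errors on catalytic residues more heavily, is a separate setting of the training procedure and is not shown here.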
Affiliation(s)
- Ran Yahalom
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
|
44
|
Guharoy M, Janin J, Robert CH. Side-chain rotamer transitions at protein-protein interfaces. Proteins 2011; 78:3219-25. [PMID: 20737439 DOI: 10.1002/prot.22821] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022]
Abstract
We compare the changes in side chain conformations that accompany the formation of protein-protein complexes, in residues forming either the interface or the remainder of the solvent-accessible surface of the proteins in the Docking Benchmark 3.0. We find that the interface residues undergo significantly more changes than other surface residues, and these changes are more likely to convert them from a high-energy torsion angle state to a lower-energy one than the reverse. Moreover, in both the unbound proteins and the complexes, the interface residues are more frequently found to be in a high-energy torsion angle state than the noninterface residues. As these differences exist before the binding step, they may be relevant to specificity and help in identifying binding sites for docking predictions.
Affiliation(s)
- Mainak Guharoy
- CNRS Laboratoire de Biochimie Théorique, Institut de Biologie Physico-Chimique (IBPC), Paris, France
45
Rostkowski M, Olsson MHM, Søndergaard CR, Jensen JH. Graphical analysis of pH-dependent properties of proteins predicted using PROPKA. BMC Struct Biol 2011; 11:6. [PMID: 21269479 PMCID: PMC3038139 DOI: 10.1186/1472-6807-11-6] [Citation(s) in RCA: 315] [Impact Index Per Article: 24.2] [Received: 09/29/2010] [Accepted: 01/26/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Charge states of ionizable residues in proteins determine their pH-dependent properties through their pKa values. Thus, various theoretical methods to determine ionization constants of residues in biological systems have been developed. One of the more widely used approaches for predicting pKa values in proteins is the PROPKA program, which provides convenient structural rationalization of the predicted pKa values without any additional calculations.
RESULTS: The PROPKA Graphical User Interface (GUI) is a new tool for studying the pH-dependent properties of proteins such as charge and stabilization energy. It facilitates a quantitative analysis of pKa values of ionizable residues together with their structural determinants by providing a direct link between the pKa data, predicted by the PROPKA calculations, and the structure via the Visual Molecular Dynamics (VMD) program. The GUI also calculates contributions to the pH-dependent unfolding free energy at a given pH for each ionizable group in the protein. Moreover, the PROPKA-computed pKa values or energy contributions of the ionizable residues in question can be displayed interactively. The PROPKA GUI can also be used for comparing pH-dependent properties of more than one structure at the same time.
CONCLUSIONS: The GUI considerably extends the analysis and validation possibilities of the PROPKA approach. The PROPKA GUI can conveniently be used to investigate ionizable groups, and their interactions, of residues with significantly perturbed pKa values or residues that contribute to the stabilization energy the most. Charge-dependent properties can be studied either for a single protein or simultaneously with other homologous structures, which makes it a helpful tool, for instance, in protein design studies or structure-based function predictions. The GUI is implemented as a Tcl/Tk plug-in for VMD, and can be obtained online at http://propka.ki.ku.dk/~luca/wiki/index.php/GUI_Web.
Affiliation(s)
- Michał Rostkowski
- Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark
46
Sael L, Kihara D. Binding ligand prediction for proteins using partial matching of local surface patches. Int J Mol Sci 2010; 11:5009-26. [PMID: 21614188 PMCID: PMC3100846 DOI: 10.3390/ijms11125009] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 11/02/2010] [Revised: 12/02/2010] [Accepted: 12/03/2010] [Indexed: 11/25/2022]
Abstract
Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.
Affiliation(s)
- Lee Sael
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Markey Center for Structural Biology, Purdue University, West Lafayette, IN 47907, USA
47
Prymula K, Jadczyk T, Roterman I. Catalytic residues in hydrolases: analysis of methods designed for ligand-binding site prediction. J Comput Aided Mol Des 2010; 25:117-33. [PMID: 21104192 PMCID: PMC3032897 DOI: 10.1007/s10822-010-9402-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 08/05/2010] [Accepted: 11/08/2010] [Indexed: 11/26/2022]
Abstract
A comparison of eight tools applicable to ligand-binding site prediction is presented. The methods examined cover three types of approaches: the geometrical (CASTp, PASS, Pocket-Finder), the physicochemical (Q-SiteFinder, FOD) and the knowledge-based (ConSurf, SuMo, WebFEATURE). The accuracy of predictions was measured in reference to the catalytic residues documented in the Catalytic Site Atlas. The test was performed on a set comprising selected chains of hydrolases. The results were analysed with regard to size, polarity, secondary structure, accessible solvent area of predicted sites as well as parameters commonly used in machine learning (F-measure, MCC). The relative accuracies of predictions are presented in the ROC space, allowing determination of the optimal methods by means of the ROC convex hull. Additionally, a minimum expected cost analysis was performed. Both advantages and disadvantages of the eight methods are presented. A characterization of protein chains with respect to the level of difficulty in active site prediction is introduced. The main reasons for failures are discussed. Overall, SuMo offers the best performance, followed by FOD, while Pocket-Finder is the best method among the geometrical approaches.
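The ROC convex hull mentioned in this abstract can be sketched as follows. Each method is treated as a single point (false positive rate, true positive rate), and only the points on the upper convex hull between (0, 0) and (1, 1) can be optimal for some combination of class ratio and error costs. The function name and the sample points are hypothetical and are not values reported in the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Return the upper convex hull in ROC space, left to right.

    The trivial classifiers (0, 0) and (1, 1) are always included.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # joining its predecessor to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical operating points for three unnamed methods.
methods = {"A": (0.1, 0.6), "B": (0.3, 0.5), "C": (0.4, 0.9)}
hull = roc_convex_hull(list(methods.values()))
on_hull = [name for name, pt in methods.items() if pt in hull]
print(on_hull)
```

In this toy example method B lies below the segment joining A and C, so it is dominated and would not be selected under any cost setting.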
Affiliation(s)
- Katarzyna Prymula
- Faculty of Chemistry, Jagiellonian University, 3 Ingardena Street, 30-060 Krakow, Poland
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, 7E Kopernika Street, 31-034 Krakow, Poland
- Tomasz Jadczyk
- Department of Electronics, AGH University of Science and Technology, 30 Mickiewicza Avenue, 30-059 Krakow, Poland
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, 16 Lazarza Street, 31-530 Krakow, Poland
48
Sonavane S, Chakrabarti P. Prediction of active site cleft using support vector machines. J Chem Inf Model 2010; 50:2266-73. [PMID: 21080689 DOI: 10.1021/ci1002922] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/28/2023]
Abstract
Computational tools are available today for the detection and delineation of the clefts and cavities in protein 3D structure and ranking them on the basis of probable binding site clefts. There is a need to improve the ranking of clefts and accuracy of predicting catalytic site clefts. Our results show that the distance of the clefts from protein centroid and sequence entropy of the lining residues, when used in conjunction with the volume, are valuable descriptors for predicting the catalytic site. We have applied the SVM approach for recognizing and ranking the active site clefts and tested its performance using different combinations of attributes. In both the ligand-bound and the unbound forms of structures, our method correctly predicts the active site clefts in 73% of cases at rank one. If we consider the results at rank 3 (i.e., the correct solution is among one of the top three solutions), the correctly predicted cases are 94% and 90% for the bound and the unbound forms of structures, respectively. Our approach improves the ranking of binding site clefts in comparison with CASTp and is comparable to other existing methods like Fpocket. Although the data set for training the SVM approach is rather small in size, the results are encouraging for the method to be used as complementary to other existing tools.
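The rank-one and rank-three success rates quoted in this abstract are top-k accuracies: a protein counts as correctly predicted if its true active site cleft appears among the first k ranked clefts. A minimal sketch, assuming each protein has one true cleft and a ranked list of cleft identifiers (the function name and the toy data are hypothetical):

```python
from typing import List, Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]],
                   truth: Sequence[str],
                   k: int) -> float:
    """Fraction of proteins whose true cleft is among the top k ranked clefts."""
    if len(ranked) != len(truth) or not truth:
        raise ValueError("ranked and truth must be non-empty and equally long")
    hits = sum(1 for clefts, true_cleft in zip(ranked, truth)
               if true_cleft in clefts[:k])
    return hits / len(truth)


# Hypothetical rankings for three proteins; the true cleft is "c1" in each.
rankings: List[List[str]] = [
    ["c1", "c2", "c3"],
    ["c2", "c1", "c3"],
    ["c3", "c2", "c1"],
]
true_clefts = ["c1", "c1", "c1"]
for k in (1, 3):
    print(k, top_k_accuracy(rankings, true_clefts, k))
```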
Affiliation(s)
- Shrihari Sonavane
- Department of Biochemistry and Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India
49
Chikhi R, Sael L, Kihara D. Real-time ligand binding pocket database search using local surface descriptors. Proteins 2010; 78:2007-28. [PMID: 20455259 DOI: 10.1002/prot.22715] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022]
Abstract
Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.
Affiliation(s)
- Rayan Chikhi
- Computer Science Department, Ecole Normale Supérieure de Cachan, 94235 Cachan cedex, Brittany, France
50
Wilkins AD, Lua R, Erdin S, Ward RM, Lichtarge O. Sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation. Protein Sci 2010; 19:1296-311. [PMID: 20506260 DOI: 10.1002/pro.406] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022]
Abstract
Protein functional sites control most biological processes and are important targets for drug design and protein engineering. To characterize them, the evolutionary trace (ET) ranks the relative importance of residues according to their evolutionary variations. Generally, top-ranked residues cluster spatially to define evolutionary hotspots that predict functional sites in structures. Here, various functions that measure the physical continuity of ET ranks among neighboring residues in the structure, or in the sequence, are shown to inform sequence selection and to improve functional site resolution. This is shown first, in 110 proteins, for which the overlap between top-ranked residues and actual functional sites rose by 8% in significance. Then, on a structural proteomic scale, optimized ET led to better 3D structure-function motifs (3D templates) and, in turn, to enzyme function prediction by the Evolutionary Trace Annotation (ETA) method with better sensitivity (40% to 53%) and positive predictive value (93% to 94%). This suggests that the similarity of evolutionary importance among neighboring residues in the sequence and in the structure is a universal feature of protein evolution. In practice, this yields a tool for optimizing sequence selections for comparative analysis and, via ET, for better predictions of functional sites and function. This should prove useful for the efficient mutational redesign of protein function and for pharmaceutical targeting.
Affiliation(s)
- A D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA