1
Summers TJ, Hemmati R, Miller JE, Agbaglo DA, Cheng Q, DeYonker NJ. Evaluating the active site-substrate interplay between x-ray crystal structure and molecular dynamics in chorismate mutase. J Chem Phys 2023; 158:065101. [PMID: 36792523 DOI: 10.1063/5.0127106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Designing realistic quantum mechanical (QM) models of enzymes is dependent on reliably discerning and modeling residues, solvents, and cofactors important in crafting the active site microenvironment. Interatomic van der Waals contacts have previously demonstrated usefulness toward designing QM-models, but their measured values (and subsequent residue importance rankings) are expected to be influenceable by subtle changes in protein structure. Using chorismate mutase as a case study, this work examines the differences in ligand-residue interatomic contacts between an x-ray crystal structure and structures from a molecular dynamics simulation. Select structures are further analyzed using symmetry adapted perturbation theory to compute ab initio ligand-residue interaction energies. The findings of this study show that ligand-residue interatomic contacts measured for an x-ray crystal structure are not predictive of active site contacts from a sampling of molecular dynamics frames. In addition, the variability in interatomic contacts among structures is not correlated with variability in interaction energies. However, the results spotlight using interaction energies to characterize and rank residue importance in future computational enzymology workflows.
Affiliation(s)
- Thomas J Summers
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Reza Hemmati
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Justin E Miller
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Donatus A Agbaglo
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Qianyi Cheng
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Nathan J DeYonker
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
2
Effects of 6-Shogaol on Glucose Uptake and Intestinal Barrier Integrity in Caco-2 Cells. Foods 2023; 12:foods12030503. [PMID: 36766032 PMCID: PMC9913893 DOI: 10.3390/foods12030503] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/13/2022] [Revised: 01/10/2023] [Accepted: 01/17/2023] [Indexed: 01/25/2023]
Abstract
As the main bioactive component in dried ginger, 6-shogaol has potential hypoglycemic activity, but its mechanism is still unclear. The process of carbohydrate digestion and glucose absorption is closely related to the enzymatic activity of epithelial brush cells, expression of glucose transporters, and permeability of intestinal epithelial cells. Therefore, this study explored the hypoglycemic mechanism of 6-shogaol from the perspective of glucose uptake, absorption transport, and protection of intestinal barrier function. Based on molecular docking, the binding energy of 6-shogaol and α-glucosidase is -6.24 kcal/mol, showing a high binding affinity. Moreover, α-glucosidase enzymatic activity was reduced (-78.96%) when the 6-shogaol concentration was 500 µg/mL. After 6-shogaol intervention, glucose uptake was reduced; the relative expression of the glucose transporters GLUT2 and SGLT1 was downregulated; and the tight junction proteins ZO-1, Occludin and Claudin were upregulated in differentiated Caco-2 cells. This study confirmed that 6-shogaol effectively inhibits the activity of α-glucosidase and has beneficial effects on glucose uptake, protection of intestinal barrier function, and promotion of intestinal material absorption.
3
Jenkins NW, Kundrotas PJ, Vakser IA. Size of the protein-protein energy funnel in crowded environment. Front Mol Biosci 2022; 9:1031225. [PMID: 36425657 PMCID: PMC9679368 DOI: 10.3389/fmolb.2022.1031225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 10/26/2022] [Indexed: 11/09/2022]
Abstract
Association of proteins to a significant extent is determined by their geometric complementarity. Large-scale recognition factors, which directly relate to the funnel-like intermolecular energy landscape, provide important insights into the basic rules of protein recognition. Previously, we showed that simple energy functions and coarse-grained models reveal major characteristics of the energy landscape. As new computational approaches increasingly address structural modeling of a whole cell at the molecular level, it becomes important to account for the crowded environment inside the cell. The crowded environment drastically changes protein recognition properties, and thus significantly alters the underlying energy landscape. In this study, we addressed the effect of crowding on the protein binding funnel, focusing on the size of the funnel. As crowders occupy the funnel volume, they make it less accessible to the ligands. Thus, the funnel size, which can be defined by ligand occupancy, is generally reduced as the crowder concentration increases. This study quantifies this reduction for different concentrations of crowders and correlates this dependence with the structural details of the interacting proteins. The results provide a better understanding of the rules of protein association in the crowded environment.
Affiliation(s)
- Nathan W. Jenkins
  - Computational Biology Program, The University of Kansas, Lawrence, KS, United States
- Petras J. Kundrotas
  - Computational Biology Program, The University of Kansas, Lawrence, KS, United States
- Ilya A. Vakser
  - Computational Biology Program, The University of Kansas, Lawrence, KS, United States
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, United States
- *Correspondence: Petras J. Kundrotas; Ilya A. Vakser
4
Feehan R, Franklin MW, Slusky JSG. Machine learning differentiates enzymatic and non-enzymatic metals in proteins. Nat Commun 2021; 12:3712. [PMID: 34140507 PMCID: PMC8211803 DOI: 10.1038/s41467-021-24070-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/16/2021] [Accepted: 06/02/2021] [Indexed: 11/09/2022]
Abstract
Metalloenzymes are 40% of all enzymes and can perform all seven classes of enzyme reactions. Because of the physicochemical similarities between the active sites of metalloenzymes and inactive metal binding sites, it is challenging to differentiate between them. Yet distinguishing these two classes is critical for the identification of both native and designed enzymes. Because of similarities between catalytic and non-catalytic metal binding sites, finding physicochemical features that distinguish these two types of metal sites can indicate aspects that are critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metals bound to proteins as enzymatic or non-enzymatic with 92.2% precision and 90.1% recall. Our model scores electrostatic and pocket lining features as more important than pocket volume, despite the fact that volume is the most quantitatively different feature between enzyme and non-enzymatic sites. Finally, we find our model has overall better performance in a side-to-side comparison against other methods that differentiate enzymatic from non-enzymatic sequences. We anticipate that our model's ability to correctly identify which metal sites are responsible for enzymatic activity could enable identification of new enzymatic mechanisms and de novo enzyme design.
Affiliation(s)
- Ryan Feehan
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, USA
- Meghan W Franklin
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, USA
- Joanna S G Slusky
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, USA
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
5
Summers TJ, Daniel BP, Cheng Q, DeYonker NJ. Quantifying Inter-Residue Contacts through Interaction Energies. J Chem Inf Model 2019; 59:5034-5044. [DOI: 10.1021/acs.jcim.9b00804] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Affiliation(s)
- Thomas J. Summers
  - The Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, United States
- Baty P. Daniel
  - The Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, United States
- Qianyi Cheng
  - The Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, United States
- Nathan J. DeYonker
  - The Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, United States
6
Choudhary P, Kumar S, Bachhawat AK, Pandit SB. CSmetaPred: a consensus method for prediction of catalytic residues. BMC Bioinformatics 2017; 18:583. [PMID: 29273005 PMCID: PMC5741869 DOI: 10.1186/s12859-017-1987-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/19/2017] [Accepted: 12/05/2017] [Indexed: 01/27/2023]
Abstract
Background: Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one of the remaining issues is the ranked position of putative catalytic residues among all ranked residues. In order to improve the ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach based method, CSmetaPred. In this approach, residues are ranked based on the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in the meta-predictor CSmetaPred_poc.
Results: Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that the meta-predictors outperform their constituent methods and that CSmetaPred_poc is the best of the evaluated methods. For instance, on the CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves the highest Mean Average Specificity (MAS), a scalar measure for the ROC curve, of 0.97 (0.96). Importantly, the median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 as true positives in binary classification, CSmetaPred_poc achieves a prediction accuracy of 0.94 on the CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within the top 20 ranks for ~73% of enzymes. Furthermore, benchmarking of prediction on comparative modelled structures showed that models result in better prediction than sequence-only predictions. These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization.
Conclusions: The benchmarking studies showed that employing a meta-approach to combine residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy as well as provide improved ranked positions of known catalytic residues. Hence, such predictions can assist experimentalists in prioritizing residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as a webserver at: http://14.139.227.206/csmetapred/.
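The consensus step described in this abstract (ranking residues by the mean of normalized scores from several predictors) can be illustrated with a minimal sketch. The abstract does not say how scores are normalized, so min-max scaling is an assumption here, and the function names `min_max_normalize` and `consensus_rank` are hypothetical, not part of CSmetaPred.

```python
from statistics import mean


def min_max_normalize(scores):
    """Rescale one predictor's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant predictor carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def consensus_rank(predictor_scores):
    """Average normalized scores per residue and rank residues by that mean.

    predictor_scores: one list of per-residue scores for each predictor;
    all lists must have the same length (one entry per residue).
    Returns (mean score per residue, residue indices sorted best first).
    """
    normalized = [min_max_normalize(s) for s in predictor_scores]
    n_residues = len(normalized[0])
    meta = [mean(p[i] for p in normalized) for i in range(n_residues)]
    order = sorted(range(n_residues), key=lambda i: meta[i], reverse=True)
    return meta, order


# Two hypothetical predictors scoring three residues on different scales.
meta, order = consensus_rank([[1, 9, 5], [10, 30, 20]])
print(meta, order)
```

Normalizing before averaging keeps a predictor with a wide score range from dominating the mean; other normalizations (for example z-scores) would fit the same skeleton.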
Affiliation(s)
- Preeti Choudhary
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
- Shailesh Kumar
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
  - Laboratory of Biochemistry and Genetics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Anand Kumar Bachhawat
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
- Shashi Bhushan Pandit
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
7
Glantz-Gashai Y, Meirson T, Samson AO. Normal Modes Expose Active Sites in Enzymes. PLoS Comput Biol 2016; 12:e1005293. [PMID: 28002427 PMCID: PMC5225006 DOI: 10.1371/journal.pcbi.1005293] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/27/2015] [Revised: 01/10/2017] [Accepted: 12/07/2016] [Indexed: 01/10/2023]
Abstract
Accurate prediction of active sites is an important tool in bioinformatics. Here we present an improved structure based technique to expose active sites that is based on large changes of solvent accessibility accompanying normal mode dynamics. The technique which detects EXPOsure of active SITes through normal modEs is named EXPOSITE. The technique is trained using a small 133 enzyme dataset and tested using a large 845 enzyme dataset, both with known active site residues. EXPOSITE is also tested in a benchmark protein ligand dataset (PLD) comprising 48 proteins with and without bound ligands. EXPOSITE is shown to successfully locate the active site in most instances, and is found to be more accurate than other structure-based techniques. Interestingly, in several instances, the active site does not correspond to the largest pocket. EXPOSITE is advantageous due to its high precision and paves the way for structure based prediction of active site in enzymes. In this paper, we present an improved technique to predict active sites in enzymes. Our technique is based on changes of solvent accessibility that accompany normal mode dynamics. We assert the technique strength using several enzyme datasets with known catalytic residues. We show the technique successfully locates the active site in most cases, and consistently surpasses the accuracy of other techniques. We show how the technique is advantageous and paves the way for high precision prediction of active sites.
Affiliation(s)
- Tomer Meirson
  - Faculty of Medicine in the Galilee, Bar Ilan University, Safed, Israel
- Abraham O. Samson
  - Faculty of Medicine in the Galilee, Bar Ilan University, Safed, Israel
8
PINGU: PredIction of eNzyme catalytic residues usinG seqUence information. PLoS One 2015; 10:e0135122. [PMID: 26261982 PMCID: PMC4532418 DOI: 10.1371/journal.pone.0135122] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/28/2015] [Accepted: 07/17/2015] [Indexed: 11/19/2022]
Abstract
Identification of catalytic residues can help unveil interesting attributes of enzyme function for various therapeutic and industrial applications. Based on their biochemical roles, the number of catalytic residues and sequence lengths of enzymes vary. This article describes a prediction approach (PINGU) for such a scenario. It uses models trained using physicochemical properties and evolutionary information of 650 non-redundant enzymes (2136 catalytic residues) in a support vector machines architecture. Independent testing on 200 non-redundant enzymes (683 catalytic residues) in predefined prediction settings, i.e., with non-catalytic per catalytic residue ranging from 1 to 30, suggested that the prediction approach was highly sensitive and specific, i.e., 80% or above, over the incremental challenges. To learn more about the discriminatory power of PINGU in real scenarios, where the prediction challenge is variable and susceptible to high false positives, the best model from independent testing was used on 60 diverse enzymes. Results suggested that PINGU was able to identify most catalytic residues and non-catalytic residues properly with 80% or above accuracy, sensitivity and specificity. The effect of false positives on precision was addressed in this study by application of predicted ligand-binding residue information as a post-processing filter. An overall improvement of 20% in F-measure and 0.138 in Correlation Coefficient with 16% enhanced precision could be achieved. On account of its encouraging performance, PINGU is hoped to have eventual applications in boosting enzyme engineering and novel drug discovery.
9
Sequence Conservation, Radial Distance and Packing Density in Spherical Viral Capsids. PLoS One 2015; 10:e0132234. [PMID: 26132081 PMCID: PMC4488880 DOI: 10.1371/journal.pone.0132234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/13/2015] [Accepted: 06/11/2015] [Indexed: 12/30/2022]
Abstract
The conservation level of a residue is a useful measure of the importance of that residue in protein structure and function. Much information about sequence conservation comes from aligning homologous sequences. Profiles showing the variation of the conservation level along the sequence are usually interpreted in evolutionary terms and dictated by site similarities of a proper set of homologous sequences. Here, we report that, for viral icosahedral capsids, the sequence conservation profile can be determined by variations in the distances between residues and the centroid of the capsid – with a direct inverse proportionality between the conservation level and the centroid distance – as well as by the spatial variations in local packing density. Examining both the centroid and the packing density models against a dataset of 51 crystal structures of nonhomologous icosahedral capsids, we found that many global patterns and minor features derived from the viral structures are consistent with those present in the sequence conservation profiles. The quantitative link between the level of conservation and structural features like centroid-distance or packing density allows us to look at residue conservation from a structural viewpoint as well as from an evolutionary viewpoint.
10
Zeng P, Li J, Ma W, Cui Q. Rsite: a computational method to identify the functional sites of noncoding RNAs. Sci Rep 2015; 5:9179. [PMID: 25776805 PMCID: PMC4361870 DOI: 10.1038/srep09179] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 10/27/2014] [Accepted: 02/18/2015] [Indexed: 01/01/2023]
Abstract
There is an increasing demand for identifying the functional sites of noncoding RNAs (ncRNAs). Here we introduce a tertiary-structure based computational approach, Rsite, which first calculates the Euclidean distances between each nucleotide and all the other nucleotides in an RNA molecule and then determines the nucleotides that are the extreme points in the distance curve as the functional sites. By analyzing two ncRNAs, tRNA (Lys) and Diels-Alder ribozyme, we demonstrated the efficiency of Rsite. As a result, Rsite recognized all of the known functional sites of the two ncRNAs, suggesting that Rsite could be a potentially useful tool for discovering the functional sites of ncRNAs. The source codes and data sets of Rsite are available at http://www.cuilab.cn/rsite.
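The two steps named in this abstract (a per-nucleotide distance curve, then its extreme points) can be sketched as follows. This is not the published Rsite code: it assumes one representative 3D coordinate per nucleotide, takes the mean distance to all other nucleotides as the curve value, and treats strict local maxima and minima as "extreme points". The function names are hypothetical.

```python
import math


def distance_curve(coords):
    """Mean Euclidean distance from each nucleotide to all others.

    coords: list of (x, y, z) tuples, one per nucleotide (assumed to be a
    single representative atom or centroid for each nucleotide).
    """
    n = len(coords)
    return [
        sum(math.dist(coords[i], coords[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def local_extrema(curve):
    """Indices of strict local maxima or minima, excluding both endpoints."""
    hits = []
    for i in range(1, len(curve) - 1):
        prev, cur, nxt = curve[i - 1], curve[i], curve[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            hits.append(i)
    return hits


# Toy chain of five points; the candidate sites are the extrema of the curve.
toy = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (5, 1, 0)]
print(local_extrema(distance_curve(toy)))
```

Real distance curves are noisy, so a practical version would likely smooth the curve or require a minimum peak height before calling an extremum.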
Affiliation(s)
- Pan Zeng
  - Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Jianwei Li
  - Lab of Translational Biomedicine Informatics, School of Computer Science and Engineering, Hebei University of Technology, 5340 Xiping Rd, Tianjin 300401, China
- Wei Ma
  - Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Qinghua Cui
  - Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
11
Lin JJ, Lin ZL, Hwang JK, Huang TT. On the packing density of the unbound protein-protein interaction interface and its implications in dynamics. BMC Bioinformatics 2015; 16 Suppl 1:S7. [PMID: 25708145 PMCID: PMC4331706 DOI: 10.1186/1471-2105-16-s1-s7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
Abstract
Background: Characterizing the interface residues will help shed light on protein-protein interactions, which are involved in many important biological processes. Many studies focus on characterizing sequence or structure features of protein interfaces, but there are few studies characterizing the dynamics of interfaces. Therefore, we would like to know whether there is any specific dynamics pattern in the protein-protein interaction interfaces. Thermal fluctuation is an important dynamical property of a residue, and can be quickly estimated from local packing density without heavy computation, since studies have shown a close relationship between these two properties. Therefore, we divided the surface of an unbound subunit (a free protein subunit before it is involved in forming a protein complex) into several separate regions, and compared the average thermal fluctuations of the different regions in order to characterize the dynamics pattern in unbound protein-protein interaction interfaces.
Results: We used weighted contact numbers (WCN), a parameter-free method to quantify packing density, to estimate the thermal fluctuations of residues in the interfaces. By analyzing the WCN distributions of interfaces in unbound subunits from 1394 non-homologous protein complexes, we show that the residues in the central regions of interfaces have higher packing density (i.e. are more rigid); on the other hand, residues surrounding the central regions have lower packing density (i.e. are more flexible). The distinct distributions of packing density, suggesting distinct thermal fluctuations, reveal a specific dynamics pattern in the interface of unbound protein subunits.
Conclusions: We found a general trend that the unbound protein-protein interaction interfaces consist of rigid residues in the central regions, which are surrounded by flexible residues. This finding suggests that dynamics might be one of the important features for the formation of protein complexes.
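The abstract does not spell out how a weighted contact number is computed. A minimal sketch, assuming the commonly used inverse-square form WCN_i = Σ_{j≠i} 1/r_ij² over one representative atom per residue (for example Cα), is given below; the function name and the choice of representative atom are assumptions, not details taken from this paper.

```python
def weighted_contact_numbers(coords):
    """Inverse-square weighted contact number for each residue.

    coords: list of (x, y, z) tuples, one representative atom per residue
    (assumed here; the abstract does not specify the atom choice).
    A larger value means denser local packing, read as a more rigid residue.
    Coincident points would raise ZeroDivisionError.
    """
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i == j:
                continue
            # Squared distance directly; no square root is needed for 1/r^2.
            r2 = sum((x - y) ** 2 for x, y in zip(a, b))
            total += 1.0 / r2
        wcn.append(total)
    return wcn


# Three collinear residues spaced 1 unit apart: the middle one is most packed.
print(weighted_contact_numbers([(0, 0, 0), (1, 0, 0), (2, 0, 0)]))
```

Because every pair contributes and the weight decays smoothly with distance, no contact cutoff has to be chosen, which matches the "parameter-free" description in the abstract.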
12
EXIA2: web server of accurate and rapid protein catalytic residue prediction. BioMed Research International 2014; 2014:807839. [PMID: 25295274 PMCID: PMC4177735 DOI: 10.1155/2014/807839] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/21/2014] [Revised: 05/27/2014] [Accepted: 06/11/2014] [Indexed: 11/18/2022]
Abstract
We propose a method (EXIA2) of catalytic residue prediction based on protein structure without needing homology information. The method is based on the special side chain orientation of catalytic residues. We found that the side chain of catalytic residues usually points to the center of the catalytic site. The special orientation is usually observed in catalytic residues but not in noncatalytic residues, which usually have random side chain orientation. The method is shown to be the most accurate catalytic residue prediction method currently when combined with PSI-Blast sequence conservation. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures. The areas under the ROC curve (AUC) on these benchmark datasets are in the range from 0.934 to 0.968.
13
Liu R, Hu J. DNABind: A hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches. Proteins 2013; 81:1885-99. [DOI: 10.1002/prot.24330] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 01/15/2013] [Revised: 05/02/2013] [Accepted: 05/12/2013] [Indexed: 01/10/2023]
Affiliation(s)
- Rong Liu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208
  - Center for Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, People's Republic of China
- Jianjun Hu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208
14
Protein structure based prediction of catalytic residues. BMC Bioinformatics 2013; 14:63. [PMID: 23433045 PMCID: PMC3598644 DOI: 10.1186/1471-2105-14-63] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 06/21/2012] [Accepted: 02/17/2013] [Indexed: 11/10/2022]
Abstract
Background: Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation.
Results: We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods.
Conclusions: We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be more simply and effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
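The sensitivity, precision and fold enrichment quoted in this abstract follow directly from the four counts it reports (70 true catalytic residues among 411 predicted, out of 111 catalytic residues in 9262 total). The short check below reproduces them; the function name is hypothetical and the script is only an arithmetic illustration, not part of the published method.

```python
def enrichment_summary(true_hits, predicted, positives, total):
    """Return (precision, sensitivity, fold enrichment) from raw counts.

    Fold enrichment is the precision divided by the background rate of
    positives in the whole input set.
    """
    precision = true_hits / predicted      # fraction of predictions that are catalytic
    sensitivity = true_hits / positives    # fraction of catalytic residues recovered
    background = positives / total         # catalytic fraction before any filtering
    return precision, sensitivity, precision / background


precision, sensitivity, fold = enrichment_summary(70, 411, 111, 9262)
print(f"precision={precision:.1%} sensitivity={sensitivity:.1%} enrichment={fold:.1f}x")
```

With these counts the background rate is about 1.2%, so a 17% precision corresponds to roughly a 14-fold enrichment, consistent with the figures in the abstract.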
15
On the structural context and identification of enzyme catalytic residues. BioMed Research International 2013; 2013:802945. [PMID: 23484160 PMCID: PMC3581254 DOI: 10.1155/2013/802945] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/29/2012] [Accepted: 12/28/2012] [Indexed: 11/25/2022]
Abstract
Enzymes play important roles in most biological processes. Although only a small fraction of residues are directly involved in catalytic reactions, these catalytic residues are the most crucial parts of enzymes. The study of the fundamental and unique features of catalytic residues benefits the understanding of enzyme functions and catalytic mechanisms. In this work, we analyze the structural context of catalytic residues based on theoretical and experimental structure flexibility. The results show that catalytic residues have distinct structural features and context. Their neighboring residues, whether sequence or structure neighbors within a specific range, are usually structurally more rigid than those of noncatalytic residues. The structural context feature is combined with a support vector machine to identify catalytic residues from enzyme structure. The prediction results are better than or comparable to those of recent structure-based prediction methods.
16
Accurate prediction of protein catalytic residues by side chain orientation and residue contact density. PLoS One 2012; 7:e47951. [PMID: 23110141 PMCID: PMC3480458 DOI: 10.1371/journal.pone.0047951] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/21/2012] [Accepted: 09/18/2012] [Indexed: 11/19/2022]
Abstract
Prediction of protein catalytic residues provides useful information for studies of protein function. Most existing methods combine structure and sequence information but rely heavily on sequence conservation from multiple sequence alignments. The contribution of structure information is usually smaller than that of sequence conservation in existing methods. We found a novel structure feature, residue side chain orientation, which is the first structure-based feature that achieves prediction results comparable to those of evolutionary sequence conservation. We developed a structure-based method, Enzyme Catalytic residue SIde-chain Arrangement (EXIA), which is based on residue side chain orientations and backbone flexibility of the protein structure. Prediction using EXIA outperforms existing structure-based features. The prediction quality of combining EXIA and sequence conservation exceeds that of the state-of-the-art prediction methods. EXIA is designed to predict catalytic residues from a single protein structure without needing sequence or structure alignments. It provides invaluable information when there is no sufficient or reliable homology information for the target protein. We found that catalytic residues have a very special side chain orientation and designed the EXIA method based on this newly discovered feature. It was also found that EXIA performs well for a dataset of enzymes without any bound ligand in their crystallographic structures.
|
17
|
Han L, Zhang YJ, Song J, Liu MS, Zhang Z. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One 2012; 7:e41370. [PMID: 22829945 PMCID: PMC3400608 DOI: 10.1371/journal.pone.0041370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/22/2012] [Accepted: 06/20/2012] [Indexed: 11/18/2022] Open
Abstract
Enzymes play a fundamental role in almost all biological processes and identification of catalytic residues is a crucial step for deciphering the biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrated the microenvironment (ME) and geometrical properties of amino acid residues. Firstly, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed as MEscore, by summing up the likelihood of all residue pairs. Secondly, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, provided that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature based on an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation tests, MEDscore achieved a robust performance in identifying catalytic residues with an AUC1.0 of 0.889. At a ≤ 10% false positive rate control, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved a competitive performance compared with the residue conservation score (e.g. CONscore), the most informative singular feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first singular structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary with CONscore and a significantly improved performance can be achieved by combining CONscore with MEDscore in a linear manner. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.
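The abstract does not give the formula for Dscore, so the following is only a minimal sketch of the idea it describes: scoring each residue by its distance from the protein center, relative to the largest such distance in the structure. The function name `relative_center_distance`, the use of the coordinate centroid as the "center", and the normalization by the maximum distance are all assumptions made for illustration, not the authors' definition.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def relative_center_distance(residue_coords):
    """Map residue id -> distance to the centroid divided by the largest
    such distance (0.0 = at the center, 1.0 = farthest residue).
    Hypothetical stand-in for a Dscore-like quantity."""
    center = centroid(list(residue_coords.values()))
    dist = {rid: math.dist(xyz, center) for rid, xyz in residue_coords.items()}
    d_max = max(dist.values()) or 1.0  # avoid division by zero for a single point
    return {rid: d / d_max for rid, d in dist.items()}

# Toy coordinates (one representative point per residue), invented for the example.
coords = {"A1": (0.0, 0.0, 0.0), "A2": (2.0, 0.0, 0.0), "A3": (1.0, 0.0, 0.0)}
scores = relative_center_distance(coords)
```

In this toy input the residue sitting on the centroid gets the lowest value, which matches the abstract's premise that catalytic residues tend to lie near the protein center.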
Affiliation(s)
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
| | - Yong-Jun Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, People's Republic of China
| | - Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
| | - Ming S. Liu
- CSIRO - Mathematics, Informatics and Statistics, Clayton, Victoria, Australia
- * E-mail: (MSL); (ZZ)
| | - Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
- * E-mail: (MSL); (ZZ)
|
18
|
Zhang YN, Yu DJ, Li SS, Fan YX, Huang Y, Shen HB. Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features. BMC Bioinformatics 2012; 13:118. [PMID: 22651691 PMCID: PMC3424114 DOI: 10.1186/1471-2105-13-118] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 12/05/2011] [Accepted: 05/31/2012] [Indexed: 12/23/2022] Open
Abstract
Background Adenosine-5′-triphosphate (ATP) is a multifunctional nucleotide that plays an important role in cell biology as a coenzyme interacting with proteins. Revealing the binding sites between proteins and ATP is important for understanding the functionality of the proteins and the mechanisms of protein-ATP complexes. Results In this paper, we propose a novel framework for predicting the functional residues through which proteins bind ATP molecules. The new prediction protocol combines sequence evolutionary information with bi-profile sampling of multi-view sequential features and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature can represent only part of the knowledge about the target, whereas multiple sources of descriptors can be complementary. Conclusions Prediction performance evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets, consisting of 168 and 227 non-homologous ATP-binding proteins respectively, demonstrates the efficacy of the proposed protocol. Our experimental results also reveal that the structural characteristics of residues in real protein-ATP binding sites are significantly different from those of non-binding residues; for example, the binding residues do not show high solvent accessibility propensities, and binding tends to occur at the junctions between different secondary structure segments. Furthermore, tests with multiple ratios of positive to negative samples show that performance is affected by imbalanced training datasets. Increasing the dataset size is also shown to improve prediction performance.
Affiliation(s)
- Ya-Nan Zhang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
|
19
|
Frankenstein Z, Sperling J, Sperling R, Eisenstein M. A unique spatial arrangement of the snRNPs within the native spliceosome emerges from in silico studies. Structure 2012; 20:1097-106. [PMID: 22578543 DOI: 10.1016/j.str.2012.03.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/17/2011] [Revised: 02/25/2012] [Accepted: 03/26/2012] [Indexed: 02/05/2023]
Abstract
The spliceosome is a mega-Dalton ribonucleoprotein (RNP) assembly that processes primary RNA transcripts, producing functional mRNA. The electron microscopy structures of the native spliceosome and of several spliceosomal subcomplexes are available; however, the spatial arrangement of the latter within the native spliceosome is not known. We designed a computational procedure to efficiently fit thousands of conformers into the spliceosome envelope. Despite the low resolution limitations, we obtained only one model that complies with the available biochemical data. Our model localizes the five small nuclear RNPs (snRNPs) mostly within the large subunit of the native spliceosome, requiring only minor conformation changes. The remaining free volume presumably accommodates additional spliceosomal components. The constituents of the active core of the spliceosome are juxtaposed, forming a continuous surface deep within the large spliceosomal cavity, which provides a sheltered environment for the splicing reaction.
Affiliation(s)
- Ziv Frankenstein
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
20
|
Shih CH, Chang CM, Lin YS, Lo WC, Hwang JK. Evolutionary information hidden in a single protein structure. Proteins 2012; 80:1647-57. [DOI: 10.1002/prot.24058] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 08/22/2011] [Revised: 02/07/2012] [Accepted: 02/12/2012] [Indexed: 11/07/2022]
|
21
|
Kochańczyk M. Prediction of functionally important residues in globular proteins from unusual central distances of amino acids. BMC Structural Biology 2011; 11:34. [PMID: 21923943 PMCID: PMC3188475 DOI: 10.1186/1472-6807-11-34] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/22/2011] [Accepted: 09/18/2011] [Indexed: 12/12/2022]
Abstract
BACKGROUND Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Beside constructing better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues. RESULTS Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in the information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites achieving success rates comparable to several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi. CONCLUSIONS Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. 
Such residues turn out to be often directly involved in binding ligands or interfacing with other proteins.
Affiliation(s)
- Marek Kochańczyk
- Faculty of Physics, Jagiellonian University, ul, Reymonta 4, 30-059 Krakow, Poland.
|
22
|
Gamliel R, Kedem K, Kolodny R, Keasar C. A library of protein surface patches discriminates between native structures and decoys generated by structure prediction servers. BMC Structural Biology 2011; 11:20. [PMID: 21542935 PMCID: PMC3114701 DOI: 10.1186/1472-6807-11-20] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/03/2010] [Accepted: 05/04/2011] [Indexed: 11/10/2022]
Abstract
Background Protein surfaces serve as an interface with the molecular environment and are thus tightly bound to protein function. On the surface, geometric and chemical complementarity to other molecules provides interaction specificity for ligand binding, docking of bio-macromolecules, and enzymatic catalysis. As of today, there is no accepted general scheme to represent protein surfaces. Furthermore, most of the research on protein surface focuses on regions of specific interest such as interaction, ligand binding, and docking sites. We present a first step toward a general purpose representation of protein surfaces: a novel surface patch library that represents most surface patches (~98%) in a data set regardless of their functional roles. Results Surface patches, in this work, are small fractions of the protein surface. Using a measure of inter-patch distance, we clustered patches extracted from a data set of high quality, non-redundant, proteins. The surface patch library is the collection of all the cluster centroids; thus, each of the data set patches is close to one of the elements in the library. We demonstrate the biological significance of our method through the ability of the library to capture surface characteristics of native protein structures as opposed to those of decoy sets generated by state-of-the-art protein structure prediction methods. The patches of the decoys are significantly less compatible with the library than their corresponding native structures, allowing us to reliably distinguish native models from models generated by servers. This trend, however, does not extend to the decoys themselves, as their similarity to the native structures does not correlate with compatibility with the library. Conclusions We expect that this high-quality, generic surface patch library will add a new perspective to the description of protein structures and improve our ability to predict them. 
In particular, we expect that it will help improve the prediction of surface features that are apparently neglected by current techniques. The surface patch libraries are publicly available at http://www.cs.bgu.ac.il/~keasar/patchLibrary.
Affiliation(s)
- Roi Gamliel
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel
|
23
|
Yahalom R, Reshef D, Wiener A, Frankel S, Kalisman N, Lerner B, Keasar C. Structure-based identification of catalytic residues. Proteins 2011; 79:1952-63. [PMID: 21491495 DOI: 10.1002/prot.23020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 10/05/2010] [Revised: 01/14/2011] [Accepted: 01/28/2011] [Indexed: 11/10/2022]
Abstract
The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/∼meshi/functionPrediction.
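Two of the class-imbalance measures listed in the abstract, under-sampling the majority class and penalizing minority-class errors more heavily, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `undersample` and `balanced_class_weights`, the negative-to-positive `ratio`, and the inverse-frequency weighting formula are all hypothetical choices. The weights would be handed to whatever SVM implementation is in use as per-class error penalties.

```python
import random
from collections import Counter

def undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every positive (label 1) and at most ratio * n_pos randomly
    chosen negatives (label 0). Assumed strategy, for illustration only."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))
    return [samples[i] for i in keep], [labels[i] for i in keep]

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count),
    so errors on the rarer class are penalized more."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}

# Toy data: 5 "catalytic" residues among 100, invented for the example.
labels = [1] * 5 + [0] * 95
samples = list(range(100))
X_bal, y_bal = undersample(samples, labels, ratio=2.0)
weights = balanced_class_weights(labels)
```

In practice one would typically use either a milder under-sampling ratio or the class weights computed on the retained data, since applying both at full strength can over-correct.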
Affiliation(s)
- Ran Yahalom
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
|
24
|
Novel feature for catalytic protein residues reflecting interactions with other residues. PLoS One 2011; 6:e16932. [PMID: 21468322 PMCID: PMC3066176 DOI: 10.1371/journal.pone.0016932] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/26/2010] [Accepted: 01/10/2011] [Indexed: 11/29/2022] Open
Abstract
Owing to their potential for systematic analysis, complex networks have been widely used in proteomics. Representing a protein structure as a topology network provides novel insight into understanding protein folding mechanisms, stability and function. Here, we develop a new feature to reveal correlations between residues using a protein structure network. In an original attempt to quantify the effects of several key residues on catalytic residues, a power function was used to model interactions between residues. The results indicate that focusing on a few residues is a feasible approach to identifying catalytic residues. The spatial environment surrounding a catalytic residue was analyzed in a layered manner. We present evidence that (i) correlation between residues is related to their distance apart, as most environmental parameters of the outer layer make a smaller contribution to prediction, and (ii) catalytic residues tend to be located near key positions in enzyme folds. Feature analysis revealed satisfactory performance for our features, which were combined with several conventional features in a prediction model for catalytic residues using a comprehensive data set from the Catalytic Site Atlas. Values of 88.6 for sensitivity and 88.4 for specificity were obtained by 10-fold cross-validation. These results suggest that these features reveal the mutual dependence of residues and are promising for further study of the structure-function relationship.
|
25
|
Eisenstein M, Ben-Shimon A, Frankenstein Z, Kowalsman N. CAPRI targets T29-T42: proving ground for new docking procedures. Proteins 2011; 78:3174-81. [PMID: 20607697 DOI: 10.1002/prot.22793] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/18/2022]
Abstract
The critical assessment of protein interactions (CAPRI) experiment provides a unique opportunity for unbiased assessment of docking procedures. The recent CAPRI targets T29-T42 entailed docking of bound, unbound, and modeled structures, presenting a wide range of prediction difficulty. We submitted accurate predictions for targets T40, T41, and T42, a good prediction for T32 and acceptable predictions for T29 and T34. The accuracy of our docking results generally matched the prediction difficulty; hence, docking of modeled proteins produced less accurate results. However, there were interesting exceptions: an accurate prediction was submitted for the dimer of modeled tetratricopeptide repeat (T42) and only an acceptable prediction for the bound/unbound case T29. The ensembles of docking models produced in the scans included an acceptable or better prediction for every target. We show here that our recently developed postscan reevaluation procedure, which tests propensity and solvation measures of the whole interface and the interface core, successfully distinguished these predictions from false docking models. For enzyme-inhibitor targets, we show that the distance of the interface from the enzyme's centroid ranked native-like docking models highly. Also, for one case we demonstrate that docking of an ensemble of conformers produced by normal modes analysis can improve the accuracy of the prediction.
Affiliation(s)
- Miriam Eisenstein
- Department of Chemical Research Support, Weizmann Institute of Science, Rehovot 76100, Israel.
|
26
|
Sonavane S, Chakrabarti P. Prediction of active site cleft using support vector machines. J Chem Inf Model 2010; 50:2266-73. [PMID: 21080689 DOI: 10.1021/ci1002922] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/28/2023]
Abstract
Computational tools are available today for the detection and delineation of the clefts and cavities in protein 3D structure and ranking them on the basis of probable binding site clefts. There is a need to improve the ranking of clefts and accuracy of predicting catalytic site clefts. Our results show that the distance of the clefts from protein centroid and sequence entropy of the lining residues, when used in conjunction with the volume, are valuable descriptors for predicting the catalytic site. We have applied the SVM approach for recognizing and ranking the active site clefts and tested its performance using different combinations of attributes. In both the ligand-bound and the unbound forms of structures, our method correctly predicts the active site clefts in 73% of cases at rank one. If we consider the results at rank 3 (i.e., the correct solution is among one of the top three solutions), the correctly predicted cases are 94% and 90% for the bound and the unbound forms of structures, respectively. Our approach improves the ranking of binding site clefts in comparison with CASTp and is comparable to other existing methods like Fpocket. Although the data set for training the SVM approach is rather small in size, the results are encouraging for the method to be used as complementary to other existing tools.
Affiliation(s)
- Shrihari Sonavane
- Department of Biochemistry and Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India
|
27
|
Abstract
Organisms evolved at high temperatures must maintain their proteins' structures in the face of increased thermal disorder. This challenge results in differences in residue utilization and overall structure. Focusing on thermostable/mesostable pairs of homologous structures, we have examined these differences using novel geometric measures: specifically burial depth (distance from the molecular surface to each atom) and travel depth (distance from the convex hull to the molecular surface that avoids the protein interior). These along with common metrics like packing and Wadell Sphericity are used to gain insight into the constraints experienced by thermophiles. Mean travel depth of hyperthermostable proteins is significantly less than that of their mesostable counterparts, indicating smaller, less numerous and less deep pockets. The mean burial depth of hyperthermostable proteins is significantly higher than that of mesostable proteins indicating that they bury more atoms further from the surface. The burial depth can also be tracked on the individual residue level, adding a finer level of detail to the standard exposed surface area analysis. Hyperthermostable proteins for the first time are shown to be more spherical than their mesostable homologues, regardless of when and how they adapted to extreme temperature. Additionally, residue specific burial depth examinations reveal that charged residues stay unburied, most other residues are slightly more buried and Alanine is more significantly buried in hyperthermostable proteins.
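For readers unfamiliar with Wadell sphericity, the measure the abstract uses to compare hyperthermostable and mesostable proteins, the standard definition is the surface area of a sphere with the same volume as the object divided by the object's actual surface area, which gives 1.0 for a perfect sphere and less for anything else. The snippet below is a minimal sketch of that formula; the function name is hypothetical, and how the authors computed protein volume and surface area is not stated in the abstract.

```python
import math

def wadell_sphericity(volume, surface_area):
    """pi^(1/3) * (6V)^(2/3) / A: area of the equal-volume sphere
    divided by the actual surface area."""
    return (math.pi ** (1.0 / 3.0)) * (6.0 * volume) ** (2.0 / 3.0) / surface_area

# Sanity checks on shapes with known volume and area.
r = 2.0
sphere = wadell_sphericity(4.0 / 3.0 * math.pi * r**3, 4.0 * math.pi * r**2)
cube = wadell_sphericity(1.0, 6.0)  # unit cube: V = 1, A = 6
```

The sphere evaluates to 1.0 (up to floating-point rounding) and the unit cube to roughly 0.806, so a higher value indicates a more compact, more spherical shape.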
Affiliation(s)
- Ryan G Coleman
- The Johnson Research Foundation, Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
|
28
|
Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J. SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009; 10:379. [PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 06/05/2009] [Accepted: 11/18/2009] [Indexed: 01/31/2023] Open
Abstract
Background The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function is useful in order to gain information about its potential role. There are many available approaches to predict functional site, but many are not made available via a publicly-accessible application. Results Here we present a functional site prediction tool (SitesIdentify), based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web-server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion SitesIdentify is able to produce comparable accuracy in predicting functional sites to its closest available counterpart, but in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/
Affiliation(s)
- Tracey Bray
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
|
29
|
Yu J, Zhou Y, Tanaka I, Yao M. Roll: a new algorithm for the detection of protein pockets and cavities with a rolling probe sphere. Bioinformatics 2009; 26:46-52. [DOI: 10.1093/bioinformatics/btp599] [Citation(s) in RCA: 225] [Impact Index Per Article: 15.0] [Indexed: 11/14/2022] Open
|
30
|
Lu CH, Huang SW, Lai YL, Lin CP, Shih CH, Huang CC, Hsu WL, Hwang JK. On the relationship between the protein structure and protein dynamics. Proteins 2008; 72:625-34. [PMID: 18247347 DOI: 10.1002/prot.21954] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/05/2022]
Abstract
Recently, we have developed a method (Shih et al., Proteins: Structure, Function, and Bioinformatics 2007;68: 34-38) to compute correlation of fluctuations of proteins. This method, referred to as the protein fixed-point (PFP) model, is based on the positional vectors of atoms issuing from the fixed point, which is the point of the least fluctuations in proteins. One corollary from this model is that atoms lying on the same shell centered at the fixed point will have the same thermal fluctuations. In practice, this model provides a convenient way to compute the average dynamical properties of proteins directly from the geometrical shapes of proteins without the need of any mechanical models, and hence no trajectory integration or sophisticated matrix operations are needed. As a result, it is more efficient than molecular dynamics simulation or normal mode analysis. Though in the previous study the PFP model has been successfully applied to a number of proteins of various folds, it is not clear to what extent this model can be applied. In this article, we have carried out a comprehensive analysis of the PFP model for a dataset comprising 972 high-resolution X-ray structures with pairwise sequence identity ≤25%. We found that in most cases the PFP model works well. However, in the case of proteins comprising multiple domains, each domain should be treated separately as an independent dynamical module with its own fixed point; and in the case of a protein complex comprising a number of subunits, if functioning as a biological unit, the whole complex should be considered as one single dynamical module with one fixed point. Under such considerations, the resultant correlation coefficient between the computed and the X-ray structural B-factors for the data set is 0.59, and 75% (727/972) of proteins have a correlation coefficient ≥0.5.
Our result shows that the fixed-point model is indeed quite general and will be a useful tool for high throughput analysis of dynamical properties of proteins.
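The kind of comparison the abstract reports, a geometry-derived fluctuation profile correlated against crystallographic B-factors, can be illustrated with a short sketch. It rests on assumptions that go beyond the abstract: the coordinate centroid stands in for the fixed point, and the squared distance from it is used as the per-atom profile, so that atoms on the same shell get the same value. The function names and the toy B-factors are invented for the example.

```python
import math

def squared_distances_from_centroid(coords):
    """Per-atom squared distance to the centroid (assumed stand-in for the fixed point)."""
    n = len(coords)
    c = tuple(sum(p[i] for p in coords) / n for i in range(3))
    return [sum((p[i] - c[i]) ** 2 for i in range(3)) for p in coords]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy chain of five atoms on a line, with made-up B-factors that rise toward the ends.
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
b_factors = [30.0, 15.0, 10.0, 15.0, 30.0]
profile = squared_distances_from_centroid(coords)
r = pearson(profile, b_factors)
```

For a multi-domain protein, the abstract's recommendation would correspond to calling `squared_distances_from_centroid` once per domain rather than once for the whole chain.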
Affiliation(s)
- Chih-Hao Lu
- Institute of Bioinformatics, National Chiao Tung University, HsinChu 30050, Taiwan, Republic of China
|
31
|
Fukushima K, Wada M, Sakurai M. An insight into the general relationship between the three dimensional structures of enzymes and their electronic wave functions: Implication for the prediction of functional sites of enzymes. Proteins 2008; 71:1940-54. [PMID: 18186466 DOI: 10.1002/prot.21865] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Abstract
In this study, we explored the general relationship between the three-dimensional (3D) structures of enzymes and their electronic wave functions. Furthermore, we developed a method for the prediction of their functionally important sites. For this purpose, we first performed linear-scaling molecular orbital calculations for 112 nonredundant, non-homologous enzymes with known structure and function. In consequence, we showed that the canonical molecular orbitals (MOs) of the enzymes could be classified into three groups according to the degree of electron delocalization: highly localized orbitals (Group A), highly delocalized orbitals whose electrons are distributed over almost the whole molecule (Group B), and moderately delocalized orbitals (Group C). The MOs belonging to Group A are located near the HOMO-LUMO band gap, and thereby include the frontier orbitals of a given enzyme. We inferred that the MOs of Group B play a role in stabilizing the 3D structure of the enzyme, while those of Group C contribute to constructing the covalent bond framework of the enzyme. Next, we investigated whether the frontier orbitals of enzymes could be used for identifying their potential functional sites. As a result, we found that the frontier orbitals of the 112 enzymes have a high propensity to be colocalized with the known functional sites, especially when the enzymes are hydrated. Such a propensity is shown to be remarkable when Glu or Asp is a functional site residue. On the basis of these results, we finally propose a protocol for the prediction of functional sites of enzymes.
Affiliation(s)
- K Fukushima
- Center for Biological Resources and Informatics, Tokyo Institute of Technology, Midori-ku, Yokohama 226-8501, Japan
|
32
|
Gherardini PF, Helmer-Citterich M. Structure-based function prediction: approaches and applications. Briefings in Functional Genomics and Proteomics 2008; 7:291-302. [PMID: 18599513 DOI: 10.1093/bfgp/eln030] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
The ever increasing number of protein structures determined by structural genomic projects has spurred much interest in the development of methods for structure-based function prediction. Existing methods can be roughly classified in two groups: some use a comparative approach looking for the presence of structural motifs possibly associated with a known biochemical function. Other methods try to identify functional patches on the surface of a protein using only its physicochemical characteristics. This review will cover both kinds of approaches to structure-based function prediction as well as their use in real-world cases. The main issues and limitations in using protein structure to predict function will also be discussed. These are mainly: the assessment of the statistical significance of structural similarities and the extent to which these methods depend on the accuracy and availability of structural data.
Affiliation(s)
- Pier Federico Gherardini
- Department of Biology, Centre for Molecular Bioinformatics, University of Tor Vergata, Rome, Italy.
|
33
|
Funnel hunting in a rough terrain: learning and discriminating native energy funnels. Structure 2008; 16:269-79. [PMID: 18275818 DOI: 10.1016/j.str.2007.11.013] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 06/19/2007] [Revised: 10/15/2007] [Accepted: 11/10/2007] [Indexed: 11/21/2022]
Abstract
Protein folding and binding is commonly depicted as a search for the minimum energy conformation. Modeling of protein complex structures by RosettaDock often results in a set of low-energy conformations near the native structure. Ensembles of low-energy conformations can appear, however, in other regions, especially when backbone movements occur upon binding. What then characterizes the energy landscape near the correct orientation? We applied a machine learning algorithm to distinguish ensembles of low-energy conformations around the native conformation from other low-energy ensembles. The resulting classifier, FunHunt, identifies the native orientation in 50/52 protein complexes in a test set. The features used by FunHunt teach us about the nature of native interfaces. Remarkably, the energy decrease of trajectories toward near-native orientations is significantly larger than for other orientations. This provides a possible explanation for the stability of association in the native orientation.
|
34
|
Stout M, Bacardit J, Hirst JD, Krasnogor N. Prediction of recursive convex hull class assignments for protein residues. Bioinformatics 2008; 24:916-23. [DOI: 10.1093/bioinformatics/btn050] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
|
35
|
Tong W, Williams RJ, Wei Y, Murga LF, Ko J, Ondrechen MJ. Enhanced performance in prediction of protein active sites with THEMATICS and support vector machines. Protein Sci 2007; 17:333-41. [PMID: 18096640 DOI: 10.1110/ps.073213608] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 10/22/2022]
Abstract
Theoretical microscopic titration curves (THEMATICS) is a computational method for the identification of active sites in proteins through deviations in computed titration behavior of ionizable residues. While the sensitivity to catalytic sites is high, the previously reported sensitivity to catalytic residues was not as high, about 50%. Here THEMATICS is combined with support vector machines (SVM) to improve sensitivity for catalytic residue prediction from protein 3D structure alone. For a test set of 64 proteins taken from the Catalytic Site Atlas (CSA), the average recall rate for annotated catalytic residues is 61%; good precision is maintained selecting only 4% of all residues. The average false positive rate, using the CSA annotations is only 3.2%, far lower than other 3D-structure-based methods. THEMATICS-SVM returns higher precision, lower false positive rate, and better overall performance, compared with other 3D-structure-based methods. Comparison is also made with the latest machine learning methods that are based on both sequence alignments and 3D structures. For annotated sets of well-characterized enzymes, THEMATICS-SVM performance compares very favorably with methods that utilize sequence homology. However, since THEMATICS depends only on the 3D structure of the query protein, no decline in performance is expected when applied to novel folds, proteins with few sequence homologues, or even orphan sequences. An extension of the method to predict non-ionizable catalytic residues is also presented. THEMATICS-SVM predicts a local network of ionizable residues with strong interactions between protonation events; this appears to be a special feature of enzyme active sites.
Affiliation(s)
- Wenxu Tong
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115, USA
|
36
|
Sterner B, Singh R, Berger B. Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 2007; 14:1058-73. [PMID: 17887954 DOI: 10.1089/cmb.2007.0042] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023] Open
Abstract
We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, the challenges of prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.
Affiliation(s)
- Beckett Sterner
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
37
|
Mistry J, Bateman A, Finn RD. Predicting active site residue annotations in the Pfam database. BMC Bioinformatics 2007; 8:298. [PMID: 17688688 PMCID: PMC2025603 DOI: 10.1186/1471-2105-8-298] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.8] [Received: 04/05/2007] [Accepted: 08/09/2007] [Indexed: 12/03/2022] Open
Abstract
Background Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. Description We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and MEROPS we find that we make many novel predictions. On investigating the small subset of predictions made by these databases that are not predicted by us, we found these sequences did not meet our strict criteria for prediction. We assessed the sensitivity and specificity of our methodology and estimate that only 3% of our predicted sequences are false positives. Conclusion We have predicted 606110 active site residues, of which 94% are not found in UniProtKB, and have increased the active site annotations in Pfam by more than 200 fold. Although implemented for Pfam, the tool we have developed for transferring the data can be applied to any alignment with associated experimental active site data and is available for download. Our active site predictions are re-calculated at each Pfam release to ensure they are comprehensive and up to date. They provide one of the largest available databases of active site annotation.
Affiliation(s)
- Jaina Mistry
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Alex Bateman
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Robert D Finn
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
|
38
|
Xie L, Bourne PE. A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics 2007; 8 Suppl 4:S9. [PMID: 17570152 PMCID: PMC1892088 DOI: 10.1186/1471-2105-8-s4-s9] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.3] [Indexed: 11/10/2022] Open
Abstract
Background An accurate description of protein shape derived from protein structure is necessary to establish an understanding of protein-ligand interactions, which in turn will lead to improved methods for protein-ligand docking and binding site analysis. Most current shape descriptors characterize only the local properties of protein structure using an all-atom representation and are slow to compute. We need new shape descriptors that have the ability to capture both local and global structural information, are robust for application to models and low quality structures and are computationally efficient to permit high throughput analysis of protein structures. Results We introduce a new shape description that requires only the Cα atoms to represent the protein structure, thus making it both fast and suitable for use on models and low quality structures. The notion of a geometric potential is introduced to quantitatively describe the shape of the structure. This geometric potential is dependent on both the global shape of the protein structure as well as the surrounding environment of each residue. When applying the geometric potential for binding site prediction, approximately 85% of known binding sites can be accurately identified with above 50% residue coverage and 80% specificity. Moreover, the algorithm is fast enough for proteome-scale applications. Proteins with fewer than 500 amino acids can be scanned in less than two seconds. Conclusion The reduced representation of the protein structure combined with the geometric potential provides a fast, quantitative description of protein-ligand binding sites with potential for use in large-scale predictions, comparisons and analysis.
Affiliation(s)
- Lei Xie
- San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Philip E Bourne
- San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
|
39
|
Sacquin-Mora S, Laforet E, Lavery R. Locating the active sites of enzymes using mechanical properties. Proteins 2007; 67:350-9. [PMID: 17311346 DOI: 10.1002/prot.21353] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.5] [Indexed: 11/08/2022]
Abstract
We have applied the calculation of mechanical properties to a dataset of almost 100 enzymes to determine the extent to which catalytic residues have distinct properties. Specifically, we have calculated force constants describing the ease of moving any given amino acid residue with respect to the other residues in the protein. The results show that catalytic residues are invariably associated with high force constants. Choosing an appropriate cutoff enables the detection of roughly 80% of catalytic residues with only 25% of false positives. It is shown that neither multidomain structures, nor the presence or absence of bound ligands hinder successful detections. It is however noted that active sites near the protein surface are more difficult to detect and that non-catalytic, but structurally key residues may also exhibit high force constants.
Affiliation(s)
- Sophie Sacquin-Mora
- Laboratoire de Biochimie Théorique, CNRS UPR 9080, Institut de Biologie Physico-Chimique, 13 rue Pierre et Marie Curie, 75005 Paris, France
|
40
|
Neuvirth H, Heinemann U, Birnbaum D, Tishby N, Schreiber G. ProMateus--an open research approach to protein-binding sites analysis. Nucleic Acids Res 2007; 35:W543-8. [PMID: 17488838 PMCID: PMC1933218 DOI: 10.1093/nar/gkm301] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022] Open
Abstract
The development of bioinformatic tools by individual labs results in the abundance of parallel programs for the same task. For example, identification of binding site regions between interacting proteins is done using: ProMate, WHISCY, PPI-Pred, PINUP and others. All servers first identify unique properties of binding sites and then incorporate them into a predictor. Obviously, the resulting prediction would improve if the most suitable parameters from each of those predictors would be incorporated into one server. However, because of the variation in methods and databases, this is currently not feasible. Here, the protein-binding site prediction server is extended into a general protein-binding sites research tool, ProMateus. This web tool, based on ProMate's infrastructure, enables the easy exploration and incorporation of new features and databases by the user, providing an evaluation of the benefit of individual features and their combination within a set framework. This transforms the individual research into a community exercise, bringing out the best from all users for optimized predictions. The analysis is demonstrated on a database of protein-protein and protein-DNA interactions. This approach is basically different from that used in generating meta-servers. The implications of the open-research approach are discussed. ProMateus is available at http://bip.weizmann.ac.il/promate.
Affiliation(s)
- Hani Neuvirth
- School of Computer Science and Engineering, The Hebrew University Jerusalem, 91904 and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Uri Heinemann
- School of Computer Science and Engineering, The Hebrew University Jerusalem, 91904 and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- David Birnbaum
- School of Computer Science and Engineering, The Hebrew University Jerusalem, 91904 and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Naftali Tishby
- School of Computer Science and Engineering, The Hebrew University Jerusalem, 91904 and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Gideon Schreiber
- School of Computer Science and Engineering, The Hebrew University Jerusalem, 91904 and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- *To whom correspondence should be addressed.
|
41
|
Relating destabilizing regions to known functional sites in proteins. BMC Bioinformatics 2007; 8:141. [PMID: 17470296 PMCID: PMC1890302 DOI: 10.1186/1471-2105-8-141] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 11/02/2006] [Accepted: 04/30/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Most methods for predicting functional sites in protein 3D structures, rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well annotated set of functional sites to use as benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, a novel benchmark consisting of a diverse set of hand-curated protein functional sites is derived. RESULTS A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force-field successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high quality apo-structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated dataset of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for most nucleic acid binding sites. 
These differences are rationalised in terms of the geometry and energetics of the binding site. CONCLUSION We find that although destabilizing regions as detected here can in general not be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.
|
42
|
Selective prediction of interaction sites in protein structures with THEMATICS. BMC Bioinformatics 2007; 8:119. [PMID: 17419878 PMCID: PMC1877815 DOI: 10.1186/1471-2105-8-119] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Received: 11/01/2006] [Accepted: 04/09/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Methods are now available for the prediction of interaction sites in protein 3D structures. While many of these methods report high success rates for site prediction, often these predictions are not very selective and have low precision. Precision in site prediction is addressed using Theoretical Microscopic Titration Curves (THEMATICS), a simple computational method for the identification of active sites in enzymes. Recall and precision are measured and compared with other methods for the prediction of catalytic sites. RESULTS Using a test set of 169 enzymes from the original Catalytic Residue Dataset (CatRes) it is shown that THEMATICS can deliver precise, localised site predictions. Furthermore, adjustment of the cut-off criteria can improve the recall rates for catalytic residues with only a small sacrifice in precision. Recall rates for CatRes/CSA annotated catalytic residues are 41.1%, 50.4%, and 54.2% for Z score cut-off values of 1.00, 0.99, and 0.98, respectively. The corresponding precision rates are 19.4%, 17.9%, and 16.4%. The success rate for catalytic sites is higher, with correct or partially correct predictions for 77.5%, 85.8%, and 88.2% of the enzymes in the test set, corresponding to the same respective Z score cut-offs, if only the CatRes annotations are used as the reference set. Incorporation of additional literature annotations into the reference set gives total success rates of 89.9%, 92.9%, and 94.1%, again for corresponding cut-off values of 1.00, 0.99, and 0.98. False positive rates for a 75-protein test set are 1.95%, 2.60%, and 3.12% for Z score cut-offs of 1.00, 0.99, and 0.98, respectively. CONCLUSION With a preferred cut-off value of 0.99, THEMATICS achieves a high success rate of interaction site prediction, about 86% correct or partially correct using CatRes/CSA annotations only and about 93% with an expanded reference set. 
Success rates for catalytic residue prediction are similar to those of other structure-based methods, but with substantially better precision and lower false positive rates. THEMATICS performs well across the spectrum of E.C. classes. The method requires only the structure of the query protein as input. THEMATICS predictions may be obtained via the web from structures in PDB format at: http://pfweb.chem.neu.edu/thematics/submit.html.
|
43
|
Coleman RG, Sharp KA. Travel Depth, a New Shape Descriptor for Macromolecules: Application to Ligand Binding. J Mol Biol 2006; 362:441-58. [PMID: 16934837 DOI: 10.1016/j.jmb.2006.07.022] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Received: 05/01/2006] [Revised: 07/05/2006] [Accepted: 07/10/2006] [Indexed: 10/24/2022]
Abstract
Depth is a term frequently applied to the shape and surface of macromolecules, describing for example the grooves in DNA, the shape of an enzyme active site, or the binding site for a small molecule in a protein. Yet depth is a difficult property to define rigorously in a macromolecule, and few computational tools exist to quantify this notion, to visualize it, or analyze the results. We present our notion of travel depth, simply put the physical distance a solvent molecule would have to travel from a surface point to a suitably defined reference surface. To define the reference surface, we use the limiting form of the molecular surface with increasing probe size: the convex hull. We then present a fast, robust approximation algorithm to compute travel depth to every surface point. The travel depth is useful because it works for pockets of any size and complexity. It also works for two interesting special cases. First, it works on the grooves in DNA, which are unbounded in one direction. Second, it works on the case of tunnels, that is pockets that have no "bottom", but go through the entire macromolecule. Our algorithm makes it straightforward to quantify discussions of depth when analyzing structures. High-throughput analysis of macromolecule depth is also enabled by our algorithm. This is demonstrated by analyzing a database of protein-small molecule binding pockets, and the distribution of bound magnesium ions in RNA structures. These analyses show significant, but subtle effects of depth on ligand binding localization and strength.
Affiliation(s)
- Ryan G Coleman
- The Johnson Research Foundation, Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA
|
44
|
Nam HW, Lee GY, Kim YS. Mass spectrometric identification of K210 essential for rat malonyl-CoA decarboxylase catalysis. J Proteome Res 2006; 5:1398-406. [PMID: 16739991 DOI: 10.1021/pr050487g] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Proteomic technology provides useful tools to detect protein modification sites in vivo and in vitro. In this work, we applied proteomics to identify an essential amino acid residue involved in Malonyl-CoA Decarboxylase (MCD) catalysis. A reaction with acetic anhydride and MCD, under mild conditions without acetyl CoA as a substrate, resulted in the acetylation of six lysyl residues, K210, K58, K167, K316, K388, and K444. When acetyl CoA was added to the reaction, K210 was protected from acetylation, indicating a potential role for this residue in catalysis. In addition, K210 was the only lysyl residue, out of six, that was not endogenously acetylated. Because K210, K308, and K388 are conserved across species, they were site-specifically mutated to methionine which is size-wise similar to lysine but not protonated. The K308M and K388M MCD mutants retained 60% of their enzyme activities, whereas the K210M mutant was completely inactive. These results strongly suggest that K210 is an essential residue in rat MCD catalysis and is a likely proton donor to the alpha carbon of malonyl-CoA. Therapeutic inhibition of MCD may be a viable approach to treating various clinical pathologies associated with defective fatty acid metabolism.
Affiliation(s)
- Hyung Wook Nam
- Department of Biochemistry, College of Science, Protein Network Research Center, Yonsei University, Seoul, Korea 120-749
|