1
Pan C, Lei Z, Wang S, Wang X, Wei D, Cai X, Luoreng Z, Wang L, Ma Y. Genome-wide identification of cyclin-dependent kinase (CDK) genes affecting adipocyte differentiation in cattle. BMC Genomics 2021; 22:532. [PMID: 34253191 PMCID: PMC8276410 DOI: 10.1186/s12864-021-07653-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/13/2020] [Accepted: 04/27/2021] [Indexed: 01/04/2023]
Abstract
BACKGROUND Cyclin-dependent kinases (CDKs) are protein kinases that regulate important cellular processes such as the cell cycle and transcription. Many CDK genes also play a critical role during adipogenic differentiation, but the role of the CDK gene family in regulating bovine adipocyte differentiation has not been studied. Therefore, the present study aims to characterize the CDK gene family in bovines and study its expression pattern during adipocyte differentiation. RESULTS We performed a genome-wide analysis and identified a number of CDK genes in several bovine species. The CDK genes were classified into 8 subfamilies through phylogenetic analysis. We found that 25 bovine CDK genes were distributed across 16 different chromosomes. Collinearity analysis revealed that the CDK gene family in Bos taurus is homologous with those in Bos indicus, Hybrid-Bos taurus, Hybrid Bos indicus, Bos grunniens and Bubalus bubalis. Several CDK genes had higher expression levels in preadipocytes than in differentiated adipocytes, as shown by RNA-seq analysis and qPCR, suggesting a role in the growth of emerging lipid droplets. CONCLUSION In this research, 185 CDK genes were identified and grouped into eight distinct clades in Bovidae, showing extensive homology. Global expression analysis of different bovine tissues and specific expression analysis during adipocyte differentiation revealed that CDK4, CDK7, CDK8, CDK9 and CDK14 may be involved in bovine adipocyte differentiation. The results provide a basis for further study to determine the roles of the CDK gene family in regulating adipocyte differentiation, which is beneficial for beef quality improvement.
Affiliation(s)
- Cuili Pan
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
  - Key Laboratory of Ruminant Molecular and Cellular Breeding, Ningxia Hui Autonomous Region, Ningxia University, Yinchuan, 750021, China
- Zhaoxiong Lei
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
  - Key Laboratory of Ruminant Molecular and Cellular Breeding, Ningxia Hui Autonomous Region, Ningxia University, Yinchuan, 750021, China
- Shuzhe Wang
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
  - Key Laboratory of Ruminant Molecular and Cellular Breeding, Ningxia Hui Autonomous Region, Ningxia University, Yinchuan, 750021, China
- Xingping Wang
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
  - Key Laboratory of Ruminant Molecular and Cellular Breeding, Ningxia Hui Autonomous Region, Ningxia University, Yinchuan, 750021, China
- Dawei Wei
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
  - Key Laboratory of Ruminant Molecular and Cellular Breeding, Ningxia Hui Autonomous Region, Ningxia University, Yinchuan, 750021, China
- Xiaoyan Cai
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
  - Key Laboratory of Ruminant Molecular and Cellular Breeding, Ningxia Hui Autonomous Region, Ningxia University, Yinchuan, 750021, China
- Zhuoma Luoreng
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
  - Key Laboratory of Ruminant Molecular and Cellular Breeding, Ningxia Hui Autonomous Region, Ningxia University, Yinchuan, 750021, China
- Lei Wang
  - College of Life Sciences, Xinyang Normal University, Xinyang, 464000, Henan, China
- Yun Ma
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
  - Key Laboratory of Ruminant Molecular and Cellular Breeding, Ningxia Hui Autonomous Region, Ningxia University, Yinchuan, 750021, China
  - College of Life Sciences, Xinyang Normal University, Xinyang, 464000, Henan, China

2
Kinjo AR. Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families. J Theor Biol 2018; 443:18-27. [PMID: 29355538 DOI: 10.1016/j.jtbi.2018.01.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/27/2017] [Revised: 01/16/2018] [Accepted: 01/17/2018] [Indexed: 12/23/2022]
Abstract
In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects in precisely defining protein families, which are absent in conventional sequence models.
Affiliation(s)
- Akira R Kinjo
  - Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan

3
Chang KT, Guo J, di Ronza A, Sardiello M. Aminode: Identification of Evolutionary Constraints in the Human Proteome. Sci Rep 2018; 8:1357. [PMID: 29358731 PMCID: PMC5778061 DOI: 10.1038/s41598-018-19744-w] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 07/05/2017] [Accepted: 01/05/2018] [Indexed: 12/12/2022]
Abstract
Evolutionarily constrained regions (ECRs) are a hallmark for sites of critical importance for a protein's structure or function. ECRs can be inferred by comparing the amino acid sequences from multiple protein homologs in the context of the evolutionary relationships that link the analyzed proteins. The compilation and analysis of the datasets required to infer ECRs, however, are time consuming and require skills in coding and bioinformatics, which can limit the use of ECR analysis in the biomedical community. Here, we developed Aminode, a user-friendly webtool for the routine and rapid inference of ECRs. Aminode is pre-loaded with the results of the analysis of the whole human proteome compared with proteomes from 62 additional vertebrate species. Profiles of the relative rates of amino acid substitution and ECR maps of human proteins are available for immediate search and download on the Aminode website. Aminode can also be used for custom analyses of protein families of interest. Interestingly, mapping of known missense variants shows great enrichment of pathogenic variants and depletion of non-pathogenic variants in Aminode-generated ECRs, suggesting that ECR analysis may help evaluate the potential pathogenicity of variants of unknown significance. Aminode is freely available at http://www.aminode.org .
Affiliation(s)
- Kevin T Chang
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Junyan Guo
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
  - Microsoft Corporation, 1 Microsoft Way, Redmond, WA, 98052, USA
- Alberto di Ronza
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Marco Sardiello
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA

4
CRHunter: integrating multifaceted information to predict catalytic residues in enzymes. Sci Rep 2016; 6:34044. [PMID: 27665935 PMCID: PMC5036049 DOI: 10.1038/srep34044] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/06/2016] [Accepted: 09/07/2016] [Indexed: 11/08/2022]
Abstract
A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies and further considered whether their combination could improve the prediction performance. Herein, we developed an integrative algorithm named CRHunter by simultaneously using the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we invented two support vector machine feature predictors based on both structural and sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homological relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves optimal performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.

5
Oda H, Ota M, Toh H. Profile comparison revealed deviation from structural constraint at the positively selected sites. Biosystems 2016; 147:67-77. [PMID: 27443483 DOI: 10.1016/j.biosystems.2016.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2015] [Revised: 07/13/2016] [Accepted: 07/16/2016] [Indexed: 11/18/2022]
Abstract
The amino acid substitutions at a site are affected by a mixture of various constraints. It is also known that amino acid substitutions are accelerated at sites under positive selection. However, the relationship between the substitutions at positively selected sites and the constraints has not been thoroughly examined. Advances in computational biology have enabled us to divide the mixture of constraints into the structural constraint and the remainder by using amino acid sequences and tertiary structures; the remainder is expressed as the deviation of the mixture of constraints from the structural constraint. Here, two types of profiles, or matrices of size 20 x (site length), are compared. One of the profiles represents the mixture of constraints and is generated from a multiple amino acid sequence alignment, whereas the other is designed to represent the structural constraints. We applied the profile comparison method to proteins under positive selection to examine the relationship between positive selection and constraints. The results suggested that the constraint at a site under positive selection tends to deviate from the structural constraint at that site.
Affiliation(s)
- Hiroyuki Oda
  - Graduate School of Systems Life Sciences, Kyushu University, 744 Motooka Nishi-ku, Fukuoka 819-0395, Japan
- Motonori Ota
  - Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya City, Aichi 464-8601, Japan
- Hiroyuki Toh
  - Department of Biomedical Chemistry, School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo 669-1337, Japan

6
Fang C, Noguchi T, Yamana H. Analysis of evolutionary conservation patterns and their influence on identifying protein functional sites. J Bioinform Comput Biol 2015; 12:1440003. [PMID: 25362840 DOI: 10.1142/s0219720014400034] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Evolutionary conservation information included in the position-specific scoring matrix (PSSM) has been widely adopted by sequence-based methods for identifying protein functional sites, because all functional sites, whether in ordered or disordered proteins, are found to be conserved to some extent. However, different functional sites have different conservation patterns: some are linear contextual, some are mingled with highly variable residues, and others seem to be conserved independently. Every value in a PSSM is calculated independently of the others, without carrying the contextual information of residues in the sequence. Therefore, adopting the direct output of a PSSM for prediction fails to consider the relationship between the conservation patterns of residues and the distribution of conservation scores in PSSMs. In order to demonstrate the importance of combining PSSMs with the specific conservation patterns of functional sites for prediction, three different PSSM-based methods for identifying three kinds of functional sites have been analyzed. The results suggest that different PSSM-based methods differ in their capability to identify different patterns of functional sites, and that better combining PSSMs with the specific conservation patterns of residues would largely facilitate the prediction.
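To illustrate the observation that each PSSM value depends only on its own alignment column, the following is a minimal sketch of a log-odds matrix built from a toy alignment. The function names, the uniform background frequencies, and the additive pseudocount are assumptions made for illustration; this is not the PSI-BLAST procedure, nor any of the three methods analyzed in the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_scores(column, pseudocount=1.0):
    """Log-odds scores for one alignment column, computed from that column alone.

    Assumes a uniform background frequency (1/20) and an additive
    pseudocount; both are illustrative choices, not taken from the paper.
    """
    background = 1.0 / len(AMINO_ACIDS)
    # Gaps and non-standard symbols are ignored.
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {
        aa: math.log2(((counts[aa] + pseudocount) / total) / background)
        for aa in AMINO_ACIDS
    }


def build_pssm(alignment, pseudocount=1.0):
    """Return one score dictionary per column of an aligned sequence set."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        column_scores([seq[i] for seq in alignment], pseudocount)
        for i in range(length)
    ]


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "ACD", "GCD"]
    pssm = build_pssm(toy_alignment)
    # The fully conserved second column gives C a positive score.
    print(round(pssm[1]["C"], 3))
```

Because `column_scores` never looks at neighboring columns, changing one position of the alignment leaves the scores of every other position untouched, which is the context-free behavior the abstract describes.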
Affiliation(s)
- Chun Fang
  - Department of Computer Science and Engineering of Shandong, University of Technology, Shandong 255049, P. R. China

7
Xiao X, Hui MJ, Liu Z, Qiu WR. iCataly-PseAAC: Identification of Enzymes Catalytic Sites Using Sequence Evolution Information with Grey Model GM (2,1). J Membr Biol 2015; 248:1033-41. [DOI: 10.1007/s00232-015-9815-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 03/23/2015] [Accepted: 06/06/2015] [Indexed: 11/25/2022]

8
EXIA2: web server of accurate and rapid protein catalytic residue prediction. Biomed Res Int 2014; 2014:807839. [PMID: 25295274 PMCID: PMC4177735 DOI: 10.1155/2014/807839] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/21/2014] [Revised: 05/27/2014] [Accepted: 06/11/2014] [Indexed: 11/18/2022]
Abstract
We propose a method (EXIA2) of catalytic residue prediction based on protein structure without needing homology information. The method is based on the special side chain orientation of catalytic residues. We found that the side chain of catalytic residues usually points to the center of the catalytic site. The special orientation is usually observed in catalytic residues but not in noncatalytic residues, which usually have random side chain orientation. The method is shown to be the most accurate catalytic residue prediction method currently when combined with PSI-Blast sequence conservation. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures. The areas under the ROC curve (AUC) on these benchmark datasets are in the range from 0.934 to 0.968.

9
De Baets G, Van Durme J, Rousseau F, Schymkowitz J. A genome-wide sequence-structure analysis suggests aggregation gatekeepers constitute an evolutionary constrained functional class. J Mol Biol 2014; 426:2405-12. [PMID: 24735868 DOI: 10.1016/j.jmb.2014.04.007] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 02/11/2014] [Revised: 03/27/2014] [Accepted: 04/06/2014] [Indexed: 11/15/2022]
Abstract
Protein aggregation is driven by aggregation-prone regions that self-associate by β-strand interactions. Charged residues and prolines are enriched at the flanks of aggregation-prone regions, resulting in decreased aggregation. It is still unclear what drives the overrepresentation of these "aggregation gatekeepers", that is, whether their presence results from structural constraints determining protein stability or whether they constitute a bona fide functional class selectively maintained to control protein aggregation. As functional residues are typically conserved regardless of their cost to protein stability, we compared sequence conservation and thermodynamic cost of these residues in 2659 protein families in Escherichia coli. Across protein families, we find gatekeepers to be under strong selective conservation while at the same time representing a significant thermodynamic cost to protein structure. This finding supports the notion that aggregation gatekeepers are not structurally determined but evolutionarily selected to control protein aggregation.
Affiliation(s)
- Greet De Baets
  - Switch Laboratory, Flanders Institute for Biotechnology (Vlaams Instituut voor Biotechnologie), 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, University of Leuven, Herestraat 49, 3000 Leuven, Belgium; Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Joost Van Durme
  - Switch Laboratory, Flanders Institute for Biotechnology (Vlaams Instituut voor Biotechnologie), 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, University of Leuven, Herestraat 49, 3000 Leuven, Belgium; Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Frederic Rousseau
  - Switch Laboratory, Flanders Institute for Biotechnology (Vlaams Instituut voor Biotechnologie), 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, University of Leuven, Herestraat 49, 3000 Leuven, Belgium
- Joost Schymkowitz
  - Switch Laboratory, Flanders Institute for Biotechnology (Vlaams Instituut voor Biotechnologie), 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, University of Leuven, Herestraat 49, 3000 Leuven, Belgium

10
Bianchi V, Mangone I, Ferrè F, Helmer-Citterich M, Ausiello G. webPDBinder: a server for the identification of ligand binding sites on protein structures. Nucleic Acids Res 2013; 41:W308-13. [PMID: 23737450 PMCID: PMC3692056 DOI: 10.1093/nar/gkt457] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
The webPDBinder (http://pdbinder.bio.uniroma2.it/PDBinder) is a web server for the identification of small ligand-binding sites in a protein structure. webPDBinder searches a protein structure against a library of known binding sites and a collection of control non-binding pockets. The number of similarities identified with the residues in the two sets is then used to derive, for each residue of the query protein, a propensity value associated with the likelihood that the residue is part of a ligand binding site. The predicted binding residues can be further refined using conservation scores derived from the multiple alignment of the PFAM protein family. webPDBinder correctly identifies residues belonging to the binding site in 77% of the cases and is able to identify binding pockets starting from holo or apo structures with comparable performance. This is important for all the real-world cases where the query protein has been crystallized without a ligand and where it is also difficult to obtain clear similarities with bound pockets from holo pocket libraries. The input is either a PDB code or a user-submitted structure. The output is a list of predicted binding pocket residues with propensity and conservation values in both text and graphical format.
Affiliation(s)
- Valerio Bianchi
  - Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy

11
Kanematsu Y, Koike R, Amemiya T, Ota M. Substrate-shielding and hydrolytic reaction in hydrolases. Proteins 2013; 81:926-32. [DOI: 10.1002/prot.24253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/05/2012] [Revised: 12/10/2012] [Accepted: 01/04/2013] [Indexed: 11/07/2022]

12
On the structural context and identification of enzyme catalytic residues. Biomed Res Int 2013; 2013:802945. [PMID: 23484160 PMCID: PMC3581254 DOI: 10.1155/2013/802945] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/29/2012] [Accepted: 12/28/2012] [Indexed: 11/25/2022]
Abstract
Enzymes play important roles in most of the biological processes. Although only a small fraction of residues are directly involved in catalytic reactions, these catalytic residues are the most crucial parts in enzymes. The study of the fundamental and unique features of catalytic residues benefits the understanding of enzyme functions and catalytic mechanisms. In this work, we analyze the structural context of catalytic residues based on theoretical and experimental structure flexibility. The results show that catalytic residues have distinct structural features and context. Their neighboring residues, whether sequence or structure neighbors within specific range, are usually structurally more rigid than those of noncatalytic residues. The structural context feature is combined with support vector machine to identify catalytic residues from enzyme structure. The prediction results are better or comparable to those of recent structure-based prediction methods.

13
Gao YF, Li BQ, Cai YD, Feng KY, Li ZD, Jiang Y. Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection. Mol Biosyst 2013; 9:61-9. [DOI: 10.1039/c2mb25327e] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/14/2023]

14
Accurate prediction of protein catalytic residues by side chain orientation and residue contact density. PLoS One 2012; 7:e47951. [PMID: 23110141 PMCID: PMC3480458 DOI: 10.1371/journal.pone.0047951] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/21/2012] [Accepted: 09/18/2012] [Indexed: 11/19/2022]
Abstract
Prediction of protein catalytic residues provides useful information for studies of protein function. Most of the existing methods combine both structure and sequence information but rely heavily on sequence conservation from multiple sequence alignments. The contribution of structure information is usually less than that of sequence conservation in existing methods. We found a novel structure feature, residue side chain orientation, which is the first structure-based feature that achieves prediction results comparable to those of evolutionary sequence conservation. We developed a structure-based method, Enzyme Catalytic residue SIde-chain Arrangement (EXIA), which is based on residue side chain orientations and backbone flexibility of protein structure. The prediction that uses EXIA outperforms existing structure-based features. The prediction quality of combining EXIA and sequence conservation exceeds that of the state-of-the-art prediction methods. EXIA is designed to predict catalytic residues from a single protein structure without needing sequence or structure alignments. It provides invaluable information when there is insufficient or unreliable homology information for the target protein. We found that catalytic residues have a very special side chain orientation and designed the EXIA method based on this newly discovered feature. It was also found that EXIA performs well for a dataset of enzymes without any bound ligand in their crystallographic structures.

15
Han L, Zhang YJ, Song J, Liu MS, Zhang Z. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One 2012; 7:e41370. [PMID: 22829945 PMCID: PMC3400608 DOI: 10.1371/journal.pone.0041370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/22/2012] [Accepted: 06/20/2012] [Indexed: 11/18/2022]
Abstract
Enzymes play a fundamental role in almost all biological processes and identification of catalytic residues is a crucial step for deciphering the biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrated the microenvironment (ME) and geometrical properties of amino acid residues. Firstly, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed as MEscore, by summing up the likelihood of all residue pairs. Secondly, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, provided that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature based on an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation tests, MEDscore achieved a robust performance in identifying catalytic residues with an AUC1.0 of 0.889. At a ≤ 10% false positive rate control, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved a competitive performance compared with the residue conservation score (e.g. CONscore), the most informative singular feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first singular structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary with CONscore and a significantly improved performance can be achieved by combining CONscore with MEDscore in a linear manner. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.
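The abstract describes Dscore only as a measure of a residue's relative distance to the center of the protein. As a rough illustration of that idea, the sketch below normalizes each residue's distance from the coordinate centroid by the largest such distance. The function names, the use of one representative coordinate per residue, and the normalization are all assumptions; the paper's actual Dscore definition and its nonlinear combination with MEscore are not given in the abstract and are not reproduced here.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def relative_distance_to_center(residue_coords):
    """Distance of each residue from the protein centroid, scaled to [0, 1].

    residue_coords holds one representative (x, y, z) point per residue,
    for example the C-alpha atom (an assumption made for this sketch).
    A value near 0 means the residue lies close to the center.
    """
    center = centroid(residue_coords)
    distances = [math.dist(p, center) for p in residue_coords]
    largest = max(distances)
    if largest == 0:
        return [0.0] * len(distances)
    return [d / largest for d in distances]


if __name__ == "__main__":
    toy_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(relative_distance_to_center(toy_coords))
```

Under this assumed scaling, residues with small values would be the candidates that the abstract's premise (catalytic residues tend to sit near the center) favors.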
Affiliation(s)
- Lei Han
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
- Yong-Jun Zhang
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, People's Republic of China
- Jiangning Song
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Ming S. Liu
  - CSIRO - Mathematics, Informatics and Statistics, Clayton, Victoria, Australia
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China

16
Bianchi V, Gherardini PF, Helmer-Citterich M, Ausiello G. Identification of binding pockets in protein structures using a knowledge-based potential derived from local structural similarities. BMC Bioinformatics 2012; 13 Suppl 4:S17. [PMID: 22536963 PMCID: PMC3434446 DOI: 10.1186/1471-2105-13-s4-s17] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023]
Abstract
Background The identification of ligand binding sites is a key task in the annotation of proteins with known structure but uncharacterized function. Here we describe a knowledge-based method exploiting the observation that unrelated binding sites share small structural motifs that bind the same chemical fragments irrespective of the nature of the ligand as a whole. Results PDBinder compares a query protein against a library of binding and non-binding protein surface regions derived from the PDB. The results of the comparison are used to derive a propensity value for each residue which is correlated with the likelihood that the residue is part of a ligand binding site. The method was applied to two different problems: i) the prediction of ligand binding residues and ii) the identification of which surface cleft harbours the binding site. In both cases PDBinder performed consistently better than existing methods. PDBinder has been trained on a non-redundant set of 1356 high-quality protein-ligand complexes and tested on a set of 239 holo and apo complex pairs. We obtained an MCC of 0.313 on the holo set with a PPV of 0.413 while on the apo set we achieved an MCC of 0.271 and a PPV of 0.372. Conclusions We show that PDBinder performs better than existing methods. The good performance on the unbound proteins is extremely important for real-world applications where the location of the binding site is unknown. Moreover, since our approach is orthogonal to those used in other programs, the PDBinder propensity value can be integrated in other algorithms further increasing the final performance.
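The MCC and PPV figures quoted above are standard confusion-matrix statistics. As a reminder of how such values are obtained from per-residue predictions, here is a minimal sketch using the usual textbook definitions; the function names and the toy labels are illustrative assumptions and are unrelated to the PDBinder benchmark data.

```python
import math


def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for binary per-residue labels."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif not p and a:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ppv(tp, fp):
    """Positive predictive value: fraction of predicted binders that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy labels: 1 = residue in a binding site, 0 = not in a binding site.
    predicted = [1, 1, 0, 0, 1, 0]
    actual = [1, 0, 0, 0, 1, 1]
    tp, fp, tn, fn = confusion_counts(predicted, actual)
    print(ppv(tp, fp), mcc(tp, fp, tn, fn))
```

MCC is commonly preferred over accuracy for this kind of task because binding residues are a small minority of all residues, so a predictor that labels everything as non-binding would still score high accuracy.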
Collapse
Affiliation(s)
- Valerio Bianchi
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, Rome 00133, Italy
|
17
|
Zhao J, Dundas J, Kachalo S, Ouyang Z, Liang J. Accuracy of functional surfaces on comparatively modeled protein structures. Journal of Structural and Functional Genomics 2011; 12:97-107. [PMID: 21541664 PMCID: PMC3415962 DOI: 10.1007/s10969-011-9109-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 11/24/2010] [Accepted: 04/20/2011] [Indexed: 12/18/2022]
Abstract
Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. As the accumulation of protein sequence information far outpaces that of structures, constructing accurate models of protein functional surfaces and identifying their key elements become increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genome projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using MODELLER, we determine how well functional surfaces can be accurately reproduced. We use an alpha shape based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured.
Affiliation(s)
- Jieling Zhao
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Sema Kachalo
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Zheng Ouyang
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
|
18
|
Yahalom R, Reshef D, Wiener A, Frankel S, Kalisman N, Lerner B, Keasar C. Structure-based identification of catalytic residues. Proteins 2011; 79:1952-63. [PMID: 21491495 DOI: 10.1002/prot.23020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 10/05/2010] [Revised: 01/14/2011] [Accepted: 01/28/2011] [Indexed: 11/10/2022]
Abstract
The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
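Steps (2) and (3) of the class-imbalance strategy described above can be illustrated with a short sketch. The abstract gives neither the under-sampling ratio nor the penalty scheme, so the `ratio` parameter, the inverse-frequency weighting and both function names are assumptions made for illustration only.

```python
import random
from collections import Counter


def undersample(labels, ratio=1.0, seed=0):
    """Keep every positive (label 1) and at most ratio * n_pos randomly chosen negatives (label 0).

    Returns the sorted indices of the retained samples.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    return sorted(pos + rng.sample(neg, n_keep))


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * k) for cls, k in counts.items()}


# Toy data: 5 catalytic residues among 100 (hypothetical, not from the paper).
labels = [1] * 5 + [0] * 95
kept = undersample(labels, ratio=4.0)
weights = balanced_class_weights([labels[i] for i in kept])
print(len(kept), weights)
```

The resulting weights could then be handed to any classifier that accepts per-class misclassification costs, so that an error on a catalytic residue is penalized more than an error on a non-catalytic one.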
Affiliation(s)
- Ran Yahalom
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
|
19
|
Sonavane S, Chakrabarti P. Prediction of active site cleft using support vector machines. J Chem Inf Model 2010; 50:2266-73. [PMID: 21080689 DOI: 10.1021/ci1002922] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/28/2023]
Abstract
Computational tools are available today for the detection and delineation of the clefts and cavities in protein 3D structure and ranking them on the basis of probable binding site clefts. There is a need to improve the ranking of clefts and accuracy of predicting catalytic site clefts. Our results show that the distance of the clefts from protein centroid and sequence entropy of the lining residues, when used in conjunction with the volume, are valuable descriptors for predicting the catalytic site. We have applied the SVM approach for recognizing and ranking the active site clefts and tested its performance using different combinations of attributes. In both the ligand-bound and the unbound forms of structures, our method correctly predicts the active site clefts in 73% of cases at rank one. If we consider the results at rank 3 (i.e., the correct solution is among one of the top three solutions), the correctly predicted cases are 94% and 90% for the bound and the unbound forms of structures, respectively. Our approach improves the ranking of binding site clefts in comparison with CASTp and is comparable to other existing methods like Fpocket. Although the data set for training the SVM approach is rather small in size, the results are encouraging for the method to be used as complementary to other existing tools.
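The "rank one" and "rank 3" figures above are top-k success rates over a set of proteins. A minimal sketch of that evaluation follows, assuming each protein has a ranked list of cleft identifiers and one annotated active-site cleft; the identifiers and the function name are hypothetical.

```python
def top_k_success(ranked_clefts, true_clefts, k):
    """Fraction of proteins whose annotated cleft appears among the k best-ranked clefts."""
    hits = sum(
        1 for protein, ranking in ranked_clefts.items()
        if true_clefts[protein] in ranking[:k]
    )
    return hits / len(ranked_clefts)


# Toy rankings, best cleft first (illustrative only).
ranked = {
    "protA": ["c2", "c1", "c3"],
    "protB": ["c1", "c4", "c2"],
    "protC": ["c5", "c3", "c1", "c2"],
    "protD": ["c1", "c2"],
}
truth = {"protA": "c2", "protB": "c2", "protC": "c2", "protD": "c1"}

print(top_k_success(ranked, truth, 1), top_k_success(ranked, truth, 3))
```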
Affiliation(s)
- Shrihari Sonavane
- Department of Biochemistry and Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India
|
20
|
Chikhi R, Sael L, Kihara D. Real-time ligand binding pocket database search using local surface descriptors. Proteins 2010; 78:2007-28. [PMID: 20455259 DOI: 10.1002/prot.22715] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022]
Abstract
Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.
Affiliation(s)
- Rayan Chikhi
- Computer Science Department, Ecole Normale Supérieure de Cachan, 94235 Cachan cedex, Brittany, France
|
21
|
Nagao C, Nagano N, Mizuguchi K. Relationships between functional subclasses and information contained in active-site and ligand-binding residues in diverse superfamilies. Proteins 2010; 78:2369-84. [PMID: 20544971 DOI: 10.1002/prot.22750] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
To investigate the relationships between functional subclasses and sequence and structural information contained in the active-site and ligand-binding residues (LBRs), we performed a detailed analysis of seven diverse enzyme superfamilies: aldolase class I, TIM-barrel glycosidases, alpha/beta-hydrolases, P-loop containing nucleotide triphosphate hydrolases, collagenase, Zn peptidases, and glutamine phosphoribosylpyrophosphate, subunit 1, domain 1. These homologous superfamilies, as defined in CATH, were selected from the enzyme catalytic-mechanism database. We defined active-site and LBRs based solely on the literature information and complex structures in the Protein Data Bank. From a structure-based multiple sequence alignment for each CATH homologous superfamily, we extracted subsequences consisting of the aligned positions that were used as an active-site or a ligand-binding site by at least one sequence. Using both the subsequences and full-length alignments, we performed cluster analysis with three sequence distance measures. We showed that the cluster analysis using the subsequences was able to detect functional subclasses more accurately than the clustering using the full-length alignments. The subsequences determined by only the literature information and complex structures, thus, had sufficient information to detect the functional subclasses. Detailed examination of the clustering results provided new insights into the mechanism of functional diversification for these superfamilies.
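The subsequence extraction described above keeps only those alignment columns that serve as an active-site or ligand-binding position in at least one sequence. A minimal sketch of that step follows; the toy alignment, the 0-based column indices and the function names are assumptions for illustration.

```python
def functional_columns(site_columns_per_seq):
    """Union of alignment columns annotated as functional in at least one sequence."""
    columns = set()
    for cols in site_columns_per_seq.values():
        columns.update(cols)
    return sorted(columns)


def extract_subsequences(alignment, columns):
    """Reduce each aligned sequence to the selected columns (gaps are kept as '-')."""
    return {name: "".join(row[c] for c in columns) for name, row in alignment.items()}


# Toy structure-based alignment; all rows have equal length (illustrative only).
alignment = {
    "seqA": "MKT-HLSDE",
    "seqB": "MRTAHL-DE",
    "seqC": "MKSAHISDQ",
}
# Columns (0-based) used as active-site or ligand-binding positions per sequence.
sites = {"seqA": [1, 4], "seqB": [4, 7], "seqC": []}

cols = functional_columns(sites)
subseqs = extract_subsequences(alignment, cols)
print(cols, subseqs)
```

The reduced strings can then be fed to the same distance measures and clustering used for the full-length alignments, which is the comparison the abstract reports.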
Affiliation(s)
- Chioko Nagao
- National Institute of Biomedical Innovation, 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan
|
22
|
Wilkins AD, Lua R, Erdin S, Ward RM, Lichtarge O. Sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation. Protein Sci 2010; 19:1296-311. [PMID: 20506260 DOI: 10.1002/pro.406] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022]
Abstract
Protein functional sites control most biological processes and are important targets for drug design and protein engineering. To characterize them, the evolutionary trace (ET) ranks the relative importance of residues according to their evolutionary variations. Generally, top-ranked residues cluster spatially to define evolutionary hotspots that predict functional sites in structures. Here, various functions that measure the physical continuity of ET ranks among neighboring residues in the structure, or in the sequence, are shown to inform sequence selection and to improve functional site resolution. This is shown first, in 110 proteins, for which the overlap between top-ranked residues and actual functional sites rose by 8% in significance. Then, on a structural proteomic scale, optimized ET led to better 3D structure-function motifs (3D templates) and, in turn, to enzyme function prediction by the Evolutionary Trace Annotation (ETA) method with better sensitivity (40% to 53%) and positive predictive value (93% to 94%). This suggests that the similarity of evolutionary importance among neighboring residues in the sequence and in the structure is a universal feature of protein evolution. In practice, this yields a tool for optimizing sequence selections for comparative analysis and, via ET, for better predictions of functional site and function. This should prove useful for the efficient mutational redesign of protein function and for pharmaceutical targeting.
Affiliation(s)
- A D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
|
23
|
Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T. Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010; 11:365. [PMID: 20594344 PMCID: PMC2909224 DOI: 10.1186/1471-2105-11-365] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/16/2010] [Accepted: 07/01/2010] [Indexed: 11/16/2022] Open
Abstract
Background The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity. Results We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones. 
Conclusions A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
Affiliation(s)
- Ratna R Thangudu
- National Center for Biotechnology Information, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA
|
24
|
Xin F, Myers S, Li YF, Cooper DN, Mooney SD, Radivojac P. Structure-based kernels for the prediction of catalytic residues and their involvement in human inherited disease. Bioinformatics 2010; 26:1975-82. [PMID: 20551136 DOI: 10.1093/bioinformatics/btq319] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/07/2023]
Abstract
MOTIVATION Enzyme catalysis is involved in numerous biological processes and the disruption of enzymatic activity has been implicated in human disease. Despite this, various aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. As a result, the computational prediction of catalytic residues has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes and also predict the molecular basis of disease. RESULTS We propose a new kernel-based algorithm for the prediction of catalytic residues based on protein sequence, structure and evolutionary information. The method relies upon explicit modeling of similarity between residue-centered neighborhoods in protein structures. We present evidence that this algorithm evaluates favorably against established approaches, and also provides insights into the relative importance of the geometry, physicochemical properties and evolutionary conservation of catalytic residue activity. The new algorithm was used to identify known mutations associated with inherited disease whose molecular mechanism might be predicted to operate specifically through the loss or gain of catalytic residues. It should, therefore, provide a viable approach to identifying the molecular basis of disease in which the loss or gain of function is not caused solely by the disruption of protein stability. Our analysis suggests that both mechanisms are actively involved in human inherited disease. AVAILABILITY AND IMPLEMENTATION Source code for the structural kernel is available at www.informatics.indiana.edu/predrag/.
Affiliation(s)
- Fuxiao Xin
- School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
|
25
|
Du S, Sakurai M. Multivariate analysis of properties of amino acid residues in proteins from a viewpoint of functional site prediction. Chem Phys Lett 2010. [DOI: 10.1016/j.cplett.2010.02.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
|
26
|
Sankararaman S, Sha F, Kirsch JF, Jordan MI, Sjölander K. Active site prediction using evolutionary and structural information. Bioinformatics 2010; 26:617-24. [PMID: 20080507 PMCID: PMC2828116 DOI: 10.1093/bioinformatics/btq008] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/17/2022]
Abstract
Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites. Results: In cross-validation experiments on two benchmark datasets from the Catalytic Site Atlas and CATRES resources containing a total of 437 manually curated enzymes spanning 487 SCOP families, Discern increases catalytic site recall between 12% and 20% over methods that combine information from both sequence and structure, and by ≥50% over methods that make use of sequence conservation signal only. Controlled experiments show that Discern's improvement in catalytic residue prediction is derived from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting. Contact: kimmen@berkeley.edu Supplementary information: Supplementary data are available at Bioinformatics online.
|
27
|
Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J. SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009; 10:379. [PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 06/05/2009] [Accepted: 11/18/2009] [Indexed: 01/31/2023] Open
Abstract
Background The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function is useful in order to gain information about its potential role. There are many available approaches to predict functional site, but many are not made available via a publicly-accessible application. Results Here we present a functional site prediction tool (SitesIdentify), based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web-server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion SitesIdentify is able to produce comparable accuracy in predicting functional sites to its closest available counterpart, but in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/
Affiliation(s)
- Tracey Bray
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
|
28
|
Ramanathan K, Shanthi V, Sethumadhavan R. In silico identification of catalytic residues in azobenzene reductase from Bacillus subtilis and its docking studies with azo dyes. Interdiscip Sci 2009; 1:290-7. [PMID: 20640807 DOI: 10.1007/s12539-009-0035-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/18/2008] [Revised: 09/02/2009] [Accepted: 09/05/2009] [Indexed: 11/28/2022]
Abstract
Prediction of catalytic residues of an enzyme molecule is of great importance for a range of applications including molecular docking, drug design, structural identification and comparison of binding sites. Over the last decades, many studies have been conducted to identify the enzyme catalytic site. However, the catalytic residues of the azobenzene reductase from Bacillus subtilis are still unknown. Investigation shows that under anaerobic conditions, azo dyes can be reduced by this enzyme and other environmental microorganisms to colorless amines, which may be toxic, mutagenic, and carcinogenic to humans and animals. To assess and estimate the toxicity, it is essential to identify the catalytic residues of this enzyme. Few computational methods have been developed to address this issue. In this approach, we identify the catalytic residues of azobenzene reductase from Bacillus subtilis, which were then analyzed in terms of properties including function, conservation, hydrogen bonding, B-factor, solvent accessibility, and flexibility. The results indicate that Lys (83) and Tyr (74) play an important role as catalytic site residues in the azobenzene reductase from Bacillus subtilis. It is hoped that this information will provide a better understanding of the molecular mechanisms involved in catalysis and a heuristic basis for predicting the catalytic residues in enzymes of unknown function. In this study, our approach mainly looks for a better understanding of the biodegradation of the Sudan I, Sudan II, Sudan III and Sudan IV dyes mediated by azobenzene reductase from Bacillus subtilis. Furthermore, the catalytic site residue information is essential for understanding and altering substrate specificity and for the design of enzyme inhibitors.
Affiliation(s)
- K Ramanathan
- School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
|
29
|
Thomas VL, McReynolds AC, Shoichet BK. Structural bases for stability-function tradeoffs in antibiotic resistance. J Mol Biol 2009; 396:47-59. [PMID: 19913034 DOI: 10.1016/j.jmb.2009.11.005] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 09/09/2009] [Revised: 11/02/2009] [Accepted: 11/04/2009] [Indexed: 10/20/2022]
Abstract
Preorganization of enzyme active sites for substrate recognition typically comes at a cost to the stability of the folded form of the protein; consequently, enzymes can be dramatically stabilized by substitutions that attenuate the size and preorganization "strain" of the active site. How this stability-activity tradeoff constrains enzyme evolution has remained less certain, and it is unclear whether one should expect major stability insults as enzymes mutate towards new activities or how these new activities manifest structurally. These questions are both germane and easy to study in beta-lactamases, which are evolving on the timescale of years to confer resistance to an ever-broader spectrum of beta-lactam antibiotics. To explore whether stability is a substantial constraint on this antibiotic resistance evolution, we investigated extended-spectrum mutants of class C beta-lactamases, which had evolved new activity versus third-generation cephalosporins. Five mutant enzymes had between 100-fold and 200-fold increased activity against the antibiotic cefotaxime in enzyme assays, and the mutant enzymes all lost thermodynamic stability (from 1.7 kcal mol⁻¹ to 4.1 kcal mol⁻¹), consistent with the stability-function hypothesis. Intriguingly, several of the substitutions were 10-20 Å from the catalytic serine; the question of how they conferred extended-spectrum activity arose. Eight structures, including complexes with inhibitors and extended-spectrum antibiotics, were determined by X-ray crystallography. Distinct mechanisms of action, including changes in the flexibility and ground-state structures of the enzyme, are revealed for each mutant. These results explain the structural bases for the antibiotic resistance conferred by these substitutions and their corresponding decrease in protein stability, which will constrain the evolution of new antibiotic resistance.
Affiliation(s)
- Veena L Thomas
- Graduate Program in Pharmaceutical Sciences and Pharmacogenomics, University of California San Francisco, San Francisco, CA 94158-2518, USA
|
30
|
Paramesvaran J, Hibbert EG, Russell AJ, Dalby PA. Distributions of enzyme residues yielding mutants with improved substrate specificities from two different directed evolution strategies. Protein Eng Des Sel 2009; 22:401-11. [DOI: 10.1093/protein/gzp020] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
|
31
|
Tong W, Wei Y, Murga LF, Ondrechen MJ, Williams RJ. Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D Structure and sequence properties. PLoS Comput Biol 2009; 5:e1000266. [PMID: 19148270 PMCID: PMC2612599 DOI: 10.1371/journal.pcbi.1000266] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 06/11/2008] [Accepted: 12/04/2008] [Indexed: 11/24/2022] Open
Abstract
A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination THEMATICS features represent the single most important component of such classifiers. Genome sequencing has revealed the codes for thousands of previously unknown proteins for humans and for hundreds of other species. Many of these proteins are of unknown or unclear function. The information contained in the genome sequences holds tremendous potential benefit to humankind, including new approaches to the diagnosis and treatment of disease. In order to realize these benefits, a key step is to understand the functions of the proteins for which these genes hold the code. A first step in understanding the function of a protein is to identify the functional site, the local area on the surface of a protein where it affects its functional activity. This paper reports on a new computational methodology to predict protein functional sites from protein 3D structures. A new machine learning approach called Partial Order Optimum Likelihood (POOL) is introduced here. It is shown that POOL outperforms previous methods for the prediction of protein functional sites from 3D structures.
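The abstract above builds its probability estimates on multidimensional isotonic regression. As a simplified illustration only, the sketch below implements the one-dimensional pool-adjacent-violators procedure, which fits a non-decreasing sequence to observations already sorted by a single feature. It is not the POOL algorithm itself, and the function name and toy labels are assumptions.

```python
def pav_isotonic(y):
    """Least-squares non-decreasing fit to y (unit weights) by pool-adjacent-violators."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Toy 0/1 active-site labels for residues sorted by one increasing feature (illustrative).
labels = [0, 0, 1, 0, 1, 1]
print(pav_isotonic(labels))
```

Applied to 0/1 labels, the fitted values can be read as monotone probability estimates that a residue belongs to an active site; extending the idea to several features at once requires a partial order over residues rather than a single sorted list.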
Affiliation(s)
- Wenxu Tong
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Ying Wei
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Leonel F. Murga
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Mary Jo Ondrechen
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- * E-mail: (MO); (RJW)
- Ronald J. Williams
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- * E-mail: (MO); (RJW)
|
32
|
Abstract
Protein–DNA/RNA/protein interactions play critical roles in many biological functions. Previous studies have focused on the different features characterizing the different macromolecule-binding sites and approaches to detect these sites. However, no common unique signature of these sites had been reported. Thus, this work aims to provide a ‘common’ principle dictating the location of the different macromolecule-binding sites founded upon fundamental principles of binding thermodynamics. To achieve this aim, a comprehensive set of structurally nonhomologous DNA-, RNA-, obligate protein- and nonobligate protein-binding proteins, both free and bound to their respective macromolecules, was created and a novel strategy for detecting clusters of residues with electrostatic or steric strain given the protein structure was developed. The results show that regardless of the macromolecule type, the binding strength and conformational changes upon binding, macromolecule-binding sites are energetically less stable than nonmacromolecule-binding sites. They also reveal new energetic features distinguishing DNA- from RNA-binding sites and obligate protein- from nonobligate protein-binding sites in both free/bound protein structures.
Affiliation(s)
- Yao Chi Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
|
33
|
Zhang T, Zhang H, Chen K, Shen S, Ruan J, Kurgan L. Accurate sequence-based prediction of catalytic residues. Bioinformatics 2008; 24:2329-38. [PMID: 18710875 DOI: 10.1093/bioinformatics/btn433] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Prediction of catalytic residues provides useful information for the research on function of enzymes. Most of the existing prediction methods are based on structural information, which limits their use. We propose a sequence-based catalytic residue predictor that provides predictions with quality comparable to modern structure-based methods and that exceeds quality of state-of-the-art sequence-based methods. RESULTS Our method (CRpred) uses sequence-based features and the sequence-derived PSI-BLAST profile. We used feature selection to reduce the dimensionality of the input (and explain the input) to support vector machine (SVM) classifier that provides predictions. Tests on eight datasets and side-by-side comparison with six modern structure- and sequence-based predictors show that CRpred provides predictions with quality comparable to current structure-based methods and better than sequence-based methods. The proposed method obtains 15-19% precision and 48-58% TP (true positive) rate, depending on the dataset used. CRpred also provides confidence values that allow selecting a subset of predictions with higher precision. The improved quality is due to newly designed features and careful parameterization of the SVM. The features incorporate amino acids characterized by the highest and the lowest propensities to constitute catalytic residues, Gly that provides flexibility for catalytic sites and sequence motifs characteristic to certain catalytic reactions. Our features indicate that catalytic residues are on average more conserved when compared with the general population of residues and that highly conserved amino acids characterized by high catalytic propensity are likely to form catalytic sites. We also show that local (with respect to the sequence) hydrophobicity contributes towards the prediction.
Affiliation(s)
- Tuo Zhang
- College of Mathematical Science and LPMC, Nankai University, Tianjin, PRC
|
34
|
Fukushima K, Wada M, Sakurai M. An insight into the general relationship between the three dimensional structures of enzymes and their electronic wave functions: Implication for the prediction of functional sites of enzymes. Proteins 2008; 71:1940-54. [PMID: 18186466 DOI: 10.1002/prot.21865] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Abstract
In this study, we explored the general relationship between the three-dimensional (3D) structures of enzymes and their electronic wave functions. Furthermore, we developed a method for the prediction of their functionally important sites. For this purpose, we first performed linear-scaling molecular orbital calculations for 112 nonredundant, non-homologous enzymes with known structure and function. In consequence, we showed that the canonical molecular orbitals (MOs) of the enzymes could be classified into three groups according to the degree of electron delocalization: highly localized orbitals (Group A), highly delocalized orbitals whose electrons are distributed over almost the whole molecule (Group B), and moderately delocalized orbitals (Group C). The MOs belonging to Group A are located near the HOMO-LUMO band gap, and thereby include the frontier orbitals of a given enzyme. We inferred that the MOs of Group B play a role in stabilizing the 3D structure of the enzyme, while those of Group C contribute to constructing the covalent bond framework of the enzyme. Next, we investigated whether the frontier orbitals of enzymes could be used for identifying their potential functional sites. As a result, we found that the frontier orbitals of the 112 enzymes have a high propensity to be colocalized with the known functional sites, especially when the enzymes are hydrated. Such a propensity is shown to be remarkable when Glu or Asp is a functional site residue. On the basis of these results, we finally propose a protocol for the prediction of functional sites of enzymes.
Affiliation(s)
- K Fukushima
- Center for Biological Resources and Informatics, Tokyo Institute of Technology, Midori-ku, Yokohama 226-8501, Japan
|
35
|
Highly accurate method for ligand-binding site prediction in unbound state (apo) protein structures. Proteins 2008; 73:468-79. [DOI: 10.1002/prot.22067] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/07/2022]
|
36
|
Tong W, Williams RJ, Wei Y, Murga LF, Ko J, Ondrechen MJ. Enhanced performance in prediction of protein active sites with THEMATICS and support vector machines. Protein Sci 2007; 17:333-41. [PMID: 18096640 DOI: 10.1110/ps.073213608] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 10/22/2022]
Abstract
Theoretical microscopic titration curves (THEMATICS) is a computational method for the identification of active sites in proteins through deviations in computed titration behavior of ionizable residues. While the sensitivity to catalytic sites is high, the previously reported sensitivity to catalytic residues was not as high, about 50%. Here THEMATICS is combined with support vector machines (SVM) to improve sensitivity for catalytic residue prediction from protein 3D structure alone. For a test set of 64 proteins taken from the Catalytic Site Atlas (CSA), the average recall rate for annotated catalytic residues is 61%; good precision is maintained selecting only 4% of all residues. The average false positive rate, using the CSA annotations is only 3.2%, far lower than other 3D-structure-based methods. THEMATICS-SVM returns higher precision, lower false positive rate, and better overall performance, compared with other 3D-structure-based methods. Comparison is also made with the latest machine learning methods that are based on both sequence alignments and 3D structures. For annotated sets of well-characterized enzymes, THEMATICS-SVM performance compares very favorably with methods that utilize sequence homology. However, since THEMATICS depends only on the 3D structure of the query protein, no decline in performance is expected when applied to novel folds, proteins with few sequence homologues, or even orphan sequences. An extension of the method to predict non-ionizable catalytic residues is also presented. THEMATICS-SVM predicts a local network of ionizable residues with strong interactions between protonation events; this appears to be a special feature of enzyme active sites.
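The recall, precision, and false positive rate figures quoted in this and neighboring abstracts are per-residue quantities. As a minimal sketch of how they relate, assuming residues are identified by integer positions and that the function name `residue_metrics` and the toy numbers are purely illustrative (they are not taken from the paper):

```python
def residue_metrics(predicted, annotated, all_residues):
    """Per-residue recall, precision and false positive rate.

    predicted    -- residue positions the method selects (hypothetical input)
    annotated    -- residue positions annotated as catalytic (e.g. from a curated set)
    all_residues -- every residue position in the protein
    """
    predicted = set(predicted)
    annotated = set(annotated)
    all_residues = set(all_residues)

    tp = len(predicted & annotated)                 # catalytic and selected
    fp = len(predicted - annotated)                 # selected but not annotated
    fn = len(annotated - predicted)                 # annotated but missed
    tn = len(all_residues - predicted - annotated)  # correctly left out

    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy protein of 100 residues: 3 annotated catalytic residues, 4 predictions.
metrics = residue_metrics({10, 20, 40, 50}, {10, 20, 30}, range(1, 101))
```

Because catalytic residues are a small minority of all residues, a low false positive rate can coexist with modest precision, which is why the abstracts in this list report both.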
Affiliation(s)
- Wenxu Tong
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115, USA
|
37
|
Sterner B, Singh R, Berger B. Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 2007; 14:1058-73. [PMID: 17887954 DOI: 10.1089/cmb.2007.0042] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023] Open
Abstract
We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, the challenges of prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.
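The idea of comparing short sequence profiles with a Kullback-Leibler measure can be sketched as follows. This is an illustration only: the abstract does not give the exact definition, so the symmetrized per-column sum, the pseudocount, and the names `build_profile` and `profile_distance` are assumptions rather than the authors' formulation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=1.0):
    """Amino acid frequencies for one alignment column, with pseudocounts
    so that no frequency is zero (assumed smoothing, not from the paper)."""
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for aa in column:
        if aa in counts:
            counts[aa] += 1.0
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def build_profile(windows, pseudocount=1.0):
    """Build a profile (one frequency table per position) from equal-length
    sequence windows centred on a catalytic residue."""
    length = len(windows[0])
    return [
        column_frequencies([w[i] for w in windows], pseudocount)
        for i in range(length)
    ]


def kl_divergence(p, q):
    """KL divergence D(p || q) between two frequency tables."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def profile_distance(profile_a, profile_b):
    """Symmetrized KL divergence summed over profile positions."""
    return sum(
        kl_divergence(p, q) + kl_divergence(q, p)
        for p, q in zip(profile_a, profile_b)
    )


# Hypothetical three-residue windows around two different catalytic residues.
profile_a = build_profile(["GDS", "GES", "GDT"])
profile_b = build_profile(["AKL", "AKL", "ARL"])
distance = profile_distance(profile_a, profile_b)
```

Symmetrizing makes the measure independent of argument order, which is convenient when profiles are clustered or matched against each other; plain KL divergence is not symmetric.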
Affiliation(s)
- Beckett Sterner
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
38
|
Mistry J, Bateman A, Finn RD. Predicting active site residue annotations in the Pfam database. BMC Bioinformatics 2007; 8:298. [PMID: 17688688 PMCID: PMC2025603 DOI: 10.1186/1471-2105-8-298] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.8] [Received: 04/05/2007] [Accepted: 08/09/2007] [Indexed: 12/03/2022] Open
Abstract
Background Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. Description We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and MEROPS we find that we make many novel predictions. On investigating the small subset of predictions made by these databases that are not predicted by us, we found these sequences did not meet our strict criteria for prediction. We assessed the sensitivity and specificity of our methodology and estimate that only 3% of our predicted sequences are false positives. Conclusion We have predicted 606110 active site residues, of which 94% are not found in UniProtKB, and have increased the active site annotations in Pfam by more than 200 fold. Although implemented for Pfam, the tool we have developed for transferring the data can be applied to any alignment with associated experimental active site data and is available for download. Our active site predictions are re-calculated at each Pfam release to ensure they are comprehensive and up to date. They provide one of the largest available databases of active site annotation.
Affiliation(s)
- Jaina Mistry
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Alex Bateman
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Robert D Finn
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
|
39
|
Sacquin-Mora S, Laforet E, Lavery R. Locating the active sites of enzymes using mechanical properties. Proteins 2007; 67:350-9. [PMID: 17311346 DOI: 10.1002/prot.21353] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.5] [Indexed: 11/08/2022]
Abstract
We have applied the calculation of mechanical properties to a dataset of almost 100 enzymes to determine the extent to which catalytic residues have distinct properties. Specifically, we have calculated force constants describing the ease of moving any given amino acid residue with respect to the other residues in the protein. The results show that catalytic residues are invariably associated with high force constants. Choosing an appropriate cutoff enables the detection of roughly 80% of catalytic residues with only 25% of false positives. It is shown that neither multidomain structures, nor the presence or absence of bound ligands hinder successful detections. It is however noted that active sites near the protein surface are more difficult to detect and that non-catalytic, but structurally key residues may also exhibit high force constants.
Affiliation(s)
- Sophie Sacquin-Mora
- Laboratoire de Biochimie Théorique, CNRS UPR 9080, Institut de Biologie Physico-Chimique, 13 rue Pierre et Marie Curie, 75005 Paris, France
|
40
|
Relating destabilizing regions to known functional sites in proteins. BMC Bioinformatics 2007; 8:141. [PMID: 17470296 PMCID: PMC1890302 DOI: 10.1186/1471-2105-8-141] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 11/02/2006] [Accepted: 04/30/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Most methods for predicting functional sites in protein 3D structures rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well annotated set of functional sites to use as a benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, a novel benchmark consisting of a diverse set of hand-curated protein functional sites is derived. RESULTS A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force-field successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high quality apo-structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated dataset of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for most nucleic acid binding sites.
These differences are rationalised in terms of the geometry and energetics of the binding site. CONCLUSION We find that although destabilizing regions as detected here can in general not be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.
|
41
|
Kawabata T, Go N. Detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites. Proteins 2007; 68:516-29. [PMID: 17444522 DOI: 10.1002/prot.21283] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Indexed: 11/07/2022]
Abstract
One of the simplest ways to predict ligand binding sites is to identify pocket-shaped regions on the protein surface. Many programs have already been proposed to identify these pocket regions. Examination of their algorithms revealed that a pocket intrinsically has two arbitrary properties, "size" and "depth". We proposed a new definition for pockets using two explicit adjustable parameters that correspond to these two arbitrary properties. A pocket region is defined as a space into which a small probe can enter, but a large probe cannot. The radii of small and large probe spheres are the two parameters that correspond to the "size" and "depth" of the pockets, respectively. These values can be adjusted for individual putative ligand molecules. To determine the optimal value of the large probe sphere radius, we generated pockets for thousands of protein structures in the database, using several sizes of large probe spheres, and examined the correspondence of these pockets with known binding site positions. A new measure of shallowness, a minimum inaccessible radius, R(inaccess), indicated that binding sites of coenzymes are very deep, while those for adenine/guanine mononucleotide have only medium shallowness and those for short peptides and oligosaccharides are shallow. The optimal radius of large probe spheres was 3-4 Å for the coenzymes, 4 Å for adenine/guanine mononucleotides, and 5 Å or more for peptides/oligosaccharides. Comparison of our program with two other popular pocket-finding programs showed that our program had a higher performance of detecting binding pockets, although it required more computational time.
Affiliation(s)
- Takeshi Kawabata
- Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan.
|
42
|
Pettit FK, Bare E, Tsai A, Bowie JU. HotPatch: a statistical approach to finding biologically relevant features on protein surfaces. J Mol Biol 2007; 369:863-79. [PMID: 17451744 PMCID: PMC2034327 DOI: 10.1016/j.jmb.2007.03.036] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 12/29/2006] [Revised: 03/10/2007] [Accepted: 03/15/2007] [Indexed: 10/23/2022]
Abstract
We describe a fully automated algorithm for finding functional sites on protein structures. Our method finds surface patches of unusual physicochemical properties on protein structures, and estimates the patches' probability of overlapping functional sites. Other methods for predicting the locations of specific types of functional sites exist, but in previous analyses, it has been difficult to compare methods when they are applied to different types of sites. Thus, we introduce a new statistical framework that enables rigorous comparisons of the usefulness of different physicochemical properties for predicting virtually any kind of functional site. The program's statistical models were trained for 11 individual properties (electrostatics, concavity, hydrophobicity, etc.) and for 15 neural network combination properties, all optimized and tested on 15 diverse protein functions. To simulate what to expect if the program were run on proteins of unknown function, as might arise from structural genomics, we tested it on 618 proteins of diverse mixed functions. In the higher-scoring top half of all predictions, a functional residue could typically be found within the first 1.7 residues chosen at random. The program may or may not use partial information about the protein's function type as an input, depending on which statistical model the user chooses to employ. If function type is used as an additional constraint, prediction accuracy usually increases, and is particularly good for enzymes, DNA-interacting sites, and oligomeric interfaces. The program can be accessed online (at http://hotpatch.mbi.ucla.edu).
Affiliation(s)
- Frank K. Pettit
- UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, UCLA, Los Angeles, CA.
- Emiko Bare
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA.
- Albert Tsai
- Department of Biochemistry & Molecular Biology, Keck School of Medicine, University of Southern California, Los Angeles, CA.
- James U. Bowie
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, CA.
|
43
|
Takahashi H, Arai M, Takenawa T, Sota H, Xie QH, Iwakura M. Stabilization of Hyperactive Dihydrofolate Reductase by Cyanocysteine-mediated Backbone Cyclization. J Biol Chem 2007; 282:9420-9429. [PMID: 17264073 DOI: 10.1074/jbc.m610983200] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022] Open
Abstract
Stabilization of an enzyme while maintaining its activity has been a major challenge in protein chemistry. Although it is difficult to simultaneously improve stability and activity of a protein by amino acid substitutions due to the activity-stability trade-off, backbone cyclization by connecting the N and C termini with a linker is promising as a general method of stabilizing a protein without affecting its activity. Recently, we created a hyperactive, methionine- and cysteine-free mutant of dihydrofolate reductase from Escherichia coli, called ANLYF, by introducing seven amino acid substitutions, which, however, destabilized the protein. Here we show that ANLYF is stabilized without a loss of its high activity by a novel backbone cyclization method for unprotected proteins. The method is based on the in vitro cyanocysteine-mediated intramolecular ligation reaction, which can be conducted with relatively high efficiency by a simple procedure and under mild conditions. We also show that the reversibility of thermal denaturation is highly improved by the cyclization. Thus, activity and stability of the protein can be separately improved by amino acid substitutions and backbone cyclization, respectively. We suggest that the cyanocysteine-mediated cyclization method is complementary to the intein-mediated cyclization method in stabilizing a protein without affecting its activity.
Affiliation(s)
- Hisashi Takahashi
- Protein Design Research Group, Institute for Biological Resources and Functions, National Institute of Advanced Industrial Science and Technology, Tsukuba Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
- Munehito Arai
- Protein Design Research Group, Institute for Biological Resources and Functions, National Institute of Advanced Industrial Science and Technology, Tsukuba Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
- Tatsuyuki Takenawa
- Protein Design Research Group, Institute for Biological Resources and Functions, National Institute of Advanced Industrial Science and Technology, Tsukuba Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
- Hiroyuki Sota
- Protein Design Research Group, Institute for Biological Resources and Functions, National Institute of Advanced Industrial Science and Technology, Tsukuba Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
- Qui Hong Xie
- Protein Design Research Group, Institute for Biological Resources and Functions, National Institute of Advanced Industrial Science and Technology, Tsukuba Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
- Masahiro Iwakura
- Protein Design Research Group, Institute for Biological Resources and Functions, National Institute of Advanced Industrial Science and Technology, Tsukuba Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan.
|
45
|
Youn E, Peters B, Radivojac P, Mooney SD. Evaluation of features for catalytic residue prediction in novel folds. Protein Sci 2006; 16:216-26. [PMID: 17189479 PMCID: PMC2203287 DOI: 10.1110/ps.062523907] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Indexed: 01/08/2023]
Abstract
Structural genomics projects are determining the three-dimensional structure of proteins without full characterization of their function. A critical part of the annotation process involves appropriate knowledge representation and prediction of functionally important residue environments. We have developed a method to extract features from sequence, sequence alignments, three-dimensional structure, and structural environment conservation, and used support vector machines to annotate homologous and nonhomologous residue positions based on a specific training set of residue functions. In order to evaluate this pipeline for automated protein annotation, we applied it to the challenging problem of prediction of catalytic residues in enzymes. We also ranked the features based on their ability to discriminate catalytic from noncatalytic residues. When applying our method to a well-annotated set of protein structures, we found that top-ranked features were a measure of sequence conservation, a measure of structural conservation, a degree of uniqueness of a residue's structural environment, solvent accessibility, and residue hydrophobicity. We also found that features based on structural conservation were complementary to those based on sequence conservation and that they were capable of increasing predictor performance. Using a family nonredundant version of the ASTRAL 40 v1.65 data set, we estimated that the true catalytic residues were correctly predicted in 57.0% of the cases, with a precision of 18.5%. When testing on proteins containing novel folds not used in training, the best features were highly correlated with the training on families, thus validating the approach to nonhomologous catalytic residue prediction in general. We then applied the method to 2781 coordinate files from the structural genomics target pipeline and identified both highly ranked and highly clustered groups of predicted catalytic residues.
Affiliation(s)
- Eunseog Youn
- Center for Computational Biology and Bioinformatics, Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
46
|
Yao H, Mihalek I, Lichtarge O. Rank information: a structure-independent measure of evolutionary trace quality that improves identification of protein functional sites. Proteins 2006; 65:111-23. [PMID: 16894615 DOI: 10.1002/prot.21101]
Abstract
Protein functional sites are key targets for drug design and protein engineering, but their large-scale experimental characterization remains difficult. The evolutionary trace (ET) is a computational approach to this problem that has been useful in a variety of case studies, but its proteomic scale application is partially hindered because automated retrieval of input sequences from databases often includes some with errors that degrade functional site identification. To recognize and purge these sequences, this study introduces a novel and structure-free measure of ET quality called rank information (RI). It is shown that RI decreases in response to errors in sequences, alignments, or functional classifications. Conversely, an automated procedure to increase RI by selectively removing sequences improves functional site identification so as to nearly match manually curated traces in kinases and in a test set of 79 diverse proteins. Thus we conclude that RI partially reflects the evolutionary consistency of sequence, structure, and function. In practice, as the size of the proteome continues to grow exponentially, it provides a novel and structure-free measure of ET quality that increases its accuracy for large-scale automated annotation of protein functional sites.
Affiliation(s)
- Hui Yao
- Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
47
|
Sánchez IE, Tejero J, Gómez-Moreno C, Medina M, Serrano L. Point mutations in protein globular domains: contributions from function, stability and misfolding. J Mol Biol 2006; 363:422-32. [PMID: 16978645 DOI: 10.1016/j.jmb.2006.08.020] [Received: 06/21/2006] [Revised: 07/25/2006] [Accepted: 08/08/2006]
Abstract
Several contrasting hypotheses have been formulated about the influence of functional and conformational properties, like stability and avoidance of misfolding, on the evolution of protein globular domains. Selection at functional sites has been suggested to be detrimental to stability or coupled to it. Avoidance of misfolding may be achieved by discarding misfolding-prone sequences or by maintaining a stable native state and thus destabilizing partially or fully unfolded states from which misfolding can take place. We have performed a hierarchical analysis of a large database of point mutations to dissect the relative contributions of function, stability and misfolding in the evolution of natural sequences. We show that at catalytic sites, selection for function overrules selection for stability but find no evidence for an anticorrelation between function and stability. Selection for stability plays a secondary role at binding sites, but is not fully coupled to selection for function. Remarkably, we did not find a selective pressure against misfolding-prone sequences in globular proteins at the level of individual positions. We suggest that such a selection would compromise native-state stability due to a correlation between the stabilities of native and misfolded states. Stabilization of the native state is the most frequent way in which natural proteins avoid misfolding.
Affiliation(s)
- I E Sánchez
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
48
|
Liang S, Zhang C, Liu S, Zhou Y. Protein binding site prediction using an empirical scoring function. Nucleic Acids Res 2006; 34:3698-707. [PMID: 16893954 PMCID: PMC1540721 DOI: 10.1093/nar/gkl454]
Abstract
Most biological processes are mediated by interactions between proteins and their interacting partners, including proteins, nucleic acids and small molecules. This work establishes a method called PINUP for binding site prediction of monomeric proteins. With only two weight parameters to optimize, PINUP produces not only 42.2% coverage of actual interfaces (percentage of correctly predicted interface residues in actual interface residues) but also 44.5% accuracy in predicted interfaces (percentage of correctly predicted interface residues in the predicted interface residues) in a cross validation using a 57-protein dataset. By comparison, the expected accuracy via random prediction (percentage of actual interface residues in surface residues) is only 15%. The binding sites of the 57-protein set are found to be easier to predict than those of an independent test set of 68 proteins. The average coverage and accuracy for this independent test set are 30.5% and 29.4%, respectively. The significant gain of PINUP over expected random prediction is attributed to (i) effective residue-energy score and accessible-surface-area-dependent interface-propensity, (ii) isolation of functional constraints contained in the conservation score from the structural constraints through the combination of residue-energy score (for structural constraints) and conservation score and (iii) a consensus region built on top-ranked initial patches.
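The coverage, accuracy and random-baseline definitions quoted in this abstract can be illustrated with a minimal Python sketch. The function names and the toy residue numbers below are hypothetical and chosen only for illustration; they are not taken from PINUP or its datasets.

```python
def interface_metrics(predicted, actual):
    """Return (coverage, accuracy) as defined in the abstract.

    coverage = correctly predicted interface residues / actual interface residues
    accuracy = correctly predicted interface residues / predicted interface residues
    """
    predicted, actual = set(predicted), set(actual)
    correct = predicted & actual
    coverage = len(correct) / len(actual) if actual else 0.0
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    return coverage, accuracy


def random_accuracy(actual, surface):
    """Expected accuracy of a random pick: actual interface residues / surface residues."""
    actual, surface = set(actual), set(surface)
    return len(actual & surface) / len(surface) if surface else 0.0


# Toy example (hypothetical residue indices, not real data).
surface = set(range(1, 41))        # 40 surface residues
actual = {10, 11, 12, 13, 14}      # 5 true interface residues
predicted = {12, 13, 14, 20}       # 4 predicted residues, 3 of them correct

coverage, accuracy = interface_metrics(predicted, actual)
baseline = random_accuracy(actual, surface)
print(coverage, accuracy, baseline)
```

In standard classification terms, coverage here corresponds to recall and accuracy to precision, which is why the random baseline is compared against accuracy rather than coverage.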
Affiliation(s)
- Yaoqi Zhou
- To whom correspondence should be addressed. Tel: +1 716 829 2985; Fax: +1 716 829 2344
49
|
Petrova NV, Wu CH. Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties. BMC Bioinformatics 2006; 7:312. [PMID: 16790052 PMCID: PMC1534064 DOI: 10.1186/1471-2105-7-312] [Received: 03/21/2006] [Accepted: 06/21/2006]
Abstract
Background The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties. Results To determine the best machine learning algorithm, 26 classifiers in the WEKA software package were compared using a benchmarking dataset of 79 enzymes with 254 catalytic residues in a 10-fold cross-validation analysis. Each residue of the dataset was represented by a set of 24 residue properties previously shown to be of functional relevance, as well as a label {+1/-1} to indicate catalytic/non-catalytic residue. The best-performing algorithm was the Sequential Minimal Optimization (SMO) algorithm, which is a Support Vector Machine (SVM). The Wrapper Subset Selection algorithm further selected seven of the 24 attributes as an optimal subset of residue properties, with sequence conservation, catalytic propensities of amino acids, and relative position on protein surface being the most important features. Conclusion The SMO algorithm with 7 selected attributes correctly predicted 228 of the 254 catalytic residues, with an overall predictive accuracy of more than 86%. Missing only 10.2% of the catalytic residues, the method captures the fundamental features of catalytic residues and can be used as a "catalytic residue filter" to facilitate experimental identification of catalytic residues for proteins with known structure but unknown function.
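As a quick consistency check, the "10.2% missed" figure follows from the counts stated in the abstract (228 of 254 catalytic residues correctly predicted). The short Python sketch below reproduces that arithmetic; the variable names are illustrative only. The "more than 86%" overall accuracy cannot be rederived this way, because it also depends on the non-catalytic residue counts, which the abstract does not give.

```python
# Counts stated in the abstract.
total_catalytic = 254
correctly_predicted = 228

# Catalytic residues the classifier failed to flag.
missed = total_catalytic - correctly_predicted

# Percentages rounded to one decimal place, as reported in the abstract.
recall_pct = round(100 * correctly_predicted / total_catalytic, 1)
missed_pct = round(100 * missed / total_catalytic, 1)

print(f"missed residues: {missed}")
print(f"recall: {recall_pct}%  missed: {missed_pct}%")
```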
Affiliation(s)
- Natalia V Petrova
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA
- Cathy H Wu
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA
50
|
Yura K, Yamaguchi A, Go M. Coverage of whole proteome by structural genomics observed through protein homology modeling database. J Struct Funct Genomics 2006; 7:65-76. [PMID: 17146617 PMCID: PMC1769342 DOI: 10.1007/s10969-006-9010-3] [Received: 05/11/2006] [Accepted: 08/08/2006]
Abstract
We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers the whole of an ORF product are rare. When the portion of an ORF with a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins has increased by 5% for archaebacteria and by 7% for eubacteria in the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics.
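The genome and ORF counts quoted in this abstract are internally consistent, as the following small Python check suggests. It uses only the numbers stated above; the variable names are illustrative.

```python
# Genome counts by group, as listed in the abstract.
genomes = {
    "archaebacterial": 17,
    "eubacterial": 130,
    "eukaryotic": 18,
    "phage": 111,
}
total_genomes = sum(genomes.values())

# ORF counts stated in the abstract.
modeled_orfs = 368_724
predicted_orfs = 734_193

# Share of predicted ORFs that have a modeled 3D structure, in percent.
modeled_pct = round(100 * modeled_orfs / predicted_orfs, 1)

print(f"genomes: {total_genomes}")
print(f"ORFs with a modeled structure: {modeled_pct}%")
```

The computed share agrees with the "approximately 50%" stated in the abstract.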
Affiliation(s)
- Kei Yura
- Quantum Bioinformatics Team, Center for Computational Science and Engineering, Japan Atomic Energy Agency, Kyoto 619-0215, Japan.