
For: Fetrow JS, Skolnick J. Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. J Mol Biol 1998;281:949-68. [PMID: 9719646 DOI: 10.1006/jmbi.1998.1993] [Citation(s) in RCA: 170] [Impact Index Per Article: 6.5] [Indexed: 11/22/2022]



Cited by Other Article(s)

Derry A, Altman RB. COLLAPSE: A representation learning framework for identification and characterization of protein structural sites. Protein Sci 2023;32:e4541. [PMID: 36519247 PMCID: PMC9847082 DOI: 10.1002/pro.4541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 12/02/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]

Riziotis IG, Thornton JM. Capturing the geometry, function, and evolution of enzymes with 3D templates. Protein Sci 2022;31:e4363. [PMID: 35762726 PMCID: PMC9207746 DOI: 10.1002/pro.4363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 05/06/2022] [Accepted: 05/14/2022] [Indexed: 11/05/2022]

Torng W, Altman RB. High precision protein functional site detection using 3D convolutional neural networks. Bioinformatics 2020;35:1503-1512. [PMID: 31051039 PMCID: PMC6499237 DOI: 10.1093/bioinformatics/bty813] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 01/31/2018] [Revised: 08/14/2018] [Accepted: 09/19/2018] [Indexed: 12/02/2022]

Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019;26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]

Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018;19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023]

Fetrow JS, Babbitt PC. New computational approaches to understanding molecular protein function. PLoS Comput Biol 2018;14:e1005756. [PMID: 29621256 PMCID: PMC5886384 DOI: 10.1371/journal.pcbi.1005756] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/11/2023]

Pradhan D, Padhy S, Sahoo B. Enzyme classification using multiclass support vector machine and feature subset selection. Comput Biol Chem 2017;70:211-219. [PMID: 28934693 DOI: 10.1016/j.compbiolchem.2017.08.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/25/2017] [Revised: 07/15/2017] [Accepted: 08/15/2017] [Indexed: 10/19/2022]

Abstract

Proteins are the macromolecules responsible for almost all biological processes in a cell. With the availability of a large number of protein sequences from different sequencing projects, the challenge for scientists is to characterize their functions. Because wet-lab methods are time-consuming and expensive, many computational methods, such as FASTA, PSI-BLAST, DNA microarray clustering, and nearest-neighbor classification on protein-protein interaction networks, have been proposed. The support vector machine (SVM) is one such method and has been used successfully for several problems, such as protein fold recognition and protein structure prediction. Cai et al. (2003) used SVMs to classify proteins into functional classes and to predict their function, representing the protein sequences by their physico-chemical properties. In this paper, a model consisting of feature subset selection followed by a multiclass SVM is proposed to determine the functional class of a newly generated protein sequence. To train and test the model, 32 physico-chemical properties of enzymes from 6 enzyme classes are considered. To determine the features that contribute significantly to functional classification, the Sequential Forward Floating Selection (SFFS), Orthogonal Forward Selection (OFS), and SVM Recursive Feature Elimination (SVM-RFE) algorithms are used. Of the 32 properties considered initially, only 20 features are sufficient to classify the proteins into their functional classes with an accuracy ranging from 91% to 94%. In comparison, OFS followed by SVM performs better than the other methods. Our model generalizes the existing model to include multiclass classification and to identify the most significant features affecting protein function.
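
The "feature subset selection followed by a classifier" pipeline described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain sequential forward selection (without the floating step of SFFS) and swaps in a nearest-centroid classifier for the multiclass SVM so that it runs with the Python standard library only. The function names and the training-set accuracy criterion are assumptions made for illustration.

```python
from statistics import mean


def nearest_centroid_accuracy(rows, labels, features):
    """Training-set accuracy of a nearest-centroid classifier that only
    looks at the feature indices in `features` (a stand-in for the SVM)."""
    classes = sorted(set(labels))
    centroids = {
        c: [mean(r[f] for r, y in zip(rows, labels) if y == c) for f in features]
        for c in classes
    }

    def predict(row):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            classes,
            key=lambda c: sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )

    correct = sum(predict(r) == y for r, y in zip(rows, labels))
    return correct / len(rows)


def forward_select(rows, labels, n_keep):
    """Greedy sequential forward selection: repeatedly add the single
    feature that gives the highest accuracy together with those already kept."""
    selected = []
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < n_keep:
        best = max(
            remaining,
            key=lambda f: nearest_centroid_accuracy(rows, labels, selected + [f]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With 32 candidate properties, calling `forward_select(rows, labels, 20)` would mirror the abstract's reduction to 20 features; in practice the selection criterion should be a cross-validated score rather than training accuracy.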


Jamroz M, Kolinski A, Kihara D. Ensemble-based evaluation for protein structure models. Bioinformatics 2016;32:i314-i321. [PMID: 27307633 PMCID: PMC4908346 DOI: 10.1093/bioinformatics/btw262] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]

Kaiser F, Eisold A, Labudde D. A Novel Algorithm for Enhanced Structural Motif Matching in Proteins. J Comput Biol 2015;22:698-713. [PMID: 25695840 DOI: 10.1089/cmb.2014.0263] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]

Buturovic L, Wong M, Tang GW, Altman RB, Petkovic D. High precision prediction of functional sites in protein structures. PLoS One 2014;9:e91240. [PMID: 24632601 PMCID: PMC3954699 DOI: 10.1371/journal.pone.0091240] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 09/26/2013] [Accepted: 02/11/2014] [Indexed: 11/29/2022]

Yu D, Kim M, Xiao G, Hwang TH. Review of biological network data and its applications. Genomics Inform 2013;11:200-10. [PMID: 24465231 PMCID: PMC3897847 DOI: 10.5808/gi.2013.11.4.200] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 10/15/2013] [Revised: 11/20/2013] [Accepted: 11/21/2013] [Indexed: 12/16/2022]

Van Voorst JR, Finzel BC. Searching for likeness in a database of macromolecular complexes. J Chem Inf Model 2013;53:2634-47. [PMID: 24047445 DOI: 10.1021/ci4002537] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]

Structure prediction of partial-length protein sequences. Int J Mol Sci 2013;14:14892-907. [PMID: 23867606 PMCID: PMC3742278 DOI: 10.3390/ijms140714892] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/26/2013] [Revised: 07/01/2013] [Accepted: 07/02/2013] [Indexed: 12/17/2022]

Kirshner DA, Nilmeier JP, Lightstone FC. Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB. Nucleic Acids Res 2013;41:W256-65. [PMID: 23680785 PMCID: PMC3692059 DOI: 10.1093/nar/gkt403] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/11/2023]

Manoharan M, Sankar K, Offmann B, Ramanathan S. Association of Putative Members to Family of Mosquito Odorant Binding Proteins: Scoring Scheme Using Fuzzy Functional Templates and Cys Residue Positions. Bioinform Biol Insights 2013;7:231-51. [PMID: 23908587 PMCID: PMC3728099 DOI: 10.4137/bbi.s11096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/22/2022]

Anand P, Yeturu K, Chandra N. PocketAnnotate: towards site-based function annotation. Nucleic Acids Res 2012;40:W400-8. [PMID: 22618878 PMCID: PMC3394344 DOI: 10.1093/nar/gks421] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]

Fomenko DE, Gladyshev VN. Comparative genomics of thiol oxidoreductases reveals widespread and essential functions of thiol-based redox control of cellular processes. Antioxid Redox Signal 2012;16:193-201. [PMID: 21902454 PMCID: PMC3234660 DOI: 10.1089/ars.2011.3980] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/31/2023]

Tang GW, Altman RB. Remote thioredoxin recognition using evolutionary conservation and structural dynamics. Structure 2011;19:461-70. [PMID: 21481770 DOI: 10.1016/j.str.2011.02.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/28/2010] [Revised: 02/06/2011] [Accepted: 02/16/2011] [Indexed: 12/25/2022]

Kato T, Nagano N. Discriminative structural approaches for enzyme active-site prediction. BMC Bioinformatics 2011;12 Suppl 1:S49. [PMID: 21342581 PMCID: PMC3044306 DOI: 10.1186/1471-2105-12-s1-s49] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]

Horst JA, Samudrala R. A protein sequence meta-functional signature for calcium binding residue prediction. Pattern Recognit Lett 2010;31:2103-2112. [PMID: 20824111 PMCID: PMC2932634 DOI: 10.1016/j.patrec.2010.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]

Kato T, Nagano N. Metric learning for enzyme active-site search. Bioinformatics 2010;26:2698-704. [PMID: 20870642 PMCID: PMC2958746 DOI: 10.1093/bioinformatics/btq519] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022]

Li GH, Huang JF. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation. BMC Bioinformatics 2010;11:439. [PMID: 20796320 PMCID: PMC2936402 DOI: 10.1186/1471-2105-11-439] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 11/26/2009] [Accepted: 08/27/2010] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in the Protein Data Bank (PDB); thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need further improvement in accuracy, sensitivity, and computational speed. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown functions of proteins based on local protein structural similarity. The algorithm has been evaluated on a test set of 164 enzyme families and compared with other methods.

RESULTS

The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in large-scale functional annotation. Compared with both sequence-based and global structure-based methods, CMASA can find not only remote homologous proteins but also active-site convergence. Compared with other local structure comparison-based methods, CMASA performs better than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is also more sensitive than PINTS and more accurate than JESS (both local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites were suggested that cannot be observed using the Catalytic Site Atlas (CSA).

CONCLUSIONS

CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. It can be used in large-scale enzyme active site annotation. CMASA is available through a mail-based server (http://159.226.149.45/other1/CMASA/CMASA.htm).


Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. Functional neighbors: inferring relationships between nonhomologous protein families using family-specific packing motifs. IEEE Trans Inf Technol Biomed 2010;14:1137-43. [PMID: 20570776 DOI: 10.1109/titb.2010.2053550] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]

Molecular surface mesh generation by filtering electron density map. Int J Biomed Imaging 2010;2010:923780. [PMID: 20414352 PMCID: PMC2856016 DOI: 10.1155/2010/923780] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 09/25/2009] [Revised: 11/23/2009] [Accepted: 01/06/2010] [Indexed: 11/17/2022]

Vacic V, Iakoucheva LM, Lonardi S, Radivojac P. Graphlet kernels for prediction of functional residues in protein structures. J Comput Biol 2010;17:55-72. [PMID: 20078397 DOI: 10.1089/cmb.2009.0029] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]

Cammer S, Carter CW. Six Rossmannoid folds, including the Class I aminoacyl-tRNA synthetases, share a partial core with the anti-codon-binding domain of a Class II aminoacyl-tRNA synthetase. Bioinformatics 2010;26:709-14. [PMID: 20130031 DOI: 10.1093/bioinformatics/btq039] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 12/20/2022]

Abstract

MOTIVATION

Similarities in core residue packing provide evidence for divergence or convergence not reported using other methods.

RESULTS

We apply a new method for rapid structure comparison based on Simplicial Neighborhood Analysis of Protein Packing (SNAPP) to the diverse structural classification of proteins (SCOP) alpha/beta-class of protein folds. The procedure identifies inter-residue packing motifs shared by protein pairs from different folds. A threshold of 0.67 Å RMSD for all atoms of corresponding residues ensures inclusion of only highly significant similarities comparable with those observed for identical catalytic residues in homologues. Many tertiary packing motifs are shared among the three classical Rossmannoid folds, as well as thousands of other motifs that occur in at least two distinct folds. Merging of neighboring packing motifs facilitated recognition of larger, recurrent substructures or cores. The anti-codon-binding domain of an archaeal aminoacyl-tRNA synthetase (aaRS) was discovered to possess a packed core in which eight identical amino acid residues are within 0.55 Å RMSD of the comparable structure in the FixJ receiver, a member of the Rossmannoid family that also includes the CheY signaling protein and flavodoxin-like proteins. Further investigation identified close variants of this core in five other Rossmannoid folds, including a functionally relevant core in Class Ia aminoacyl-tRNA synthetases. Although it is possible that the two essentially identical cores in the ProRS anti-codon-binding domain and the FixJ receiver converged to the same structure, the consensus core obtained from the structural and sequence alignments suggests that all the implicated protein folds descended from a simpler ancestral protein in which this core provided nucleotide binding and proto-allosteric functions.

AVAILABILITY

Programs are available at http://staff.vbi.vt.edu/cammer/snapp/download/

IMPLEMENTATION

Programs were written in Perl and C and run under Linux.

CONTACT

cammer@vbi.vt.edu.
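
The RMSD cutoffs quoted in the Results of this abstract (0.67 and 0.55 ångström over corresponding atoms) rest on a simple formula, sketched below. The sketch assumes the two atom lists are already paired one-to-one and already superimposed; it does not perform the optimal superposition that a real structure comparison would need, and it is not the SNAPP code. The function names and the default threshold argument are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, taken pairwise in order (no superposition)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def is_significant_match(coords_a, coords_b, threshold=0.67):
    """True when the paired atoms deviate by no more than `threshold`
    (same length unit as the coordinates, here assumed to be angstroms)."""
    return rmsd(coords_a, coords_b) <= threshold
```

Two motifs whose atoms are each displaced by 0.5 Å would pass the 0.67 Å filter, whereas a uniform 1.0 Å displacement would not.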


Sankararaman S, Sha F, Kirsch JF, Jordan MI, Sjölander K. Active site prediction using evolutionary and structural information. Bioinformatics 2010;26:617-24. [PMID: 20080507 PMCID: PMC2828116 DOI: 10.1093/bioinformatics/btq008] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/17/2022]

Giard J, Ambroise J, Gala JL, Macq B. Regression applied to protein binding site prediction and comparison with classification. BMC Bioinformatics 2009;10:276. [PMID: 19728868 PMCID: PMC2749839 DOI: 10.1186/1471-2105-10-276] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/26/2009] [Accepted: 09/03/2009] [Indexed: 11/13/2022]

Abstract

Background

The structural genomics centers provide hundreds of protein structures of unknown function. Therefore, developing methods that determine protein function automatically is imperative. The function of a protein can be determined by studying the network of its physical interactions. In this context, identifying a potential binding site between proteins is of primary interest. In the literature, methods for predicting a potential binding site location are generally based on classification tools. The aim of this paper is to show that regression tools are more efficient than classification tools for patch-based binding site predictors. For this purpose, we developed a patch-based binding site localization method usable with either regression or classification tools.

Results

We compared the predictive performance of regression tools with that of machine learning classifiers. Using leave-one-out cross-validation, we showed that regression tools provide better predictions than classification tools. Among regression tools, the Multilayer Perceptron ranked highest in the quality of predictions. We also compared the predictive performance of our patch-based method using the Multilayer Perceptron with the performance of three other methods usable through a web server. Our method performed similarly to the other methods.

Conclusion

Regression is more efficient than classification when applied to our binding site localization method. When it is possible, using regression instead of classification for other existing binding site predictors will probably improve results. Furthermore, the method presented in this work is flexible because the size of the predicted binding site is adjustable. This adaptability is useful when either false positive or negative rates have to be limited.


Kelley LA, Shrimpton PJ, Muggleton SH, Sternberg MJE. Discovering rules for protein-ligand specificity using support vector inductive logic programming. Protein Eng Des Sel 2009;22:561-7. [PMID: 19574295 DOI: 10.1093/protein/gzp035] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]

Abstract

Structural genomics initiatives are rapidly generating vast numbers of protein structures. Comparative modelling is also capable of producing accurate structural models for many protein sequences. However, for many of the known structures, functions are not yet determined, and in many modelling tasks, an accurate structural model does not necessarily tell us about function. Thus, there is a pressing need for high-throughput methods for determining function from structure. The spatial arrangement of key amino acids in a folded protein, on the surface or buried in clefts, is often the determinant of its biological function. A central aim of molecular biology is to understand the relationship between such substructures or surfaces and biological function, leading both to function prediction and to function design. We present a new general method for discovering the features of binding pockets that confer specificity for particular ligands. Using a recently developed machine-learning technique that couples the rule-discovery approach of inductive logic programming with the statistical learning power of support vector machines, we are able to discriminate, with high precision (90%) and recall (86%), between pockets that bind FAD and those that bind NAD on a large benchmark set, given only the geometry and composition of the backbone of the binding pocket and without the use of docking. In addition, we learn rules governing this specificity which can feed into protein functional design protocols. An analysis of the rules found suggests that key features of the binding pocket may be tied to conformational freedom in the ligand. The representation is sufficiently general to be applicable to any discriminatory binding problem. All programs and data sets are freely available to non-commercial users at http://www.sbg.bio.ic.ac.uk/svilp_ligand/.
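
For readers unfamiliar with the two metrics reported in this abstract, the following minimal sketch shows how precision and recall are computed for a two-ligand discrimination task. The label strings and the function name are illustrative assumptions; the numbers in the abstract come from the authors' benchmark, not from this code.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Return (precision, recall) for the class named `positive`.

    precision = TP / (TP + FP): how many predicted positives are correct.
    recall    = TP / (TP + FN): how many actual positives are recovered.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For example, with true labels `["FAD", "FAD", "NAD", "NAD", "FAD"]` and predictions `["FAD", "NAD", "NAD", "FAD", "FAD"]`, treating FAD as the positive class gives two true positives, one false positive, and one false negative, so both precision and recall are 2/3.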


Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development. J Comput Aided Mol Des 2009;23:773-84. [DOI: 10.1007/s10822-009-9273-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 07/21/2008] [Accepted: 04/15/2009] [Indexed: 12/12/2022]

Tang K, Pugalenthi G, Suganthan PN, Lanczycki CJ, Chakrabarti S. Prediction of functionally important sites from protein sequences using sparse kernel least squares classifiers. Biochem Biophys Res Commun 2009;384:155-9. [PMID: 19394310 DOI: 10.1016/j.bbrc.2009.04.096] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/13/2009] [Accepted: 04/20/2009] [Indexed: 11/25/2022]

Skolnick J, Brylinski M. FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform 2009;10:378-91. [PMID: 19324930 DOI: 10.1093/bib/bbp017] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022]

Dunker AK, Oldfield CJ, Meng J, Romero P, Yang JY, Chen JW, Vacic V, Obradovic Z, Uversky VN. The unfoldomics decade: an update on intrinsically disordered proteins. BMC Genomics 2008;9 Suppl 2:S1. [PMID: 18831774 PMCID: PMC2559873 DOI: 10.1186/1471-2164-9-s2-s1] [Citation(s) in RCA: 386] [Impact Index Per Article: 24.1] [Indexed: 01/14/2023]

Abstract

BACKGROUND

Our first predictor of protein disorder was published just over a decade ago in the Proceedings of the IEEE International Conference on Neural Networks (Romero P, Obradovic Z, Kissinger C, Villafranca JE, Dunker AK (1997) Identifying disordered regions in proteins from amino acid sequence. Proceedings of the IEEE International Conference on Neural Networks, 1: 90-95). By now more than twenty other laboratory groups have joined the efforts to improve the prediction of protein disorder. While the various prediction methodologies used for protein intrinsic disorder resemble those methodologies used for secondary structure prediction, the two types of structures are entirely different. For example, the two structural classes have very different dynamic properties, with the irregular secondary structure class being much less mobile than the disorder class. The prediction of secondary structure has been useful. On the other hand, the prediction of intrinsic disorder has been revolutionary, leading to major modifications of the more than 100 year-old views relating protein structure and function. Experimentalists have been providing evidence over many decades that some proteins lack fixed structure or are disordered (or unfolded) under physiological conditions. In addition, experimentalists are also showing that, for many proteins, their functions depend on the unstructured rather than structured state; such results are in marked contrast to the greater than hundred year old views such as the lock and key hypothesis. Despite extensive data on many important examples, including disease-associated proteins, the importance of disorder for protein function has been largely ignored. Indeed, to our knowledge, current biochemistry books don't present even one acknowledged example of a disorder-dependent function, even though some reports of disorder-dependent functions are more than 50 years old. The results from genome-wide predictions of intrinsic disorder and the results from other bioinformatics studies of intrinsic disorder are demanding attention for these proteins.

RESULTS

Disorder prediction has been important for showing that the relatively few experimentally characterized examples are members of a very large collection of related disordered proteins that are wide-spread over all three domains of life. Many significant biological functions are now known to depend directly on, or are importantly associated with, the unfolded or partially folded state. Here our goal is to review the key discoveries and to weave these discoveries together to support novel approaches for understanding sequence-function relationships.

CONCLUSION

Intrinsically disordered protein is common across the three domains of life, but especially common among the eukaryotic proteomes. Signaling sequences and sites of posttranslational modifications are frequently, or very likely most often, located within regions of intrinsic disorder. Disorder-to-order transitions are coupled with the adoption of different structures with different partners. Also, the flexibility of intrinsic disorder helps different disordered regions to bind to a common binding site on a common partner. Such capacity for binding diversity plays important roles in both protein-protein interaction networks and likely also in gene regulation networks. Such disorder-based signaling is further modulated in multicellular eukaryotes by alternative splicing, for which such splicing events map to regions of disorder much more often than to regions of structure. Associating alternative splicing with disorder rather than structure alleviates theoretical and experimentally observed problems associated with the folding of different length, isomeric amino acid sequences. The combination of disorder and alternative splicing is proposed to provide a mechanism for easily "trying out" different signaling pathways, thereby providing the mechanism for generating signaling diversity and enabling the evolution of cell differentiation and multicellularity. Finally, several recent small molecules of interest as potential drugs have been shown to act by blocking protein-protein interactions based on intrinsic disorder of one of the partners. Study of these examples has led to a new approach for drug discovery, and bioinformatics analysis of the human proteome suggests that various disease-associated proteins are very rich in such disorder-based drug discovery targets.



Halperin I, Glazer DS, Wu S, Altman RB. The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications. BMC Genomics 2008;9 Suppl 2:S2. [PMID: 18831785 PMCID: PMC2559884 DOI: 10.1186/1471-2164-9-s2-s2] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 12/03/2022]

Watanabe RLA, Morett E, Vallejo EE. Inferring modules of functionally interacting proteins using the Bond Energy Algorithm. BMC Bioinformatics 2008;9:285. [PMID: 18559112 PMCID: PMC2474619 DOI: 10.1186/1471-2105-9-285] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/28/2008] [Accepted: 06/17/2008] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Non-homology based methods such as phylogenetic profiles are effective for predicting functional relationships between proteins with no considerable sequence or structure similarity. Those methods rely heavily on traditional similarity metrics defined on pairs of phylogenetic patterns. However, proteins do not interact exclusively in pairs, as the final biological function of a protein in the cellular context is often held by a group of proteins. In order to accurately infer modules of functionally interacting proteins, the consideration of not only direct but also indirect relationships is required. In this paper, we used the Bond Energy Algorithm (BEA) to predict functionally related groups of proteins. With BEA we create clusters of phylogenetic profiles based on the associations of the surrounding elements of the analyzed data using a metric that considers linked relationships among elements in the data set.

RESULTS

Using phylogenetic profiles obtained from the Cluster of Orthologous Groups of Proteins (COG) database, we conducted a series of clustering experiments using BEA to predict (upper level) relationships between profiles. We evaluated our results by comparing them with COG's functional categories and, further, with the experimentally determined functional relationships between proteins provided by the DIP and ECOCYC databases. Our results demonstrate that BEA is capable of predicting meaningful modules of functionally related proteins. BEA outperforms traditionally used clustering methods, such as k-means and hierarchical clustering, by predicting functional relationships between proteins with higher accuracy.

CONCLUSION

This study shows that the linked relationships of phylogenetic profiles obtained by BEA are useful for detecting functional associations between profiles and extending functional modules not found by traditional methods. BEA is capable of detecting relationships among phylogenetic patterns by linking them through a common element shared in a group. Additionally, we discuss how the proposed method may become more powerful if other criteria for classifying different levels of protein functional interactions, such as gene neighborhood or protein fusion information, are provided.
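
As an illustration of the clustering idea in this abstract, the following is a minimal sketch of the ordering step of the classic Bond Energy Algorithm, applied to binary phylogenetic profiles. It is not the authors' implementation: the function names `bond` and `bea_order`, the greedy insertion scheme, and the toy profiles are all assumptions made for this example.

```python
# Sketch of a Bond Energy Algorithm style ordering (assumed formulation).
# Each profile is a binary vector: 1 if the protein is present in a genome.

def bond(a, b):
    """Bond strength between two profiles: number of shared presences."""
    return sum(x * y for x, y in zip(a, b))


def bea_order(profiles):
    """Greedily insert each profile where it adds the most bond energy.

    `profiles` maps a (hypothetical) protein name to its binary profile.
    Returns the protein names in an order that places similar profiles
    next to each other.
    """
    names = list(profiles)
    order = names[:1]
    for name in names[1:]:
        best_pos, best_gain = 0, None
        for pos in range(len(order) + 1):
            left = profiles[order[pos - 1]] if pos > 0 else None
            right = profiles[order[pos]] if pos < len(order) else None
            gain = 0
            if left is not None:
                gain += bond(left, profiles[name])
            if right is not None:
                gain += bond(profiles[name], right)
            if left is not None and right is not None:
                # Inserting between two neighbours breaks their bond.
                gain -= bond(left, right)
            if best_gain is None or gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, name)
    return order


# Hypothetical toy data: two groups of proteins with identical profiles.
toy_profiles = {
    "p1": [1, 1, 0, 0],
    "p2": [0, 0, 1, 1],
    "p3": [1, 1, 0, 0],
    "p4": [0, 0, 1, 1],
}

if __name__ == "__main__":
    print(bea_order(toy_profiles))
```

In this toy case the proteins with identical profiles end up adjacent in the returned order, which is the property a module-detection step could then exploit by cutting the ordering into blocks.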

Fetrow JS. Active site profiling to identify protein functional sites in sequences and structures using the Deacon Active Site Profiler (DASP). CURRENT PROTOCOLS IN BIOINFORMATICS 2008;Chapter 8:8.10.1-8.10.16. [PMID: 18428769 DOI: 10.1002/0471250953.bi0810s14] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Seo JH, Park HY, Kim J, Lee BS, Kim BG. Exploring sequence space: Profile analysis and protein-ligand docking to screen ω-aminotransferases with expanded substrate specificity. Biotechnol J 2008;3:676-86. [DOI: 10.1002/biot.200700264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Goyal K, Mande SC. Exploiting 3D structural templates for detection of metal-binding sites in protein structures. Proteins 2008;70:1206-18. [PMID: 17847089 DOI: 10.1002/prot.21601] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Lanczycki CJ, Chakrabarti S. A tool for the prediction of functionally important sites in proteins using a library of functional templates. Bioinformation 2008;2:279-83. [PMID: 18478080 PMCID: PMC2374371 DOI: 10.6026/97320630002279] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2008] [Accepted: 02/11/2008] [Indexed: 11/23/2022] Open

Wu S, Liang MP, Altman RB. The SeqFEATURE library of 3D functional site models: comparison to existing methods and applications to protein function annotation. Genome Biol 2008;9:R8. [PMID: 18197987 PMCID: PMC2395245 DOI: 10.1186/gb-2008-9-1-r8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2007] [Revised: 11/21/2007] [Accepted: 01/16/2008] [Indexed: 11/10/2022] Open

A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc Natl Acad Sci U S A 2007;105:129-34. [PMID: 18165317 DOI: 10.1073/pnas.0707684105] [Citation(s) in RCA: 240] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Ebert JC, Altman RB. Robust recognition of zinc binding sites in proteins. Protein Sci 2007;17:54-65. [PMID: 18042678 DOI: 10.1110/ps.073138508] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

Su D, Berndt C, Fomenko DE, Holmgren A, Gladyshev VN. A Conserved cis-Proline Precludes Metal Binding by the Active Site Thiolates in Members of the Thioredoxin Family of Proteins. Biochemistry 2007;46:6903-10. [PMID: 17503777 DOI: 10.1021/bi700152b] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Capra JA, Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics 2007;23:1875-82. [PMID: 17519246 DOI: 10.1093/bioinformatics/btm270] [Citation(s) in RCA: 465] [Impact Index Per Article: 27.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Mirkovic N, Li Z, Parnassa A, Murray D. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 2007;66:766-77. [PMID: 17154423 DOI: 10.1002/prot.21191] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Selective prediction of interaction sites in protein structures with THEMATICS. BMC Bioinformatics 2007;8:119. [PMID: 17419878 PMCID: PMC1877815 DOI: 10.1186/1471-2105-8-119] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2006] [Accepted: 04/09/2007] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Methods are now available for the prediction of interaction sites in protein 3D structures. While many of these methods report high success rates for site prediction, often these predictions are not very selective and have low precision. Precision in site prediction is addressed using Theoretical Microscopic Titration Curves (THEMATICS), a simple computational method for the identification of active sites in enzymes. Recall and precision are measured and compared with other methods for the prediction of catalytic sites.

RESULTS

Using a test set of 169 enzymes from the original Catalytic Residue Dataset (CatRes) it is shown that THEMATICS can deliver precise, localised site predictions. Furthermore, adjustment of the cut-off criteria can improve the recall rates for catalytic residues with only a small sacrifice in precision. Recall rates for CatRes/CSA annotated catalytic residues are 41.1%, 50.4%, and 54.2% for Z score cut-off values of 1.00, 0.99, and 0.98, respectively. The corresponding precision rates are 19.4%, 17.9%, and 16.4%. The success rate for catalytic sites is higher, with correct or partially correct predictions for 77.5%, 85.8%, and 88.2% of the enzymes in the test set, corresponding to the same respective Z score cut-offs, if only the CatRes annotations are used as the reference set. Incorporation of additional literature annotations into the reference set gives total success rates of 89.9%, 92.9%, and 94.1%, again for corresponding cut-off values of 1.00, 0.99, and 0.98. False positive rates for a 75-protein test set are 1.95%, 2.60%, and 3.12% for Z score cut-offs of 1.00, 0.99, and 0.98, respectively.

CONCLUSION

With a preferred cut-off value of 0.99, THEMATICS achieves a high success rate of interaction site prediction, about 86% correct or partially correct using CatRes/CSA annotations only and about 93% with an expanded reference set. Success rates for catalytic residue prediction are similar to those of other structure-based methods, but with substantially better precision and lower false positive rates. THEMATICS performs well across the spectrum of E.C. classes. The method requires only the structure of the query protein as input. THEMATICS predictions may be obtained via the web from structures in PDB format at: http://pfweb.chem.neu.edu/thematics/submit.html.
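
The trade-off between recall and precision reported in this abstract can be sketched with a small calculation. This is an illustrative assumption, not the THEMATICS implementation: the residue labels, the scores, and the rule that a residue is predicted when its score is at or above the cut-off are all hypothetical.

```python
# Sketch: how lowering a score cut-off can raise recall while lowering precision.
# All names and numbers below are hypothetical illustration data.

def predict_by_cutoff(scores, cutoff):
    """Return the residues whose score is at or above the cut-off (assumed rule)."""
    return {residue for residue, score in scores.items() if score >= cutoff}


def precision_recall(predicted, annotated):
    """Precision = TP / predicted; recall = TP / annotated."""
    predicted, annotated = set(predicted), set(annotated)
    true_pos = len(predicted & annotated)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical per-residue scores and reference annotations.
toy_scores = {
    "res_a": 1.80,
    "res_b": 1.20,
    "res_c": 0.995,
    "res_d": 0.985,
    "res_e": 0.40,
}
toy_annotated = {"res_a", "res_b", "res_c"}

if __name__ == "__main__":
    for cutoff in (1.00, 0.99, 0.98):
        predicted = predict_by_cutoff(toy_scores, cutoff)
        precision, recall = precision_recall(predicted, toy_annotated)
        print(f"cut-off {cutoff:.2f}: precision={precision:.2f} recall={recall:.2f}")
```

With this toy data, moving the cut-off from 1.00 to 0.98 admits more residues, so recall rises first and precision drops once an unannotated residue is included, which mirrors the direction of the trade-off the abstract describes.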

Pettit FK, Bare E, Tsai A, Bowie JU. HotPatch: a statistical approach to finding biologically relevant features on protein surfaces. J Mol Biol 2007;369:863-79. [PMID: 17451744 PMCID: PMC2034327 DOI: 10.1016/j.jmb.2007.03.036] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2006] [Revised: 03/10/2007] [Accepted: 03/15/2007] [Indexed: 10/23/2022]

Huff RG, Bayram E, Tan H, Knutson ST, Knaggs MH, Richon AB, Santago P, Fetrow JS. Chemical and structural diversity in cyclooxygenase protein active sites. Chem Biodivers 2007;2:1533-52. [PMID: 17191953 DOI: 10.1002/cbdv.200590125] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Chakrabarti S, Lanczycki CJ. Analysis and prediction of functionally important sites in proteins. Protein Sci 2007;16:4-13. [PMID: 17192586 PMCID: PMC2222836 DOI: 10.1110/ps.062506407] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]