Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Rost B. Enzyme function less conserved than anticipated. J Mol Biol 2002;318:595-608. [PMID: 12051862 DOI: 10.1016/s0022-2836(02)00016-5] [Citation(s) in RCA: 250] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Number

Cited by Other Article(s)

151

Wrzeszczynski KO, Rost B. Cell cycle kinases predicted from conserved biophysical properties. Proteins 2009;74:655-68. [PMID: 18704950 DOI: 10.1002/prot.22181] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

152

Hawkins T, Chitale M, Luban S, Kihara D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins 2009;74:566-82. [PMID: 18655063 DOI: 10.1002/prot.22172] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Abstract

Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (>or=80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (>or=90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/.

Collapse

153

Standley DM, Toh H, Nakamura H. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan. Proteins 2009;72:1333-51. [PMID: 18384072 DOI: 10.1002/prot.22015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

154

Shevchenko A, Valcu CM, Junqueira M. Tools for exploring the proteomosphere. J Proteomics 2009;72:137-44. [PMID: 19167528 DOI: 10.1016/j.jprot.2009.01.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2009] [Accepted: 01/13/2009] [Indexed: 11/29/2022]

155

Tseng YY, Dundas J, Liang J. Predicting protein function and binding profile via matching of local evolutionary and geometric surface patterns. J Mol Biol 2009;387:451-64. [PMID: 19154742 DOI: 10.1016/j.jmb.2008.12.072] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2008] [Revised: 12/19/2008] [Accepted: 12/23/2008] [Indexed: 11/25/2022]

Abstract

Inferring protein functions from structures is a challenging task, as a large number of orphan protein structures from structural genomics project are now solved without their biochemical functions characterized. For proteins binding to similar substrates or ligands and carrying out similar functions, their binding surfaces are under similar physicochemical constraints, and hence the sets of allowed and forbidden residue substitutions are similar. However, it is difficult to isolate such selection pressure due to protein function from selection pressure due to protein folding, and evolutionary relationship reflected by global sequence and structure similarities between proteins is often unreliable for inferring protein function. We have developed a method, called pevoSOAR (pocket-based evolutionary search of amino acid residues), for predicting protein functions by solving the problem of uncovering amino acids residue substitution pattern due to protein function and separating it from amino acids substitution pattern due to protein folding. We incorporate evolutionary information specific to an individual binding region and match local surfaces on a large scale with millions of precomputed protein surfaces to identify those with similar functions. Our pevoSOAR method also generates a probablistic model called the computed binding a profile that characterizes protein-binding activities that may involve multiple substrates or ligands. We show that our method can be used to predict enzyme functions with accuracy. Our method can also assess enzyme binding specificity and promiscuity. In an objective large-scale test of 100 enzyme families with thousands of structures, our predictions are found to be sensitive and specific: At the stringent specificity level of 99.98%, we can correctly predict enzyme functions for 80.55% of the proteins. The overall area under the receiver operating characteristic curve measuring the performance of our prediction is 0.955, close to the perfect value of 1.00. The best Matthews coefficient is 86.6%. Our method also works well in predicting the biochemical functions of orphan proteins from structural genomics projects.

Collapse

156

Syed U, Yona G. Enzyme function prediction with interpretable models. Methods Mol Biol 2009;541:373-420. [PMID: 19381539 DOI: 10.1007/978-1-59745-243-4_17] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]

157

Addou S, Rentzsch R, Lee D, Orengo CA. Domain-based and family-specific sequence identity thresholds increase the levels of reliable protein function transfer. J Mol Biol 2008;387:416-30. [PMID: 19135455 DOI: 10.1016/j.jmb.2008.12.045] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2008] [Revised: 12/12/2008] [Accepted: 12/17/2008] [Indexed: 11/24/2022]

158

Punta M, Ofran Y. The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS Comput Biol 2008;4:e1000160. [PMID: 18974821 PMCID: PMC2518264 DOI: 10.1371/journal.pcbi.1000160] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

159

Mazumder R, Vasudevan S. Structure-guided comparative analysis of proteins: principles, tools, and applications for predicting function. PLoS Comput Biol 2008;4:e1000151. [PMID: 18818720 PMCID: PMC2515338 DOI: 10.1371/journal.pcbi.1000151] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

160

Karimpour-Fard A, Leach SM, Gill RT, Hunter LE. Predicting protein linkages in bacteria: which method is best depends on task. BMC Bioinformatics 2008;9:397. [PMID: 18816389 PMCID: PMC2570368 DOI: 10.1186/1471-2105-9-397] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2008] [Accepted: 09/24/2008] [Indexed: 01/06/2023] Open

Abstract

Background

Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations.

Results

Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster predicted completely 59% of the known operons in E. coli K12 and 88% (333/418)in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene Cluster and Gene Neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction.

Conclusion

A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated the dramatic differences on reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, as well as provided guidelines as to which method is most appropriate for a given prediction task.

Collapse

161

Redfern OC, Dessailly B, Orengo CA. Exploring the structure and function paradigm. Curr Opin Struct Biol 2008;18:394-402. [PMID: 18554899 DOI: 10.1016/j.sbi.2008.05.007] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2008] [Revised: 04/16/2008] [Accepted: 05/07/2008] [Indexed: 11/29/2022]

162

Gómez A, Cedano J, Espadaler J, Hermoso A, Piñol J, Querol E. Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm. Protein J 2008;27:130-9. [PMID: 18066655 DOI: 10.1007/s10930-007-9116-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

163

Local function conservation in sequence and structure space. PLoS Comput Biol 2008;4:e1000105. [PMID: 18604264 PMCID: PMC2427199 DOI: 10.1371/journal.pcbi.1000105] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2007] [Accepted: 05/28/2008] [Indexed: 11/19/2022] Open

164

Watanabe RLA, Morett E, Vallejo EE. Inferring modules of functionally interacting proteins using the Bond Energy Algorithm. BMC Bioinformatics 2008;9:285. [PMID: 18559112 PMCID: PMC2474619 DOI: 10.1186/1471-2105-9-285] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2008] [Accepted: 06/17/2008] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Non-homology based methods such as phylogenetic profiles are effective for predicting functional relationships between proteins with no considerable sequence or structure similarity. Those methods rely heavily on traditional similarity metrics defined on pairs of phylogenetic patterns. Proteins do not exclusively interact in pairs as the final biological function of a protein in the cellular context is often hold by a group of proteins. In order to accurately infer modules of functionally interacting proteins, the consideration of not only direct but also indirect relationships is required. In this paper, we used the Bond Energy Algorithm (BEA) to predict functionally related groups of proteins. With BEA we create clusters of phylogenetic profiles based on the associations of the surrounding elements of the analyzed data using a metric that considers linked relationships among elements in the data set.

RESULTS

Using phylogenetic profiles obtained from the Cluster of Orthologous Groups of Proteins (COG) database, we conducted a series of clustering experiments using BEA to predict (upper level) relationships between profiles. We evaluated our results by comparing with COG's functional categories, And even more, with the experimentally determined functional relationships between proteins provided by the DIP and ECOCYC databases. Our results demonstrate that BEA is capable of predicting meaningful modules of functionally related proteins. BEA outperforms traditionally used clustering methods, such as k-means and hierarchical clustering by predicting functional relationships between proteins with higher accuracy.

CONCLUSION

This study shows that the linked relationships of phylogenetic profiles obtained by BEA is useful for detecting functional associations between profiles and extending functional modules not found by traditional methods. BEA is capable of detecting relationship among phylogenetic patterns by linking them through a common element shared in a group. Additionally, we discuss how the proposed method may become more powerful if other criteria to classify different levels of protein functional interactions, as gene neighborhood or protein fusion information, is provided.

Collapse

165

Powers R, Mercier KA, Copeland JC. The application of FAST-NMR for the identification of novel drug discovery targets. Drug Discov Today 2008;13:172-9. [PMID: 18275915 DOI: 10.1016/j.drudis.2007.11.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2007] [Revised: 10/30/2007] [Accepted: 11/01/2007] [Indexed: 10/22/2022]

166

Stevens FJ. Possible evolutionary links between immunoglobulin light chains and other proteins involved in amyloidosis. Amyloid 2008;15:96-107. [PMID: 18484336 DOI: 10.1080/13506120802005973] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

167

Espadaler J, Eswar N, Querol E, Avilés FX, Sali A, Marti-Renom MA, Oliva B. Prediction of enzyme function by combining sequence similarity and protein interactions. BMC Bioinformatics 2008;9:249. [PMID: 18505562 PMCID: PMC2430716 DOI: 10.1186/1471-2105-9-249] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2007] [Accepted: 05/27/2008] [Indexed: 11/18/2022] Open

168

Fetrow JS. Active site profiling to identify protein functional sites in sequences and structures using the Deacon Active Site Profiler (DASP). CURRENT PROTOCOLS IN BIOINFORMATICS 2008;Chapter 8:8.10.1-8.10.16. [PMID: 18428769 DOI: 10.1002/0471250953.bi0810s14] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

169

Lobley AE, Nugent T, Orengo CA, Jones DT. FFPred: an integrated feature-based function prediction server for vertebrate proteomes. Nucleic Acids Res 2008;36:W297-302. [PMID: 18463141 PMCID: PMC2447771 DOI: 10.1093/nar/gkn193] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open

170

Toyota CG, Berthold CL, Gruez A, Jónsson S, Lindqvist Y, Cambillau C, Richards NGJ. Differential substrate specificity and kinetic behavior of Escherichia coli YfdW and Oxalobacter formigenes formyl coenzyme A transferase. J Bacteriol 2008;190:2556-64. [PMID: 18245280 PMCID: PMC2293189 DOI: 10.1128/jb.01823-07] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2007] [Accepted: 01/25/2008] [Indexed: 01/29/2023] Open

171

Pugalenthi G, Kumar KK, Suganthan P, Gangal R. Identification of catalytic residues from protein structure using support vector machine with sequence and structural features. Biochem Biophys Res Commun 2008;367:630-4. [DOI: 10.1016/j.bbrc.2008.01.038] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2008] [Accepted: 01/10/2008] [Indexed: 11/16/2022]

172

Wass MN, Sternberg MJE. ConFunc—functional annotation in the twilight zone. Bioinformatics 2008;24:798-806. [DOI: 10.1093/bioinformatics/btn037] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

173

Bujnicki JM, Droogmans L, Grosjean H, Purushothaman SK, Lapeyre B. Bioinformatics-Guided Identification and Experimental Characterization of Novel RNA Methyltransferas. ACTA ACUST UNITED AC 2008. [DOI: 10.1007/978-3-540-74268-5_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/28/2023]

174

Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007;8:995-1005. [PMID: 18037900 DOI: 10.1038/nrm2281] [Citation(s) in RCA: 359] [Impact Index Per Article: 21.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

175

Yeats C, Lees J, Reid A, Kellam P, Martin N, Liu X, Orengo C. Gene3D: comprehensive structural and functional annotation of genomes. Nucleic Acids Res 2007;36:D414-8. [PMID: 18032434 PMCID: PMC2238970 DOI: 10.1093/nar/gkm1019] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

176

Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling. BMC Genomics 2007;8:393. [PMID: 17967189 PMCID: PMC2204017 DOI: 10.1186/1471-2164-8-393] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2007] [Accepted: 10/29/2007] [Indexed: 11/10/2022] Open

177

Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE, Rodrigues V, White KP, Bork P, Sowdhamini R. Enhanced function annotations for Drosophila serine proteases: a case study for systematic annotation of multi-member gene families. Gene 2007;407:199-215. [PMID: 17996400 DOI: 10.1016/j.gene.2007.10.012] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2007] [Revised: 09/09/2007] [Accepted: 10/07/2007] [Indexed: 12/30/2022]

Abstract

Systematically annotating function of enzymes that belong to large protein families encoded in a single eukaryotic genome is a very challenging task. We carried out such an exercise to annotate function for serine-protease family of the trypsin fold in Drosophila melanogaster, with an emphasis on annotating serine-protease homologues (SPHs) that may have lost their catalytic function. Our approach involves data mining and data integration to provide function annotations for 190 Drosophila gene products containing serine-protease-like domains, of which 35 are SPHs. This was accomplished by analysis of structure-function relationships, gene-expression profiles, large-scale protein-protein interaction data, literature mining and bioinformatic tools. We introduce functional residue clustering (FRC), a method that performs hierarchical clustering of sequences using properties of functionally important residues and utilizes correlation co-efficient as a quantitative similarity measure to transfer in vivo substrate specificities to proteases. We show that the efficiency of transfer of substrate-specificity information using this method is generally high. FRC was also applied on Drosophila proteases to assign putative competitive inhibitor relationships (CIRs). Microarray gene-expression data were utilized to uncover a large-scale and dual involvement of proteases in development and in immune response. We found specific recruitment of SPHs and proteases with CLIP domains in immune response, suggesting evolution of a new function for SPHs. We also suggest existence of separate downstream protease cascades for immune response against bacterial/fungal infections and parasite/parasitoid infections. We verify quality of our annotations using information from RNAi screens and other evidence types. Utilization of such multi-fold approaches results in 10-fold increase of function annotation for Drosophila serine proteases and demonstrates value in increasing annotations in multiple genomes.

Collapse

178

Functional differentiation of proteins: implications for structural genomics. Structure 2007;15:405-15. [PMID: 17437713 DOI: 10.1016/j.str.2007.02.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2006] [Revised: 02/15/2007] [Accepted: 02/16/2007] [Indexed: 01/06/2023]

179

Shen HB, Chou KC. EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. Biochem Biophys Res Commun 2007;364:53-9. [PMID: 17931599 DOI: 10.1016/j.bbrc.2007.09.098] [Citation(s) in RCA: 163] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2007] [Accepted: 09/22/2007] [Indexed: 11/25/2022]

180

Fuhrer T, Chen L, Sauer U, Vitkup D. Computational prediction and experimental verification of the gene encoding the NAD+/NADP+-dependent succinate semialdehyde dehydrogenase in Escherichia coli. J Bacteriol 2007;189:8073-8. [PMID: 17873044 PMCID: PMC2168661 DOI: 10.1128/jb.01027-07] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

181

Frishman D. Protein annotation at genomic scale: the current status. Chem Rev 2007;107:3448-66. [PMID: 17658902 DOI: 10.1021/cr068303k] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

182

Kunik V, Meroz Y, Solan Z, Sandbank B, Weingart U, Ruppin E, Horn D. Functional representation of enzymes by specific peptides. PLoS Comput Biol 2007;3:e167. [PMID: 17722976 PMCID: PMC1950953 DOI: 10.1371/journal.pcbi.0030167] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2007] [Accepted: 07/10/2007] [Indexed: 11/19/2022] Open

Abstract

Predicting the function of a protein from its sequence is a long-standing goal of bioinformatic research. While sequence similarity is the most popular tool used for this purpose, sequence motifs may also subserve this goal. Here we develop a motif-based method consisting of applying an unsupervised motif extraction algorithm (MEX) to all enzyme sequences, and filtering the results by the four-level classification hierarchy of the Enzyme Commission (EC). The resulting motifs serve as specific peptides (SPs), appearing on single branches of the EC. In contrast to previous motif-based methods, the new method does not require any preprocessing by multiple sequence alignment, nor does it rely on over-representation of motifs within EC branches. The SPs obtained comprise on average 8.4 ± 4.5 amino acids, and specify the functions of 93% of all enzymes, which is much higher than the coverage of 63% provided by ProSite motifs. The SP classification thus compares favorably with previous function annotation methods and successfully demonstrates an added value in extreme cases where sequence similarity fails. Interestingly, SPs cover most of the annotated active and binding site amino acids, and occur in active-site neighboring 3-D pockets in a highly statistically significant manner. The latter are assumed to have strong biological relevance to the activity of the enzyme. Further filtering of SPs by biological functional annotations results in reduced small subsets of SPs that possess very large enzyme coverage. Overall, SPs both form a very useful tool for enzyme functional classification and bear responsibility for the catalytic biological function carried out by enzymes.

Sequence motifs are known to provide information about functional properties of proteins. In the past, many approaches have looked for deterministic motifs in protein sequences, by searching for functionally over-represented k-mers, with moderate levels of success. Here we revisit and renew the utility of deterministic motifs, by searching for them in a partially unsupervised and context-dependent manner. Using a novel motif extraction algorithm, MEX, deterministic sequence motifs are extracted from Swiss Prot data containing more than 50,000 enzymes. They are then filtered by the Enzyme Commission classification hierarchy to produce sets of specific peptides (SPs). The latter specify enzyme function for 93% of the data, comparing well with existing approaches for enzyme classification. Importantly, SPs are found to have biological significance. A majority of all known active and binding sites of enzymes are covered by SPs, and many SPs are found to lie within spatial pockets in the neighborhood of the active sites. Both these results have extremely high statistical significance. A user-friendly tool that displays the hits of SPs for any protein sequence that is presented as a query, together with the EC assignments due to these SPs, is available at http://adios.tau.ac.il/SPSearch.

Collapse

183

Chen L, Vitkup D. Distribution of orphan metabolic activities. Trends Biotechnol 2007;25:343-8. [PMID: 17580095 DOI: 10.1016/j.tibtech.2007.06.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2007] [Revised: 04/17/2007] [Accepted: 06/01/2007] [Indexed: 10/23/2022]

184

Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res 2007;35:W182-5. [PMID: 17526522 PMCID: PMC1933193 DOI: 10.1093/nar/gkm321] [Citation(s) in RCA: 2843] [Impact Index Per Article: 167.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

185

Marti-Renom MA, Rossi A, Al-Shahrour F, Davis FP, Pieper U, Dopazo J, Sali A. The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics 2007;8 Suppl 4:S4. [PMID: 17570147 PMCID: PMC1892083 DOI: 10.1186/1471-2105-8-s4-s4] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

186

Mirkovic N, Li Z, Parnassa A, Murray D. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 2007;66:766-77. [PMID: 17154423 DOI: 10.1002/prot.21191] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

187

Brown D, Sjölander K. Functional classification using phylogenomic inference. PLoS Comput Biol 2007;2:e77. [PMID: 16846248 PMCID: PMC1484587 DOI: 10.1371/journal.pcbi.0020077] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open

188

Marsden RL, Lewis TA, Orengo CA. Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. BMC Bioinformatics 2007;8:86. [PMID: 17349043 PMCID: PMC1829165 DOI: 10.1186/1471-2105-8-86] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2006] [Accepted: 03/09/2007] [Indexed: 11/25/2022] Open

189

Ofran Y, Rost B. ISIS: interaction sites identified from sequence. Bioinformatics 2007;23:e13-6. [PMID: 17237081 DOI: 10.1093/bioinformatics/btl303] [Citation(s) in RCA: 172] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

190

Yuan Tseng Y, Lian J. Estimating evolutionary rate of local protein binding surfaces: a Bayesian Monte Carlo approach. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007;2006:739-42. [PMID: 17282289 DOI: 10.1109/iembs.2005.1616520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]

191

Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotechnol 2007;25:197-206. [PMID: 17287757 DOI: 10.1038/nbt1284] [Citation(s) in RCA: 1392] [Impact Index Per Article: 81.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

192

Shahbaba B, Neal RM. Gene function classification using Bayesian models with hierarchy-based priors. BMC Bioinformatics 2006;7:448. [PMID: 17038174 PMCID: PMC1618412 DOI: 10.1186/1471-2105-7-448] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2006] [Accepted: 10/12/2006] [Indexed: 11/10/2022] Open

193

Yu CS, Chen YC, Lu CH, Hwang JK. Prediction of protein subcellular localization. Proteins 2006;64:643-51. [PMID: 16752418 DOI: 10.1002/prot.21018] [Citation(s) in RCA: 1061] [Impact Index Per Article: 58.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Abstract

Because the protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization, and have found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences-some data sets comprising sequences up to 80-90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vectors derived from sequences; the second level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches for two benchmark data sets-one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization. Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set of high homology levels will undoubtedly lead to biased assessment of the performances of the predictive approaches-especially those relying on homology search or sequence annotations. Our two-level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.

Collapse

194

Donini S, Percudani R, Credali A, Montanini B, Sartori A, Peracchi A. A threonine synthase homolog from a mammalian genome. Biochem Biophys Res Commun 2006;350:922-8. [PMID: 17034760 DOI: 10.1016/j.bbrc.2006.09.112] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2006] [Accepted: 09/21/2006] [Indexed: 11/19/2022]

195

Fernandez-Fuentes N, Fiser A. Saturating representation of loop conformational fragments in structure databanks. BMC STRUCTURAL BIOLOGY 2006;6:15. [PMID: 16820050 PMCID: PMC1574324 DOI: 10.1186/1472-6807-6-15] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/27/2006] [Accepted: 07/04/2006] [Indexed: 11/30/2022]

Abstract

Background

Short fragments of proteins are fundamental starting points in various structure prediction applications, such as in fragment based loop modeling methods but also in various full structure build-up procedures. The applicability and performance of these approaches depend on the availability of short fragments in structure databanks.

Results

We studied the representation of protein loop fragments up to 14 residues in length. All possible query fragments found in sequence databases (Sequence Space) were clustered and cross referenced with available structural fragments in Protein Data Bank (Structure Space). We found that the expansion of PDB in the last few years resulted in a dense coverage of loop conformational fragments. For each loops of length 8 in the current Sequence Space there is at least one loop in Structure Space with 50% or higher sequence identity. By correlating sequence and structure clusters of loops we found that a 50% sequence identity generally guarantees structural similarity. These percentages of coverage at 50% sequence cutoff drop to 96, 94, 68, 53, 33 and 13% for loops of length 9, 10, 11, 12, 13, and 14, respectively. There is not a single loop in the current Sequence Space at any length up to 14 residues that is not matched with a conformational segment that shares at least 20% sequence identity. This minimum observed identity is 40% for loops of 12 residues or shorter and is as high as 50% for 10 residue or shorter loops. We also assessed the impact of rapidly growing sequence databanks on the estimated number of new loop conformations and found that while the number of sequentially unique sequence segments increased about six folds during the last five years there are almost no unique conformational segments among these up to 12 residues long fragments.

Conclusion

The results suggest that fragment based prediction approaches are not limited any more by the completeness of fragments in databanks but rather by the effective scoring and search algorithms to locate them. The current favorable coverage and trends observed will be further accentuated with the progress of Protein Structure Initiative that targets new protein folds and ultimately aims at providing an exhaustive coverage of the structure space.

Collapse

196

Han L, Cui J, Lin H, Ji Z, Cao Z, Li Y, Chen Y. Recent progresses in the application of machine learning approach for predicting protein functional class independent of sequence similarity. Proteomics 2006;6:4023-37. [PMID: 16791826 DOI: 10.1002/pmic.200500938] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]

197

Pellegrini-Calace M, Soro S, Tramontano A. Revisiting the prediction of protein function at CASP6. FEBS J 2006;273:2977-83. [PMID: 16759228 DOI: 10.1111/j.1742-4658.2006.05309.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

198

Soro S, Tramontano A. The prediction of protein function at CASP6. Proteins 2006;61 Suppl 7:201-213. [PMID: 16187363 DOI: 10.1002/prot.20738] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

199

Mika S, Rost B. Protein-protein interactions more conserved within species than across species. PLoS Comput Biol 2006;2:e79. [PMID: 16854211 PMCID: PMC1513270 DOI: 10.1371/journal.pcbi.0020079] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2005] [Indexed: 11/21/2022] Open

Abstract

Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. Another about 55,000 pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for our two surprising results from our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein–protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions. http://www.rostlab.org/results/2006/ppi_homology/

The IntAct database contains about ten large-scale data sets of protein–protein interactions. Each set contains thousands of experimentally observed pair interactions. Most pairs were observed in yeast (Saccharomyces cerevisiae), fly (Drosophila melanogaster), and worm (Caenorhabditis elegans). These interactions are often perceived as model organisms in the sense that one can infer that two mouse proteins interact if one experimentally observes the two corresponding proteins in worm to interact. Here, the authors analyzed in detail how the sequence signals of physical protein–protein interactions are conserved. It is a common assumption that protein–protein interactions can easily be inferred through homology transfer from one model organism to another organism of interest. Here, the authors demonstrated that such homology transfers are only accurate at unexpectedly high levels of sequence identity. Even more surprisingly, homology transfers of protein–protein interactions are significantly more reliable for protein pairs from the same species than for two protein pairs from different organisms. The observation that interactions were much more conserved within than across species was valid for all levels of sequence similarity, i.e. for very similar as well as for more diverged interologs.

Collapse

200

Glasner ME, Fayazmanesh N, Chiang RA, Sakai A, Jacobson MP, Gerlt JA, Babbitt PC. Evolution of structure and function in the o-succinylbenzoate synthase/N-acylamino acid racemase family of the enolase superfamily. J Mol Biol 2006;360:228-50. [PMID: 16740275 DOI: 10.1016/j.jmb.2006.04.055] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2006] [Revised: 04/22/2006] [Accepted: 04/25/2006] [Indexed: 11/30/2022]

Abstract

Understanding how proteins evolve to provide both exquisite specificity and proficient activity is a fundamental problem in biology that has implications for protein function prediction and protein engineering. To study this problem, we analyzed the evolution of structure and function in the o-succinylbenzoate synthase/N-acylamino acid racemase (OSBS/NAAAR) family, part of the mechanistically diverse enolase superfamily. Although all characterized members of the family catalyze the OSBS reaction, this family is extraordinarily divergent, with some members sharing <15% identity. In addition, a member of this family, Amycolatopsis OSBS/NAAAR, is promiscuous, catalyzing both dehydration and racemization. Although the OSBS/NAAAR family appears to have a single evolutionary origin, no sequence or structural motifs unique to this family could be identified; all residues conserved in the family are also found in enolase superfamily members that have different functions. Based on their species distribution, several uncharacterized proteins similar to Amycolatopsis OSBS/NAAAR appear to have been transmitted by lateral gene transfer. Like Amycolatopsis OSBS/NAAAR, these might have additional or alternative functions to OSBS because many are from organisms lacking the pathway in which OSBS is an intermediate. In addition to functional differences, the OSBS/NAAAR family exhibits surprising structural variations, including large differences in orientation between the two domains. These results offer several insights into protein evolution. First, orthologous proteins can exhibit significant structural variation, and specificity can be maintained with little conservation of ligand-contacting residues. Second, the discovery of a set of proteins similar to Amycolatopsis OSBS/NAAAR supports the hypothesis that new protein functions evolve through promiscuous intermediates. Finally, a combination of evolutionary, structural, and sequence analyses identified characteristics that might prime proteins, such as Amycolatopsis OSBS/NAAAR, for the evolution of new activities.

Collapse