Reference Citation Analysis

For: Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y. Predicting function: from genes to genomes and back. J Mol Biol 1998;283:707-25. [PMID: 9790834 DOI: 10.1006/jmbi.1998.2144] [Citation(s) in RCA: 262] [Impact Index Per Article: 9.7] [Indexed: 11/22/2022]

Cited by Other Article(s)

151

Kim WK, In YJ, Kim JH, Cho HJ, Kim JH, Kang S, Lee CY, Lee SC. Quantitative relationship of dioxin-responsive gene expression to dioxin response element in Hep3B and HepG2 human hepatocarcinoma cell lines. Toxicol Lett 2006;165:174-81. [PMID: 16697128 DOI: 10.1016/j.toxlet.2006.03.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 08/25/2005] [Revised: 03/03/2006] [Accepted: 03/10/2006] [Indexed: 11/29/2022]

152

Bryson K, Loux V, Bossy R, Nicolas P, Chaillou S, van de Guchte M, Penaud S, Maguin E, Hoebeke M, Bessières P, Gibrat JF. AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system. Nucleic Acids Res 2006;34:3533-45. [PMID: 16855290 PMCID: PMC1524909 DOI: 10.1093/nar/gkl471] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.2] [Indexed: 11/23/2022]

153

Lozada-Chávez I, Janga SC, Collado-Vides J. Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res 2006;34:3434-45. [PMID: 16840530 PMCID: PMC1524901 DOI: 10.1093/nar/gkl423] [Citation(s) in RCA: 140] [Impact Index Per Article: 7.4] [Indexed: 11/29/2022]

154

Han L, Cui J, Lin H, Ji Z, Cao Z, Li Y, Chen Y. Recent progresses in the application of machine learning approach for predicting protein functional class independent of sequence similarity. Proteomics 2006;6:4023-37. [PMID: 16791826 DOI: 10.1002/pmic.200500938] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Indexed: 01/22/2023]

155

Brylinski M, Konieczny L, Roterman I. Ligation site in proteins recognized in silico. Bioinformation 2006;1:127-9. [PMID: 17597871 PMCID: PMC1891674 DOI: 10.6026/97320630001127] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 03/27/2006] [Accepted: 04/05/2006] [Indexed: 11/23/2022]

156

Schneider G, Neuberger G, Wildpaner M, Tian S, Berezovsky I, Eisenhaber F. Application of a sensitive collection heuristic for very large protein families: evolutionary relationship between adipose triglyceride lipase (ATGL) and classic mammalian lipases. BMC Bioinformatics 2006;7:164. [PMID: 16551354 PMCID: PMC1435942 DOI: 10.1186/1471-2105-7-164] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Received: 09/27/2005] [Accepted: 03/21/2006] [Indexed: 11/30/2022]

157

Wu J, Hu Z, DeLisi C. Gene annotation and network inference by phylogenetic profiling. BMC Bioinformatics 2006;7:80. [PMID: 16503966 PMCID: PMC1388238 DOI: 10.1186/1471-2105-7-80] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 10/07/2005] [Accepted: 02/17/2006] [Indexed: 12/30/2022]

Abstract

BACKGROUND

Phylogenetic analysis is emerging as one of the most informative computational methods for the annotation of genes and the identification of evolutionary modules of functionally related genes. The effectiveness with which phylogenetic profiles can be used to assign genes to pathways depends on an appropriate measure of correlation between gene profiles and an effective decision rule for using that correlation. Current methods, though useful, perform well below what is possible, largely because the performance of their decision rules deteriorates rapidly as coverage increases.

RESULTS

We introduce, test and apply a new decision rule, correlation enrichment (CE), for assigning genes to functional categories at various levels of resolution. Among the results are: (1) CE performs better than standard guilt by association (SGA, assignment to a functional category when a simple correlate exceeds a pre-specified threshold) irrespective of the number of genes assigned (i.e. coverage); the improvement is greatest at high coverage, where the precision (positive predictive value) of CE is approximately 6-fold higher than that of SGA. (2) CE is estimated to allocate each of the 2918 unannotated orthologs to KEGG pathways with an average precision of 49% (approximately 7-fold higher than SGA). (3) An estimated 94% of the 1846 unannotated orthologs in the COG ontology can be assigned a function with an average precision of 0.4 or greater. (4) Dozens of functional and evolutionarily conserved cliques or quasi-cliques can be identified, many containing previously unannotated genes.

CONCLUSION

The method serves as a general computational tool for annotating large numbers of unknown genes and uncovering evolutionary and functional modules. It appears to perform substantially better than extant stand-alone high-throughput methods.
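
For comparison with CE, the SGA baseline defined in the RESULTS paragraph can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes binary presence/absence profiles over the same ordered set of genomes, uses the Pearson correlation as the "simple correlate", and assigns a category when the query correlates with at least one annotated member at or above the threshold. The function names, the 0.6 threshold and the toy pathway data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def sga_assign(query_profile, annotated, threshold=0.6):
    """Standard guilt by association (sketch).

    `annotated` maps a category name to the profiles of its annotated member
    genes. A category is assigned when any member's profile correlates with
    the query at or above `threshold` (a hypothetical cutoff).
    """
    assigned = []
    for category, member_profiles in annotated.items():
        if any(pearson(query_profile, p) >= threshold for p in member_profiles):
            assigned.append(category)
    return sorted(assigned)


# Toy presence (1) / absence (0) profiles over six genomes (hypothetical data).
query = [1, 1, 0, 1, 0, 0]
annotated = {
    "pathway_A": [[1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1]],
    "pathway_B": [[0, 0, 1, 0, 1, 1]],
}
print(sga_assign(query, annotated))  # ['pathway_A']
```

Because every category is tested against one fixed cutoff, lowering the threshold to raise coverage also admits more chance correlations, which plausibly explains the precision loss at high coverage that the abstract attributes to SGA.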

158

Ahmad I, Hoessli DC, Walker-Nasir E, Rafik SM, Shakoori AR. Oct-2 DNA binding transcription factor: functional consequences of phosphorylation and glycosylation. Nucleic Acids Res 2006;34:175-84. [PMID: 16431844 PMCID: PMC1326018 DOI: 10.1093/nar/gkj401] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/27/2022]

159

Pretzer G, Snel J, Molenaar D, Wiersma A, Bron PA, Lambert J, de Vos WM, van der Meer R, Smits MA, Kleerebezem M. Biodiversity-based identification and functional characterization of the mannose-specific adhesin of Lactobacillus plantarum. J Bacteriol 2005;187:6128-36. [PMID: 16109954 PMCID: PMC1196140 DOI: 10.1128/jb.187.17.6128-6136.2005] [Citation(s) in RCA: 223] [Impact Index Per Article: 11.2] [Indexed: 12/24/2022]

Abstract

Lactobacillus plantarum is a frequently encountered inhabitant of the human intestinal tract, and some strains are marketed as probiotics. Their ability to adhere to mannose residues is a potentially interesting characteristic with regard to proposed probiotic features such as colonization of the intestinal surface and competitive exclusion of pathogens. In this study, the variable capacity of 14 L. plantarum strains to agglutinate Saccharomyces cerevisiae in a mannose-specific manner was determined and subsequently correlated with an L. plantarum WCFS1-based genome-wide genotype database. This led to the identification of four candidate mannose adhesin-encoding genes. Two genes primarily predicted to code for sortase-dependent cell surface proteins displayed a complete gene-trait match. Their involvement in mannose adhesion was corroborated by the finding that a sortase (srtA) mutant of L. plantarum WCFS1 lost the capacity to agglutinate S. cerevisiae. The postulated role of these two candidate genes was investigated by gene-specific deletion and overexpression in L. plantarum WCFS1. Subsequent evaluation of the mannose adhesion capacity of the resulting mutant strains showed that inactivation of one candidate gene (lp_0373) did not affect mannose adhesion properties. In contrast, deletion of the other gene (lp_1229) resulted in a complete loss of yeast agglutination ability, while its overexpression quantitatively enhanced this phenotype. Therefore, this gene was designated to encode the mannose-specific adhesin (Msa; gene name, msa) of L. plantarum. Domain homology analysis of the predicted 1,000-residue Msa protein identified known carbohydrate-binding domains, further supporting its role as a mannose adhesin that is likely to be involved in the interaction of L. plantarum with its host in the intestinal tract.

160

Francke C, Siezen RJ, Teusink B. Reconstructing the metabolic network of a bacterium from its genome. Trends Microbiol 2005;13:550-8. [PMID: 16169729 DOI: 10.1016/j.tim.2005.09.001] [Citation(s) in RCA: 124] [Impact Index Per Article: 6.2] [Received: 05/05/2005] [Revised: 08/25/2005] [Accepted: 09/08/2005] [Indexed: 10/25/2022]

161

McCarthy FM, Burgess SC, van den Berg BHJ, Koter MD, Pharr GT. Differential detergent fractionation for non-electrophoretic eukaryote cell proteomics. J Proteome Res 2005;4:316-24. [PMID: 15822906 DOI: 10.1021/pr049842d] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.6] [Indexed: 11/30/2022]

162

Lappe M, Holm L. Algorithms for protein interaction networks. Biochem Soc Trans 2005;33:530-4. [PMID: 15916557 DOI: 10.1042/bst0330530] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]

163

Gilks WR, Audit B, de Angelis D, Tsoka S, Ouzounis CA. Percolation of annotation errors through hierarchically structured protein sequence databases. Math Biosci 2005;193:223-34. [PMID: 15748731 DOI: 10.1016/j.mbs.2004.08.001] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.6] [Received: 01/10/2003] [Accepted: 08/30/2004] [Indexed: 11/18/2022]

164

Korbel JO, Doerks T, Jensen LJ, Perez-Iratxeta C, Kaczanowski S, Hooper SD, Andrade MA, Bork P. Systematic association of genes to phenotypes by genome and literature mining. PLoS Biol 2005;3:e134. [PMID: 15799710 PMCID: PMC1073694 DOI: 10.1371/journal.pbio.0030134] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.3] [Received: 11/30/2004] [Accepted: 02/02/2005] [Indexed: 11/23/2022]

165

Worthey EA, Myler PJ. Protozoan genomes: gene identification and annotation. Int J Parasitol 2005;35:495-512. [PMID: 15826642 DOI: 10.1016/j.ijpara.2005.02.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/15/2004] [Revised: 01/25/2005] [Accepted: 02/06/2005] [Indexed: 12/01/2022]

166

Sampson EM, Johnson CLV, Bobik TA. Biochemical evidence that the pduS gene encodes a bifunctional cobalamin reductase. Microbiology (Reading) 2005;151:1169-77. [PMID: 15817784 DOI: 10.1099/mic.0.27755-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]

Abstract

Salmonella enterica degrades 1,2-propanediol (1,2-PD) by a pathway that requires coenzyme B₁₂ (adenosylcobalamin; AdoCbl). The genes specifically involved in 1,2-PD utilization (pdu) are found in a large contiguous cluster, the pdu locus. Earlier studies have indicated that this locus includes genes for the conversion of vitamin B₁₂ (cyanocobalamin; CNCbl) to AdoCbl and that the pduO gene encodes an ATP:cob(I)alamin adenosyltransferase, which catalyses the terminal step of this process. Here, in vitro evidence is presented that the pduS gene encodes a bifunctional cobalamin reductase that catalyses two reductive steps needed for the conversion of CNCbl into AdoCbl. The PduS enzyme was produced at high levels in Escherichia coli. Enzyme assays showed that cell extracts from the PduS expression strain reduced cob(III)alamin (hydroxycobalamin) to cob(II)alamin at a rate of 91 nmol min⁻¹ mg⁻¹ and cob(II)alamin to cob(I)alamin at a rate of 7.8 nmol min⁻¹ mg⁻¹. In contrast, control extracts had only 9.9 nmol min⁻¹ mg⁻¹ cob(III)alamin reductase activity and no detectable cob(II)alamin reductase activity. Thus, these results indicated that the PduS enzyme is a bifunctional cobalamin reductase. Enzyme assays also showed that the PduS enzyme reduced cob(II)alamin to cob(I)alamin for conversion into AdoCbl by purified PduO adenosyltransferase. Moreover, studies in which iodoacetate was used as a chemical trap for cob(I)alamin indicated that the PduS and PduO enzymes physically interact and that cob(I)alamin is sequestered during the conversion of cob(II)alamin to AdoCbl by these two enzymes. This is likely to be important physiologically, since cob(I)alamin is extremely reactive and would need to be protected from unproductive side reactions. Lastly, bioinformatic analyses showed that the PduS enzyme is unrelated in amino acid sequence to enzymes of known function currently present in GenBank. Hence, the results indicate that the PduS enzyme represents a new class of cobalamin reductase.

167

Prigent V, Thierry JC, Poch O, Plewniak F. DbW: automatic update of a functional family-specific multiple alignment. Bioinformatics 2004;21:1437-42. [PMID: 15598832 DOI: 10.1093/bioinformatics/bti218] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]

168

Hidden localization motifs: naturally occurring peroxisomal targeting signals in non-peroxisomal proteins. Genome Biol 2004;5:R97. [PMID: 15575971 PMCID: PMC545800 DOI: 10.1186/gb-2004-5-12-r97] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.4] [Received: 05/25/2004] [Revised: 10/11/2004] [Accepted: 11/09/2004] [Indexed: 11/13/2022]

169

Yu X, Lin J, Shi T, Li Y. A novel domain-based method for predicting the functional classes of proteins. ACTA ACUST UNITED AC 2004. [DOI: 10.1007/bf03183426] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]

170

Galperin MY, Koonin EV. 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res 2004;32:5452-63. [PMID: 15479782 PMCID: PMC524295 DOI: 10.1093/nar/gkh885] [Citation(s) in RCA: 309] [Impact Index Per Article: 14.7] [Indexed: 11/14/2022]

171

Sun YV, Boverhof DR, Burgoon LD, Fielden MR, Zacharewski TR. Comparative analysis of dioxin response elements in human, mouse and rat genomic sequences. Nucleic Acids Res 2004;32:4512-23. [PMID: 15328365 PMCID: PMC516056 DOI: 10.1093/nar/gkh782] [Citation(s) in RCA: 167] [Impact Index Per Article: 8.0] [Indexed: 11/13/2022]

172

The planetary biology of cytochrome P450 aromatases. BMC Biol 2004;2:19. [PMID: 15315709 PMCID: PMC515309 DOI: 10.1186/1741-7007-2-19] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Received: 02/12/2004] [Accepted: 08/17/2004] [Indexed: 11/24/2022]

Abstract

Background

Joining a model for the molecular evolution of a protein family to the paleontological and geological records (geobiology), and then to the chemical structures of substrates, products, and protein folds, is emerging as a broad strategy for generating hypotheses concerning function in a post-genomic world. This strategy expands systems biology to a planetary context, necessary for a notion of fitness to underlie (as it must) any discussion of function within a biomolecular system.

Results

Here, we report an example of such an expansion, where tools from planetary biology were used to analyze three genes from the pig Sus scrofa that encode cytochrome P450 aromatases, enzymes that convert androgens into estrogens. The evolutionary history of the vertebrate aromatase gene family was reconstructed. Transition redundant exchange silent substitution metrics were used to interpolate dates for the divergence of family members, the paleontological record was consulted to identify changes in physiology that correlated in time with the change in molecular behavior, and new aromatase sequences from peccary were obtained. Metrics that detect changing function in proteins were then applied, including K_A/K_S values and those that exploit structural biology. These identified specific amino acid replacements that were associated with changing substrate and product specificity during the time of presumed adaptive change. The combined analysis suggests that aromatase paralogs arose in pigs as a result of selection for Suoidea with larger litters than their ancestors, and permitted the Suoidea to survive the global climatic trauma that began in the Eocene.

Conclusions

This combination of bioinformatics analysis, molecular evolution, paleontology, cladistics, global climatology, structural biology, and organic chemistry serves as a paradigm in planetary biology. As the geological, paleontological, and genomic records improve, this approach should become widely useful to make systems biology statements about high-level function for biomolecular systems.

173

Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JDJ, Bertin N, Chung S, Vidal M, Gerstein M. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res 2004;14:1107-18. [PMID: 15173116 PMCID: PMC419789 DOI: 10.1101/gr.1774904] [Citation(s) in RCA: 402] [Impact Index Per Article: 19.1] [Indexed: 11/24/2022]

174

Veeramachaneni V, Makałowski W. Visualizing sequence similarity of protein families. Genome Res 2004;14:1160-9. [PMID: 15140831 PMCID: PMC419794 DOI: 10.1101/gr.2079204] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]

175

Lau AY, Chasman DI. Functional classification of proteins and protein variants. Proc Natl Acad Sci U S A 2004;101:6576-81. [PMID: 15087495 PMCID: PMC404087 DOI: 10.1073/pnas.0305043101] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]

176

Cai CZ, Han LY, Ji ZL, Chen YZ. Enzyme family classification by support vector machines. Proteins 2004;55:66-76. [PMID: 14997540 DOI: 10.1002/prot.20045] [Citation(s) in RCA: 102] [Impact Index Per Article: 4.9] [Indexed: 11/06/2022]

177

Jim K, Parmar K, Singh M, Tavazoie S. A cross-genomic approach for systematic mapping of phenotypic traits to genes. Genome Res 2004;14:109-15. [PMID: 14707173 PMCID: PMC314287 DOI: 10.1101/gr.1586704] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]

178

von Mering C, Zdobnov EM, Tsoka S, Ciccarelli FD, Pereira-Leal JB, Ouzounis CA, Bork P. Genome evolution reveals biochemical networks and functional modules. Proc Natl Acad Sci U S A 2003;100:15428-33. [PMID: 14673105 PMCID: PMC307584 DOI: 10.1073/pnas.2136809100] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 12/25/2022]

179

Tian W, Skolnick J. How well is enzyme function conserved as a function of pairwise sequence identity? J Mol Biol 2003;333:863-82. [PMID: 14568541 DOI: 10.1016/j.jmb.2003.08.057] [Citation(s) in RCA: 281] [Impact Index Per Article: 12.8] [Indexed: 12/31/2022]

Abstract

Enzyme function conservation has been used to derive the threshold of sequence identity necessary to transfer function from a protein of known function to an unknown protein. Using pairwise sequence comparison, several studies suggested that when the sequence identity is above 40%, enzyme function is well conserved. In contrast, Rost argued that because of database bias, the results from such simple pairwise comparisons might be misleading. Thus, by grouping enzyme sequences into families based on sequence similarity and selecting representative sequences for comparison, he showed that enzyme function starts to diverge quickly when the sequence identity is below 70%. Here, we employ a strategy similar to Rost's to reduce the database bias; however, we classify enzyme families based not only on sequence similarity, but also on functional similarity, i.e. sequences in each family must have the same four digits or the same first three digits of the enzyme commission (EC) number. Furthermore, instead of selecting representative sequences for comparison, we calculate the function conservation of each enzyme family and then average the degree of enzyme function conservation across all enzyme families. Our analysis suggests that for functional transferability, 40% sequence identity can still be used as a confident threshold to transfer the first three digits of an EC number; however, to transfer all four digits of an EC number, above 60% sequence identity is needed to have at least 90% accuracy. Moreover, when PSI-BLAST is used, the magnitude of the E-value is found to be weakly correlated with the extent of enzyme function conservation in the third iteration of PSI-BLAST. As a result, functional annotation based on the E-values from PSI-BLAST should be used with caution. 
We also show that by employing an enzyme family-specific sequence identity threshold above which 100% functional conservation is required, functional inference of unknown sequences can be accurately accomplished. However, this comes at a cost: those true positive sequences below this threshold cannot be uniquely identified.
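
The identity cutoffs reported in this abstract can be turned into a simple transfer rule, sketched below under stated assumptions: identity is a pairwise percentage, "above 60%" is read as strictly greater than 60, the 40% cutoff is read as inclusive, and an untransferred fourth level is written with the EC placeholder "-". The function name is hypothetical, and the sketch does not implement the enzyme family-specific thresholds the authors also describe.

```python
from typing import Optional


def transferable_ec(ec_number: str, identity_pct: float) -> Optional[str]:
    """Return the part of a four-level EC number that may be transferred.

    Cutoffs follow the abstract: >60% identity for all four digits
    (reported as at least 90% accuracy), >=40% for the first three;
    below that, nothing is transferred.
    """
    levels = ec_number.split(".")
    if len(levels) != 4:
        raise ValueError(f"expected a four-level EC number, got {ec_number!r}")
    if identity_pct > 60.0:
        return ec_number
    if identity_pct >= 40.0:
        return ".".join(levels[:3] + ["-"])
    return None


for identity in (75.0, 50.0, 30.0):
    print(identity, transferable_ec("1.1.1.1", identity))
# 75.0 1.1.1.1
# 50.0 1.1.1.-
# 30.0 None
```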

180

Nair R, Rost B. Better prediction of sub-cellular localization by combining evolutionary and structural information. Proteins 2003;53:917-30. [PMID: 14635133 DOI: 10.1002/prot.10507] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Indexed: 11/09/2022]

181

Abascal F, Valencia A. Automatic annotation of protein function based on family identification. Proteins 2003;53:683-92. [PMID: 14579359 DOI: 10.1002/prot.10449] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]

Abstract

Although genomes are being sequenced at an impressive rate, the information generated tells us little about protein function, which is slow to characterize by traditional methods. Automatic protein function annotation based on computational methods has alleviated this imbalance. The most powerful current approach for inferring the function of new proteins is to study the annotations of their homologues, since their common origin is assumed to be reflected in their structure and function. Unfortunately, as proteins evolve they acquire new functions, so annotation based on homology must be carried out in the context of orthologues or subfamilies. Evolution adds new complications through domain shuffling: homology (or orthology) frequently corresponds to domains rather than complete proteins. Moreover, the function of a protein may be seen as the result of combining the functions of its domains. Additionally, automatic annotation has to deal with problems related to the annotations in the databases: errors (which are likely to be propagated), inconsistencies, or different degrees of function specification. We describe a method that addresses these difficulties for the annotation of protein function. Sequence relationships are detected and measured to obtain a map of the sequence space, which is searched for differentiated groups of proteins (similar to islands on the map), which are expected to have a common function and correspond to groups of orthologues or subfamilies. This mapmaking is done by applying a clustering algorithm based on Normalized cuts in graphs. The domain problem is addressed in a simple way: pairwise local alignments are analyzed to determine the extent to which they cover the entire sequence lengths of the two proteins. This analysis determines both which homologues are preferred for functional inheritance and the level of confidence of the annotation.
To alleviate the problems associated with database annotations, the information on all the homologues that are grouped together with the query protein is taken into account to select the most representative functional descriptors. This method has been applied to the annotation of the genome of Buchnera aphidicola (specific host Baizongia pistaciae). Human inspection of the annotations allowed an estimation of accuracy of 94%; the different kinds of error that may appear when using this approach are described. Results can be accessed at http://www.pdg.cnb.uam.es/funcut.html. The programs are available upon request, although installation in other systems may be complicated.
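
The domain check described in this abstract, which asks how much of both full-length sequences a pairwise local alignment spans, can be sketched as follows. The record type, the 1-based inclusive coordinates, the 0.8 cutoff and the use of the smaller of the two coverages are illustrative assumptions, not the parameters of the published method.

```python
from dataclasses import dataclass


@dataclass
class LocalAlignment:
    """Hypothetical record of one pairwise local alignment (1-based, inclusive)."""
    query_start: int
    query_end: int
    query_len: int
    subject_start: int
    subject_end: int
    subject_len: int


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence of the given length spanned by [start, end]."""
    return (end - start + 1) / length


def covers_both(aln: LocalAlignment, min_cov: float = 0.8) -> bool:
    """True if the alignment spans at least min_cov of BOTH sequences,
    suggesting whole-protein rather than single-domain homology."""
    q = coverage(aln.query_start, aln.query_end, aln.query_len)
    s = coverage(aln.subject_start, aln.subject_end, aln.subject_len)
    return min(q, s) >= min_cov


full = LocalAlignment(1, 290, 300, 5, 300, 310)     # spans most of both proteins
partial = LocalAlignment(1, 120, 300, 1, 120, 130)  # a domain-sized match
print(covers_both(full), covers_both(partial))  # True False
```

Taking the minimum of the two coverages means a short subject aligned along its whole length still fails when it spans only part of a longer query, which is the domain-shuffling case the abstract warns about.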

182

Kaplan N, Vaaknin A, Linial M. PANDORA: keyword-based analysis of protein sets by integration of annotation sources. Nucleic Acids Res 2003;31:5617-26. [PMID: 14500825 PMCID: PMC206469 DOI: 10.1093/nar/gkg769] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]

183

Cai CZ, Han LY, Ji ZL, Chen X, Chen YZ. SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res 2003;31:3692-7. [PMID: 12824396 PMCID: PMC169006 DOI: 10.1093/nar/gkg600] [Citation(s) in RCA: 366] [Impact Index Per Article: 16.6] [Indexed: 11/13/2022]

184

Nair R, Rost B. LOC3D: annotate sub-cellular localization for protein structures. Nucleic Acids Res 2003;31:3337-40. [PMID: 12824321 PMCID: PMC168921 DOI: 10.1093/nar/gkg514] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]

185

Eisenhaber F, Eisenhaber B, Kubina W, Maurer-Stroh S, Neuberger G, Schneider G, Wildpaner M. Prediction of lipid posttranslational modifications and localization signals from protein sequences: big-Pi, NMT and PTS1. Nucleic Acids Res 2003;31:3631-4. [PMID: 12824382 PMCID: PMC168944 DOI: 10.1093/nar/gkg537] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022]

186

Jackson DB, Minch E, Munro RE. Bioinformatics. EXS 2003:31-69. [PMID: 12613171 DOI: 10.1007/978-3-0348-7997-2_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]

187

Huynen MA, Snel B, von Mering C, Bork P. Function prediction and protein networks. Curr Opin Cell Biol 2003;15:191-8. [PMID: 12648675 DOI: 10.1016/s0955-0674(03)00009-7] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.9] [Indexed: 12/21/2022]

188

Brunk CF, Lee LC, Tran AB, Li J. Complete sequence of the mitochondrial genome of Tetrahymena thermophila and comparative methods for identifying highly divergent genes. Nucleic Acids Res 2003;31:1673-82. [PMID: 12626709 PMCID: PMC152872 DOI: 10.1093/nar/gkg270] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.5] [Received: 11/24/2002] [Revised: 01/16/2003] [Accepted: 01/16/2003] [Indexed: 11/13/2022]

189

Boneca IG, de Reuse H, Epinat JC, Pupin M, Labigne A, Moszer I. A revised annotation and comparative analysis of Helicobacter pylori genomes. Nucleic Acids Res 2003;31:1704-14. [PMID: 12626712 PMCID: PMC152854 DOI: 10.1093/nar/gkg250] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]

190

van Belkum A. High-throughput epidemiologic typing in clinical microbiology. Clin Microbiol Infect 2003;9:86-100. [PMID: 12588328 DOI: 10.1046/j.1469-0691.2003.00549.x] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]

191

Chen YZ, Ung CY. Computer automated prediction of potential therapeutic and toxicity protein targets of bioactive compounds from Chinese medicinal plants. Am J Chin Med 2002;30:139-54. [PMID: 12067089 DOI: 10.1142/s0192415x02000156] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]

192

Kell DB. Genotype-phenotype mapping: genes as computer programs. Trends Genet 2002;18:555-9. [PMID: 12414184 DOI: 10.1016/s0168-9525(02)02765-8] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]

193

Rigoutsos I, Huynh T, Floratos A, Parida L, Platt D. Dictionary-driven protein annotation. Nucleic Acids Res 2002;30:3901-16. [PMID: 12202776 PMCID: PMC137405 DOI: 10.1093/nar/gkf464] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Received: 01/17/2002] [Revised: 06/04/2002] [Accepted: 06/04/2002] [Indexed: 11/14/2022]

Abstract

Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes.
Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.


194

Van Regenmortel MHV. Reductionism and the search for structure-function relationships in antibody molecules. J Mol Recognit 2002;15:240-7. [PMID: 12447900 DOI: 10.1002/jmr.584] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]

195

Wu LF, Hughes TR, Davierwala AP, Robinson MD, Stoughton R, Altschuler SJ. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat Genet 2002;31:255-65. [PMID: 12089522 DOI: 10.1038/ng906] [Citation(s) in RCA: 221] [Impact Index Per Article: 9.6] [Indexed: 11/08/2022]

196

Hoover DM, Lubkowski J. DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res 2002;30:e43. [PMID: 12000848 PMCID: PMC115297 DOI: 10.1093/nar/30.10.e43] [Citation(s) in RCA: 394] [Impact Index Per Article: 17.1] [Indexed: 11/12/2022]

197

Chen YZ, Ung CY. Prediction of potential toxicity and side effect protein targets of a small molecule by a ligand-protein inverse docking approach. J Mol Graph Model 2002;20:199-218. [PMID: 11766046 DOI: 10.1016/s1093-3263(01)00109-7] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.9] [Indexed: 11/26/2022]

198

Caron M, Imam-Sghiouar N, Poirier F, Le Caër JP, Labas V, Joubert-Caron R. Proteomic map and database of lymphoblastoid proteins. J Chromatogr B Analyt Technol Biomed Life Sci 2002;771:197-209. [PMID: 12015999 DOI: 10.1016/s1570-0232(02)00040-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]

199

Yano N, Fadden-Paiva KJ, Endoh M, Sakai H, Kurokawa K, Dworkin LD, Rifai A. Profiling the IgA nephropathy renal transcriptome: analysis by complementary DNA array hybridization. Nephrology (Carlton) 2002. [DOI: 10.1111/j.1440-1797.2002.tb00524.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/29/2022]

200

Snel B, Bork P, Huynen MA. The identification of functional modules from the genomic association of genes. Proc Natl Acad Sci U S A 2002;99:5890-5. [PMID: 11983890 PMCID: PMC122872 DOI: 10.1073/pnas.092632599] [Citation(s) in RCA: 195] [Impact Index Per Article: 8.5] [Received: 11/28/2001] [Indexed: 11/18/2022]