Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y. Predicting function: from genes to genomes and back. J Mol Biol 1998;283:707-25. [PMID: 9790834 DOI: 10.1006/jmbi.1998.2144] [Citation(s) in RCA: 262] [Impact Index Per Article: 9.7] [Indexed: 11/22/2022]

Number

Cited by Other Article(s)

101

Kugler JE, Kerner P, Bouquet JM, Jiang D, Di Gregorio A. Evolutionary changes in the notochord genetic toolkit: a comparative analysis of notochord genes in the ascidian Ciona and the larvacean Oikopleura. BMC Evol Biol 2011;11:21. [PMID: 21251251 PMCID: PMC3034685 DOI: 10.1186/1471-2148-11-21] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 08/16/2010] [Accepted: 01/20/2011] [Indexed: 11/12/2022]

Abstract

Background

The notochord is a defining feature of the chordate clade, and invertebrate chordates, such as tunicates, are uniquely suited for studies of this structure. Here we used a well-characterized set of 50 notochord genes known to be targets of the notochord-specific Brachyury transcription factor in one tunicate, Ciona intestinalis (Class Ascidiacea), to begin determining whether the same genetic toolkit is employed to build the notochord in another tunicate, Oikopleura dioica (Class Larvacea). We identified Oikopleura orthologs of the Ciona notochord genes, as well as lineage-specific duplicates for which we determined the phylogenetic relationships with related genes from other chordates, and we analyzed their expression patterns in Oikopleura embryos.

Results

Of the 50 Ciona notochord genes that were used as a reference, only 26 had clearly identifiable orthologs in Oikopleura. Two of these conserved genes appeared to have undergone Oikopleura- and/or tunicate-specific duplications, and one was present in three copies in Oikopleura, thus bringing the number of genes to test to 30. We were able to clone and test 28 of these genes. Thirteen of the 28 Oikopleura orthologs of Ciona notochord genes showed clear expression in all or in part of the Oikopleura notochord, seven were diffusely expressed throughout the tail, six were expressed in tissues other than the notochord, while two probes did not provide a detectable signal at any of the stages analyzed. One of the notochord genes identified, Oikopleura netrin, was found to be unevenly expressed in notochord cells, in a pattern reminiscent of that previously observed for one of the Oikopleura Hox genes.

Conclusions

A surprisingly high number of Ciona notochord genes do not have apparent counterparts in Oikopleura, and only a fraction of the evolutionarily conserved genes show clear notochord expression. This suggests that Ciona and Oikopleura, despite the morphological similarities of their notochords, have developed rather divergent sets of notochord genes after their split from a common tunicate ancestor. This study demonstrates that comparisons between divergent tunicates can lead to insights into the basic complement of genes sufficient for notochord development, and elucidate the constraints that control its composition.

102

Hu L, Huang T, Shi X, Lu WC, Cai YD, Chou KC. Predicting functions of proteins in mouse based on weighted protein-protein interaction network and protein hybrid properties. PLoS One 2011;6:e14556. [PMID: 21283518 PMCID: PMC3023709 DOI: 10.1371/journal.pone.0014556] [Citation(s) in RCA: 130] [Impact Index Per Article: 9.3] [Received: 06/20/2010] [Accepted: 12/21/2010] [Indexed: 11/27/2022]

103

Plett D, Toubia J, Garnett T, Tester M, Kaiser BN, Baumann U. Dichotomy in the NRT gene families of dicots and grass species. PLoS One 2010;5:e15289. [PMID: 21151904 PMCID: PMC2997785 DOI: 10.1371/journal.pone.0015289] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 09/07/2010] [Accepted: 11/04/2010] [Indexed: 11/19/2022]

104

Cloning, characterization, and expression analysis of Toll-like receptor-7 cDNA from common carp, Cyprinus carpio L. Comp Biochem Physiol Part D Genomics Proteomics 2010;5:245-55. [DOI: 10.1016/j.cbd.2010.07.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 05/12/2010] [Revised: 07/05/2010] [Accepted: 07/13/2010] [Indexed: 01/02/2023]

105

Schröder A, Eichner J, Supper J, Eichner J, Wanke D, Henneges C, Zell A. Predicting DNA-binding specificities of eukaryotic transcription factors. PLoS One 2010;5:e13876. [PMID: 21152420 PMCID: PMC2994704 DOI: 10.1371/journal.pone.0013876] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 06/11/2010] [Accepted: 10/14/2010] [Indexed: 11/18/2022]

106

Horst JA, Samudrala R. A protein sequence meta-functional signature for calcium binding residue prediction. Pattern Recognit Lett 2010;31:2103-2112. [PMID: 20824111 PMCID: PMC2932634 DOI: 10.1016/j.patrec.2010.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]

107

de Almeida JMGCF. BiDiBlast: comparative genomics pipeline for the PC. Genomics Proteomics Bioinformatics 2010;8:135-8. [PMID: 20691399 PMCID: PMC5054440 DOI: 10.1016/s1672-0229(10)60015-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]

108

MacPherson JI, Dickerson JE, Pinney JW, Robertson DL. Patterns of HIV-1 protein interaction identify perturbed host-cellular subsystems. PLoS Comput Biol 2010;6:e1000863. [PMID: 20686668 PMCID: PMC2912648 DOI: 10.1371/journal.pcbi.1000863] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 03/06/2010] [Accepted: 06/21/2010] [Indexed: 01/12/2023]

109

Wong WC, Maurer-Stroh S, Eisenhaber F. More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology. PLoS Comput Biol 2010;6:e1000867. [PMID: 20686689 PMCID: PMC2912341 DOI: 10.1371/journal.pcbi.1000867] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 03/11/2010] [Accepted: 06/25/2010] [Indexed: 12/16/2022]

Abstract

Large-scale genome sequencing gained general importance for life science because functional annotation of otherwise experimentally uncharacterized sequences is made possible by the theory of biomolecular sequence homology. Historically, the paradigm of similarity of protein sequences implying common structure, function and ancestry was generalized based on studies of globular domains. Having the same fold imposes strict conditions over the packing in the hydrophobic core requiring similarity of hydrophobic patterns. The implications of sequence similarity among non-globular protein segments have not been studied to the same extent; nevertheless, homology considerations are silently extended for them. This appears especially detrimental in the case of transmembrane helices (TMs) and signal peptides (SPs) where sequence similarity is necessarily a consequence of physical requirements rather than common ancestry. Thus, matching of SPs/TMs creates the illusion of matching hydrophobic cores. Therefore, inclusion of SPs/TMs into domain models can give rise to wrong annotations. More than 1001 domains among the 10,340 models of Pfam release 23 and 18 domains of SMART version 6 (out of 809) contain SP/TM regions. As expected, fragment-mode HMM searches generate promiscuous hits limited to solely the SP/TM part among clearly unrelated proteins. More worryingly, we show explicit examples that the scores of clearly false-positive hits, even in global-mode searches, can be elevated into the significance range just by matching the hydrophobic runs. In the PIR iProClass database v3.74 using conservative criteria, we find that at least between 2.1% and 13.6% of its annotated Pfam hits appear unjustified for a set of validated domain models. Thus, false-positive domain hits enforced by SP/TM regions can lead to dramatic annotation errors where the hit has nothing in common with the problematic domain model except the SP/TM region itself. 
We suggest a workflow of flagging problematic hits arising from SP/TM-containing models for critical reconsideration by annotation users.

Sequence homology is a fundamental principle of biology. It implies common phylogenetic ancestry of genes and, subsequently, similarity of their protein products with regard to amino acid sequence, three-dimensional structure and molecular and cellular function. Originally an esoteric concept, homology with the proxy of sequence similarity is used to justify the transfer of functional annotation from well-studied protein examples to new sequences. Yet, functional annotation via sequence similarity seems to have hit a plateau in recent years since relentless annotation transfer led to error propagation across sequence databases; thus, leading experimental follow-up work astray. It must be emphasized that the trinity of sequence, 3D structural and functional similarity has only been proven for globular segments of proteins. For non-globular regions, similarity of sequence is not necessarily a result of divergent evolution from a common ancestor but the consequence of amino acid sequence bias. In our investigation, we found that protein domain databases contain many domain models with transmembrane regions and signal peptides, non-globular segments of proteins having hydrophobic bias. Many proteins have inherited completely wrong function assignments from these domain models. We fear that future function predictions will turn out futile if this issue is not immediately addressed.

110

Rawat A, Gust KA, Deng Y, Garcia-Reyero N, Quinn MJ, Johnson MS, Indest KJ, Elasri MO, Perkins EJ. From raw materials to validated system: the construction of a genomic library and microarray to interpret systemic perturbations in Northern bobwhite. Physiol Genomics 2010;42:219-35. [PMID: 20406850 PMCID: PMC3032282 DOI: 10.1152/physiolgenomics.00022.2010] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 01/28/2010] [Accepted: 04/16/2010] [Indexed: 01/02/2023]

111

Comparative transcriptome and secretome analysis of wood decay fungi Postia placenta and Phanerochaete chrysosporium. Appl Environ Microbiol 2010;76:3599-610. [PMID: 20400566 DOI: 10.1128/aem.00058-10] [Citation(s) in RCA: 173] [Impact Index Per Article: 11.5] [Indexed: 11/20/2022]

112

Heo HS, Oh SJ, Kim JM, Kim HS, Chung HY. TREP_DB: transcriptional regulatory elements pattern database. Biochem Biophys Res Commun 2010;394:309-316. [PMID: 20206134 DOI: 10.1016/j.bbrc.2010.02.169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2010] [Accepted: 02/26/2010] [Indexed: 05/28/2023]

113

Tang ZQ, Lin HH, Zhang HL, Han LY, Chen X, Chen YZ. Prediction of functional class of proteins and peptides irrespective of sequence homology by support vector machines. Bioinform Biol Insights 2009;1:19-47. [PMID: 20066123 PMCID: PMC2789692 DOI: 10.4137/bbi.s315] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]

114

Stubben CJ, Duffield ML, Cooper IA, Ford DC, Gans JD, Karlyshev AV, Lingard B, Oyston PCF, de Rochefort A, Song J, Wren BW, Titball RW, Wolinsky M. Steps toward broad-spectrum therapeutics: discovering virulence-associated genes present in diverse human pathogens. BMC Genomics 2009;10:501. [PMID: 19874620 PMCID: PMC2774872 DOI: 10.1186/1471-2164-10-501] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 04/25/2009] [Accepted: 10/29/2009] [Indexed: 11/10/2022]

115

Herbert JMJ, Buffa FM, Vorschmitt H, Egginton S, Bicknell R. A new procedure for determining the genetic basis of a physiological process in a non-model species, illustrated by cold induced angiogenesis in the carp. BMC Genomics 2009;10:490. [PMID: 19852815 PMCID: PMC2771047 DOI: 10.1186/1471-2164-10-490] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 05/05/2009] [Accepted: 10/23/2009] [Indexed: 12/26/2022]

Abstract

BACKGROUND

Physiological processes occur in many species for which there is as yet no sequenced genome and for which we would like to identify the genetic basis. For example, some species increase their vascular network to minimise the effects of reduced oxygen diffusion and increased blood viscosity associated with low temperatures. Since many angiogenic and endothelial genes have been discovered in man, functional homolog relationships between carp, zebrafish and human were used to predict the genetic basis of cold-induced angiogenesis in Cyprinus carpio (carp). In this work, carp sequences were collected and built into contigs. Human-carp functional homolog relationships were derived via zebrafish using a new Conditional Stepped Reciprocal Best Hit (CSRBH) protocol. Data sources including publications, Gene Ontology and cDNA libraries were then used to predict the identity of known or potential angiogenic genes. Finally, re-analyses of cold carp microarray data identified carp genes up-regulated in response to low temperatures in heart and muscle.

RESULTS

The CSRBH approach outperformed all other methods and attained 8,726 carp to human functional homolog relationships for 16,650 contiguous sequences. This represented 3,762 non-redundant genes and 908 of them were predicted to have a role in angiogenesis. The total number of up-regulated differentially expressed genes was 698 and 171 of them were putatively angiogenic. Of these, 5 genes representing the functional homologs NCL, RHOA, MMP9, GRN and MAPK1 are angiogenesis-related genes expressed in response to low temperature.

CONCLUSION

We show that CSRBH functional homolog relationships and re-analyses of gene expression data can be combined in a non-model species to predict genes of biological interest before a genome sequence is fully available. Programs to run these analyses locally are available from http://www.cbrg.ox.ac.uk/~jherbert/.
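
The abstract above does not spell out the CSRBH protocol, so the following is only a minimal sketch of the general idea it builds on: plain reciprocal best hits, chained through an intermediate species (carp to zebrafish, then zebrafish to human). The "conditional" part of the published protocol is not modelled. All function names, sequence identifiers and scores are hypothetical and chosen for illustration.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> similarity score (higher is better).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if backward.get(b) == a}


def stepped_rbh(rbh_ab, rbh_bc):
    """Chain two reciprocal-best-hit maps through the intermediate species B."""
    return {a: rbh_bc[b] for a, b in rbh_ab.items() if b in rbh_bc}


# Hypothetical similarity scores (e.g. alignment bit scores), illustration only.
carp_vs_zf = {("carp1", "zf1"): 200, ("carp1", "zf2"): 50, ("carp2", "zf2"): 120}
zf_vs_carp = {("zf1", "carp1"): 190, ("zf2", "carp1"): 60, ("zf2", "carp2"): 110}
zf_vs_hs = {("zf1", "hsA"): 300, ("zf2", "hsB"): 80, ("zf2", "hsA"): 90}
hs_vs_zf = {("hsA", "zf1"): 310, ("hsB", "zf2"): 85, ("hsA", "zf2"): 95}

carp_to_zf = reciprocal_best_hits(carp_vs_zf, zf_vs_carp)
zf_to_hs = reciprocal_best_hits(zf_vs_hs, hs_vs_zf)
carp_to_hs = stepped_rbh(carp_to_zf, zf_to_hs)
print(carp_to_hs)
```

In this toy data, carp2 pairs reciprocally with zf2, but zf2 has no reciprocal human partner, so only carp1 receives a human functional homolog. Requiring reciprocity at each step trades coverage for fewer spurious links.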

116

Bergholdt R, Brorsson C, Lage K, Nielsen JH, Brunak S, Pociot F. Expression profiling of human genetic and protein interaction networks in type 1 diabetes. PLoS One 2009;4:e6250. [PMID: 19609442 PMCID: PMC2707614 DOI: 10.1371/journal.pone.0006250] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 03/24/2009] [Accepted: 06/17/2009] [Indexed: 01/07/2023]

117

Janky R, Helden JV, Babu MM. Investigating transcriptional regulation: from analysis of complex networks to discovery of cis-regulatory elements. Methods 2009;48:277-86. [PMID: 19450688 DOI: 10.1016/j.ymeth.2009.04.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/06/2009] [Revised: 04/17/2009] [Accepted: 04/18/2009] [Indexed: 10/20/2022]

118

Sam LT, Mendonça EA, Li J, Blake J, Friedman C, Lussier YA. PhenoGO: an integrated resource for the multiscale mining of clinical and biological data. BMC Bioinformatics 2009;10 Suppl 2:S8. [PMID: 19208196 PMCID: PMC2646241 DOI: 10.1186/1471-2105-10-s2-s8] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]

119

Annotating proteins with generalized functional linkages. Proc Natl Acad Sci U S A 2008;105:17700-5. [PMID: 19004787 DOI: 10.1073/pnas.0809583105] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 01/25/2023]

120

Eisenhaber F. Introduction to Bioinformatics. By Arthur M. Lesk. Biotechnol J 2008. [DOI: 10.1002/biot.200800277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]

121

Koonin EV, Wolf YI. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res 2008;36:6688-719. [PMID: 18948295 PMCID: PMC2588523 DOI: 10.1093/nar/gkn668] [Citation(s) in RCA: 480] [Impact Index Per Article: 28.2] [Indexed: 01/04/2023]

122

Discovering functional novelty in metagenomes: examples from light-mediated processes. J Bacteriol 2008;191:32-41. [PMID: 18849420 DOI: 10.1128/jb.01084-08] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Indexed: 12/12/2022]

123

Moreno-Hagelsieb G. Inferring functional relationships from conservation of gene order. Methods Mol Biol 2008;453:181-99. [PMID: 18712303 DOI: 10.1007/978-1-60327-429-6_8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 04/22/2023]

124

Yano N, Fadden-Paiva KJ, Endoh M, Sakai H, Kurokawa K, Dworkin LD, Rifai A. Profiling the IgA nephropathy renal transcriptome: analysis by complementary DNA array hybridization. Nephrology (Carlton) 2008. [DOI: 10.1046/j.1440-1797.7.s3.10.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]

125

Eisenhaber F. From a heap of facts to predictive biological theory: the future of life sciences viewed through the prism of a bioinformatics textbook. Introduction to Bioinformatics, 3rd edition (2008). By Arthur M. Lesk. Oxford University Press. 482 pp. ISBN 978-0-19-920804-3. Bioessays 2008. [DOI: 10.1002/bies.20819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]

126

Bergholdt R, Størling ZM, Lage K, Karlberg EO, Olason PI, Aalund M, Nerup J, Brunak S, Workman CT, Pociot F. Integrative analysis for finding genes and networks involved in diabetes and other complex diseases. Genome Biol 2008;8:R253. [PMID: 18045462 PMCID: PMC2258178 DOI: 10.1186/gb-2007-8-11-r253] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 07/07/2007] [Revised: 10/31/2007] [Accepted: 11/28/2007] [Indexed: 01/17/2023]

127

Tran MK, Schultz CJ, Baumann U. Conserved upstream open reading frames in higher plants. BMC Genomics 2008;9:361. [PMID: 18667093 PMCID: PMC2527020 DOI: 10.1186/1471-2164-9-361] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Received: 04/08/2008] [Accepted: 07/31/2008] [Indexed: 11/10/2022]

128

Ahmad I, Hoessli DC, Qazi WM, Khurshid A, Mehmood A, Walker-Nasir E, Ahmad M, Shakoori AR, Nasir-ud-Din. MAPRes: An efficient method to analyze protein sequence around post-translational modification sites. J Cell Biochem 2008;104:1220-31. [DOI: 10.1002/jcb.21699] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]

129

Martinez-Guerrero CE, Ciria R, Abreu-Goodger C, Moreno-Hagelsieb G, Merino E. GeConT 2: gene context analysis for orthologous proteins, conserved domains and metabolic pathways. Nucleic Acids Res 2008;36:W176-80. [PMID: 18511460 PMCID: PMC2447741 DOI: 10.1093/nar/gkn330] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]

130

Khwaja TA, Wajahat T, Ahmad I, Hoessli DC, Walker-Nasir E, Kaleem A, Qazi WM, Shakoori AR, Din NU. In silico modulation of apoptotic Bcl-2 proteins by mistletoe lectin-1: functional consequences of protein modifications. J Cell Biochem 2008;103:479-91. [PMID: 17583555 DOI: 10.1002/jcb.21412] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]

131

Sridhar J, Rafi ZA. Functional annotations in bacterial genomes based on small RNA signatures. Bioinformation 2008;2:284-95. [PMID: 18478081 PMCID: PMC2374372 DOI: 10.6026/97320630002284] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 03/12/2008] [Accepted: 03/25/2008] [Indexed: 02/01/2023]

132

Gonzalez O, Zimmer R. Assigning functional linkages to proteins using phylogenetic profiles and continuous phenotypes. Bioinformatics 2008;24:1257-63. [PMID: 18381403 DOI: 10.1093/bioinformatics/btn106] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]

133

Linghu B, Snitkin ES, Holloway DT, Gustafson AM, Xia Y, DeLisi C. High-precision high-coverage functional inference from integrated data sources. BMC Bioinformatics 2008;9:119. [PMID: 18298847 PMCID: PMC2292694 DOI: 10.1186/1471-2105-9-119] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 06/15/2007] [Accepted: 02/25/2008] [Indexed: 11/15/2022]

Abstract

Background

Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques; and (2) development of a decision rule for the constructed FLN to optimize functional annotation.

Results

We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight estimates functional coupling of linked proteins more accurately than the use of individual data sources alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverages. In particular, at low coverage all rules evaluated perform comparably. At coverage above approximately 50%, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for other methods, precision ranges from a high of slightly more than 30% down to 3%. In addition, a scoring scheme to estimate the precisions of individual predictions is also provided. Finally, tests of the robustness of the framework indicate that it can be successfully applied to less studied organisms.

Conclusion

We provide a general two-step function-annotation framework, and show that high coverage, high precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration followed by applying a maximum weight decision rule.
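
As a rough illustration of what a maximum weight decision rule on a weighted network can look like, the sketch below annotates each uncharacterized protein with the functions of its single highest-weight annotated neighbor. This is an assumption about the general shape of such a rule, not the authors' implementation; the function name, protein identifiers, weights and function labels are all hypothetical.

```python
def annotate_by_max_weight(edges, known_functions):
    """Transfer functions from the highest-weight annotated neighbor.

    edges: iterable of (protein_a, protein_b, weight) for an undirected network.
    known_functions: dict mapping annotated proteins to a set of function labels.
    Returns a dict: unannotated protein -> (predicted functions, supporting weight).
    """
    best = {}  # unannotated protein -> (annotated neighbor, edge weight)
    for a, b, weight in edges:
        # Consider the edge in both directions because the network is undirected.
        for target, neighbor in ((a, b), (b, a)):
            if target in known_functions or neighbor not in known_functions:
                continue
            if target not in best or weight > best[target][1]:
                best[target] = (neighbor, weight)
    return {
        target: (set(known_functions[neighbor]), weight)
        for target, (neighbor, weight) in best.items()
    }


# Hypothetical toy network, illustration only.
edges = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.4),
    ("P4", "P3", 0.2),
    ("P4", "P1", 0.8),  # ignored for annotation: P1 has no known function
]
known_functions = {"P2": {"transport"}, "P3": {"kinase"}}

predictions = annotate_by_max_weight(edges, known_functions)
for protein, (functions, weight) in sorted(predictions.items()):
    print(protein, sorted(functions), weight)
```

Returning the supporting edge weight alongside each prediction leaves room for a confidence cut-off, in the spirit of the per-prediction precision estimate the abstract mentions, although the published scoring scheme itself is not reproduced here.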

134

Nair R, Rost B. Protein subcellular localization prediction using artificial intelligence technology. Methods Mol Biol 2008;484:435-63. [PMID: 18592195 DOI: 10.1007/978-1-59745-398-1_27] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 05/20/2023]

Abstract

Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. 
The accuracy of these methods is now on par with high-throughput methods for predicting localization, and they are beginning to play an important role in directing experimental research. In this chapter, we review some of the most important methods for the prediction of subcellular localization.

135

Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007;8:995-1005. [PMID: 18037900 DOI: 10.1038/nrm2281] [Citation(s) in RCA: 371] [Impact Index Per Article: 20.6] [Indexed: 11/09/2022]

136

Moreno-Hagelsieb G, Latimer K. Choosing BLAST options for better detection of orthologs as reciprocal best hits. Bioinformatics 2007;24:319-24. [PMID: 18042555 DOI: 10.1093/bioinformatics/btm585] [Citation(s) in RCA: 360] [Impact Index Per Article: 20.0] [Indexed: 11/12/2022]

137

Gotzek D, Ross KG. Genetic regulation of colony social organization in fire ants: an integrative overview. Q Rev Biol 2007;82:201-26. [PMID: 17937246 DOI: 10.1086/519965] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Indexed: 11/04/2022]

Abstract

Expression of colony social organization in fire ants appears to be under the control of a single Mendelian factor of large effect. Variation in colony queen number in Solenopsis invicta and its relatives is associated with allelic variation at the gene Gp-9, but not with variation at other unlinked genes; workers regulate queen identity and number on the basis of Gp-9 genotypic compatibility. Nongenetic factors, such as prior social experience, queen reproductive status, and local environment, have negligible effects on queen numbers, which illustrates the nearly complete penetrance of Gp-9. As predicted, queen number can be manipulated experimentally by altering worker Gp-9 genotype frequencies. The Gp-9 allele lineage associated with polygyny in South American fire ants has been retained across multiple speciation events, which may signal the action of balancing selection to maintain social polymorphism in these species. Moreover, positive selection is implicated in driving the molecular evolution of Gp-9 in association with the origin of polygyny. The identity of the product of Gp-9 as an odorant-binding protein suggests plausible scenarios for its direct involvement in the regulation of queen number via a role in chemical communication. While these and other lines of evidence show that Gp-9 represents a legitimate candidate gene of major effect, studies aimed at determining (i) the biochemical pathways in which GP-9 functions; (ii) the phenotypic effects of molecular variation at Gp-9 and other pathway genes; and (iii) the potential involvement of genes in linkage disequilibrium with Gp-9 are needed to elucidate the genetic architecture underlying social organization in fire ants. Information that reveals the links between molecular variation, individual phenotype, and colony-level behaviors, combined with behavioral models that incorporate details of the chemical communication involved in regulating queen number, will yield a novel integrated view of the evolutionary changes underlying a key social adaptation.

138

McLaughlin WA, Chen K, Hou T, Wang W. On the detection of functionally coherent groups of protein domains with an extension to protein annotation. BMC Bioinformatics 2007;8:390. [PMID: 17937820 PMCID: PMC2151957 DOI: 10.1186/1471-2105-8-390] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/21/2007] [Accepted: 10/16/2007] [Indexed: 01/31/2023] Open

Abstract

Background

Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual proteins or within specific regions in a translated genome. Further effort is needed to identify groups of domains that span across two or more proteins and are linked by a cooperative function. Such functional domain combinations can be useful for protein annotation.

Results

Using a new computational method, we have identified 114 groups of domains, referred to as domain assembly units (DASSEM units), in the proteome of budding yeast Saccharomyces cerevisiae. The units participate in many important cellular processes such as transcription regulation, translation initiation, and mRNA splicing. Within the units the domains were found to function in a cooperative manner; and each domain contributed to a different aspect of the unit's overall function. The member domains of DASSEM units were found to be significantly enriched among proteins contained in transcription modules, defined as genes sharing similar expression profiles and presumably similar functions. The observation further confirmed the functional coherence of DASSEM units. The functional linkages of units were found in both functionally characterized and uncharacterized proteins, which enabled the assessment of protein function based on domain composition.

Conclusion

A new computational method was developed to identify groups of domains that are linked by a common function in the proteome of Saccharomyces cerevisiae. These groups can either lie within individual proteins or span across different proteins. We propose that the functional linkages among the domains within the DASSEM units can be used as a non-homology based tool to annotate uncharacterized proteins.

139

Harrington ED, Singh AH, Doerks T, Letunic I, von Mering C, Jensen LJ, Raes J, Bork P. Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proc Natl Acad Sci U S A 2007;104:13913-8. [PMID: 17717083 PMCID: PMC1955820 DOI: 10.1073/pnas.0702636104] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022] Open

140

Ahmad I, Hoessli DC, Gupta R, Walker-Nasir E, Rafik SM, Choudhary MI, Shakoori AR. In silico determination of intracellular glycosylation and phosphorylation sites in human selectins: implications for biological function. J Cell Biochem 2007;100:1558-72. [PMID: 17230456 DOI: 10.1002/jcb.21156] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]

141

Quantitative assessment of relationship between sequence similarity and function similarity. BMC Genomics 2007;8:222. [PMID: 17620139 PMCID: PMC1949826 DOI: 10.1186/1471-2164-8-222] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Received: 07/05/2006] [Accepted: 07/09/2007] [Indexed: 11/16/2022] Open

142

Ahmad I, Hoessli DC, Walker-Nasir E, Choudhary MI, Rafik SM, Shakoori AR. Phosphorylation and glycosylation interplay: protein modifications at hydroxy amino acids and prediction of signaling functions of the human beta3 integrin family. J Cell Biochem 2007;99:706-18. [PMID: 16676352 DOI: 10.1002/jcb.20814] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]

143

Raes J, Harrington ED, Singh AH, Bork P. Protein function space: viewing the limits or limited by our view? Curr Opin Struct Biol 2007;17:362-9. [PMID: 17574832 DOI: 10.1016/j.sbi.2007.05.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 04/25/2007] [Revised: 04/25/2007] [Accepted: 05/31/2007] [Indexed: 12/13/2022]

144

Bryliński M, Prymula K, Jurkowski W, Kochańczyk M, Stawowczyk E, Konieczny L, Roterman I. Prediction of functional sites based on the fuzzy oil drop model. PLoS Comput Biol 2007;3:e94. [PMID: 17530916 PMCID: PMC1876487 DOI: 10.1371/journal.pcbi.0030094] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 08/14/2006] [Accepted: 04/11/2007] [Indexed: 11/19/2022] Open

145

Brylinski M, Kochanczyk M, Broniatowska E, Roterman I. Localization of ligand binding site in proteins identified in silico. J Mol Model 2007;13:665-75. [PMID: 17394030 DOI: 10.1007/s00894-007-0191-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 11/20/2006] [Accepted: 02/26/2007] [Indexed: 01/21/2023]

146

Bi R, Zhou Y, Lu F, Wang W. Predicting Gene Ontology functions based on support vector machines and statistical significance estimation. Neurocomputing 2007. [DOI: 10.1016/j.neucom.2006.10.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]

147

Duan ZH, Hughes B, Reichel L, Perez DM, Shi T. The relationship between protein sequences and their gene ontology functions. BMC Bioinformatics 2006;7 Suppl 4:S11. [PMID: 17217503 PMCID: PMC1780109 DOI: 10.1186/1471-2105-7-s4-s11] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open

148

Liu Y, Li J, Sam L, Goh CS, Gerstein M, Lussier YA. An integrative genomic approach to uncover molecular mechanisms of prokaryotic traits. PLoS Comput Biol 2006;2:e159. [PMID: 17112314 PMCID: PMC1636675 DOI: 10.1371/journal.pcbi.0020159] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 05/05/2006] [Accepted: 10/10/2006] [Indexed: 11/18/2022] Open

Abstract

With mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been put forward to analyze prokaryotic phenotypes, current computational technologies either lack high throughput capacity for genomic scale analysis, or are limited in their capability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes, and gene functions is prohibited. Here, we developed a high throughput computational approach, and demonstrated for the first time the feasibility of integrating large quantities of prokaryotic phenotypes along with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions, and cellular processes). Applying this method over 59 fully sequenced prokaryotic species, we identified the genetic basis and molecular mechanisms underlying the phenotypes in bacteria. We identified 3,711 significant correlations between 1,499 distinct Pfam and 63 phenotypes, with 2,650 correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%-42%; n = 50). We stratified the most significant 478 predictions and subjected 100 to manual evaluation, of which 60 were corroborated in the literature. We furthermore unveiled 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%-80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms spanning multiple biological scales and reused by related phenotypes (metaphenotypes). We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise in facilitating a computable systems biology approach to genomic and biomedical research.

149

Scheeff ED, Bourne PE. Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction. BMC Bioinformatics 2006;7:410. [PMID: 16970830 PMCID: PMC1622756 DOI: 10.1186/1471-2105-7-410] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 04/18/2006] [Accepted: 09/14/2006] [Indexed: 11/30/2022] Open

Abstract

Background

One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.

Results

We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.

Conclusion

When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.

150

Rossi A, Marti-Renom MA, Sali A. Localization of binding sites in protein structures by optimization of a composite scoring function. Protein Sci 2006;15:2366-80. [PMID: 16963645 PMCID: PMC2242385 DOI: 10.1110/ps.062247506] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]