Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T, Chothia C. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol 1998;284:1201-10. [PMID: 9837738 DOI: 10.1006/jmbi.1998.2221] [Citation(s) in RCA: 340] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]


Number

Cited by Other Article(s)

201

Madera M, Vogel C, Kummerfeld SK, Chothia C, Gough J. The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res 2004;32:D235-9. [PMID: 14681402 PMCID: PMC308851 DOI: 10.1093/nar/gkh117] [Citation(s) in RCA: 179] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

202

Qian B, Soyer OS, Neubig RR, Goldstein RA. Depicting a protein's two faces: GPCR classification by phylogenetic tree-based HMMs. FEBS Lett 2003;554:95-9. [PMID: 14596921 DOI: 10.1016/s0014-5793(03)01112-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

203

Sitbon E, Pietrokovski S. New types of conserved sequence domains in DNA-binding regions of homing endonucleases. Trends Biochem Sci 2003;28:473-7. [PMID: 13678957 DOI: 10.1016/s0968-0004(03)00170-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

204

Liao L, Noble WS. Combining Pairwise Sequence Similarity and Support Vector Machines for Detecting Remote Protein Evolutionary and Structural Relationships. J Comput Biol 2003;10:857-68. [PMID: 14980014 DOI: 10.1089/106652703322756113] [Citation(s) in RCA: 151] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

205

Marti‐Renom MA, Madhusudhan M, Eswar N, Pieper U, Shen M, Sali A, Fiser A, Mirkovic N, John B, Stuart A. Modeling Protein Structure from its Sequence. Curr Protoc Bioinformatics 2003. [DOI: 10.1002/0471250953.bi0501s03] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

206

Wlodawer A, Durell SR, Li M, Oyama H, Oda K, Dunn BM. A model of tripeptidyl-peptidase I (CLN2), a ubiquitous and highly conserved member of the sedolisin family of serine-carboxyl peptidases. BMC Struct Biol 2003;3:8. [PMID: 14609438 PMCID: PMC280685 DOI: 10.1186/1472-6807-3-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/23/2003] [Accepted: 11/11/2003] [Indexed: 11/10/2022]

207

Theobald DL, Cervantes RB, Lundblad V, Wuttke DS. Homology among telomeric end-protection proteins. Structure 2003;11:1049-50. [PMID: 12962623 DOI: 10.1016/s0969-2126(03)00183-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

208

Sandhya S, Kishore S, Sowdhamini R, Srinivasan N. Effective detection of remote homologues by searching in sequence dataset of a protein domain fold. FEBS Lett 2003;552:225-30. [PMID: 14527691 DOI: 10.1016/s0014-5793(03)00929-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

209

Grigoriev IV, Choi IG. Target selection for structural genomics: a single genome approach. OMICS 2003;6:349-62. [PMID: 12626094 DOI: 10.1089/153623102321112773] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

210

Qian B, Goldstein RA. Detecting distant homologs using phylogenetic tree-based HMMs. Proteins 2003;52:446-53. [PMID: 12866055 DOI: 10.1002/prot.10373] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

211

de Bono B, Trowsdale J. Exploring the immunogenome with bioinformatics. Semin Immunol 2003;15:233-8. [PMID: 14690048 DOI: 10.1016/s1044-5323(03)00049-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

212

Parenicová L, de Folter S, Kieffer M, Horner DS, Favalli C, Busscher J, Cook HE, Ingram RM, Kater MM, Davies B, Angenent GC, Colombo L. Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell 2003;15:1538-51. [PMID: 12837945 PMCID: PMC165399 DOI: 10.1105/tpc.011544] [Citation(s) in RCA: 605] [Impact Index Per Article: 27.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/26/2003] [Accepted: 04/21/2003] [Indexed: 05/18/2023]

213

Irving JA, Spithill TW, Pike RN, Whisstock JC, Smooker PM. The evolution of enzyme specificity in Fasciola spp. J Mol Evol 2003;57:1-15. [PMID: 12962301 DOI: 10.1007/s00239-002-2434-x] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

214

Lee D, Grant A, Buchan D, Orengo C. A structural perspective on genome evolution. Curr Opin Struct Biol 2003;13:359-69. [PMID: 12831888 DOI: 10.1016/s0959-440x(03)00079-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

215

Karchin R, Cline M, Mandel-Gutfreund Y, Karplus K. Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins 2003;51:504-14. [PMID: 12784210 DOI: 10.1002/prot.10369] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

216

Heger A, Holm L. Exhaustive enumeration of protein domain families. J Mol Biol 2003;328:749-67. [PMID: 12706730 DOI: 10.1016/s0022-2836(03)00269-9] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

217

Lolkema JS, Slotboom DJ. Classification of 29 families of secondary transport proteins into a single structural class using hydropathy profile analysis. J Mol Biol 2003;327:901-9. [PMID: 12662917 DOI: 10.1016/s0022-2836(03)00214-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

218

Van Walle I, Lasters I, Wyns L. Consistency matrices: quantified structure alignments for sets of related proteins. Proteins 2003;51:1-9. [PMID: 12596259 DOI: 10.1002/prot.10293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

219

Gurung R, Tan A, Ooms LM, McGrath MJ, Huysmans RD, Munday AD, Prescott M, Whisstock JC, Mitchell CA. Identification of a novel domain in two mammalian inositol-polyphosphate 5-phosphatases that mediates membrane ruffle localization. The inositol 5-phosphatase skip localizes to the endoplasmic reticulum and translocates to membrane ruffles following epidermal growth factor stimulation. J Biol Chem 2003;278:11376-85. [PMID: 12536145 DOI: 10.1074/jbc.m209991200] [Citation(s) in RCA: 81] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open

220

Gille C, Goede A, Schlöetelburg C, Preissner R, Kloetzel PM, Göbel UB, Frömmel C. A comprehensive view on proteasomal sequences: implications for the evolution of the proteasome. J Mol Biol 2003;326:1437-48. [PMID: 12595256 DOI: 10.1016/s0022-2836(02)01470-5] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Abstract

Proteasomes are large multimeric self-compartmentizing proteases, which play a crucial role in the clearance of misfolded proteins, breakdown of regulatory proteins, processing of proteins by specific partial proteolysis, cell cycle control as well as preparation of peptides for immune presentation. Two main types can be distinguished by their different tertiary structure: the 20S proteasome and the proteasome-like heat shock protein encoded by heat shock locus V, hslV. Usually, each biological kingdom is characterized by its specific type of proteasome. The 20S proteasomes occur in eukarya and archaea whereas hslV protease is prevalent in bacteria. To verify this rule we applied a genome-wide sequence search to identify proteasomal sequences in data of finished and yet unfinished genome projects. We found several exceptions to this paradigm: (1) Protista: in addition to the 20S proteasome, Leishmania, Trypanosoma and Plasmodium contained hslV, which may have been acquired from an alpha-proteobacterial progenitor of mitochondria. (2) Bacteria: for Magnetospirillum magnetotacticum and Enterococcus faecium we found that each contained two distinct hslVs due to gene duplication or horizontal transfer. Including unassembled data into the analyses we confirmed that a number of bacterial genomes do not contain any proteasomal sequence due to gene loss. (3) High G+C Gram-positives: we confirmed that high G+C Gram-positives possess 20S proteasomes rather than hslV proteases. The core of the 20S proteasome consists of two distinct main types of homologous monomers, alpha and beta, which differentiated into seven subtypes by further gene duplications. By looking at the genome of the intracellular pathogen Encephalitozoon cuniculi we were able to show that differentiation of beta-type subunits into different subtypes occurred earlier than that of alpha-subunits. 
Additionally, our search strategy had an important methodological consequence: a comprehensive sequence search for a particular protein should also include the raw sequence data when possible because proteins might be missed in the completed assembled genome. The structure-based multiple proteasomal alignment of 433 sequences from 143 organisms can be downloaded from the URL dagger and will be updated regularly.


221

Kanehisa M, Bork P. Bioinformatics in the post-sequence era. Nat Genet 2003;33 Suppl:305-10. [PMID: 12610540 DOI: 10.1038/ng1109] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

222

Swalla BM, Gumport RI, Gardner JF. Conservation of structure and function among tyrosine recombinases: homology-based modeling of the lambda integrase core-binding domain. Nucleic Acids Res 2003;31:805-18. [PMID: 12560475 PMCID: PMC149183 DOI: 10.1093/nar/gkg142] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

223

Edwards YJK, Cottage A. Bioinformatics methods to predict protein structure and function. A practical approach. Mol Biotechnol 2003;23:139-66. [PMID: 12632698 DOI: 10.1385/mb:23:2:139] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

224

Panchenko AR. Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res 2003;31:683-9. [PMID: 12527777 PMCID: PMC140518 DOI: 10.1093/nar/gkg154] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

225

Sen S. Statistical analysis of pair-wise compatibility of spatially nearest neighbor and adjacent residues in alpha-helix and beta-strands: application to a minimal model for secondary structure prediction. Biophys Chem 2003;103:35-49. [PMID: 12504253 DOI: 10.1016/s0301-4622(02)00230-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

226

Cazalis R, Aussenac T, Rhazi L, Marin A, Gibrat JF. Homology modeling and molecular dynamics simulations of the N-terminal domain of wheat high molecular weight glutenin subunit 10. Protein Sci 2003;12:34-43. [PMID: 12493826 PMCID: PMC2312395 DOI: 10.1110/ps.0229803] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

227

Fiser A, Sali A. Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol 2003;374:461-91. [PMID: 14696385 DOI: 10.1016/s0076-6879(03)74020-8] [Citation(s) in RCA: 1330] [Impact Index Per Article: 60.5] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]

228

Dickens NJ, Ponting CP. THoR: a tool for domain discovery and curation of multiple alignments. Genome Biol 2003;4:R52. [PMID: 12914660 PMCID: PMC193644 DOI: 10.1186/gb-2003-4-8-r52] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2003] [Revised: 06/17/2003] [Accepted: 06/25/2003] [Indexed: 11/21/2022] Open

229

Krebs WG, Tsai J, Alexandrov V, Junker J, Jansen R, Gerstein M. Tools and Databases to Analyze Protein Flexibility; Approaches to Mapping Implied Features onto Sequences. Methods Enzymol 2003;374:544-84. [PMID: 14696388 DOI: 10.1016/s0076-6879(03)74023-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023]

230

Marin A, Pothier J, Zimmermann K, Gibrat JF. FROST: a filter-based fold recognition method. Proteins 2002;49:493-509. [PMID: 12402359 DOI: 10.1002/prot.10231] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

231

Nair R, Rost B. Sequence conserved for subcellular localization. Protein Sci 2002;11:2836-47. [PMID: 12441382 PMCID: PMC2373743 DOI: 10.1110/ps.0207402] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2002] [Revised: 09/05/2002] [Accepted: 09/10/2002] [Indexed: 10/27/2022]

232

Udaka K, Mamitsuka H, Nakaseko Y, Abe N. Empirical evaluation of a dynamic experiment design method for prediction of MHC class I-binding peptides. J Immunol 2002;169:5744-53. [PMID: 12421954 DOI: 10.4049/jimmunol.169.10.5744] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

233

Mateos A, Dopazo J, Jansen R, Tu Y, Gerstein M, Stolovitzky G. Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. Genome Res 2002;12:1703-15. [PMID: 12421757 PMCID: PMC187551 DOI: 10.1101/gr.192502] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Abstract

Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) for functional annotation. Our study is novel in that we report systematic results for ~100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only ~10% of these are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily "false" in a biological sense. We show that the high degree of interconnections among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices for its quantification. Our analysis indicates that classification systems with a lower Borges effect are better suited for machine learning. Furthermore, we introduce a learning procedure for combining false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with considerably low rates of false positives and negatives and contains genes that are biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle.


234

Müller A, MacCallum RM, Sternberg MJE. Structural characterization of the human proteome. Genome Res 2002;12:1625-41. [PMID: 12421749 PMCID: PMC187559 DOI: 10.1101/gr.221202] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

235

Harton JA, Linhoff MW, Zhang J, Ting JPY. Cutting edge: CATERPILLER: a large family of mammalian genes containing CARD, pyrin, nucleotide-binding, and leucine-rich repeat domains. J Immunol 2002;169:4088-93. [PMID: 12370334 DOI: 10.4049/jimmunol.169.8.4088] [Citation(s) in RCA: 238] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]

236

Schultz J, Pils B. Prediction of structure and functional residues for O-GlcNAcase, a divergent homologue of acetyltransferases. FEBS Lett 2002;529:179-82. [PMID: 12372596 DOI: 10.1016/s0014-5793(02)03322-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

237

Madera M, Gough J. A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res 2002;30:4321-8. [PMID: 12364612 PMCID: PMC140544 DOI: 10.1093/nar/gkf544] [Citation(s) in RCA: 108] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

238

George RA, Heringa J. Protein domain identification and improved sequence similarity searching using PSI-BLAST. Proteins 2002;48:672-81. [PMID: 12211035 DOI: 10.1002/prot.10175] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]

239

Luo RY, Feng ZP, Liu JK. Prediction of protein structural class by amino acid and polypeptide composition. Eur J Biochem 2002;269:4219-25. [PMID: 12199700 DOI: 10.1046/j.1432-1033.2002.03115.x] [Citation(s) in RCA: 110] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

240

Valdar WSJ. Scoring residue conservation. Proteins 2002;48:227-41. [PMID: 12112692 DOI: 10.1002/prot.10146] [Citation(s) in RCA: 473] [Impact Index Per Article: 20.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

241

Li W, Jaroszewski L, Godzik A. Sequence clustering strategies improve remote homology recognitions while reducing search times. Protein Eng Des Sel 2002;15:643-9. [PMID: 12364578 DOI: 10.1093/protein/15.8.643] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

242

Ponting CP, Russell RR. The natural history of protein domains. Annu Rev Biophys Biomol Struct 2002;31:45-71. [PMID: 11988462 DOI: 10.1146/annurev.biophys.31.082901.134314] [Citation(s) in RCA: 199] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

243

Mougous JD, Green RE, Williams SJ, Brenner SE, Bertozzi CR. Sulfotransferases and sulfatases in mycobacteria. Chem Biol 2002;9:767-76. [PMID: 12144918 DOI: 10.1016/s1074-5521(02)00175-8] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

244

Koretke KK, Russell RB, Lupas AN. Fold recognition without folds. Protein Sci 2002;11:1575-9. [PMID: 12021456 PMCID: PMC2373620 DOI: 10.1110/ps.3590102] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]

245

Vitale L, Casadei R, Canaider S, Lenzi L, Strippoli P, D'Addabbo P, Giannone S, Carinci P, Zannotti M. Cysteine and tyrosine-rich 1 (CYYR1), a novel unpredicted gene on human chromosome 21 (21q21.2), encodes a cysteine and tyrosine-rich protein and defines a new family of highly conserved vertebrate-specific genes. Gene 2002;290:141-51. [PMID: 12062809 DOI: 10.1016/s0378-1119(02)00550-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

246

Hegyi H, Lin J, Greenbaum D, Gerstein M. Structural genomics analysis: characteristics of atypical, common, and horizontally transferred folds. Proteins 2002;47:126-41. [PMID: 11933060 DOI: 10.1002/prot.10078] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Abstract

We conducted a structural genomics analysis of the folds and structural superfamilies in the first 20 completely sequenced genomes by focusing on the patterns of fold usage and trying to identify structural characteristics of typical and atypical folds. We assigned folds to sequences using PSI-BLAST, run with a systematic protocol to reduce the amount of computational overhead. On average, folds could be assigned to about a fourth of the ORFs in the genomes and about a fifth of the amino acids in the proteomes. More than 80% of all the folds in the SCOP structural classification were identified in one of the 20 organisms, with worm and E. coli having the largest number of distinct folds. Folds are particularly effective at comprehensively measuring levels of gene duplication, because they group together even very remote homologues. Using folds, we find the average level of duplication varies depending on the complexity of the organism, ranging from 2.4 in M. genitalium to 32 for the worm, values significantly higher than those observed based purely on sequence similarity. We rank the common folds in the 20 organisms, finding that the top three are the P-loop NTP hydrolase, the ferredoxin fold, and the TIM-barrel, and discuss in detail the many factors that affect and bias these rankings. We also identify atypical folds that are "unique" to one of the organisms in our study and compare the characteristics of these folds with the most common ones. We find that common folds tend to be more multifunctional and associated with more regular, "symmetrical" structures than the unique ones. In addition, many of the unique folds are associated with proteins involved in cell defense (e.g., toxins). We analyze specific patterns of fold occurrence in the genomes by associating some of them with instances of horizontal transfer and others with gene loss. In particular, we find three possible examples of transfer between archaea and bacteria and six between eukarya and bacteria.
We make available our detailed results at http://genecensus.org/20.


247

Rost B. Enzyme function less conserved than anticipated. J Mol Biol 2002;318:595-608. [PMID: 12051862 DOI: 10.1016/s0022-2836(02)00016-5] [Citation(s) in RCA: 255] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

248

Karwath A, King RD. Homology induction: the use of machine learning to improve sequence similarity searches. BMC Bioinformatics 2002;3:11. [PMID: 11972320 PMCID: PMC107726 DOI: 10.1186/1471-2105-3-11] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2001] [Accepted: 04/23/2002] [Indexed: 11/10/2022] Open

249

Buchan DWA, Shepherd AJ, Lee D, Pearl FMG, Rison SCG, Thornton JM, Orengo CA. Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database. Genome Res 2002;12:503-14. [PMID: 11875040 PMCID: PMC155287 DOI: 10.1101/gr.213802] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

250

Hedman M, Deloof H, Von Heijne G, Elofsson A. Improved detection of homologous membrane proteins by inclusion of information from topology predictions. Protein Sci 2002;11:652-8. [PMID: 11847287 PMCID: PMC2373465 DOI: 10.1110/ps.39402] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]