Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hegyi H, Lin J, Greenbaum D, Gerstein M. Structural genomics analysis: characteristics of atypical, common, and horizontally transferred folds. Proteins 2002;47:126-41. [PMID: 11933060 DOI: 10.1002/prot.10078] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]

Cited by Other Article(s)

Hernandez-Guerrero R, Galán-Vásquez E, Pérez-Rueda E. The protein architecture in Bacteria and Archaea identifies a set of promiscuous and ancient domains. PLoS One 2019;14:e0226604. [PMID: 31856202 PMCID: PMC6922389 DOI: 10.1371/journal.pone.0226604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/02/2019] [Accepted: 11/29/2019] [Indexed: 11/19/2022] Open

Shin WH, Kihara D. 55 Years of the Rossmann Fold. Methods Mol Biol 2019;1958:1-13. [PMID: 30945211 DOI: 10.1007/978-1-4939-9161-7_1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/12/2022]

Kauko A, Lehto K. Eukaryote specific folds: Part of the whole. Proteins 2018;86:868-881. [PMID: 29675831 DOI: 10.1002/prot.25517] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/30/2017] [Revised: 04/17/2018] [Accepted: 04/18/2018] [Indexed: 01/07/2023]

Li H. Structural Principles of CRISPR RNA Processing. Structure 2014;23:13-20. [PMID: 25435327 DOI: 10.1016/j.str.2014.10.006] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 08/15/2014] [Revised: 10/02/2014] [Accepted: 10/07/2014] [Indexed: 10/24/2022]

Lin J, Qian J. Systems biology approach to integrative comparative genomics. Expert Rev Proteomics 2007;4:107-19. [PMID: 17288519 DOI: 10.1586/14789450.4.1.107] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/21/2023]

Narra HP, Cordes MHJ, Ochman H. Structural features and the persistence of acquired proteins. Proteomics 2009;8:4772-81. [PMID: 18924109 PMCID: PMC3014317 DOI: 10.1002/pmic.200800061] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/11/2022]

Characterization of YvcJ, a conserved P-loop-containing protein, and its implication in competence in Bacillus subtilis. J Bacteriol 2008;191:1556-64. [PMID: 19074378 DOI: 10.1128/jb.01493-08] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 01/17/2023] Open

Frenkel ZM. Does Protein Relatedness Require Sequence Matching? Alignment via Networks in Sequence Space. J Biomol Struct Dyn 2008;26:215-22. [DOI: 10.1080/07391102.2008.10507237] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]

Hartling J, Kim J. Mutational robustness and geometrical form in protein structures. J Exp Zool B Mol Dev Evol 2008;310:216-26. [PMID: 17973270 DOI: 10.1002/jez.b.21203] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]

Linial M. Fishing with (Proto)Net-a principled approach to protein target selection. Comp Funct Genomics 2003;4:542-8. [PMID: 18629007 PMCID: PMC2447289 DOI: 10.1002/cfg.328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2003] [Revised: 08/05/2003] [Accepted: 08/05/2003] [Indexed: 12/02/2022] Open

Frenkel ZM, Trifonov EN. Evolutionary Networks in the Formatted Protein Sequence Space. J Comput Biol 2007;14:1044-57. [DOI: 10.1089/cmb.2007.0066] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022] Open

Sprinzak E, Altuvia Y, Margalit H. Characterization and prediction of protein-protein interactions within and between complexes. Proc Natl Acad Sci U S A 2006;103:14718-23. [PMID: 17003128 PMCID: PMC1595418 DOI: 10.1073/pnas.0603352103] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022] Open

Wong P, Frishman D. Fold designability, distribution, and disease. PLoS Comput Biol 2006;2:e40. [PMID: 16680196 PMCID: PMC1456317 DOI: 10.1371/journal.pcbi.0020040] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 11/02/2005] [Accepted: 03/17/2006] [Indexed: 12/04/2022] Open

Abeln S, Deane CM. Fold usage on genomes and protein fold evolution. Proteins 2006;60:690-700. [PMID: 16001400 DOI: 10.1002/prot.20506] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/07/2022]

Lee D, Grant A, Marsden RL, Orengo C. Identification and distribution of protein families in 120 completed genomes using Gene3D. Proteins 2006;59:603-15. [PMID: 15768405 DOI: 10.1002/prot.20409] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]

Abstract

Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structural genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/.

Doolittle RF. Evolutionary aspects of whole-genome biology. Curr Opin Struct Biol 2005;15:248-53. [PMID: 15963888 DOI: 10.1016/j.sbi.2005.04.001] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Received: 02/08/2005] [Revised: 02/08/2005] [Accepted: 04/12/2005] [Indexed: 11/28/2022]

Todd AE, Marsden RL, Thornton JM, Orengo CA. Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures. J Mol Biol 2005;348:1235-60. [PMID: 15854658 DOI: 10.1016/j.jmb.2005.03.037] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.4] [Received: 12/23/2004] [Revised: 02/28/2005] [Accepted: 03/15/2005] [Indexed: 11/27/2022]

Abstract

The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.

Ranea JAG, Grant A, Thornton JM, Orengo CA. Microeconomic principles explain an optimal genome size in bacteria. Trends Genet 2005;21:21-5. [PMID: 15680509 DOI: 10.1016/j.tig.2004.11.014] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Indexed: 12/17/2022]

Caetano-Anollés G, Caetano-Anollés D. Universal Sharing Patterns in Proteomes and Evolution of Protein Fold Architecture and Life. J Mol Evol 2005;60:484-98. [PMID: 15883883 DOI: 10.1007/s00239-004-0221-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Received: 06/16/2004] [Accepted: 10/11/2004] [Indexed: 11/30/2022]

Pagel P, Wong P, Frishman D. A Domain Interaction Map Based on Phylogenetic Profiling. J Mol Biol 2004;344:1331-46. [PMID: 15561146 DOI: 10.1016/j.jmb.2004.10.019] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.4] [Revised: 07/20/2004] [Accepted: 10/12/2004] [Indexed: 11/17/2022]

Kihara D, Skolnick J. Microbial genomes have over 72% structure assignment by the threading algorithm PROSPECTOR_Q. Proteins 2004;55:464-73. [PMID: 15048836 DOI: 10.1002/prot.20044] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Indexed: 12/26/2022]

Ranea JAG, Buchan DWA, Thornton JM, Orengo CA. Evolution of protein superfamilies and bacterial genome size. J Mol Biol 2004;336:871-87. [PMID: 15095866 DOI: 10.1016/j.jmb.2003.12.044] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.4] [Received: 09/24/2003] [Revised: 12/11/2003] [Accepted: 12/12/2003] [Indexed: 10/26/2022]

Abstract

We present the structural annotation of 56 different bacterial species based on the assignment of genes to 816 evolutionary superfamilies in the CATH domain structure database. These assignments have enabled us to analyse the recurrence of specific superfamilies within and across the genomes. We have selected the superfamilies that have a very broad representation and therefore appear to be universally distributed in a significant number of bacterial lineages. Occurrence profiles of these universally distributed superfamilies are compared with genome size in order to estimate the correlation between superfamily duplication and the increase in proteome size. This distinguishes between those size-dependent superfamilies where frequency of occurrence is highly correlated with increase in genome size, and size-independent superfamilies where no correlation is observed. Consideration of the size correlation and the ratio between the mean and the standard deviations for all the superfamily profiles allows more detailed subdivisions and classification of superfamilies. For example, within the size-independent superfamilies, we distinguished a group that are distributed evenly amongst all the genomes. Within the size-dependent superfamilies we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COG database was performed for all superfamilies in each of these groups, and this revealed significant differences amongst the three sets of superfamilies. Evenly distributed, size-independent domains are shown to be involved primarily in protein translation and biosynthesis. For the size-dependent superfamilies, linearly distributed superfamilies are involved mainly in metabolism, and non-linearly distributed superfamily domains are involved principally in gene regulation.

Zhang Y, Skolnick J. Automated structure prediction of weakly homologous proteins on a genomic scale. Proc Natl Acad Sci U S A 2004;101:7594-9. [PMID: 15126668 PMCID: PMC419651 DOI: 10.1073/pnas.0305695101] [Citation(s) in RCA: 245] [Impact Index Per Article: 12.3] [Indexed: 11/18/2022] Open

Cherkasov A, Jones SJM. Structural characterization of genomes by large scale sequence-structure threading. BMC Bioinformatics 2004;5:37. [PMID: 15061866 PMCID: PMC419331 DOI: 10.1186/1471-2105-5-37] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/09/2003] [Accepted: 04/03/2004] [Indexed: 12/02/2022] Open

Gensheimer M, Mushegian A. Chalcone isomerase family and fold: no longer unique to plants. Protein Sci 2004;13:540-4. [PMID: 14718655 PMCID: PMC2286709 DOI: 10.1110/ps.03395404] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Indexed: 10/26/2022]

Caetano-Anollés G, Caetano-Anollés D. An evolutionarily structured universe of protein architecture. Genome Res 2003;13:1563-71. [PMID: 12840035 PMCID: PMC403752 DOI: 10.1101/gr.1161903] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.4] [Received: 01/07/2003] [Accepted: 04/17/2003] [Indexed: 11/25/2022]

Anantharaman V, Aravind L, Koonin EV. Emergence of diverse biochemical activities in evolutionarily conserved structural scaffolds of proteins. Curr Opin Chem Biol 2003;7:12-20. [PMID: 12547421 DOI: 10.1016/s1367-5931(02)00018-2] [Citation(s) in RCA: 119] [Impact Index Per Article: 5.7] [Indexed: 11/28/2022]

Lin J, Qian J, Greenbaum D, Bertone P, Das R, Echols N, Senes A, Stenger B, Gerstein M. GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing. Nucleic Acids Res 2002;30:4574-82. [PMID: 12384605 PMCID: PMC137121 DOI: 10.1093/nar/gkf555] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 02/06/2002] [Revised: 08/08/2002] [Accepted: 08/08/2002] [Indexed: 11/15/2022] Open

Abstract

We present a prototype of a new database tool, GeneCensus, which focuses on comparing genomes globally, in terms of the collective properties of many genes, rather than in terms of the attributes of a single gene (e.g. sequence similarity for a particular ortholog). The comparisons are presented in a visual fashion over the web at GeneCensus.org. The system concentrates on two types of comparisons: (i) trees based on the sharing of generalized protein families between genomes, and (ii) whole pathway analysis in terms of activity levels. For the trees, we have developed a module (TreeViewer) that clusters genomes in terms of the folds, superfamilies or orthologs--all can be considered as generalized 'families' or 'protein parts'--they share, and compares the resulting trees side-by-side with those built from sequence similarity of individual genes (e.g. a traditional tree built on ribosomal similarity). We also include comparisons to trees built on whole-genome dinucleotide or codon composition. For pathway comparisons, we have implemented a module (PathwayPainter) that graphically depicts, in selected metabolic pathways, the fluxes or expression levels of the associated enzymes (i.e. generalized 'activities'). One can, consequently, compare organisms (and organism states) in terms of representations of these systemic quantities. Development of this module involved compiling, calculating and standardizing flux and expression information from many different sources. We illustrate pathway analysis for enzymes involved in central metabolism. We are able to show that, to some degree, flux and expression fluctuations have characteristic values in different sections of the central metabolism and that control points in this system (e.g. hexokinase, pyruvate kinase, phosphofructokinase, isocitrate dehydrogenase and citric synthase) tend to be especially variable in flux and expression. Both the TreeViewer and PathwayPainter modules connect to other information sources related to individual-gene or organism properties (e.g. a single-gene structural annotation viewer).

Harrison PM, Gerstein M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 2002;318:1155-74. [PMID: 12083509 DOI: 10.1016/s0022-2836(02)00109-2] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]

Abstract

Protein families can be used to understand many aspects of genomes, both their "live" and their "dead" parts (i.e. genes and pseudogenes). Surveys of genomes have revealed that, in every organism, there are always a few large families and many small ones, with the overall distribution following a power-law. This commonality is equally true for both genes and pseudogenes, and exists despite the fact that the specific families that are enlarged differ greatly between organisms. Furthermore, because of family structure there is great redundancy in proteomes, a fact linked to the large number of dispensable genes for each organism and the small size of the minimal, indispensable sub-proteome. Pseudogenes in prokaryotes represent families that are in the process of being dispensed with. In particular, the genome sequences of certain pathogenic bacteria (Mycobacterium leprae, Yersinia pestis and Rickettsia prowazekii) show how an organism can undergo reductive evolution on a large scale (i.e. the dying out of families) as a result of niche change. There appears to be less pressure to delete pseudogenes in eukaryotes. These can be divided into two varieties, duplicated and processed, where the latter involves reverse transcription from an mRNA intermediate. We discuss these collectively in yeast, worm, fly, and human. The fly has few pseudogenes apparently because of its high rate of genomic DNA deletion. In the other three organisms, the distribution of pseudogenes on the chromosome and amongst different families is highly non-uniform. Pseudogenes tend not to occur in the middle of chromosome arms, and tend to be associated with lineage-specific (as opposed to highly conserved) families that have environmental-response functions. This may be because, rather than being dead, they may form a reservoir of diverse "extra parts" that can be resurrected to help an organism adapt to its surroundings. In yeast, there may be a novel mechanism involving the [PSI+] prion that potentially enables this resurrection. In worm, the pseudogenes tend to arise out of families (e.g. chemoreceptors) that are greatly expanded in it compared to the fly. The human genome stands out in having many processed pseudogenes. These have a character very different from those of the duplicated variety, to a large extent just representing random insertions. Thus, their occurrence tends to be roughly in proportion to the amount of mRNA for a particular protein and to reflect the extent of the intergenic sequences. Further information about pseudogenes is available at http://genecensus.org/pseudogene
