Reference Citation Analysis

For: Bray JE, Todd AE, Pearl FM, Thornton JM, Orengo CA. The CATH Dictionary of Homologous Superfamilies (DHS): a consensus approach for identifying distant structural homologues. Protein Eng 2000;13:153-65. [PMID: 10775657 DOI: 10.1093/protein/13.3.153] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]

Cited by Other Article(s)

Song Y, Kim M, Kim Y. Homology Modeling and Optimized Expression of Truncated IK Protein, tIK, as an Anti-Inflammatory Peptide. Molecules 2020;25:4358. [PMID: 32977406 PMCID: PMC7583991 DOI: 10.3390/molecules25194358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2020] [Revised: 09/16/2020] [Accepted: 09/21/2020] [Indexed: 11/24/2022]

Oliveira H, Sampaio M, Melo LDR, Dias O, Pope WH, Hatfull GF, Azeredo J. Staphylococci phages display vast genomic diversity and evolutionary relationships. BMC Genomics 2019;20:357. [PMID: 31072320 PMCID: PMC6507118 DOI: 10.1186/s12864-019-5647-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 01/22/2019] [Accepted: 03/27/2019] [Indexed: 11/25/2022]

Abstract

Background

Bacteriophages are the most abundant and diverse entities in the biosphere, and this diversity is driven by constant predator–prey evolutionary dynamics and horizontal gene transfer. Phage genome sequences are under-sampled and therefore present an untapped and uncharacterized source of genetic diversity, typically characterized by highly mosaic genomes and no universal genes. To better understand the diversity and relationships among phages infecting human pathogens, we have analysed the complete genome sequences of 205 phages of Staphylococcus sp.

Results

These are predicted to encode 20,579 proteins, which can be sorted into 2139 phamilies (phams) of related sequences; 745 of these are orphams and possess only a single gene. Based on shared gene content, these phages were grouped into four clusters (A, B, C and D), 27 subclusters (A1-A2, B1-B17, C1-C6 and D1-D2) and one singleton. However, the genomes have mosaic architectures and individual genes with common ancestors are positioned in distinct genomic contexts in different clusters. The staphylococcal Cluster B siphoviridae are predicted to be temperate, and the integration cassettes are often closely-linked to genes implicated in bacterial virulence determinants. There are four unusual endolysin organization strategies found in Staphylococcus phage genomes, with endolysins predicted to be encoded as single genes, two genes spliced, two genes adjacent and as a single gene with inter-lytic-domain secondary translational start site. Comparison of the endolysins reveals multi-domain modularity, with conservation of the SH3 cell wall binding domain.
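The abstract does not say how shared gene content was scored. The sketch below is only an illustration and rests on two assumptions:

- each genome is a list of pham identifiers, one per gene;
- similarity is scored with a hypothetical measure: the mean, over both genomes, of the fraction of genes whose pham also occurs in the other genome.

It also shows how orphams (phams with a single member gene) can be counted. The function names and the metric are assumptions, not the authors' published procedure.

```python
from collections import Counter

def shared_gene_content(phams_a, phams_b):
    """Hypothetical gene-content similarity between two genomes.

    Each argument is a list of pham IDs, one entry per gene. Returns the mean of
    (fraction of genes in A whose pham occurs in B) and the reverse fraction."""
    set_a, set_b = set(phams_a), set(phams_b)
    frac_a = sum(p in set_b for p in phams_a) / len(phams_a)
    frac_b = sum(p in set_a for p in phams_b) / len(phams_b)
    return (frac_a + frac_b) / 2

def orphams(genomes):
    """Return the phams that contain exactly one gene across all genomes."""
    counts = Counter(p for genome in genomes for p in genome)
    return {p for p, n in counts.items() if n == 1}

# Toy data with hypothetical pham IDs.
g1 = [1, 2, 3, 4]
g2 = [1, 2, 5]
print(shared_gene_content(g1, g2))  # about 0.583 (mean of 2/4 and 2/3)
print(orphams([g1, g2]))            # {3, 4, 5}
```

Genomes could then be grouped by linking pairs whose score exceeds a threshold. Any real threshold would have to come from the paper itself.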

Conclusions

This study provides a high-resolution view of staphylococcal viral genetic diversity, and insights into their gene flux patterns within and across different phage groups (cluster and subclusters) providing insights into their evolution.

Satpathy R, Konkimalla VB, Ratha J. Application of bioinformatics tools and databases in microbial dehalogenation research: A review. Appl Biochem Microbiol 2014. [DOI: 10.1134/s0003683815010147] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]

Fiser A. Protein structure modeling in the proteomics era. Expert Rev Proteomics 2004;1:97-110. [PMID: 15966803 DOI: 10.1586/14789450.1.1.97] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Indexed: 11/08/2022]

Kontoyianni M, Rosnick CB. Functional Prediction of Binding Pockets. J Chem Inf Model 2012;52:824-33. [DOI: 10.1021/ci2005912] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/17/2023]

A common evolutionary origin for tailed-bacteriophage functional modules and bacterial machineries. Microbiol Mol Biol Rev 2011;75:423-33. [PMID: 21885679 DOI: 10.1128/mmbr.00014-11] [Citation(s) in RCA: 222] [Impact Index Per Article: 17.1] [Indexed: 12/13/2022]

Andreeva A. Classification of proteins: available structural space for molecular modeling. Methods Mol Biol 2012;857:1-31. [PMID: 22323215 DOI: 10.1007/978-1-61779-588-6_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]

Database of Protein and Bioactive Peptide Sequences. ACTA ACUST UNITED AC 2009. [DOI: 10.1201/9781420028836.sec6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1]

The phage lambda major tail protein structure reveals a common evolution for long-tailed phages and the type VI bacterial secretion system. Proc Natl Acad Sci U S A 2009;106:4160-5. [PMID: 19251647 DOI: 10.1073/pnas.0900044106] [Citation(s) in RCA: 224] [Impact Index Per Article: 14.0] [Indexed: 11/18/2022]

Veeramalai M, Gilbert D. A novel method for comparing topological models of protein structures enhanced with ligand information. Bioinformatics 2008;24:2698-705. [DOI: 10.1093/bioinformatics/btn518] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]

Hudson AO, Gilvarg C, Leustek T. Biochemical and phylogenetic characterization of a novel diaminopimelate biosynthesis pathway in prokaryotes identifies a diverged form of LL-diaminopimelate aminotransferase. J Bacteriol 2008;190:3256-63. [PMID: 18310350 PMCID: PMC2347407 DOI: 10.1128/jb.01381-07] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 08/24/2007] [Accepted: 02/14/2008] [Indexed: 11/20/2022]

Zotenko E, Islamaj Dogan R, Wilbur WJ, O'Leary DP, Przytycka TM. Structural footprinting in protein structure comparison: the impact of structural fragments. BMC Struct Biol 2007;7:53. [PMID: 17688700 PMCID: PMC2082327 DOI: 10.1186/1472-6807-7-53] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/26/2007] [Accepted: 08/09/2007] [Indexed: 11/23/2022]

Abstract

Background

One approach for speeding up protein structure comparison is the projection approach, where a protein structure is mapped to a high-dimensional vector and structural similarity is approximated by distance between the corresponding vectors. Structural footprinting methods are projection methods that employ the same general technique to produce the mapping: first select a representative set of structural fragments as models and then map a protein structure to a vector in which each dimension corresponds to a particular model and "counts" the number of times the model appears in the structure. The main difference between any two structural footprinting methods is in the set of models they use; in fact a large number of methods can be generated by varying the type of structural fragments used and the amount of detail in their representation. How do these choices affect the ability of the method to detect various types of structural similarity?
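A toy version of the projection idea follows, with hypothetical simplifications. A structure is reduced to a secondary-structure string over H (helix), E (strand) and C (coil). The "models" are all length-3 strings over those states. None of the three benchmarked methods is claimed to work this way; they use their own fragment libraries. The sketch only shows the count-vector mapping and the distance comparison the paragraph describes.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical model set: every length-3 string over H, E and C.
MODELS = ["".join(p) for p in product("HEC", repeat=3)]

def footprint(ss, models=MODELS):
    """Project a structure (here a secondary-structure string) onto a vector
    with one dimension per model, counting how often each model occurs."""
    k = len(models[0])  # all models are assumed to share the same length
    counts = Counter(ss[i:i + k] for i in range(len(ss) - k + 1))
    return [counts[m] for m in models]

def distance(u, v):
    """Euclidean distance between footprints, used as a dissimilarity proxy."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

a = footprint("CHHHHCCEEEC")
b = footprint("CHHHCCEEEEC")
c = footprint("EEEECCEEEEC")
print(distance(a, b), distance(a, c))  # ~1.414 vs ~4.243, so a is closer to b
```

Changing `MODELS` (fragment type, length, or level of detail) yields a different footprinting method. That is the design axis the abstract says the benchmark explores.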

Results

To answer this question we benchmarked three structural footprinting methods that vary significantly in their selection of models against the CATH database. In the first set of experiments we compared the methods' ability to detect structural similarity characteristic of evolutionarily related structures, i.e., structures within the same CATH superfamily. In the second set of experiments we tested the methods' agreement with the boundaries imposed by classification groups at the Class, Architecture, and Fold levels of the CATH hierarchy.

Conclusion

In both experiments we found that the method which uses secondary structure information has the best performance on average, but no one method performs consistently the best across all groups at a given classification level. We also found that combining the methods' outputs significantly improves the performance. Moreover, our new techniques to measure and visualize the methods' agreement with the CATH hierarchy, including the thresholded affinity graph, are useful beyond this work. In particular, they can be used to expose a similar composition of different classification groups in terms of structural fragments used by the method and thus provide an alternative demonstration of the continuous nature of the protein structure universe.

Balaji S, Srinivasan N. Comparison of sequence-based and structure-based phylogenetic trees of homologous proteins: Inferences on protein evolution. J Biosci 2007;32:83-96. [PMID: 17426382 DOI: 10.1007/s12038-007-0008-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]

Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res 2006;35:D291-7. [PMID: 17135200 PMCID: PMC1751535 DOI: 10.1093/nar/gkl959] [Citation(s) in RCA: 212] [Impact Index Per Article: 11.2] [Indexed: 11/12/2022]

DWARF--a data warehouse system for analyzing protein families. BMC Bioinformatics 2006;7:495. [PMID: 17094801 PMCID: PMC1647292 DOI: 10.1186/1471-2105-7-495] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 08/03/2006] [Accepted: 11/09/2006] [Indexed: 11/30/2022]

Reeves GA, Dallman TJ, Redfern OC, Akpor A, Orengo CA. Structural diversity of domain superfamilies in the CATH database. J Mol Biol 2006;360:725-41. [PMID: 16780872 DOI: 10.1016/j.jmb.2006.05.035] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.2] [Received: 01/20/2006] [Revised: 04/21/2006] [Accepted: 05/16/2006] [Indexed: 11/23/2022]

Abstract

The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. 
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).
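2DSEC itself analyses secondary structure compositions across a superfamily, and its procedure is not reproduced here. The simpler sketch below only illustrates the summary statistic quoted in the abstract. It takes hypothetical per-relative counts of secondary structure elements (SSEs) and computes the largest-to-smallest ratio. It then reports the fraction of highly populated superfamilies with a twofold or greater increase. Here "highly populated" means at least 10 relatives, matching ">9". The names and data are illustrative assumptions.

```python
def fold_increase(sse_counts):
    """Ratio of the largest to the smallest SSE count among the relatives
    of one superfamily."""
    if min(sse_counts) <= 0:
        raise ValueError("SSE counts must be positive")
    return max(sse_counts) / min(sse_counts)

def fraction_with_twofold(superfamilies, min_relatives=10):
    """Fraction of superfamilies with at least `min_relatives` relatives in which
    some relative has at least twice as many SSEs as another relative."""
    populated = [c for c in superfamilies.values() if len(c) >= min_relatives]
    if not populated:
        return 0.0
    return sum(fold_increase(c) >= 2 for c in populated) / len(populated)

# Toy data: hypothetical SSE counts, one entry per relative.
data = {
    "sf1": [5] * 9 + [12],  # 10 relatives, 12/5 = 2.4-fold
    "sf2": [8] * 10,        # 10 relatives, no variation
    "sf3": [4, 5],          # too few relatives, so it is ignored
}
print(fraction_with_twofold(data))  # 0.5
```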

Lee KT, Park EW, Moon S, Park HS, Kim HY, Jang GW, Choi BH, Chung HY, Lee JW, Cheong IC, Oh SJ, Kim H, Suh DS, Kim TH. Genomic sequence analysis of a potential QTL region for fat trait on pig chromosome 6. Genomics 2005;87:218-24. [PMID: 16326071 DOI: 10.1016/j.ygeno.2005.09.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/01/2005] [Revised: 08/22/2005] [Accepted: 09/03/2005] [Indexed: 11/19/2022]

Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, Orengo C. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res 2005;33:D247-51. [PMID: 15608188 PMCID: PMC539978 DOI: 10.1093/nar/gki024] [Citation(s) in RCA: 185] [Impact Index Per Article: 9.3] [Indexed: 11/13/2022]

Chakrabarti S, Sowdhamini R. Regions of minimal structural variation among members of protein domain superfamilies: application to remote homology detection and modelling using distant relationships. FEBS Lett 2004;569:31-6. [PMID: 15225604 DOI: 10.1016/j.febslet.2004.05.028] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 04/07/2004] [Accepted: 05/13/2004] [Indexed: 11/21/2022]

Constans P. On the functional significance of electron density protein structure alignments. Proteins 2004;55:646-55. [PMID: 15103628 DOI: 10.1002/prot.20059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]

McLaughlin WA, Berman HM. Statistical models for discerning protein structures containing the DNA-binding helix-turn-helix motif. J Mol Biol 2003;330:43-55. [PMID: 12818201 DOI: 10.1016/s0022-2836(03)00532-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]

Edwards YJK, Cottage A. Bioinformatics methods to predict protein structure and function. A practical approach. Mol Biotechnol 2003;23:139-66. [PMID: 12632698 DOI: 10.1385/mb:23:2:139] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]

Buchan DWA, Rison SCG, Bray JE, Lee D, Pearl F, Thornton JM, Orengo CA. Gene3D: structural assignments for the biologist and bioinformaticist alike. Nucleic Acids Res 2003;31:469-73. [PMID: 12520054 PMCID: PMC165498 DOI: 10.1093/nar/gkg051] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]

Fiser A, Sali A. Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol 2003;374:461-91. [PMID: 14696385 DOI: 10.1016/s0076-6879(03)74020-8] [Citation(s) in RCA: 1330] [Impact Index Per Article: 60.5] [Indexed: 12/03/2022]

Oldfield TJ. Data mining the protein data bank: residue interactions. Proteins 2002;49:510-28. [PMID: 12402360 DOI: 10.1002/prot.10221] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]

Constans P. Linear scaling approaches to quantum macromolecular similarity: evaluating the similarity function. J Comput Chem 2002;23:1305-13. [PMID: 12214313 DOI: 10.1002/jcc.10140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]

Hill EE, Morea V, Chothia C. Sequence conservation in families whose members have little or no sequence similarity: the four-helical cytokines and cytochromes. J Mol Biol 2002;322:205-33. [PMID: 12215425 DOI: 10.1016/s0022-2836(02)00653-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.6] [Indexed: 12/30/2022]

Abstract

Proteins for which there are good structural, functional and genetic similarities that imply a common evolutionary origin, can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites and if they do what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and cytochromes c'-b(562). Members of these superfamilies have sequence similarities that are either very low or not detectable. The cytokine superfamily has within it a long chain family and a short chain family. The sequences of known representative structures of the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved. 
These CC conserved hydrophobic residues form only 10-15% of all the residues in the individual structures. A similar analysis of the cytochromes c'-b(562), which bind haem and have a very different function to that of the cytokines, gave very similar results. Again some 30% of the CC residues have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, and some spine residues bind the haem co-factor.
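The two site patterns quoted for the buried spine alternate spacings of 3 and 4 residues. With roughly 3.6 residues per turn of an α-helix, this alternation keeps the positions on one face of the helix. The minimal sketch below enumerates such positions for a helix of a given length. The function name and 0-based indexing are illustrative choices, not part of the original analysis.

```python
def spine_positions(start, length, first_step=3):
    """List helix positions with alternating +3/+4 spacing:
    i, i+3, i+7, i+10, ... when first_step=3, or
    i, i+4, i+7, i+11, ... when first_step=4. Positions stop before `length`."""
    if first_step not in (3, 4):
        raise ValueError("first_step must be 3 or 4")
    steps = (first_step, 7 - first_step)  # the two spacings always sum to 7
    positions, pos, k = [], start, 0
    while pos < length:
        positions.append(pos)
        pos += steps[k % 2]
        k += 1
    return positions

print(spine_positions(0, 15))                # [0, 3, 7, 10, 14]
print(spine_positions(0, 15, first_step=4))  # [0, 4, 7, 11, 14]
```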

Getz G, Vendruscolo M, Sachs D, Domany E. Automated assignment of SCOP and CATH protein structure classifications from FSSP scores. Proteins 2002;46:405-15. [PMID: 11835515 DOI: 10.1002/prot.1176] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]

Buchan DWA, Shepherd AJ, Lee D, Pearl FMG, Rison SCG, Thornton JM, Orengo CA. Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database. Genome Res 2002;12:503-14. [PMID: 11875040 PMCID: PMC155287 DOI: 10.1101/gr.213802] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.1] [Indexed: 11/25/2022]

Pearl FMG, Lee D, Bray JE, Buchan DWA, Shepherd AJ, Orengo CA. The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci 2002;11:233-44. [PMID: 11790833 PMCID: PMC2373435 DOI: 10.1110/ps.16802] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 10/17/2022]

Pandit SB, Gosar D, Abhiman S, Sujatha S, Dixit SS, Mhatre NS, Sowdhamini R, Srinivasan N. SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes. Nucleic Acids Res 2002;30:289-93. [PMID: 11752317 PMCID: PMC99061 DOI: 10.1093/nar/30.1.289] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]

Abstract

Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space. 
For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.
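According to the abstract, an all-against-all IMPALA comparison clustered 67 Pfam families into 28 potential superfamilies. It does not state the clustering rule. One plausible reading is single-linkage grouping: link any two families with a significant profile match and take the connected components. The sketch below assumes that rule and uses a hypothetical E-value cutoff. The family names and scores are made up, and no IMPALA output format is implied.

```python
def cluster_families(families, hits, evalue_cutoff=1e-4):
    """Group families into potential superfamilies by single linkage.

    hits: iterable of (family_a, family_b, evalue) from pairwise profile matches.
    Any pair with evalue <= evalue_cutoff (a hypothetical threshold) is linked;
    clusters are the connected components. Only multi-family clusters are returned."""
    parent = {f: f for f in families}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= evalue_cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for f in families:
        groups.setdefault(find(f), []).append(f)
    return [sorted(g) for g in groups.values() if len(g) > 1]

fams = ["famA", "famB", "famC", "famD", "famE"]
hits = [("famA", "famB", 1e-6), ("famB", "famC", 1e-5), ("famD", "famE", 0.5)]
print(cluster_families(fams, hits))  # [['famA', 'famB', 'famC']]
```

Single linkage is simple and transitive, but one spurious match can merge two unrelated groups. A stricter rule, such as requiring several links, would be a reasonable alternative under the same assumptions.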

Pieper U, Eswar N, Stuart AC, Ilyin VA, Sali A. MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res 2002;30:255-9. [PMID: 11752309 PMCID: PMC99112 DOI: 10.1093/nar/30.1.255] [Citation(s) in RCA: 84] [Impact Index Per Article: 3.7] [Received: 09/18/2001] [Revised: 10/02/2001] [Accepted: 10/02/2001] [Indexed: 11/12/2022]

Orengo CA, Bray JE, Buchan DWA, Harrison A, Lee D, Pearl FMG, Sillitoe I, Todd AE, Thornton JM. The CATH protein family database: A resource for structural and functional annotation of genomes. Proteomics 2002. [DOI: 10.1002/1615-9861(200201)2:1<11::aid-prot11>3.0.co;2-t] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 11/08/2022]

Thornton JM. From genome to function. Science 2001;292:2095-7. [PMID: 11408660 DOI: 10.1126/science.292.5524.2095] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.6] [Indexed: 11/02/2022]

Orengo CA, Sillitoe I, Reeves G, Pearl FM. Review: what can structural classifications reveal about protein evolution? J Struct Biol 2001;134:145-65. [PMID: 11551176 DOI: 10.1006/jsbi.2001.4398] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]

Pearl FM, Martin N, Bray JE, Buchan DW, Harrison AP, Lee D, Reeves GA, Shepherd AJ, Sillitoe I, Todd AE, Thornton JM, Orengo CA. A rapid classification protocol for the CATH Domain Database to support structural genomics. Nucleic Acids Res 2001;29:223-7. [PMID: 11125098 PMCID: PMC29791 DOI: 10.1093/nar/29.1.223] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]

Abstract

In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.

Valdar WSJ, Thornton JM. Protein-protein interfaces: Analysis of amino acid conservation in homodimers. Proteins 2000. [DOI: 10.1002/1097-0134(20010101)42:1<108::aid-prot110>3.0.co;2-o] [Citation(s) in RCA: 245] [Impact Index Per Article: 9.8] [Indexed: 11/06/2022]