1
Song Y, Kim M, Kim Y. Homology Modeling and Optimized Expression of Truncated IK Protein, tIK, as an Anti-Inflammatory Peptide. Molecules 2020; 25:4358. [PMID: 32977406 PMCID: PMC7583991 DOI: 10.3390/molecules25194358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2020] [Revised: 09/16/2020] [Accepted: 09/21/2020] [Indexed: 11/24/2022]
Abstract
Rheumatoid arthritis, caused by abnormalities in the autoimmune system, affects about 1% of the population. Rheumatoid arthritis does not yet have a proper treatment, and current treatment has various side effects. Therefore, there is a need for a therapeutic agent that can effectively treat rheumatoid arthritis without side effects. Recently, research on pharmaceutical drugs based on peptides has been actively conducted to reduce negative effects. Because peptide drugs are bio-friendly and bio-specific, they are characterized by no side effects. Truncated-IK (tIK) protein, a fragment of IK protein, has anti-inflammatory effects, including anti-rheumatoid arthritis activity. This study focused on the fact that tIK protein phosphorylates the interleukin 10 receptor. Through homology modeling with interleukin 10, short tIK epitopes were proposed to find the essential region of the sequence for anti-inflammatory activity. TH17 differentiation experiments were also performed with the proposed epitope. A peptide composed of 18 amino acids with an anti-inflammatory effect was named tIK-18mer. Additionally, a tIK 9-mer and a 14-mer were also found. The procedure for the experimental expression of the proposed tIK series (9-mer, 14-mer, and 18-mer) using bacterial strain is discussed.
Affiliation(s)
- Yongae Kim
- Correspondence: ; Tel.: +82-2-2173-8705; Fax: +82-31-330-4566
2
Oliveira H, Sampaio M, Melo LDR, Dias O, Pope WH, Hatfull GF, Azeredo J. Staphylococci phages display vast genomic diversity and evolutionary relationships. BMC Genomics 2019; 20:357. [PMID: 31072320 PMCID: PMC6507118 DOI: 10.1186/s12864-019-5647-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 01/22/2019] [Accepted: 03/27/2019] [Indexed: 11/25/2022]
Abstract
Background: Bacteriophages are the most abundant and diverse entities in the biosphere, and this diversity is driven by constant predator–prey evolutionary dynamics and horizontal gene transfer. Phage genome sequences are under-sampled and therefore present an untapped and uncharacterized source of genetic diversity, typically characterized by highly mosaic genomes and no universal genes. To better understand the diversity and relationships among phages infecting human pathogens, we have analysed the complete genome sequences of 205 phages of Staphylococcus sp.
Results: These are predicted to encode 20,579 proteins, which can be sorted into 2139 phamilies (phams) of related sequences; 745 of these are orphams and possess only a single gene. Based on shared gene content, these phages were grouped into four clusters (A, B, C and D), 27 subclusters (A1-A2, B1-B17, C1-C6 and D1-D2) and one singleton. However, the genomes have mosaic architectures and individual genes with common ancestors are positioned in distinct genomic contexts in different clusters. The staphylococcal Cluster B siphoviridae are predicted to be temperate, and the integration cassettes are often closely-linked to genes implicated in bacterial virulence determinants. There are four unusual endolysin organization strategies found in Staphylococcus phage genomes, with endolysins predicted to be encoded as single genes, two genes spliced, two genes adjacent and as a single gene with inter-lytic-domain secondary translational start site. Comparison of the endolysins reveals multi-domain modularity, with conservation of the SH3 cell wall binding domain.
Conclusions: This study provides a high-resolution view of staphylococcal viral genetic diversity, and insights into their gene flux patterns within and across different phage groups (cluster and subclusters) providing insights into their evolution.
Electronic supplementary material: The online version of this article (10.1186/s12864-019-5647-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Hugo Oliveira
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- Marta Sampaio
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- Luís D R Melo
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- Oscar Dias
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- Welkin H Pope
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Graham F Hatfull
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Joana Azeredo
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
3
Satpathy R, Konkimalla VB, Ratha J. Application of bioinformatics tools and databases in microbial dehalogenation research: A review. Appl Biochem Microbiol 2014. [DOI: 10.1134/s0003683815010147] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
4
Abstract
Structural proteomics aims to understand the structural basis of protein interactions and functions. A prerequisite for this is the availability of 3D protein structures that mediate the biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome-scale projects -- to obtain 3D structures for each protein. To achieve this ambitious goal, the slow and costly structure determination experiments are supplemented with theoretical approaches. The current state and recent advances in structure modeling approaches are reviewed here, with special emphasis on comparative protein structure modeling techniques.
Affiliation(s)
- András Fiser
- Department of Biochemistry, Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA.
5
Affiliation(s)
- Maria Kontoyianni
- Department of Pharmaceutical Sciences and Department of Psychology, Southern Illinois University Edwardsville, Edwardsville, Illinois 62026, United States
- Christopher B. Rosnick
- Department of Pharmaceutical Sciences and Department of Psychology, Southern Illinois University Edwardsville, Edwardsville, Illinois 62026, United States
6
A common evolutionary origin for tailed-bacteriophage functional modules and bacterial machineries. Microbiol Mol Biol Rev 2012; 75:423-33, first page of table of contents. [PMID: 21885679 DOI: 10.1128/mmbr.00014-11] [Citation(s) in RCA: 222] [Impact Index Per Article: 17.1] [Indexed: 12/13/2022]
Abstract
Bacteriophages belonging to the order Caudovirales possess a tail acting as a molecular nanomachine used during infection to recognize the host cell wall, attach to it, pierce it, and ensure the high-efficiency delivery of the genomic DNA to the host cytoplasm. In this review, we provide a comprehensive analysis of the various proteins constituting tailed bacteriophages from a structural viewpoint. To this end, we had in mind to pinpoint the resemblances within and between functional modules such as capsid/tail connectors, the tails themselves, or the tail distal host recognition devices, termed baseplates. This comparison has been extended to bacterial machineries embedded in the cell wall, for which shared molecular homology with phages has been recently revealed. This is the case for the type VI secretion system (T6SS), an inverted phage tail at the bacterial surface, or bacteriocins. Gathering all these data, we propose that a unique ancestral protein fold may have given rise to a large number of bacteriophage modules as well as to some related bacterial machinery components.
7
Abstract
The wealth of available protein structural data provides an unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas others utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.
8
9
The phage lambda major tail protein structure reveals a common evolution for long-tailed phages and the type VI bacterial secretion system. Proc Natl Acad Sci U S A 2009; 106:4160-5. [PMID: 19251647 DOI: 10.1073/pnas.0900044106] [Citation(s) in RCA: 224] [Impact Index Per Article: 14.0] [Indexed: 11/18/2022]
Abstract
Most bacteriophages possess long tails, which serve as the conduit for genome delivery. We report the solution structure of the N-terminal domain of gpV, the protein comprising the major portion of the noncontractile phage lambda tail tube. This structure is very similar to a previously solved tail tube protein from a contractile-tailed phage, providing the first direct evidence of an evolutionary connection between these 2 distinct types of phage tails. A remarkable structural similarity is also seen to Hcp1, a component of the bacterial type VI secretion system. The hexameric structure of Hcp1 and its ability to form long tubes are strikingly reminiscent of gpV when it is polymerized into a tail tube. These data coupled with other similarities between phage and type VI secretion proteins support an evolutionary relationship between these systems. Using Hcp1 as a model, we propose a polymerization mechanism for gpV involving several disorder-to-order transitions.
10
Veeramalai M, Gilbert D. A novel method for comparing topological models of protein structures enhanced with ligand information. Bioinformatics 2008; 24:2698-705. [DOI: 10.1093/bioinformatics/btn518] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
11
Hudson AO, Gilvarg C, Leustek T. Biochemical and phylogenetic characterization of a novel diaminopimelate biosynthesis pathway in prokaryotes identifies a diverged form of LL-diaminopimelate aminotransferase. J Bacteriol 2008; 190:3256-63. [PMID: 18310350 PMCID: PMC2347407 DOI: 10.1128/jb.01381-07] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 08/24/2007] [Accepted: 02/14/2008] [Indexed: 11/20/2022]
Abstract
A variant of the diaminopimelate (DAP)-lysine biosynthesis pathway uses an LL-DAP aminotransferase (DapL, EC 2.6.1.83) to catalyze the direct conversion of L-2,3,4,5-tetrahydrodipicolinate to LL-DAP. Comparative genomic analysis and experimental verification of DapL candidates revealed the existence of two diverged forms of DapL (DapL1 and DapL2). DapL orthologs were identified in eubacteria and archaea. In some species the corresponding dapL gene was found to lie in genomic contiguity with other dap genes, suggestive of a polycistronic structure. The DapL candidate enzymes were found to cluster into two classes sharing approximately 30% amino acid identity. The function of selected enzymes from each class was studied. Both classes were able to functionally complement Escherichia coli dapD and dapE mutants and to catalyze LL-DAP transamination, providing functional evidence for a role in DAP/lysine biosynthesis. In all cases the occurrence of dapL in a species correlated with the absence of genes for dapD and dapE representing the acyl DAP pathway variants, and only in a few cases was dapL coincident with ddh encoding meso-DAP dehydrogenase. The results indicate that the DapL pathway is restricted to specific lineages of eubacteria including the Cyanobacteria, Desulfuromonadales, Firmicutes, Bacteroidetes, Chlamydiae, Spirochaeta, and Chloroflexi and two archaeal groups, the Methanobacteriaceae and Archaeoglobaceae.
Affiliation(s)
- André O Hudson
- Biotech Center and Department of Plant Biology and Pathology, Rutgers University, New Brunswick, New Jersey 08901, USA
12
Zotenko E, Islamaj Dogan R, Wilbur WJ, O'Leary DP, Przytycka TM. Structural footprinting in protein structure comparison: the impact of structural fragments. BMC Struct Biol 2007; 7:53. [PMID: 17688700 PMCID: PMC2082327 DOI: 10.1186/1472-6807-7-53] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/26/2007] [Accepted: 08/09/2007] [Indexed: 11/23/2022]
Abstract
Background: One approach for speeding up protein structure comparison is the projection approach, where a protein structure is mapped to a high-dimensional vector and structural similarity is approximated by distance between the corresponding vectors. Structural footprinting methods are projection methods that employ the same general technique to produce the mapping: first select a representative set of structural fragments as models and then map a protein structure to a vector in which each dimension corresponds to a particular model and "counts" the number of times the model appears in the structure. The main difference between any two structural footprinting methods is in the set of models they use; in fact a large number of methods can be generated by varying the type of structural fragments used and the amount of detail in their representation. How do these choices affect the ability of the method to detect various types of structural similarity?
Results: To answer this question we benchmarked three structural footprinting methods that vary significantly in their selection of models against the CATH database. In the first set of experiments we compared the methods' ability to detect structural similarity characteristic of evolutionarily related structures, i.e., structures within the same CATH superfamily. In the second set of experiments we tested the methods' agreement with the boundaries imposed by classification groups at the Class, Architecture, and Fold levels of the CATH hierarchy.
Conclusion: In both experiments we found that the method which uses secondary structure information has the best performance on average, but no one method performs consistently the best across all groups at a given classification level. We also found that combining the methods' outputs significantly improves the performance.
Moreover, our new techniques to measure and visualize the methods' agreement with the CATH hierarchy, including the thresholded affinity graph, are useful beyond this work. In particular, they can be used to expose a similar composition of different classification groups in terms of structural fragments used by the method and thus provide an alternative demonstration of the continuous nature of the protein structure universe.
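The projection idea in this abstract (count how often each model fragment occurs, then compare the count vectors) can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration: structures are reduced to secondary-structure strings (H, E, C), the `MODELS` fragment set is invented, and Euclidean distance stands in for whatever distance the authors used. It is not the benchmarked methods themselves.

```python
from math import sqrt

# Hypothetical fragment "models": short secondary-structure patterns.
MODELS = ["HHH", "EEE", "HC", "CE"]

def footprint(structure: str, models: list[str]) -> list[int]:
    """Map a structure string to a vector of overlapping fragment counts."""
    vector = []
    for model in models:
        k = len(model)
        # Slide a window of the model's length and count exact matches.
        count = sum(
            1
            for i in range(len(structure) - k + 1)
            if structure[i:i + k] == model
        )
        vector.append(count)
    return vector

def euclidean(u: list[int], v: list[int]) -> float:
    """Distance between two footprints; smaller means more similar."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

a = footprint("HHHHCCEEE", MODELS)
b = footprint("HHHCCEEEE", MODELS)
print(a)  # [2, 1, 1, 1]
print(b)  # [1, 2, 1, 1]
print(euclidean(a, b))
```

Because each structure is reduced to a fixed-length vector once, pairwise comparison costs only a vector distance, which is the speed-up the abstract refers to.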
Affiliation(s)
- Elena Zotenko
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Rezarta Islamaj Dogan
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- W John Wilbur
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Dianne P O'Leary
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
- Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
13
Balaji S, Srinivasan N. Comparison of sequence-based and structure-based phylogenetic trees of homologous proteins: Inferences on protein evolution. J Biosci 2007; 32:83-96. [PMID: 17426382 DOI: 10.1007/s12038-007-0008-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]
Abstract
Several studies based on the known three-dimensional (3-D) structures of proteins show that two homologous proteins with insignificant sequence similarity could adopt a common fold and may perform same or similar biochemical functions. Hence, it is appropriate to use similarities in 3-D structure of proteins rather than the amino acid sequence similarities in modelling evolution of distantly related proteins. Here we present an assessment of using 3-D structures in modelling evolution of homologous proteins. Using a dataset of 108 protein domain families of known structures with at least 10 members per family we present a comparison of extent of structural and sequence dissimilarities among pairs of proteins which are inputs into the construction of phylogenetic trees. We find that correlation between the structure-based dissimilarity measures and the sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among the members, the correlation coefficient between the sequence-based and the structure-based dissimilarities are poor. In these cases the structure-based dendrogram clusters proteins with most similar biochemical functional properties better than the sequence-similarity based dendrogram. In multi-domain protein families and disulphide-rich protein families the correlation coefficient for the match of sequence-based and structure-based dissimilarity (SDM) measures can be poor though the sequence identity could be higher than 30%. Hence it is suggested that protein evolution is best modelled using 3-D structures if the sequence similarities (SSM) of the homologues are very low.
Affiliation(s)
- S Balaji
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
14
Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res 2006; 35:D291-7. [PMID: 17135200 PMCID: PMC1751535 DOI: 10.1093/nar/gkl959] [Citation(s) in RCA: 212] [Impact Index Per Article: 11.2] [Indexed: 11/12/2022]
Abstract
We report the latest release (version 3.0) of the CATH protein domain database. There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps that have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS) which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database which is a projection of CATH structural data onto ∼2 million sequences in completed genomes and UniProt.
Affiliation(s)
- Alison Cuff
- To whom correspondence should be addressed: Tel: +1 44 207 679 3890; Fax: +1 44 207 679 7193
- Janet M. Thornton
- European Bioinformatics Institute, Hinxton Hall, Hinxton, Cambridge CB10 1RQ, UK
15
DWARF--a data warehouse system for analyzing protein families. BMC Bioinformatics 2006; 7:495. [PMID: 17094801 PMCID: PMC1647292 DOI: 10.1186/1471-2105-7-495] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 08/03/2006] [Accepted: 11/09/2006] [Indexed: 11/30/2022]
Abstract
Background: The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows researchers to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families.
Description: The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families.
Conclusion: DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering.
16
Reeves GA, Dallman TJ, Redfern OC, Akpor A, Orengo CA. Structural diversity of domain superfamilies in the CATH database. J Mol Biol 2006; 360:725-41. [PMID: 16780872 DOI: 10.1016/j.jmb.2006.05.035] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.2] [Received: 01/20/2006] [Revised: 04/21/2006] [Accepted: 05/16/2006] [Indexed: 11/23/2022]
Abstract
The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. 
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).
Affiliation(s)
- Gabrielle A Reeves
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
17
Lee KT, Park EW, Moon S, Park HS, Kim HY, Jang GW, Choi BH, Chung HY, Lee JW, Cheong IC, Oh SJ, Kim H, Suh DS, Kim TH. Genomic sequence analysis of a potential QTL region for fat trait on pig chromosome 6. Genomics 2005; 87:218-24. [PMID: 16326071 DOI: 10.1016/j.ygeno.2005.09.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/01/2005] [Revised: 08/22/2005] [Accepted: 09/03/2005] [Indexed: 11/19/2022]
Abstract
On pig chromosome 6, the SW71 microsatellite is located in the region corresponding to several quantitative trait loci (QTL), such as those for intramuscular fat content and for body weight at 4 weeks of age. The genomic sequence of approximately 909 kb was obtained from seven BAC clones encompassing the SW71 region corresponding to human 18q11.21-q11.22. By searching the NCBI GenBank using BLASTX and BLASTN, this 909-kb segment was found to contain eight genes, RAB31, TXNDC2, VAPA, APCDD1, NAPG, FAM38B, C18orf30, and C18orf58, and one putative gene (DN119777). The average G + C content in the sequence of this contig was 45.75% and 33 CpG islands were detected. CpG islands were scattered throughout the region in which most of the putative genes were located. Dense CpG islands of approximately 840 bp were observed, including within the 5' UTR and exon 1 of the orthologs of the RAB31, VAPA, APCDD1, and NAPG genes. Comparative analysis of conserved segments of six species showed that K(a)/K(s) ratios of the TXNDC2 gene in collinear and rearranged segments were significantly different at 4.1 and 1.3, respectively. In conclusion, we demonstrated the genomic organization of pig chromosome 6, including the gene order surrounding SW71, which provides important information for comparative mapping. Moreover, the genes revealed in this study may be positional candidate genes associated with QTL on chromosome 6 that affect fat deposition in pigs.
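The G + C percentage quoted in this abstract is a plain base-composition statistic. As a minimal Python sketch of how such a figure could be computed for a sequence string: the function name `gc_content` and the decision to ignore ambiguous bases such as N are assumptions for illustration, not the authors' analysis pipeline.

```python
def gc_content(seq: str) -> float:
    """Return G + C as a percentage of the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not dilute the value.
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / total

print(round(gc_content("ATGCGC"), 2))  # 66.67
```

On a real contig the same function would be applied to the full assembled sequence, for example one read from a FASTA file.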
Affiliation(s)
- Kyung-Tai Lee
- Division of Animal Genomics and Bioinformatics, National Livestock Research Institute, Rural Development Administration, Omokchun-dong 564, Kwonsun-gu, Suwon, Korea
18
Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, Orengo C. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res 2005; 33:D247-51. [PMID: 15608188 PMCID: PMC539978 DOI: 10.1093/nar/gki024] [Citation(s) in RCA: 185] [Impact Index Per Article: 9.3] [Indexed: 11/13/2022]
Abstract
The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43,229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616,470 domain sequences classified into 23,876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.
Affiliation(s)
- Frances Pearl
- Biochemistry and Molecular Biology Department, University College London, University of London, Gower Street, London WC1E 6BT, UK
19
Chakrabarti S, Sowdhamini R. Regions of minimal structural variation among members of protein domain superfamilies: application to remote homology detection and modelling using distant relationships. FEBS Lett 2004; 569:31-6. [PMID: 15225604 DOI: 10.1016/j.febslet.2004.05.028] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 04/07/2004] [Accepted: 05/13/2004] [Indexed: 11/21/2022]
Abstract
Structurally conserved regions or structural templates have been identified and examined for features such as amino acid content, solvent accessibility, secondary structures, non-polar interaction, residue packing and extent of structural deviations in 179 aligned members of superfamilies involving 1208 pairs of protein domains. An analysis of these structural features shows that the retention of secondary structural conservation and similar hydrogen bonding pattern within the templates is 2.5 and 1.8 times higher, respectively, than full-length alignments suggesting that they form the minimum structural requirement of a superfamily. The identification and availability of structural templates find value in different areas of protein structure prediction and modelling such as in sensitive sequence searches, accurate sequence alignment and three-dimensional modelling on the basis of distant relationships.
Affiliation(s)
- Saikat Chakrabarti
- National Centre for Biological Sciences, UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India
|
20
|
Constans P. On the functional significance of electron density protein structure alignments. Proteins 2004; 55:646-55. [PMID: 15103628 DOI: 10.1002/prot.20059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Electron density protein alignments are analyzed in terms of their underlying similarity measure, the density overlap. These alignments are conceptually unrelated to biochemical structural elements and, therefore, are appropriate in structure-only similarity studies. The analysis is focused on the low sequence similarity subset of protein domains. A remarkable association is found between simple, density overlap measures and the expert designed Structural Classification of Proteins (SCOP) for which functional and evolutive analogies prevail. The association found validates the functional significance of electron density alignments.
Affiliation(s)
- Pere Constans
- Department of Chemistry, Rice University, Houston, Texas, USA.
|
21
|
McLaughlin WA, Berman HM. Statistical models for discerning protein structures containing the DNA-binding helix-turn-helix motif. J Mol Biol 2003; 330:43-55. [PMID: 12818201 DOI: 10.1016/s0022-2836(03)00532-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
A method for discerning protein structures containing the DNA-binding helix-turn-helix (HTH) motif has been developed. The method uses statistical models based on geometrical measurements of the motif. With a decision tree model, key structural features required for DNA binding were identified. These include a high average solvent-accessibility of residues within the recognition helix and a conserved hydrophobic interaction between the recognition helix and the second alpha helix preceding it. The Protein Data Bank was searched using a more accurate model of the motif created using the Adaboost algorithm to identify structures that have a high probability of containing the motif, including those that had not been reported previously.
Affiliation(s)
- William A McLaughlin
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway 08854-8087, USA
|
22
|
Edwards YJK, Cottage A. Bioinformatics methods to predict protein structure and function. A practical approach. Mol Biotechnol 2003; 23:139-66. [PMID: 12632698 DOI: 10.1385/mb:23:2:139] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Protein structure prediction by using bioinformatics can involve sequence similarity searches, multiple sequence alignments, identification and characterization of domains, secondary structure prediction, solvent accessibility prediction, automatic protein fold recognition, constructing three-dimensional models to atomic detail, and model validation. Not all protein structure prediction projects involve the use of all these techniques. A central part of a typical protein structure prediction is the identification of a suitable structural target from which to extrapolate three-dimensional information for a query sequence. The way in which this is done defines three types of projects. The first involves the use of standard and well-understood techniques. If a structural template remains elusive, a second approach using nontrivial methods is required. If a target fold cannot be reliably identified because inconsistent results have been obtained from nontrivial data analyses, the project falls into the third type of project and will be virtually impossible to complete with any degree of reliability. In this article, a set of protocols to predict protein structure from sequence is presented and distinctions among the three types of project are given. These methods, if used appropriately, can provide valuable indicators of protein structure and function.
Affiliation(s)
- Yvonne J K Edwards
- Research Division, UK Human Genome Mapping Project Resource Center, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, England, UK.
|
23
|
Buchan DWA, Rison SCG, Bray JE, Lee D, Pearl F, Thornton JM, Orengo CA. Gene3D: structural assignments for the biologist and bioinformaticist alike. Nucleic Acids Res 2003; 31:469-73. [PMID: 12520054 PMCID: PMC165498 DOI: 10.1093/nar/gkg051] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The Gene3D database (http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/) provides structural assignments for genes within complete genomes. These are available via the internet from either the World Wide Web or FTP. Assignments are made using PSI-BLAST and subsequently processed using the DRange protocol. The DRange protocol is an empirically benchmarked method for assessing the validity of structural assignments made using sequence searching methods where appropriate assignment statistics are collected and made available. Gene3D links assignments to their appropriate entries in relevant structural and classification resources (PDBsum, CATH database and the Dictionary of Homologous Superfamilies). Release 2.0 of Gene3D includes 62 genomes, 2 eukaryotes, 10 archaea and 40 bacteria. Currently, structural assignments can be made for between 30 and 40 percent of any given genome. In any genome, around half of those genes assigned a structural domain are assigned a single domain and the other half of the genes are assigned multiple structural domains. Gene3D is linked to the CATH database and is updated with each new update of CATH.
Affiliation(s)
- Daniel W A Buchan
- Biomolecular Structure and Modelling Group, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
|
24
|
Affiliation(s)
- András Fiser
- Department of Biochemistry and Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York 10461, USA
|
25
|
Abstract
The protein databank contains a vast wealth of structural and functional information. The analysis of this macromolecular information has been the subject of considerable work in order to advance knowledge beyond the collection of molecular coordinates. This article presents a method that determines local structural information within proteins using mathematical data mining techniques. The mine program described returns many known configurations of residues such as the catalytic triad, metal binding sites and the N-linked glycosylation site; as well as many other multiple residue interactions not previously categorized. Because mathematical constructs are used as targets, this method can identify new information not previously known, and also provide unbiased results of typical structure and their expected deviations. Because the results are defined mathematically, they cannot indicate the biological implications of the results. Therefore two support programs are described that provide insight into the biological context for the mine results. The first allows a weighted RMSD search between a template set of coordinates and a list of PDB files, and the second allows the labeling of a protein with the template results from mining to aid in the classification of this protein.
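The weighted RMSD search mentioned in this abstract can be illustrated with a minimal sketch. It assumes the template and candidate coordinates are already superposed and paired atom by atom; the function name `weighted_rmsd` and the weighting scheme are illustrative assumptions, not the interface of the programs described in the abstract.

```python
import math


def weighted_rmsd(template, candidate, weights):
    """Weighted RMSD between two pre-superposed, equally long coordinate lists.

    Assumption: no optimal superposition is performed here; a real search
    would first align each candidate onto the template.
    """
    if not (len(template) == len(candidate) == len(weights)):
        raise ValueError("template, candidate and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    acc = 0.0
    for p, q, w in zip(template, candidate, weights):
        # Squared Euclidean distance between paired atoms, scaled by the weight.
        acc += w * sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(acc / total_weight)


if __name__ == "__main__":
    template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    candidate = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    # Down-weighting the displaced atom lowers the reported deviation.
    print(weighted_rmsd(template, candidate, [1.0, 1.0]))
    print(weighted_rmsd(template, candidate, [1.0, 0.0]))
```

Setting a weight to zero removes that atom from the comparison, which is one way a template could emphasise catalytic or metal-binding atoms over the rest.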
Affiliation(s)
- T J Oldfield
- Accelrys Inc., Department of Chemistry, University of York, Heslington, York, Yorkshire, United Kingdom.
|
26
|
Constans P. Linear scaling approaches to quantum macromolecular similarity: evaluating the similarity function. J Comput Chem 2002; 23:1305-13. [PMID: 12214313 DOI: 10.1002/jcc.10140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The evaluation of the electron density based similarity function scales quadratically with respect to the size of the molecules for simplified, atomic shell densities. Due to the exponential decay of the function's atom-atom terms most interatomic contributions are numerically negligible on large systems. An improved algorithm for the evaluation of the Quantum Molecular Similarity function is presented. This procedure identifies all non-negligible terms without computing unnecessary interatomic squared distances, thus effectively turning to linear scaling the similarity evaluation. Presented also is a minimalist dynamic electron density model. Approximate, single shell densities together with the proposed algorithm facilitate fast electron density based alignments on macromolecules.
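The scaling argument in this abstract can be sketched as follows. The example swaps in a generic cell-list (spatial hashing) scheme, which is not necessarily the algorithm of the paper, and it models each atom-atom term as a single Gaussian `exp(-alpha * d**2)`; the function names, `alpha`, and `tol` are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def overlap_bruteforce(a, b, alpha=1.0):
    """Quadratic reference: sum a Gaussian term over every atom pair."""
    return sum(
        math.exp(-alpha * sum((p - q) ** 2 for p, q in zip(x, y)))
        for x in a
        for y in b
    )


def overlap_cell_list(a, b, alpha=1.0, tol=1e-12):
    """Sum only the terms larger than tol, using a grid of cells.

    Each atom of `a` is compared only with atoms of `b` in its own and the
    26 neighbouring cells, so the cost grows roughly linearly with size
    when atom density is bounded.
    """
    # Distance beyond which exp(-alpha * d^2) drops below tol.
    cutoff = math.sqrt(-math.log(tol) / alpha)
    cells = defaultdict(list)
    for y in b:
        cells[tuple(math.floor(c / cutoff) for c in y)].append(y)
    total = 0.0
    for x in a:
        cx = tuple(math.floor(c / cutoff) for c in x)
        for offset in product((-1, 0, 1), repeat=3):
            key = tuple(i + o for i, o in zip(cx, offset))
            for y in cells.get(key, ()):
                d2 = sum((p - q) ** 2 for p, q in zip(x, y))
                if d2 <= cutoff * cutoff:
                    total += math.exp(-alpha * d2)
    return total


if __name__ == "__main__":
    mol_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (30.0, 0.0, 0.0)]
    mol_b = [(0.2, 0.1, 0.0), (1.4, 0.3, 0.0), (-25.0, 4.0, 1.0)]
    print(overlap_bruteforce(mol_a, mol_b), overlap_cell_list(mol_a, mol_b))
```

The two functions agree up to the sum of the neglected terms, each of which is below `tol` by construction.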
Affiliation(s)
- Pere Constans
- Department of Chemistry, Rice University, Houston, Texas 77005-1892, USA.
|
27
|
Hill EE, Morea V, Chothia C. Sequence conservation in families whose members have little or no sequence similarity: the four-helical cytokines and cytochromes. J Mol Biol 2002; 322:205-33. [PMID: 12215425 DOI: 10.1016/s0022-2836(02)00653-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Proteins for which there are good structural, functional and genetic similarities that imply a common evolutionary origin, can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites and if they do what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and cytochromes c'-b(562). Members of these superfamilies have sequence similarities that are either very low or not detectable. The cytokine superfamily has within it a long chain family and a short chain family. The sequences of known representative structures of the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved. 
These CC conserved hydrophobic residues form only 10-15% of all the residues in the individual structures. A similar analysis of the cytochromes c'-b(562), which bind haem and have a very different function to that of the cytokines, gave very similar results. Again some 30% of the CC residues have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, and some spine residues, bind the haem co-factor.
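The "buried spine" spacing quoted in this abstract (sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) alternates steps of three and four residues. A minimal sketch that enumerates such sites is given below; the function name `spine_sites` and its parameters are hypothetical and only illustrate the numbering pattern.

```python
from itertools import cycle


def spine_sites(start, length, first_step=3):
    """List helix positions following the alternating 3/4 residue spacing.

    start:      index of the first spine site (i in the abstract's notation)
    length:     number of residues in the helix segment considered
    first_step: 3 for i, i+3, i+7, ...; 4 for i, i+4, i+7, ...
    """
    if first_step not in (3, 4):
        raise ValueError("first_step must be 3 or 4")
    steps = cycle((3, 4) if first_step == 3 else (4, 3))
    sites = []
    pos = start
    while pos < start + length:
        sites.append(pos)
        pos += next(steps)
    return sites


if __name__ == "__main__":
    print(spine_sites(0, 12, first_step=3))  # [0, 3, 7, 10]
    print(spine_sites(0, 12, first_step=4))  # [0, 4, 7, 11]
```

Both patterns advance seven residues per two steps, which is why the listed sites fall on one face of the helix.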
Affiliation(s)
- Emma E Hill
- MRC Laboratory of Molecular Biology, Cambridge, UK.
|
28
|
Getz G, Vendruscolo M, Sachs D, Domany E. Automated assignment of SCOP and CATH protein structure classifications from FSSP scores. Proteins 2002; 46:405-15. [PMID: 11835515 DOI: 10.1002/prot.1176] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
We present an automated procedure to assign CATH and SCOP classifications to proteins whose FSSP score is available. CATH classification is assigned down to the topology level, and SCOP classification is assigned to the fold level. Because the FSSP database is updated weekly, this method makes it possible to update also CATH and SCOP with the same frequency. Our predictions have a nearly perfect success rate when ambiguous cases are discarded. These ambiguous cases are intrinsic in any protein structure classification that relies on structural information alone. Hence, we introduce the "twilight zone for structure classification." We further suggest that to resolve these ambiguous cases, other criteria of classification, based also on information about sequence and function, must be used.
Affiliation(s)
- Gad Getz
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
|
29
|
Buchan DWA, Shepherd AJ, Lee D, Pearl FMG, Rison SCG, Thornton JM, Orengo CA. Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database. Genome Res 2002; 12:503-14. [PMID: 11875040 PMCID: PMC155287 DOI: 10.1101/gr.213802] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using the profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies.
Affiliation(s)
- Daniel W A Buchan
- Biomolecular Structure and Modelling Group, Department of Biochemistry and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
|
30
|
Pearl FMG, Lee D, Bray JE, Buchan DWA, Shepherd AJ, Orengo CA. The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci 2002; 11:233-44. [PMID: 11790833 PMCID: PMC2373435 DOI: 10.1110/ps.16802] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
Abstract
An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
Affiliation(s)
- Frances M G Pearl
- Department of Biochemistry and Molecular Biology, University College London, University of London, London WC1E 6BT, UK.
|
31
|
Pandit SB, Gosar D, Abhiman S, Sujatha S, Dixit SS, Mhatre NS, Sowdhamini R, Srinivasan N. SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes. Nucleic Acids Res 2002; 30:289-93. [PMID: 11752317 PMCID: PMC99061 DOI: 10.1093/nar/30.1.289] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space. 
For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.
Affiliation(s)
- Shashi B Pandit
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
|
32
|
Pieper U, Eswar N, Stuart AC, Ilyin VA, Sali A. MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res 2002; 30:255-9. [PMID: 11752309 PMCID: PMC99112 DOI: 10.1093/nar/30.1.255] [Citation(s) in RCA: 84] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2001] [Revised: 10/02/2001] [Accepted: 10/02/2001] [Indexed: 11/12/2022] Open
Abstract
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server.
Affiliation(s)
- Ursula Pieper
- Laboratories of Molecular Biophysics, The Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA
|
33
|
Orengo CA, Bray JE, Buchan DWA, Harrison A, Lee D, Pearl FMG, Sillitoe I, Todd AE, Thornton JM. The CATH protein family database: A resource for structural and functional annotation of genomes. Proteomics 2002. [DOI: 10.1002/1615-9861(200201)2:1<11::aid-prot11>3.0.co;2-t] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
34
|
Affiliation(s)
- J M Thornton
- University College Department of Biochemistry and Molecular Biology, London WC1E 6BT, UK.
|
35
|
Orengo CA, Sillitoe I, Reeves G, Pearl FM. Review: what can structural classifications reveal about protein evolution? J Struct Biol 2001; 134:145-65. [PMID: 11551176 DOI: 10.1006/jsbi.2001.4398] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
In this article we present a review of the methods used for comparing and classifying protein structures. We discuss the hierarchies and populations of fold groups and evolutionary families in some of the major classifications and we consider some of the problems confronting any general analyses of structural evolution in protein families. We also review some more recent analyses that have expanded these classifications by identifying sequence relatives in the genomes and thereby reveal interesting trends in fold usage and recurrence.
Affiliation(s)
- C A Orengo
- Department of Biochemistry and Molecular Biology, University College, Gower Street, London, WC1E 6BT, United Kingdom
|
36
|
Pearl FM, Martin N, Bray JE, Buchan DW, Harrison AP, Lee D, Reeves GA, Shepherd AJ, Sillitoe I, Todd AE, Thornton JM, Orengo CA. A rapid classification protocol for the CATH Domain Database to support structural genomics. Nucleic Acids Res 2001; 29:223-7. [PMID: 11125098 PMCID: PMC29791 DOI: 10.1093/nar/29.1.223] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. 
(1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.
Affiliation(s)
- F M Pearl
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
|
37
|
|