51
Abstract
PDBsum is a web-based database providing a largely pictorial summary of the key information on each macromolecular structure deposited at the Protein Data Bank (PDB). It includes images of the structure, annotated plots of each protein chain's secondary structure, detailed structural analyses generated by the PROMOTIF program, summary PROCHECK results and schematic diagrams of protein-ligand and protein-DNA interactions. RasMol scripts highlight key aspects of the structure, such as the protein's domains, PROSITE patterns and protein-ligand interactions, for interactive viewing in 3D. Numerous links take the user to related sites. PDBsum is updated whenever any new structures are released by the PDB and is freely accessible via http://www.biochem.ucl.ac.uk/bsm/pdbsum.
Affiliation(s)
- R A Laskowski
- Department of Crystallography, Birkbeck College, University of London, Malet Street, London WC1E 7HX, UK.
52
Jonassen I, Eidhammer I, Grindhaug SH, Taylor WR. Searching the protein structure databank with weak sequence patterns and structural constraints. J Mol Biol 2000; 304:599-619. [PMID: 11099383 DOI: 10.1006/jmbi.2000.4211]
Abstract
A method is described in which proteins that match PROSITE patterns are filtered by the root-mean-square deviation (RMSD) of the local 3D structures of the probe and target over the pattern components. This was found to increase the discrimination between true and false members of the protein family, but the improvement depended on how unique the structural features in the pattern were compared with equivalent fragments extracted from the structure databank (for example, if the pattern fell in an alpha-helix, discrimination was poor). We then generalised the sequence patterns (by widening the range of amino acid residues allowed at each position) and monitored how well the structural information helped retain specificity. While the discrimination of the pure sequence pattern had generally disappeared at information content values below ten bits, the discrimination of the combined sequence-structure probe remained high at this point before following a similar decay. The displacement between these curves indicates that the structural component is, on average, equivalent to about ten bits. The sequence patterns were also filtered using the structure comparison program SAP, giving a global, rather than local, "view" of the proteins. This allowed the sequence patterns to become even less specific (lower in information content), but raised the question of whether proteins found with the same fold but no PROSITE pattern should count as family members.
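The information content used above to measure pattern specificity can be illustrated with a small calculation. The following is a minimal Python sketch, not the authors' code: it assumes a uniform background over the 20 standard amino acids, so a position allowing k residues contributes log2(20/k) bits, and it counts a variable-length element such as x(2,4) at its minimum length. The function names and these scoring choices are illustrative assumptions.

```python
import math
import re

# The 20 standard amino acids; a uniform background frequency of 1/20 is ASSUMED.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# One PROSITE-style element: x, a single residue, [allowed set] or {excluded set},
# optionally followed by a repeat count (n) or a range (n,m).
ELEMENT_RE = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def allowed_residues(token):
    """Return the set of standard residues permitted by one pattern token."""
    if token == "x":
        return set(AMINO_ACIDS)
    if token.startswith("["):
        return set(token[1:-1]) & AMINO_ACIDS
    if token.startswith("{"):
        return set(AMINO_ACIDS - set(token[1:-1]))
    return {token} & AMINO_ACIDS


def information_content(pattern):
    """Information content, in bits, of a simple PROSITE-style pattern."""
    # Drop the terminal period and the N-/C-terminal anchors.
    pattern = pattern.strip().rstrip(".").replace("<", "").replace(">", "")
    total = 0.0
    for element in pattern.split("-"):
        match = ELEMENT_RE.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, min_count, _max_count = match.groups()
        allowed = allowed_residues(token)
        if not allowed:
            raise ValueError(f"element allows no standard residue: {element!r}")
        # A range such as x(2,4) is counted at its minimum length (an assumption).
        count = int(min_count) if min_count else 1
        total += count * math.log2(len(AMINO_ACIDS) / len(allowed))
    return total
```

For example, `information_content("C-x(2)-[DE]-H")` gives about 11.97 bits: each fully specified residue contributes log2(20) bits, [DE] contributes log2(10) bits and the wildcards contribute nothing. Widening [DE] to a larger set is exactly the kind of generalisation that lowers this value.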
Affiliation(s)
- I Jonassen
- Department of Informatics, University of Bergen, Hoyteknologisenteret (P.B. 7800), Bergen, N-5020, Norway
53
Erlandsen H, Abola EE, Stevens RC. Combining structural genomics and enzymology: completing the picture in metabolic pathways and enzyme active sites. Curr Opin Struct Biol 2000; 10:719-30. [PMID: 11114510 DOI: 10.1016/s0959-440x(00)00154-8]
Abstract
An important goal of structural genomics is to complete the structural analysis of all the enzymes in metabolic pathways and to understand their structural similarities and differences. A preliminary glimpse of this type of analysis was achieved with the glycolytic pathway before structural genomics efforts began, and efforts are underway for many other pathways, including that of catecholamine metabolism. Structural enzymology necessitates a complete structural characterization, even for highly homologous proteins (greater than 80% sequence homology), as every active site has distinct structural features, and it is these active site differences that distinguish one enzyme from another. Shortcuts using homology modeling cannot be taken with our current knowledge base. Each enzyme structure in a pathway needs to be determined, including structures containing bound substrates, cofactors, products and transition state analogs, in order to obtain a complete structural and functional understanding of pathway-related enzymes.
Affiliation(s)
- H Erlandsen
- The Scripps Research Institute, Department of Molecular Biology, La Jolla, CA 92037, USA
54
Abstract
Comparative protein structure prediction is limited mostly by errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all non-hydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo-energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for non-bonded atomic contacts that depend on the two atom types, their distance through space, and their separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the main-chain N, Cα, C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments, with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively.
The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.
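The selection step and the accuracy estimate described above can be sketched as follows. This is a hypothetical Python illustration, not the authors' implementation: it assumes each independent optimization yields an (energy, coordinates) pair, that all models are already superposed in a common frame, and that "structural variability" is measured as the mean pairwise RMSD among the `top_k` lowest-energy models. The names `select_loop` and `top_k` are illustrative.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points that are ALREADY superposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def select_loop(predictions, top_k=5):
    """Pick the lowest-energy loop and estimate its reliability.

    predictions: list of (energy, coords) pairs from independent optimizations.
    Returns (best_coords, spread), where spread is the mean pairwise RMSD among
    the top_k lowest-energy models; a larger spread suggests a less reliable loop.
    """
    ranked = sorted(predictions, key=lambda item: item[0])
    best_coords = ranked[0][1]
    low_energy = [coords for _, coords in ranked[:top_k]]
    pairs = list(combinations(low_energy, 2))
    spread = sum(rmsd(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return best_coords, spread
```

In practice `predictions` would hold the 500 optimizations per loop mentioned above; a small spread among the low-energy models indicates that the energy function consistently favours one conformation.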
Affiliation(s)
- A Fiser
- Laboratory of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, New York 10021, USA.
55
Zhang H, Huang K, Li Z, Banerjei L, Fisher KE, Grishin NV, Eisenstein E, Herzberg O. Crystal structure of YbaK protein from Haemophilus influenzae (HI1434) at 1.8 Å resolution: functional implications. Proteins 2000; 40:86-97. [PMID: 10813833 DOI: 10.1002/(sici)1097-0134(20000701)40:1<86::aid-prot100>3.0.co;2-y]
Abstract
Structural genomics of proteins of unknown function most straightforwardly assists with assignment of biochemical activity when the new structure resembles that of proteins whose functions are known. When a new fold is revealed, the universe of known folds is enriched, and once the function is determined by other means, novel structure-function relationships are established. The previously unannotated protein HI1434 from H. influenzae provides a hybrid example of these two paradigms. It is a member of a microbial protein family, labeled in SwissProt as YbaK and ebsC. The crystal structure at 1.8 Å resolution reported here reveals a fold that is only remotely related to the C-lectin fold, in particular to endostatin, and thus is not sufficiently similar to imply that YbaK proteins are saccharide-binding proteins. However, a crevice that may accommodate a small ligand is evident. The putative binding site contains only one invariant residue with a functional group that could play a role in catalysis, Lys46, indicating that YbaK is probably not an enzyme. Detailed sequence analysis, including a number of newly sequenced microbial organisms, highlights sequence homology to an insertion domain of unknown function in prolyl-tRNA synthetases (proRS) from prokaryotes. A HI1434-based model of the insertion domain shows that it should also contain the putative binding site. As part of a tRNA synthetase, the insertion domain is likely to be involved in oligonucleotide binding, with possible roles in recognition/discrimination or editing of prolyl-tRNA. By analogy, YbaK may also play a role in nucleotide or oligonucleotide binding, the nature of which is yet to be determined.
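Identifying invariant residues, as in the sequence analysis that singles out Lys46, amounts to finding alignment columns in which every sequence carries the same residue. A minimal Python sketch, assuming a gapped multiple alignment supplied as equal-length strings with '-' for gaps and numbering residues along one reference sequence; the function name and the toy sequences below are hypothetical and are not taken from the YbaK family.

```python
def invariant_positions(alignment, reference=0):
    """Return (residue, position) for columns conserved in every sequence.

    alignment: equal-length aligned strings, '-' marking gaps.
    Positions are 1-based residue numbers in the reference sequence.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    invariant = []
    residue_number = 0
    for column in range(length):
        ref_residue = alignment[reference][column]
        if ref_residue != "-":
            residue_number += 1
        residues = {seq[column] for seq in alignment}
        # A column is invariant only if it holds a single residue type and no gaps.
        if len(residues) == 1 and ref_residue != "-":
            invariant.append((ref_residue, residue_number))
    return invariant
```

With the toy alignment `["MKAL-V", "MRAL-I", "MKGLEV"]`, only the Met at position 1 and the Leu at position 4 of the first sequence are invariant; the gapped column is skipped even though it does not advance the reference numbering.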
Affiliation(s)
- H Zhang
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA
56
Abstract
To determine whether variable sequences (spacers) between conserved positions in a sequence motif or pattern share a consensus structure, three-dimensional structures containing PROSITE patterns with fixed-length spacers longer than three residues were analyzed. Structural similarities of a given pattern were evaluated by computing order parameters for the backbone phi and psi and the side-chain chi1 dihedral angles. To account for sequence bias when analyzing the conformational variability of the patterns, a new parameter, the bias coefficient, was introduced; it describes the number and distribution of residue types found at each position of a pattern in the structures. The results show that backbone conformational heterogeneity at a given position in a sequence motif does not necessarily correlate with residue-type variability at that position, and that long spacer regions, not only the conserved residues, can adopt a well-defined backbone conformation. Furthermore, a PROSITE pattern may be redefined to yield two or more "refined" regular expressions, each corresponding to a distinct backbone conformation. A way in which the observed structural consensus in a pattern may be employed to improve the accuracy of function prediction from sequence is suggested.
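The abstract does not spell out the order parameter formula. A common definition for a dihedral angle θ observed at the same position in N structures is S = |(1/N) Σ exp(iθ_k)|, which equals 1 when all angles coincide and approaches 0 when they are spread around the circle. The Python sketch below assumes that definition; it does not implement the paper's bias coefficient.

```python
import math


def dihedral_order_parameter(angles_deg):
    """Circular order parameter S for one dihedral (phi, psi or chi1)
    observed at the same pattern position across several structures.

    S = sqrt(<cos theta>^2 + <sin theta>^2): 1 for identical angles,
    near 0 for angles spread uniformly around the circle.
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.hypot(mean_cos, mean_sin)
```

Averaging unit vectors rather than raw angles handles the periodicity of dihedrals: 179° and -179° are only 2° apart and give S ≈ 0.9998, whereas an arithmetic mean of the raw values would misleadingly place them around 0°.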
Affiliation(s)
- K Y Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, 11529
57
Skolnick J, Fetrow JS, Kolinski A. Structural genomics and its importance for gene function analysis. Nat Biotechnol 2000; 18:283-7. [PMID: 10700142 DOI: 10.1038/73723]
Abstract
Structural genomics projects aim to solve the experimental structures of all possible protein folds. Such projects entail a conceptual shift from traditional structural biology in which structural information is obtained on known proteins to one in which the structure of a protein is determined first and the function assigned only later. Whereas the goal of converting protein structure into function can be accomplished by traditional sequence motif-based approaches, recent studies have shown that assignment of a protein's biochemical function can also be achieved by scanning its structure for a match to the geometry and chemical identity of a known active site. Importantly, this approach can use low-resolution structures provided by contemporary structure prediction methods. When applied to genomes, structural information (either experimental or predicted) is likely to play an important role in high-throughput function assignment.
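Scanning a structure for the geometry and chemical identity of a known active site reduces, in its simplest form, to checking residue types and inter-atomic distances. The Python sketch below is a brute-force illustration under assumed inputs, not the method used in the cited work: each site is a list of (residue type, (x, y, z)) entries with one representative atom per residue, and a match requires every pairwise distance to agree with the template within a hypothetical tolerance of 1.0 Å. A looser tolerance would presumably be needed for the low-resolution predicted models mentioned above.

```python
import math
from itertools import combinations, permutations


def match_site(template, atoms, tolerance=1.0):
    """Find atom assignments that reproduce a template site.

    template, atoms: lists of (residue_type, (x, y, z)), one representative atom per residue.
    Returns tuples of indices into atoms, ordered like template, whose residue types match
    and whose pairwise distances agree with the template within tolerance (in Å).
    Brute force over permutations, so only suitable for small candidate sets.
    """
    k = len(template)
    hits = []
    for combo in permutations(range(len(atoms)), k):
        # Chemical identity: residue types must match the template position by position.
        if any(atoms[i][0] != template[j][0] for j, i in enumerate(combo)):
            continue
        # Geometry: every pairwise distance must agree within the tolerance.
        if all(
            abs(math.dist(atoms[combo[a]][1], atoms[combo[b]][1])
                - math.dist(template[a][1], template[b][1])) <= tolerance
            for a, b in combinations(range(k), 2)
        ):
            hits.append(combo)
    return hits
```

For instance, a hypothetical Ser/His/Asp template matches a query whose three corresponding residues are a rigid translation of it, and the match disappears once one residue is moved several ångströms away.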
Affiliation(s)
- J Skolnick
- Laboratory of Computational Genomics, The Danforth Plant Science Center, 893 N. Warson Rd., St. Louis, MO 63141, USA.
58
Bray JE, Todd AE, Pearl FM, Thornton JM, Orengo CA. The CATH Dictionary of Homologous Superfamilies (DHS): a consensus approach for identifying distant structural homologues. Protein Eng 2000; 13:153-65. [PMID: 10775657 DOI: 10.1093/protein/13.3.153]
Abstract
A consensus approach has been developed for identifying distant structural homologues. This is based on the CATH Dictionary of Homologous Superfamilies (DHS), a database of validated multiple structural alignments annotated with consensus functional information for evolutionary protein superfamilies (URL: http://www.biochem.ucl.ac.uk/bsm/dhs). Multiple structural alignments have been generated for 362 well-populated superfamilies in the CATH structural domain database and annotated with secondary structure, physicochemical properties, functional sequence patterns and protein-ligand interaction data. Consensus functional information for each superfamily includes descriptions and keywords extracted from SWISS-PROT and the ENZYME database. The Dictionary provides a powerful resource to validate, examine and visualize key structural and functional features of each homologous superfamily. The value of the DHS for assessing functional variability and identifying distant evolutionary relationships is illustrated using the pyridoxal-5'-phosphate (PLP)-binding aspartate aminotransferase superfamily. The DHS also provides a tool for examining sequence-structure relationships for proteins within each fold group.
Affiliation(s)
- J E Bray
- Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
59
Jackson RM, Russell RB. The serine protease inhibitor canonical loop conformation: examples found in extracellular hydrolases, toxins, cytokines and viral proteins. J Mol Biol 2000; 296:325-34. [PMID: 10669590 DOI: 10.1006/jmbi.1999.3389]
Abstract
Methods for the prediction of protein function from structure are of growing importance in the age of structural genomics. Here, we focus on the problem of identifying sites of potential serine protease inhibitor interactions on the surface of proteins of known structure. Because there is no sequence conservation within canonical loops from different inhibitor families, we first compare representative loops to all fragments of equal length among proteins of known structure by calculating the main-chain RMS deviation. Fragments with an RMS deviation below a given threshold (hits) are removed if their residues have solvent accessibilities appreciably lower than those observed in the search structure. The remaining hits are further filtered to remove those occurring largely within secondary structure elements. To retain only hits of likely functional significance, the search is further restricted to extracellular protein domains. By comparing different canonical loop structures to the protein structure database, we show that the method is able to detect previously known inhibitors. In addition, we discuss potentially new canonical loop structures found in secreted hydrolases, toxins, viral proteins, cytokines and other proteins. We discuss the possible functional significance of several of the examples found, and comment on implications for the prediction of function from protein 3D structure.
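The search-and-filter procedure described above can be sketched in Python. This is an illustration under stated assumptions rather than the authors' pipeline: one point per residue (for example Cα) stands in for the main chain, a superposition-free distance-matrix RMSD (dRMSD) is swapped in for the superposed main-chain RMS deviation, and the cut-offs `rmsd_cut`, `access_ratio` and `max_ss_fraction` are hypothetical values. The extracellular-domain filter is omitted because it depends on annotation the sketch does not model.

```python
import math
from itertools import combinations


def drmsd(a, b):
    """Distance-matrix RMSD between two equal-length point lists (no superposition needed)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists of at least two points")
    pairs = list(combinations(range(len(a)), 2))
    squared = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs)
    return math.sqrt(squared / len(pairs))


def scan_chain(probe, probe_access, chain, rmsd_cut=1.0, access_ratio=0.5, max_ss_fraction=0.5):
    """Return (start, score) for fragments of one target chain that survive all filters.

    chain: dict with 'coords' (points), 'access' (relative accessibility per residue)
    and 'ss' (one secondary-structure letter per residue: H, E or C).
    """
    n = len(probe)
    hits = []
    for start in range(len(chain["coords"]) - n + 1):
        score = drmsd(probe, chain["coords"][start:start + n])
        if score > rmsd_cut:
            continue  # structurally dissimilar fragment
        access = chain["access"][start:start + n]
        if any(t < access_ratio * p for t, p in zip(access, probe_access)):
            continue  # some residue is much more buried than in the probe loop
        ss = chain["ss"][start:start + n]
        if sum(c in "HE" for c in ss) / n > max_ss_fraction:
            continue  # fragment lies largely within helices or strands
        hits.append((start, score))
    return hits
```

Running each canonical loop against every chain in a structure set and pooling the surviving `(start, score)` pairs would mirror the database comparison described in the abstract, with the same ordering of filters.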
Affiliation(s)
- R M Jackson
- Department of Biochemistry, University College, Gower Street, London, WC1E 6BT, UK.
60
Skolnick J, Fetrow JS. From genes to protein structure and function: novel applications of computational approaches in the genomic era. Trends Biotechnol 2000; 18:34-9. [PMID: 10631780 DOI: 10.1016/s0167-7799(99)01398-0]
Abstract
The genome-sequencing projects are providing a detailed 'parts list' of life. A key to comprehending this list is understanding the function of each gene and each protein at various levels. Sequence-based methods for function prediction are inadequate because of the multifunctional nature of proteins. However, just knowing the structure of the protein is also insufficient for prediction of multiple functional sites. Structural descriptors for protein functional sites are crucial for unlocking the secrets in both the sequence and structural-genomics projects.
Affiliation(s)
- J Skolnick
- Danforth Plant Science Center, Laboratory of Computational Genomics, St Louis, MO 63108, USA.
61
Abstract
Several databases of protein structural families now exist, organised according to both evolutionary relationships and common folding arrangements. Although these lag behind sequence databases in size, the prospect of structural genomics initiatives means that they may soon include representatives of many of the sequence families. To some extent, functional information can be derived from structural similarity. For some structural families, function is highly conserved, whereas for others it can only be inherited or derived on the basis of additional information (e.g. sequence patterns, common residue clusters and characteristic surface properties).
Affiliation(s)
- C A Orengo
- Biomolecular Structure and Modelling Unit, Department of Biochemistry, University College London, UK.
62
Todd AE, Orengo CA, Thornton JM. DOMPLOT: a program to generate schematic diagrams of the structural domain organization within proteins, annotated by ligand contacts. Protein Eng 1999; 12:375-9. [PMID: 10360977 DOI: 10.1093/protein/12.5.375]
Abstract
A program is described for automatically generating schematic linear representations of protein chains in terms of their structural domains. The program requires the co-ordinates of the chain, the domain assignment, PROSITE information and a file listing all intermolecular interactions in the protein structure. The output is a PostScript file in which each protein is represented by a set of linked boxes, each box corresponding to all or part of a structural domain. PROSITE motifs and residues involved in ligand interactions are highlighted. The diagrams allow immediate visualization of the domain arrangement within a protein chain, and by providing information on sequence motifs, and metal ion, ligand and DNA binding at the domain level, the program facilitates detection of remote evolutionary relationships between proteins.
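The core layout step, splitting each domain into its contiguous segments and ordering the resulting boxes along the chain, can be sketched in a few lines of Python. The sketch assumes the domain assignment is given as a mapping from a domain label to inclusive residue ranges, marks boxes containing ligand-contact residues with an asterisk, and renders plain text instead of PostScript; the function names and input format are hypothetical, not DOMPLOT's own.

```python
def domain_boxes(domains, ligand_contacts=()):
    """Split domains into contiguous boxes ordered along the chain.

    domains: dict mapping a domain label to a list of inclusive (start, end)
    residue ranges; a discontinuous domain has several ranges.
    ligand_contacts: residue numbers that contact a ligand.
    Returns (label, start, end, contacts_in_box) tuples sorted by start.
    """
    contacts = sorted(set(ligand_contacts))
    boxes = []
    for label, segments in domains.items():
        for start, end in segments:
            in_box = [r for r in contacts if start <= r <= end]
            boxes.append((label, start, end, in_box))
    boxes.sort(key=lambda box: box[1])
    return boxes


def render(boxes):
    """Text stand-in for the linked-box diagram; '*' marks boxes with ligand contacts."""
    return "-".join(
        f"[{label} {start}-{end}{'*' if in_box else ''}]"
        for label, start, end, in_box in boxes
    )
```

A discontinuous domain D1 spanning residues 1-50 and 121-160, with domain D2 inserted at 51-120, becomes three linked boxes, `[D1 1-50*]-[D2 51-120*]-[D1 121-160]`, when residues 30 and 100 contact a ligand. Making the interrupted domain visible in this way is what lets the diagram reveal domain arrangements at a glance.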
Affiliation(s)
- A E Todd
- Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK