51
Tang CL, Xie L, Koh IYY, Posy S, Alexov E, Honig B. On the Role of Structural Information in Remote Homology Detection and Sequence Alignment: New Methods Using Hybrid Sequence Profiles. J Mol Biol 2003; 334:1043-62. [PMID: 14643665 DOI: 10.1016/j.jmb.2003.10.025] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.4] [Indexed: 11/19/2022]
Abstract
Structural alignments often reveal relationships between proteins that cannot be detected using sequence alignment alone. However, profile search methods based entirely on structural alignments alone have not been found to be effective in finding remote homologs. Here, we explore the role of structural information in remote homolog detection and sequence alignment. To this end, we develop a series of hybrid multidimensional alignment profiles that combine sequence, secondary and tertiary structure information into hybrid profiles. Sequence-based profiles are profiles whose position-specific scoring matrix is derived from sequence alignment alone; structure-based profiles are those derived from multiple structure alignments. We compare pure sequence-based profiles to pure structure-based profiles, as well as to hybrid profiles that use combined sequence-and-structure-based profiles, where sequence-based profiles are used in loop/motif regions and structural information is used in core structural regions. All of the hybrid methods offer significant improvement over simple profile-to-profile alignment. We demonstrate that both sequence-based and structure-based profiles contribute to remote homology detection and alignment accuracy, and that each contains some unique information. We discuss the implications of these results for further improvements in amino acid sequence and structural analysis.
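The region-dependent combination described in this abstract (sequence-based profiles in loop/motif regions, structural information in core regions) can be illustrated with a minimal sketch. The function name `hybrid_profile`, the dictionary-per-position profile layout, and the boolean core mask are assumptions made for illustration; the abstract does not specify the authors' data structures or how core regions are assigned.

```python
def hybrid_profile(seq_profile, struct_profile, is_core):
    """Build a hybrid profile position by position.

    Hypothetical layout: each profile is a list of columns, one per alignment
    position, and each column maps a residue letter to a score. `is_core`
    flags positions assumed to lie in core structural regions.
    """
    if not (len(seq_profile) == len(struct_profile) == len(is_core)):
        raise ValueError("profiles and core mask must have equal length")
    # Structure-derived column in the core, sequence-derived column elsewhere.
    return [
        struct_col if core else seq_col
        for seq_col, struct_col, core in zip(seq_profile, struct_profile, is_core)
    ]


# Toy two-letter alphabet, three positions; only the middle position is "core".
seq_profile = [{"A": 1.0, "G": -1.0}, {"A": 0.5, "G": 0.5}, {"A": -1.0, "G": 2.0}]
struct_profile = [{"A": 0.0, "G": 0.0}, {"A": 3.0, "G": -2.0}, {"A": 0.0, "G": 0.0}]
hybrid = hybrid_profile(seq_profile, struct_profile, [False, True, False])
```

The hard switch between the two sources is the simplest possible reading of the abstract; a weighted blend per position would be an equally plausible variant.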
Affiliation(s)
- Christopher L Tang
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University, New York, NY 10032, USA
52
Skolnick J, Zhang Y, Arakaki AK, Kolinski A, Boniecki M, Szilágyi A, Kihara D. TOUCHSTONE: A unified approach to protein structure prediction. Proteins 2003; 53 Suppl 6:469-79. [PMID: 14579335 DOI: 10.1002/prot.10551] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022]
Abstract
We have applied the TOUCHSTONE structure prediction algorithm that spans the range from homology modeling to ab initio folding to all protein targets in CASP5. Using our threading algorithm PROSPECTOR that does not utilize input from metaservers, one threads against a representative set of PDB templates. If a template is significantly hit, Generalized Comparative Modeling designed to span the range from closely to distantly related proteins from the template is done. This involves freezing the aligned regions and relaxing the remaining structure to accommodate insertions or deletions with respect to the template. For all targets, consensus predicted side chain contacts from at least weakly threading templates are pooled and incorporated into ab initio folding. Often, TOUCHSTONE performs well in the CM to FR categories, with PROSPECTOR showing significant ability to identify analogous templates. When ab initio folding is done, frequently the best models are closer to the native state than the initial template. Among the particularly good predictions are T0130 in the CM/FR category, T0138 in the FR(H) category, T0135 in the FR(A) category, T0170 in the FR/NF category and T0181 in the NF category. Improvements in the approach are needed in the FR/NF and NF categories. Nevertheless, TOUCHSTONE was one of the best performing algorithms over all categories in CASP5.
Affiliation(s)
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York 14203, USA.
53
Grigoriev IV, Choi IG. Target selection for structural genomics: a single genome approach. OMICS: A Journal of Integrative Biology 2003; 6:349-62. [PMID: 12626094 DOI: 10.1089/153623102321112773] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
We describe our strategy for selecting targets for protein structure determination in the context of structural genomics of a single genome. In the course of target selection, we have studied two of the smallest microbial genomes, Mycoplasma genitalium and Mycoplasma pneumoniae. To our surprise, we found that only 71 Mycoplasma genes or their orthologues can be considered easy targets for high-throughput structural studies, far fewer than expected. We discuss the methods and criteria used for target selection and the reasons for the rarity of easy targets. First, despite the common opinion that protein folds can be predicted for only 30-50% of genes, the number of "truly unknown" structures is less than one-third. Second, due to the different codon usage, two-thirds of Mycoplasma proteins cannot be directly expressed in E. coli in a high-throughput manner and require substitution by their homologues from other organisms. Third, membrane or large multi-domain proteins are difficult targets because of solubility and size issues and often require identification and structure determination of protein domains. Finally, we propose different approaches to address the difficult targets.
Affiliation(s)
- Igor V Grigoriev
- Department of Chemistry and E.O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, USA.
54
Abstract
The ability to predict protein function from structure is becoming increasingly important as the number of structures resolved is growing more rapidly than our capacity to study function. Current methods for predicting protein function are mostly reliant on identifying a similar protein of known function. For proteins that are highly dissimilar or are only similar to proteins also lacking functional annotations, these methods fail. Here, we show that protein function can be predicted as enzymatic or not without resorting to alignments. We describe 1178 high-resolution proteins in a structurally non-redundant subset of the Protein Data Bank using simple features such as secondary-structure content, amino acid propensities, surface properties and ligands. The subset is split into two functional groupings, enzymes and non-enzymes. We use the support vector machine-learning algorithm to develop models that are capable of assigning the protein class. Validation of the method shows that the function can be predicted to an accuracy of 77% using 52 features to describe each protein. An adaptive search of possible subsets of features produces a simplified model based on 36 features that predicts at an accuracy of 80%. We compare the method to sequence-based methods that also avoid calculating alignments and predict a recently released set of unrelated proteins. The most useful features for distinguishing enzymes from non-enzymes are secondary-structure content, amino acid frequencies, number of disulphide bonds and size of the largest cleft. This method is applicable to any structure as it does not require the identification of sequence or structural similarity to a protein of known function.
Affiliation(s)
- Paul D Dobson
- Department of Biomolecular Sciences, UMIST, P.O. Box 88, Manchester M60 1QD, UK
55
Karchin R, Cline M, Mandel-Gutfreund Y, Karplus K. Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins 2003; 51:504-14. [PMID: 12784210 DOI: 10.1002/prot.10369] [Citation(s) in RCA: 154] [Impact Index Per Article: 7.3] [Indexed: 11/08/2022]
Abstract
An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold-recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs that may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two-track profile hidden Markov models (HMMs). We did not rely on a simple helix-strand-coil definition of secondary structure, but experimented with a variety of local structure descriptions, following a principled protocol to establish which descriptions are most useful for improving fold recognition and alignment quality. On a test set of 1298 nonhomologous proteins, HMMs incorporating a 3-letter STRIDE alphabet improved fold recognition accuracy by 15% over amino-acid-only HMMs and 23% over PSI-BLAST, measured by ROC-65 numbers. We compared two-track HMMs to amino-acid-only HMMs on a difficult alignment test set of 200 protein pairs (structurally similar with 3-24% sequence identity). HMMs with a 6-letter STRIDE secondary track improved alignment quality by 62%, relative to DALI structural alignments, while HMMs with an STR track (an expanded DSSP alphabet that subdivides strands into six states) improved by 40% relative to CE.
Affiliation(s)
- Rachel Karchin
- Center for Biomolecular Science and Engineering, Baskin School of Engineering, University of California, Santa Cruz 95064, USA.
56
Jackson DB, Minch E, Munro RE. Bioinformatics. EXS 2003:31-69. [PMID: 12613171 DOI: 10.1007/978-3-0348-7997-2_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
57
Mallick P, Weiss R, Eisenberg D. The directional atomic solvation energy: an atom-based potential for the assignment of protein sequences to known folds. Proc Natl Acad Sci U S A 2002; 99:16041-6. [PMID: 12461172 PMCID: PMC138561 DOI: 10.1073/pnas.252626399] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Abstract
The Directional Atomic Solvation EnergY (DASEY) is an atom-based description of the environment of an amino acid position within a known 3D protein structure. The DASEY has been developed to align and score a probe amino acid sequence to a library of template protein structures for fold assignment. DASEY is computed by summing the atomic solvation parameters of atoms falling within a tetrahedral sector, or petal, extending 16 Å along each of the four bond axes of each alpha-carbon atom of the protein. The DASEY discriminates between pairs of structurally equivalent positions and random pairs in protein structures sharing a fold but belonging to different superfamilies, unlike some previous descriptors of protein environments, such as buried area. Furthermore, the DASEY values have characteristic patterns of residue replacement, an essential feature of a successful fold assignment method. Benchmarking fold assignment with DASEY achieves coverage of 56% of sequences with 90% accuracy when probe sequences are matched to protein structural templates belonging to the same fold but to a different superfamily, an improvement of greater than 200% over a previous method.
Affiliation(s)
- Parag Mallick
- Department of Chemistry and Biochemistry, and University of California, UCLA-DOE Center for Genomics and Proteomics, Molecular Biology Institute, Howard Hughes Medical Institute, University of California, Los Angeles, CA 90095-1570, USA
58
Williams MG, Shirai H, Shi J, Nagendra HG, Mueller J, Mizuguchi K, Miguel RN, Lovell SC, Innis CA, Deane CM, Chen L, Campillo N, Burke DF, Blundell TL, de Bakker PI. Sequence-structure homology recognition by iterative alignment refinement and comparative modeling. Proteins 2002; Suppl 5:92-7. [PMID: 11835486 DOI: 10.1002/prot.1169] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Our approach to fold recognition for the fourth critical assessment of techniques for protein structure prediction (CASP4) experiment involved the use of the FUGUE sequence-structure homology recognition program (http://www-cryst.bioc.cam.ac.uk/fugue), followed by model building. We treat models as hypotheses and examine these to determine whether they explain the available data. Our method depends heavily on environment-specific substitution tables derived from our database of structural alignments of homologous proteins (HOMSTRAD, http://www-cryst.bioc.cam.ac.uk/homstrad/). FUGUE uses these tables to incorporate structural information into profiles created from HOMSTRAD alignments that are matched against a profile created for the target from multiple sequence alignment. In addition, environment-specific substitution tables are used throughout the modeling procedure and as part of the model evaluation. Annotation of sequence alignments with JOY, to reflect local structural features, proved valuable, both for modifying hypotheses, and for rejecting predictions when the expected pattern of conservation is not observed. Our stringency in rejecting incorrect predictions led us to submit a relatively small number of models, including only a low number of false positives, resulting in a high average score.
Affiliation(s)
- M G Williams
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
59
Yuan Z, Burrage K, Mattick JS. Prediction of protein solvent accessibility using support vector machines. Proteins 2002; 48:566-70. [PMID: 12112679 DOI: 10.1002/prot.10176] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.8] [Indexed: 01/08/2023]
Abstract
A Support Vector Machine learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes have been explored to find how they affect the prediction performance. Using a cut-off threshold of 15% that splits the dataset evenly (an equal number of exposed and buried residues), this method was able to achieve a prediction accuracy of 70.1% for single sequence input and 73.9% for multiple alignment sequence input, respectively. The prediction of three and more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, our results further suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis.
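The sliding-window input and the 15% exposed/buried cut-off mentioned in this abstract can be sketched as below. This covers only the feature and label preparation, not the SVM itself. The one-hot encoding, the zero padding at the sequence ends, the default window of 5, and treating exactly 15% as buried are all assumptions for illustration; the abstract does not state these details.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(sequence, window=5):
    """One-hot encode a window centred on each residue.

    Window positions that fall off either end of the sequence, and any
    non-standard residue letters, are encoded as all zeros (an assumption).
    """
    half = window // 2
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    features = []
    for centre in range(len(sequence)):
        vec = []
        for pos in range(centre - half, centre + half + 1):
            one_hot = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence) and sequence[pos] in index:
                one_hot[index[sequence[pos]]] = 1
            vec.extend(one_hot)
        features.append(vec)
    return features


def label_exposure(relative_accessibility, cutoff=15.0):
    """Two-state labels: 1 = exposed (above cutoff, in percent), 0 = buried."""
    return [1 if ra > cutoff else 0 for ra in relative_accessibility]
```

Each residue then yields a vector of `window * 20` values that could be passed, together with its label, to any SVM implementation.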
Affiliation(s)
- Zheng Yuan
- Institute for Molecular Bioscience and ARC Special Centre for Functional and Applied Genomics, The University of Queensland, Brisbane, Australia.
60
Lin AP, McAlister-Henn L. Isocitrate binding at two functionally distinct sites in yeast NAD+-specific isocitrate dehydrogenase. J Biol Chem 2002; 277:22475-83. [PMID: 11953438 DOI: 10.1074/jbc.m202534200] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022] Open
Abstract
Yeast NAD(+)-specific isocitrate dehydrogenase (IDH) is an octamer containing two types of homologous subunits. Ligand-binding analyses were conducted to examine effects of residue changes in putative catalytic and regulatory isocitrate-binding sites respectively contained in IDH2 and IDH1 subunits. Replacement of homologous serine residues in either subunit site, S98A in IDH2 or S92A in IDH1, was found to reduce by half the total number of holoenzyme isocitrate-binding sites, confirming a correlation between detrimental effects on isocitrate binding and respective kinetic defects in catalysis and allosteric activation by AMP. Replacement of both serine residues eliminates isocitrate binding and measurable catalytic activity. The putative isocitrate-binding sites of IDH1 and IDH2 contain five identical and four nonidentical residues. Reciprocal replacement of the four nonidentical residues in either or both subunits (A108R, F136Y, T241D, and N245D in IDH1 and/or R114A, Y142F, D248T, and D252N in IDH2) was found to be permissive for isocitrate binding. This provides further evidence for two types of binding sites in IDH, although the authentic residues have been shown to be necessary for normal kinetic contributions. Finally, the mutant enzymes with residue replacements in the IDH1 site were found to be unable to bind AMP, suggesting that allosteric activation is dependent both upon binding of isocitrate at the IDH1 site and upon the changes in the enzyme normally elicited by this binding.
Affiliation(s)
- An-Ping Lin
- Department of Biochemistry, University of Texas Health Science Center, San Antonio, Texas 78229-3900, USA
61
Abstract
Multiple sequence alignments are a routine tool in protein fold recognition, but multiple structure alignments are computationally less cooperative. This work describes a method for protein sequence threading and sequence-to-structure alignments that uses multiple aligned structures, the aim being to improve models from protein threading calculations. Sequences are aligned into a field due to corresponding sites in homologous proteins. On the basis of a test set of more than 570 protein pairs, the procedure does improve alignment quality, although no more than averaging over sequences. For the force field tested, the benefit of structure averaging is smaller than that of adding sequence similarity terms or a contribution from secondary structure predictions. Although there is a significant improvement in the quality of sequence-to-structure alignments, this does not directly translate to an immediate improvement in fold recognition capability.
Affiliation(s)
- Anthony J Russell
- Research School of Chemistry, Australian National University, Canberra, Australia
62
Hedman M, Deloof H, Von Heijne G, Elofsson A. Improved detection of homologous membrane proteins by inclusion of information from topology predictions. Protein Sci 2002; 11:652-8. [PMID: 11847287 PMCID: PMC2373465 DOI: 10.1110/ps.39402] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 10/17/2022]
Abstract
A total of 20%-25% of the proteins in a typical genome are helical membrane proteins. The transmembrane regions of these proteins have markedly different properties when compared with globular proteins. This presents a problem when homology search algorithms optimized for globular proteins are applied to membrane proteins. Here we present modifications of the standard Smith-Waterman and profile search algorithms that significantly improve the detection of related membrane proteins. The improvement is based on the inclusion of information about predicted transmembrane segments in the alignment algorithm. This is done by simply increasing the alignment score if two residues predicted to belong to transmembrane segments are aligned with each other. Benchmarking over a test set of G-protein-coupled receptor sequences shows that the number of false positives is significantly reduced in this way, both when closely related and distantly related proteins are searched for.
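The scoring change described in this abstract, raising the alignment score when two residues that are both predicted to be in transmembrane segments are aligned, can be sketched on top of a basic Smith-Waterman recursion. The match/mismatch scores, the linear gap penalty, and the bonus value are illustrative assumptions; the abstract does not give the substitution matrix, gap model, or bonus size the authors used.

```python
def smith_waterman_tm(seq_a, seq_b, tm_a, tm_b,
                      match=2, mismatch=-1, gap=-2, tm_bonus=1):
    """Best local alignment score with a bonus for aligned predicted-TM residues.

    tm_a and tm_b are per-residue booleans (True = predicted transmembrane).
    All scoring parameters are hypothetical placeholders.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            # Bonus only when both aligned residues are predicted TM.
            if tm_a[i - 1] and tm_b[j - 1]:
                score += tm_bonus
            H[i][j] = max(
                0,                        # start a new local alignment
                H[i - 1][j - 1] + score,  # align the two residues
                H[i - 1][j] + gap,        # gap in seq_b
                H[i][j - 1] + gap,        # gap in seq_a
            )
            best = max(best, H[i][j])
    return best
```

Setting `tm_bonus=0` recovers the unmodified local alignment score, which makes it easy to compare the two scoring schemes on the same pair of sequences.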
Affiliation(s)
- Maria Hedman
- Stockholm Bioinformatics Center, SCFAB, Stockholm University, SE-10691, Stockholm, Sweden
63
Visiers I, Ballesteros JA, Weinstein H. Three-dimensional representations of G protein-coupled receptor structures and mechanisms. Methods Enzymol 2002; 343:329-71. [PMID: 11665578 DOI: 10.1016/s0076-6879(02)43145-x] [Citation(s) in RCA: 151] [Impact Index Per Article: 6.9] [Indexed: 12/24/2022]
Affiliation(s)
- Irache Visiers
- Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York 10029, USA
64
Abstract
One of the most central methods in bioinformatics is the alignment of two protein or DNA sequences. However, so far large-scale benchmarks examining the quality of these alignments are scarce. On the other hand, several recent large-scale studies of the capacity of different methods to identify related sequences have led to new insights about the performance of fold recognition methods. To increase our understanding of fold recognition methods, we present a large-scale benchmark of alignment quality. We compare alignments from several different alignment methods, including sequence alignments, hidden Markov models, PSI-BLAST, CLUSTALW, and threading methods. For most methods, the alignment quality increases significantly at about 20% sequence identity. The difference in alignment quality between different methods is quite small, and the main difference lies in the exact position of the sharp rise in alignment quality, that is, around 15-20% sequence identity. The alignments are improved by using structural information. In general, the best alignments are obtained by methods that use predicted secondary structure information and sequence profiles obtained from PSI-BLAST. One interesting observation is that for different pairs many different methods create the best alignments. This finding implies that if a method existed that could select the best alignment method for each pair, a significant improvement in alignment quality could be gained.
Affiliation(s)
- Arne Elofsson
- Stockholm Bioinformatics Center, Stockholm University, SE-10691, Stockholm, Sweden.
65
Al-Lazikani B, Sheinerman FB, Honig B. Combining multiple structure and sequence alignments to improve sequence detection and alignment: application to the SH2 domains of Janus kinases. Proc Natl Acad Sci U S A 2001; 98:14796-801. [PMID: 11752426 PMCID: PMC64938 DOI: 10.1073/pnas.011577898] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022] Open
Abstract
In this paper, an approach is described that combines multiple structure alignments and multiple sequence alignments to generate sequence profiles for protein families. First, multiple sequence alignments are generated from sequences that are closely related to each sequence of known three-dimensional structure. These alignments then are merged through a multiple structure alignment of family members of known structure. The merged alignment is used to generate a Hidden Markov Model for the family in question. The Hidden Markov Model can be used to search for new family members or to improve alignments for distantly related family members that already have been identified. Application of a profile generated for SH2 domains indicates that the Janus family of nonreceptor protein tyrosine kinases contains SH2 domains. This conclusion is strongly supported by the results of secondary structure-prediction programs, threading calculations, and the analysis of comparative models generated for these domains. One of the Janus kinases, human TYK2, has an SH2 domain that contains a histidine instead of the conserved arginine at the key phosphotyrosine-binding position, betaB5. Calculations of the pK(a) values of the betaB5 arginines in a number of SH2 domains and of the betaB5 histidine in a homology model of TYK2 suggest that this histidine is likely to be neutral around pH 7, thus indicating that it may have lost the ability to bind phosphotyrosine. If this indeed is the case, TYK2 may contain a domain with an SH2 fold that has a modified binding specificity.
Affiliation(s)
- B Al-Lazikani
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA
66
Pazos F, Heredia P, Valencia A, de las Rivas J. Threading structural model of the manganese-stabilizing protein PsbO reveals presence of two possible beta-sandwich domains. Proteins 2001; 45:372-81. [PMID: 11746685 DOI: 10.1002/prot.10012] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
The manganese-stabilizing protein (PsbO) is an essential component of photosystem II (PSII) and is present in all oxyphotosynthetic organisms. PsbO allows correct water splitting and oxygen evolution by stabilizing the reactions driven by the manganese cluster. Despite its important role, its structure and detailed functional mechanism are still unknown. In this article we propose a structural model based on fold recognition and molecular modeling. This model has additional support from a study of the distribution of characteristics of the PsbO sequence family, such as the distribution of conserved, apolar, tree-determinant, and correlated positions. Our threading results consistently showed PsbO as an all-beta protein, with two homologous beta domains of approximately 120 amino acids linked by a flexible Proline-Glycine-Glycine (PGG) motif. These features are compatible with a general elongated and flexible architecture, in which the two domains form a sandwich-type structure with Greek key topology. The first domain is predicted to include 8 to 9 beta-strands, the second domain 6 to 7 beta-strands. An Ig-like beta-sandwich structure was selected as a template to build the 3-D model. The second domain has, between the strands, long loops rich in Pro and Gly that are difficult to model. One of these long loops includes a highly conserved region (between P148 and P174) and a short alpha-helix (between E181 and N188). These regions are characteristic parts of PsbO and show that the second domain is not so similar to the template. Overall, the model was able to account for much of the experimental data reported by several authors, and it would allow the detection of key residues and regions that are proposed in this article as essential for the structure and function of PsbO.
Affiliation(s)
- F Pazos
- Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
67
Abstract
Conventional fold recognition techniques rely mainly on the analysis of the entire sequence of a protein. We present an MBA method to improve the performance of any conventional sequence-based fold assignment. The method uses sequence motifs, such as those defined in the Prosite database, and the SwissProt annotation of the fold library. When combined with a simple SDP method, the coverage of MBA is comparable to the results obtained with PSI-BLAST. However, the set of MBA predictions is significantly different from that of PSI-BLAST, leading to a 40% increase in coverage for the combined MBA/PSI-BLAST method. The MBA approach can be easily adapted to include the results of sequence-independent function prediction methods and alternative motif and annotation databases. The method is available through the web server located at http://www.doe-mbi.ucla.edu/mba.
Affiliation(s)
- L Salwinski
- Department of Chemistry, UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, UCLA, Los Angeles, California 90095-1570, USA
68
Minchiotti G, Manco G, Parisi S, Lago CT, Rosa F, Persico MG. Structure-function analysis of the EGF-CFC family member Cripto identifies residues essential for nodal signalling. Development 2001; 128:4501-10. [PMID: 11714675 DOI: 10.1242/dev.128.22.4501] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.1] [Indexed: 11/20/2022]
Abstract
cripto is the founding member of the family of EGF-CFC genes, a class of extracellular factors essential for early vertebrate development. In this study we show that injection of Cripto recombinant protein in mid to late zebrafish Maternal-Zygotic one-eyed pinhead (MZoep) blastulae was able to fully rescue the mutant phenotype, thus providing the first direct evidence that Cripto activity can be added extracellularly to recover oep-encoded function in zebrafish early embryos. Moreover, 15 point mutations and two deletion mutants were generated to assess in vivo their functional relevance by comparing the ability of cripto wild-type and mutant RNAs to rescue the zebrafish MZoep mutant. From this study we concluded that the EGF-CFC domain is sufficient for Cripto biological activity and identified ten point mutations with a functional defective phenotype, two of which, located in the EGF-like domain, correspond to loss-of-function mutations. Finally, we have developed a three-dimensional structural model of Cripto protein and used it as a guide to predict amino acid residues potentially implicated in protein-protein interaction.
Affiliation(s)
- G Minchiotti
- International Institute of Genetics and Biophysics, CNR, Via G. Marconi 12, 80125 Naples, Italy.
69
Lundström J, Rychlewski L, Bujnicki J, Elofsson A. Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci 2001; 10:2354-62. [PMID: 11604541 PMCID: PMC2374055 DOI: 10.1110/ps.08501] [Citation(s) in RCA: 255] [Impact Index Per Article: 11.1] [Indexed: 10/14/2022]
Abstract
During recent years many protein fold recognition methods have been developed, based on different algorithms and using various kinds of information. To examine the performance of these methods several evaluation experiments have been conducted. These include blind tests in CASP/CAFASP, large scale benchmarks, and long-term, continuous assessment with newly solved protein structures. These studies confirm the expectation that for different targets different methods produce the best predictions, and the final prediction accuracy could be improved if the available methods were combined in a perfect manner. In this article a neural-network-based consensus predictor, Pcons, is presented that attempts this task. Pcons attempts to select the best model out of those produced by six prediction servers, each using different methods. Pcons translates the confidence scores reported by each server into uniformly scaled values corresponding to the expected accuracy of each model. The translated scores as well as the similarity between models produced by different servers is used in the final selection. According to the analysis based on two unrelated sets of newly solved proteins, Pcons outperforms any single server by generating approximately 8%-10% more correct predictions. Furthermore, the specificity of Pcons is significantly higher than for any individual server. From analyzing different input data to Pcons it can be shown that the improvement is mainly attributable to measurement of the similarity between the different models. Pcons is freely accessible for the academic community through the protein structure-prediction metaserver at http://bioinfo.pl/meta/.
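The abstract attributes most of the gain to measuring similarity between the models returned by different servers. The sketch below illustrates only that idea, by picking the model that agrees most with the others on average. It is not Pcons: the neural network and the rescaling of server confidence scores are omitted, and both function names and the toy similarity measure (fraction of query positions mapped to the same template residue index) are assumptions.

```python
def shared_fraction(model_a, model_b):
    """Toy similarity: fraction of positions assigned the same value.

    Each model is a hypothetical list giving, per query position, the template
    residue index it is aligned to.
    """
    same = sum(1 for x, y in zip(model_a, model_b) if x == y)
    return same / max(len(model_a), len(model_b))


def pick_consensus(models, similarity=shared_fraction):
    """Return the index of the model with the highest mean similarity to the rest."""
    if len(models) < 2:
        raise ValueError("need at least two models to form a consensus")

    def mean_similarity(i):
        others = [similarity(models[i], m) for j, m in enumerate(models) if j != i]
        return sum(others) / len(others)

    return max(range(len(models)), key=mean_similarity)


# Three toy models from three hypothetical servers.
models = [[1, 2, 3, 4], [1, 2, 3, 9], [7, 8, 3, 9]]
chosen = pick_consensus(models)
```

A structural similarity measure between full 3D models would replace `shared_fraction` in any realistic setting.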
Collapse
Affiliation(s)
- J Lundström
- Stockholm Bioinformatics Center, Stockholm University, SE 10691 Stockholm, Sweden
|
70
|
Heinemann U. The Berlin "protein structure factory" initiative: a technology-oriented approach to structural genomics. Ernst Schering Res Found Workshop 2001:101-21. [PMID: 11394041 DOI: 10.1007/978-3-662-04645-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- U Heinemann
- Forschungsgruppe Kristallographie, Max-Delbrück-Center for Molecular Medicine, Robert Rössle-Strasse 10, 13122 Berlin, Germany
|
71
|
Rigden DJ, Monteiro AC, Grossi de Sá MF. The protease inhibitor chagasin of Trypanosoma cruzi adopts an immunoglobulin-type fold and may have arisen by horizontal gene transfer. FEBS Lett 2001; 504:41-4. [PMID: 11522293 DOI: 10.1016/s0014-5793(01)02753-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 10/27/2022]
Abstract
Chagasin, a protein from Trypanosoma cruzi, is the first member of a new family of tight binding cysteine protease inhibitors [Monteiro, A.C.S., Abrahamson, M., Lima, A.P.C., Vannier-Santos, M.A. and Scharfstein, J. (2001) J. Cell Sci., in press] [corrected]. Despite its lack of significant sequence identity with known proteins, convincing structural models, using variable light chain templates, could be constructed on the basis of threading results. Experimental support for the final structure came from inhibition data for overlapping oligopeptides spanning the chagasin sequence. Chagasin therefore exemplifies a new protease inhibitor structural class and a new natural use for an immunoglobulin-like domain. Limited sequence resemblance suggests that chagasin may represent the result of a rare horizontal gene transfer from host to parasite.
Affiliation(s)
- D J Rigden
- National Centre of Genetic Resources and Biotechnology, Cenargen/Embrapa, S.A.I.N. Parque Rural, Final W5 Norte, 70770-900, Brasilia, Brazil.
|
72
|
Cristobal S, Zemla A, Fischer D, Rychlewski L, Elofsson A. A study of quality measures for protein threading models. BMC Bioinformatics 2001; 2:5. [PMID: 11545673 PMCID: PMC55330 DOI: 10.1186/1471-2105-2-5] [Citation(s) in RCA: 148] [Impact Index Per Article: 6.4] [Received: 04/09/2001] [Accepted: 08/01/2001] [Indexed: 11/10/2022]
Abstract
BACKGROUND Prediction of protein structures is one of the fundamental challenges in biology today. To fully understand how well different prediction methods perform, it is necessary to use measures that evaluate their performance. Every two years, starting in 1994, the CASP (Critical Assessment of protein Structure Prediction) process has been organized to evaluate the ability of different predictors to blindly predict the structure of proteins. To capture different features of the models, several measures have been developed during the CASP processes. However, these measures have not been examined in detail before. In an attempt to develop fully automatic measures that can be used in CASP, as well as in other types of benchmarking experiments, we have compared twenty-one measures. These include the measures used in CASP3 and CASP2 as well as measures introduced later. We have studied their ability to distinguish between the better and worse models submitted to CASP3 and the correlation between them.
RESULTS Using a small set of 1340 models for 23 different targets we show that most methods correlate with each other. Most pairs of measures show a correlation coefficient of about 0.5. The correlation is slightly higher for measures of similar types. We found that a significant problem when developing automatic measures is how to deal with proteins of different length. Comparison between different measures is also complicated, as many measures depend on the size of the target. We show that the manual assessment can be reproduced to about 70% using automatic measures. Alignment-independent measures detect slightly more of the models with the correct fold, while alignment-dependent measures agree better when selecting the best models for each target. Finally we show that using automatic measures would, to a large extent, reproduce the assessors' ranking of the predictors at CASP3.
CONCLUSIONS We show that given a sufficient number of targets the manual and automatic measures would have given almost identical results at CASP3. If the intent is to reproduce the type of scoring done by the manual assessor in CASP3, the best approach might be to use a combination of alignment-independent and alignment-dependent measures, as used in several recent studies.
Affiliation(s)
- Susana Cristobal
- Cell and Molecular Biology Department, Box 596. BMC Uppsala University, SE-751 24 Uppsala, Sweden
- Adam Zemla
- Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550-9234 USA
- Daniel Fischer
- Department Bioinformatics/Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
- Leszek Rychlewski
- International Institute of Molecular and Cell Biology, Ks. Trojdena 4, 02-109 Warsaw, Poland
- Arne Elofsson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
|
73
|
Grigoriev IV, Zhang C, Kim SH. Sequence-based detection of distantly related proteins with the same fold. Protein Eng 2001; 14:455-8. [PMID: 11522917 DOI: 10.1093/protein/14.7.455] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Affiliation(s)
- I V Grigoriev
- Department of Chemistry, University of California, Berkeley, CA 94720, USA
|
74
|
Shi J, Blundell TL, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001; 310:243-57. [PMID: 11419950 DOI: 10.1006/jmbi.2001.4762] [Citation(s) in RCA: 922] [Impact Index Per Article: 40.1] [Indexed: 11/22/2022]
Abstract
FUGUE, a program for recognizing distant homologues by sequence-structure comparison (http://www-cryst.bioc.cam.ac.uk/fugue/), has three key features. (1) Improved environment-specific substitution tables. Substitutions of an amino acid in a protein structure are constrained by its local structural environment, which can be defined in terms of secondary structure, solvent accessibility, and hydrogen bonding status. The environment-specific substitution tables have been derived from structural alignments in the HOMSTRAD database (http://www-cryst.bioc.cam.ac.uk/homstrad/). (2) Automatic selection of alignment algorithm with detailed structure-dependent gap penalties. FUGUE uses the global-local algorithm to align a sequence-structure pair when they greatly differ in length and uses the global algorithm in other cases. The gap penalty at each position of the structure is determined according to its solvent accessibility, its position relative to the secondary structure elements (SSEs) and the conservation of the SSEs. (3) Combined information from both multiple sequences and multiple structures. FUGUE is designed to align multiple sequences against multiple structures to enrich the conservation/variation information. We demonstrate that the combination of these three key features implemented in FUGUE improves both homology recognition performance and alignment accuracy.
Affiliation(s)
- J Shi
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Old Addenbrookes Site, Cambridge, CB2 1GA, UK
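The algorithm-selection rule in feature (2) of the FUGUE abstract can be sketched as a simple length comparison. This is only an illustration: the abstract does not say how large a length difference counts as "greatly differ", so the `max_ratio` cutoff and the function name below are hypothetical and are not taken from FUGUE itself.

```python
def choose_alignment_mode(seq_len: int, struct_len: int, max_ratio: float = 1.5) -> str:
    """Pick an alignment mode from the two lengths.

    Returns "global-local" when the lengths differ greatly and "global"
    otherwise. max_ratio is a hypothetical cutoff chosen for illustration.
    """
    if seq_len <= 0 or struct_len <= 0:
        raise ValueError("lengths must be positive")
    longer = max(seq_len, struct_len)
    shorter = min(seq_len, struct_len)
    # A large length ratio means one partner likely covers only part of the other.
    return "global-local" if longer / shorter > max_ratio else "global"


if __name__ == "__main__":
    print(choose_alignment_mode(100, 110))
    print(choose_alignment_mode(100, 300))
```

A ratio is used rather than an absolute difference so that the same cutoff behaves sensibly for both short and long proteins; that choice is also an assumption.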
|
75
|
Wrabl JO, Larson SA, Hilser VJ. Thermodynamic propensities of amino acids in the native state ensemble: implications for fold recognition. Protein Sci 2001; 10:1032-45. [PMID: 11316884 PMCID: PMC2374190 DOI: 10.1110/ps.01601] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.1] [Received: 01/02/2001] [Revised: 02/26/2001] [Accepted: 02/26/2001] [Indexed: 10/17/2022]
Abstract
An amino acid sequence, in the context of the solvent environment, contains all of the thermodynamic information necessary to encode a three-dimensional protein structure. To investigate the relationship between an amino acid sequence and its corresponding protein fold, a database of thermodynamic stability information was assembled that spanned 2951 residues from 44 nonhomologous proteins. This information was obtained using the COREX algorithm, which computes an ensemble-based description of the native state of a protein. It was observed that amino acid types partitioned unequally into high, medium, and low thermodynamic stability environments. Furthermore, these distributions were reproducible and were significantly different than those expected from random partitioning. To assess the structural importance of the distributions, simple fold-recognition experiments were performed based on a 3D-1D scoring matrix containing only COREX residue stability information. This procedure was able to recover amino acid sequences corresponding to correct target structures more effectively than scoring matrices derived from randomized data. High-scoring sequences were often aligned correctly with their corresponding target profiles, suggesting that calculated thermodynamic stability profiles have the potential to encode sequence information. As a control, identical fold-recognition experiments were performed on the same database of proteins using DSSP secondary structure information in the scoring matrix, instead of COREX residue stability information. The comparable performance of both approaches suggested that COREX residue stability information and secondary structure information could be of equivalent utility in more sophisticated fold-recognition techniques. The results of this work are a consequence of the idea that amino acid sequences fold not into single, rigidly stable structures but rather into thermodynamic ensembles best represented by a time-averaged structure.
Affiliation(s)
- J O Wrabl
- Department of Human Biological Chemistry & Genetics and Sealy Center for Structural Biology, University of Texas Medical Branch, Galveston, Texas 77555-1055, USA
|
76
|
Abstract
Improved sequence alignment at low pairwise identity is important for identifying potential remote homologues in database searches and for obtaining accurate alignments as a prelude to modeling structures by homology. Our work is motivated by two observations: structural data provide superior training examples for developing techniques to improve the alignment of remote homologues; and general substitution patterns for remote homologues differ from those of closely related proteins. We introduce a new set of amino acid residue interchange matrices built from structural superposition data. These matrices exploit known structural homology as a means of characterizing the effect evolution has on residue-substitution profiles. Given their origin, it is not surprising that the individual residue-residue interchange frequencies are chemically sensible. The structural interchange matrices show a significant increase both in pairwise alignment accuracy and in functional annotation/fold recognition accuracy across distantly related sequences. We demonstrate improved pairwise alignment by using superpositions of homologous domains extracted from a structural database as a gold standard and go on to show an increase in fold recognition accuracy using a database of homologous fold families. This was applied to the unassigned open reading frames from the genome of Helicobacter pylori to identify five matches, two of which are not represented by new annotations in the sequence databases. In addition, we describe a new cyclic permutation strategy to identify distant homologues that experienced gene duplication and subsequent deletions. Using this method, we have identified a potential homologue to one additional previously unassigned open reading frame from the H. pylori genome.
Affiliation(s)
- J D Blake
- Department of Cellular and Molecular Pharmacology, University of California, Box 0450, San Francisco, CA 94143, USA
|
77
|
Heinemann U, Frevert J, Hofmann K, Illing G, Maurer C, Oschkinat H, Saenger W. An integrated approach to structural genomics. Prog Biophys Mol Biol 2001; 73:347-62. [PMID: 11063780 DOI: 10.1016/s0079-6107(00)00009-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.9] [Indexed: 10/18/2022]
Abstract
Structural genomics aims at determining a set of protein structures that will represent all domain folds present in the biosphere. These structures can be used as the basis for the homology modelling of the majority of all remaining protein domains or, indeed, proteins. Structural genomics therefore promises to provide a comprehensive structural description of the protein universe. To achieve this, a broad scientific effort is required. The Berlin-based "Protein Structure Factory" (PSF) plans to contribute to this effort by setting up a local infrastructure for the low-cost, high-throughput analysis of soluble human proteins. In close collaboration with the German Human Genome Project (DHGP) protein-coding genes will be expressed in Escherichia coli or yeast. Affinity-tagged proteins will be purified semi-automatically for biophysical characterization and structure analysis by X-ray diffraction methods and NMR spectroscopy. In all steps of the structure analysis process, possibilities for automation, parallelization and standardization will be explored. Major new facilities that are created for the PSF include a robotic station for large-scale protein crystallization, an NMR center and an experimental station for protein crystallography at the synchrotron storage ring BESSY II in Berlin.
Affiliation(s)
- U Heinemann
- Forschungsgruppe Kristallographie, Max-Delbrück-Centrum für Molekulare Medizin, Robert-Rössle-Strasse 10, 13122, Berlin, Germany.
|
78
|
Abstract
The GGDEF domain is detected in many prokaryotic proteins, most of which are of unknown function. Several bacteria carry 12-22 different GGDEF homologues in their genomes. Conducting extensive profile-based searches, we detect statistically supported sequence similarity between GGDEF domain and adenylyl cyclase catalytic domain. From this homology, we deduce that the prokaryotic GGDEF domain is a regulatory enzyme involved in nucleotide cyclization, with the fold similar to that of the eukaryotic cyclase catalytic domain. This prediction correlates with the functional information available on two GGDEF-containing proteins, namely diguanylate cyclase and phosphodiesterase A of Acetobacter xylinum, both of which regulate the turnover of cyclic diguanosine monophosphate. Domain architecture analysis shows that GGDEF is typically present in multidomain proteins containing regulatory domains of signaling pathways or protein-protein interaction modules. Evolutionary tree analysis indicates that GGDEF/cyclase superfamily forms a large diversified cluster of orthologous proteins present in bacteria, archaea, and eukaryotes. Proteins 2001;42:210-216.
Affiliation(s)
- J Pei
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
|
79
|
Abstract
A homology-based structure prediction method ideally gives both a correct fold assignment and an accurate query-template alignment. In this article we show that the combination of two existing methods, PSI-BLAST and threading, leads to significant enhancement in the success rate of fold recognition. The combined approach, termed COBLATH, also yields much higher alignment accuracy than found in previous studies. It consists of two-way searches both by PSI-BLAST and by threading. In the PSI-BLAST portion, a query is used to search for hits in a library of potential templates and, conversely, each potential template is used to search for hits in a library of queries. In the threading portion, the scoring function is the sum of a sequence profile and a 6 × 6 substitution matrix between predicted query and known template secondary structure and solvent exposure. "Two-way" in threading means that the query's sequence profile is used to match the sequences of all potential templates and the sequence profiles of all potential templates are used to match the query's sequence. When tested on a set of 533 nonhomologous proteins, COBLATH was able to assign folds for 390 (73%). Among these 390 queries, 265 (68%) had root-mean-square deviations (RMSDs) of less than 8 Å between predicted and actual structures. Such a high success rate and accuracy make COBLATH an ideal tool for structural genomics.
Affiliation(s)
- Y Shan
- Department of Physics, Drexel University, Philadelphia, Pennsylvania 19104, USA
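The threading score described in the abstract above, a sequence-profile term plus a 6 × 6 structural-state term, can be sketched per aligned position as follows. The abstract does not define the six states; this sketch assumes they are three secondary-structure classes crossed with two exposure classes. The state labels, the placeholder matrix values, and all function names are hypothetical and do not reproduce COBLATH's actual parameters.

```python
# Assumed state space: 3 secondary-structure classes x 2 exposure classes = 6 states.
STATES = [(ss, exp) for ss in "HEC" for exp in ("buried", "exposed")]
STATE_INDEX = {state: i for i, state in enumerate(STATES)}


def toy_state_matrix():
    """Placeholder 6 x 6 matrix: +1 for identical states, 0 otherwise."""
    n = len(STATES)
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def position_score(profile_row, template_residue, query_state, template_state, state_matrix):
    """Score one aligned position as profile term + structural-state term.

    profile_row maps amino acid letters to the query profile score at this
    position; the states are (secondary structure, exposure) tuples.
    """
    seq_term = profile_row[template_residue]
    struct_term = state_matrix[STATE_INDEX[query_state]][STATE_INDEX[template_state]]
    return seq_term + struct_term


if __name__ == "__main__":
    row = {"A": 2.0, "G": -1.0}
    matrix = toy_state_matrix()
    print(position_score(row, "A", ("H", "buried"), ("H", "buried"), matrix))
    print(position_score(row, "A", ("H", "buried"), ("E", "exposed"), matrix))
```

In the "two-way" setting described in the abstract, the same function would be applied once with the query's profile against template residues and once with each template's profile against query residues.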
|
80
|
Abstract
The threading approach to protein fold recognition attempts to evaluate how well a query sequence fits into an already-solved fold. 3D-1D threaders rely on matching 1-dimensional strings of 3-dimensional information predicted from the query sequence with corresponding features of the target structure. In many cases this is combined with a sequence comparison. The combination of sequence and structure information has been shown to improve the accuracy of fold recognition, relative to the exclusive use of sequence or structure. In this paper, we review progress made since the introduction of threading methods a decade ago, highlighting recent advances. We focus on two emerging methods that are unconventional 3D-1D threaders: proximity correlation matrices and parallel cascade identification.
Affiliation(s)
- R David
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
|
81
|
Jung J, Lee B. Use of residue pairs in protein sequence-sequence and sequence-structure alignments. Protein Sci 2000; 9:1576-88. [PMID: 10975579 PMCID: PMC2144723 DOI: 10.1110/ps.9.8.1576] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 10/21/2022]
Abstract
Two new sets of scoring matrices are introduced: H2 for the protein sequence comparison and T2 for the protein sequence-structure correlation. Each element of H2 or T2 measures the frequency with which a pair of amino acid types in one protein, k-residues apart in the sequence, is aligned with another pair of residues, of given amino acid types (for H2) or in given structural states (for T2), in other structurally homologous proteins. There are four types, corresponding to the k-values of 1 to 4, for both H2 and T2. These matrices were set up using a large number of structurally homologous protein pairs, with little sequence homology between the pair, that were recently generated using the structure comparison program SHEBA. The two scoring matrices were incorporated into the main body of the sequence alignment program SSEARCH in the FASTA package and tested in a fold recognition setting in which a set of 107 test sequences were aligned to each of a panel of 3,539 domains that represent all known protein structures. Six procedures were tested: the straight Smith-Waterman (SW) and FASTA procedures, which used the Blosum62 single residue type substitution matrix; BLAST and PSI-BLAST procedures, which also used the Blosum62 matrix; PASH, which used Blosum62 and H2 matrices; and PASSC, which used Blosum62, H2, and T2 matrices. All procedures gave similar results when the probe and target sequences had greater than 30% sequence identity. However, when the sequence identity was below 30%, a similar structure could be found for more sequences using PASSC than using any other procedure. PASH and PSI-BLAST gave the next best results.
Affiliation(s)
- J Jung
- Laboratory of Molecular Biology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
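The counting step behind an H2-style matrix, as described in the abstract above, can be sketched as tallying how often a residue pair k positions apart in one sequence is aligned with a residue pair in the other. This is a minimal illustration on a single pre-aligned pair of equal-length strings with `-` for gaps; the function name and input format are assumptions, and the normalization into scores used by the authors is not shown.

```python
from collections import Counter


def aligned_pair_counts(aligned_a: str, aligned_b: str, k: int) -> Counter:
    """Count aligned residue pairs that are k alignment columns apart.

    Keys are ((a_i, a_{i+k}), (b_i, b_{i+k})). Columns containing a gap
    character "-" in either sequence are skipped. For simplicity, k is
    measured in alignment columns, which equals sequence separation only
    where no gaps intervene.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter()
    for i in range(len(aligned_a) - k):
        pair_a = (aligned_a[i], aligned_a[i + k])
        pair_b = (aligned_b[i], aligned_b[i + k])
        if "-" in pair_a or "-" in pair_b:
            continue
        counts[(pair_a, pair_b)] += 1
    return counts


if __name__ == "__main__":
    # The abstract uses separations k = 1 to 4; one table is built per k.
    for k in range(1, 5):
        print(k, dict(aligned_pair_counts("ACDAK", "GCDGR", k)))
```

A T2-style table would follow the same pattern with the second sequence replaced by a string of structural-state labels.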
|
82
|
Kleiger G, Beamer LJ, Grothe R, Mallick P, Eisenberg D. The 1.7 Å crystal structure of BPI: a study of how two dissimilar amino acid sequences can adopt the same fold. J Mol Biol 2000; 299:1019-34. [PMID: 10843855 DOI: 10.1006/jmbi.2000.3805] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
We have extended the resolution of the crystal structure of human bactericidal/permeability-increasing protein (BPI) to 1.7 Å. BPI has two domains with the same fold, but with little sequence similarity. To understand the similarity in structure of the two domains, we compare the corresponding residue positions in the two domains by the method of 3D-1D profiles. A 3D-1D profile is a string formed by assigning each position in the 3D structure to one of 18 environment classes. The environment classes are defined by the local secondary structure, the area of the residue which is buried from solvent, and the fraction of the area buried by polar atoms. A structural alignment between the two BPI domains was used to compare the 3D-1D environments of structurally equivalent positions. Greater than 31% of the aligned positions have conserved 3D-1D environments, but only 13% have conserved residue identities. Analysis of the 3D-1D environmentally conserved positions helps to identify pairs of residues likely to be important in conserving the fold, regardless of the residue similarity. We find examples of 3D-1D environmentally conserved positions with dissimilar residues which nevertheless play similar structural roles. To generalize our findings, we analyzed four other proteins with similar structures yet dissimilar sequences. Together, these examples show that aligned pairs of dissimilar residues often share similar structural roles, stabilizing dissimilar sequences in the same fold.
Affiliation(s)
- G Kleiger
- UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, Molecular Biology Institute, Los Angeles, CA, 90095-1570, USA
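The comparison reported in the abstract above (fraction of structurally aligned positions with conserved 3D-1D environment versus conserved residue identity) reduces to counting matches over aligned positions. The sketch below assumes each domain is given as two equal-length lists restricted to structurally equivalent positions; the integer class labels and the toy data are invented for illustration and are not the BPI values.

```python
def conserved_fraction(labels_a, labels_b) -> float:
    """Fraction of aligned positions whose labels are identical.

    labels_a and labels_b hold one label per structurally equivalent
    position (environment class or residue type), in the same order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("inputs must cover the same aligned positions")
    if not labels_a:
        raise ValueError("at least one aligned position is required")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


if __name__ == "__main__":
    # Toy data: environment classes as integers in 1..18, residues as letters.
    env_domain1 = [1, 5, 5, 12, 18]
    env_domain2 = [1, 5, 7, 12, 3]
    res_domain1 = "LKVAE"
    res_domain2 = "LTIGD"
    print(conserved_fraction(env_domain1, env_domain2))
    print(conserved_fraction(res_domain1, res_domain2))
```

Positions where the environment matches but the residue does not are the ones the abstract singles out as candidates for dissimilar residues playing similar structural roles.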
|
83
|
Poole LB, Godzik A, Nayeem A, Schmitt JD. AhpF can be dissected into two functional units: tandem repeats of two thioredoxin-like folds in the N-terminus mediate electron transfer from the thioredoxin reductase-like C-terminus to AhpC. Biochemistry 2000; 39:6602-15. [PMID: 10828978 DOI: 10.1021/bi000405w] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
AhpF, the flavin-containing component of the Salmonella typhimurium alkyl hydroperoxide reductase system, catalyzes the NADH-dependent reduction of an active-site disulfide bond in the other component, AhpC, which in turn reduces hydroperoxide substrates. The amino acid sequence of the C-terminus of AhpF is 35% identical to that of thioredoxin reductase (TrR) from Escherichia coli. AhpF contains an additional 200-residue N-terminal domain possessing a second redox-active disulfide center also required for AhpC reduction. Our studies indicate that this N-terminus contains a tandem repeat of two thioredoxin (Tr)-like folds, the second of which contains the disulfide redox center. Structural and catalytic properties of independently expressed fragments of AhpF corresponding to the TrR-like C-terminus (F[208-521]) and the 2Tr-like N-terminal domain (F[1-202]) have been addressed. Enzymatic assays, reductive titrations, and circular dichroism studies of the fragments indicate that each folds properly and retains many functional properties. Electron transfer between F[208-521] and F[1-202] is, however, relatively slow (4 × 10⁴ M⁻¹ s⁻¹ at 25 °C) and nonsaturable up to 100 µM F[1-202]. TrR is nearly as efficient at F[1-202] reduction as is F[208-521], although neither the latter fragment, nor intact AhpF, can reduce Tr. An engineered mutant AhpC substrate with a fluorophore attached via a disulfide bond has been used to demonstrate that only F[1-202], and not F[208-521], is capable of electron transfer to AhpC, thereby establishing the direct role this N-terminal domain plays in mediating electron transfer between the TrR-like part of AhpF and AhpC.
Affiliation(s)
- L B Poole
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, North Carolina 27157, USA.
|
84
|
Olszewski KA, Yan L, Edwards D, Yeh T. From fold recognition to homology modeling: an analysis of protein modeling challenges at different levels of prediction complexity. Comput Chem 2000; 24:499-510. [PMID: 10816019 DOI: 10.1016/s0097-8485(99)00078-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
Abstract
An analysis of different approaches to protein structure prediction is presented based solely on the range of models submitted to the third Critical Assessment of Protein Structure Prediction (CASP3) conference. CASP conferences evaluate the current state of the art of protein structure prediction by comparing blind prediction efforts of many groups for the same set of target sequences. Target sequences may be highly similar to those with known structure or can be totally (at least superficially) sequentially dissimilar. Techniques applied to those blind predictions (over 40 targets) range from a detailed homology prediction to the detection of remote homologues well below a twilight zone of protein sequence similarity. For the CASP3 conference, we submitted 35 predictions with various levels of difficulty and complexity. Of the ten submitted homology targets, eight have been determined by experiment so far. The C-alpha RMSDs are 1.2-1.7, 2.3, and 4.6-17.9 Å for the three easy targets, two hard targets, and three very hard homology targets, respectively. Out of 18 fold recognition predictions available for analysis, we got six correct predictions, five near misses, three tough near misses and four far misses. Here we analyze successes and failures of those predictions in an attempt to identify common problems and common achievements.
|
85
|
Domingues FS, Lackner P, Andreeva A, Sippl MJ. Structure-based evaluation of sequence comparison and fold recognition alignment accuracy. J Mol Biol 2000; 297:1003-13. [PMID: 10736233 DOI: 10.1006/jmbi.2000.3615] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.0] [Indexed: 11/22/2022]
Abstract
The biological role, biochemical function, and structure of uncharacterized protein sequences are often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy. Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity. Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure-derived alignments. Using this benchmark we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.
Affiliation(s)
- F S Domingues
- Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob Haringer Strasse 3, Salzburg, A-5020, Austria
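The "alignment shift" idea in the abstract above can be sketched by comparing, for each query residue, the template position assigned by a test alignment with the position assigned by the structure-derived reference. The dictionary representation, the function names, and the tolerance parameter are assumptions made for this illustration; the benchmark's actual evaluation engine is not reproduced here.

```python
from typing import Dict


def alignment_shifts(reference: Dict[int, int], test: Dict[int, int]) -> Dict[int, int]:
    """Per-residue shift of a test alignment relative to a reference.

    Both inputs map query residue index -> template residue index.
    Only query residues aligned in both alignments receive a shift.
    """
    return {q: test[q] - reference[q] for q in reference if q in test}


def fraction_within(reference: Dict[int, int], test: Dict[int, int], max_shift: int = 0) -> float:
    """Fraction of reference-aligned residues placed within max_shift positions.

    Residues missing from the test alignment count as incorrect.
    """
    if not reference:
        raise ValueError("reference alignment is empty")
    shifts = alignment_shifts(reference, test)
    good = sum(1 for s in shifts.values() if abs(s) <= max_shift)
    return good / len(reference)


if __name__ == "__main__":
    ref = {0: 10, 1: 11, 2: 12, 3: 13}
    tst = {0: 10, 1: 11, 2: 14, 4: 15}
    print(alignment_shifts(ref, tst))
    print(fraction_within(ref, tst))
    print(fraction_within(ref, tst, max_shift=2))
```

Dividing by the number of reference-aligned residues, rather than by the residues common to both alignments, penalizes test alignments that leave structurally equivalent residues unaligned.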
|
86
|
Zhang C, Kim SH. Environment-dependent residue contact energies for proteins. Proc Natl Acad Sci U S A 2000; 97:2550-5. [PMID: 10706611 PMCID: PMC15966 DOI: 10.1073/pnas.040573597] [Citation(s) in RCA: 70] [Impact Index Per Article: 2.9] [Accepted: 12/27/1999] [Indexed: 11/18/2022]
Abstract
We examine the interactions between amino acid residues in the context of their secondary structural environments (helix, strand, and coil) in proteins. Effective contact energies for an expanded 60-residue alphabet (20 aa x three secondary structural states) are estimated from the residue-residue contacts observed in known protein structures. Similar to the prototypical contact energies for 20 aa, the newly derived energy parameters reflect mainly the hydrophobic interactions; however, the relative strength of such interactions shows a strong dependence on the secondary structural environment, with nonlocal interactions in beta-sheet structures and alpha-helical structures dominating the energy table. Environment-dependent residue contact energies outperform existing residue pair potentials in both threading and three-dimensional contact prediction tests and should be generally applicable to protein structure prediction.
Affiliation(s)
- C Zhang
- Department of Chemistry and E. O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
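The expanded 60-letter alphabet mentioned in the abstract above (20 amino acids × three secondary-structure states) can be sketched as an index mapping, so that a contact-energy table would be a 60 × 60 array addressed by these indices. The letter ordering, the H/E/C state codes, and the function names are assumptions for illustration; no energy values from the paper are reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical by one-letter code
SS_STATES = "HEC"                      # helix, strand, coil (assumed codes)


def expanded_index(residue: str, ss_state: str) -> int:
    """Map (amino acid, secondary-structure state) to an index in 0..59."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    if ss_state not in SS_STATES:
        raise ValueError(f"unknown secondary-structure state: {ss_state!r}")
    return AMINO_ACIDS.index(residue) * len(SS_STATES) + SS_STATES.index(ss_state)


def encode(sequence: str, ss_string: str) -> list:
    """Encode a sequence and its per-residue states in the 60-letter alphabet."""
    if len(sequence) != len(ss_string):
        raise ValueError("sequence and secondary-structure string differ in length")
    return [expanded_index(r, s) for r, s in zip(sequence, ss_string)]


if __name__ == "__main__":
    print(encode("ACY", "HEC"))
```

With this encoding, the energy for a contact between two residues would be looked up as `table[i][j]` for their two expanded indices, where `table` is a hypothetical 60 × 60 matrix.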
|
87
|
Zuccola HJ, Filman DJ, Coen DM, Hogle JM. The crystal structure of an unusual processivity factor, herpes simplex virus UL42, bound to the C terminus of its cognate polymerase. Mol Cell 2000; 5:267-78. [PMID: 10882068 DOI: 10.1016/s1097-2765(00)80422-0] [Citation(s) in RCA: 129] [Impact Index Per Article: 5.4] [Indexed: 11/24/2022]
Abstract
Herpes simplex virus DNA polymerase is a heterodimer composed of a catalytic subunit, Pol, and an unusual processivity subunit, UL42, which, unlike processivity factors such as PCNA, directly binds DNA. The crystal structure of a complex of the C-terminal 36 residues of Pol bound to residues 1-319 of UL42 reveals remarkable similarities between UL42 and PCNA despite contrasting biochemical properties and lack of sequence homology. Moreover, the Pol-UL42 interaction resembles the interaction between the cell cycle regulator p21 and PCNA. The structure and previous data suggest that the UL42 monomer interacts with DNA quite differently than does multimeric toroidal PCNA. The details of the structure lead to a model for the mechanism of UL42, provide the basis for drug design, and allow modeling of other proteins that lack sequence homology with UL42 or PCNA.
Affiliation(s)
- H J Zuccola
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA
|
88
|
Abstract
Proteins might have considerable structural similarities even when no evolutionary relationship of their sequences can be detected. This property is often referred to as the proteins sharing only a "fold". Of course, there are also sequences of common origin in each fold, called a "superfamily", and in them groups of sequences with clear similarities, designated "family". Developing algorithms to reliably identify proteins related at any level is one of the most important challenges in the fast growing field of bioinformatics today. However, it is not at all certain that a method proficient at finding sequence similarities performs well at the other levels, or vice versa. Here, we have compared the performance of various search methods on these different levels of similarity. As expected, we show that it becomes much harder to detect proteins as their sequences diverge. For family related sequences the best method gets 75% of the top hits correct. When the sequences differ but the proteins belong to the same superfamily this drops to 29%, and in the case of proteins with only fold similarity it is as low as 15%. We have made a more complete analysis of the performance of different algorithms than earlier studies, also including threading methods in the comparison. Using this method a more detailed picture emerges, showing multiple sequence information to improve detection on the two closer levels of relationship. We have also compared the different methods of including this information in prediction algorithms. For lower specificities, the best scheme to use is a linking method connecting proteins through an intermediate hit. For higher specificities, better performance is obtained by PSI-BLAST and some procedures using hidden Markov models. We also show that a threading method, THREADER, performs significantly better than any other method at fold recognition.
Affiliation(s)
- E Lindahl
- Royal Institute of Technology, Stockholm, SE-100 44, Sweden
|
91
|
Grigoriev IV, Kim SH. Detection of protein fold similarity based on correlation of amino acid properties. Proc Natl Acad Sci U S A 1999; 96:14318-23. [PMID: 10588703 PMCID: PMC24434 DOI: 10.1073/pnas.96.25.14318] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
An increasing number of proteins with weak sequence similarity have been found to assume similar three-dimensional fold and often have similar or related biochemical or biophysical functions. We propose a method for detecting the fold similarity between two proteins with low sequence similarity based on their amino acid properties alone. The method, the proximity correlation matrix (PCM) method, is built on the observation that the physical properties of neighboring amino acid residues in sequence at structurally equivalent positions of two proteins of similar fold are often correlated even when amino acid sequences are different. The hydrophobicity is shown to be the most strongly correlated property for all protein fold classes. The PCM method was tested on 420 proteins belonging to 64 different known folds, each having at least three proteins with little sequence similarity. The method was able to detect fold similarities for 40% of the 420 sequences. Compared with sequence comparison and several fold-recognition methods, the method demonstrates good performance in detecting fold similarities among the proteins with low sequence identity. Applied to the complete genome of Methanococcus jannaschii, the method recognized the folds for 22 hypothetical proteins.
Affiliation(s)
- I V Grigoriev
- Department of Chemistry and E. O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
|
92
|
Abstract
The study of the plant oncogene rolA has been hampered by a lack of structural information. Here we show that, despite a lack of significant sequence similarity to proteins of known structure, the rolA sequence adopts a known fold; that of the papillomavirus E2 DNA-binding domain. This fold is reliably identified by modern threading programs, which consider predicted secondary structure, but not by others. Although the rolA sequence is only around 16% identical to those of the available template structures, a structural model could be built that performed well against protein structure verification programs. The adopted strategy involved alignment corrections, justified by multiple model building and evaluation, with particular attention paid to the hydrophobic core residues. We find that rolA protein is predicted to resemble the template proteins in two key aspects; existence as a dimer and ability to bind DNA. rolA protein has recently been shown experimentally to possess DNA binding ability. This model predicts Lys 24 and Arg 27 to be involved in sequence-specific interactions and eight other residues to hydrogen-bond phosphate groups of the DNA.
Affiliation(s)
- D J Rigden
- National Centre of Genetic Resources and Biotechnology, Cenargen/Embrapa, Brasilia, Brazil.
|
93
|
Abstract
We have developed a fully automated protein design strategy that works on the entire sequence of the protein and uses a full atom representation. At each step of the procedure, an all-atom model of the protein is built using the template protein structure and the current designed sequence. The energy of the model is used to drive a Monte Carlo optimization in sequence space: random moves are either accepted or rejected based on the Metropolis criterion. We rely on the physical forces that stabilize native protein structures to choose the optimum sequence. Our energy function includes van der Waals interactions, electrostatics and an environment free energy. Successful protein design should be specific and generate a sequence compatible with the template fold and incompatible with competing folds. We impose specificity by maintaining the amino acid composition constant, based on the random energy model. The specificity of the optimized sequence is tested by fold recognition techniques. Successful sequence designs for the B1 domain of protein G, for the lambda repressor and for sperm whale myoglobin are presented. We show that each additional term of the energy function improves the performance of our design procedure: the van der Waals term ensures correct packing, the electrostatics term increases the specificity for the correct native fold, and the environment solvation term ensures a correct pattern of buried hydrophobic and exposed hydrophilic residues. For the globin family, we show that we can design a protein sequence that is stable in the myoglobin fold, yet incompatible with the very similar hemoglobin fold.
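The optimization loop described in this abstract can be sketched as a Metropolis Monte Carlo search in sequence space. The version below is a minimal sketch under stated assumptions: `toy_energy` is a hypothetical stand-in for the all-atom energy function (it only rewards hydrophobic residues at buried sites), and composition is held constant by using position swaps as the only move type. The names, the temperature, and the burial pattern are illustrative and are not taken from the paper.

```python
import math
import random

HYDROPHOBIC = set("AVILMFWC")  # crude hydrophobic class (assumption)


def toy_energy(seq, buried):
    """Hypothetical energy: -1 for each residue whose class suits its site."""
    return -sum((aa in HYDROPHOBIC) == b for aa, b in zip(seq, buried))


def design(seq, buried, steps=5000, temperature=0.5, seed=0):
    """Metropolis search over sequences with fixed amino acid composition."""
    rng = random.Random(seed)
    current = list(seq)
    e_cur = toy_energy(current, buried)
    best, e_best = current[:], e_cur
    for _ in range(steps):
        # Swapping two positions changes the sequence but not its composition.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        delta = toy_energy(current, buried) - e_cur
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e_cur += delta
            if e_cur < e_best:
                best, e_best = current[:], e_cur
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo swap
    return "".join(best), e_best


start = "KDAVELIKSGFT"
buried = [True, True, False, False, True, True,
          False, False, True, False, False, False]
designed, energy = design(start, buried)
```

Because only swaps are proposed, the designed sequence is always a permutation of the starting one, which mirrors the constant-composition constraint mentioned in the abstract.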
Affiliation(s)
- P Koehl
- Department of Structural Biology, Fairchild Building, Stanford University, Stanford, CA 94305, USA.
|
94
|
Daniel SC, Parish JH, Ison JC, Blades MJ, Findlay JB. Alignment of a sparse protein signature with protein sequences: application to fold prediction for three small globulins. FEBS Lett 1999; 459:349-52. [PMID: 10526163 DOI: 10.1016/s0014-5793(99)01238-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
A novel algorithm has been developed for scoring the match between an imprecise sparse signature and all the protein sequences in a sequence database. The method was applied to a specific problem: signatures were derived from the probable folding nucleus and positions obtained from the determined interactions that occur during the folding of three small globular proteins and points of inter-element contact and sequence comparison of the actual three-dimensional structures of the same three proteins. In the case of two of these, lysozyme and myoglobin, the residues in the folding nucleus corresponded well to the key residues spotted by examination of the structures, and in the remaining case, barnase, they did not. The diagnostic performance of the two types of signatures was compared for all three proteins. The significance of this for the application of an understanding of the protein folding mechanisms for structure prediction is discussed. The algorithm is generic and could be applied to other user-defined problems of sequence analysis.
Affiliation(s)
- S C Daniel
- School of Biochemistry and Molecular Biology, The University of Leeds, Leeds, UK
|
95
|
Takano K, Ota M, Ogasahara K, Yamagata Y, Nishikawa K, Yutani K. Experimental verification of the 'stability profile of mutant protein' (SPMP) data using mutant human lysozymes. Protein Eng 1999; 12:663-72. [PMID: 10469827 DOI: 10.1093/protein/12.8.663] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
The stability profile of mutant protein (SPMP) (Ota,M., Kanaya,S. and Nishikawa,K., 1995, J. Mol. Biol., 248, 733-738) estimates the changes in conformational stability due to single amino acid substitutions using a pseudo-energy potential developed for evaluating structure-sequence compatibility in the structure prediction method, the 3D-1D compatibility evaluation. Nine mutant human lysozymes expected to significantly increase in stability from SPMP were constructed, in order to experimentally verify the reliability of SPMP. The thermodynamic parameters for denaturation and crystal structures of these mutant proteins were determined. One mutant protein was stabilized as expected, compared with the wild-type protein. However, the others were not stabilized even though the structural changes were subtle, indicating that SPMP overestimates the increase in stability or underestimates negative effects due to substitution. The stability changes in the other mutant human lysozymes previously reported were also analyzed by SPMP. The correlation of the stability changes between the experiment and prediction depended on the types of substitution: there were some correlations for proline mutants and cavity-creating mutants, but no correlation for mutants related to side-chain hydrogen bonds. The present results may indicate some additional factors that should be considered in the calculation of SPMP, suggesting that SPMP can be refined further.
Affiliation(s)
- K Takano
- Institute for Protein Research, Osaka University, Yamadaoka, Suita, Osaka 565-0871, Japan
|
97
|
Ayers DJ, Gooley PR, Widmer-Cooper A, Torda AE. Enhanced protein fold recognition using secondary structure information from NMR. Protein Sci 1999; 8:1127-33. [PMID: 10338023 PMCID: PMC2144327 DOI: 10.1110/ps.8.5.1127] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
Abstract
NMR offers the possibility of accurate secondary structure for proteins that would be too large for structure determination. In the absence of an X-ray crystal structure, this information should be useful as an adjunct to protein fold recognition methods based on low resolution force fields. The value of this information has been tested by adding varying amounts of artificial secondary structure data and threading a sequence through a library of candidate folds. Using a literature test set, the threading method alone has only a one-third chance of producing a correct answer among the top ten guesses. With realistic secondary structure information, one can expect a 60-80% chance of finding a homologous structure. The method has then been applied to examples with published estimates of secondary structure. This implementation is completely independent of sequence homology, and sequences are optimally aligned to candidate structures with gaps and insertions allowed. Unlike work using predicted secondary structure, we test the effect of differing amounts of relatively reliable data.
Affiliation(s)
- D J Ayers
- Research School of Chemistry, Australian National University, Canberra ACT
|
98
|
de la Cruz X, Thornton JM. Factors limiting the performance of prediction-based fold recognition methods. Protein Sci 1999; 8:750-9. [PMID: 10211821 PMCID: PMC2144320 DOI: 10.1110/ps.8.4.750] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 10/19/2022]
Abstract
In the past few years, a new generation of fold recognition methods has been developed, in which the classical sequence information is combined with information obtained from secondary structure and, sometimes, accessibility predictions. The results are promising, indicating that this approach may compete with potential-based methods (Rost B et al., 1997, J Mol Biol 270:471-480). Here we present a systematic study of the different factors contributing to the performance of these methods, in particular when applied to the problem of fold recognition of remote homologues. Our results indicate that secondary structure and accessibility prediction methods have reached an accuracy level where they are not the major factor limiting the accuracy of fold recognition. The pattern degeneracy problem is confirmed as the major source of error of these methods. On the basis of these results, we study three different options to overcome these limitations: normalization schemes, mapping of the coil state into the different zones of the Ramachandran plot, and post-threading graphical analysis.
Affiliation(s)
- X de la Cruz
- Department of Biochemistry and Molecular Biology, University College, London, United Kingdom
|
99
|
Gerloff DL, Cannarozzi GM, Joachimiak M, Cohen FE, Schreiber D, Benner SA. Evolutionary, mechanistic, and predictive analyses of the hydroxymethyldihydropterin pyrophosphokinase family of proteins. Biochem Biophys Res Commun 1999; 254:70-6. [PMID: 9920734 DOI: 10.1006/bbrc.1998.9884] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
A prediction has been prepared ab initio for the secondary structure of the hydroxymethyldihydropterin pyrophosphokinase (HPPK) family of proteins starting from a set of aligned homologous protein sequences. Attempts to identify a fold by threading failed, judging by the inability to find a threading "hit" that had a secondary structure that was plausibly congruent to the predicted secondary structure for the HPPK family. Therefore, a set of tertiary structure models was assembled ab initio, where alternative models were built and used to select between alternative secondary structure models. This prediction report illustrates the importance of non-computational approaches to structure prediction at its present frontier, which is to obtain medium resolution models of tertiary structure.
Affiliation(s)
- D L Gerloff
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, California, 94143, USA
|
100
|
Ahmad KF, Engel CK, Privé GG. Crystal structure of the BTB domain from PLZF. Proc Natl Acad Sci U S A 1998; 95:12123-8. [PMID: 9770450 PMCID: PMC22795 DOI: 10.1073/pnas.95.21.12123] [Citation(s) in RCA: 236] [Impact Index Per Article: 9.1] [Received: 07/09/1998] [Accepted: 08/19/1998] [Indexed: 01/11/2023]
Abstract
The BTB domain (also known as the POZ domain) is an evolutionarily conserved protein-protein interaction motif found at the N terminus of 5-10% of C2H2-type zinc-finger transcription factors, as well as in some actin-associated proteins bearing the kelch motif. Many BTB proteins are transcriptional regulators that mediate gene expression through the control of chromatin conformation. In the human promyelocytic leukemia zinc finger (PLZF) protein, the BTB domain has transcriptional repression activity, directs the protein to a nuclear punctate pattern, and interacts with components of the histone deacetylase complex. The association of the PLZF BTB domain with the histone deacetylase complex provides a mechanism of linking the transcription factor with enzymatic activities that regulate chromatin conformation. The crystal structure of the BTB domain of PLZF was determined at 1.9 Å resolution and reveals a tightly intertwined dimer with an extensive hydrophobic interface. Approximately one-quarter of the monomer surface area is involved in the dimer intermolecular contact. These features are typical of obligate homodimers, and we expect the full-length PLZF protein to exist as a branched transcription factor with two C-terminal DNA-binding regions. A surface-exposed groove lined with conserved amino acids is formed at the dimer interface, suggestive of a peptide-binding site. This groove may represent the site of interaction of the PLZF BTB domain with nuclear corepressors or other nuclear proteins.
Affiliation(s)
- K F Ahmad
- Division of Molecular and Structural Biology, Ontario Cancer Institute, and the Department of Medical Biophysics, University of Toronto, 610 University Avenue, Toronto, Ontario, Canada M5G 2M9
|