1
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implication in protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
2
Gront D, Blaszczyk M, Wojciechowski P, Kolinski A. BioShell Threader: protein homology detection based on sequence profiles and secondary structure profiles. Nucleic Acids Res 2012; 40:W257-62. [PMID: 22693216 PMCID: PMC3394251 DOI: 10.1093/nar/gks555] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
The BioShell package has recently been extended with a web server for protein homology detection based on profile-to-profile alignment (known as 1D threading). Its aim is to assign structural templates to each domain of the query. The server uses sequence profiles that describe observed sequence variability and secondary structure profiles that give the expected probability of each secondary structure type at a given position in a protein. Three independent predictors are used to increase the rate of successful predictions. Careful evaluation shows that there is a nearly 80% chance that the query sequence belongs to the same SCOP family as the top-scoring template. The BioShell Threader server is freely available at: http://www.bioshell.pl/threader/.
Affiliation(s)
- Dominik Gront
- University of Warsaw, Faculty of Chemistry, Pasteura 1, 02-093 Warsaw, Poland.
3
Sundaramurthy P, Sreenivasan R, Shameer K, Gakkhar S, Sowdhamini R. HORIBALFRE program: Higher Order Residue Interactions Based ALgorithm for Fold REcognition. Bioinformation 2011; 7:352-9. [PMID: 22355236 PMCID: PMC3280490 DOI: 10.6026/97320630007352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2011] [Accepted: 11/24/2011] [Indexed: 11/23/2022]
Abstract
Understanding the functional and structural implications of a protein encoded in novel genes using function association or fold recognition approaches remains a challenging task in the current era of genomes, metagenomes and personal genomes. In an attempt to enhance potential-based fold-recognition methods in recognizing remote homology between proteins, we propose a new approach, "Higher Order Residue Interactions Based ALgorithm for Fold REcognition (HORIBALFRE)". Higher order residue interactions refer to a class of interactions in protein structures mediated by C(α) or C(β) atoms within a pre-defined distance cut-off. Higher order residue interactions (pairwise, triplet and quadruplet interactions) play a vital role in attaining the stable conformation of a protein structure. In HORIBALFRE, we incorporated the potential contributions from two-body (pairwise), three-body (triplet) and four-body (quadruplet) interactions to implement a new fold recognition algorithm. The core of the HORIBALFRE algorithm includes the potentials generated from a library of protein structures derived from the manually curated CAMPASS database of structure-based sequence alignments. We used Fischer's dataset, with 68 templates and 56 target sequences, derived from the SCOP database and performed one-against-all sequence alignment using TCoffee. Various potentials were derived using custom scripts and these potentials were incorporated in the HORIBALFRE algorithm. In this manuscript, we report an outline of a novel fold recognition algorithm and initial results. Our results show that inclusion of the quadruplet class of higher order residue interactions improves fold recognition.
Affiliation(s)
- Pandurangan Sundaramurthy
- National Center for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore - 560065, India
- Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee -247667, India
- Raashi Sreenivasan
- National Center for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore - 560065, India
- Centre for Biotechnology, Anna University, Chennai - 600025, India
- University of Wisconsin-Madison, Madison, WI 53706-1481, USA; Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN 55901 USA
- Khader Shameer
- National Center for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore - 560065, India
- Authors contributed equally to this work
- Sunita Gakkhar
- Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee -247667, India
- Ramanathan Sowdhamini
- National Center for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore - 560065, India
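The abstract defines higher order interactions as groups of residues whose C(α) or C(β) atoms lie within a distance cut-off. A minimal Python sketch of enumerating such two-, three- and four-body groups is shown below. It assumes a group counts only when every pair in it is within the cut-off, and the 6.5 Å default is a hypothetical value rather than one from the paper; the HORIBALFRE potentials themselves are not reproduced.

```python
import math
from itertools import combinations


def higher_order_interactions(coords, cutoff=6.5):
    """Group residues into pairwise, triplet and quadruplet interactions.

    coords: one (x, y, z) C-alpha (or C-beta) coordinate per residue.
    cutoff: hypothetical distance cut-off in angstroms (an assumption).
    A k-residue group counts only if every pair in it lies within the cut-off.
    """
    n = len(coords)
    # Precompute which residue pairs are in contact.
    close = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            close[i][j] = close[j][i] = True

    groups = {2: [], 3: [], 4: []}
    for k in groups:
        # Brute-force enumeration: fine for a sketch, O(n^4) for quadruplets.
        for combo in combinations(range(n), k):
            if all(close[a][b] for a, b in combinations(combo, 2)):
                groups[k].append(combo)
    return groups


if __name__ == "__main__":
    # Four residues on a 4 Å square plus one distant residue.
    coords = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (4, 4, 0), (100, 0, 0)]
    counts = {k: len(v) for k, v in higher_order_interactions(coords).items()}
    print(counts)  # {2: 6, 3: 4, 4: 1}
```

Per-group statistical potentials, as in HORIBALFRE, would then be summed over these groups; how they are derived is outside this sketch.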
4
Abstract
Most newly sequenced proteins are likely to adopt a similar structure to one which has already been experimentally determined. For this reason, the most successful approaches to protein structure prediction have been template-based methods. Such prediction methods attempt to identify and model the folds of unknown structures by aligning the target sequences to a set of representative template structures within a fold library. In this chapter, I discuss the development of template-based approaches to fold prediction, from the traditional techniques to the recent state-of-the-art methods. I also discuss the recent development of structural annotation databases, which contain models built by aligning the sequences from entire proteomes against known structures. Finally, I run through a practical step-by-step guide for aligning target sequences to known structures and contemplate the future direction of template-based structure prediction.
5
Powers R, Copeland JC, Germer K, Mercier KA, Ramanathan V, Revesz P. Comparison of protein active site structures for functional annotation of proteins and drug design. Proteins 2006; 65:124-35. [PMID: 16862592 DOI: 10.1002/prot.21092] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.7] [Indexed: 11/11/2022]
Abstract
Rapid and accurate functional assignment of novel proteins is increasing in importance, given the completion of numerous genome sequencing projects and the vastly expanding list of unannotated proteins. Traditionally, global primary-sequence and structure comparisons have been used to determine putative function. These approaches, however, do not emphasize similarities in active site configurations that are fundamental to a protein's activity and highly conserved relative to the global and more variable structural features. The Comparison of Protein Active Site Structures (CPASS) database and software enable the comparison of experimentally identified ligand-binding sites to infer biological function and aid in drug discovery. The CPASS database comprises the ligand-defined active sites identified in the Protein Data Bank, and the CPASS program compares these ligand-defined active sites to determine sequence and structural similarity without maintaining sequence connectivity. CPASS will compare any set of ligand-defined protein active sites, irrespective of the identity of the bound ligand.
Affiliation(s)
- Robert Powers
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska 68588, USA.
6
Qiu J, Elber R. SSALN: an alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs. Proteins 2006; 62:881-91. [PMID: 16385554 DOI: 10.1002/prot.20854] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 11/07/2022]
Abstract
In template-based modeling of protein structures, the generation of the alignment between the target and the template is a critical step that significantly affects the accuracy of the final model. This paper proposes an alignment algorithm SSALN that learns substitution matrices and position-specific gap penalties from a database of structurally aligned protein pairs. In addition to the amino acid sequence information, secondary structure and solvent accessibility information of a position are used to derive substitution scores and position-specific gap penalties. In a test set of CASP5 targets, SSALN outperforms sequence alignment methods such as a Smith-Waterman algorithm with BLOSUM50 and PSI-BLAST. SSALN also generates better alignments than PSI-BLAST in the CASP6 test set. LOOPP server prediction based on an SSALN alignment is ranked the best for target T0280_1 in CASP6. SSALN is also compared with several threading methods and sequence alignment methods on the ProSup benchmark. SSALN has the highest alignment accuracy among the methods compared. On Fischer's benchmark, SSALN performs better than CLUSTALW and GenTHREADER, and generates more alignments with accuracy >50%, >60% or >70% than FUGUE, but fewer alignments with accuracy >80% than FUGUE. All the supplemental materials can be found at http://www.cs.cornell.edu/~jianq/research.htm.
Affiliation(s)
- Jian Qiu
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
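SSALN's key ingredients are a substitution score that depends on structural context and gap penalties that vary by template position, both used inside a dynamic-programming alignment. The sketch below is a simplified Smith-Waterman local alignment with linear, per-template-position gap penalties. The substitution callable, the toy +2/-1 scores and the penalty values are assumptions for illustration, not SSALN's learned matrices or penalties.

```python
def local_align_score(query, template, sub, gap):
    """Smith-Waterman local alignment score with position-specific linear gaps.

    sub(i, j): score for query position i against template position j. In SSALN
               this would depend on residue type, secondary structure and solvent
               accessibility; here it is any caller-supplied function.
    gap[j]:    positive penalty for a gap opened next to template position j.
    """
    n, m = len(query), len(template)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,
                H[i - 1][j - 1] + sub(i - 1, j - 1),  # align query i-1 with template j-1
                H[i - 1][j] - gap[j - 1],             # query residue against a gap
                H[i][j - 1] - gap[j - 1],             # template residue against a gap
            )
            best = max(best, H[i][j])
    return best


def identity_sub(query, template):
    """Toy substitution score (hypothetical): +2 for identical residues, -1 otherwise."""
    return lambda i, j: 2.0 if query[i] == template[j] else -1.0


if __name__ == "__main__":
    q, t = "ACDE", "ACXDE"
    print(local_align_score(q, t, identity_sub(q, t), [1.0] * 5))                   # 7.0
    print(local_align_score(q, t, identity_sub(q, t), [1.0, 1.0, 10.0, 1.0, 1.0]))  # 4.0
```

Raising the penalty at template position 2 alone forbids the gap that skips `X`, which is the kind of local control position-specific penalties provide.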
7
Abstract
Here, we report a novel protein sequence descriptor-based remote homology identification method, able to infer fold relationships without the explicit knowledge of structure. In the first phase, we individually benchmarked 13 different descriptor types in fold identification experiments in a highly diverse set of protein sequences. The relevant descriptors were related to the fold class membership by using simple similarity measures in the descriptor spaces, such as the cosine angle. Our results revealed that the three best-performing sets of descriptors were the sequence-alignment-based descriptor using PSI-BLAST e-values, the descriptors based on the alignment of secondary structural elements (SSEA), and the descriptors based on the occurrence of PROSITE functional motifs. In the second phase, the three top-performing descriptors were combined to obtain a final method with improved performance, which we named DescFold. Class membership was predicted by Support Vector Machine (SVM) learning. In comparison with the individual PSI-BLAST-based descriptor, the rate of remote homology identification increased from 33.7% to 46.3%. We found that the composite set of descriptors was able to identify the true remote homolog for nearly every sixth sequence at the 95% confidence level, or some 10% more than a single PSI-BLAST search. We have benchmarked the DescFold method against several other state-of-the-art fold recognition algorithms for the 172 LiveBench-8 targets, and we concluded that it was able to add value to the existing techniques by providing a confident hit for at least 10% of the sequences not identifiable by the previously known methods.
Affiliation(s)
- Ziding Zhang
- Nestlé Research Center, BioAnalytical Science, CH-1000 Lausanne 26, Switzerland.
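In its first phase, DescFold relates descriptor vectors to fold classes with simple similarity measures such as the cosine angle. A minimal sketch of that nearest-neighbour step follows, assuming descriptors are plain numeric vectors; the fold labels and vectors are hypothetical placeholders. The final DescFold method instead combines three descriptor types with an SVM, which this sketch does not attempt.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def nearest_fold(query_desc, library):
    """Return (fold_label, similarity) of the most similar library descriptor.

    library: list of (fold_label, descriptor_vector) pairs; the labels and
    vectors used here are hypothetical, not DescFold's actual descriptors.
    """
    label, vec = max(library, key=lambda item: cosine(query_desc, item[1]))
    return label, cosine(query_desc, vec)


if __name__ == "__main__":
    library = [("fold_A", [1.0, 0.0, 0.0]), ("fold_B", [0.0, 1.0, 0.0])]
    print(nearest_fold([0.9, 0.1, 0.0], library)[0])  # fold_A
```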
8
Xu J, Li M, Kim D, Xu Y. RAPTOR: optimal protein threading by linear programming. J Bioinform Comput Biol 2003; 1:95-117. [PMID: 15290783 DOI: 10.1142/s0219720003000186] [Citation(s) in RCA: 211] [Impact Index Per Article: 10.6] [Received: 01/22/2003] [Revised: 03/07/2003] [Accepted: 03/07/2003] [Indexed: 11/18/2022]
Abstract
This paper presents a novel linear programming approach to protein three-dimensional (3D) structure prediction via threading. Based on the contact map graph of the protein 3D structure template, the protein threading problem is formulated as a large-scale integer programming (IP) problem. The IP formulation is then relaxed to a linear programming (LP) problem, and then solved by the canonical branch-and-bound method. The final solution is globally optimal with respect to energy functions. In particular, our energy function includes pairwise interaction preferences and allows variable gaps, which are two key factors in making the protein threading problem NP-hard. A surprising result is that, most of the time, the relaxed linear programs generate integral solutions directly. Our algorithm has been implemented as a software package, RAPTOR (RApid Protein Threading by Operation Research technique). Large-scale benchmark tests for fold recognition show that RAPTOR significantly outperforms other programs at the fold similarity level. The CAFASP3 evaluation, a blind and public test by the protein structure prediction community, ranks RAPTOR first among individual prediction servers in terms of recognition capability and alignment accuracy for Fold Recognition (FR) family targets. RAPTOR also performs very well in recognizing the hard Homology Modeling (HM) targets. RAPTOR was implemented at the University of Waterloo and it can be accessed at http://www.cs.uwaterloo.ca/~j3xu/RAPTOR_form.htm.
Affiliation(s)
- Jinbo Xu
- Department of Computer Science, University of Waterloo, Waterloo, Ont. N2L 3G1, Canada.
9
Zhang Z, Kochhar S, Grigorov M. Exploring the sequence-structure protein landscape in the glycosyltransferase family. Protein Sci 2003; 12:2291-302. [PMID: 14500887 PMCID: PMC2366918 DOI: 10.1110/ps.03131303] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Abstract
To understand the molecular basis of the catalytic mechanism of glycosyltransferases (GTFs), extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swiss-Prot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.
Affiliation(s)
- Ziding Zhang
- Nestlé Research Center, CH-1000 Lausanne 26, Switzerland.
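The first step of this workflow clusters sequences by similarity and keeps one representative per cluster for fold recognition. A greedy sketch is shown below, assuming pre-aligned, equal-length sequences and a hypothetical identity threshold; the abstract does not give the paper's actual clustering procedure or cut-off.

```python
def identity(a, b):
    """Fraction of identical positions; assumes pre-aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.4):
    """Greedy clustering: join the first cluster whose representative is similar enough.

    threshold is a hypothetical identity cut-off, not the value used in the paper.
    Returns (representatives, clusters).
    """
    reps, clusters = [], []
    for s in seqs:
        for k, rep in enumerate(reps):
            if identity(s, rep) >= threshold:
                clusters[k].append(s)
                break
        else:  # no representative was close enough: start a new cluster
            reps.append(s)
            clusters.append([s])
    return reps, clusters


if __name__ == "__main__":
    reps, clusters = greedy_cluster(["ACDEFG", "ACDEFH", "WWWWWW"], threshold=0.5)
    print(reps)  # ['ACDEFG', 'WWWWWW']
```

Only the representatives would then be passed to the fold recognition methods, which keeps the number of searches proportional to the number of clusters.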
10
Abstract
We have developed a new algorithm based on the mathematical theory of linear programming (LP) and implemented it in our program RAPTOR. Our new approach provides an elegant formulation of the protein-threading problem, overcomes the intractability problem of protein threading, in practice, and allows us to use existing powerful linear programming software to obtain optimal protein threading solutions. CASP5 and CAFASP3 gave us the first chance to test RAPTOR in an unbiased way. RAPTOR was ranked as the top individual (automatic) server for fold recognition by the CAFASP3 organizers. In this short article, we describe RAPTOR's LP formulation, assess RAPTOR's performance in CAFASP3/CASP5, explain why it has superseded other existing automatic individual methods, and point out its strengths, limitations, extensions, and prospects for improvement.
Affiliation(s)
- Jinbo Xu
- Department of Computer Science, University of Waterloo, Waterloo, Canada.
11
Abstract
Here we present a simplified form of threading that uses only a 20 x 20 two-body residue-based potential and a restricted number of gaps. Despite its simplicity and transparency, the Monte Carlo-based threading algorithm performs very well in a rigorous test of fold recognition. The results suggest that by simplifying and constraining the decoy space, one can achieve better fold recognition. Fold recognition results are compared with and supplemented by a PSI-BLAST search. The statistical significance of threading results is rigorously evaluated from statistics of extremes by comparison with optimal alignments of a large set of randomly shuffled sequences. The statistical theory, based on the Random Energy Model, yields a cumulative statistical parameter, epsilon, that attests to the likelihood of correct fold recognition. A large epsilon indicates a significant energy gap between the optimal alignment and decoy alignments and, consequently, a high probability that the fold is correctly recognized. For a particular number of gaps, the epsilon parameter reaches its maximal value, and the fold is recognized. As the number of gaps further increases, the likelihood of correct fold recognition drops off. This is because the decoy space is small when gaps are restricted to a small number, but the native alignment is still well approximated, whereas unrestricted increase of the number of gaps leads to rapid growth of the number of decoys and their statistical dominance over the correct alignment. It is shown that the best results are obtained when a combination of one-, two-, and three-gap threading is used. To this end, use of the epsilon parameter is crucial for rigorous comparison of results across decoy spaces with different numbers of gaps.
Affiliation(s)
- William Chen
- Department of Biophysics, Harvard University, Boston, Massachusetts, USA
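The significance test compares the threading energy of the real sequence with optimal threadings of shuffled copies of it. The sketch below uses gapless threading with a two-body contact potential and reports a plain Z-score. It is a simplified stand-in, not the paper's Monte Carlo gapped threading or its Random Energy Model epsilon parameter, and the two-letter H/P potential in the usage is a hypothetical replacement for the 20 x 20 residue potential.

```python
import random
import statistics


def thread_energy(seq, contacts, potential, offset):
    """Two-body energy of placing seq[offset:offset + L] on the template contacts."""
    return sum(potential[seq[offset + i]][seq[offset + j]] for i, j in contacts)


def best_gapless_energy(seq, contacts, potential, template_len):
    """Lowest energy over all gapless placements of the sequence on the template."""
    return min(thread_energy(seq, contacts, potential, k)
               for k in range(len(seq) - template_len + 1))


def shuffle_z_score(seq, contacts, potential, template_len, n_shuffles=200, seed=0):
    """Z-score of the real optimum against optima of shuffled sequences.

    Negative values mean the real sequence fits the template better than its
    shuffles (lower energy is better).
    """
    rng = random.Random(seed)
    e_real = best_gapless_energy(seq, contacts, potential, template_len)
    residues = list(seq)
    decoys = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, scrambled order
        decoys.append(best_gapless_energy(residues, contacts, potential, template_len))
    mu, sd = statistics.mean(decoys), statistics.pstdev(decoys)
    return (e_real - mu) / sd if sd > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical two-letter (H/P) potential in place of a 20 x 20 table.
    potential = {"H": {"H": -1.0, "P": 0.0}, "P": {"H": 0.0, "P": 0.0}}
    contacts = [(0, 1), (1, 2), (2, 3), (0, 3)]  # contacts of a 4-residue template
    print(shuffle_z_score("HHHHPPPPPPPP", contacts, potential, 4))
```

Shuffling preserves amino acid composition, so the comparison isolates how well the residue order matches the template's contacts.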
12
Klepeis JL, Floudas CA. Prediction of beta-sheet topology and disulfide bridges in polypeptides. J Comput Chem 2003; 24:191-208. [PMID: 12497599 DOI: 10.1002/jcc.10167] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]
Abstract
An ab initio method has been developed to predict beta architectures in polypeptides. The approach predicts the topology of beta-sheets and disulfide bridges through a novel superstructure-based mathematical framework originally established for chemical process synthesis problems. Two types of superstructure are introduced, both of which emanate from the principle that hydrophobic interactions drive the formation of a beta-structure. The mathematical formulation of the problem results in a set of integer linear programming (ILP) problems that can be solved to global optimality to identify the optimal beta-configuration. These ILP models can also predict a rank-ordered list of the best, second-best, third-best, etc., topologies of beta-sheets and disulfide bridges. The approach is shown to perform very well for several benchmark polypeptide systems, as well as polypeptides exhibiting challenging nonsequential beta-sheet topologies (56 to 187 amino acids).
Affiliation(s)
- J L Klepeis
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
13
Panchenko AR. Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res 2003; 31:683-9. [PMID: 12527777 PMCID: PMC140518 DOI: 10.1093/nar/gkg154] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
To improve the recognition of weak similarities between proteins a method of aligning two sequence profiles is proposed. It is shown that exploring the sequence space in the vicinity of the sequence with unknown properties significantly improves the performance of sequence alignment methods. Consistent with the previous observations the recognition sensitivity and alignment accuracy obtained by a profile-profile alignment method can be as much as 30% higher compared to the sequence-profile alignment method. It is demonstrated that the choice of score function and the diversity of the test profile are very important factors for achieving the maximum performance of the method, whereas the optimum range of these parameters depends on the level of similarity to be recognized.
Affiliation(s)
- Anna R Panchenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, Room 8N805, 8600 Rockville Pike, Bethesda, MD 20894, USA.
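A profile-profile method needs a score for comparing two profile columns, and the abstract stresses that this choice matters. Two common column scores are sketched below: a dot product of frequencies and a symmetrized log-odds score. Both are illustrative assumptions rather than the exact functions evaluated in the paper, and a full method would plug them into a dynamic-programming alignment.

```python
import math


def dot_score(col1, col2):
    """Dot product of two amino-acid frequency columns (dicts residue -> frequency)."""
    return sum(f * col2.get(aa, 0.0) for aa, f in col1.items())


def log_odds_score(col1, col2, background, pseudo=1e-3):
    """Symmetrized log-odds score between two profile columns.

    Each column's frequencies weight the log-odds of the other column against
    the background; `pseudo` is a small hypothetical pseudocount to avoid log(0).
    """
    def one_way(f, q):
        return sum(fa * math.log((q.get(aa, 0.0) + pseudo) / background[aa])
                   for aa, fa in f.items())
    return 0.5 * (one_way(col1, col2) + one_way(col2, col1))


if __name__ == "__main__":
    background = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # uniform, assumed
    hydrophobic_1 = {"L": 0.8, "I": 0.2}
    hydrophobic_2 = {"L": 0.7, "I": 0.3}
    acidic = {"D": 0.9, "E": 0.1}
    print(log_odds_score(hydrophobic_1, hydrophobic_2, background))
    print(log_odds_score(hydrophobic_1, acidic, background))
```

The log-odds form penalizes columns whose preferred residues are rare in the other profile, whereas the dot product simply scores zero for disjoint columns.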
14
Cowen L, Bradley P, Menke M, King J, Berger B. Predicting the beta-helix fold from protein sequence data. J Comput Biol 2002; 9:261-76. [PMID: 12015881 DOI: 10.1089/10665270252935458] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]
Abstract
A method is presented that uses beta-strand interactions to predict the parallel right-handed beta-helix super-secondary structural motif in protein sequences. A program called BetaWrap implements this method and is shown to score known beta-helices above non-beta-helices in the Protein Data Bank in cross-validation. It is demonstrated that BetaWrap learns each of the seven known SCOP beta-helix families, when trained primarily on beta-structures that are not beta-helices, together with structural features of known beta-helices from outside the family. BetaWrap also predicts many bacterial proteins of unknown structure to be beta-helices; in particular, these proteins serve as virulence factors, adhesins, and toxins in bacterial pathogenesis and include cell surface proteins from Chlamydia and the intestinal bacterium Helicobacter pylori. The computational method used here may generalize to other beta-structures for which strand topology and profiles of residue accessibility are well conserved.
Affiliation(s)
- Lenore Cowen
- Department of EECS, Tufts University, Medford, MA 02155, USA
15
Reva B, Finkelstein A, Topiol S. Threading with chemostructural restrictions method for predicting fold and functionally significant residues: application to dipeptidylpeptidase IV (DPP-IV). Proteins 2002; 47:180-93. [PMID: 11933065 DOI: 10.1002/prot.10076] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
We present a new method for more accurate modeling of protein structure, called threading with chemostructural restrictions. This method addresses those cases in which a target sequence has only remote homologues of known structure for which sequence comparison methods cannot provide accurate alignments. Although remote homologues cannot provide an accurate model for the whole chain, they can be used in constructing practically useful models for the most conserved (and often the most interesting) part of the structure. For many proteins of interest, one can suggest certain chemostructural patterns for the native structure based on the available information on the structural superfamily of the protein, the type of activity, the sequence location of the functionally significant residues, and other factors. We use such patterns to restrict (1) the number of possible templates, and (2) the number of allowed chain conformations on a template. The latter restrictions are imposed in the form of additional template potentials (including terms acting as sequence anchors) that act on certain residues. This approach is tested on remote homologues of alpha/beta-hydrolases that have significant structural similarity in the positions of their catalytic triads. The study shows that, in spite of significant deviations between the model and the native structures, the surroundings of the catalytic triad (positions of C(alpha) atoms of 20-30 nearby residues) can be reproduced with an accuracy of 2-3 Å. We then apply the approach to predict the structure of dipeptidylpeptidase IV (DPP-IV). Using experimentally available data identifying the catalytic triad residues of DPP-IV (David et al., J Biol Chem 1993;268:17247-17252), we predict a model structure of the catalytic domain of DPP-IV based on the 3D fold of prolyl oligopeptidase (Fulop et al., Cell 1998;94:161-170) and use this structure for modeling the interaction of DPP-IV with an inhibitor.
Affiliation(s)
- Boris Reva
- Novartis Institute for Biomedical Research, Summit, New Jersey, USA.
16
Feldman HJ, Hogue CW. Probabilistic sampling of protein conformations: New hope for brute force? Proteins 2001. [DOI: 10.1002/prot.1163] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.3] [Indexed: 11/10/2022]
17
Meller J, Elber R. Linear programming optimization and a double statistical filter for protein threading protocols. Proteins 2001; 45:241-61. [PMID: 11599028 DOI: 10.1002/prot.1145] [Citation(s) in RCA: 105] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Abstract
The design of scoring functions (or potentials) for threading, differentiating native-like from non-native structures with a limited computational cost, is an active field of research. We revisit two widely used families of threading potentials: the pairwise and profile models. To design optimal scoring functions we use linear programming (LP). The LP protocol makes it possible to measure the difficulty of a particular training set in conjunction with a specific form of the scoring function. Gapless threading demonstrates that pair potentials have larger prediction capacity compared with profile energies. However, alignments with gaps are easier to compute with profile potentials. We therefore search for and propose a new profile model with prediction capacity comparable to that of contact potentials. A protocol to determine optimal energy parameters for gaps, using LP, is also presented. A statistical test, based on a combination of local and global Z-scores, is employed to filter out false positives. Extensive tests of the new protocol are presented. The new model provides an efficient alternative for threading with pair energies, maintaining comparable accuracy. The code, databases, and a prediction server are available at http://www.tc.cornell.edu/CBIO/loopp.
Affiliation(s)
- J Meller
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
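The double filter accepts a threading hit only when it looks significant against two backgrounds. The sketch below assumes that the global Z-score compares a hit's energy with the query's energies over the whole template library and that the local Z-score compares it with energies of shuffled sequences on the same template. These definitions and the -3.0 thresholds are hypothetical; the abstract does not specify them.

```python
import statistics


def z_score(value, background):
    """Z-score of value relative to a sample of background scores."""
    return (value - statistics.mean(background)) / statistics.stdev(background)


def passes_double_filter(energy, library_energies, shuffled_energies,
                         z_global_max=-3.0, z_local_max=-3.0):
    """Accept a hit only if it is significant against both backgrounds.

    Lower energy is better, so significant hits have strongly negative Z-scores.
    library_energies: the query's energies over all templates (assumed "global").
    shuffled_energies: shuffled sequences on this template (assumed "local").
    Both thresholds are hypothetical.
    """
    return (z_score(energy, library_energies) <= z_global_max
            and z_score(energy, shuffled_energies) <= z_local_max)


if __name__ == "__main__":
    background = [0.0, 1.0, -1.0, 0.5, -0.5]
    print(passes_double_filter(-10.0, background, background))  # True
    print(passes_double_filter(-1.0, background, background))   # False
```

Requiring both conditions trades some sensitivity for fewer false positives, which is the stated purpose of the filter.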
18
Kolinski A, Betancourt MR, Kihara D, Rotkiewicz P, Skolnick J. Generalized comparative modeling (GENECOMP): a combination of sequence comparison, threading, and lattice modeling for protein structure prediction and refinement. Proteins 2001; 44:133-49. [PMID: 11391776 DOI: 10.1002/prot.1080] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.3] [Indexed: 11/08/2022]
Abstract
An improved generalized comparative modeling method, GENECOMP, for the refinement of threading models is developed and validated on the Fischer database of 68 probe-template pairs, a standard benchmark used to evaluate threading approaches. The basic idea is to perform ab initio folding using a lattice protein model, SICHO, near the template provided by the new threading algorithm PROSPECTOR. PROSPECTOR also provides predicted contacts and secondary structure for the template-aligned regions, and possibly for the unaligned regions by garnering additional information from other top-scoring threaded structures. Since the lowest-energy structure generated by the simulations is not necessarily the best structure, we employed two structure-selection protocols: distance geometry and clustering. In general, clustering is found to generate somewhat better quality structures in 38 of 68 cases. When applied to the Fischer database, the protocol does no harm and in a significant number of cases improves upon the initial threading model, sometimes dramatically. The procedure is readily automated and can be implemented on a genomic scale.
Affiliation(s)
- A Kolinski
- Laboratory of Computational Genomics, Donald Danforth Plant Science Center, St. Louis, Missouri 63141, USA
19
Di Gennaro JA, Siew N, Hoffman BT, Zhang L, Skolnick J, Neilson LI, Fetrow JS. Enhanced functional annotation of protein sequences via the use of structural descriptors. J Struct Biol 2001; 134:232-45. [PMID: 11551182 DOI: 10.1006/jsbi.2001.4391] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/22/2022]
Abstract
In order to circumvent limitations of sequence based methods in the process of making functional predictions for proteins, we have developed a methodology that uses a sequence-to-structure-to-function paradigm. First, an approximate three-dimensional structure is predicted. Then, a three-dimensional descriptor of the functional site, termed a Fuzzy Functional Form, or FFF, is used to screen the structure for the presence of the functional site of interest (Fetrow et al., 1998; Fetrow and Skolnick, 1998). Previously, a disulfide oxidoreductase FFF was developed and applied to predicted structures obtained from a small structural database. Here, using a substantially larger structural database, we expand the analysis of the disulfide oxidoreductase FFF to the B. subtilis genome. To ascertain the performance of the FFF, its results are compared to those obtained using both the sequence alignment method BLAST and three local sequence motif databases: PRINTS, Prosite, and Blocks. The FFF method is then compared in detail to Blocks and it is shown that the FFF is more flexible and sensitive in finding a specific function in a set of unknown proteins. In addition, the estimated false positive rate of function prediction is significantly lower using the FFF structural motif, rather than the standard sequence motif methods. We also present a second FFF and describe a specific example of the results of its whole-genome application to D. melanogaster using a newer threading algorithm. Our results from all of these studies indicate that the addition of three-dimensional structural information adds significant value in the prediction of biochemical function of genomic sequences.
Affiliation(s)
- J A Di Gennaro
- GeneFormatics, Incorporated, 5830 Oberlin Drive, Suite 200, San Diego, California 92121, USA.
|
20
|
Linial M, Yona G. Methodologies for target selection in structural genomics. Prog Biophys Mol Biol 2001; 73:297-320. [PMID: 11063777 DOI: 10.1016/s0079-6107(00)00011-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
As the number of complete genomes that have been sequenced keeps growing, unknown areas of the protein space are revealed and new horizons open up. Most of this information will be fully appreciated only when the structural information about the encoded proteins becomes available. The goal of structural genomics is to direct large-scale efforts of protein structure determination, so as to increase the impact of these efforts. This review focuses on current approaches in structural genomics aimed at selecting representative proteins as targets for structure determination. We will discuss the concept of representative structures/folds, the current methodologies for identifying those proteins, and computational techniques for identifying proteins which are expected to adopt new structural folds.
Collapse
Affiliation(s)
- M Linial
- Department of Biological Chemistry, Institute of Life Sciences, Hebrew University, 91904, Jerusalem, Israel.
|
21
|
Abstract
A homology-based structure prediction method ideally gives both a correct fold assignment and an accurate query-template alignment. In this article we show that the combination of two existing methods, PSI-BLAST and threading, leads to significant enhancement in the success rate of fold recognition. The combined approach, termed COBLATH, also yields much higher alignment accuracy than found in previous studies. It consists of two-way searches both by PSI-BLAST and by threading. In the PSI-BLAST portion, a query is used to search for hits in a library of potential templates and, conversely, each potential template is used to search for hits in a library of queries. In the threading portion, the scoring function is the sum of a sequence profile and a 6×6 substitution matrix between predicted query and known template secondary structure and solvent exposure. "Two-way" in threading means that the query's sequence profile is used to match the sequences of all potential templates and the sequence profiles of all potential templates are used to match the query's sequence. When tested on a set of 533 nonhomologous proteins, COBLATH was able to assign folds for 390 (73%). Among these 390 queries, 265 (68%) had root-mean-square deviations (RMSDs) of less than 8 Å between predicted and actual structures. Such high success rate and accuracy make COBLATH an ideal tool for structural genomics.
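The threading score described above adds a sequence-profile term to a 6×6 environment term. One plausible reading, assumed here and not taken from the paper, is that the six classes are three secondary structure states (helix, strand, coil) crossed with two exposure states (buried, exposed). The sketch scores a fixed query-template alignment under that assumption. `env_index`, `alignment_score` and the input layouts are hypothetical. In the "two-way" mode, the same kind of scoring would be repeated with the roles of query and template swapped.

```python
from typing import Dict, Sequence, Tuple

SS_STATES = "HEC"       # helix, strand, coil (assumed 3-state alphabet)
EXPOSURE_STATES = "be"  # buried, exposed (assumed 2-state split)

Env = Tuple[str, str]   # (secondary structure state, exposure state)


def env_index(env: Env) -> int:
    """Map a (secondary structure, exposure) pair onto one of 3 x 2 = 6 classes."""
    ss, exposure = env
    return SS_STATES.index(ss) * len(EXPOSURE_STATES) + EXPOSURE_STATES.index(exposure)


def alignment_score(
    pairs: Sequence[Tuple[int, int]],
    query_profile: Sequence[Dict[str, float]],
    template_seq: str,
    query_env: Sequence[Env],
    template_env: Sequence[Env],
    env_matrix: Sequence[Sequence[float]],
) -> float:
    """Sum of profile and environment terms over aligned (query i, template j) pairs."""
    total = 0.0
    for i, j in pairs:
        # Sequence-profile term: query position i scored against template residue j.
        total += query_profile[i][template_seq[j]]
        # Environment term: predicted query state vs. observed template state.
        total += env_matrix[env_index(query_env[i])][env_index(template_env[j])]
    return total
```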
Affiliation(s)
- Y Shan
- Department of Physics, Drexel University, Philadelphia, Pennsylvania 19104, USA
|
22
|
Prabhakaran M, Dudek M, Raghunathan G, Ramnarayan K. Sequencing and model structure of a Naja naja atra protein fragment. J Pept Res 2000; 56:12-23. [PMID: 10917453 DOI: 10.1034/j.1399-3011.2000.00725.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/23/2022]
Abstract
We report the amino acid sequence of a basic protein isolated from the snake venom of Naja naja atra. An automated Edman sequencer was used to determine the 65-residue sequence, aided by electrospray ionization/mass spectrometry. Online reduction and pyridylethylation of the peptide was performed to identify the cysteine residues. Trypsin, chymotrypsin and aspartic digestions were carried out to derive peptide fragments for further sequencing. Fragmented peptides were overlapped to obtain the complete sequence. Molecular mass measurements of the whole protein and its fragments were used as a countercheck for sequence assignment. Further confirmation of the sequence was indicated by sequence homology to other snake venom neurotoxins. A molecular model of the tertiary structure was constructed based on sequence homology, and was refined by global minimization and extensive quality control algorithms. Electrostatic and hydrophobic surface calculations and molecular dynamics simulations were carried out to determine the functional properties of the molecule.
Affiliation(s)
- M Prabhakaran
- Structural Bioinformatics, Inc., San Diego, California 92127, USA.
|
23
|
Kelley LA, MacCallum RM, Sternberg MJ. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000; 299:499-520. [PMID: 10860755 DOI: 10.1006/jmbi.2000.3741] [Citation(s) in RCA: 1198] [Impact Index Per Article: 49.9] [Indexed: 11/22/2022]
Abstract
A method (three-dimensional position-specific scoring matrix, 3D-PSSM) to recognise remote protein sequence homologues is described. The method combines the power of multiple sequence profiles with knowledge of protein structure to provide enhanced recognition and thus functional assignment of newly sequenced genomes. The method uses structural alignments of homologous proteins of similar three-dimensional structure in the structural classification of proteins (SCOP) database to obtain a structural equivalence of residues. These equivalences are used to extend multiply aligned sequences obtained by standard sequence searches. The resulting large superfamily-based multiple alignment is converted into a PSSM. Combined with secondary structure matching and solvation potentials, 3D-PSSM can recognise structural and functional relationships beyond state-of-the-art sequence methods. In a cross-validated benchmark on 136 homologous relationships unambiguously undetectable by position-specific iterated basic local alignment search tool (PSI-Blast), 3D-PSSM can confidently assign 18 %. The method was applied to the remaining unassigned regions of the Mycoplasma genitalium genome and an additional 13 regions were assigned with 95 % confidence. 3D-PSSM is available to the community as a web server: http://www.bmm.icnet.uk/servers/3dpssm
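The step in which a superfamily-based multiple alignment "is converted into a PSSM" can be illustrated with a generic log-odds profile. This is a standard textbook construction and not the exact 3D-PSSM recipe. For each column, residue counts are smoothed with a pseudocount spread according to a background distribution (assumed uniform here), divided by the background and log-transformed. A higher summed score means a sequence fits the profile better. `build_pssm` and `score_sequence` are illustrative names.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(
    alignment: Sequence[str],
    background: Optional[Dict[str, float]] = None,
    pseudocount: float = 1.0,
) -> List[Dict[str, float]]:
    """Log2-odds score per column and amino acid; gap characters are ignored."""
    if background is None:
        # Assumption: uniform background frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in background)
        n = sum(counts.values())
        row = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts follow the background, so if the background sums to 1
            # the smoothed frequencies q also sum to 1 over the 20 amino acids.
            q = (counts[aa] + pseudocount * background[aa]) / (n + pseudocount)
            row[aa] = math.log2(q / background[aa])
        pssm.append(row)
    return pssm


def score_sequence(pssm: Sequence[Dict[str, float]], seq: str) -> float:
    """Ungapped score of a sequence laid along the profile columns."""
    return sum(row[aa] for row, aa in zip(pssm, seq))
```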
Affiliation(s)
- L A Kelley
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London, WC2A 3PX, England
|
24
|
Domingues FS, Lackner P, Andreeva A, Sippl MJ. Structure-based evaluation of sequence comparison and fold recognition alignment accuracy. J Mol Biol 2000; 297:1003-13. [PMID: 10736233 DOI: 10.1006/jmbi.2000.3615] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.0] [Indexed: 11/22/2022]
Abstract
The biological role, biochemical function, and structure of uncharacterized protein sequences is often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy. Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity. Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure derived alignments. Using this benchmark we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.
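Accuracy "in terms of alignment shifts" can be made concrete with a small measure. The sketch assumes both alignments are stored as hypothetical `{query_index: template_index}` dictionaries. It counts how many reference-aligned residues the test alignment places within a tolerance of the reference position, and it reports the mean absolute shift. Residues that the test alignment leaves unaligned count as misses. The scoring used in the paper itself may differ in detail.

```python
from typing import Dict, Tuple

Alignment = Dict[int, int]  # query residue index -> template residue index


def alignment_shifts(test: Alignment, reference: Alignment) -> Dict[int, int]:
    """Shift of each query residue aligned in both: test position minus reference position."""
    return {i: test[i] - reference[i] for i in test if i in reference}


def shift_summary(test: Alignment, reference: Alignment, tolerance: int = 0) -> Tuple[float, float]:
    """(fraction of reference pairs reproduced within tolerance, mean absolute shift)."""
    shifts = alignment_shifts(test, reference)
    within = sum(1 for s in shifts.values() if abs(s) <= tolerance)
    fraction = within / len(reference) if reference else 0.0
    mean_abs = sum(abs(s) for s in shifts.values()) / len(shifts) if shifts else 0.0
    return fraction, mean_abs
```

With `tolerance=0` the fraction is the share of exactly correct residue pairs. A looser tolerance credits near misses, which matters for distant homologs where small register errors are common.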
Affiliation(s)
- F S Domingues
- Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob Haringer Strasse 3, Salzburg, A-5020, Austria
|
25
|
Panchenko AR, Marchler-Bauer A, Bryant SH. Combination of threading potentials and sequence profiles improves fold recognition. J Mol Biol 2000; 296:1319-31. [PMID: 10698636 DOI: 10.1006/jmbi.2000.3541] [Citation(s) in RCA: 102] [Impact Index Per Article: 4.3] [Indexed: 12/13/2022]
Abstract
Using a benchmark set of structurally similar proteins, we conduct a series of threading experiments intended to identify a scoring function with an optimal combination of contact-potential and sequence-profile terms. The benchmark set is selected to include many medium-difficulty fold recognition targets, where sequence similarity is undetectable by BLAST but structural similarity is extensive. The contact potential is based on the log-odds of non-local contacts involving different amino acid pairs, in native as opposed to randomly compacted structures. The sequence profile term is that used in PSI-BLAST. We find that combination of these terms significantly improves the success rate of fold recognition over use of either term alone, with respect to both recognition sensitivity and the accuracy of threading models. Improvement is greatest for targets between 10 % and 20 % sequence identity and 60 % to 80 % superimposable residues, where the number of models crossing critical accuracy and significance thresholds more than doubles. We suggest that these improvements account for the successful performance of the combined scoring function at CASP3. We discuss possible explanations as to why sequence-profile and contact-potential terms appear complementary.
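The contact potential is described as a log-odds of residue-pair contacts in native versus randomly compacted structures. The sketch derives such a table from two hypothetical lists of contacting residue pairs. It then scores a sequence threaded onto a template contact map. The pseudocount, the natural logarithm and the function names are assumptions, and the paper's exact normalization is not reproduced. A combined threading score in the spirit of the abstract would add this term to a weighted sequence-profile term.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement
from typing import Dict, Iterable, Sequence, Tuple

Pair = Tuple[str, str]


def pair_key(a: str, b: str) -> Pair:
    """Contacts are unordered, so (L, A) and (A, L) share one key."""
    return (a, b) if a <= b else (b, a)


def contact_log_odds(
    native_contacts: Iterable[Pair],
    random_contacts: Iterable[Pair],
    alphabet: str,
    pseudocount: float = 1.0,
) -> Dict[Pair, float]:
    """ln(native pair frequency / random-compact pair frequency), with pseudocounts."""
    native = Counter(pair_key(a, b) for a, b in native_contacts)
    random_ = Counter(pair_key(a, b) for a, b in random_contacts)
    pairs = list(combinations_with_replacement(sorted(alphabet), 2))
    n_native = sum(native.values()) + pseudocount * len(pairs)
    n_random = sum(random_.values()) + pseudocount * len(pairs)
    return {
        p: math.log((native[p] + pseudocount) / n_native)
        - math.log((random_[p] + pseudocount) / n_random)
        for p in pairs
    }


def contact_score(sequence: str, contacts: Sequence[Tuple[int, int]], table: Dict[Pair, float]) -> float:
    """Sum of log-odds over template contacts (i, j) after threading `sequence` onto them."""
    return sum(table[pair_key(sequence[i], sequence[j])] for i, j in contacts)
```

Positive entries mark pairs seen in contact more often in native structures than in random compact ones, so a higher `contact_score` indicates a more native-like threading.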
Affiliation(s)
- A R Panchenko
- National Center for Biotechnology Information, National Institutes of Health, Building 38A, Room 8N805, Bethesda, MD 20894, USA
|
26
|
Koppensteiner WA, Lackner P, Wiederstein M, Sippl MJ. Characterization of novel proteins based on known protein structures. J Mol Biol 2000; 296:1139-52. [PMID: 10686110 DOI: 10.1006/jmbi.1999.3501] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
The genome sciences face the challenge to characterize structure and function of a vast number of novel genes. Sequence search techniques are used to infer functional and structural information from similarities to experimentally characterized genes or proteins. The persistent goal is to refine these techniques and to develop alternative and complementary methods to increase the range of reliable inference. Here, we focus on the structural and functional assignments that can be inferred from the known three-dimensional structures of proteins. The study uses all structures in the Protein Data Bank that were known by the end of 1997. The protein structures released in 1998 were then characterized in terms of functional and structural similarity to the previously known structures, yielding an estimate of the maximum amount of information on novel protein sequences that can be obtained from inference techniques. The 147 globular proteins corresponding to 196 domains released in 1998 have no clear sequence similarity to previously known structures. However, 75 % of the domains have extensive structure similarity to previously known folds, and most importantly, in two out of three cases similarity in structure coincides with related function. In view of this analysis, full utilization of existing structure databases would provide information for many new targets even if the relationship is not accessible from sequence information alone. Currently, the most sophisticated techniques detect on the order of one-third of these relationships.
Affiliation(s)
- W A Koppensteiner
- Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob-Haringer-Strasse 3, Salzburg, A-5020, Austria
|
27
|
|
28
|
Abstract
It is generally accepted that many different protein sequences have similar folded structures, and that there is a relatively high probability that a new sequence possesses a previously observed fold. An indirect consequence of this is that protein design should define the sequence space accessible to a given structure, rather than providing a single optimized sequence. We have recently developed a new approach for protein sequence design, which optimizes the complete sequence of a protein based on the knowledge of its backbone structure, its amino acid composition and a physical energy function including van der Waals interactions, electrostatics, and environment free energy. The specificity of the designed sequence for its template backbone is imposed by keeping the amino acid composition fixed. Here, we show that our procedure converges in sequence space, albeit not to the native sequence of the protein. We observe that while polar residues are well conserved in our designed sequences, non-polar amino acids at the surface of a protein are often replaced by polar residues. The designed sequences provide a multiple alignment of sequences that all adopt the same three-dimensional fold. This alignment is used to derive a profile matrix for chicken triose phosphate isomerase, TIM. The matrix is found to recognize significantly the native sequence for TIM, as well as closely related sequences. Possible application of this approach to protein fold recognition is discussed.
Affiliation(s)
- P Koehl
- Department of Structural Biology, Fairchild Building, Stanford University, Stanford, CA 94305, USA.
|
29
|
Abstract
We present the recursive dynamic programming (RDP) method for the threading approach to three-dimensional protein structure prediction. RDP is based on the divide-and-conquer paradigm and maps the protein sequence whose backbone structure is to be found (the protein target) onto the known backbone structure of a model protein (the protein template) in a stepwise fashion, a technique that is similar to computing local alignments but utilising different cost functions. We begin by mapping parts of the target onto the template that show statistically significant similarity with the template sequence. After mapping, the template structure is modified in order to account for the mapped target residues. Then significant similarities between the yet unmapped parts of the target and the modified template are searched, and the resulting segments of the target are mapped onto the template. This recursive process of identifying segments in the target to be mapped onto the template and modifying the template is continued until no significant similarities between the remaining parts of target and template are found. Those parts which are left unmapped by the procedure are interpreted as gaps. The RDP method is robust in the sense that different local alignment methods can be used, several alternatives of mapping parts of the target onto the template can be handled and compared in the process, and the cost functions can be dynamically adapted to biological needs. Our computer experiments show that the RDP procedure is efficient and effective. We can thread a typical protein sequence against a database of 887 template domains in about 12 hours even on a low-cost workstation (SUN Ultra 5). In statistical evaluations on databases of known protein structures, RDP significantly outperforms competing methods. 
RDP has been especially valuable in providing accurate alignments for modeling active sites of proteins. RDP is part of the ToPLign system (GMD Toolbox for protein alignment) and can be accessed via the WWW independently or in concert with other ToPLign tools at http://cartan.gmd.de/ToPLign.html.
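The divide-and-conquer idea behind RDP can be sketched with ordinary Smith-Waterman local alignment standing in for its cost functions. The sketch finds the best local match, records it, and then recurses separately on the unmapped parts to its left and right, so that mapped segments stay in sequence order. Unlike RDP, this simplification does not modify the template after each mapping step, and it uses a fixed raw-score threshold instead of a statistical significance test. The scoring parameters and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, int, int, int]  # (t_start, t_end, m_start, m_end, score), ends exclusive


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> Segment:
    """Best local alignment of a vs b, tracking where each alignment starts."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    origin = [[(i, j) for j in range(m + 1)] for i in range(n + 1)]
    best = (0, 0, 0, 0, 0)  # (score, a_start, a_end, b_start, b_end)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up, left = score[i - 1][j] + gap, score[i][j - 1] + gap
            value = max(0, diag, up, left)
            score[i][j] = value
            if value == 0:
                origin[i][j] = (i, j)  # a new local alignment may start here
            elif value == diag:
                origin[i][j] = origin[i - 1][j - 1]
            elif value == up:
                origin[i][j] = origin[i - 1][j]
            else:
                origin[i][j] = origin[i][j - 1]
            if value > best[0]:
                si, sj = origin[i][j]
                best = (value, si, i, sj, j)
    s, a0, a1, b0, b1 = best
    return (a0, a1, b0, b1, s)


def recursive_map(target: str, template: str, threshold: int = 6,
                  t_off: int = 0, m_off: int = 0,
                  out: Optional[List[Segment]] = None) -> List[Segment]:
    """Recursively map high-scoring target segments onto the template, keeping order."""
    if out is None:
        out = []
    if not target or not template:
        return out
    ts, te, ms, me, s = smith_waterman(target, template)
    if s < threshold:
        return out  # nothing left that passes the threshold: remaining parts are gaps
    out.append((t_off + ts, t_off + te, m_off + ms, m_off + me, s))
    # Left part: earlier target residues may only map to earlier template residues.
    recursive_map(target[:ts], template[:ms], threshold, t_off, m_off, out)
    # Right part: later residues vs later residues, with shifted offsets.
    recursive_map(target[te:], template[me:], threshold, t_off + te, m_off + me, out)
    return out
```

Target regions not covered by any returned segment correspond to what the abstract calls gaps.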
Affiliation(s)
- R Thiele
- German National Research Center for Information Technology (GMD), Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, D-53754, Germany
|
30
|
Sternberg MJ, Bates PA, Kelley LA, MacCallum RM. Progress in protein structure prediction: assessment of CASP3. Curr Opin Struct Biol 1999; 9:368-73. [PMID: 10361096 DOI: 10.1016/s0959-440x(99)80050-5] [Citation(s) in RCA: 81] [Impact Index Per Article: 3.2] [Indexed: 11/21/2022]
Abstract
The third comparative assessment of techniques of protein structure prediction (CASP3) was held during 1998. This is a blind trial in which structures are predicted prior to having knowledge of the coordinates, which are then revealed to enable the assessment. Three sections at the meeting evaluated different methodologies - comparative modelling, fold recognition and ab initio methods. For some, but not all of the target coordinates, high quality models were submitted in each of these sections. There have been improvements in prediction techniques since CASP2 in 1996, most notably for ab initio methods.
Affiliation(s)
- M J Sternberg
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, London, UK.
|
31
|
Kikuchi T. Study of protein fluctuation with an effective inter-Cα atomic potential derived from average distances between amino acids in proteins. J Comput Chem 1999; 20:713-719. [DOI: 10.1002/(sici)1096-987x(199905)20:7<713::aid-jcc6>3.0.co;2-s] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Received: 03/03/1998] [Accepted: 01/08/1999] [Indexed: 11/09/2022]
|
32
|
de la Cruz X, Thornton JM. Factors limiting the performance of prediction-based fold recognition methods. Protein Sci 1999; 8:750-9. [PMID: 10211821 PMCID: PMC2144320 DOI: 10.1110/ps.8.4.750] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 10/19/2022]
Abstract
In the past few years, a new generation of fold recognition methods has been developed, in which the classical sequence information is combined with information obtained from secondary structure and, sometimes, accessibility predictions. The results are promising, indicating that this approach may compete with potential-based methods (Rost B et al., 1997, J Mol Biol 270:471-480). Here we present a systematic study of the different factors contributing to the performance of these methods, in particular when applied to the problem of fold recognition of remote homologues. Our results indicate that secondary structure and accessibility prediction methods have reached an accuracy level where they are not the major factor limiting the accuracy of fold recognition. The pattern degeneracy problem is confirmed as the major source of error of these methods. On the basis of these results, we study three different options to overcome these limitations: normalization schemes, mapping of the coil state into the different zones of the Ramachandran plot, and post-threading graphical analysis.
Affiliation(s)
- X de la Cruz
- Department of Biochemistry and Molecular Biology, University College, London, United Kingdom
|
33
|
|
34
|
|
35
|
|
36
|
Pons T, Olmea O, Chinea G, Beldarraín A, Márquez G, Acosta N, Rodríguez L, Valencia A. Structural model for family 32 of glycosyl-hydrolase enzymes. Proteins 1998; 33:383-95. [PMID: 9829697 DOI: 10.1002/(sici)1097-0134(19981115)33:3<383::aid-prot7>3.0.co;2-r] [Citation(s) in RCA: 50] [Impact Index Per Article: 1.9] [Indexed: 11/08/2022]
Abstract
A structural model is presented for family 32 of the glycosyl-hydrolase enzymes based on the beta-propeller fold. The model is derived from the common prediction of two different threading methods, TOPITS and THREADER. In addition, we used a correlated mutation analysis and prediction of active-site residues to corroborate the proposed model. Physical techniques (circular dichroism and differential scanning calorimetry) confirmed two aspects of the prediction, the proposed all-beta fold and the multi-domain structure. The most reliable three-dimensional model was obtained using the structure of neuraminidase (1nscA) as template. The analysis of the position of the active site residues in this model is compatible with the catalytic mechanism proposed by Reddy and Maley (J. Biol. Chem. 271:13953-13958, 1996), which includes three conserved residues, Asp, Glu, and Cys. Based on this analysis, we propose the participation of one more conserved residue (Asp 162) in the catalytic mechanism. The model will facilitate further studies of the physical and biochemical characteristics of family 32 of the glycosyl-hydrolases.
Affiliation(s)
- T Pons
- Centro de Ingeniería Genética y Biotecnología, Havana, Cuba.
|
37
|
Mirny LA, Shakhnovich EI. Protein structure prediction by threading. Why it works and why it does not. J Mol Biol 1998; 283:507-26. [PMID: 9769221 DOI: 10.1006/jmbi.1998.2092] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
We developed a novel Monte Carlo threading algorithm which allows gaps and insertions both in the template structure and threaded sequence. The algorithm is able to find the optimal sequence-structure alignment and sample suboptimal alignments. Using our algorithm we performed sequence-structure alignments for a number of examples for three protein folds (ubiquitin, immunoglobulin and globin) using both "ideal" set of potentials (optimized to provide the best Z-score for a given protein) and more realistic knowledge-based potentials. Two physically different scenarios emerged. If a template structure is similar to the native one (within 2 Å RMS), then (i) the optimal threading alignment is correct and robust with respect to deviations of the potential from the "ideal" one; (ii) suboptimal alignments are very similar to the optimal one; (iii) as Monte Carlo temperature decreases a sharp cooperative transition to the optimal alignment is observed. In contrast, if the template structure is only moderately close to the native structure (RMS greater than 3.5 Å), then (i) the optimal alignment changes dramatically when an "ideal" potential is substituted by the real one; (ii) the structures of suboptimal alignments are very different from the optimal one, reducing the reliability of the alignment; (iii) the transition to the apparently optimal alignment is non-cooperative. In the intermediate cases when the RMS between the template and the native conformations is in the range between 2 Å and 3.5 Å, the success of threading alignment may depend on the quality of potentials used. These results are rationalized in terms of a threading free energy landscape. Possible ways to overcome the fundamental limitations of threading are discussed briefly.
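Two standard quantities underlie this abstract. The first is the Z-score of the native alignment energy against alternative (decoy) alignments. The second is the Metropolis rule that a Monte Carlo search uses to accept or reject a proposed change to the alignment at a given temperature. Both are sketched below in generic form, with lower energies taken as better. The move set that actually perturbs gaps and insertions is not shown, and the function names are illustrative.

```python
import math
import random
import statistics
from typing import Sequence


def threading_z_score(native_energy: float, decoy_energies: Sequence[float]) -> float:
    """(E_native - mean(E_decoy)) / std(E_decoy); more negative means better separation."""
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mu) / sigma


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE / T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)
```

Lowering `temperature` makes uphill moves rarer, which is the knob whose decrease reveals the cooperative or non-cooperative transitions described above.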
Affiliation(s)
- L A Mirny
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA, 02138, USA
|
38
|
Abstract
Genome sequencing projects continue to provide a flood of new protein sequences, and prediction methods remain an important means of adding structural information. Recently, there have been advances in secondary structure prediction, which feed, in turn, into improved fold recognition algorithms. Finally, there have been technical improvements in comparative modelling, and studies of the expected accuracy of three-dimensional structural models built by this method.
Affiliation(s)
- D R Westhead
- The European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
39
|
Pons T, Chinea G, Olmea O, Beldarraín A, Roca H, Padrón G, Valencia A. Structural model of Dex protein from Penicillium minioluteum and its implications in the mechanism of catalysis. Proteins 1998; 31:345-54. [PMID: 9626695 DOI: 10.1002/(sici)1097-0134(19980601)31:4<345::aid-prot2>3.0.co;2-h] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
The DEX gene encodes an extracellular dextranase (EC 3.2.1.11); this enzyme hydrolyzes the alpha(1,6) glucosidic bond contained in dextran to release small isomaltosaccharides. Sequence analysis has revealed only one homologous sequence, CB-8 protein, from Arthrobacter sp., with 30% sequence identity. The secondary structure prediction for Dex was corroborated by circular dichroism measurements. To explore the possibility that Dex protein might adopt a fold similar to any known structure, we conducted a threading search of a three-dimensional structure database. This search revealed that the Dex sequence is compatible with the galactose oxidase/methanol dehydrogenase/sialidase fold. A structural model of Dex based on these results is physically and biologically plausible and leads to testable predictions, including the prediction that Asp246 and Glu299 might be catalytic residues. Also, according to this model the Dex enzyme has a mechanism of hydrolysis with net inversion of anomeric configuration.
Affiliation(s)
- T Pons
- Centro de Ingeniería Genética y Biotecnología (CIGB), Havana, Cuba.
|
40
|
Sunyaev SR, Eisenhaber F, Argos P, Kuznetsov EN, Tumanyan VG. Are knowledge-based potentials derived from protein structure sets discriminative with respect to amino acid types? Proteins 1998. [DOI: 10.1002/(sici)1097-0134(19980515)31:3<225::aid-prot1>3.0.co;2-i] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
|
41
|
Steven R, Kubiseski TJ, Zheng H, Kulkarni S, Mancillas J, Ruiz Morales A, Hogue CW, Pawson T, Culotti J. UNC-73 activates the Rac GTPase and is required for cell and growth cone migrations in C. elegans. Cell 1998; 92:785-95. [PMID: 9529254 DOI: 10.1016/s0092-8674(00)81406-3] [Citation(s) in RCA: 257] [Impact Index Per Article: 9.9] [Indexed: 02/07/2023]
Abstract
unc-73 is required for cell migrations and axon guidance in C. elegans and encodes overlapping isoforms of 283 and 189 kDa that are closely related to the vertebrate Trio and Kalirin proteins, respectively. UNC-73A contains, in order, eight spectrin-like repeats, a Dbl/Pleckstrin homology (DH/PH) element, an SH3-like domain, a second DH/PH element, an immunoglobulin domain, and a fibronectin type III domain. UNC-73B terminates just downstream of the SH3-like domain. The first DH/PH element specifically activates the Rac GTPase in vitro and stimulates actin polymerization when expressed in Rat2 cells. Both functions are eliminated by introducing the S1216F mutation of unc-73(rh40) into this DH domain. Our results suggest that UNC-73 acts cell autonomously in a protein complex to regulate actin dynamics during cell and growth cone migrations.
Affiliation(s)
- R Steven
- Samuel Lunenfeld Research Institute of Mt. Sinai Hospital, Toronto, Ontario, Canada
|
42
|
Turcotte M, Muggleton SH, Sternberg MJE. Application of inductive logic programming to discover rules governing the three-dimensional topology of protein structure. Inductive Logic Programming 1998. [DOI: 10.1007/bfb0027310] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 02/08/2023]
|
43
|
Benner SA, Cannarozzi G, Gerloff D, Turcotte M, Chelvanayagam G. Bona Fide Predictions of Protein Secondary Structure Using Transparent Analyses of Multiple Sequence Alignments. Chem Rev 1997; 97:2725-2844. [PMID: 11851479 DOI: 10.1021/cr940469a] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Affiliation(s)
- Steven A. Benner
- Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
|
44
|
Abstract
Prediction of protein structure by fold recognition, or threading, was recently put to the test in a 'blind' structure prediction experiment, CASP2. Thirty-two teams from around the world participated, preparing predictions for 22 different 'target' proteins whose structures were soon to be determined. As experimental structures became available, we, as organizers of the threading competition, computed objective measures of fold-recognition specificity and model accuracy, to identify and characterize successful predictions. Here, we present a brief summary of these prediction evaluations, a tally of 'correct' predictions and a discussion of factors associated with correct predictions. We find that threading produced specific recognition and accurate models whenever the structural database contained a template spanning a large fraction of target sequence. Presence of conserved sequence motifs was helpful, but not required, and it would appear that threading can succeed whenever similarity to a known structure is sufficiently extensive.
Affiliation(s)
- A Marchler-Bauer
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
|
45
|
Abstract
Methods that compare a protein sequence directly to a structure can be divided into those that construct a molecular model (threading methods) and those that perform a sequence alignment with the structure encoded as a sequence of structural states (one-dimensional/three-dimensional (1D/3D) matching). The former take into account the internal packing of the molecule but the latter do not. On the other hand, it is simple to include multiple sequence data in a 1D/3D comparison but difficult in a threading method. Here, a protein sequence/structure alignment method is described that uses a combination of matching predicted and observed residue exposure, predicted and observed secondary structure (1D/3D) together with pairwise packing interactions in the core (threading). Using a variety of distantly related and analogous protein structures, the multiple sequence threading (MST) method was compared to a single sequence threading (SST) method (that uses complex potentials of mean-force) and also to a multiple sequence alignment (MSA) program. It was found that the MST method produced alignments that were better than the best that could be obtained with either the SST or MSA method. The method was found to be stable to error in both secondary structure prediction and predicted exposure and also under variation of the key parameters (fully described in an Appendix). The contribution of the pairwise term was found to be small but without it, the correct alignments were less stable and structurally unreasonable deletions were observed when matching against larger structures. Using the parameters derived for alignment, the method was able to recognise related folds in the structure databank with a specificity comparable to other methods.
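The 1D/3D half of this comparison, which aligns predicted query states to observed template states, can be illustrated with a global dynamic-programming (Needleman-Wunsch) alignment. The sketch covers only that half. It omits the pairwise core-packing term that makes MST a threading method. The +1/-1 state scores, the equal weights and the linear gap penalty are assumptions chosen for clarity.

```python
from typing import Sequence, Tuple

State = Tuple[str, str]  # (secondary structure state, exposure state), e.g. ("H", "b")


def state_score(q: State, t: State, w_ss: float = 1.0, w_exp: float = 1.0) -> float:
    """Weighted +1/-1 agreement in secondary structure and in exposure."""
    ss = 1.0 if q[0] == t[0] else -1.0
    expo = 1.0 if q[1] == t[1] else -1.0
    return w_ss * ss + w_exp * expo


def global_1d3d_score(query: Sequence[State], template: Sequence[State], gap: float = -2.0) -> float:
    """Needleman-Wunsch score of predicted query states vs observed template states."""
    n, m = len(query), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + state_score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,  # query state left unmatched
                dp[i][j - 1] + gap,  # template state left unmatched
            )
    return dp[n][m]
```

Because each cell depends only on per-position states, multiple-sequence information (for example, states predicted from an alignment) slots in easily, which is the advantage of 1D/3D matching that the abstract points out.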
Affiliation(s)
- W R Taylor
- Division of Mathematical Biology, National Institute for Medical Research, Mill Hill, London, UK
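As a rough illustration of the 1D/3D matching described in the abstract above, and not of the MST method itself, the sketch below aligns a query's predicted structural states against a template's observed states by global dynamic programming. Each position is encoded as a (secondary structure, exposure) pair. The scores, the gap penalty and the names `align_1d3d` and `to_states` are hypothetical, and the pairwise packing (threading) term is deliberately omitted.

```python
"""Sketch of 1D/3D matching by global dynamic programming.

Assumption: all scores and the gap penalty are illustrative values,
not the parameters of the MST method.
"""
from typing import List, Tuple

# One position = (secondary structure in {"H", "E", "C"}, exposure in {"b", "e"})
State = Tuple[str, str]

SS_MATCH, SS_MISMATCH = 2.0, -1.0    # hypothetical secondary-structure scores
EXP_MATCH, EXP_MISMATCH = 1.0, -0.5  # hypothetical exposure (buried/exposed) scores
GAP = -2.0                           # hypothetical linear gap penalty


def to_states(ss: str, exposure: str) -> List[State]:
    """Pair a secondary-structure string with an exposure string."""
    if len(ss) != len(exposure):
        raise ValueError("ss and exposure strings must have equal length")
    return list(zip(ss, exposure))


def state_score(a: State, b: State) -> float:
    """Score a predicted query state against an observed template state."""
    ss = SS_MATCH if a[0] == b[0] else SS_MISMATCH
    exp = EXP_MATCH if a[1] == b[1] else EXP_MISMATCH
    return ss + exp


def align_1d3d(query: List[State], template: List[State]) -> float:
    """Needleman-Wunsch score; positions are scored independently (no pair term)."""
    n, m = len(query), len(template)
    # dp[i][j] = best score for aligning query[:i] with template[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + state_score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + GAP,  # query residue left unaligned
                dp[i][j - 1] + GAP,  # template position left unaligned
            )
    return dp[n][m]


query = to_states("HHHCEE", "bbeebb")
print(align_1d3d(query, query))  # identical states: 6 positions x 3.0 = 18.0
```

Because every position is scored on its own, the recursion above is exact; adding a pairwise packing term, as MST does, makes each score depend on how other residues are placed, which this simple recursion cannot handle exactly.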
46
Abstract
If protein structure prediction methods are to make any impact on the impending onerous task of analyzing the large numbers of unknown protein sequences generated by the ongoing genome-sequencing projects, it is vital that they make the difficult transition from computational 'gedankenexperiments' to practical software tools. This has already happened in the field of comparative modelling and is currently happening in the threading field. Unfortunately, there is little evidence of this transition happening in the field of ab initio tertiary-structure prediction.
Affiliation(s)
- D T Jones
- Department of Biological Sciences, University of Warwick, Coventry, UK.
47
Abstract
An ever increasing number of protein sequences are being compared, partly because of the availability of full sets of protein sequences from several completed genome-sequencing projects. The resulting problem of scale has shifted the emphasis of sequence analysis method development from sensitivity and flexibility, which relies on manual intervention and interpretation, to the automatic generation of results of known reliability.
Affiliation(s)
- T J Hubbard
- Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
48
Abstract
The advantages and disadvantages of database and molecular mechanics force fields for the study of macromolecules are compared, with emphasis on the ability to distinguish between correct and incorrect structures. Molecular mechanics force fields have the advantage of resting on a clear theoretical basis, permitting an in-depth analysis of different contributions. On the other hand, large simplifications are necessary for tractable computing, and there has so far been little effective testing at the macromolecular level. Database potentials allow greater freedom of functional form and have been shown to be effective at discriminating between correct and incorrect complete structures. The principal negative is a controversial relationship to free energy. More testing and comparison of both sorts of potential are needed.
Affiliation(s)
- J Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, USA.
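Database (knowledge-based) potentials of the kind compared in the abstract above are commonly derived by inverse Boltzmann statistics, E = -kT ln(P_obs / P_ref), where P_obs is the frequency of a feature (here, a distance bin) in known structures and P_ref is its frequency in a chosen reference state. This Boltzmann assumption is where the contested link to free energy enters. The sketch below assumes binned pair counts, kT = 1 in arbitrary units and a pseudocount of 1; the function names are hypothetical.

```python
import math
from typing import List


def inverse_boltzmann(obs_counts: List[int], ref_counts: List[int],
                      kT: float = 1.0, pseudo: float = 1.0) -> List[float]:
    """Per-bin energy E = -kT * ln(P_obs / P_ref), with pseudocounts for empty bins."""
    if len(obs_counts) != len(ref_counts):
        raise ValueError("observed and reference counts need the same bins")
    nbins = len(obs_counts)
    obs_total = sum(obs_counts) + pseudo * nbins
    ref_total = sum(ref_counts) + pseudo * nbins
    energies = []
    for o, r in zip(obs_counts, ref_counts):
        p_obs = (o + pseudo) / obs_total
        p_ref = (r + pseudo) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


def score_structure(pair_bins: List[int], energies: List[float]) -> float:
    """Sum bin energies over a structure's residue pairs; lower means more native-like."""
    return sum(energies[b] for b in pair_bins)


# Hypothetical counts: bin 0 is enriched in known structures, bin 1 depleted.
energies = inverse_boltzmann([30, 10], [20, 20])
print(score_structure([0, 0], energies) < score_structure([1, 1], energies))  # True
```

Discriminating a correct from an incorrect structure then reduces to comparing such summed scores, which is the kind of test the abstract credits database potentials with passing.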
49
Abstract
The computational techniques for sorting through protein folds (including dynamic programming and self-consistent field theory) are no longer the bottleneck in prediction. The main problem is that all methods of protein structure recognition and prediction can use only some of the interactions operating in the chain, and that even the energies of these interactions are not known precisely. This is now the principal source of errors. The errors can be reduced by using many distant homologues, but this opens the possibility of predicting a generalized folding pattern rather than a particular fold in all its details.
Affiliation(s)
- A V Finkelstein
- Institute of Protein Research, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia.
50