351
|
Fetrow JS, Godzik A, Skolnick J. Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J Mol Biol 1998; 282:703-11. [PMID: 9743619 DOI: 10.1006/jmbi.1998.2061] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.1] [Indexed: 11/22/2022]
Abstract
The application of an automated method for the screening of protein activity based on the sequence-to-structure-to-function paradigm is presented for the complete Escherichia coli genome. First, the structure of the protein is identified from its sequence using a threading algorithm, which aligns the sequences to the best matching structure in a structural database and extends sequence analysis well beyond the limits of local sequence identity. Then, the active site is identified in the resulting sequence-to-structure alignment using a "fuzzy functional form" (FFF), a three-dimensional descriptor of the active site of a protein. Here, this sequence-to-structure-to-function concept is applied to analysis of the complete E. coli genome, i.e. all E. coli open reading frames (ORFs) are screened for the thiol-disulfide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. We show that the method can identify the active sites in ten sequences that are known to or proposed to exhibit this activity. Furthermore, oxidoreductase activity is predicted in two other sequences that have not been identified previously. This method distinguishes protein pairs with similar active sites from protein pairs that are just topological cousins, i.e. those having similar global folds, but not necessarily similar active sites. Thus, this method provides a novel approach for extraction of active site and functional information based on three-dimensional structures, rather than simple sequence analysis. Prediction of protein activity is fully automated and easily extendible to new functions. Finally, it is demonstrated here that the method can be applied to complete genome database analysis.
|
352
|
Rychlewski L, Zhang B, Godzik A. Fold and function predictions for Mycoplasma genitalium proteins. FOLDING & DESIGN 1998; 3:229-38. [PMID: 9710568 DOI: 10.1016/s1359-0278(98)00034-0] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.0] [Indexed: 02/08/2023]
Abstract
BACKGROUND Uncharacterized proteins from newly sequenced genomes provide perfect targets for fold and function prediction. RESULTS For 38% of the entire genome of Mycoplasma genitalium, sequence similarity to a protein with a known structure can be recognized using a new sequence alignment algorithm. When comparing genomes of M. genitalium and Escherichia coli, > 80% of M. genitalium proteins have a significant sequence similarity to a protein in E. coli and there are > 40 examples that have not been recognized before. For all cases of proteins with significant profile similarities, there are strong analogies in their functions, if the functions of both proteins are known. The results presented here and other recent results strongly support the argument that such proteins are actually homologous. Assuming this homology allows one to make tentative functional assignments for > 50 previously uncharacterized proteins, including such intriguing cases as the putative beta-lactam antibiotic resistance protein in M. genitalium. CONCLUSIONS Using a new profile-to-profile alignment algorithm, the three-dimensional fold can be predicted for almost 40% of proteins from a genome of the small bacterium M. genitalium, and tentative function can be assigned to almost 80% of the entire genome. Some predictions lead to new insights about known functions or point to hitherto unexpected features of M. genitalium.
|
353
|
Fetrow JS, Godzik A. Function driven protein evolution. A possible proto-protein for the RNA-binding proteins. PACIFIC SYMPOSIUM ON BIOCOMPUTING 1998:485-96. [PMID: 9697206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
Abstract
We introduce a hypothesis that present day proteins evolved from "proto-proteins," small 15-20 residue peptides with some elements of secondary structure and primitive function. Increasingly stable and functional proteins arose by adding structural elements to produce the small domains or protein modules that we would recognize today. From this point of view, the surprising similarities between small structural fragments of large proteins, which are usually taken as examples of convergent, function-driven evolution, are interpreted in exactly the opposite way--as traces of common evolutionary origin. As an example, a hypothetical evolutionary tree is presented for two families of RNA binding proteins: the OB fold, a family of all-beta proteins, and the RBD fold, an alpha/beta protein family. We argue that both protein families could have evolved from the same RNA-binding proto-protein, which had a form of beta-loop-beta RNA binding motif.
|
354
|
Jaroszewski L, Rychlewski L, Zhang B, Godzik A. Fold prediction by a hierarchy of sequence, threading, and modeling methods. Protein Sci 1998; 7:1431-40. [PMID: 9655348 PMCID: PMC2144032 DOI: 10.1002/pro.5560070620] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.0] [Indexed: 11/07/2022]
Abstract
Several fold recognition algorithms are compared to each other in terms of prediction accuracy and significance. It is shown that on standard benchmarks, hybrid methods, which combine scoring based on sequence-sequence and sequence-structure matching, surpass both sequence and threading methods in the number of accurate predictions. However, the sequence similarity contributes most to the prediction accuracy. This strongly argues that most examples of apparently nonhomologous proteins with similar folds are actually related by evolution. While disappointing from the perspective of the fundamental understanding of protein folding, this adds a new significance to fold recognition methods as a possible first step in function prediction. Despite hybrid methods being more accurate at fold prediction than either the sequence or threading methods, each of the methods is correct in some cases where others have failed. This partly reflects a different perspective on sequence/structure relationship embedded in various methods. To combine predictions from different methods, estimates of significance of predictions are made for all methods. With the help of such estimates, it is possible to develop a "jury" method, which has accuracy higher than any of the single methods. Finally, building full three-dimensional models for all top predictions helps to eliminate possible false positives where alignments, which are optimal in the one-dimensional sequences, lead to unsolvable sterical conflicts for the full three-dimensional models.
|
355
|
Zhang B, Jaroszewski L, Rychlewski L, Godzik A. Similarities and differences between nonhomologous proteins with similar folds: evaluation of threading strategies. FOLDING & DESIGN 1997; 2:307-17. [PMID: 9377714 DOI: 10.1016/s1359-0278(97)00042-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023]
Abstract
BACKGROUND There are many pairs and groups of proteins with similar folds and interaction patterns, but whose sequence similarity is below the threshold of easily recognizable sequence homology. The existence of multiple sequence solutions for a given fold has inspired fold prediction methods in which structural information from one protein is used to estimate the energy of another, putatively similar, structure. RESULTS A set of 68 pairs of proteins with similar folds and sequence identity in the 8-30% range is identified from the literature. For each pair, the energy of one protein, calculated using knowledge-based statistical potentials, is compared to the estimated energy, calculated with the same potentials but using the structural information (burial status and interaction pattern) of another protein with the same fold. Different energy estimates, corresponding to approximations used in various fold recognition algorithms, are calculated and compared to each other, as well as to the correct energy. It is shown that the local energy terms, based on burial and secondary structure preferences, can be reliably estimated with an accuracy close to 70%. At the same time, the two-body nonlocal energy loses over 60% of its value due to the repacking of the structure. Further approximations, such as the 'frozen approximation', can bring it to an essentially random value. CONCLUSIONS Local energy terms could be used safely to improve fold recognition algorithms. To utilize pair interaction information, specially designed pair potentials and/or a self-consistent description of pair interactions is necessary.
|
356
|
Kolinski A, Skolnick J, Godzik A. An algorithm for prediction of structural elements in small proteins. PACIFIC SYMPOSIUM ON BIOCOMPUTING 1997:446-60. [PMID: 9390250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
A method for predicting the location of surface loops/turns and assigning the intervening secondary structure of the transglobular linkers in small, single domain globular proteins has been developed. Application to a set of 10 proteins of known structure indicates a high level of accuracy. The secondary structure assignment in the center of transglobular connections is correct in more than 85% of the cases. A similar error rate is found for loops. Since more global information about the fold is provided, it is complementary to standard secondary structure prediction approaches. Consequently, it may be useful in early stages of tertiary structure prediction when establishment of the structural class and possible folding topologies is of interest.
|
357
|
Rychlewski L, Godzik A. Secondary structure prediction using segment similarity. PROTEIN ENGINEERING 1997; 10:1143-53. [PMID: 9488139 DOI: 10.1093/protein/10.10.1143] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
We present a secondary structure prediction method based on finding similarities between sequence segments from the target sequence and segments contained in the database of proteins with known structures. The similarity definition is optimized using a genetic algorithm and is based on a 21 x 40 similarity matrix, comparing a target sequence with the sequence and burial status of the proteins from the database. The three-state secondary structure prediction accuracy reaches 72.4% on a nonhomologous (maximum sequence identity <25%) data set derived from PDB and is reproduced on two independent testing sets, including the set of CASP2 prediction targets and a group of newly solved PDB structures. The prediction method was developed with simplicity and open architecture in mind, allowing for an easy extension to other types of predictions and to the analysis of the contributions to the local structure formation. For instance, the design of the prediction procedure allows us to trace back segments of the database that contributed to the prediction. It can be shown that those segments came from various structural classes and that even complete exclusion of related folds from the database does not result in a significant decrease in prediction accuracy.
|
358
|
|
359
|
Hu WP, Godzik A, Skolnick J. Sequence-structure specificity--how does an inverse folding approach work? PROTEIN ENGINEERING 1997; 10:317-31. [PMID: 9194156 DOI: 10.1093/protein/10.4.317] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/04/2023]
Abstract
The inverse folding approach is a powerful tool in protein structure prediction when the native state of a sequence adopts one of the known protein folds. This is because some proteins show strong sequence-structure specificity in inverse folding experiments that allow gaps and insertions in the sequence-structure alignment. In those cases when structures similar to their native folds are included in the structure database, the z-scores (which measure the sequence-structure specificity) of these folds are well separated from those of other alternative structures. In this paper, we seek to understand the origin of this sequence-structure specificity and to identify how the specificity arises on passing from a short peptide chain to the entire protein sequence. To accomplish this objective, a simplified version of inverse folding, gapless inverse folding, is performed using sequence fragments of different sizes from 53 proteins. The results indicate that usually a significant portion of the entire protein sequence is necessary to show sequence-structure specificity, but there are regions in the sequence that begin to show this specificity at relatively short fragment size (15-20 residues). An island picture, in which the regions in the sequence that recognize their own native structure grow from some seed fragments, is observed as the fragment size increases. Usually, more similar structures to the native states are found in the top-scoring structural fragments in these high-specificity regions.
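The z-score mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common convention that the score of the native sequence-structure pair is compared against the mean and standard deviation of scores over alternative structures. The function name `z_score` and the sign convention (lower energy is better, so a well-separated native fold gets a strongly negative value) are illustrative assumptions, not the authors' implementation.

```python
import statistics


def z_score(native_energy, alternative_energies):
    """Return how many standard deviations the native energy lies from
    the mean energy of the alternative (non-native) structures.

    Hypothetical helper for illustration only; the paper's exact
    definition and sign convention may differ.
    """
    mean = statistics.mean(alternative_energies)
    spread = statistics.pstdev(alternative_energies)
    if spread == 0:
        raise ValueError("alternative energies have zero spread")
    return (native_energy - mean) / spread


# Made-up energies: the native fold scores far below the alternatives.
z = z_score(-10.0, [0.0, 2.0, 4.0, -2.0, -4.0])
```

Under this convention, a strongly negative value indicates that the native fold is well separated from the alternatives.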
|
360
|
Skolnick J, Jaroszewski L, Kolinski A, Godzik A. Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? Protein Sci 1997; 6:676-88. [PMID: 9070450 PMCID: PMC2143667 DOI: 10.1002/pro.5560060317] [Citation(s) in RCA: 152] [Impact Index Per Article: 5.6] [Indexed: 02/03/2023]
Abstract
Many existing derivations of knowledge-based statistical pair potentials invoke the quasichemical approximation to estimate the expected side-chain contact frequency if there were no amino acid pair-specific interactions. At first glance, the quasichemical approximation that treats the residues in a protein as being disconnected and expresses the side-chain contact probability as being proportional to the product of the mole fractions of the pair of residues would appear to be rather severe. To investigate the validity of this approximation, we introduce two new reference states in which no specific pair interactions between amino acids are allowed, but in which the connectivity of the protein chain is retained. The first estimates the expected number of side-chain contacts by treating the protein as a Gaussian random coil polymer. The second, more realistic reference state includes the effects of chain connectivity, secondary structure, and chain compactness by estimating the expected side-chain contact probability by placing the sequence of interest in each member of a library of structures of comparable compactness to the native conformation. The side-chain contact maps are not allowed to readjust to the sequence of interest, i.e., the side chains cannot repack. This situation would hold rigorously if all amino acids were the same size. Both reference states effectively permit the factorization of the side-chain contact probability into sequence-dependent and structure-dependent terms. Then, because the sequence distribution of amino acids in proteins is random, the quasichemical approximation to each of these reference states is shown to be excellent. Thus, the range of validity of the quasichemical approximation is determined by the magnitude of the side-chain repacking term, which is, at present, unknown. Finally, the performance of these two sets of pair interaction potentials as well as side-chain contact fraction-based interaction scales is assessed by inverse folding tests both without and with allowing for gaps.
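The quasichemical reference state described in this abstract can be sketched as follows. This is a minimal illustration, assuming a single sequence with a list of contacting residue index pairs and the textbook form in which the expected contact fraction for a residue pair is proportional to the product of their mole fractions. The function name `quasichemical_potential` and the input layout are hypothetical; the energy is given in units of kT as the negative log ratio of observed to expected contact fractions.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def quasichemical_potential(sequence, contacts):
    """Estimate pair energies (in kT) as -ln(observed / expected).

    sequence : string of one-letter residue codes
    contacts : iterable of (i, j) residue index pairs in contact
    Illustrative sketch only; not the reference states of the paper.
    """
    n = len(sequence)
    mole_fraction = {aa: c / n for aa, c in Counter(sequence).items()}

    # Count observed contacts per unordered residue-type pair.
    observed = Counter()
    for i, j in contacts:
        observed[tuple(sorted((sequence[i], sequence[j])))] += 1
    total = sum(observed.values())
    if total == 0:
        return {}

    potential = {}
    for a, b in combinations_with_replacement(sorted(mole_fraction), 2):
        # Quasichemical expectation: product of mole fractions, with a
        # factor of 2 for unlike pairs (a-b and b-a are the same pair).
        expected = mole_fraction[a] * mole_fraction[b] * (1 if a == b else 2)
        observed_fraction = observed[(a, b)] / total
        if observed_fraction > 0:  # skip pairs never seen in contact
            potential[(a, b)] = -math.log(observed_fraction / expected)
    return potential


# Toy example: like residues contact each other more often than expected.
toy = quasichemical_potential("AABB", [(0, 1), (2, 3)])
```

In the toy example, the A-A and B-B pairs are observed twice as often as the mole-fraction product predicts, so each receives an energy of -ln 2, while the unobserved A-B pair is omitted.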
|
361
|
Kolinski A, Skolnick J, Godzik A, Hu WP. A method for the prediction of surface "U"-turns and transglobular connections in small proteins. Proteins 1997; 27:290-308. [PMID: 9061792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
A simple method for predicting the location of surface loops/turns that change the overall direction of the chain, that is, "U" turns, and assigning the dominant secondary structure of the intervening transglobular blocks in small, single-domain globular proteins has been developed. Since the emphasis of the method is on the prediction of the major topological elements that comprise the global structure of the protein rather than on a detailed local secondary structure description, this approach is complementary to standard secondary structure prediction schemes. Consequently, it may be useful in the early stages of tertiary structure prediction when establishment of the structural class and possible folding topologies is of interest. Application to a set of small proteins of known structure indicates a high level of accuracy. The prediction of the approximate location of the surface turns/loops that are responsible for the change in overall chain direction is correct in more than 95% of the cases. The accuracy for the dominant secondary structure assignment for the linear blocks between such surface turns/loops is in the range of 82%.
|
362
|
Kolinski A, Skolnick J, Godzik A, Hu WP. A method for the prediction of surface “U”-turns and transglobular connections in small proteins. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(199702)27:2<290::aid-prot14>3.0.co;2-h] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
|
363
|
Pawłowski K, Jaroszewski L, Bierzyński A, Godzik A. Multiple model approach--dealing with alignment ambiguities in protein modeling. PACIFIC SYMPOSIUM ON BIOCOMPUTING 1997:328-339. [PMID: 9390303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2023]
Abstract
Sequence alignments for distantly homologous proteins are often ambiguous, which creates a weak link in structure prediction by homology. We address this problem by using several plausible alignments in a modeling procedure, obtaining many models of the target. All are subsequently evaluated by a threading algorithm. It is shown that this approach can identify the best alignments and produce reasonable models, whose quality is now limited only by the extent of the structural similarity between the known and predicted protein. Using a similar approach, a structure prediction for the oxidized dimer of the S100A1 protein, for which the structure is not known, is presented.
|
364
|
Abstract
Structurally similar but sequentially unrelated proteins have been discovered and rediscovered by many researchers, using a variety of structure comparison tools. For several pairs of such proteins, existing structural alignments obtained from the literature, as well as alignments prepared using several different similarity criteria, are compared with each other. It is shown that, in general, they differ from each other, with differences increasing with diminishing sequence similarity. Differences are particularly strong between alignments optimizing global similarity measures, such as RMS deviation between C alpha atoms, and alignments focusing on more local features, such as packing or interaction pattern similarity. Simply speaking, by putting emphasis on different aspects of structure, different structural alignments show the unquestionable similarity in a different way. With differences between various alignments extending to a point where they can differ at all positions, analysis of structural similarities leads to contradictory results reported by groups using different alignment techniques. The problem of uniqueness and stability of structural alignments is further studied with the help of visualization of the suboptimal alignments. It is shown that alignments are often degenerate and whole families of alignments can be generated with almost the same score as the "optimal alignment." However, for some similarity criteria, especially those based on side-chain positions, rather than C alpha positions, alignments in some areas of the protein are unique. This opens the question of how and if the structural alignments can be used as "standards of truth" for protein comparison.
|
365
|
Abstract
An interesting example of a structurally diverse group of sequentially homologous proteins is analyzed at the level of molecular interactions. In this family, the EF-hand calcium-binding proteins, there are examples of at least three distinct mutual positions of the N and C-terminal domains, despite significant sequence homology between all members of this family. Why does a particular protein choose one arrangement over another? To answer this question, detailed models of all proteins in their native structures as well as all alternative sequence/structure combinations are built by comparative modeling. By studying and comparing interactions stabilizing native structures and destabilizing alternative conformations, it is possible to gain insight into how such conformational diversity is achieved. It is shown that some mechanisms used to achieve it are: correlated mutations on the surface of two units and the presence of additional domains/chain fragments stabilizing desired topologies. The implications of these findings, both for structure predictions for other members of this family as well as the general problem of quaternary structure formation, are discussed.
|
366
|
Abstract
Empirical potentials capture the essence of regularities seen in protein structures and can be used in simulations and predictions of protein structure or function. Derivations of such potentials require comparisons to be made between experimentally derived protein structures and theoretically constructed reference states.
|
367
|
Godzik A, Koliński A, Skolnick J. Are proteins ideal mixtures of amino acids? Analysis of energy parameter sets. Protein Sci 1995; 4:2107-17. [PMID: 8535247 PMCID: PMC2142984 DOI: 10.1002/pro.5560041016] [Citation(s) in RCA: 119] [Impact Index Per Article: 4.1] [Indexed: 01/31/2023]
Abstract
Various existing derivations of the effective potentials of mean force for the two-body interactions between amino acid side chains in proteins are reviewed and compared to each other. The differences between different parameter sets can be traced to the reference state used to define the zero of energy. Depending on the reference state, the transfer free energy or other pseudo-one-body contributions can be present to various extents in two-body parameter sets. It is, however, possible to compare various derivations directly by concentrating on the "excess" energy, a term that describes the difference between a real protein and an ideal solution of amino acids. Furthermore, the number of protein structures available for analysis allows one to check the consistency of the derivation and the errors by comparing parameters derived from various subsets of the whole database. It is shown that pair interaction preferences are very consistent throughout the database. Independently derived parameter sets have correlation coefficients on the order of 0.8, with the mean difference between equivalent entries of 0.1 kT. Also, the low-quality (low resolution, little or no refinement) structures show similar regularities. There are, however, large differences between interaction parameters derived on the basis of crystallographic structures and structures obtained by the NMR refinement. The origin of the latter difference is not yet understood.
|
368
|
Abstract
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence-structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures 'to avoid', sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.
|
369
|
Godzik A, Skolnick J. Flexible algorithm for direct multiple alignment of protein structures and sequences. COMPUTER APPLICATIONS IN THE BIOSCIENCES : CABIOS 1994; 10:587-96. [PMID: 7704657 DOI: 10.1093/bioinformatics/10.6.587] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.8] [Indexed: 01/26/2023]
Abstract
The recently described equivalence between the alignment of two proteins and a conformation of a lattice chain on a two-dimensional square lattice is extended to multiple alignments. The search for the optimal multiple alignment between several proteins, which is equivalent to finding the energy minimum in the conformational space of a multi-dimensional lattice chain, is studied by the Monte Carlo approach. This method, while not deterministic, and for two-dimensional problems slower than dynamic programming, can accept arbitrary scoring functions, including non-local ones, and its speed decreases slowly with increasing number of dimensions. For the local scoring functions, the MC algorithm can also reproduce known exact solutions for the direct multiple alignments. As illustrated by examples, both for structure- and sequence-based alignments, direct multi-dimensional alignments are able to capture weak similarities between divergent families much better than ones built from pairwise alignments by a hierarchical approach.
|
370
|
Godzik A, Skolnick J, Kolinski A. Regularities in interaction patterns of globular proteins. PROTEIN ENGINEERING 1993; 6:801-10. [PMID: 8309927 DOI: 10.1093/protein/6.8.801] [Citation(s) in RCA: 58] [Impact Index Per Article: 1.9] [Indexed: 01/29/2023]
Abstract
The description of protein structure in the language of side chain contact maps is shown to offer many advantages over more traditional approaches. Because it focuses on side chain interactions, it aids in the discovery, study and classification of similarities between interactions defining particular protein folds and offers new insights into the rules of protein structure. For example, there is a small number of characteristic patterns of interactions between protein supersecondary structural fragments, which can be seen in various non-related proteins. Furthermore, the overlap of the side chain contact maps of two proteins provides a new measure of protein structure similarity. As shown in several examples, alignments based on contact map overlaps are a powerful alternative to other structure-based alignments.
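The contact map overlap described in this abstract can be sketched as a simple count. This is a minimal illustration, assuming each protein is given as a list of contacting residue index pairs and that an alignment between the two proteins is already known. The function name `contact_map_overlap` and the dictionary-based alignment are hypothetical choices, and the sketch does not search for the alignment that maximizes the overlap.

```python
def contact_map_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    contacts_a, contacts_b : iterables of (i, j) residue index pairs
    alignment              : dict mapping residue indices in A to the
                             equivalent residue indices in B
    Illustrative sketch only; not the authors' implementation.
    """
    # Store B's contacts as unordered pairs so (i, j) matches (j, i).
    contact_set_b = {frozenset(pair) for pair in contacts_b}

    shared = 0
    for i, j in contacts_a:
        # Both residues must be aligned for the contact to be comparable.
        if i in alignment and j in alignment:
            if frozenset((alignment[i], alignment[j])) in contact_set_b:
                shared += 1
    return shared


# Toy example: two of the three contacts in A are preserved in B.
overlap = contact_map_overlap(
    [(0, 3), (1, 4), (2, 5)],
    [(10, 13), (14, 11), (20, 25)],
    {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 5: 15},
)
```

A larger overlap under a given alignment indicates that more side-chain interactions are shared between the two structures.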
|
371
|
Godzik A, Kolinski A, Skolnick J. Lattice representations of globular proteins: How good are they? J Comput Chem 1993. [DOI: 10.1002/jcc.540141009] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
|
372
|
Godzik A, Kolinski A, Skolnick J. De novo and inverse folding predictions of protein structure and dynamics. J Comput Aided Mol Des 1993; 7:397-438. [PMID: 8229093 DOI: 10.1007/bf02337559] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.5] [Indexed: 01/29/2023]
Abstract
In the last two years, the use of simplified models has facilitated major progress in the globular protein folding problem, viz., the prediction of the three-dimensional (3D) structure of a globular protein from its amino acid sequence. A number of groups have addressed the inverse folding problem where one examines the compatibility of a given sequence with a given (and already determined) structure. A comparison of extant inverse protein-folding algorithms is presented, and methodologies for identifying sequences likely to adopt identical folding topologies, even when they lack sequence homology, are described. Extension to produce structural templates or fingerprints from idealized structures is discussed, and for eight-membered beta-barrel proteins, it is shown that idealized fingerprints constructed from simple topology diagrams can correctly identify sequences having the appropriate topology. Furthermore, this inverse folding algorithm is generalized to predict elements of supersecondary structure including beta-hairpins, helical hairpins and alpha/beta/alpha fragments. Then, we describe a very high coordination number lattice model that can predict the 3D structure of a number of globular proteins de novo; i.e. using just the amino acid sequence. Applications to sequences designed by DeGrado and co-workers [Biophys. J., 61 (1992) A265] predict folding intermediates, native states and relative stabilities in accord with experiment. The methodology has also been applied to the four-helix bundle designed by Richardson and co-workers [Science, 249 (1990) 884] and a redesigned monomeric version of a naturally occurring four-helix dimer, rop. Based on comparison to the rop dimer, the simulations predict conformations with rms values of 3-4 Å from native. Furthermore, the de novo algorithms can assess the stability of the folds predicted from the inverse algorithm, while the inverse folding algorithms can assess the quality of the de novo models. Thus, the synergism of the de novo and inverse folding algorithm approaches provides a set of complementary tools that will facilitate further progress on the protein-folding problem.
|
373
|
Skolnick J, Kolinski A, Brooks CL, Godzik A, Rey A. A method for predicting protein structure from sequence. Curr Biol 1993; 3:414-23. [PMID: 15335708 DOI: 10.1016/0960-9822(93)90348-r] [Citation(s) in RCA: 47] [Impact Index Per Article: 1.5] [Received: 04/19/1993] [Revised: 06/08/1993] [Accepted: 06/08/1993] [Indexed: 10/26/2022]
Abstract
BACKGROUND: The ability to predict the native conformation of a globular protein from its amino-acid sequence is an important unsolved problem of molecular biology. We have previously reported a method in which reduced representations of proteins are folded on a lattice by Monte Carlo simulation, using statistically-derived potentials. When applied to sequences designed to fold into four-helix bundles, this method generated predicted conformations closely resembling the real ones. RESULTS: We now report a hierarchical approach to protein-structure prediction, in which two cycles of the above-mentioned lattice method (the second on a finer lattice) are followed by a full-atom molecular dynamics simulation. The end product of the simulations is thus a full-atom representation of the predicted structure. The application of this procedure to the 60-residue B domain of staphylococcal protein A predicts a three-helix bundle with a backbone root mean square (rms) deviation of 2.25-3 Å from the experimentally determined structure. Further application to a designed, 120-residue monomeric protein, mROP, based on the dimeric ROP protein of Escherichia coli, predicts a left-turning, four-helix bundle native state. Although the ultimate assessment of the quality of this prediction awaits the experimental determination of the mROP structure, a comparison of this structure with the set of equivalent residues in the ROP dimer crystal structure indicates that they have an rms deviation of approximately 3.6-4.2 Å. CONCLUSION: Thus, for a set of helical proteins that have simple native topologies, the native folds of the proteins can be predicted with reasonable accuracy from their sequences alone. Our approach suggests a direction for future work addressing the protein-folding problem.
|
374
|
Kolinski A, Godzik A, Skolnick J. A general method for the prediction of the three dimensional structure and folding pathway of globular proteins: Application to designed helical proteins. J Chem Phys 1993. [DOI: 10.1063/1.464706] [Citation(s) in RCA: 163] [Impact Index Per Article: 5.3] [Indexed: 11/14/2022] Open
|
375
|
Skolnick J, Kolinski A, Godzik A. From independent modules to molten globules: observations on the nature of protein folding intermediates. Proc Natl Acad Sci U S A 1993; 90:2099-100. [PMID: 8460114 PMCID: PMC46030 DOI: 10.1073/pnas.90.6.2099] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023] Open
|
376
|
Godzik A, Skolnick J. Sequence-structure matching in globular proteins: application to supersecondary and tertiary structure determination. Proc Natl Acad Sci U S A 1992; 89:12098-102. [PMID: 1465445 PMCID: PMC50705 DOI: 10.1073/pnas.89.24.12098] [Citation(s) in RCA: 110] [Impact Index Per Article: 3.4] [Indexed: 12/27/2022] Open
Abstract
A methodology designed to address the inverse globular protein-folding problem (the identification of which sequences are compatible with a given three-dimensional structure) is described. By using a library of protein fingerprints, defined by the side chain interaction pattern, it is possible to match each structure to its own sequence in an exhaustive database search. It is shown that this is a permissive requirement for the validation of the methodology. To pass the more rigorous test of identifying proteins that are not close sequence homologs, but that have similar structure, the method has been extended to include insertions and deletions in the sequence, which is compared to the fingerprint. This allows for the identification of sequences having little or no sequence homology to the fingerprint. Examples include plastocyanin/azurin/pseudoazurin, the globin family, different families of proteases and cytochromes, including cytochromes c' and b-562, actinidin/papain, and lysozyme/alpha-lactalbumin. Turning to supersecondary structure prediction, we find that alpha/beta/alpha fragments possess sufficient specificity to identify their own and related sequences. By threading a beta-hairpin through a sequence, it is possible to predict the location of such hairpins and turns with remarkable fidelity. Thus, the method greatly extends existing techniques for the prediction of both global structural homology and local supersecondary structure.
|
377
|
Abstract
We describe the most general solution to date of the problem of matching globular protein sequences to the appropriate three-dimensional structures. The screening template, against which sequences are tested, is provided by a protein "structural fingerprint" library based on the contact map and the buried/exposed pattern of residues. Then, a lattice Monte Carlo algorithm validates or dismisses the stability of the proposed fold. Examples of known structural similarities between proteins having weakly or unrelated sequences such as the globins and phycocyanins, the eight-member alpha/beta fold of triose phosphate isomerase and even a close structural equivalence between azurin and immunoglobulins are found.
|
378
|
Godzik A, Skolnick J, Kolinski A. Simulations of the folding pathway of triose phosphate isomerase-type alpha/beta barrel proteins. Proc Natl Acad Sci U S A 1992; 89:2629-33. [PMID: 1557367 PMCID: PMC48715 DOI: 10.1073/pnas.89.7.2629] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.1] [Indexed: 12/27/2022] Open
Abstract
Simulations of the folding pathways of two large alpha/beta proteins, the alpha subunit of tryptophan synthase and triose phosphate isomerase, are reported using the knight's walk lattice model of globular proteins and Monte Carlo dynamics. Starting from randomly generated unfolded states and with no assumptions regarding the nature of the folding intermediates, for the tryptophan synthase subunit these simulations predict, in agreement with experiment, the existence and location of a stable equilibrium intermediate comprised of six beta strands on the amino terminus of the molecule. For the case of triose phosphate isomerase, the simulations predict that both amino- and carboxyl-terminal intermediates should be observed. In a significant modification of previous lattice models, this model includes a full heavy atom side chain description and is capable of representing native conformations at the level of 2.5- to 3-Å rms deviation for the C alpha positions, as compared to the crystal structure. With a well-balanced compromise between accuracy of the protein description and the computer requirements necessary to perform simulations spanning biologically significant amounts of time, the lattice model described here brings the possibility of studying important biological processes to present-day computers.
|
379
|
Godzik A. An estimation of energy parameters for the soliton movement in hydrogen-bonded chains. Chem Phys Lett 1990. [DOI: 10.1016/0009-2614(90)85229-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
|
380
|
Godzik A, Sander C. Conservation of residue interactions in a family of Ca-binding proteins. Protein Eng 1989; 2:589-96. [PMID: 2813336 DOI: 10.1093/protein/2.8.589] [Citation(s) in RCA: 33] [Impact Index Per Article: 0.9] [Indexed: 01/02/2023]
Abstract
In the TNC family of Ca-binding proteins (calmodulin, parvalbumin, intestinal calcium binding protein and troponin C) approximately 70 well-conserved amino acid sequences and six crystal structures are known. We find a clear correlation between residue contacts in the structures and residue conservation in the sequences: residues with strong sidechain-sidechain contacts in the three-dimensional structure tend to be the more conserved in the sequence. This is one way to quantify the intuitive notion of the importance of sidechain interactions for maintaining protein three-dimensional structure in evolution and may usefully be taken into account in planning point mutations in protein engineering.
|
381
|
Dadlez M, Bierzyński A, Godzik A, Sobocińska M, Kupryszewski G. Conformational role of His-12 in C-peptide of ribonuclease A. Biophys Chem 1988; 31:175-81. [PMID: 3233287 DOI: 10.1016/0301-4622(88)80023-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
Abstract
Possible interactions of the His-12 ring with other side chain and backbone groups of C-peptide lactone (CPL) are discussed. The works published so far are critically reviewed and compared with the latest results obtained by the authors. The main new conclusion is that in the helical conformation of CPL, the Phe-8 and His-12 rings are clustered together. Studies of Phe-8→Ala analogs of CPL and calculations of ring current effects satisfactorily explain the observed environmental shifts of Phe-8 and His-12 protons in NMR spectra of CPL. Interaction between both rings is favorable for alpha-helix formation, but cannot explain the increase in helix stability associated with protonation of His-12. This effect arises from favorable interactions of the charged His+-12 ring with the helix backbone.
|
382
|
Abstract
The effects of the position of charged amino acid side chains on the stability of the alpha-helix are investigated. Calculations for the model polyAla 13 residue alpha-helix, with modifications based on experimental work, are performed at three levels of approximation. The observed stabilization of the alpha-helix could be explained by interactions between its macrodipole and charged amino acid side chains. Limitations of the model are discussed.
|