26
|
Ortiz AR, Kolinski A, Skolnick J. Tertiary structure prediction of the KIX domain of CBP using Monte Carlo simulations driven by restraints derived from multiple sequence alignments. Proteins 1998; 30:287-94. [PMID: 9517544 DOI: 10.1002/(sici)1097-0134(19980215)30:3<287::aid-prot8>3.0.co;2-h] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 02/06/2023]
Abstract
Using a recently developed protein folding algorithm, a prediction of the tertiary structure of the KIX domain of the CREB binding protein is described. The method incorporates predicted secondary and tertiary restraints derived from multiple sequence alignments in a reduced protein model whose conformational space is explored by Monte Carlo dynamics. Secondary structure restraints are provided by the PHD secondary structure prediction algorithm that was modified for the presence of predicted U-turns, i.e., regions where the chain reverses global direction. Tertiary restraints are obtained via a two-step process: First, seed side-chain contacts are identified from a correlated mutation analysis, and then, a threading-based algorithm expands the number of these seed contacts. Blind predictions indicate that the KIX domain is a putative three-helix bundle, although the chirality of the bundle could not be uniquely determined. The expected root-mean-square deviation for the correct chirality of the KIX domain is between 5.0 and 6.2 A. This is to be compared with the estimate of 12.9 A that would be expected by a random prediction, using the model of F. Cohen and M. Sternberg (J. Mol. Biol. 138:321-333, 1980).
|
27
|
Ortiz AR, Kolinski A, Skolnick J. Nativelike topology assembly of small proteins using predicted restraints in Monte Carlo folding simulations. Proc Natl Acad Sci U S A 1998; 95:1020-5. [PMID: 9448278 PMCID: PMC18658 DOI: 10.1073/pnas.95.3.1020] [Citation(s) in RCA: 47] [Impact Index Per Article: 1.8] [Indexed: 02/05/2023]
Abstract
By incorporating predicted secondary and tertiary restraints derived from multiple sequence alignments into ab initio folding simulations, it has been possible to assemble native-like tertiary structures for a test set of 19 nonhomologous proteins ranging from 29 to 100 residues in length and representing all secondary structural classes. Secondary structural restraints are provided by the PHD secondary structure prediction algorithm that incorporates multiple sequence information. Multiple sequence alignments also provide predicted tertiary restraints via a two-step process: First, seed side chain contacts are selected from a correlated mutation analysis, and then an inverse folding algorithm expands these seed contacts. The predicted secondary and tertiary restraints are incorporated into a lattice-based, reduced protein model for structure assembly and refinement. The resulting native-like topologies exhibit a coordinate root-mean-square deviation from native for the whole chain between 3.1 and 6.7 A, with values ranging from 2.6 to 4.1 A over approximately 80% of the structure. Overall, this study suggests that the use of restraints derived from multiple sequence alignments combined with a fold assembly algorithm is a promising approach to the prediction of the global topology of small proteins.
|
28
|
Ortiz AR, Kolinski A, Skolnick J. Combined multiple sequence reduced protein model approach to predict the tertiary structure of small proteins. Pacific Symposium on Biocomputing 1998:377-388. [PMID: 9697197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2023]
Abstract
By incorporating predicted secondary and tertiary restraints into ab initio folding simulations, low resolution tertiary structures of a test set of 20 nonhomologous proteins have been predicted. These proteins, which represent all secondary structural classes, contain from 37 to 100 residues. Secondary structural restraints are provided by the PHD secondary structure prediction algorithm that incorporates multiple sequence information. Predicted tertiary restraints are obtained from multiple sequence alignments via a two-step process: First, "seed" side chain contacts are identified from a correlated mutation analysis, and then, the seed contacts are "expanded" by an inverse folding algorithm. These predicted restraints are then incorporated into a lattice based, reduced protein model. Depending upon fold complexity, the resulting nativelike topologies exhibit a coordinate root-mean-square deviation, cRMSD, from native between 3.1 and 6.7 A. Overall, this study suggests that the use of restraints derived from multiple sequence alignments combined with a fold assembly algorithm is a promising approach to the prediction of the global topology of small proteins.
|
29
|
Kolinski A, Skolnick J, Godzik A. An algorithm for prediction of structural elements in small proteins. Pacific Symposium on Biocomputing 1997:446-60. [PMID: 9390250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
A method for predicting the location of surface loops/turns and assigning the intervening secondary structure of the transglobular linkers in small, single domain globular proteins has been developed. Application to a set of 10 proteins of known structure indicates a high level of accuracy. The secondary structure assignment in the center of transglobular connections is correct in more than 85% of the cases. A similar error rate is found for loops. Since more global information about the fold is provided, it is complementary to standard secondary structure prediction approaches. Consequently, it may be useful in early stages of tertiary structure prediction when establishment of the structural class and possible folding topologies is of interest.
|
30
|
Hu WP, Kolinski A, Skolnick J. Improved method for prediction of protein backbone U-turn positions and major secondary structural elements between U-turns. Proteins 1997; 29:443-60. [PMID: 9408942 DOI: 10.1002/(sici)1097-0134(199712)29:4<443::aid-prot5>3.0.co;2-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
Abstract
A new and more accurate method has been developed for predicting the backbone U-turn positions (where the chain reverses global direction) and the dominant secondary structure elements between U-turns in globular proteins. The current approach uses sequence-specific secondary structure propensities and multiple sequence information. The latter plays an important role in the enhanced success of this approach. Application to two sets (total 108) of small to medium-sized, single-domain proteins indicates that approximately 94% of the U-turn locations are correctly predicted within three residues, as are 88% of dominant secondary structure elements. These results are significantly better than our previous method (Kolinski et al., Proteins 27:290-308, 1997). The current study strongly suggests that the U-turn locations are primarily determined by local interactions. Furthermore, both global length constraints and local interactions contribute significantly to the determination of the secondary structure types between U-turns. Accurate U-turn predictions are crucial for accurate secondary structure predictions in the current method. Protein structure modeling, tertiary structure predictions, and possibly, fold recognition should benefit from the predicted structural data provided by this new method.
|
31
|
Skolnick J, Jaroszewski L, Kolinski A, Godzik A. Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? Protein Sci 1997; 6:676-88. [PMID: 9070450 PMCID: PMC2143667 DOI: 10.1002/pro.5560060317] [Citation(s) in RCA: 152] [Impact Index Per Article: 5.6] [Indexed: 02/03/2023]
Abstract
Many existing derivations of knowledge-based statistical pair potentials invoke the quasichemical approximation to estimate the expected side-chain contact frequency if there were no amino acid pair-specific interactions. At first glance, the quasichemical approximation that treats the residues in a protein as being disconnected and expresses the side-chain contact probability as being proportional to the product of the mole fractions of the pair of residues would appear to be rather severe. To investigate the validity of this approximation, we introduce two new reference states in which no specific pair interactions between amino acids are allowed, but in which the connectivity of the protein chain is retained. The first estimates the expected number of side-chain contacts by treating the protein as a Gaussian random coil polymer. The second, more realistic reference state includes the effects of chain connectivity, secondary structure, and chain compactness by estimating the expected side-chain contact probability by placing the sequence of interest in each member of a library of structures of comparable compactness to the native conformation. The side-chain contact maps are not allowed to readjust to the sequence of interest, i.e., the side chains cannot repack. This situation would hold rigorously if all amino acids were the same size. Both reference states effectively permit the factorization of the side-chain contact probability into sequence-dependent and structure-dependent terms. Then, because the sequence distribution of amino acids in proteins is random, the quasichemical approximation to each of these reference states is shown to be excellent. Thus, the range of validity of the quasichemical approximation is determined by the magnitude of the side-chain repacking term, which is, at present, unknown.
Finally, the performance of these two sets of pair interaction potentials as well as side-chain contact fraction-based interaction scales is assessed by inverse folding tests both without and with allowing for gaps.
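Illustrative note: the quasichemical reference state described in this abstract takes the contact probability of a residue pair to be proportional to the product of the two mole fractions. The sketch below shows only that bookkeeping for a single sequence. The function names are hypothetical, and the factor of 2 for unlike pairs (counting unordered pairs so the fractions sum to one) is an assumption of this sketch, not a detail taken from the paper.

```python
from collections import Counter


def mole_fractions(sequence):
    """Return the fraction of each residue type in a one-letter sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: n / total for aa, n in counts.items()}


def quasichemical_expected_fraction(sequence, a, b):
    """Expected fraction of contacts of type (a, b) for disconnected residues.

    Assumption of this sketch: contacts are unordered pairs, so unlike
    pairs carry a factor of 2 and all pair fractions sum to one.
    """
    x = mole_fractions(sequence)
    xa = x.get(a, 0.0)
    xb = x.get(b, 0.0)
    return xa * xb if a == b else 2.0 * xa * xb


if __name__ == "__main__":
    seq = "AAAL"  # toy sequence, not real data
    for pair in (("A", "A"), ("A", "L"), ("L", "L")):
        print(pair, quasichemical_expected_fraction(seq, *pair))
```

A knowledge-based potential would compare observed contact fractions against such an expectation; the connected-chain reference states discussed in the abstract replace this expectation with ones that retain chain connectivity.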
|
32
|
Kolinski A, Skolnick J, Godzik A, Hu WP. A method for the prediction of surface "U"-turns and transglobular connections in small proteins. Proteins 1997; 27:290-308. [PMID: 9061792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
A simple method for predicting the location of surface loops/turns that change the overall direction of the chain, that is, "U" turns, and assigning the dominant secondary structure of the intervening transglobular blocks in small, single-domain globular proteins has been developed. Since the emphasis of the method is on the prediction of the major topological elements that comprise the global structure of the protein rather than on a detailed local secondary structure description, this approach is complementary to standard secondary structure prediction schemes. Consequently, it may be useful in the early stages of tertiary structure prediction when establishment of the structural class and possible folding topologies is of interest. Application to a set of small proteins of known structure indicates a high level of accuracy. The prediction of the approximate location of the surface turns/loops that are responsible for the change in overall chain direction is correct in more than 95% of the cases. The accuracy for the dominant secondary structure assignment for the linear blocks between such surface turns/loops is in the range of 82%.
|
33
|
Skolnick J, Kolinski A, Ortiz AR. MONSSTER: a method for folding globular proteins with a small number of distance restraints. J Mol Biol 1997; 265:217-41. [PMID: 9020984 DOI: 10.1006/jmbi.1996.0720] [Citation(s) in RCA: 217] [Impact Index Per Article: 8.0] [Indexed: 02/03/2023]
Abstract
The MONSSTER (MOdeling of New Structures from Secondary and TErtiary Restraints) method for folding of proteins using a small number of long-distance restraints (which can be up to seven times less than the total number of residues) and some knowledge of the secondary structure of regular fragments is described. The method employs a high-coordination lattice representation of the protein chain that incorporates a variety of potentials designed to produce protein-like behaviour. These include statistical preferences for secondary structure, side-chain burial interactions, and a hydrogen-bond potential. Using this algorithm, several globular proteins (1ctf, 2gbl, 2trx, 3fxn, 1mba, 1pcy and 6pti) have been folded to moderate-resolution, native-like compact states. For example, the 68 residue 1ctf molecule having ten loosely defined, long-range restraints was reproducibly obtained with a C alpha-backbone root-mean-square deviation (RMSD) from native of about 4. A. Flavodoxin with 35 restraints has been folded to structures whose average RMSD is 4.28 A. Furthermore, using just 20 restraints, myoglobin, which is a 146 residue helical protein, has been folded to structures whose average RMSD from native is 5.65 A. Plastocyanin with 25 long-range restraints adopts conformations whose average RMSD is 5.44 A. Possible applications of the proposed approach to the refinement of structures from NMR data, homology model-building and the determination of tertiary structure when the secondary structure and a small number of restraints are predicted are briefly discussed.
|
34
|
Ortiz AR, Hu WP, Kolinski A, Skolnick J. Method for low resolution prediction of small protein tertiary structure. Pacific Symposium on Biocomputing 1997:316-327. [PMID: 9390302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2023]
Abstract
A new method for the de novo prediction of protein structures at low resolution has been developed. Starting from a multiple sequence alignment, protein secondary structure is predicted, and only those topological elements with high reliability are selected. Then, the multiple sequence alignment and the secondary structure prediction are combined to predict side chain contacts. Such contact map prediction is carried out in two stages. First, an analysis of correlated mutations is carried out to identify pairs of topological elements of secondary structure which are in contact. Then, inverse folding is used to select compatible fragments in contact, thereby enriching the number and identity of predicted side chain contacts. The final outcome of the procedure is a set of noisy secondary and tertiary restraints. These are used as a restrained potential in a Monte Carlo simulation of simplified protein models driven by statistical potentials. Low energy structures are then searched for by using simulated annealing techniques. Implementation of the restraints is carried out so as to take account of their low resolution. Using this procedure, it has been possible to predict de novo the structure of three very different protein topologies: an alpha/beta protein, the bovine pancreatic trypsin inhibitor (6pti), an alpha-helical protein, calbindin (3icb), and an all-beta protein, the SH3 domain of spectrin (1shg). In all cases, low resolution folds have been obtained with a root mean square deviation (RMSD) of 4.5-5.5 A with respect to the native structure. Some misfolded topologies appear in the simulations, but it is possible to select the native one on energetic grounds. Thus, it is demonstrated that the methodology is general for all protein motifs. Work is in progress in order to test the methodology on a larger set of protein structures.
|
35
|
Abstract
There is considerable experimental evidence that the cooperativity of protein folding resides in the transition from the molten globule to the native state. The objective of this study is to examine whether simplified models can reproduce this cooperativity and if so, to identify its origin. In particular, the thermodynamics of the conformational transition of a previously designed sequence (A. Kolinski, W. Galazka, and J. Skolnick, J. Chem. Phys. 103: 10286-10297, 1995), which adopts a very stable Greek-key beta-barrel fold has been investigated using the entropy Monte Carlo sampling (ESMC) technique of Hao and Scheraga (M.-H. Hao and H.A. Scheraga, J. Phys. Chem. 98: 9882-9883, 1994). Here, in addition to the original potential, which includes one body and pair interactions between side chains, the force field has been supplemented by two types of multi-body potentials describing side chain interactions. These potentials facilitate the protein-like pattern of side chain packing and consequently increase the cooperativity of the folding process. Those models that include an explicit cooperative side chain packing term exhibit a well-defined all-or-none transition from a denatured, random coil state to a high-density, well-defined, nativelike low-energy state. By contrast, models lacking such a term exhibit a conformational transition that is essentially continuous. Finally, an examination of the conformations at the free-energy barrier between the native and denatured states reveals that they contain a substantial amount of native-state secondary structure, about 50% of the native contacts, and have an average root mean square radius of gyration that is about 15% larger than native.
|
36
|
Abstract
In solution, the B domain of protein A from Staphylococcus aureus (B domain) possesses a three-helix bundle structure. This simple motif has been previously reproduced by Kolinski and Skolnick (Proteins 18: 353-366, 1994) using a reduced representation lattice model of proteins with a statistical interaction scheme. In this paper, an improved version of the potential has been used, and the robustness of this result has been tested by folding from the random state a set of three-helix bundle proteins that are highly homologous to the B domain of protein A. Furthermore, an attempt to redesign the B domain native structure to its topological mirror image fold has been made by multiple mutations of the hydrophobic core and the turn region between helices I and II. A sieve method for scanning a large set of mutations to search for this desired property has been proposed. It has been shown that mutations of native B domain hydrophobic core do not introduce significant changes in the protein motif. Mutations in the turn region were also very conservative; nevertheless, a few mutants acquired the desired topological mirror image motif. A set of all atom models of the most probable mutant was reconstructed from the reduced models and refined using a molecular dynamics algorithm in the presence of water. The packing of all atom structures obtained corroborates the lattice model results. We conclude that the change in the handedness of the turn induced by the mutations, augmented by the repacking of hydrophobic core and the additional burial of the second helix N-cap side chain, are responsible for the predicted preferential adoption of the mirror image structure.
|
37
|
Vieth M, Kolinski A, Skolnick J. Method for predicting the state of association of discretized protein models. Application to leucine zippers. Biochemistry 1996; 35:955-67. [PMID: 8547278 DOI: 10.1021/bi9520702] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 01/31/2023]
Abstract
A method that employs a transfer matrix treatment combined with Monte Carlo sampling has been used to calculate the configurational free energies of folded and unfolded states of lattice models of proteins. The method is successfully applied to study the monomer-dimer equilibria in various coiled coils. For the short coiled coils, GCN4 leucine zipper, and its fragments, Fos and Jun, very good agreement is found with experiment. Experimentally, some subdomains of the GCN4 leucine zipper form stable dimeric structures, suggesting the regions of differential stability in the parent structure. Our calculations suggest that the stabilities of the subdomains are in general different from the values expected simply from the stability of the corresponding fragment in the wild type molecule. Furthermore, parts of the fragments structurally rearrange in some regions with respect to their corresponding wild type positions. Our results suggest that, for an Asn in the dimerization interface, at least a pair of hydrophobic interacting helical turns at each side is required to stabilize the coiled coil. Finally, the specificity of heterodimer formation in the Fos-Jun system comes from the relative instability of Fos homodimers, resulting from unfavorable intra- and interhelical interactions in the interfacial coiled coil region.
|
38
|
Olszewski KA, Kolinski A, Skolnick J. Does a backwardly read protein sequence have a unique native state? Protein Engineering 1996; 9:5-14. [PMID: 9053902 DOI: 10.1093/protein/9.1.5] [Citation(s) in RCA: 50] [Impact Index Per Article: 1.8] [Indexed: 02/03/2023]
Abstract
Amino acid sequences of native proteins are generally not palindromic. Nevertheless, the protein molecule obtained as a result of reading the sequence backwards, i.e. a retro-protein, obviously has the same amino acid composition and the same hydrophobicity profile as the native sequence. The important questions which arise in the context of retro-proteins are: does a retro-protein fold to a well defined native-like structure as natural proteins do and, if the answer is positive, does a retro-protein fold to a structure similar to the native conformation of the original protein? In this work, the fold of retro-protein A, originated from the retro-sequence of the B domain of Staphylococcal protein A, was studied. As a result of lattice model simulations, it is conjectured that the retro-protein A also forms a three-helix bundle structure in solution. It is also predicted that the topology of the retro-protein A three-helix bundle is that of the native protein A, rather than that corresponding to the mirror image of native protein A. Secondary structure elements in the retro-protein do not exactly match their counterparts in the original protein structure; however, the amino acid side chain contact pattern of the hydrophobic core is partly conserved.
|
39
|
Vieth M, Kolinski A, Brooks CL, Skolnick J. Prediction of quaternary structure of coiled coils. Application to mutants of the GCN4 leucine zipper. J Mol Biol 1995; 251:448-67. [PMID: 7650742 DOI: 10.1006/jmbi.1995.0447] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.4] [Indexed: 01/26/2023]
Abstract
Using a simplified protein model, the equilibrium between different oligomeric species of the wild-type GCN4 leucine zipper and seven of its mutants has been predicted. Over the entire experimental concentration range, agreement with experiment is found in five cases, while in two cases agreement is found over a portion of the concentration range. These studies demonstrate a methodology for predicting coiled coil quaternary structure and allow for the dissection of the interactions responsible for the global fold. In agreement with the conclusion of Harbury et al., the results of the simulations indicate that the pattern of hydrophobic and hydrophilic residues alone is insufficient to define a protein's three-dimensional structure. In addition, these simulations indicate that the degree of chain association is determined by the balance between specific side-chain packing preferences and the entropy reduction associated with side-chain burial in higher-order multimers.
|
40
|
Milik M, Kolinski A, Skolnick J. Neural network system for the evaluation of side-chain packing in protein structures. Protein Engineering 1995; 8:225-36. [PMID: 7479684 DOI: 10.1093/protein/8.3.225] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 01/25/2023]
Abstract
An artificial neural network system is used for pattern recognition in protein side-chain-side-chain contact maps. A back-propagation network was trained on a set of patterns which are popular in side-chain contact maps of protein structures. Several neural network architectures and different training parameters were tested to decide on the best combination for the neural network. The resulting network can distinguish between original (from protein structures) and randomized patterns with an accuracy of 84.5% and a Matthews' coefficient of 0.72 for the testing set. Applications of this system for protein structure evaluation and refinement are also proposed. Examples include structures obtained after the application of molecular dynamics to crystal structures, structures obtained from X-ray crystallography at various stages of refinement, structures obtained from a de novo folding algorithm and deliberately misfolded structures.
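Illustrative note: the Matthews coefficient quoted in this abstract is a standard summary of a two-class confusion matrix. A minimal sketch of the usual formula follows. The function name and the convention of returning 0.0 when a marginal count is zero are assumptions of this sketch, and no confusion-matrix counts from the paper are implied.

```python
import math


def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: correctly classified positives/negatives
    fp/fn: misclassified negatives/positives
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: the coefficient is undefined here, report 0.0.
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Toy counts, not taken from the paper.
    print(matthews_coefficient(tp=45, tn=40, fp=10, fn=5))
```

The coefficient ranges from -1 (every pattern misclassified) through 0 (no better than chance) to 1 (perfect separation), which is why it is reported alongside raw accuracy.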
|
41
|
Vieth M, Kolinski A, Brooks CL, Skolnick J. Prediction of the folding pathways and structure of the GCN4 leucine zipper. J Mol Biol 1994; 237:361-7. [PMID: 8151697 DOI: 10.1006/jmbi.1994.1239] [Citation(s) in RCA: 93] [Impact Index Per Article: 3.1] [Indexed: 01/29/2023]
Abstract
A hierarchical approach is described for the prediction of the three-dimensional structure and folding pathway of the GCN4 leucine zipper. Dimer assembly is simulated by Monte Carlo dynamics. The resulting lowest energy structures undergo cooperative rearrangement of their hydrophobic core leading to side-chain fixation. The coarse-grained structures are further refined using a molecular dynamics annealing protocol. This produces full atom models with a backbone root-mean-square deviation from the crystal structure of 0.81 A. Thus, we demonstrate the predictive ability of our approach to yield high resolution structures of small coiled coils from their sequence.
|
42
|
Kolinski A, Skolnick J. Monte Carlo simulations of protein folding. II. Application to protein A, ROP, and crambin. Proteins 1994; 18:353-66. [PMID: 8208727 DOI: 10.1002/prot.340180406] [Citation(s) in RCA: 122] [Impact Index Per Article: 4.1] [Indexed: 01/29/2023]
Abstract
The hierarchy of lattice Monte Carlo models described in the accompanying paper (Kolinski, A., Skolnick, J. Monte Carlo simulations of protein folding. I. Lattice model and interaction scheme. Proteins 18:338-352, 1994) is applied to the simulation of protein folding and the prediction of 3-dimensional structure. Using sequence information alone, three proteins have been successfully folded: the B domain of staphylococcal protein A, a 120 residue, monomeric version of ROP dimer, and crambin. Starting from a random expanded conformation, the model proteins fold along relatively well-defined folding pathways. These involve a collection of early intermediates, which are followed by the final (and rate-determining) transition from compact intermediates closely resembling the molten globule state to the native-like state. The predicted structures are rather unique, with native-like packing of the side chains. The accuracy of the predicted native conformations is better than those obtained in previous folding simulations. The best (but by no means atypical) folds of protein A have a coordinate rms of 2.25 A from the native C alpha trace, and the best coordinate rms from crambin is 3.18 A. For ROP monomer, the lowest coordinate rms from equivalent C alpha s of ROP dimer is 3.65 A. Thus, for two simple helical proteins and a small alpha/beta protein, the ability to predict protein structure from sequence has been demonstrated.
|
43
|
Kolinski A, Skolnick J. Monte Carlo simulations of protein folding. I. Lattice model and interaction scheme. Proteins 1994; 18:338-52. [PMID: 8208726 DOI: 10.1002/prot.340180405] [Citation(s) in RCA: 244] [Impact Index Per Article: 8.1] [Indexed: 01/29/2023]
Abstract
A new hierarchical method for the simulation of the protein folding process and the de novo prediction of protein three-dimensional structure is proposed. The reduced representation of the protein alpha-carbon backbone employs lattice discretizations of increasing geometrical resolution and a single ball representation of side chain rotamers. In particular, coarser and finer lattice backbone descriptions are used. The coarser (finer) lattice represents C alpha traces of native proteins with an accuracy of 1.0 (0.7) A rms. Folding is simulated by means of very fast Monte Carlo lattice dynamics. The potential of mean force, predominantly of statistical origin, contains several novel terms that facilitate the cooperative assembly of secondary structure elements and the cooperative packing of the side chains. Particular contributions to the interaction scheme are discussed in detail. In the accompanying paper (Kolinski, A., Skolnick, J. Monte Carlo simulation of protein folding. II. Application to protein A, ROP, and crambin. Proteins 18:353-366, 1994), the method is applied to three small globular proteins.
|
44
|
Godzik A, Skolnick J, Kolinski A. Regularities in interaction patterns of globular proteins. Protein Engineering 1993; 6:801-10. [PMID: 8309927 DOI: 10.1093/protein/6.8.801] [Citation(s) in RCA: 58] [Impact Index Per Article: 1.9] [Indexed: 01/29/2023]
Abstract
The description of protein structure in the language of side chain contact maps is shown to offer many advantages over more traditional approaches. Because it focuses on side chain interactions, it aids in the discovery, study and classification of similarities between interactions defining particular protein folds and offers new insights into the rules of protein structure. For example, there is a small number of characteristic patterns of interactions between protein supersecondary structural fragments, which can be seen in various non-related proteins. Furthermore, the overlap of the side chain contact maps of two proteins provides a new measure of protein structure similarity. As shown in several examples, alignments based on contact map overlaps are a powerful alternative to other structure-based alignments.
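Illustrative note: the contact map overlap mentioned in this abstract can be pictured as counting the side chain contacts that two proteins share once their residues are put in correspondence. The sketch below assumes a fixed residue alignment is already given; finding the alignment that maximizes the overlap is the hard part and is not attempted here. The function name, the input format, and the toy data are hypothetical.

```python
def contact_map_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    contacts_a, contacts_b: iterables of (i, j) residue index pairs,
        assumed to list each contact once.
    alignment: dict mapping residue indices of A to residue indices of B;
        unaligned residues of A are simply absent from the dict.
    """
    # Store B's contacts as unordered pairs so (i, j) and (j, i) match.
    b_pairs = {frozenset(pair) for pair in contacts_b}
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            if frozenset((alignment[i], alignment[j])) in b_pairs:
                shared += 1
    return shared


if __name__ == "__main__":
    # Toy contact maps, not derived from any real structure.
    a = [(0, 3), (1, 4), (2, 5)]
    b = [(10, 13), (14, 11)]
    aln = {0: 10, 1: 11, 3: 13, 4: 14}
    print(contact_map_overlap(a, b, aln))
```

Dividing the count by the number of contacts in the smaller map would give a normalized similarity score; that normalization is a choice of this note rather than something stated in the abstract.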
|
45
|
Godzik A, Kolinski A, Skolnick J. De novo and inverse folding predictions of protein structure and dynamics. J Comput Aided Mol Des 1993; 7:397-438. [PMID: 8229093 DOI: 10.1007/bf02337559] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.5] [Indexed: 01/29/2023]
Abstract
In the last two years, the use of simplified models has facilitated major progress in the globular protein folding problem, viz., the prediction of the three-dimensional (3D) structure of a globular protein from its amino acid sequence. A number of groups have addressed the inverse folding problem where one examines the compatibility of a given sequence with a given (and already determined) structure. A comparison of extant inverse protein-folding algorithms is presented, and methodologies for identifying sequences likely to adopt identical folding topologies, even when they lack sequence homology, are described. Extension to produce structural templates or fingerprints from idealized structures is discussed, and for eight-membered beta-barrel proteins, it is shown that idealized fingerprints constructed from simple topology diagrams can correctly identify sequences having the appropriate topology. Furthermore, this inverse folding algorithm is generalized to predict elements of supersecondary structure including beta-hairpins, helical hairpins and alpha/beta/alpha fragments. Then, we describe a very high coordination number lattice model that can predict the 3D structure of a number of globular proteins de novo; i.e. using just the amino acid sequence. Applications to sequences designed by DeGrado and co-workers [Biophys. J., 61 (1992) A265] predict folding intermediates, native states and relative stabilities in accord with experiment. The methodology has also been applied to the four-helix bundle designed by Richardson and co-workers [Science, 249 (1990) 884] and a redesigned monomeric version of a naturally occurring four-helix dimer, rop. Based on comparison to the rop dimer, the simulations predict conformations with rms values of 3-4 A from native. Furthermore, the de novo algorithms can assess the stability of the folds predicted from the inverse algorithm, while the inverse folding algorithms can assess the quality of the de novo models. 
Thus, the synergism of the de novo and inverse folding algorithm approaches provides a set of complementary tools that will facilitate further progress on the protein-folding problem.
|
46
|
Skolnick J, Kolinski A, Brooks CL, Godzik A, Rey A. A method for predicting protein structure from sequence. Curr Biol 1993; 3:414-23. [PMID: 15335708 DOI: 10.1016/0960-9822(93)90348-r] [Citation(s) in RCA: 47] [Impact Index Per Article: 1.5] [Received: 04/19/1993] [Revised: 06/08/1993] [Accepted: 06/08/1993] [Indexed: 10/26/2022]
Abstract
BACKGROUND: The ability to predict the native conformation of a globular protein from its amino-acid sequence is an important unsolved problem of molecular biology. We have previously reported a method in which reduced representations of proteins are folded on a lattice by Monte Carlo simulation, using statistically derived potentials. When applied to sequences designed to fold into four-helix bundles, this method generated predicted conformations closely resembling the real ones. RESULTS: We now report a hierarchical approach to protein-structure prediction, in which two cycles of the above-mentioned lattice method (the second on a finer lattice) are followed by a full-atom molecular dynamics simulation. The end product of the simulations is thus a full-atom representation of the predicted structure. The application of this procedure to the 60-residue B domain of staphylococcal protein A predicts a three-helix bundle with a backbone root mean square (rms) deviation of 2.25-3 A from the experimentally determined structure. Further application to a designed, 120-residue monomeric protein, mROP, based on the dimeric ROP protein of Escherichia coli, predicts a left-turning, four-helix bundle native state. Although the ultimate assessment of the quality of this prediction awaits the experimental determination of the mROP structure, a comparison of this structure with the set of equivalent residues in the ROP dimer crystal structure indicates that they have an rms deviation of approximately 3.6-4.2 A. CONCLUSION: Thus, for a set of helical proteins that have simple native topologies, the native folds of the proteins can be predicted with reasonable accuracy from their sequences alone. Our approach suggests a direction for future work addressing the protein-folding problem.
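The rms deviations quoted in this abstract are averages over equivalent atom positions in two structures. As a minimal sketch, assuming the two coordinate sets have already been optimally superimposed and list equivalent atoms in the same order (the superposition step itself is not shown, and the function name is hypothetical), the quantity can be computed as:

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed and that position k in
    `coords_a` is equivalent to position k in `coords_b`.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))
```

For example, two points each displaced by 3 A along one axis give an rms deviation of exactly 3.0 A.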
|
47
|
Skolnick J, Kolinski A, Godzik A. From independent modules to molten globules: observations on the nature of protein folding intermediates. Proc Natl Acad Sci U S A 1993; 90:2099-100. [PMID: 8460114 PMCID: PMC46030 DOI: 10.1073/pnas.90.6.2099] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023] Open
|
48
|
Abstract
We describe the most general solution to date of the problem of matching globular protein sequences to the appropriate three-dimensional structures. The screening template, against which sequences are tested, is provided by a protein "structural fingerprint" library based on the contact map and the buried/exposed pattern of residues. Then, a lattice Monte Carlo algorithm validates or dismisses the stability of the proposed fold. Examples of known structural similarities between proteins having weakly or unrelated sequences such as the globins and phycocyanins, the eight-member alpha/beta fold of triose phosphate isomerase and even a close structural equivalence between azurin and immunoglobulins are found.
|
49
|
Godzik A, Skolnick J, Kolinski A. Simulations of the folding pathway of triose phosphate isomerase-type alpha/beta barrel proteins. Proc Natl Acad Sci U S A 1992; 89:2629-33. [PMID: 1557367 PMCID: PMC48715 DOI: 10.1073/pnas.89.7.2629] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.1] [Indexed: 12/27/2022] Open
Abstract
Simulations of the folding pathways of two large alpha/beta proteins, the alpha subunit of tryptophan synthase and triose phosphate isomerase, are reported using the knight's walk lattice model of globular proteins and Monte Carlo dynamics. Starting from randomly generated unfolded states and with no assumptions regarding the nature of the folding intermediates, for the tryptophan synthase subunit these simulations predict, in agreement with experiment, the existence and location of a stable equilibrium intermediate comprised of six beta strands on the amino terminus of the molecule. For the case of triose phosphate isomerase, the simulations predict that both amino- and carboxyl-terminal intermediates should be observed. In a significant modification of previous lattice models, this model includes a full heavy atom side chain description and is capable of representing native conformations at the level of 2.5- to 3-A rms deviation for the C alpha positions, as compared to the crystal structure. With a well-balanced compromise between accuracy of the protein description and the computer requirements necessary to perform simulations spanning biologically significant amounts of time, the lattice model described here brings the possibility of studying important biological processes to present-day computers.
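The abstract does not define the knight's walk lattice. By analogy with the chess knight's move, one plausible reading is that consecutive backbone positions are joined by vectors that are permutations of (±2, ±1, 0) in lattice units. The sketch below enumerates such vectors under that assumption; the function name is hypothetical.

```python
from itertools import permutations, product


def knights_walk_vectors():
    """Enumerate all distinct vectors that are permutations of (+-2, +-1, 0)."""
    vectors = set()
    for perm in set(permutations((2, 1, 0))):
        for signs in product((1, -1), repeat=3):
            # The sign applied to the zero component has no effect, so the
            # set removes the resulting duplicates.
            vectors.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vectors)
```

Under this assumption there are 24 candidate bond vectors, each of squared length 5 in lattice units.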
|
50
|
Skolnick J, Kolinski A. Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J Mol Biol 1991; 221:499-531. [PMID: 1920430 DOI: 10.1016/0022-2836(91)80070-b] [Citation(s) in RCA: 140] [Impact Index Per Article: 4.2] [Indexed: 12/29/2022]
Abstract
A long-standing problem of molecular biology is the prediction of globular protein tertiary structure from the primary sequence. In the context of a new, 24-nearest-neighbor lattice model of proteins that includes both alpha and beta-carbon atoms, the requirements for folding to a unique four-member beta-barrel, four-helix bundles and a model alpha/beta-bundle have been explored. A number of distinct situations are examined, but the common requirements for the formation of a unique native conformation are tertiary interactions plus the presence of relatively small (but not irrelevant) intrinsic turn preferences that select out the native conformer from a manifold of compact states. When side-chains are explicitly included, there are many conformations having the same or a slightly greater number of side-chain contacts as in the native conformation, and it is the local intrinsic turn preferences that produce the conformational selectivity on collapse. The local preference for helix or beta-sheet secondary structure may be at odds with the secondary structure ultimately found in the native conformation. The requisite intrinsic turn populations are about 0.3% for beta-proteins, 2% for mixed alpha/beta-proteins and 6% for helix bundles. In addition, an idealized model of an allosteric conformational transition has been examined. Folding occurs predominantly by a sequential on-site assembly mechanism with folding initiating either at a turn or from an isolated helix or beta-strand (where appropriate). For helical and beta-protein models, similar folding pathways were obtained in diamond lattice simulations, using an entirely different set of local Monte Carlo moves. This argues strongly that the results are universal; that is, they are independent of lattice, protein model or the particular realization of Monte Carlo dynamics. 
Overall, these simulations demonstrate that the folding of all known protein motifs can be achieved in the context of a single class of lattice models that includes realistic backbone structures and idealized side-chains.
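The abstract refers to local Monte Carlo moves without stating the acceptance rule. A common choice in dynamic Monte Carlo simulations is the Metropolis criterion, shown here as a minimal sketch under that assumption; the function name, the reduced units (energy divided by temperature with the Boltzmann constant set to 1), and the `rng` parameter are illustrative and not taken from the paper.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, rng=random):
    """Decide whether to accept a trial move under the Metropolis criterion.

    Moves that do not raise the energy are always accepted; uphill moves are
    accepted with probability exp(-delta_energy / temperature).
    """
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)
```

In a lattice simulation, a rule of this kind would be applied after each trial local rearrangement of the chain, with `delta_energy` computed from the model's interaction scheme.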
|