151
Bahar I, Jernigan RL. Inter-residue potentials in globular proteins and the dominance of highly specific hydrophilic interactions at close separation. J Mol Biol 1997; 266:195-214. [PMID: 9054980 DOI: 10.1006/jmbi.1996.0758] [Citation(s) in RCA: 244] [Impact Index Per Article: 9.0] [Indexed: 02/03/2023]
Abstract
Residue-specific potentials between pairs of side-chains and pairs of side-chain-backbone interaction sites have been generated by collecting radial distribution data for 302 protein structures. Multiple atomic interactions have been utilized to enhance the specificity and smooth the distance-dependence of the potentials. The potentials are demonstrated to successfully discriminate correct sequences in inverse folding experiments. Many specific effects are observable in the non-bonded potentials; grouping of residue types is inappropriate, since each residue type manifests some unique behavior. Only a weak dependence is seen on protein size and composition. Effective contact potentials operating in three different environments (self, solvent-exposed and residue-exposed) and over any distance range are presented. The effective contact potentials obtained from the integration of radial distributions over the distance interval r ≤ 6.4 Å are in excellent agreement with published values. The hydrophobic interactions are verified to be dominantly strong in this range. Comparison of these with a newly derived set of effective contact potentials for closer inter-residue separations (r ≤ 4.0 Å) demonstrates drastic changes in the most favorable interactions. In the closer approach case, where the number of pairs with a given residue is approximately one, the highly specific interactions between charged and polar side-chains predominate. These closer approach values could be utilized to successively select the relative positions and directions of residue side-chains in protein simulations, following a hierarchical algorithm optimizing side-chain-side-chain interactions over the two successively closer distance ranges. The homogeneous contribution to stability is stronger than the specific contribution by about a factor of 5. Overall, the total non-bonded interaction energy calculated for individual proteins scales with the number of residues n as n^1.28, indicating an enhanced stability for larger proteins.
Affiliation(s)
- I Bahar
- Molecular Structure Section, National Cancer Institute, National Institutes of Health, Bethesda MD 20892-5677, USA
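Contact potentials of this kind rest on one basic conversion. Observed pair counts within a distance cutoff become effective energies through an inverse-Boltzmann relation, which can be sketched in Python. This is a minimal illustration, not the authors' procedure. It assumes one interaction site per residue, a random-mixing reference state built from the residue composition of the contacts, and energies in units of kT. The function name `contact_potentials` and the toy data are hypothetical. Changing `cutoff` from 6.4 to 4.0 mimics the switch between the two distance ranges compared in the abstract.

```python
import math
from collections import Counter


def contact_potentials(pairs, cutoff):
    """Hypothetical sketch: effective contact energies (in kT) from pair counts.

    pairs  -- iterable of (type_a, type_b, distance_in_angstrom)
    cutoff -- only pairs with distance <= cutoff are counted as contacts
    """
    # Observed contacts per unordered residue-type pair.
    observed = Counter()
    for a, b, d in pairs:
        if d <= cutoff:
            observed[tuple(sorted((a, b)))] += 1
    total = sum(observed.values())

    # Mole fractions of residue types among contact "ends" (two ends per contact).
    ends = Counter()
    for (a, b), n in observed.items():
        ends[a] += n
        ends[b] += n
    frac = {t: n / (2 * total) for t, n in ends.items()}

    energies = {}
    for (a, b), n in observed.items():
        # Random-mixing reference: x_a^2 for like pairs, 2 x_a x_b for unlike pairs.
        p_ref = frac[a] * frac[b] * (1 if a == b else 2)
        energies[(a, b)] = -math.log(n / (total * p_ref))
    return energies


# Toy data: two residue types; the three pairs at 9.0 A fall outside the cutoff.
toy = ([("H", "H", 5.0)] * 4 + [("H", "P", 5.5)] * 2
       + [("P", "P", 6.0)] * 4 + [("H", "P", 9.0)] * 3)
e = contact_potentials(toy, cutoff=6.4)
print(e)
```

A negative value marks a pair seen more often than random mixing predicts. The sketch does not model the paper's multiple interaction sites per residue or its three environments (self, solvent-exposed, residue-exposed).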

152
Abstract
The computational techniques for sorting through protein folds (including dynamic programming, self-consistent field theory, etc.) have already ceased to be the bottleneck of predictions. The main problem is that all methods of recognizing and predicting protein structure can actually use only part of the interactions operating in the chain, and even the energies of these interactions are not known precisely. This is now the principal source of errors. The errors can be reduced by employing many distant homologues, but this allows one to predict a generalized folding pattern rather than a particular fold with all its details.
Affiliation(s)
- A V Finkelstein
- Institute of Protein Research, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia.

153
Abstract
The last stage of protein folding, the "endgame," involves the ordering of amino acid side-chains into a well defined and closely packed configuration. We review a number of topics related to this process. We first describe how the observed packing in protein crystal structures is measured. Such measurements show that the protein interior is packed exceptionally tightly, more so than the protein surface or surrounding solvent and even more efficiently than crystals of simple organic molecules. In vitro protein folding experiments also show that the protein is close-packed in solution and that the tight packing and intercalation of side-chains is a final and essential step in the folding pathway. These experimental observations, in turn, suggest that a folded protein structure can be described as a kind of three-dimensional jigsaw puzzle and that predicting side-chain packing is possible in the sense of solving this puzzle. The major difficulty that must be overcome in predicting side-chain packing is a combinatorial "explosion" in the number of possible configurations. There has been much recent progress towards overcoming this problem, and we survey a variety of the approaches. These approaches differ principally in whether they use ab initio (physical) or more knowledge-based methods, how they divide up and search conformational space, and how they evaluate candidate configurations (using scoring functions). The accuracy of side-chain prediction depends crucially on the (assumed) positioning of the main-chain. Methods for predicting main-chain conformation are, in a sense, not as developed as those for side-chains. We conclude by surveying these methods. As with side-chain prediction, there is a great variety of approaches, which differ in how they divide up and search space and in how they score candidate conformations.
Affiliation(s)
- M Levitt
- Department of Structural Biology, Stanford University School of Medicine, California 94305, USA
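The combinatorial explosion described in this abstract can be made concrete with a rough count. Assume, purely for illustration, a fixed main chain and a discrete rotamer library with r_i allowed states for residue i. The number of side-chain configurations for N residues is then the product below. The values r = 3 and N = 100 are hypothetical.

```latex
N_{\text{conf}} = \prod_{i=1}^{N} r_i = r^{N}
\quad (\text{if all } r_i = r),
\qquad
r = 3,\; N = 100:\quad
3^{100} = 10^{100 \log_{10} 3} \approx 10^{47.7} \approx 5.2 \times 10^{47}.
```

Even this modest example rules out exhaustive enumeration. That is why the surveyed methods differ mainly in how they divide up and search conformational space.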

154
Gomar J, Sodano P, Ptak M, Vovelle F. Homology modelling of an antimicrobial protein, Ace-AMP1, from lipid transfer protein structures. FOLDING & DESIGN 1997; 2:183-92. [PMID: 9218956 DOI: 10.1016/s1359-0278(97)00025-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023]
Abstract
BACKGROUND Plant nonspecific lipid transfer proteins (ns-LTPs) are small basic proteins that facilitate lipid shuttling between membranes in vitro. The function of ns-LTPs in vivo is still unknown. It has been suggested, in relation to their lipid binding ability, that they may be involved in cutin formation. Alternatively, they may act in the plant defence system against pathogenic agents. Ace-AMP1 is an antimicrobial protein extracted from onion seed that shows sequence homology with ns-LTPs but that is unable to transfer lipids. We have recently determined the three-dimensional structure of wheat and maize ns-LTPs. In order to compare the structural features of Ace-AMP1 and ns-LTPs, we have used the comparative modelling software MODELLER to predict the structure of Ace-AMP1. RESULTS The global fold of Ace-AMP1 is very similar to those of ns-LTPs, involving four helices and a C-terminal tail without secondary structure elements. The structures of maize and wheat ns-LTPs are characterized by the existence of a tunnel-like hydrophobic cavity in which a lipid molecule can be inserted. In the Ace-AMP1 structure, this cavity is blocked by a number of bulky residues. Similarly, the electrostatic potential contours of ns-LTPs show some common features that were not observed in Ace-AMP1. CONCLUSIONS Although Ace-AMP1 displays a similar global fold to ns-LTPs, it does not present a hydrophobic cavity, which may explain why Ace-AMP1 cannot shuttle lipids between membranes in vitro. The large differences in the electrostatic properties of Ace-AMP1 and ns-LTPs suggest a different mode of interaction with membranes.
Affiliation(s)
- J Gomar
- Centre de Biophysique Moléculaire, Orléans, France

155
Abstract
Protein folding and inverse protein folding problems are examined for the extremely simplified model of short self-avoiding square lattice walks involving only two or three residue types. Simple interresidue contact free energy functions are given and are used to determine which sequences fold uniquely to which conformations. Contrary to general theories of protein folding, this model system shows little correlation between free energy and conformational distance from the native, nor is there any marked energy gap between the native and the best non-native structures. Furthermore, even the given free energy function sometimes fails to identify which sequences fold to a particular target structure. If current ideas about protein folding and structure/sequence compatibility fail in this model system, it is unclear why they should be valid for real proteins.
Affiliation(s)
- G M Crippen
- College of Pharmacy, University of Michigan, Ann Arbor 48109-1065, USA.
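The kind of exhaustive test described in this abstract is easy to reproduce in miniature. The Python sketch below enumerates every self-avoiding walk of a short chain on the square lattice and scores each conformation with a pairwise contact energy. It then reports whether the lowest-energy conformation is unique up to rotation and reflection. It assumes an HP-style two-letter alphabet with a hypothetical energy table (H-H contact = -1, all others 0). These values are illustrative only and are not Crippen's free energy functions.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def walks(n):
    """All n-site (n >= 2) self-avoiding walks from the origin, with the first
    step fixed to +x (removes rotated copies; mirror images are handled later)."""
    def extend(path):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0), (1, 0)])


def energy(seq, coords, eps):
    """Sum of contact energies over non-bonded lattice neighbors."""
    index = {c: i for i, c in enumerate(coords)}
    total = 0.0
    for i, (x, y) in enumerate(coords):
        for dx, dy in ((1, 0), (0, 1)):  # look right and up only: each edge once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                total += eps.get(tuple(sorted((seq[i], seq[j]))), 0.0)
    return total


def ground_states(seq, eps):
    """Return (lowest energy, set of distinct lowest-energy conformations)."""
    best, states = None, set()
    for w in walks(len(seq)):
        e = energy(seq, w, eps)
        canon = min(w, tuple((x, -y) for x, y in w))  # merge mirror images
        if best is None or e < best - 1e-12:
            best, states = e, {canon}
        elif abs(e - best) <= 1e-12:
            states.add(canon)
    return best, states


HP = {("H", "H"): -1.0}  # hypothetical two-letter contact table
e_min, native = ground_states("HPPH", HP)
print(e_min, len(native))  # -1.0 1 : folds uniquely to the U-shaped square
```

A sequence "folds uniquely" in this toy setting when the returned set has exactly one member. Sequences such as "PPPP", which have no favorable contacts, are highly degenerate.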

156
de Araújo AF, Pochapsky TC. Estimates for the potential accuracy required in realistic protein folding simulations and structure recognition experiments. FOLDING & DESIGN 1997; 2:135-9. [PMID: 9135986 DOI: 10.1016/s1359-0278(97)00018-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
BACKGROUND We have recently addressed the problem of the potential accuracy required for protein folding simulations using a combination of theoretical considerations and lattice model simulations. In the present study, we combine the previously developed theoretical formalism with the law of corresponding states proposed recently by Onuchic, Wolynes and collaborators and obtain estimates for the potential accuracy required for computational studies of a small helical protein. RESULTS Our estimates suggest that effective energies of interaction between amino acid residues could be measured with an error of around ±330 cal mol⁻¹ and the resulting, less accurate potential would still be appropriate for structure recognition experiments, where the native conformation must remain the global energy minimum. For an ab initio folding simulation, where the energy of the native conformation must be sufficient to balance the entropy of the denatured state at a temperature at which the dynamics of the system are fast, the permissible error depends on the simulation temperature and can be as high as ±120 cal mol⁻¹. CONCLUSIONS The results indicate that potentials do not need to be extremely accurate in order to be useful in computational studies. Results from different groups can be interpreted as an indication that available potentials are too inaccurate for ab initio simulations but not far from the permissive limit required for structure recognition.
Affiliation(s)
- A F de Araújo
- Biophysics Program, Brandeis University, Waltham, MA 02254, USA

157
Thomas PD, Dill KA. An iterative method for extracting energy-like quantities from protein structures. Proc Natl Acad Sci U S A 1996; 93:11628-33. [PMID: 8876187 PMCID: PMC38109 DOI: 10.1073/pnas.93.21.11628] [Citation(s) in RCA: 165] [Impact Index Per Article: 5.9] [Indexed: 02/02/2023]
Abstract
We present a method (ENERGI) for extracting energy-like quantities from a data base of protein structures. In this paper, we use the method to generate pairwise additive amino acid "energy" scores. These scores are obtained by iteration until they correctly discriminate a set of known protein folds from decoy conformations. The method succeeds in lattice model tests and in the gapless threading problem as defined by Maiorov and Crippen [Maiorov, V. N. & Crippen, G. M. (1992) J. Mol. Biol. 227, 876-888]. A more challenging test of threading a larger set of test proteins derived from the representative set of Hobohm and Sander [Hobohm, U. & Sander, C. (1994) Protein Sci. 3, 522-524] is used as a "workbench" for exploring how the ENERGI scores depend on their parameter sets.
Affiliation(s)
- P D Thomas
- Graduate Group in Biophysics, University of California, San Francisco 94143-0448, USA
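The core loop of such a method adjusts pairwise scores until every known fold scores better than its decoys. A perceptron-style update illustrates that loop. This is a swapped-in technique, not ENERGI's published update rule. Each structure is reduced to a vector of contact counts per residue-pair type, and the "energy" is a weighted sum of those counts. The names `train_scores`, `lr`, and `margin` are hypothetical.

```python
def score(w, counts):
    """Pairwise-additive 'energy': weights dotted with contact counts."""
    return sum(wi * ci for wi, ci in zip(w, counts))


def train_scores(natives, decoys, n_types, lr=0.1, margin=1.0, max_iter=1000):
    """Perceptron-style iteration (a hypothetical stand-in for ENERGI).

    natives[k] -- contact-count vector of protein k's native fold
    decoys[k]  -- list of contact-count vectors for protein k's decoys
    Returns (weights, index of the first pass that needed no update), or
    (weights, max_iter) if the natives were never all separated.
    """
    w = [0.0] * n_types
    for it in range(max_iter):
        changed = False
        for nat, decs in zip(natives, decoys):
            for dec in decs:
                if score(w, nat) > score(w, dec) - margin:  # native not low enough
                    for t in range(n_types):
                        w[t] -= lr * (nat[t] - dec[t])  # lower native, raise decoy
                    changed = True
        if not changed:
            return w, it
    return w, max_iter


# Toy example with three pair types (HH, HP, PP); the counts are made up.
native = [3, 1, 0]
decoys = [[1, 2, 1], [0, 3, 1]]
w, passes = train_scores([native], [decoys], n_types=3)
```

When natives and decoys are separable, this update stops after finitely many passes. ENERGI's own iteration and its treatment of decoys should be taken from the paper itself.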

158
Zhao D, Gilfoyle DJ, Smith AT, Loew GH. Refinement of 3D models of horseradish peroxidase isoenzyme C: predictions of 2D NMR assignments and substrate binding sites. Proteins 1996; 26:204-16. [PMID: 8916228 DOI: 10.1002/(sici)1097-0134(199610)26:2<204::aid-prot10>3.0.co;2-t] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.6] [Indexed: 02/03/2023]
Abstract
In this study, two alternative three-dimensional (3D) models of horseradish peroxidase (HRP-C), differing mainly in the structure of a long untemplated insertion, were refined, systematically assessed, and used to make predictions that can both guide and be tested by future experimental studies. A key first step in the model-building process was a procedure for multiple sequence alignment based on structurally conserved regions and key conserved residues, including those side chains providing ligands to the two Ca²⁺ binding sites. The model refinements reported here include (1) optimization of side-chain conformations; (2) addition of structural waters using a template-independent procedure; (3) structural refinement of the untemplated 34-amino-acid insertion located between the F and G helices, using both energy criteria and NMR data; and (4) unconstrained energy optimization of the refined models. Using these procedures, two refined structures of HRP-C were obtained, differing mainly in the conformation of this long insertion. The presence of residues in this insertion that could potentially interact with bound substrates suggests a functional role that may be related to the general ability of class III peroxidases to form stable 1:1 complexes with a variety of substrates. The structural validity of the models was systematically assessed by a variety of criteria. Most notably, the ProsaII z scores and Profiles 3D scores of the two HRP-C models indicated that they are significantly better than would be obtained by simple amino acid replacement, using any of the known structures as a template. These two 3D HRP-C models were then used to predict candidate residues for the assignment of NOESY cross-peaks previously noted in 2D-NMR studies, specifically for the residues known as Ile X, Phe A, Phe B, aliphatic residue Q, and Ile T. Candidate substrate binding sites were also identified and compared with experimentally based predictions. This work is timely because new X-ray structures are anticipated that will facilitate the validation of these procedures.
Affiliation(s)
- D Zhao
- Molecular Research Institute, Palo Alto, California 94304, USA

159
Cheng B, Nayeem A, Scheraga HA. From secondary structure to three-dimensional structure: Improved dihedral angle probability distribution function for use with energy searches for native structures of polypeptides and proteins. J Comput Chem 1996. [DOI: 10.1002/(sici)1096-987x(199609)17:12<1453::aid-jcc6>3.0.co;2-j] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]

160
Hao MH, Scheraga HA. How optimization of potential functions affects protein folding. Proc Natl Acad Sci U S A 1996; 93:4984-9. [PMID: 8643516 PMCID: PMC39392 DOI: 10.1073/pnas.93.10.4984] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.7] [Indexed: 02/01/2023]
Abstract
The relationship between the optimization of the potential function and the foldability of theoretical protein models is studied based on investigations of a 27-mer cubic-lattice protein model and a more realistic lattice model for the protein crambin. In both the simple and the more complicated systems, optimization of the energy parameters achieves significant improvements in the statistical-mechanical characteristics of the systems and leads to foldable protein models in simulation experiments. The foldability of the protein models is characterized by their statistical-mechanical properties, e.g., by the density of states and by Monte Carlo folding simulations of the models. With optimized energy parameters, a high level of consistency exists among different interactions in the native structures of the protein models, as revealed by a correlation function between the optimized energy parameters and the native structure of the model proteins. The results of this work are relevant to the design of a general potential function for folding proteins by theoretical simulations.
Affiliation(s)
- M H Hao
- Baker Laboratory of Chemistry, Cornell University, Ithaca, NY 14853-1301, USA

161
Jones DT, Moody CM, Uppenbrink J, Viles JH, Doyle PM, Harris CJ, Pearl LH, Sadler PJ, Thornton JM. Towards meeting the Paracelsus Challenge: The design, synthesis, and characterization of paracelsin-43, an alpha-helical protein with over 50% sequence identity to an all-beta protein. Proteins 1996; 24:502-13. [PMID: 8859998 DOI: 10.1002/(sici)1097-0134(199604)24:4<502::aid-prot9>3.0.co;2-f] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.3] [Indexed: 02/02/2023]
Abstract
In response to the Paracelsus Challenge (Rose and Creamer, Proteins, 19:1-3, 1994), we present here the design, synthesis, and characterization of a helical protein, whose sequence is 50% identical to that of an all-beta protein. The new sequence was derived by applying an inverse protein folding approach, in which the sequence was optimized to "fit" the new helical structure, but constrained to retain 50% of the original amino acid residues. The program utilizes a genetic algorithm to optimize the sequence, together with empirical potentials of mean force to evaluate the sequence-structure compatibility. Although the designed sequence has little ordered (secondary) structure in water, circular dichroism and nuclear magnetic resonance data show clear evidence for significant helical content in water/ethylene glycol and in water/methanol mixtures at low temperatures, as well as melting behavior indicative of cooperative folding. We believe that this represents a significant step toward meeting the Paracelsus Challenge.
Affiliation(s)
- D T Jones
- Department of Biochemistry and Molecular Biology, University College, London, UK
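The optimization strategy described here is a genetic algorithm that searches sequence space under a minimum-identity constraint. The Python skeleton below sketches that strategy and is illustrative only. The fitness function is a placeholder for the empirical potentials of mean force used in the paper. The "repair" step that reverts mutations until identity with the original reaches 50% is one simple way to enforce the constraint, and it is an assumption rather than necessarily the authors' choice. All names and parameter values are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(b)


def repair(seq, original, min_id, rng):
    """Revert randomly chosen mutated positions until identity >= min_id."""
    seq = list(seq)
    mutated = [i for i, (x, y) in enumerate(zip(seq, original)) if x != y]
    rng.shuffle(mutated)
    while identity(seq, original) < min_id:
        i = mutated.pop()
        seq[i] = original[i]
    return "".join(seq)


def design(original, fitness, min_id=0.5, pop_size=30, generations=200,
           mut_rate=0.05, seed=0):
    """Truncation-selection GA with one-point crossover and point mutation."""
    rng = random.Random(seed)
    pop = [original] * pop_size
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(original))
            child = list(a[:cut] + b[cut:])
            for i in range(len(child)):
                if rng.random() < mut_rate:
                    child[i] = rng.choice(AMINO_ACIDS)
            children.append(repair(child, original, min_id, rng))
        pop = parents + children  # parents survive, so the best never gets worse
    return max(pop, key=fitness)


def helix_like(s):
    """Placeholder fitness: count residues from a hypothetical helix-favoring set."""
    return sum(c in "AELMQKR" for c in s)


start = "KVTVSGTYSVEVKGNTYTVS"  # arbitrary toy sequence
best = design(start, helix_like)
print(best, identity(best, start), helix_like(best))
```

Keeping the parents in each generation (elitism) guarantees that the best score never decreases. The repair step guarantees that every individual satisfies the identity constraint.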

162
Abstract
There has recently been an explosion in the number of structure-derived potential functions based on the growing number of high-resolution protein crystal structures. These functions differ principally in their reference states; the two usual classes correspond to residues initially exposed either to solvent or to other residues. Reference states are critically important for applications of these potential functions. Inspection of the potential functions and their derivation can tell us not only about protein interaction strengths themselves, but can also suggest how to design better folding simulations. An appropriate goal in this field is achieving self-consistency between the details of the derivation of potentials and the simulations in which they are applied.
Affiliation(s)
- R L Jernigan
- Laboratory of Mathematical Biology, National Institutes of Health, Bethesda, MD 20892-5677, USA.

163
Abstract
Although ab initio approaches to predicting a protein's tertiary structure have made little progress, over the past four years or so the development of fold-recognition methods for tertiary structure prediction has been a source of some encouragement in this difficult field. Despite promising initial results, these methods are clearly not yet fully mature, and many groups are now working on different aspects of the methods involved in the hope of increasing the reliability and sensitivity of these tools.
Affiliation(s)
- D T Jones
- Department of Biological Sciences, University of Warwick, Coventry, UK.

164
Sippl MJ, Ortner M, Jaritz M, Lackner P, Flöckner H. Helmholtz free energies of atom pair interactions in proteins. FOLDING & DESIGN 1996; 1:289-98. [PMID: 9079391 DOI: 10.1016/s1359-0278(96)00042-9] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.0] [Indexed: 02/04/2023]
Abstract
BACKGROUND Proteins fold to unique three-dimensional structures, but how they achieve this transition and how they maintain their native folds is controversial. Information on the functional form of molecular interactions is required to address these issues. The basic building blocks are the free energies of atom pair interactions in dense protein solvent systems. In a dense medium, entropic effects often dominate over internal energies, but free energy estimates are notoriously difficult to obtain. A prominent example is the peptide hydrogen bond (H-bond). It is still unclear to what extent H-bonds contribute to protein folding and stability of native structures. RESULTS Radial distribution functions of atom pair interactions are compiled from a database of known protein folds. The functions are transformed to Helmholtz free energies using a recipe from the statistical mechanics of dense interacting systems. In particular we concentrate on the features of the free energy functions of peptide H-bonds. Differences in Helmholtz free energies correspond to the reversible work required or gained when the distance between two particles is changed. Consequently, the functions directly display the energetic features of the respective thermodynamic process, such as H-bond formation or disruption. CONCLUSIONS In the H-bond potential, a high barrier isolates a deep narrow minimum at H-bond contact from large distances, but the free energy difference between H-bond contact and large distances is close to zero. The energy barrier plays an intriguing role in H-bond formation and disruption: both processes require activation energy on the order of 2kT. H-bond formation opposes folding to compact states, but once formed, H-bonds act as molecular locks and a network of such bonds keeps polypeptide chains in a precise spatial configuration. On the other hand, peptide H-bonds do not contribute to the thermodynamic stability of native folds, because the energy balance of H-bond formation is close to zero.
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, University of Salzburg, Austria.
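The step from a radial distribution to a free-energy profile can be sketched as an inverse-Boltzmann ratio of normalized distance histograms. This minimal Python version makes a simplifying assumption: the reference state is a separate histogram over the same distance bins, for example one pooled over all atom-pair types. It applies no correction for sparse data, and the function name `pmf_profile` is hypothetical. Differences in Helmholtz free energy equal reversible work, so subtracting two bins gives the work needed to move the pair between those distances.

```python
import math


def pmf_profile(pair_counts, ref_counts, kT=1.0):
    """Per-bin free energy A(r) = -kT ln[f_ab(r) / f_ref(r)].

    pair_counts -- histogram of distances for one atom-pair type
    ref_counts  -- histogram over the same bins for the reference state
    Bins with no data in either histogram are returned as None.
    """
    n_ab, n_ref = sum(pair_counts), sum(ref_counts)
    profile = []
    for c_ab, c_ref in zip(pair_counts, ref_counts):
        if c_ab == 0 or c_ref == 0:
            profile.append(None)
        else:
            profile.append(-kT * math.log((c_ab / n_ab) / (c_ref / n_ref)))
    return profile


# Toy histograms over three distance bins (made-up counts).
A = pmf_profile([30, 10, 10], [10, 10, 10])
work = A[2] - A[0]  # reversible work to move the pair from bin 0 to bin 2, in kT
```

Here bin 0 is enriched relative to the reference, so its free energy is negative. Pulling the pair out to bin 2 costs ln 3 ≈ 1.1 kT of work.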

165
Elofsson A, Fischer D, Rice DW, Le Grand SM, Eisenberg D. A study of combined structure/sequence profiles. FOLDING & DESIGN 1996; 1:451-61. [PMID: 9080191 DOI: 10.1016/s1359-0278(96)00061-2] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.4] [Indexed: 02/04/2023]
Abstract
BACKGROUND For genome sequencing projects to achieve their full impact on biology and medicine, each protein sequence must be identified with its three-dimensional structure. Fold assignment methods (also called profile and threading methods) attempt to assign sequences to known protein folds by computing the compatibility of sequence to fold. RESULTS We have extended profile methods for the detection of protein folds having structural similarity but low sequence similarity to sequence probes. Our extension combines sequence substitution tables with structural properties to form a combined profile. The structural properties used in this study include distances between residues, exposed areas, areas buried by polar atoms, and properties of the original three-dimensional profile method. We compared the performance of these combined profiles with different sequence matrices and with the original three-dimensional profile method. To determine the optimal gap penalties and weights used with these profiles, we employed a genetic algorithm. The performance of these combined profiles was tested by cross validation using independent test and training sets. CONCLUSIONS These studies show that the combined profiles perform better than profiles based on either structural or sequence information alone.
Affiliation(s)
- A Elofsson
- UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, UCLA 90095-1570, USA

166
Pattabiraman N, Ward KB, Fleming PJ. Occluded molecular surface: analysis of protein packing. J Mol Recognit 1995; 8:334-44. [PMID: 9052974 DOI: 10.1002/jmr.300080603] [Citation(s) in RCA: 97] [Impact Index Per Article: 3.3] [Indexed: 02/03/2023]
Abstract
We describe a novel method to calculate the packing interactions in protein structural models. The method calculates the interatomic occluded surface areas for each atom in the protein model. The identification of, and degree of interaction with, neighboring atoms is accomplished by extending surface normals from a dot surface of each atom to the point of intersection with neighboring atoms. The combined occluded and non-occluded surface areas may be normalized for the amino acid composition of the protein providing a single parameter, the normalized protein surface ratio, which is diagnostic for native-like structures. Individual residues in the model which are in infrequent occluded surface environments may be identified. The method provides a means to explicitly describe packing densities and packing environments of individual atoms in a protein model. Finally, the method allows estimation of the complementarity between any interacting molecules, for example a ligand binding to a receptor.
Affiliation(s)
- N Pattabiraman
- Laboratory for the Structure of Matter, Naval Research Laboratory, Washington, DC 20375-5000, USA
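The surface-normal step of the occluded-surface idea can be illustrated with a small Python sketch. Dots are placed on an atom's sphere, and dots buried inside a neighboring atom are discarded. A remaining surface dot counts as occluded if the ray along its outward normal hits another atom within a maximum length. Several details are assumptions for illustration, not the published parameters: the dot-placement scheme (a Fibonacci spiral), the 2.8 Å default ray length, and the treatment of buried dots. The function names are hypothetical.

```python
import math


def sphere_dots(n):
    """n roughly uniform unit vectors on a sphere (Fibonacci spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dots = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        dots.append((rho * math.cos(k * golden), rho * math.sin(k * golden), z))
    return dots


def ray_hits(p, u, c, r, max_len):
    """Does the ray p + t*u (0 <= t <= max_len, |u| = 1) meet sphere (c, r)?"""
    d = [p[i] - c[i] for i in range(3)]
    b = sum(u[i] * d[i] for i in range(3))
    disc = b * b - (sum(x * x for x in d) - r * r)
    if disc < 0:
        return False
    t = -b - math.sqrt(disc)  # nearest intersection (p lies outside the sphere)
    return 0.0 <= t <= max_len


def occluded_area(atoms, idx, n_dots=2000, max_len=2.8):
    """Return (occluded area, surface area) of atom `idx`.

    atoms -- list of ((x, y, z), radius) tuples, coordinates in angstroms
    """
    (cx, cy, cz), r0 = atoms[idx]
    others = [a for j, a in enumerate(atoms) if j != idx]
    surface = occluded = 0
    for u in sphere_dots(n_dots):
        p = (cx + r0 * u[0], cy + r0 * u[1], cz + r0 * u[2])
        # Skip dots buried inside a neighboring atom: not part of the surface.
        if any(sum((p[i] - c[i]) ** 2 for i in range(3)) < r * r for c, r in others):
            continue
        surface += 1
        if any(ray_hits(p, u, c, r, max_len) for c, r in others):
            occluded += 1
    dot_area = 4.0 * math.pi * r0 ** 2 / n_dots
    return occluded * dot_area, surface * dot_area


pair = [((0.0, 0.0, 0.0), 1.5), ((3.5, 0.0, 0.0), 1.5)]
occ, surf = occluded_area(pair, 0)
```

For two atoms separated by a small gap, only a cap of dots facing the neighbor is occluded, and an isolated atom has no occluded area. Summing such per-atom areas over a model is the starting point for the composition-normalized ratios the abstract describes.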

167
Flöckner H, Braxenthaler M, Lackner P, Jaritz M, Ortner M, Sippl MJ. Progress in fold recognition. Proteins 1995; 23:376-86. [PMID: 8710830 DOI: 10.1002/prot.340230311] [Citation(s) in RCA: 74] [Impact Index Per Article: 2.6] [Indexed: 02/01/2023]
Abstract
The prediction experiment reveals that fold recognition has become a powerful tool in structural biology. We applied our fold recognition technique to 13 target sequences. In two cases, replication terminating protein and prosequence of subtilisin, the predicted structures are very similar to the experimentally determined folds. For the first time, in a public blind test, the unknown structures of proteins have been predicted ahead of experiment to an accuracy approaching molecular detail. In two other cases the approximate folds have been predicted correctly. According to the assessors there were 12 recognizable folds among the target proteins. In our postprediction analysis we find that in 7 cases our fold recognition technique is successful. In several of the remaining cases the predicted folds have interesting features in common with the experimental results. We present our procedure, discuss the results, and comment on several fundamental and technical problems encountered in fold recognition.
Affiliation(s)
- H Flöckner
- Center for Applied Molecular Engineering, University of Salzburg, Austria

168
Jones DT, Miller RT, Thornton JM. Successful protein fold recognition by optimal sequence threading validated by rigorous blind testing. Proteins 1995; 23:387-97. [PMID: 8710831 DOI: 10.1002/prot.340230312] [Citation(s) in RCA: 78] [Impact Index Per Article: 2.7] [Indexed: 02/01/2023]
Abstract
Analysis of the results of the recent protein structure prediction experiment for our method shows that we achieved a high level of success. Of the 18 available prediction targets of known structure, the assessors identified 11 chains that either entirely match a previously known fold or partially match a substantial region of a known fold. Of these 11 chains, we made predictions for 9 and correctly assigned the folds in 5 cases. We also identified a further 2 chains that partially match known folds, and both of these were correctly predicted. The success rate for our method under blind testing is therefore 7 out of 11 chains. A further 2 folds could easily have been recognized but were missed because of either overzealous filtering of potential matches or simple human error on our part. One of the two targets for which we did not submit a prediction, prosubtilisin, would not have been recognized by our usual criteria, but even in this case a correct prediction might have been made by considering a combination of pairwise energy and solvation energy Z-scores. Inspection of the threading alignments for the (αβ)8 barrels provides clues as to how fold recognition by threading works, in that these folds are recognized by parts rather than as a whole. The prospects for developing sequence threading technology further are discussed.
Affiliation(s)
- D T Jones
- Department of Biochemistry and Molecular Biology, University College, London

169
Godzik A, Koliński A, Skolnick J. Are proteins ideal mixtures of amino acids? Analysis of energy parameter sets. Protein Sci 1995; 4:2107-17. [PMID: 8535247 PMCID: PMC2142984 DOI: 10.1002/pro.5560041016] [Citation(s) in RCA: 119] [Impact Index Per Article: 4.1] [Indexed: 01/31/2023]
Abstract
Various existing derivations of the effective potentials of mean force for the two-body interactions between amino acid side chains in proteins are reviewed and compared to each other. The differences between different parameter sets can be traced to the reference state used to define the zero of energy. Depending on the reference state, the transfer free energy or other pseudo-one-body contributions can be present to various extents in two-body parameter sets. It is, however, possible to compare various derivations directly by concentrating on the "excess" energy, a term that describes the difference between a real protein and an ideal solution of amino acids. Furthermore, the number of protein structures available for analysis allows one to check the consistency of the derivation and the errors by comparing parameters derived from various subsets of the whole database. It is shown that pair interaction preferences are very consistent throughout the database. Independently derived parameter sets have correlation coefficients on the order of 0.8, with the mean difference between equivalent entries of 0.1 kT. Also, the low-quality (low resolution, little or no refinement) structures show similar regularities. There are, however, large differences between interaction parameters derived on the basis of crystallographic structures and structures obtained by the NMR refinement. The origin of the latter difference is not yet understood.
Affiliation(s)
- A Godzik
- Department of Molecular Biology, Scripps Research Institute, La Jolla, California 92037, USA

170
Elofsson A, Le Grand SM, Eisenberg D. Local moves: an efficient algorithm for simulation of protein folding. Proteins 1995; 23:73-82. [PMID: 8539252 DOI: 10.1002/prot.340230109] [Citation(s) in RCA: 54] [Impact Index Per Article: 1.9] [Indexed: 01/31/2023]
Abstract
We have enhanced genetic algorithms and Monte Carlo methods for simulation of protein folding by introducing "local moves" in dihedral space. A local move consists of changes in backbone dihedral angles in a sequential window while the positions of all atoms outside the window remain unchanged. We find three advantages of local moves: (1) for some energy functions, protein conformations of lower energy are found; (2) these low-energy conformations are found in fewer steps; and (3) the simulations are less sensitive to the details of the annealing protocol. To separate the effectiveness of the local-move algorithm from the complexity of the energy function, we have used several different energy functions. These energy functions include the Profile score (Bowie et al., Science 253:164-170, 1991), the knowledge-based energy function used by Bowie and Eisenberg (Proc. Natl. Acad. Sci. U.S.A. 91:4434-4440, 1994), two energy terms developed as suggested by Sippl and coworkers (Hendlich et al., J. Mol. Biol. 216:167-180, 1990), and AMBER (Weiner and Kollman, J. Comp. Chem. 2:287-303, 1981). Besides these energy functions, we have used three energy functions that include knowledge of the native structures: the RMSD from the native structure, the distance matrix error, and an energy term based on the distance between different residue types called DBIN. In some of these simulations the main advantage of local moves is the reduced dependence on the details of the annealing schedule. In other simulations, local moves are superior to other algorithms because structures with lower energy are found.
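One of the native-knowledge terms listed above, the distance matrix error, can be sketched in a few lines of Python. The abstract does not define it precisely; the version below assumes it is the root-mean-square difference, over all residue pairs i < j, between the intramolecular distance matrices of a model and the native structure (for example, computed on Cα coordinates). Because it compares internal distances, it needs no superposition step, unlike coordinate RMSD.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances for a list of (x, y, z) points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_matrix_error(model, native):
    """RMS difference between the distance matrices of two conformations.

    Assumes model[i] and native[i] are the same residue (e.g. its C-alpha).
    """
    if len(model) != len(native) or len(model) < 2:
        raise ValueError("need two equally long conformations with >= 2 residues")
    d_model = distance_matrix(model)
    d_native = distance_matrix(native)
    n = len(model)
    squared = [
        (d_model[i][j] - d_native[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Used as an "energy", a lower value means a conformation closer to native, so a Monte Carlo or genetic-algorithm search can minimize it directly; rigid translations and rotations of the model leave it unchanged.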
Affiliation(s)
- A Elofsson
- UCLA-DOE Lab of Structural Biology and Molecular Medicine, Molecular Biology Institute 90095-1570, USA
171
Sasai M. Conformation, energy, and folding ability of selected amino acid sequences. Proc Natl Acad Sci U S A 1995; 92:8438-42. [PMID: 7667308 PMCID: PMC41172 DOI: 10.1073/pnas.92.18.8438] [Citation(s) in RCA: 26] [Impact Index Per Article: 0.9] [Indexed: 01/26/2023]
Abstract
Evolutionary selection of sequences is studied with a knowledge-based Hamiltonian to find the design principle for folding to a model protein structure. With sequences selected by naive energy minimization, the model structure tends to be unstable and the folding ability is low. Sequences with high folding ability have not only a low-lying energy minimum but also an energy landscape that is similar to that found for the native sequence over a wide region of the conformation space. Although foldable sequences vary widely, the hydrophobicity pattern and the glycine locations are preserved among them. Implications of the design principle for the molecular mechanism of folding are discussed.
Affiliation(s)
- M Sasai
- Graduate School of Human Informatics, Nagoya University, Japan
172
Abstract
A protein sequence with at least 40% identity to a known structure can now be modelled automatically, with an accuracy approaching that of a low-resolution X-ray structure or a medium-resolution nuclear magnetic resonance structure. In general, these models have good stereochemistry and an overall structural accuracy that is as high as the similarity between the template and the actual structure being predicted. As a result, the number of sequences that can be modelled is an order of magnitude larger than the number of experimentally determined protein structures. In addition, evaluation techniques are available that can estimate errors in different regions of the model. Thus, the number of applications where homology modelling is proving useful is growing rapidly.
Affiliation(s)
- A Sali
- The Rockefeller University, New York, USA
173
Abstract
One of the major goals of molecular biology is to understand how protein chains fold into a unique three-dimensional structure. Given this knowledge, perhaps the most exciting prospect will be the possibility of designing new proteins to perform designated tasks. The eventual pinnacle of protein engineering will be the fully automated design of a protein with novel structure and function. Achievement of this aim lies far in the future, although some early progress has been made recently.
Affiliation(s)
- D T Jones
- Department of Biochemistry and Molecular Biology, University College, London, UK
174
Abstract
A new model for calculating the solvation energy of proteins is developed and tested for its ability to identify the native conformation as the global energy minimum among a group of thousands of computationally generated compact non-native conformations for a series of globular proteins. In the model (called the WZS model), solvation preferences for a set of 17 chemically derived molecular fragments of the 20 amino acids are learned by a training algorithm based on maximizing the solvation energy difference between native and non-native conformations for a training set of proteins. The performance of the WZS model confirms the success of this learning approach; the WZS model misrecognizes (as more stable than native) only 7 of 8,200 non-native structures. Possible applications of this model to the prediction of protein structure from sequence are discussed.
Affiliation(s)
- Y Wang
- Department of Molecular Biology, Jilin University, Changchun, People's Republic of China
175
Abstract
Knowledge-based potentials and energy functions are extracted from databases of known protein structures. Recent developments have shown that potentials of this type are successful in many areas of protein structure research, among them quality assessment and error recognition of folds and the prediction of unknown structures by fold-recognition techniques.
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, University of Salzburg, Austria
176
Abstract
The past two years have seen the rapid development of new recognition methods for protein structure prediction. These algorithms 'thread' the sequence of one protein through the known structure of another, looking for an alignment that corresponds to an energetically favorable model structure. Because they are based on energy calculation, rather than evolutionary distance, these methods extend the possibility of structure prediction by comparative modeling to a larger class of new sequences, where similarity to known structures is recognizable by no other means. The strength of the evidence they offer should be judged by objective statistical tests, however, so as to rule out the possibility that favorable scores arise from chance factors such as similarity of length, composition, or the consideration of a large number of alternative alignments. Calculation of objective p-values by analytical means is not yet possible, but it would appear that approximate values may be obtained by simulation, as they are in gapped, global sequence alignment. We propose that the results of threading experiments should include Z-scores relative to the composition-corrected score distribution obtained for shuffled and optimally aligned sequences.
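The proposed Z-score can be sketched as follows. This is a simplified illustration, not the authors' procedure: `score_fn` is a hypothetical callable that returns a threading energy for a sequence (lower is better), and shuffling keeps the amino acid composition fixed, which supplies the composition correction. In the full proposal each shuffled sequence is also optimally realigned to the structure; here that step is assumed to happen inside `score_fn`.

```python
import random
import statistics


def shuffle_z_score(sequence, score_fn, n_shuffles=200, seed=0):
    """Z-score of a threading energy relative to shuffled sequences.

    Shuffling preserves length and composition, so only the order of
    residues differs between the real and background sequences.
    """
    rng = random.Random(seed)
    observed = score_fn(sequence)
    residues = list(sequence)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(score_fn("".join(residues)))
    mean = statistics.fmean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have zero spread; Z-score undefined")
    # With energies (lower = better), a strongly negative Z is significant.
    return (observed - mean) / sd
```

For example, with a toy energy that rewards hydrophobic residues at positions marked as buried in a hypothetical burial string `"BBEEBBEE"`, the sequence `"LLKKLLKK"` (all hydrophobics buried) gets a negative Z-score, because most shuffles bury fewer of its L residues.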
Affiliation(s)
- S H Bryant
- Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
177
178
Hua QX, Gozani SN, Chance RE, Hoffmann JA, Frank BH, Weiss MA. Structure of a protein in a kinetic trap. Nat Struct Biol 1995; 2:129-38. [PMID: 7749917 DOI: 10.1038/nsb0295-129] [Citation(s) in RCA: 101] [Impact Index Per Article: 3.5] [Indexed: 01/26/2023]
Abstract
We have determined the structure of a metastable disulphide isomer of human insulin. Although not observed for proinsulin folding or insulin-chain recombination, the isomer retains ordered secondary structure and a compact hydrophobic core. Comparison with native insulin reveals a global rearrangement in the orientation of A- and B-chains. One face of the protein's surface is nevertheless common to the native and non-native structures. This face contains receptor-binding determinants, rationalizing the partial biological activity of the isomer. Structures of native and non-native disulphide isomers also define alternative three-dimensional templates. Threading of insulin-like sequences provides an experimental realization of the inverse protein-folding problem.
Affiliation(s)
- Q X Hua
- Department of Biochemistry and Molecular Biology, University of Chicago, Illinois 60637-5419, USA
179
Abstract
In the study of the protein inverse folding problem, one goal is to find a simple and efficient potential to evaluate the compatibility between a structure and a given sequence. We present here a novel empirical mean force potential that addresses the importance of electrostatic interactions in protein inverse folding. It is based on the polar fraction of the protein main chain and is constructed in a way similar to Sippl's, from a database of 64 known independent three-dimensional protein structures. This potential was applied to recognize native protein conformations within a pool of conformations. The results show that this potential is powerful in picking out native conformations; in addition, it can find structural similarity between proteins with low sequence similarity. The success of this new potential clearly shows the importance of electrostatic factors in protein inverse folding studies.
Affiliation(s)
- Y Wang
- Department of Chemistry, Peking University, Beijing, P. R. China
180
Wang Y, Zhang H, Li W, Scott RA. Discriminating compact nonnative structures from the native structure of globular proteins. Proc Natl Acad Sci U S A 1995; 92:709-13. [PMID: 7846040 PMCID: PMC42689 DOI: 10.1073/pnas.92.3.709] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.0] [Indexed: 01/27/2023]
Abstract
Prediction of the native tertiary structure of a globular protein from the primary sequence will require a potential energy model that can discriminate all nonnative structures from the native structure(s). A successful model must distinguish not only alternate structures that are very nonnative but also alternate structures that are compact and near-native. We describe here a method, based on molecular dynamics simulation, that allows generation of hundreds of compact alternate structures that are arbitrarily close to the native structure. In this way, a significant amount of conformational space in the neighborhood of the native structure can be sampled and these alternate structures can be used as a stringent test of protein folding models. We have used two sets of these alternate structures generated for six crystallographically characterized small globular proteins (1200 alternate structures in all) to test eight empirical energy models for their ability to discriminate alternate from native structures. Seven of the models fail to correctly identify at least some of the alternate structures as nonnative. An atomic solvation model is presented that succeeds in discriminating all 1200 alternate structures from native.
Affiliation(s)
- Y Wang
- Department of Molecular Biology, Jilin University, Changchun, People's Republic of China
181
Abstract
The identification of protein sequences that fold into certain known three-dimensional (3D) structures, or motifs, is evaluated through a probabilistic analysis of their one-dimensional (1D) sequences. We present a correlation method that runs in linear time and incorporates pairwise dependencies between amino acid residues at multiple distances to assess the conditional probability that a given residue is part of a given 3D structure. This method is generalized to multiple motifs, where a dynamic programming approach leads to an efficient algorithm that runs in linear time for practical problems. By this approach, we were able to distinguish (2-stranded) coiled-coil from non-coiled-coil domains and globins from nonglobins. When tested on the Brookhaven X-ray crystal structure database, the method does not produce any false-positive or false-negative predictions of coiled coils.
Affiliation(s)
- B Berger
- Mathematics Department, Massachusetts Institute of Technology, Cambridge 02139, USA
182
183
Eisenhaber F, Persson B, Argos P. Protein structure prediction: recognition of primary, secondary, and tertiary structural features from amino acid sequence. Crit Rev Biochem Mol Biol 1995; 30:1-94. [PMID: 7587278 DOI: 10.3109/10409239509085139] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023]
Abstract
This review attempts a critical stock-taking of the current state of the science aimed at predicting structural features of proteins from their amino acid sequences. At the primary structure level, methods are considered for detection of remotely related sequences and for recognizing amino acid patterns to predict posttranslational modifications and binding sites. The techniques involving secondary structural features include prediction of secondary structure, membrane-spanning regions, and secondary structural class. At the tertiary structural level, methods for threading a sequence into a mainchain fold, homology modeling and assigning sequences to protein families with similar folds are discussed. A literature analysis suggests that, to date, threading techniques are not able to show their superiority over sequence pattern recognition methods. Recent progress in the state of ab initio structure calculation is reviewed in detail. The analysis shows that many structural features can be predicted from the amino acid sequence much better than just a few years ago and with attendant utility in experimental research. Best prediction can be achieved for new protein sequences that can be assigned to well-studied protein families. For single sequences without homologues, the folding problem has not yet been solved.
Affiliation(s)
- F Eisenhaber
- Institut für Biochemie der Charité, Medizinische Fakultät, Humboldt-Universität zu Berlin, Fed. Rep. Germany
184
Abstract
A mathematical formalism is introduced that has general applicability to many protein structure models used in the various approaches to the "inverse protein folding problem." The inverse nature of the problem arises from the fact that one begins with a set of assumed tertiary structures and searches for those most compatible with a new sequence, rather than attempting to predict the structure directly from the new sequence. The formalism is based on the well-known theory of Markov random fields (MRFs). Our MRF formulation provides explicit representations for the relevant amino acid position environments and the physical topologies of the structural contacts. In particular, MRF models can readily be constructed for the secondary structure packing topologies found in protein domain cores, or other structural motifs, that are anticipated to be common among large sets of both homologous and nonhomologous proteins. MRF models are probabilistic and can exploit the statistical data from the limited number of proteins having known domain structures. The MRF approach leads to a new scoring function for comparing different threadings (placements) of a sequence through different structure models. The scoring function is very important, because comparing alternative structure models with each other is a key step in the inverse folding problem. Unlike previously published scoring functions, the one derived in this paper is based on a comprehensive probabilistic formulation of the threading problem.
Affiliation(s)
- J V White
- TASC, Walkers Brook Drive, Reading, Massachusetts
185
Koehl P, Delarue M. Polar and nonpolar atomic environments in the protein core: implications for folding and binding. Proteins 1994; 20:264-78. [PMID: 7892175 DOI: 10.1002/prot.340200307] [Citation(s) in RCA: 73] [Impact Index Per Article: 2.4] [Indexed: 01/27/2023]
Abstract
Hydrophobic interactions are believed to play an important role in protein folding and stability. Semi-empirical attempts to estimate these interactions are usually based on a model of solvation, whose contribution to the stability of proteins is assumed to be proportional to the surface area buried upon folding. Here we propose an extension of this idea by defining an environment free energy that characterizes the environment of each atom of the protein, including solvent, polar or nonpolar atoms of the same protein or of another molecule that interacts with the protein. In our model, the difference of this environment free energy between the folded state and the unfolded (extended) state of a protein is shown to be proportional to the area buried by nonpolar atoms upon folding. General properties of this environment free energy are derived from statistical studies on a database of 82 well-refined protein structures. This free energy is shown to be able to discriminate misfolded from correct structural models, to provide an estimate of the stabilization due to oligomerization, and to predict the stability of mutants in which hydrophobic residues have been substituted by site-directed mutagenesis, provided that no large structural modifications occur.
Affiliation(s)
- P Koehl
- UPR 9003 Cancérogénèse et Mutagénèse Moléculaire et Structurale du CNRS, Graffenstaden, France
186
Le Grand SM, Merz KM. The Genetic Algorithm and the Conformational Search of Polypeptides and Proteins. Mol Simul 1994. [DOI: 10.1080/08927029408021995] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
187
Abstract
Over the last few years we have developed an empirical potential function that solves the protein structure recognition problem: given the sequence for an n-residue globular protein and a collection of plausible protein conformations, including the native conformation for that sequence, identify the correct, native conformation. Having determined this potential on the basis of only some 6500 native/nonnative pairs of structures for 58 proteins, we find it recognizes the native conformation for essentially all compact, soluble, globular proteins having known native conformations in comparisons with 10^4 to 10^6 reasonable alternative conformations apiece. In this sense, the potential encodes nearly all the essential features of globular protein conformational preference. In addition it "knows" about many additional factors in protein folding, such as the stabilization of multimeric proteins, quaternary structure, the role of disulfide bridges and ligands, proproteins vs. processed proteins, and minimal strand lengths in globular proteins. Comparisons are made with other sorts of protein folding problems, and applications in protein conformational determination and prediction are discussed.
Affiliation(s)
- V N Maiorov
- College of Pharmacy, University of Michigan, Ann Arbor 48109
188
Anthonsen HW, Baptista A, Drabløs F, Martel P, Petersen SB. The blind watchmaker and rational protein engineering. J Biotechnol 1994; 36:185-220. [PMID: 7765263 PMCID: PMC7173218 DOI: 10.1016/0168-1656(94)90152-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Received: 03/07/1994] [Accepted: 04/23/1994] [Indexed: 01/27/2023]
Abstract
In the present review some scientific areas of key importance for protein engineering are discussed, such as problems involved in deducing protein sequence from DNA sequence (due to posttranscriptional editing, splicing and posttranslational modifications), modelling of protein structures by homology, NMR of large proteins (including probing the molecular surface with relaxation agents), simulation of protein structures by molecular dynamics and simulation of electrostatic effects in proteins (including pH-dependent effects). It is argued that all of these areas could be of key importance in most protein engineering projects, because they give access to increased and often unique information. In the last part of the review some potential areas for future applications of protein engineering approaches are discussed, such as non-conventional media, de novo design and nanotechnology.
189
Abstract
Through the comprehensive analysis of protein sequence and structural data, relationships can be established that suggest, with varying degrees of success, structural models for a protein for which only the sequence is known. The certainty with which a model can be proposed depends on the degree of similarity between the sequence of unknown structure and the sequence of a protein of known structure. Methods are being developed to detect remote similarities between sequences or structures, and to predict protein structure based on such small levels of similarity.
Affiliation(s)
- W R Taylor
- Laboratory of Mathematical Biology, National Institute for Medical Research, London, UK
190
Abstract
One of the major goals of molecular biology is to understand how protein chains fold into a unique 3-dimensional structure. Given this knowledge, perhaps the most exciting prospect will be the possibility of designing new proteins to perform designated tasks, an application that could prove to be of great importance in medicine and biotechnology. It is possible that effective protein design may be achieved without the requirement for a full understanding of the protein folding process. In this paper a simple method is described for designing an amino acid sequence to fit a given 3-dimensional structure. The compatibility of a designed sequence with a given fold is assessed by means of a set of statistically determined potentials (including interresidue pairwise and solvation terms), which have been previously applied to the problem of protein fold recognition. In order to generate sequences that best fit the fold, a genetic algorithm is used, whereby the sequence is optimized by a stochastic search in the style of natural selection.
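The stochastic search can be sketched as a small genetic algorithm in Python. The fitness function is a hypothetical stand-in for the statistical pairwise and solvation potentials mentioned above, and the operators (keep the better half as parents, one-point crossover, per-position mutation, carry the best sequence over unchanged) are generic choices assumed for illustration, not details taken from the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def design_sequence(length, fitness, pop_size=50, generations=100,
                    mutation_rate=0.05, seed=0):
    """Evolve sequences of `length` residues toward higher `fitness`."""
    if length < 2 or pop_size < 4:
        raise ValueError("need length >= 2 and pop_size >= 4")
    rng = random.Random(seed)
    population = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: best survives intact
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)     # one-point crossover
            child = mother[:cut] + father[cut:]
            child = "".join(
                rng.choice(AMINO_ACIDS) if rng.random() < mutation_rate else aa
                for aa in child
            )
            children.append(child)
        population = children
    return max(population, key=fitness)
```

Because the best sequence is always carried forward, the best fitness never decreases between generations; a toy fitness that rewards hydrophobic residues at buried positions and polar ones at exposed positions is enough to watch the population converge.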
Affiliation(s)
- D T Jones
- Department of Biochemistry and Molecular Biology, University College, London United Kingdom
191
Abstract
We present a novel method to improve a simple pair potential of mean force, derived from experimentally determined protein structures, in such a way that it recognizes native protein folds with high reliability. This improvement is based on the use of mutation data matrices to overcome difficulties arising from the poor statistics of small sample sizes. A set of 167 protein chains taken from the Brookhaven Protein Structure Data Base, selected from high-resolution structures and avoiding homologous proteins, is used for generation of the potential set. The potential describes interresidue pair energies depending on distance and sequential separation, and is calculated using the Boltzmann equation. Its performance is evaluated by jackknife tests that try to identify the native fold for a given sequence among a large number of possible threadings on all structures in the set without allowing for gaps. Up to 94% of the protein chains are correctly assigned to their native folds, so that all proper single-chain domains are recognized.
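Gapless threading, the placement scheme used in the jackknife tests, can be sketched as sliding the query sequence along a template and summing pair energies over the template's contacts. This is a simplified illustration: the potential described above also depends on distance and sequence separation, whereas here a template is reduced to a hypothetical list of contacting position pairs and `pair_energy` is any callable on two residue types.

```python
def gapless_threadings(sequence, contacts, template_length, pair_energy):
    """Score every ungapped placement of `sequence` on one template.

    contacts: (i, j) template positions in contact, with i < j.
    Returns (offset, energy) tuples, lowest energy first.
    """
    n = len(sequence)
    results = []
    for offset in range(template_length - n + 1):
        energy = 0.0
        for i, j in contacts:
            # Only contacts whose both ends are covered at this offset count.
            if offset <= i and j < offset + n:
                energy += pair_energy(sequence[i - offset], sequence[j - offset])
        results.append((offset, energy))
    return sorted(results, key=lambda item: item[1])
```

Repeating this over every template in the set and ranking all placements by energy gives a fold-assignment test: a chain counts as correctly assigned if a placement on its own structure ranks first. In a jackknife test the potential would also be rederived without the query chain.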
Affiliation(s)
- A Bauer
- Research Institute of Molecular Pathology, Vienna, Austria
192
Abstract
Knowledge, both from the three-dimensional structures of homologous proteins and from the general analysis of protein structure, is of value in modeling a protein of known sequence but unknown structure. While many models are still constructed at least in part by manual methods on graphics devices, automated procedures have come into greater use. These procedures include those that assemble fragments of structure from other known structures and those that derive coordinates for the model from the satisfaction of restraints placed on atomic positions.
Affiliation(s)
- M S Johnson
- Imperial Cancer Research Fund, Department of Crystallography, Birkbeck College, London
193
194
Abstract
A major problem in the determination of the three-dimensional structure of proteins concerns the quality of the structural models obtained from the interpretation of experimental data. New developments in X-ray crystallography and nuclear magnetic resonance spectroscopy have accelerated the process of structure determination and the biological community is confronted with a steadily increasing number of experimentally determined protein folds. However, in the recent past several experimentally determined protein structures have been proven to contain major errors, indicating that in some cases the interpretation of experimental data is difficult and may yield incorrect models. Such problems can be avoided when computational methods are employed which complement experimental structure determinations. A prerequisite of such computational tools is that they are independent of the parameters obtained from a particular experiment. In addition such techniques are able to support and accelerate experimental structure determinations. Here we present techniques based on knowledge based mean fields which can be used to judge the quality of protein folds. The methods can be used to identify misfolded structures as well as faulty parts of structural models. The techniques are even applicable in cases where only the C alpha trace of a protein conformation is available. The capabilities of the technique are demonstrated using correct and incorrect protein folds.
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, University of Salzburg, Austria
195
Herzyk P, Hubbard RE. A reduced representation of proteins for use in restraint satisfaction calculations. Proteins 1993; 17:310-24. [PMID: 8272428 DOI: 10.1002/prot.340170308] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 01/29/2023]
Abstract
A reduced representation of proteins has been developed for use in restraint satisfaction calculations with dynamic simulated annealing. Each amino acid residue is represented by up to four spherical virtual atoms. The virtual bonds and excluded volume of these atoms have been parameterized by analysis of 83 protein structures determined at high resolution by X-ray crystallography. The use of the new representation in NOE distance restraint satisfaction has been compared with the standard all-atom representation for the determination of the structures of crambin, echistatin, and protein G. Using the reduced representation, there is a 30-fold decrease in the computer time needed to generate a single structure, and up to a 20-fold decrease in the time taken to produce an acceptable structure, compared with the all-atom representation. The root mean square deviation between the mean structures obtained with the all-atom and reduced representations is between 1.5 and 1.7 Å for Cα atoms. The new representation is adequate for describing the "low-resolution" features of protein structure, such as the general fold and the positions of secondary structure elements. It can also provide an initial structure for more detailed refinement with the full all-atom representation.
Affiliation(s)
- P Herzyk
- Department of Chemistry, University of York, Heslington, England
196
Abstract
A novel method for differentiating between correctly and incorrectly determined regions of protein structures, based on characteristic atomic interactions, is described. Different types of atoms are distributed nonrandomly with respect to each other in proteins. Errors in model building lead to more randomized distributions of the different atom types, which can be distinguished from correct distributions by statistical methods. Atoms are classified into one of three categories: carbon (C), nitrogen (N), and oxygen (O). This leads to six different combinations of pairwise noncovalently bonded interactions (CC, CN, CO, NN, NO, and OO). A quadratic error function is used to characterize the set of pairwise interactions from nine-residue sliding windows in a database of 96 reliable protein structures. Regions of candidate protein structures that are mistraced or misregistered can then be identified by analysis of the pattern of nonbonded interactions from each window.
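Counting the six atom-pair types in sliding windows can be sketched in Python. This is an illustration under stated assumptions, not the published program: the 3.5 Å cutoff, skipping only pairs within the same residue, the plain sum of squared deviations from reference fractions (the form of the quadratic error function is not given above), and counting only interactions among atoms inside each window are all assumptions.

```python
import math
from itertools import combinations

PAIR_TYPES = ("CC", "CN", "CO", "NN", "NO", "OO")


def pair_fractions(atoms, cutoff=3.5):
    """Fractions of CC, CN, CO, NN, NO and OO contacts among `atoms`.

    atoms: list of (element, residue_index, (x, y, z)), element in "CNO".
    Pairs within one residue are skipped; covalent bonds between
    neighbouring residues (e.g. the peptide C-N) are not excluded.
    """
    counts = dict.fromkeys(PAIR_TYPES, 0)
    for (e1, r1, p1), (e2, r2, p2) in combinations(atoms, 2):
        if r1 == r2 or math.dist(p1, p2) > cutoff:
            continue
        counts["".join(sorted(e1 + e2))] += 1   # "OC" -> "CO", etc.
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def quadratic_error(fractions, reference):
    """Sum of squared deviations from reference fractions (assumed form)."""
    return sum((fractions[k] - reference[k]) ** 2 for k in PAIR_TYPES)


def window_errors(atoms, reference, window=9):
    """Error for each `window`-residue sliding window, keyed by first residue."""
    residues = sorted({r for _, r, _ in atoms})
    errors = {}
    for start in range(len(residues) - window + 1):
        members = set(residues[start:start + window])
        subset = [a for a in atoms if a[1] in members]
        errors[residues[start]] = quadratic_error(pair_fractions(subset), reference)
    return errors
```

Here `reference` would hold average fractions from a set of trusted structures; windows whose error is unusually large are the candidates for mistracing or misregistration.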
Affiliation(s)
- C Colovos
- Department of Chemistry & Biochemistry, University of California, Los Angeles 90024-1569
197
Unger R, Sussman JL. The importance of short structural motifs in protein structure analysis. J Comput Aided Mol Des 1993; 7:457-72. [PMID: 8229095 DOI: 10.1007/bf02337561] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.3] [Indexed: 01/29/2023]
Abstract
Proteins tend to use recurrent structural motifs on all levels of organization. In this paper we first survey the topics of recurrent motifs on the local secondary structure level and on the global fold level. Then, we focus on the intermediate level which we call the short structural motifs. We were able to identify a set of structural building blocks that are very common in protein structure. We suggest that these building blocks can be used as an important link between the primary sequence and the tertiary structure. In this framework, we present our latest results on the structural variability of the extended strand motifs. We show that extended strands can be divided into three distinct structural classes, each with its own sequence specificity. Other approaches to the study of short structural motifs are reviewed.
Affiliation(s)
- R Unger
- Center for Advanced Research in Biotechnology, University of Maryland, Rockville 20850
198
Sippl MJ. Boltzmann's principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. J Comput Aided Mol Des 1993; 7:473-501. [PMID: 8229096 DOI: 10.1007/bf02337562] [Citation(s) in RCA: 269] [Impact Index Per Article: 8.7] [Indexed: 01/29/2023]
Abstract
The data base of known protein structures contains a tremendous amount of information on protein-solvent systems. Boltzmann's principle enables the extraction of this information in the form of potentials of mean force. The resulting force field constitutes an energetic model for protein-solvent systems. We outline the basic physical principles of this approach to protein folding and summarize several techniques which are useful in the development of knowledge-based force fields. Among the applications presented are the validation of experimentally determined protein structures, data base searches which aim at the identification of native-like sequence structure pairs, sequence structure alignments and the calculation of protein conformations from amino acid sequences.
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, University of Salzburg, Austria
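Sippl's abstract describes extracting potentials of mean force from the structure database via Boltzmann's principle. A minimal sketch of the common inverse-Boltzmann form, E(r) = -kT ln(P_obs(r) / P_ref(r)), applied to binned distance counts, may help make this concrete. It assumes that the reference state is the pooled distance distribution of all residue pairs, that energies are in kcal/mol at 300 K, and that a pseudocount handles empty bins; the function and constant names are hypothetical, and this is not Sippl's exact formulation (which also addresses sparse-data corrections).

```python
import math

# Assumed constants: Boltzmann constant in kcal/(mol*K) and a 300 K temperature.
K_B = 0.0019872041
TEMPERATURE = 300.0


def potential_of_mean_force(observed_counts, reference_counts, pseudocount=1.0):
    """Convert distance-bin counts into knowledge-based energies (hypothetical helper).

    observed_counts:  counts for one specific residue pair, per distance bin.
    reference_counts: counts for all residue pairs pooled, per distance bin
                      (the assumed reference state).
    Returns E(r) = -kT * ln(P_obs(r) / P_ref(r)) for each bin; bins where the
    pair is over-represented relative to the reference get negative energies.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("count lists must have the same number of bins")
    n_bins = len(observed_counts)
    # Normalize with pseudocounts so empty bins do not produce log(0).
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    kt = K_B * TEMPERATURE
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kt * math.log(p_obs / p_ref))
    return energies
```

For example, with hypothetical counts `potential_of_mean_force([10, 40, 20, 30], [25, 25, 25, 25], pseudocount=0.0)`, the second bin (where the pair is enriched) receives a negative, favourable energy and the first bin a positive one; identical normalized distributions give zero energy in every bin.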
199
Abstract
An important, yet seemingly unattainable, goal in structural molecular biology is to be able to predict the native three-dimensional structure of a protein entirely from its amino acid sequence. Prediction methods based on rigorous energy calculations have not yet been successful, and the best results so far have come from homology modelling and statistical secondary structure prediction. Homology modelling is limited to cases where a protein of known structure shares significant sequence similarity with the unknown. Secondary structure prediction methods are not only unreliable but also offer no obvious route to the full tertiary structure. Recently, methods have been developed whereby entire protein folds are recognized from sequence, even where the proteins under consideration share little or no sequence similarity. In this paper we review the current methods, including our own, and in particular offer a historical background to their development. We also discuss the future of these methods and outline the developments under investigation in our laboratory.
Affiliation(s)
- D Jones
- Department of Biochemistry and Molecular Biology, University College, London, U.K
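Threading is one family of methods that recognize folds from sequence: the query sequence is placed onto a template structure and scored with a knowledge-based potential. A deliberately minimal, gapless sketch of that scoring step follows. It assumes a hypothetical two-letter hydrophobic/polar (HP) alphabet in which only H-H contacts are favourable, and a template described only by its list of contacting positions; real threading methods, including the ones reviewed above, use full residue-level potentials, solvation terms and gapped alignments. All names here are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical HP-model contact energies: only hydrophobic-hydrophobic
# contacts are favourable. Real methods use 20x20 residue pair potentials.
CONTACT_ENERGY: Dict[Tuple[str, str], float] = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.0,
    ("P", "H"): 0.0,
    ("P", "P"): 0.0,
}


def threading_score(sequence: str, contacts: List[Tuple[int, int]], offset: int) -> float:
    """Sum contact energies when sequence[offset:] is laid onto the template.

    contacts lists template position pairs (i, j) that are in spatial contact.
    """
    return sum(
        CONTACT_ENERGY[(sequence[offset + i], sequence[offset + j])]
        for i, j in contacts
    )


def best_gapless_threading(
    sequence: str, contacts: List[Tuple[int, int]], template_length: int
) -> Tuple[int, float]:
    """Try every ungapped placement of the template along the sequence.

    Returns (offset, score) for the lowest-energy placement.
    """
    if len(sequence) < template_length:
        raise ValueError("sequence is shorter than the template")
    placements = range(len(sequence) - template_length + 1)
    scores = [(off, threading_score(sequence, contacts, off)) for off in placements]
    return min(scores, key=lambda item: item[1])
```

For instance, `best_gapless_threading("PHPPHP", [(0, 3)], 4)` places the two hydrophobic residues on the contacting template positions at offset 1. Fold recognition would repeat this scoring over a library of templates and rank them by their best scores.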
200
Abstract
Different components of the theoretical protein folding problem are evaluated critically. It is argued that: (i) as a rule, small- and medium-sized proteins are in the free energy minimum; (ii) long-lived metastable states may either appear occasionally with growing protein size, or be selected by evolution for a specific function; (iii) functions discriminating against incorrect folds would fail if they were used directly in the global optimization, unless they approximate the true free energy accurately; (iv) surface and electrostatic free energies should be treated separately; (v) conformational entropy (of side chains in particular) should be taken into account; (vi) Monte Carlo procedures considering all free energy terms and combining global knowledge-based random moves with local optimization have the largest potential for success.
Affiliation(s)
- R A Abagyan
- European Molecular Biology Laboratory, Heidelberg, Germany
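Point (vi) of Abagyan's abstract favours Monte Carlo procedures that combine random moves with local optimization. The generic idea (often called Monte Carlo minimization or basin hopping) can be sketched on a one-dimensional toy energy: each random move is followed by a local minimization, and the resulting minima are accepted or rejected with the Metropolis criterion. This is an illustration under stated assumptions, not Abagyan's procedure: the moves here are uniform random rather than knowledge-based, the energy `double_well` is a hypothetical stand-in for a free energy function, and the step sizes and temperature are arbitrary choices.

```python
import math
import random
from typing import Callable


def double_well(x: float) -> float:
    """Hypothetical tilted double-well energy; the lower minimum is near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def local_minimize(f: Callable[[float], float], x: float,
                   step: float = 0.01, max_iter: int = 10_000) -> float:
    """Simple downhill search: step left or right while f decreases."""
    for _ in range(max_iter):
        if f(x - step) < f(x):
            x -= step
        elif f(x + step) < f(x):
            x += step
        else:
            break  # neither neighbour is lower: a local minimum (to within step)
    return x


def monte_carlo_minimization(f: Callable[[float], float], x0: float,
                             n_steps: int = 200, move_size: float = 2.0,
                             temperature: float = 1.0, seed: int = 0) -> float:
    """Random move + local minimization, accepted by the Metropolis criterion.

    Returns the lowest-energy local minimum visited.
    """
    rng = random.Random(seed)
    current = local_minimize(f, x0)
    best = current
    for _ in range(n_steps):
        # Global random move (uniform here, an assumption), then local optimization.
        trial = local_minimize(f, current + rng.uniform(-move_size, move_size))
        delta = f(trial) - f(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
        if f(current) < f(best):
            best = current
    return best
```

Starting from `x0 = 2.0`, a purely local search stops in the higher well near x = 0.96, whereas `monte_carlo_minimization(double_well, 2.0)` can escape it and should return a point in the lower well near x = -1.04. Keeping track of the best minimum separately from the current state means the result is never worse than the starting local minimum.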