1
|
Ferrada E. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets. PLoS Comput Biol 2014; 10:e1003946. [PMID: 25473967 PMCID: PMC4256021 DOI: 10.1371/journal.pcbi.1003946] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/07/2014] [Accepted: 09/26/2014] [Indexed: 11/19/2022] Open
Abstract
The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.
Affiliation(s)
- Evandro Ferrada
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
|
2
|
Abstract
By focusing on essential features, while averaging over less important details, coarse-grained (CG) models provide significant computational and conceptual advantages with respect to more detailed models. Consequently, despite dramatic advances in computational methodologies and resources, CG models enjoy surging popularity and are becoming increasingly equal partners to atomically detailed models. This perspective surveys the rapidly developing landscape of CG models for biomolecular systems. In particular, this review seeks to provide a balanced, coherent, and unified presentation of several distinct approaches for developing CG models, including top-down, network-based, native-centric, knowledge-based, and bottom-up modeling strategies. The review summarizes their basic philosophies, theoretical foundations, typical applications, and recent developments. Additionally, the review identifies fundamental inter-relationships among the diverse approaches and discusses outstanding challenges in the field. When carefully applied and assessed, current CG models provide highly efficient means for investigating the biological consequences of basic physicochemical principles. Moreover, rigorous bottom-up approaches hold great promise for further improving the accuracy and scope of CG models for biomolecular systems.
Affiliation(s)
- W G Noid
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
3
|
Abstract
We extend PRIME, an intermediate-resolution protein model previously used in simulations of the aggregation of polyalanine and polyglutamine, to the description of the geometry and energetics of peptides containing all 20 amino acid residues. The 20 amino acid side chains are classified into 14 groups according to their hydrophobicity, polarity, size, charge, and potential for side chain hydrogen bonding. The parameters for extended PRIME, called PRIME 20, include hydrogen-bonding energies, side chain interaction range and energy, and excluded volume. The parameters are obtained by applying a perceptron-learning algorithm and a modified stochastic learning algorithm that optimizes the energy gap between 711 known native states from the PDB and decoy structures generated by gapless threading. The number of independent pair interaction parameters is chosen to be small enough to be physically meaningful yet large enough to give reasonably accurate results in discriminating decoys from native structures. The most physically meaningful results are obtained with 19 energy parameters.
Affiliation(s)
- Mookyung Cheon
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina, USA
|
4
|
Abstract
Knowledge-based approaches frequently employ empirical relations to determine effective potentials for coarse-grained protein models directly from protein databank structures. Although these approaches have enjoyed considerable success and widespread popularity in computational protein science, their fundamental basis has been widely questioned. It is well established that conventional knowledge-based approaches do not correctly treat many-body correlations between amino acids. Moreover, the physical significance of potentials determined by using structural statistics from different proteins has remained obscure. In the present work, we address both of these concerns by introducing and demonstrating a theory for calculating transferable potentials directly from a databank of protein structures. This approach assumes that the databank structures correspond to representative configurations sampled from equilibrium solution ensembles for different proteins. Given this assumption, this physics-based theory exactly treats many-body structural correlations and directly determines the transferable potentials that provide a variationally optimized approximation to the free energy landscape for each protein. We illustrate this approach by first constructing a databank of protein structures using a model potential and then quantitatively recovering this potential from the structure databank. The proposed framework will clarify the assumptions and physical significance of knowledge-based potentials, allow for their systematic improvement, and provide new insight into many-body correlations and cooperativity in folded proteins.
|
5
|
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduces the true energies for high temperature Boltzmann distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity.
Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
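The quasi-chemical approximation mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's code: the function name, the toy two-letter contact list, and the choice of reference state (expected contacts computed from the fraction of contact ends of each residue type) are assumptions made here for illustration. The energy of a pair is taken as -kT ln(N_obs / N_exp).

```python
import math
from collections import Counter


def quasi_chemical_energies(contacts, kT=1.0):
    """Estimate contact energies by Boltzmann inversion of contact counts.

    contacts: list of (type_a, type_b) pairs observed in contact.
    The reference state is an assumption of this sketch: contacts are
    treated as independent draws weighted by how often each residue
    type appears at a contact end.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    end_counts = Counter(t for pair in contacts for t in pair)
    n_pairs = len(contacts)
    n_ends = 2 * n_pairs

    energies = {}
    for (a, b), n_obs in pair_counts.items():
        x_a = end_counts[a] / n_ends
        x_b = end_counts[b] / n_ends
        # An unordered pair of different types can arise in two ways.
        multiplicity = 1 if a == b else 2
        n_exp = n_pairs * multiplicity * x_a * x_b
        energies[(a, b)] = -kT * math.log(n_obs / n_exp)
    return energies


# Hypothetical toy data for a binary (H/P) alphabet.
toy_contacts = [("H", "H")] * 6 + [("H", "P")] * 2 + [("P", "P")] * 2
toy_energies = quasi_chemical_energies(toy_contacts)
```

In this toy set the H-P contacts are under-represented relative to the reference state, so their estimated energy is positive, while H-H and P-P come out negative. Pairs that are never observed are simply absent from the result; handling them would need pseudocounts, which this sketch omits.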
Affiliation(s)
- Marcos R Betancourt
- Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
|
6
|
Wu JC, Gardner DP, Ozer S, Gutell RR, Ren P. Correlation of RNA secondary structure statistics with thermodynamic stability and applications to folding. J Mol Biol 2009; 391:769-83. [PMID: 19540243 DOI: 10.1016/j.jmb.2009.06.036] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 03/16/2009] [Revised: 06/05/2009] [Accepted: 06/12/2009] [Indexed: 11/15/2022]
Abstract
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
Affiliation(s)
- Johnny C Wu
- Department of Biomedical Engineering, University of Texas at Austin, 78712-1062, USA
|
7
|
Rajgaria R, McAllister SR, Floudas CA. Distance dependent centroid to centroid force fields using high resolution decoys. Proteins 2008; 70:950-70. [PMID: 17847088 DOI: 10.1002/prot.21561] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022]
Abstract
Simplified force fields play an important role in protein structure prediction and de novo protein design by requiring less computational effort than detailed atomistic potentials. A side chain centroid based, distance dependent pairwise interaction potential has been developed. A linear programming based formulation was used in which non-native "decoy" conformers are forced to take a higher energy compared with the corresponding native structure. This model was trained on an enhanced and diverse protein set. High quality decoy structures were generated for approximately 1400 nonhomologous proteins using torsion angle dynamics along with restricted variations of the hydrophobic cores of the native structure. The resulting decoy set was used to train the model yielding two different side chain centroid based force fields that differ in the way distance dependence has been used to calculate energy parameters. These force fields were tested on an independent set of 148 test proteins with 500 decoy structures for each protein. The side chain centroid force fields were successful in correctly identifying approximately 86% native structures. The Z-scores produced by the proposed centroid-centroid distance dependent force fields improved compared with other distance dependent C(alpha)-C(alpha) or side chain based force fields.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
|
8
|
Merlino A, Sica F, Mazzarella L. Approximate Values for Force Constant and Wave Number Associated with a Low-Frequency Concerted Motion in Proteins Can Be Evaluated by a Comparison of X-ray Structures. J Phys Chem B 2007; 111:5483-6. [PMID: 17429995 DOI: 10.1021/jp071399h] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Low-frequency internal motions in protein molecules play a key role in biological functions. A direct relationship between low-frequency motions and enzymatic activity has been suggested for bovine pancreatic ribonuclease (RNase A). The flexibility-function relationship in this enzyme has been attributed to a subtle and concerted breathing motion of the beta-sheet regions occurring upon substrate binding and release. Here, we calculate an approximate value for the force constant and the wave number of the low-frequency beta-sheet breathing motion of RNase A, by using the Boltzmann hypothesis on a set of data derived from a simple conventional structural superimposition of an unusually large number of X-ray structures available for the protein. The results agree with previous observations and with theoretical predictions on the basis of normal-mode analysis. To the best of our knowledge, this is the first example in which the wave number and the force constant of a low-frequency concerted motion in a protein are directly derived from X-ray structures.
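One way to read the Boltzmann-hypothesis estimate described here is the following sketch. It assumes, as an illustration only, that the breathing coordinate is harmonic, so that the spread of its values across crystal structures satisfies variance = k_B T / k, and that the wave number follows from the usual oscillator relation with an effective mass. The function names, the effective mass, and the sample displacements are hypothetical and are not taken from the paper.

```python
import math
import statistics

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg
C_CM_PER_S = 2.99792458e10  # speed of light, cm/s


def force_constant(displacements_angstrom, temperature=300.0):
    """Harmonic force constant (N/m) from the spread of a coordinate.

    Assumes the values are Boltzmann-distributed in a harmonic well,
    so that variance = k_B * T / k.
    """
    variance_m2 = statistics.pvariance(displacements_angstrom) * 1e-20
    return K_B * temperature / variance_m2


def wave_number(k_newton_per_m, effective_mass_amu):
    """Wave number (cm^-1) of a harmonic oscillator with the given mass."""
    omega = math.sqrt(k_newton_per_m / (effective_mass_amu * AMU))
    return omega / (2.0 * math.pi * C_CM_PER_S)


# Hypothetical displacements of the breathing coordinate, in angstroms.
sample = [-0.5, 0.5]
k_est = force_constant(sample)
```

The effective mass is the least constrained input of such an estimate; the wave number scales only with the square root of k / mass, so a fourfold change in either quantity changes the result by a factor of two.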
Affiliation(s)
- Antonello Merlino
- Dipartimento di Chimica, Università degli Studi di Napoli "Federico II", Via Cynthia, 80126 Napoli, Italy
|
9
|
Rajgaria R, McAllister SR, Floudas CA. A novel high resolution Calpha--Calpha distance dependent force field based on a high quality decoy set. Proteins 2007; 65:726-41. [PMID: 16981202 DOI: 10.1002/prot.21149] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Indexed: 11/11/2022]
Abstract
This work presents a novel C(alpha)--C(alpha) distance dependent force field which is successful in selecting native structures from an ensemble of high resolution near-native conformers. An enhanced and diverse protein set, along with an improved decoy generation technique, contributes to the effectiveness of this potential. High quality decoys were generated for 1489 nonhomologous proteins and used to train an optimization based linear programming formulation. The goal in developing a set of high resolution decoys was to develop a simple, distance-dependent force field that yields the native structure as the lowest energy structure and assigns higher energies to decoy structures that are quite similar as well as those that are less similar. The model also includes a set of physical constraints that were based on experimentally observed physical behavior of the amino acids. The force field was tested on two sets of test decoys not in the training set and was found to excel on all the metrics that are widely used to measure the effectiveness of a force field. The high resolution force field was successful in correctly identifying 113 native structures out of 150 test cases and the average rank obtained for this test was 1.87. All the high resolution structures (training and testing) used for this work are available online and can be downloaded from http://titan.princeton.edu/HRDecoys.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
|
10
|
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
|
11
|
Schiffer C, Hermans J. Promise of advances in simulation methods for protein crystallography: implicit solvent models, time-averaging refinement, and quantum mechanical modeling. Methods Enzymol 2004; 374:412-61. [PMID: 14696384 DOI: 10.1016/s0076-6879(03)74019-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 03/07/2023]
Affiliation(s)
- Celia Schiffer
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01655, USA
|
12
|
Butterfoss GL, Hermans J. Boltzmann-type distribution of side-chain conformation in proteins. Protein Sci 2003; 12:2719-31. [PMID: 14627733 PMCID: PMC2366981 DOI: 10.1110/ps.03273303] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.5] [Received: 06/20/2003] [Revised: 08/29/2003] [Accepted: 08/29/2003] [Indexed: 10/26/2022]
Abstract
We analyze packing imperfections in globular proteins as reflected in deviations of torsion angles from the equilibrium values for the isolated side chains. The distribution of conformations of methionine and lysine residues in a database of high-resolution structures is compared with energies of model compounds calculated with high-level quantum-mechanics. The distribution of the C-C and C-S torsion angles (chi(3)) correlates well with the Boltzmann factor of the torsion energy, exp(-betaE) of the model compounds C(2)H(5)-C(2)H(5) and C(2)H(5)-S-CH(3). An exponential relation was again found between the relative occurrence of g+, g- and t conformations for C(alpha)-C(beta) bonds in long side chains and the energy differences of rotamers of alpha-amino n-butyric acid, when dependence on backbone conformation was taken into account. The distribution of all 27 rotamers of methionine was correlated with the energy differences between the model's rotamers, corrected for clashes with nearby residues, the correlation being good for a set with backbone in the beta-conformation, but less clear for backbone alpha-conformation. In all correlations, the value of the coefficient beta corresponds to a temperature of circa 300 K. These results can be interpreted with a model that considers the structure of a folded protein as resulting from packing imperfectly complementary parts, with a requirement of an overall low energy. Compromises are required to optimize the fit of nonbonded contacts with surrounding groups, and side chains assume conformations away from the energy minimum. An exponential distribution is a most probable distribution, and this can be established easily under conditions other than thermal equilibrium.
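The statement that the coefficient beta corresponds to a temperature near 300 K can be made concrete with a small sketch. It assumes that occurrence counts follow n(E) proportional to exp(-beta E), so that a straight-line fit of ln(n) against E has slope -beta, and it converts beta to a temperature with the gas constant. The function name and the synthetic data are hypothetical; the paper's actual fitting procedure is not reproduced here.

```python
import math

R_KCAL = 0.0019872041  # gas constant, kcal/(mol*K)


def effective_temperature(energies_kcal_per_mol, occurrences):
    """Temperature implied by an exponential occurrence-energy relation.

    Fits ln(occurrence) = a - beta * E by ordinary least squares and
    returns T = 1 / (R * beta).
    """
    xs = list(energies_kcal_per_mol)
    ys = [math.log(n) for n in occurrences]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = -sxy / sxx
    return 1.0 / (R_KCAL * beta)


# Synthetic counts generated at exactly 300 K, for illustration only.
demo_energies = [0.0, 0.5, 1.0, 1.5]
demo_counts = [1000.0 * math.exp(-e / (R_KCAL * 300.0)) for e in demo_energies]
demo_temperature = effective_temperature(demo_energies, demo_counts)
```

With real database counts the points scatter around the line, and the fitted temperature is only as reliable as the energy model and the counting statistics behind it.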
Affiliation(s)
- Glenn L Butterfoss
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599-7260, USA
|
13
|
Dumontier M, Michalickova K, Hogue CWV. Species-specific protein sequence and fold optimizations. BMC Bioinformatics 2002; 3:39. [PMID: 12487631 PMCID: PMC139977 DOI: 10.1186/1471-2105-3-39] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Received: 07/11/2002] [Accepted: 12/17/2002] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino-acid signatures that make species-specific protein adaptation possible across 100 complete genomes. RESULTS Environmental niche was determined to be a significant factor in variability from correspondence analysis using the amino acid composition of over 360,000 predicted open reading frames (ORFs) from 17 archaea, 76 bacteria and 7 eukaryote complete genomes. Additionally, we found clusters of phylogenetically unrelated archaea and bacteria that share similar environments by amino acid composition clustering. Composition analyses of conservative, domain-based homology modeling suggested an enrichment of small hydrophobic residues Ala, Gly, Val and charged residues Asp, Glu, His and Arg across all genomes. However, larger aromatic residues Phe, Trp and Tyr are reduced in folds, and these results were not affected by low complexity biases. We derived two simple log-odds scoring functions from ORFs (CG) and folds (CF) for each of the complete genomes. CF achieved an average cross-validation success rate of 85 +/- 8% whereas the CG detected 73 +/- 9% species-specific sequences when competing against all other non-redundant CG. Continuously updated results are available at http://genome.mshri.on.ca. CONCLUSION Our analysis of amino acid compositions from the complete genomes provides stronger evidence for species-specific and environmental residue preferences in genomic sequences as well as in folds. Scoring functions derived from this work will be useful in future protein engineering experiments and possibly in identifying horizontal transfer events.
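A composition-based log-odds scoring function of the kind described in this abstract might look like the sketch below. It is an illustration under stated assumptions, not the authors' CG or CF functions: the per-residue score is ln(f_target / f_background), a pseudocount of one per residue type is added to avoid zero frequencies, and all names and sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences, pseudocount=1.0):
    """Amino acid frequencies over a set of sequences, with pseudocounts."""
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log_odds_table(target_sequences, background_sequences):
    """Per-residue log-odds of the target set against the background set."""
    target = composition(target_sequences)
    background = composition(background_sequences)
    return {aa: math.log(target[aa] / background[aa]) for aa in AMINO_ACIDS}


def score(sequence, table):
    """Sum of per-residue log-odds; unknown symbols contribute zero."""
    return sum(table.get(aa, 0.0) for aa in sequence)


# Hypothetical toy genomes: one alanine-rich, one lysine-rich background.
table = log_odds_table(["AAAAAG"], ["AGKKKK", "KKKKAG"])
```

A positive score means the sequence looks more like the target composition than the background; in the toy table, alanine-rich sequences score above zero and lysine-rich ones below.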
Affiliation(s)
- Michel Dumontier
- Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Ave., Toronto, Ontario, M5G 1X5 Canada
- Katerina Michalickova
- Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Ave., Toronto, Ontario, M5G 1X5 Canada
- Christopher WV Hogue
- Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Ave., Toronto, Ontario, M5G 1X5 Canada
|
14
|
Abstract
A protein structure model generally needs to be evaluated to assess whether or not it has the correct fold. To improve fold assessment, four types of a residue-level statistical potential were optimized, including distance-dependent, contact, Phi/Psi dihedral angle, and accessible surface statistical potentials. Approximately 10,000 test models with the correct and incorrect folds were built by automated comparative modeling of protein sequences of known structure. The criterion used to discriminate between the correct and incorrect models was the Z-score of the model energy. The performance of a Z-score was determined as a function of many variables in the derivation and use of the corresponding statistical potential. The performance was measured by the fractions of the correctly and incorrectly assessed test models. The most discriminating combination of any one of the four tested potentials is the sum of the normalized distance-dependent and accessible surface potentials. The distance-dependent potential that is optimal for assessing models of all sizes uses both C(alpha) and C(beta) atoms as interaction centers, distinguishes between all 20 standard residue types, has the distance range of 30 Å, and is derived and used by taking into account the sequence separation of the interacting atom pairs. The terms for the sequentially local interactions are significantly less informative than those for the sequentially nonlocal interactions. The accessible surface potential that is optimal for assessing models of all sizes uses C(beta) atoms as interaction centers and distinguishes between all 20 standard residue types. The performance of the tested statistical potentials is not likely to improve significantly with an increase in the number of known protein structures used in their derivation.
The parameters of fold assessment whose optimal values vary significantly with model size include the size of the known protein structures used to derive the potential and the distance range of the accessible surface potential. Fold assessment by statistical potentials is most difficult for the very small models. This difficulty presents a challenge to fold assessment in large-scale comparative modeling, which produces many small and incomplete models. The results described in this study provide a basis for an optimal use of statistical potentials in fold assessment.
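The Z-score criterion referred to in this abstract can be sketched as follows. The abstract does not say how the reference energy distribution is built, so this example simply assumes a list of reference energies is already available (for instance from shuffled or alternative models); the function name and numbers are hypothetical.

```python
import statistics


def energy_z_score(model_energy, reference_energies):
    """Z-score of a model energy against a reference energy distribution.

    Strongly negative values mean the model is much lower in energy
    than the reference set, which is the pattern expected of a
    correct fold.
    """
    mean = statistics.fmean(reference_energies)
    spread = statistics.pstdev(reference_energies)
    if spread == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / spread


# Hypothetical energies in arbitrary units.
z = energy_z_score(-10.0, [0.0, 2.0, 4.0])
```

A threshold on this value then separates models assessed as correct from those assessed as incorrect; where to put the threshold is exactly the kind of choice the study evaluates.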
Affiliation(s)
- Francisco Melo
- Laboratories of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, New York 10021, USA
|
15
|
Martí-Renom MA, Stuart AC, Fiser A, Sánchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. ANNUAL REVIEW OF BIOPHYSICS AND BIOMOLECULAR STRUCTURE 2001; 29:291-325. [PMID: 10940251 DOI: 10.1146/annurev.biophys.29.1.291] [Citation(s) in RCA: 2354] [Impact Index Per Article: 102.3] [Indexed: 11/09/2022]
Abstract
Comparative modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. The number of protein sequences that can be modeled and the accuracy of the predictions are increasing steadily because of the growth in the number of known protein structures and because of the improvements in the modeling software. Further advances are necessary in recognizing weak sequence-structure similarities, aligning sequences with structures, modeling of rigid body shifts, distortions, loops and side chains, as well as detecting errors in a model. Despite these problems, it is currently possible to model with useful accuracy significant parts of approximately one third of all known protein sequences. The use of individual comparative models in biology is already rewarding and increasingly widespread. A major new challenge for comparative modeling is the integration of it with the torrents of data from genome sequencing projects as well as from functional and structural genomics. In particular, there is a need to develop an automated, rapid, robust, sensitive, and accurate comparative modeling pipeline applicable to whole genomes. Such large-scale modeling is likely to encourage new kinds of applications for the many resulting models, based on their large number and completeness at the level of the family, organism, or functional network.
Affiliation(s)
- M A Martí-Renom
- Laboratories of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, Rockefeller University, New York, NY 10021, USA
|
16
|
Vijayakumar M, Zhou HX. Prediction of Residue−Residue Pair Frequencies in Proteins. J Phys Chem B 2000. [DOI: 10.1021/jp001757f] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Affiliation(s)
- M. Vijayakumar
- Department of Physics, Drexel University, Philadelphia, Pennsylvania 19104
- Huan-Xiang Zhou
- Department of Physics, Drexel University, Philadelphia, Pennsylvania 19104
|
17
|
Abstract
A simple electrostatic model has been used to investigate the extent to which the structure of protein molecules is organized to optimize the internal electrostatic interactions. We find that the model provides a favorable total intra-protein electrostatic energy for almost all polar and charged groups of atoms, suggesting a high degree of structural optimization. By contrast, a significant fraction of individual group-group interactions are found to be unfavorable. An analysis as a function of the range of interactions included shows the electrostatic organization is generally relatively short range (up to 6 or 7 Å between group centers). Although the model is very simple, it is useful for assessing the overall quality of protein experimental structures, for pin-pointing some types of errors and as a guide to improving protein design.
Affiliation(s)
- M T Oliva
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, USA
|
18
|
Miyazawa S, Jernigan RL. Evaluation of short-range interactions as secondary structure energies for protein fold and sequence recognition. Proteins 1999. [DOI: 10.1002/(sici)1097-0134(19990815)36:3<347::aid-prot9>3.0.co;2-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.1] [Indexed: 11/11/2022]
|
19
|
Zhu H, Braun W. Sequence specificity, statistical potentials, and three-dimensional structure prediction with self-correcting distance geometry calculations of beta-sheet formation in proteins. Protein Sci 1999; 8:326-42. [PMID: 10048326 PMCID: PMC2144259 DOI: 10.1110/ps.8.2.326] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.2] [Indexed: 10/19/2022]
Abstract
A statistical analysis of a representative data set of 169 known protein structures was used to analyze the specificity of residue interactions between spatial neighboring strands in beta-sheets. Pairwise potentials were derived from the frequency of residue pairs in nearest contact, second nearest and third nearest contacts across neighboring beta-strands compared to the expected frequency of residue pairs in a random model. A pseudo-energy function based on these statistical pairwise potentials recognized native beta-sheets among possible alternative pairings. The native pairing was found within the three lowest energies in 73% of the cases in the training data set and in 63% of beta-sheets in a test data set of 67 proteins, which were not part of the training set. The energy function was also used to detect tripeptides, which occur frequently in beta-sheets of native proteins. The majority of native partners of tripeptides were distributed in a low energy range. Self-correcting distance geometry (SECODG) calculations using distance constraints sets derived from possible low energy pairing of beta-strands uniquely identified the native pairing of the beta-sheet in pancreatic trypsin inhibitor (BPTI). These results will be useful for predicting the structure of proteins from their amino acid sequence as well as for the design of proteins containing beta-sheets.
Affiliation(s)
- H Zhu
- Sealy Center for Structural Biology, Department of Human Biological Chemistry and Genetics, University of Texas Medical Branch, Galveston 77555-1157, USA
|
20
|
Miyazawa S, Jernigan RL. Self-consistent estimation of inter-residue protein contact energies based on an equilibrium mixture approximation of residues. Proteins 1999; 34:49-68. [PMID: 10336383 DOI: 10.1002/(sici)1097-0134(19990101)34:1<49::aid-prot5>3.0.co;2-l] [Citation(s) in RCA: 143] [Impact Index Per Article: 5.7] [Indexed: 11/12/2022]
Abstract
Pairwise contact energies for 20 types of residues are estimated self-consistently from the actual observed frequencies of contacts with regression coefficients that are obtained by comparing "input" and predicted values with the Bethe approximation for the equilibrium mixtures of residues interacting. This is premised on the fact that correlations between the "input" and the predicted values are sufficiently high although the regression coefficients themselves can depend to some extent on protein structures as well as interaction strengths. Residue coordination numbers are optimized to obtain the best correlation between "input" and predicted values for the partition energies. The contact energies self-consistently estimated this way indicate that the partition energies predicted with the Bethe approximation should be reduced by a factor of about 0.3 and the intrinsic pairwise energies by a factor of about 0.6. The observed distribution of contacts can be approximated with a small relative error of only about 0.08 as an equilibrium mixture of residues, if many proteins were employed to collect more than 20,000 contacts. Including repulsive packing interactions and secondary structure interactions further reduces the relative errors. These new contact energies are demonstrated by threading to have improved their ability to discriminate native structures from other non-native folds.
Affiliation(s)
- S Miyazawa
- Faculty of Technology, Gunma University, Kiryu, Japan
|
21
|
|
22
|
Abstract
A long-standing goal in protein structure studies is the development of reliable energy functions that can be used both to verify protein models derived from experimental constraints and for theoretical protein folding and inverse folding computer experiments. In that respect, knowledge-based statistical pair potentials have attracted considerable interest recently, mainly because they include the essential features of protein structures as well as solvent effects at a low computing cost. However, the basis on which statistical potentials are derived has been questioned. In this paper, we investigate statistical pair potentials derived from protein three-dimensional structures, addressing in particular questions related to the form of these potentials, as well as to the content of the database from which they are derived. We have shown that statistical pair potentials depend on the size of the proteins included in the database, and that this dependence can be reduced by considering only pairs of residues close in space (i.e., with a cutoff of 8 Å). We have also shown that statistical potentials carry a memory of the quality of the database in terms of the amount and diversity of secondary structure it contains. We find, for example, that potentials derived from a database containing alpha-proteins will only perform best on alpha-proteins in fold recognition computer experiments. We believe that this is an overall weakness of these potentials, which must be kept in mind when constructing a database.
|
23
|
Dasgupta S, Iyer GH, Bryant SH, Lawrence CE, Bell JA. Extent and nature of contacts between protein molecules in crystal lattices and between subunits of protein oligomers. Proteins 1997; 28:494-514. [PMID: 9261866 DOI: 10.1002/(sici)1097-0134(199708)28:4<494::aid-prot4>3.0.co;2-a] [Citation(s) in RCA: 120] [Impact Index Per Article: 4.4] [Indexed: 02/05/2023]
Abstract
A survey was compiled of several characteristics of the intersubunit contacts in 58 oligomeric proteins, and of the intermolecular contacts in the lattice for 223 protein crystal structures. The total number of atoms in contact and the secondary structure elements involved are similar in the two types of interfaces. Crystal contact patches are frequently smaller than patches involved in oligomer interfaces. Crystal contacts result from more numerous interactions by polar residues, compared with a tendency toward nonpolar amino acids at oligomer interfaces. Arginine is the only amino acid prominent in both types of interfaces. Potentials of mean force for residue-residue contacts at both crystal and oligomer interfaces were derived from comparison of the number of observed residue-residue interactions with the number expected by mass action. They show that hydrophobic interactions at oligomer interfaces favor aromatic amino acids and methionine over aliphatic amino acids, and that crystal contacts form in such a way as to avoid inclusion of hydrophobic interactions. They also suggest that complex salt bridges with certain amino acid compositions might be important in oligomer formation. For a protein that is recalcitrant to crystallization, substitution of lysine residues with arginine or glutamine is a recommended strategy.
Affiliation(s)
- S Dasgupta
- Department of Chemistry, Rensselaer Polytechnic Institute, Troy, New York 12180, USA
|
24
|
Dasgupta S, Iyer GH, Bryant SH, Lawrence CE, Bell JA. Extent and nature of contacts between protein molecules in crystal lattices and between subunits of protein oligomers. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(199708)28:4<494::aid-prot4>3.0.co;2-a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 12/28/2022]
|
25
|
Skolnick J, Jaroszewski L, Kolinski A, Godzik A. Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? Protein Sci 1997; 6:676-88. [PMID: 9070450 PMCID: PMC2143667 DOI: 10.1002/pro.5560060317] [Citation(s) in RCA: 152] [Impact Index Per Article: 5.6] [Indexed: 02/03/2023]
Abstract
Many existing derivations of knowledge-based statistical pair potentials invoke the quasichemical approximation to estimate the expected side-chain contact frequency if there were no amino acid pair-specific interactions. At first glance, the quasichemical approximation that treats the residues in a protein as being disconnected and expresses the side-chain contact probability as being proportional to the product of the mole fractions of the pair of residues would appear to be rather severe. To investigate the validity of this approximation, we introduce two new reference states in which no specific pair interactions between amino acids are allowed, but in which the connectivity of the protein chain is retained. The first estimates the expected number of side-chain contacts by treating the protein as a Gaussian random coil polymer. The second, more realistic reference state includes the effects of chain connectivity, secondary structure, and chain compactness by estimating the expected side-chain contact probability by placing the sequence of interest in each member of a library of structures of comparable compactness to the native conformation. The side-chain contact maps are not allowed to readjust to the sequence of interest, i.e., the side chains cannot repack. This situation would hold rigorously if all amino acids were the same size. Both reference states effectively permit the factorization of the side-chain contact probability into sequence-dependent and structure-dependent terms. Then, because the sequence distribution of amino acids in proteins is random, the quasichemical approximation to each of these reference states is shown to be excellent. Thus, the range of validity of the quasichemical approximation is determined by the magnitude of the side-chain repacking term, which is, at present, unknown. Finally, the performance of these two sets of pair interaction potentials as well as side-chain contact fraction-based interaction scales is assessed by inverse folding tests both without and with allowing for gaps.
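As an illustration of the quasichemical reference state described in this abstract, the following minimal Python sketch computes expected contact counts from mole fractions alone. The function name `quasichemical_expected`, the two-letter toy sequence, and the contact total are hypothetical choices for illustration; they are not taken from the paper, and the sketch ignores chain connectivity by construction.

```python
from collections import Counter
from itertools import combinations_with_replacement


def quasichemical_expected(sequence, n_contacts):
    """Expected contact counts per unordered residue pair under the
    quasichemical approximation: residues are treated as disconnected,
    so the pair probability is the product of the mole fractions."""
    counts = Counter(sequence)
    total = sum(counts.values())
    mole_fraction = {aa: c / total for aa, c in counts.items()}

    expected = {}
    for a, b in combinations_with_replacement(sorted(mole_fraction), 2):
        # An unordered pair of distinct types can arise in two orders.
        multiplicity = 1 if a == b else 2
        expected[(a, b)] = (
            n_contacts * multiplicity * mole_fraction[a] * mole_fraction[b]
        )
    return expected


# Toy two-letter sequence (5 H, 3 P) with an assumed total of 10 contacts.
expected = quasichemical_expected("HHPPHPHH", n_contacts=10)
for pair, value in expected.items():
    print(pair, round(value, 4))
```

Comparing such expected counts with observed contact counts (for example through a log ratio) is the usual route to a statistical pair potential; the two chain-aware reference states introduced in the abstract replace only the expected-count step.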
Affiliation(s)
- J Skolnick
- Department of Molecular Biology, Scripps Research Institute, La Jolla, California 92037, USA.
|
26
|
Abstract
Threading experiments with proteins from the globin family provide an indication of the nature of the structural similarity required for successful fold recognition and accurate sequence-structure alignment. Threading scores are found to rise above the noise of false positives whenever roughly 60% of residues from a sequence can be aligned with analogous sites in the structure of a remote homolog. Fold recognition specificity thus appears to be limited by the extent of structural similarity, regardless of the degree of sequence similarity. Threading alignment accuracy is found to depend more critically on the degree of structural similarity. Alignments are accurate, placing the majority of residues exactly as in structural alignment, only when superposition residuals are less than 2.5 Å. These criteria for successful recognition and sequence-structure alignment appear to be consistent with the successes and failures of threading methods in blind structure prediction. They also suggest a direct assay for improved threading methods: Potentials and alignment models should be tested for their ability to detect less extensive structural similarities, and to produce accurate alignments when superposition residuals for this conserved "core" fall in the range characteristic of remote homologs.
Affiliation(s)
- S H Bryant
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA
|
27
|
Eisenhaber F. Hydrophobic regions on protein surfaces. Derivation of the solvation energy from their area distribution in crystallographic protein structures. Protein Sci 1996; 5:1676-86. [PMID: 8844856 PMCID: PMC2143472 DOI: 10.1002/pro.5560050821] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.1] [Indexed: 02/02/2023]
Abstract
For the first time, a direct approach for the derivation of an atomic solvation parameter from macromolecular structural data alone is presented. The specific free energy of solvation for hydrophobic surface regions of proteins is delineated from the area distribution of hydrophobic surface patches. The resulting value is 18 cal/(mol·Å²), with a statistical uncertainty of ±2 cal/(mol·Å²) at the 5% significance level. It compares favorably with the parameters for carbon obtained by other authors who use the crystal geometry of succinic acid or energies of transfer from hydrophobic solvent to water for small organic compounds. Thus, the transferability of atomic solvation parameters for hydrophobic atoms to macromolecules has been directly demonstrated. A careful statistical analysis demonstrates that surface energy parameters derived from thermodynamic data of protein mutation experiments are clearly less confident.
Affiliation(s)
- F Eisenhaber
- Institut für Biochemie der Charité, Medizinische Fakultät, Humboldt-Universität zu Berlin, Berlin-Mitte, Germany.
|
28
|
Novotny J, Bajorath J. Computational biochemistry of antibodies and T-cell receptors. Adv Protein Chem 1996; 49:149-260. [PMID: 8908299 DOI: 10.1016/s0065-3233(08)60490-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 02/07/2023]
Affiliation(s)
- J Novotny
- Department of Macromolecular Modeling, Bristol-Myers Squibb Research Institute, Princeton, New Jersey 08540, USA
|
29
|
Liu JS, Neuwald AF, Lawrence CE. Bayesian Models for Multiple Local Sequence Alignment and Gibbs Sampling Strategies. J Am Stat Assoc 1995. [DOI: 10.1080/01621459.1995.10476622] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.3] [Indexed: 10/28/2022]
|
30
|
Abstract
A theoretical study has shown that the occurrence of various structural elements in stable folds of random copolymers depends exponentially on the element's own energy. A similar occurrence-on-energy dependence is observed in globular proteins from the level of amino acid conformations to the level of overall architectures. Thus, the structural features stabilized by many random sequences are typical of globular proteins, while the features rarely observed in proteins are those which are stabilized by only a minor part of the random sequences.
Affiliation(s)
- A V Finkelstein
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
|
31
|
Godzik A, Koliński A, Skolnick J. Are proteins ideal mixtures of amino acids? Analysis of energy parameter sets. Protein Sci 1995; 4:2107-17. [PMID: 8535247 PMCID: PMC2142984 DOI: 10.1002/pro.5560041016] [Citation(s) in RCA: 119] [Impact Index Per Article: 4.1] [Indexed: 01/31/2023]
Abstract
Various existing derivations of the effective potentials of mean force for the two-body interactions between amino acid side chains in proteins are reviewed and compared to each other. The differences between different parameter sets can be traced to the reference state used to define the zero of energy. Depending on the reference state, the transfer free energy or other pseudo-one-body contributions can be present to various extents in two-body parameter sets. It is, however, possible to compare various derivations directly by concentrating on the "excess" energy-a term that describes the difference between a real protein and an ideal solution of amino acids. Furthermore, the number of protein structures available for analysis allows one to check the consistency of the derivation and the errors by comparing parameters derived from various subsets of the whole database. It is shown that pair interaction preferences are very consistent throughout the database. Independently derived parameter sets have correlation coefficients on the order of 0.8, with the mean difference between equivalent entries of 0.1 kT. Also, the low-quality (low resolution, little or no refinement) structures show similar regularities. There are, however, large differences between interaction parameters derived on the basis of crystallographic structures and structures obtained by the NMR refinement. The origin of the latter difference is not yet understood.
Affiliation(s)
- A Godzik
- Department of Molecular Biology, Scripps Research Institute, La Jolla, California 92037, USA
|
32
|
Abstract
A protein sequence with at least 40% identity to a known structure can now be modelled automatically, with an accuracy approaching that of a low-resolution X-ray structure or a medium-resolution nuclear magnetic resonance structure. In general, these models have good stereochemistry and an overall structural accuracy that is as high as the similarity between the template and the actual structure being predicted. As a result, the number of sequences that can be modelled is an order of magnitude larger than the number of experimentally determined protein structures. In addition, evaluation techniques are available that can estimate errors in different regions of the model. Thus, the number of applications where homology modelling is proving useful is growing rapidly.
Affiliation(s)
- A Sali
- The Rockefeller University, New York, USA
|
33
|
Abstract
The past two years have seen the rapid development of new recognition methods for protein structure prediction. These algorithms 'thread' the sequence of one protein through the known structure of another, looking for an alignment that corresponds to an energetically favorable model structure. Because they are based on energy calculation, rather than evolutionary distance, these methods extend the possibility of structure prediction by comparative modeling to a larger class of new sequences, where similarity to known structures is recognizable by no other means. The strength of the evidence they offer should be judged by objective statistical tests, however, so as to rule out the possibility that favorable scores arise from chance factors such as similarity of length, composition, or the consideration of a large number of alternative alignments. Calculation of objective p-values by analytical means is not yet possible, but it would appear that approximate values may be obtained by simulation, as they are in gapped, global sequence alignment. We propose that the results of threading experiments should include Z-scores relative to the composition-corrected score distribution obtained for shuffled and optimally aligned sequences.
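The Z-score proposal in this abstract can be sketched in Python as below. This is a simplified illustration, not the authors' procedure: it shuffles the sequence (which preserves composition) and rescoring uses a hypothetical toy function `toy_score`, whereas the abstract calls for shuffled sequences that are also optimally aligned to the structure. All names and the shuffle count are assumptions.

```python
import random
import statistics


def shuffle_z_score(sequence, score_fn, n_shuffles=1000, seed=0):
    """Z-score of score_fn(sequence) against a null distribution built
    from composition-preserving shuffles of the same sequence."""
    rng = random.Random(seed)
    observed = score_fn(sequence)

    residues = list(sequence)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # in-place permutation, composition unchanged
        null_scores.append(score_fn("".join(residues)))

    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean) / sd


def toy_score(seq):
    """Hypothetical stand-in for a threading score: counts adjacent H-H pairs."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a == b == "H")


z = shuffle_z_score("HHHHHPPPPP", toy_score)
print(f"Z = {z:.2f}")
```

Because every shuffle has the same length and composition as the query, a high Z-score here cannot be explained by those two chance factors alone, which is the point the abstract makes about objective significance tests.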
Affiliation(s)
- S H Bryant
- Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
34
|
Finkelstein AV, Gutin AM. Boltzmann-like statistics of protein architectures. Origins and consequences. Subcell Biochem 1995; 24:1-26. [PMID: 7900172 DOI: 10.1007/978-1-4899-1727-0_1] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.3] [Indexed: 01/27/2023]
Affiliation(s)
- A V Finkelstein
- Institute of Protein Research, Russian Academy of Sciences, Moscow Region
|
35
|
Spassov VZ, Karshikoff AD, Ladenstein R. Optimization of the electrostatic interactions in proteins of different functional and folding type. Protein Sci 1994; 3:1556-69. [PMID: 7833815 PMCID: PMC2142941 DOI: 10.1002/pro.5560030921] [Citation(s) in RCA: 57] [Impact Index Per Article: 1.9] [Indexed: 01/27/2023]
Abstract
The 3-dimensional optimization of the electrostatic interactions between the charged amino acid residues was studied by Monte Carlo simulations on an extended representative set of 141 protein structures with known atomic coordinates. The proteins were classified by different functional and structural criteria, and the optimization of the electrostatic interactions was analyzed. The optimization parameters were obtained by comparison of the contribution of charge-charge interactions to the free energy of the native protein structures and for a large number of randomly distributed charge constellations obtained by the Monte Carlo technique. On the basis of the results obtained, one can conclude that the charge-charge interactions are better optimized in the enzymes than in the proteins without enzymatic functions. Proteins that belong to the mixed alpha beta folding type are electrostatically better optimized than pure alpha-helical or beta-strand structures. Proteins that are stabilized by disulfide bonds show a lower degree of electrostatic optimization. The electrostatic interactions in a native protein are effectively optimized by rejection of the conformers that lead to repulsive charge-charge interactions. Particularly, the rejection of the repulsive contacts seems to be a major goal in the protein folding process. The dependence of the optimization parameters on the choice of the potential function was tested. The majority of the potential functions gave practically identical results.
Affiliation(s)
- V Z Spassov
- Centre for Structural Biochemistry, Karolinska Institute, NOVUM, Stockholm, Sweden
|
36
|
Lawrence C. Toward the unification of sequence and structural data for identification of structural and functional constraints. Comput Chem 1994; 18:255-8. [PMID: 7952896 DOI: 10.1016/0097-8485(94)85021-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 01/28/2023]
Abstract
The identification and characterization of local residue patterns or conserved segments shared by a set of biopolymers has provided a number of insights in molecular biology. Biopolymer sequences are observations from macromolecules that share common structural or functional features. The approach taken here rests on the notion that information may be most efficiently extracted from these observations through the use of a model that faithfully represents macromolecular characteristics. Accordingly, our efforts are focused on statistical models which attempt to capture central features of protein structure, function, and change. Here the assumptions that underlie two new methods for the analysis of protein sequence data are explicitly delineated. (1) Threading of a sequence through structural motifs seeks to determine if a protein sequence fits a known protein structure. The assumptions delineated here also generally apply to other contact-based threading methods that have been recently described. (2) Multiple sequence alignment via the Gibbs sampling algorithm seeks to identify position-specific empirical free energy models for residue sites in common motifs and simultaneously align the sequence observations from these motifs.
Affiliation(s)
- C Lawrence
- Wadsworth Labs, NYS-DOH, Albany, NY 12222
|
37
|
Spassov VZ, Atanasov BP. Spatial optimization of electrostatic interactions between the ionized groups in globular proteins. Proteins 1994; 19:222-9. [PMID: 7937735 DOI: 10.1002/prot.340190306] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 01/28/2023]
Abstract
A model approach is suggested to estimate the degree of spatial optimization of the electrostatic interactions in protein molecules. The method is tested on a set of 44 globular proteins, representative of the available crystallographic data. The theoretical model is based on macroscopic computation of the contribution of charge-charge interactions to the electrostatic term of the free energy for the native proteins and for a large number of virtual structures with charge constellations randomly distributed on the protein surface (generated by a Monte Carlo technique). The statistical probability of occurrence of random structures with electrostatic energies lower than the energy of the native protein is suggested as a criterion for spatial optimization of the electrostatic interactions. The results support the hypothesis that the folding process optimizes the stabilizing effect of electrostatic interactions, but to very different degrees for different proteins. A parallel analysis of ion pairs shows that the optimization of the electrostatic term in globular proteins has gone more in the direction of rejecting the repulsive short contacts between charges of equal sign than of creating more salt bridges (in comparison with the statistically expected number of short-range ion pairs in the simulated random structures). It is observed that the decrease in the spatial optimization of the electrostatic interactions is usually compensated for by the appearance of disulfide bridges in the covalent structure of the examined proteins.
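The optimization criterion in this abstract (the probability that a random charge constellation has a lower electrostatic energy than the native one) can be illustrated with a small Python sketch. It rests on strong assumptions that are not in the paper: a bare Coulomb sum in arbitrary units stands in for the macroscopic electrostatic model, and random constellations are produced by permuting the native charges over fixed sites. The function names and the four-site toy geometry are hypothetical.

```python
import itertools
import math
import random


def coulomb_energy(positions, charges):
    """Sum of q_i * q_j / r_ij over all pairs (arbitrary units, no dielectric)."""
    energy = 0.0
    for (p, qp), (r, qr) in itertools.combinations(zip(positions, charges), 2):
        energy += qp * qr / math.dist(p, r)
    return energy


def fraction_lower_energy(positions, charges, n_samples=2000, seed=0):
    """Fraction of random charge permutations whose energy is strictly lower
    than that of the given ('native') assignment. Values near zero suggest
    a well-optimized native constellation."""
    rng = random.Random(seed)
    native = coulomb_energy(positions, charges)
    shuffled = list(charges)
    lower = 0
    for _ in range(n_samples):
        rng.shuffle(shuffled)
        if coulomb_energy(positions, shuffled) < native:
            lower += 1
    return lower / n_samples


# Four sites on a unit square; alternating signs avoid like-charge neighbours.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
optimized = fraction_lower_energy(square, [+1, -1, +1, -1])
frustrated = fraction_lower_energy(square, [+1, +1, -1, -1])
print(optimized, frustrated)
```

In this toy case the alternating assignment is already the best one, so no permutation beats it, while the assignment with like charges on neighbouring sites is beaten by roughly a third of the permutations.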
Affiliation(s)
- V Z Spassov
- Central Laboratory of Biophysics, Bulgarian Academy of Sciences, Sofia
|
38
|
Abstract
Many methods exist for taking a sequence that exhibits similarity to another of known structure and building a molecular model. However, when the sequence similarity is very remote and fragmentary, this 'modelling-by-homology' approach is less reliable. Current methods that tackle this problem are reviewed below, taking as an example the construction of a predicted model for the retroviral protease. This earlier work, which was only partially automatic, identified many of the outstanding difficulties that have subsequently been automated in computer programs, developed both by the author and many others. Because of the rapid proliferation of methods and their variants, an exhaustive review of the literature has not been possible and the following survey concentrates on the developments of the author and colleagues to explain the basic methods.
Affiliation(s)
- W R Taylor
- Laboratory of Mathematical Biology, National Institute for Medical Research, London, UK
|
39
|
Kireev DB, Fetisov VI, Zefirov NS. Approximate molecular electrostatic potential computations: applications to quantitative structure-activity relationships. ACTA ACUST UNITED AC 1994. [DOI: 10.1016/s0166-1280(96)80006-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
|
40
|
Godzik A, Kolinski A, Skolnick J. De novo and inverse folding predictions of protein structure and dynamics. J Comput Aided Mol Des 1993; 7:397-438. [PMID: 8229093 DOI: 10.1007/bf02337559] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.5] [Indexed: 01/29/2023]
Abstract
In the last two years, the use of simplified models has facilitated major progress in the globular protein folding problem, viz., the prediction of the three-dimensional (3D) structure of a globular protein from its amino acid sequence. A number of groups have addressed the inverse folding problem where one examines the compatibility of a given sequence with a given (and already determined) structure. A comparison of extant inverse protein-folding algorithms is presented, and methodologies for identifying sequences likely to adopt identical folding topologies, even when they lack sequence homology, are described. Extension to produce structural templates or fingerprints from idealized structures is discussed, and for eight-membered beta-barrel proteins, it is shown that idealized fingerprints constructed from simple topology diagrams can correctly identify sequences having the appropriate topology. Furthermore, this inverse folding algorithm is generalized to predict elements of supersecondary structure including beta-hairpins, helical hairpins and alpha/beta/alpha fragments. Then, we describe a very high coordination number lattice model that can predict the 3D structure of a number of globular proteins de novo, i.e., using just the amino acid sequence. Applications to sequences designed by DeGrado and co-workers [Biophys. J., 61 (1992) A265] predict folding intermediates, native states and relative stabilities in accord with experiment. The methodology has also been applied to the four-helix bundle designed by Richardson and co-workers [Science, 249 (1990) 884] and a redesigned monomeric version of a naturally occurring four-helix dimer, rop. Based on comparison to the rop dimer, the simulations predict conformations with rms values of 3-4 Å from native. Furthermore, the de novo algorithms can assess the stability of the folds predicted from the inverse algorithm, while the inverse folding algorithms can assess the quality of the de novo models. Thus, the synergism of the de novo and inverse folding algorithm approaches provides a set of complementary tools that will facilitate further progress on the protein-folding problem.
Affiliation(s)
- A Godzik
- Department of Molecular Biology, Scripps Research Institute, La Jolla, CA 92037
|
41
|
|
42
|
Bryant SH, Lawrence CE. An empirical energy function for threading protein sequence through the folding motif. Proteins 1993; 16:92-112. [PMID: 8497488 DOI: 10.1002/prot.340160110] [Citation(s) in RCA: 307] [Impact Index Per Article: 9.9] [Indexed: 01/31/2023]
Abstract
In this paper we present a new residue contact potential derived by statistical analysis of protein crystal structures. This gives mean hydrophobic and pairwise contact energies as a function of residue type and distance interval. To test the accuracy of this potential we generate model structures by "threading" different sequences through backbone folding motifs found in the structural data base. We find that conformational energies calculated by summing contact potentials show perfect specificity in matching the correct sequences with each globular folding motif in a 161-protein data set. They also identify correct models with the core folding motifs of hemerythrin and immunoglobulin McPC603 V1-domain, among millions of alternatives possible when we align subsequences with alpha-helices and beta-strands, and allow for variation in the lengths of intervening loops. We suggest that contact potentials reflect important constraints on nonbonded interaction in native proteins, and that "threading" may be useful for structure prediction by recognition of folding motif.
Affiliation(s)
- S H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20879
|
43
|
Kolinski A, Skolnick J. Discretized model of proteins. I. Monte Carlo study of cooperativity in homopolypeptides. J Chem Phys 1992. [DOI: 10.1063/1.463317] [Citation(s) in RCA: 103] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
|
44
|
Shenkin PS, Erman B, Mastrandrea LD. Information-theoretical entropy as a measure of sequence variability. Proteins 1991; 11:297-313. [PMID: 1758884 DOI: 10.1002/prot.340110408] [Citation(s) in RCA: 151] [Impact Index Per Article: 4.6] [Indexed: 12/28/2022]
Abstract
We propose the use of the information-theoretical entropy, $S = -\sum_i p_i \log_2 p_i$, as a measure of variability at a given position in a set of aligned sequences. Here $p_i$ stands for the fraction of times the i-th type appears at a position. For protein sequences, the sum has up to 20 terms, for nucleotide sequences, up to 4 terms, and for codon sequences, up to 61 terms. We compare S and Vs, a related measure, in detail with Vk, the traditional measure of immunoglobulin sequence variability, both in the abstract and as applied to the immunoglobulins. We conclude that S has desirable mathematical properties that Vk lacks and has intuitive and statistical meanings that accord well with the notion of variability. We find that Vk and the S-based measures are highly correlated for the immunoglobulins. We show by analysis of sequence data and by means of a mathematical model that this correlation is due to a strong tendency for the frequency of occurrence of amino acid types at a given position to be log-linear. It is not known whether the immunoglobulins are typical or atypical of protein families in this regard, nor is the origin of the observed rank-frequency distribution obvious, although we discuss several possible etiologies.
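The entropy measure defined in this abstract is straightforward to compute per alignment column. The Python sketch below is a minimal illustration; the function names and the four-sequence toy alignment are hypothetical, and gap handling and the related measures Vs and Vk are deliberately left out.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of the residue types observed in one column,
    with p_i taken as the fraction of sequences showing type i."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropies(sequences):
    """Entropy of every column of an ungapped, equal-length alignment."""
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment: columns 1 and 3 are conserved, columns 2 and 4 are split 50/50.
alignment = ["ACDE", "ACDF", "AGDE", "AGDF"]
entropies = alignment_entropies(alignment)
print(entropies)
```

A fully conserved column has zero entropy, and a column in which all 20 amino acid types are equally frequent reaches the protein maximum of log2(20), about 4.32 bits.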
Affiliation(s)
- P S Shenkin
- Department of Chemistry, Barnard College, New York, New York 10027
|
45
|
Lawrence CE, Bryant SH. Hydrophobic potentials from statistical analysis of protein structures. Methods Enzymol 1991; 202:20-31. [PMID: 1784174 DOI: 10.1016/0076-6879(91)02004-s] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 12/28/2022]
|