1
|
From propensities to patterns to principles in protein folding. Proteins 2023. [PMID: 37353953 DOI: 10.1002/prot.26540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Revised: 06/05/2023] [Accepted: 06/08/2023] [Indexed: 06/25/2023]
Abstract
As proposed here, β-turns play an essential role in protein self-assembly. This compact, four-residue motif affects protein conformation dramatically by reversing the overall chain direction. Turns are the "hinges" in globular proteins. This new proposal broadens a previous hypothesis that globular proteins solve the folding problem in part by filtering conformers with unsatisfied backbone hydrogen bonds, thereby preorganizing the folding population. Recapitulating that hypothesis: unsatisfied conformers would be dramatically destabilizing, shifting the U(nfolded) ⇌ N(ative) equilibrium far to the left. If even a single backbone polar group is satisfied by solvent when unfolded but buried and unsatisfied when folded, that energy penalty alone, approximately +5 kcal/mol, would rival almost the entire free energy of protein stabilization at room temperature. Consequently, globular proteins are built on scaffolds of hydrogen-bonded α-helices and/or strands of β-sheet, motifs that can be extended indefinitely, with intra-segment hydrogen bond partners for their backbone polar groups and without steric clash. Scaffolds foster a protein-wide hydrogen-bonded network, and, of thermodynamic necessity, they self-assemble cooperatively. Unlike elements of repetitive secondary structure, α-helices and β-sheet, a four-residue β-turn has only a single hydrogen bond (from i + 3 → i), not a cooperatively formed assembly of hydrogen bonds. As such, turns can form autonomously and are poised to initiate assembly of scaffold elements by bringing them together in an orientation and registration that promotes cooperative "zipping". The overall effect of this self-assembly mechanism is to induce substantial preorganization in the thermodynamically accessible folding population and, concomitantly, to reduce the folding entropy.
|
2
|
|
3
|
Abstract
It has been a long-standing conviction that a protein's native fold is selected from a vast number of conformers by the optimal constellation of enthalpically favorable interactions. In marked contrast, this Perspective introduces a different mechanism, one that emphasizes conformational entropy as the principal organizer in protein folding while proposing that the conventional view is incomplete. This mechanism stems from the realization that hydrogen bond satisfaction is a thermodynamic necessity. In particular, a backbone hydrogen bond may add little to the stability of the native state, but a completely unsatisfied backbone hydrogen bond would be dramatically destabilizing, shifting the U(nfolded) ⇌ N(ative) equilibrium far to the left. If even a single backbone polar group is satisfied by solvent when unfolded but buried and unsatisfied when folded, that energy penalty alone, approximately +5 kcal/mol, would rival almost the entire free energy of protein stabilization, typically between -5 and -15 kcal/mol under physiological conditions. Consequently, upon folding, buried backbone polar groups must form hydrogen bonds, and they do so by assembling scaffolds of α-helices and/or strands of β-sheet, the only conformers in which, with rare exception, hydrogen bond donors and acceptors are exactly balanced. In addition, only a few thousand viable scaffold topologies are possible for a typical protein domain. This thermodynamic imperative winnows the folding population by culling conformers with unsatisfied hydrogen bonds, thereby reducing the entropy cost of folding. Importantly, conformational restrictions imposed by backbone···backbone hydrogen bonding in the scaffold are sequence-independent, enabling mutation, and thus evolution, without sacrificing the structure.
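The thermodynamic argument in this abstract can be made concrete with a short calculation. The sketch below assumes the standard two-state relation K = exp(-ΔG/RT) at 298.15 K; the function names are illustrative, and the only numbers taken from the abstract are the +5 kcal/mol penalty and the -5 to -15 kcal/mol stability range.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def equilibrium_constant(delta_g_fold, temperature=298.15):
    """K = [N]/[U] for U <=> N, given the folding free energy in kcal/mol."""
    return math.exp(-delta_g_fold / (R_KCAL * temperature))

def fraction_folded(delta_g_fold, temperature=298.15):
    """Fraction of molecules in the native state for a two-state equilibrium."""
    k = equilibrium_constant(delta_g_fold, temperature)
    return k / (1.0 + k)

# Penalty quoted in the abstract for one buried, unsatisfied backbone polar group.
PENALTY = 5.0  # kcal/mol

for stability in (-5.0, -10.0, -15.0):
    before = equilibrium_constant(stability)
    after = equilibrium_constant(stability + PENALTY)
    print(f"dG = {stability:6.1f} kcal/mol: K falls by a factor of {before / after:,.0f}; "
          f"fraction folded {fraction_folded(stability):.4f} -> "
          f"{fraction_folded(stability + PENALTY):.4f}")
```

Under these assumptions a single +5 kcal/mol penalty lowers K by roughly 4,600-fold at room temperature, and a marginally stable protein (-5 kcal/mol) would be left only half folded.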
|
4
|
Building blocks of protein structures: Physics meets biology. Phys Rev E 2021; 104:014402. [PMID: 34412233 DOI: 10.1103/physreve.104.014402] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/21/2021] [Accepted: 03/22/2021] [Indexed: 12/11/2022]
Abstract
The native state structures of globular proteins are stable and well packed, indicating that self-interactions are favored over protein-solvent interactions under folding conditions. We use this as a guiding principle to derive the geometry of the building blocks of protein structures (α helices and strands assembled into β sheets) with no adjustable parameters, no amino acid sequence information, and no chemistry. There is an almost perfect fit between the dictates of mathematics and physics and the rules of quantum chemistry. Protein evolution is facilitated by sequence-independent platforms, which can elaborate sequence-dependent functional diversity. Our work highlights the vital role of discreteness in life and may have implications for the creation of artificial life and on the nature of life elsewhere in the cosmos.
|
5
|
Abstract
This Perspective is intended to raise questions about the conventional interpretation of protein folding. According to the conventional interpretation, developed over many decades, a protein population can visit a vast number of conformations under unfolding conditions, but a single dominant native population emerges under folding conditions. Accordingly, folding comes with a substantial loss of conformational entropy. How is this price paid? The conventional answer is that favorable interactions between and among the side chains can compensate for entropy loss, and moreover, these interactions are responsible for the structural particulars of the native conformation. Challenging this interpretation, the Perspective introduces a proposal that high energy (i.e., unfavorable) excluding interactions winnow the accessible population substantially under physical-chemical conditions that favor folding. Both steric clash and unsatisfied hydrogen bond donors and acceptors are classified as excluding interactions, so called because conformers with such disfavored interactions will be largely excluded from the thermodynamic population. Both excluding interactions and solvent factors that induce compactness are somewhat nonspecific, yet together they promote substantial chain organization. Moreover, proteins are built on a backbone scaffold consisting of α-helices and strands of β-sheet, where the number of hydrogen bond donors and acceptors is exactly balanced. These repetitive secondary structural elements are the only two conformers that can be both completely hydrogen-bond satisfied and extended indefinitely without encountering a steric clash. Consequently, the number of fundamental folds is limited to no more than ~10,000 for a protein domain. Once excluding interactions are taken into account, the issue of "frustration" is largely eliminated and the Levinthal paradox is resolved. 
Putting the "bottom line" at the top: it is likely that hydrogen-bond satisfaction represents a largely under-appreciated parameter in protein folding models.
|
6
|
Cover Image, Volume 87, Issue 5. Proteins 2019. [DOI: 10.1002/prot.25671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
7
|
Ramachandran maps for side chains in globular proteins. Proteins 2019; 87:357-364. [PMID: 30629766 DOI: 10.1002/prot.25656] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/08/2018] [Accepted: 12/30/2018] [Indexed: 11/05/2022]
Abstract
The Ramachandran plot for backbone ϕ,ψ-angles in a blocked monopeptide has played a central role in understanding protein structure. Curiously, a similar analysis for side chain χ-angles has been comparatively neglected. Instead, efforts have focused on compiling various types of side chain libraries extracted from proteins of known structure. Departing from this trend, the following analysis presents backbone-based maps of side chains in blocked monopeptides. As in the original ϕ,ψ-plot, these maps are derived solely from hard-sphere steric repulsion. Remarkably, the side chain biases exhibit marked similarities to corresponding biases seen in high-resolution protein structures. Consequently, some of the entropic cost for side chain localization in proteins is prepaid prior to the onset of folding events because conformational bias is built into the chain at the covalent level. Furthermore, side chain conformations are seen to experience fewer steric restrictions for backbone conformations in either the α or β basins, those map regions where repetitive ϕ,ψ-angles result in α-helices or strands of β-sheet, respectively. Here, these α and β basins are entropically favored for steric reasons alone; a blocked monopeptide is too short to accommodate the peptide hydrogen bonds that stabilize repetitive secondary structure. Thus, despite differing energetics, α/β-basins are favored for both monopeptides and repetitive secondary structure, underpinning an energetically unfrustrated compatibility between these two levels of protein structure.
|
8
|
|
9
|
On interpretation of protein X-ray structures: Planarity of the peptide unit. Proteins 2015; 83:1687-92. [PMID: 26148341 DOI: 10.1002/prot.24854] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/19/2015] [Revised: 06/26/2015] [Accepted: 06/30/2015] [Indexed: 11/09/2022]
Abstract
Pauling's mastery of peptide stereochemistry, based on small-molecule crystal structures and the theory of chemical bonding, led to his realization that the peptide unit is planar and then to the Pauling-Corey-Branson model of the α-helix. Similarly, contemporary protein structure refinement is based on experimentally determined diffraction data together with stereochemical restraints. However, even an X-ray structure at ultra-high resolution is still an under-determined model in which the linkage among refinement parameters is complex. Consequently, restrictions imposed on any given parameter can affect the entire structure. Here, we examine recent studies of high resolution protein X-ray structures, where substantial distortions of the peptide plane are found to be commonplace. Planarity is assessed by the ω-angle, a dihedral angle determined by the peptide bond (C-N) and its flanking covalent neighbors; for an ideally planar trans peptide, ω = 180°. By using a freely available refinement package, Phenix [Afonine et al. (2012) Acta Cryst. D, 68:352-367], we demonstrate that tightening default restrictions on the ω-angle can significantly reduce apparent deviations from peptide unit planarity without consequent reduction in reported evaluation metrics (e.g., R-factors). To be clear, our result does not show that substantial non-planarity is absent, only that an equivalent alternative model is possible. Resolving this disparity will ultimately require improved understanding of the deformation energy. Meanwhile, we urge inclusion of ω-angle statistics in new structure reports in order to focus critical attention on the usual practice of assigning default values to ω-angle constraints during structure refinement.
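The ω-angle described in this abstract is an ordinary four-atom dihedral, so it can be computed directly from coordinates. The following is a minimal sketch, not the procedure used by Phenix or by the authors: it assumes ω is taken over Cα(i), C(i), N(i+1), Cα(i+1), uses the usual atan2 form of the dihedral formula, and runs on made-up coordinates chosen only so that the expected angle is known.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def deviation_from_planarity(omega_deg):
    """Signed deviation of omega from the ideal trans value of 180 degrees."""
    return (omega_deg % 360.0) - 180.0

# Hypothetical coordinates (not from any real structure): a peptide twisted to 170 degrees.
twist = math.radians(170.0)
ca_i, c_i, n_next = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
ca_next = (math.cos(twist), math.sin(twist), 1.0)

omega = dihedral_deg(ca_i, c_i, n_next, ca_next)
print(f"omega = {omega:.1f} deg, deviation = {deviation_from_planarity(omega):+.1f} deg")
```

Collecting `deviation_from_planarity` over every peptide unit in a model would give the kind of ω-angle statistics the abstract urges authors to report.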
|
10
|
Molten globules, entropy-driven conformational change and protein folding. Curr Opin Struct Biol 2013; 23:4-10. [DOI: 10.1016/j.sbi.2012.11.004] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Received: 10/03/2012] [Revised: 11/14/2012] [Accepted: 11/15/2012] [Indexed: 10/27/2022]
|
11
|
165 Protein domains: a thermodynamic definition. J Biomol Struct Dyn 2013. [DOI: 10.1080/07391102.2013.786407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
12
|
Reducing the dimensionality of the protein-folding search problem. Protein Sci 2012; 21:1231-40. [PMID: 22692765 DOI: 10.1002/pro.2106] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/19/2012] [Revised: 06/04/2012] [Accepted: 06/05/2012] [Indexed: 11/10/2022]
Abstract
How does a folding protein negotiate a vast, featureless conformational landscape and adopt its native structure in biological real time? Motivated by this search problem, we developed a novel algorithm to compare protein structures. Procedures to identify structural analogs are typically conducted in three-dimensional space: the tertiary structure of a target protein is matched against each candidate in a database of structures, and goodness of fit is evaluated by a distance-based measure, such as the root-mean-square distance between target and candidate. This is an expensive approach because three-dimensional space is complex. Here, we transform the problem into a simpler one-dimensional procedure. Specifically, we identify and label the 11 most populated residue basins in a database of high-resolution protein structures. Using this 11-letter alphabet, any protein's three-dimensional structure can be transformed into a one-dimensional string by mapping each residue onto its corresponding basin. Similarity between the resultant basin strings can then be evaluated by conventional sequence-based comparison. The disorder → order folding transition is abridged on both sides. At the onset, folding conditions necessitate formation of hydrogen-bonded scaffold elements on which proteins are assembled, severely restricting the magnitude of accessible conformational space. Near the end, chain topology is established prior to emergence of the close-packed native state. At this latter stage of folding, the chain remains molten, and residues populate natural basins that are approximated by the 11 basins derived here. In essence, our algorithm reduces the protein-folding search problem to mapping the amino acid sequence onto a restricted basin string.
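The transformation described in this abstract, from backbone torsion angles to a one-dimensional basin string that can be compared like a sequence, can be sketched in a few lines. The abstract does not give the boundaries of the 11 basins, so the three rectangular regions below are hypothetical placeholders, and `difflib.SequenceMatcher` stands in for whatever sequence-comparison method the authors used.

```python
from difflib import SequenceMatcher

# Hypothetical rectangular basins: (letter, phi_min, phi_max, psi_min, psi_max) in degrees.
# Placeholders for illustration only, NOT the 11 basins derived in the paper.
TOY_BASINS = [
    ("A", -160.0, -20.0, -120.0, 50.0),   # roughly alpha-helical region
    ("B", -180.0, -45.0, 90.0, 180.0),    # roughly beta-strand region
    ("L", 20.0, 120.0, -30.0, 100.0),     # roughly left-handed region
]

def basin_letter(phi, psi):
    """Return the letter of the first toy basin containing (phi, psi), else 'X'."""
    for letter, phi_lo, phi_hi, psi_lo, psi_hi in TOY_BASINS:
        if phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi:
            return letter
    return "X"

def basin_string(torsions):
    """Map a list of (phi, psi) pairs onto a one-dimensional basin string."""
    return "".join(basin_letter(phi, psi) for phi, psi in torsions)

def similarity(s1, s2):
    """Similarity score between two basin strings (1.0 means identical)."""
    return SequenceMatcher(None, s1, s2).ratio()

# Made-up torsion angles: four helical residues followed by four strand residues.
target = basin_string([(-60.0, -45.0)] * 4 + [(-120.0, 130.0)] * 4)
candidate = basin_string([(-65.0, -40.0)] * 4 + [(-110.0, 140.0)] * 4)
print(target, candidate, similarity(target, candidate))
```

The point of the design is that two structures with slightly different torsion angles collapse to the same string, so comparison reduces to cheap string matching rather than three-dimensional superposition.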
|
13
|
|
14
|
The Levinthal paradox of the interactome. Protein Sci 2011; 20:2074-9. [PMID: 21987416 DOI: 10.1002/pro.747] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 09/06/2011] [Revised: 09/22/2011] [Accepted: 09/23/2011] [Indexed: 02/06/2023]
Abstract
The central biological question of the 21st century is: how does a viable cell emerge from the bewildering combinatorial complexity of its molecular components? Here, we estimate the combinatorics of self-assembling the protein constituents of a yeast cell, a number so vast that the functional interactome could only have emerged by iterative hierarchic assembly of its component sub-assemblies. A protein can undergo both reversible denaturation and hierarchic self-assembly spontaneously, but a functioning interactome must expend energy to achieve viability. Consequently, it is implausible that a completely "denatured" cell could be reversibly renatured spontaneously, like a protein. Instead, new cells are generated by the division of pre-existing cells, an unbroken chain of renewal tracking back through contingent conditions and evolving responses to the origin of life on the prebiotic earth. We surmise that this non-deterministic temporal continuum could not be reconstructed de novo under present conditions.
|
15
|
Comment on "Revisiting the Ramachandran plot from a new angle". Protein Sci 2011; 20:1771-3; author reply 1774. [PMID: 21898646 DOI: 10.1002/pro.724] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/11/2011] [Indexed: 11/12/2022]
|
16
|
Abstract
It is often assumed that the peptide backbone forms a substantial number of additional hydrogen bonds when a protein unfolds. We challenge that assumption in this article. Early surveys of hydrogen bonding in proteins of known structure typically found that most, but not all, backbone polar groups are satisfied, either by intramolecular partners or by water. When the protein is folded, these groups form approximately two hydrogen bonds per peptide unit, one donor or acceptor for each carbonyl oxygen or amide hydrogen, respectively. But when unfolded, the backbone chain is often believed to form three hydrogen bonds per peptide unit, one partner for each oxygen lone pair or amide hydrogen. This assumption is based on the properties of small model compounds, like N-methylacetamide, or simply accepted as self-evident fact. If valid, a chain of N residues would have approximately 2N backbone hydrogen bonds when folded but 3N backbone hydrogen bonds when unfolded, a sufficient difference to overshadow any uncertainties involved in calculating these per-residue averages. Here, we use exhaustive conformational sampling to monitor the number of H-bonds in a statistically adequate population of blocked polyalanyl-six-mers as the solvent quality ranges from good to poor. Solvent quality is represented by a scalar parameter used to Boltzmann-weight the population energy. Recent experimental studies show that a repeating (Gly-Ser) polypeptide undergoes a denaturant-induced expansion accompanied by breaking intramolecular peptide H-bonds. Results from our simulations augment this experimental finding by showing that the number of H-bonds is approximately conserved during such expansion⇋compaction transitions.
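One way to picture the Boltzmann-weighting step in this abstract is as a weighted average of hydrogen-bond counts over a conformer population. The sketch below is an assumption-laden illustration: it treats the scalar solvent-quality parameter as a multiplier on conformer energy (called `beta` here), and the three-conformer population is invented. The study's actual energy function and sampling are not specified in the abstract.

```python
import math

def boltzmann_average(conformers, beta):
    """Boltzmann-weighted mean H-bond count.

    conformers: list of (energy, n_hbonds) pairs.
    beta: hypothetical scalar that scales the energies before weighting.
    """
    energies = [energy for energy, _ in conformers]
    e_min = min(energies)  # shift energies so the exponentials cannot overflow
    weights = [math.exp(-beta * (energy - e_min)) for energy in energies]
    total = sum(weights)
    return sum(w * n for w, (_, n) in zip(weights, conformers)) / total

# Invented toy population: (energy in arbitrary units, number of intramolecular H-bonds).
toy_population = [(0.0, 2), (1.0, 1), (2.0, 0)]

for beta in (0.0, 1.0, 5.0):
    print(f"beta = {beta:3.1f}: <H-bonds> = {boltzmann_average(toy_population, beta):.3f}")
```

With `beta = 0` every conformer counts equally, and as `beta` grows the average is dominated by the lowest-energy conformers, which is how a single scalar can sweep a population across different solvent conditions in a model of this kind.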
|
17
|
Abstract
New experimental results show that either gain or loss of close packing can be observed as a discrete step in protein folding or unfolding reactions. This finding poses a significant challenge to the conventional two-state model of protein folding. Results of interest involve dry molten globule (DMG) intermediates, an expanded form of the protein that lacks appreciable solvent. When an unfolding protein expands to the DMG state, side chains unlock and gain conformational entropy, while liquid-like van der Waals interactions persist. Four unrelated proteins are now known to form DMGs as the first step of unfolding, suggesting that such an intermediate may well be commonplace in both folding and unfolding. Data from the literature show that peptide amide protons are protected in the DMG, indicating that backbone structure is intact despite loss of side-chain close packing. Other complementary evidence shows that secondary structure formation provides a major source of compaction during folding. In our model, the major free-energy barrier separating unfolded from native states usually occurs during the transition between the unfolded state and the DMG. The absence of close packing at this barrier provides an explanation for why phi-values, derived from a Brønsted-Leffler plot, depend primarily on structure at the mutational site and not on specific side-chain interactions. The conventional two-state folding model breaks down when there are DMG intermediates, a realization that has major implications for future experimental work on the mechanism of protein folding.
|
18
|
Negative Design in Protein Coils. Biophys J 2011. [DOI: 10.1016/j.bpj.2010.12.3035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
19
|
Abstract
We present a method with the potential to generate a library of coil segments from first principles. Proteins are built from alpha-helices and/or beta-strands interconnected by these coil segments. Here, we investigate the conformational determinants of short coil segments, with particular emphasis on chain turns. Toward this goal, we extracted a comprehensive set of two-, three-, and four-residue turns from X-ray-elucidated proteins and classified them by conformation. A remarkably small number of unique conformers account for most of this experimentally determined set, whereas remaining members span a large number of rare conformers, many occurring only once in the entire protein database. Factors determining conformation were identified via Metropolis Monte Carlo simulations devised to test the effectiveness of various energy terms. Simulated structures were validated by comparison to experimental counterparts. After filtering rare conformers, we found that 98% of the remaining experimentally determined turn population could be reproduced by applying a hydrogen bond energy term to an exhaustively generated ensemble of clash-free conformers in which no backbone polar group lacks a hydrogen-bond partner. Further, at least 90% of longer coil segments, ranging from 5 to 20 residues, were found to be structural composites of these shorter primitives. These results are pertinent to protein structure prediction, where approaches can be divided into either empirical or ab initio methods. Empirical methods use database-derived information; ab initio methods rely on physical-chemical principles exclusively. Replacing the database-derived coil library with one generated from first principles would transform any empirically based method into its corresponding ab initio homologue.
|
20
|
|
21
|
In memoriam. Proteins 2009; 75:535-9. [DOI: 10.1002/prot.22400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
22
|
Abstract
We seek to understand the link between protein thermodynamics and protein structure in molecular detail. A classical approach to this problem involves assessing changes in protein stability resulting from added cosolvents. Under any given conditions, protein molecules in aqueous buffer are in equilibrium between unfolded and folded states, U(nfolded) ⇌ N(ative). Addition of organic osmolytes, small uncharged compounds found throughout nature, shifts this equilibrium. Urea, a denaturing osmolyte, shifts the equilibrium toward U; trimethylamine N-oxide (TMAO), a protecting osmolyte, shifts the equilibrium toward N. Using the Tanford Transfer Model, the thermodynamic response to many such osmolytes has been dissected into groupwise free energy contributions. It is found that the energetics involving backbone hydrogen bonding controls these shifts in protein stability almost entirely, with osmolyte cosolvents simply dialing between solvent-backbone versus backbone-backbone hydrogen bonds, as a function of solvent quality. This reciprocal relationship establishes the essential link between protein thermodynamics and the protein's hydrogen-bonded backbone structure.
|
23
|
Abstract
Globular proteins adopt complex folds, composed of organized assemblies of alpha-helix and beta-sheet together with irregular regions that interconnect these scaffold elements. Here, we seek to parse the irregular regions into their structural constituents and to rationalize their formative energetics. Toward this end, we dissected the Protein Coil Library, a structural database of protein segments that are neither alpha-helix nor beta-strand, extracted from high-resolution protein structures. The backbone dihedral angles of residues from coil library segments are distributed indiscriminately across the phi,psi map, but when contoured, seven distinct basins emerge clearly. The structures and energetics associated with the two least-studied basins are the primary focus of this article. Specifically, the structural motifs associated with these basins were characterized in detail and then assessed in simple simulations designed to capture their energetic determinants. It is found that conformational constraints imposed by excluded volume and hydrogen bonding are sufficient to reproduce the observed phi,psi distributions of these motifs; no additional energy terms are required. These three motifs in conjunction with alpha-helices, strands of beta-sheet, canonical beta-turns, and polyproline II conformers comprise approximately 90% of all protein structure.
|
24
|
Building native protein conformation from NMR backbone chemical shifts using Monte Carlo fragment assembly. Protein Sci 2007; 16:1515-21. [PMID: 17656574 PMCID: PMC2203357 DOI: 10.1110/ps.072988407] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]
Abstract
We have been analyzing the extent to which protein secondary structure determines protein tertiary structure in simple protein folds. An earlier paper demonstrated that three-dimensional structure can be obtained successfully using only highly approximate backbone torsion angles for every residue. Here, the initial information is further diluted by introducing a realistic degree of experimental uncertainty into this process. In particular, we tackle the practical problem of determining three-dimensional structure solely from backbone chemical shifts, which can be measured directly by NMR and are known to be correlated with a protein's backbone torsion angles. Extending our previous algorithm to incorporate these experimentally determined data, clusters of structures compatible with the experimentally determined chemical shifts were generated by fragment assembly Monte Carlo. The cluster that corresponds to the native conformation was then identified based on four energy terms: steric clash, solvent-squeezing, hydrogen-bonding, and hydrophobic contact. Currently, the method has been applied successfully to five small proteins with simple topology. Although still under development, this approach offers promise for high-throughput NMR structure determination.
|
25
|
Abstract
Globular proteins are assemblies of alpha-helices and beta-strands, interconnected by reverse turns and longer loops. Most short turns can be classified readily into a limited repertoire of discrete backbone conformations, but the physical-chemical determinants of these distinct conformational basins remain an open question. We investigated this question by exhaustive analysis of all backbone conformations accessible to short chain segments bracketed by either an alpha-helix or a beta-strand (i.e., alpha-segment-alpha, beta-segment-beta, alpha-segment-beta, and beta-segment-alpha) in a nine-state model. We find that each of these four secondary structure environments imposes its own unique steric and hydrogen-bonding constraints on the intervening segment, resulting in a limited repertoire of conformations. In greater detail, an exhaustive set of conformations was generated for short backbone segments having reverse-turn chain topology and bracketed between elements of secondary structure. This set was filtered, and only clash-free, hydrogen-bond-satisfied conformers having reverse-turn topology were retained. The filtered set includes authentic turn conformations, observed in proteins of known structure, but little else. In particular, over 99% of the alternative conformations failed to satisfy at least one criterion and were excluded from the filtered set. Furthermore, almost all of the remaining alternative conformations have close tolerances that would be too tight to accommodate side chains longer than a single beta-carbon. These results provide a molecular explanation for the observation that reverse turns between elements of regular secondary structure can be classified into a small number of discrete conformations.
|
26
|
Abstract
Under physiological conditions, a protein undergoes a spontaneous disorder ⇌ order transition called "folding." The protein polymer is highly flexible when unfolded but adopts its unique native, three-dimensional structure when folded. Current experimental knowledge comes primarily from thermodynamic measurements in solution or the structures of individual molecules, elucidated by either x-ray crystallography or NMR spectroscopy. From the former, we know the enthalpy, entropy, and free energy differences between the folded and unfolded forms of hundreds of proteins under a variety of solvent/cosolvent conditions. From the latter, we know the structures of approximately 35,000 proteins, which are built on scaffolds of hydrogen-bonded structural elements, alpha-helix and beta-sheet. Anfinsen showed that the amino acid sequence alone is sufficient to determine a protein's structure, but the molecular mechanism responsible for self-assembly remains an open question, probably the most fundamental open question in biochemistry. This perspective is a hybrid: partly review, partly proposal. First, we summarize key ideas regarding protein folding developed over the past half-century and culminating in the current mindset. In this view, the energetics of side-chain interactions dominate the folding process, driving the chain to self-organize under folding conditions. Next, having taken stock, we propose an alternative model that inverts the prevailing side-chain/backbone paradigm. Here, the energetics of backbone hydrogen bonds dominate the folding process, with preorganization in the unfolded state. Then, under folding conditions, the resultant fold is selected from a limited repertoire of structural possibilities, each corresponding to a distinct hydrogen-bonded arrangement of alpha-helices and/or strands of beta-sheet.
|
27
|
Abstract
Osmolytes are small organic compounds that affect protein stability and are ubiquitous in living systems. In the equilibrium protein folding reaction, unfolded (U) ⇌ native (N), protecting osmolytes push the equilibrium toward N, whereas denaturing osmolytes push the equilibrium toward U. As yet, there is no universal molecular theory that can explain the mechanism by which osmolytes interact with the protein to affect protein stability. Here, we lay the groundwork for such a theory, starting with a key observation: the transfer free energy of protein backbone from water to a water/osmolyte solution, Δg_tr, is negatively correlated with an osmolyte's fractional polar surface area. Δg_tr measures the degree to which an osmolyte stabilizes a protein. Consequently, a straightforward interpretation of this correlation implies that the interaction between the protein backbone and osmolyte polar groups is more favorable than the corresponding interaction with nonpolar groups. Such an interpretation immediately suggests the existence of a universal mechanism involving osmolyte, backbone, and water. We test this idea by using it to construct a quantitative solvation model in which backbone/solvent interaction energy is a function of interactant polarity, and the number of energetically equivalent ways of realizing a given interaction is a function of interactant surface area. Using this model, calculated Δg_tr values show a strong correlation with measured values (R = 0.99). In addition, the model correctly predicts that protecting/denaturing osmolytes will be preferentially excluded/accumulated around the protein backbone. Taken together, these model-based results rationalize the dominant interactions observed in experimental studies of osmolyte-induced protein stabilization and denaturation.
|
28
|
Abstract
Using a test set of 13 small, compact proteins, we demonstrate that a remarkably simple protocol can capture native topology from secondary structure information alone, in the absence of long-range interactions. It has been a long-standing open question whether such information is sufficient to determine a protein's fold. Indeed, even the far simpler problem of reconstructing the three-dimensional structure of a protein from its exact backbone torsion angles has remained a difficult challenge owing to the small, but cumulative, deviations from ideality in backbone planarity, which, if ignored, cause large errors in structure. As a familiar example, a small change in an elbow angle causes a large displacement at the end of your arm; the longer the arm, the larger the displacement. Here, correct secondary structure assignments (alpha-helix, beta-strand, beta-turn, polyproline II, coil) were used to constrain polypeptide backbone chains devoid of side chains, and the most stable folded conformations were determined, using Monte Carlo simulation. Just three terms were used to assess stability: molecular compaction, steric exclusion, and hydrogen bonding. For nine of the 13 proteins, this protocol restricts the main chain to a surprisingly small number of energetically favorable topologies, with the native one prominent among them.
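The elbow analogy in this abstract has a simple geometric form: when a rigid arm of length L is rotated by a small angle about one end, its tip moves along a chord of length 2·L·sin(Δθ/2). The sketch below illustrates only that lever-arm effect. The arm lengths and the 2° error are arbitrary illustrative values, and the function is not part of the authors' protocol.

```python
import math

def tip_displacement(arm_length, angle_error_deg):
    """Chord length traced by the tip of a rigid arm rotated by angle_error_deg."""
    return 2.0 * arm_length * math.sin(math.radians(angle_error_deg) / 2.0)

# Arbitrary illustrative values: the same 2 degree error applied to arms of growing length.
for length in (10.0, 50.0, 100.0):
    moved = tip_displacement(length, 2.0)
    print(f"arm of {length:5.1f} A with a 2 degree error: tip moves {moved:.2f} A")
```

The displacement grows linearly with arm length, from about 0.35 Å for a 10 Å arm to about 3.5 Å for a 100 Å arm, which is why small per-residue deviations from ideal geometry accumulate into large errors over a whole chain.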
|
29
|
The Protein Coil Library: a structural database of nonhelix, nonstrand fragments derived from the PDB. Proteins 2006; 58:852-4. [PMID: 15657933 DOI: 10.1002/prot.20394] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.6] [Indexed: 11/06/2022]
Abstract
Approximately half the structure of folded proteins is either alpha-helix or beta-strand. We have developed a convenient repository of all remaining structure after these two regular secondary structure elements are removed. The Protein Coil Library (http://roselab.jhu.edu/coil/) allows rapid and comprehensive access to non-alpha-helix and non-beta-strand fragments contained in the Protein Data Bank (PDB). The library contains both sequence and structure information together with calculated torsion angles for both the backbone and side chains. Several search options are implemented, including a query function that uses output from popular PDB-culling servers directly. Additionally, several popular searches are stored and updated for immediate access. The library is a useful tool for exploring conformational propensities, turn motifs, and a recent model of the unfolded state.
|
30
|
The role of introns in repeat protein gene formation. J Mol Biol 2006; 360:258-66. [PMID: 16781737 DOI: 10.1016/j.jmb.2006.05.024] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 03/07/2006] [Revised: 05/08/2006] [Accepted: 05/10/2006] [Indexed: 11/23/2022]
Abstract
Genes composed of tandem repetitive sequence motifs are abundant in nature and are enriched in eukaryotes. To investigate repeat protein gene formation mechanisms, we have conducted a large-scale analysis of their introns and exons. We find that a wide variety of repeat motifs exhibit a striking conservation of intron position and phase, and are composed of exons that encode one or two complete repeats. These results suggest a simple model of repeat protein gene formation from local duplications. This model is corroborated by amino acid sequence similarity patterns among neighboring repeats from various repeat protein genes. The distribution of one- and two-repeat exons indicates that intron-facilitated repeat motif duplication, in which the start and end points of duplication are located in consecutive intronic regions, significantly exceeds intron-independent duplication. These results suggest that introns have contributed to the greater abundance of repeat protein genes in eukaryotic versus prokaryotic organisms, a conclusion that is supported by taxonomic analysis.
|
31
|
|
32
|
Abstract
Is highly approximate knowledge of a protein's backbone structure sufficient to successfully identify its family, superfamily, and tertiary fold? To explore this question, backbone dihedral angles were extracted from the known three-dimensional structure of 2,439 proteins and mapped into 36 labeled, 60° × 60° bins, called mesostates. Using this coarse-grained mapping, protein conformation can be approximated by a linear sequence of mesostates. These linear strings can then be aligned and assessed by conventional sequence-comparison methods. We report that the mesostate sequence is sufficient to recognize a protein's family, superfamily, and fold with good fidelity.
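The mesostate mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract says the 36 bins are labeled but does not give the labeling scheme, so the single-character labels, the bin origin at -180°, and the function names below are all hypothetical.

```python
import string

BIN_WIDTH = 60.0    # degrees, as stated in the abstract
BINS_PER_AXIS = 6   # 360 / 60, giving 6 x 6 = 36 bins

# Hypothetical labels (assumption): 26 letters plus 10 digits = 36 symbols.
LABELS = string.ascii_uppercase + string.digits


def bin_index(angle):
    """Return the 0-5 bin index of a torsion angle given in degrees."""
    shifted = (angle + 180.0) % 360.0  # wrap into [0, 360)
    # Clamp to guard against floating-point rounding at the upper edge.
    return min(int(shifted // BIN_WIDTH), BINS_PER_AXIS - 1)


def mesostate(phi, psi):
    """Map one (phi, psi) pair to a single-character mesostate label."""
    return LABELS[bin_index(phi) * BINS_PER_AXIS + bin_index(psi)]


def mesostate_string(torsions):
    """Convert a list of (phi, psi) pairs into a linear mesostate string."""
    return "".join(mesostate(phi, psi) for phi, psi in torsions)
```

A string produced this way, one character per residue, is the kind of linear representation that could then be passed to a conventional sequence-alignment tool.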
|
33
|
Abstract
Beta-turns are sites at which proteins change their overall chain direction, and they occur with high frequency in globular proteins. The Protein Data Bank has many instances of conformations that resemble beta-turns but lack the characteristic N-H(i) → O=C(i − 3) hydrogen bond of an authentic beta-turn. Here, we identify potential hydrogen-bonded beta-turns in the coil library, a Web-accessible database utility comprised of all residues not in repetitive secondary structure, neither alpha-helix nor beta-sheet (http://www.roselab.jhu.edu/coil). In particular, candidate turns were identified as four-residue segments satisfying highly relaxed geometric criteria but lacking a strictly defined hydrogen bond. Such candidates were then subjected to a minimization protocol to determine whether slight changes in torsion angles are sufficient to shift the conformation into reference-quality geometry without deviating significantly from the original structure. This approach of applying constrained minimization to known structures reveals a substantial population of previously unidentified, stringently defined, hydrogen-bonded beta-turns. In particular, 33% of coil library residues were classified as beta-turns prior to minimization. After minimization, 45% of such residues could be classified as beta-turns, with another 8% in 3₁₀ helices (which closely resemble type III beta-turns). Of the remaining coil library residues, 37% have backbone dihedral angles in left-handed polyproline II structure.
|
34
|
Building native protein conformation from highly approximate backbone torsion angles. Proc Natl Acad Sci U S A 2005; 102:16227-32. [PMID: 16251268 PMCID: PMC1283474 DOI: 10.1073/pnas.0508415102] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Received: 08/02/2005] [Indexed: 11/18/2022] Open
Abstract
Reconstructing a protein in three dimensions from its backbone torsion angles is an ongoing challenge because minor inaccuracies in these angles produce major errors in the structure. As a familiar example, a small change in an elbow angle causes a large displacement at the end of your arm; the longer the arm, the larger the displacement. Even accurate knowledge of the backbone torsions φ and ψ is insufficient, owing to the small, but cumulative, deviations from ideality in backbone planarity, which, if ignored, also lead to major errors in the structure. Against this background, we conducted a computational experiment to assess whether protein conformation can be determined from highly approximate backbone torsion angles, the kind of information that is now obtained readily from NMR. Specifically, backbone torsion angles were taken from proteins of known structure and mapped into 60° × 60° grid squares, called mesostates. Side-chain atoms beyond the β-carbon were discarded. A mesostate representation of the protein backbone was then used to extract likely candidates from a fragment library of mesostate pentamers, followed by Monte Carlo-based fragment-assembly simulations to identify stable conformations compatible with the given mesostate sequence. Only three simple energy terms were used to gauge stability: molecular compaction, soft-sphere repulsion, and hydrogen bonding. For the six representative proteins described here, stable conformers can be partitioned into a remarkably small number of topologically distinct clusters. Among these, the native topology is found with high frequency and can be identified as the cluster with the most favorable energy.
|
35
|
Sterics and solvation winnow accessible conformational space for unfolded proteins. J Mol Biol 2005; 353:873-87. [PMID: 16185713 DOI: 10.1016/j.jmb.2005.08.062] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 07/12/2005] [Revised: 08/19/2005] [Accepted: 08/26/2005] [Indexed: 10/25/2022]
Abstract
The magnitude of protein conformational space is over-estimated by the traditional random-coil model, in which local steric restrictions arise exclusively from interactions between adjacent chain neighbors. Using a five-state model, we assessed the extent to which steric hindrance and hydrogen bond satisfaction, energetically significant factors, impose additional conformational restrictions on polypeptide chains, beyond adjacent residues. Steric hindrance is repulsive: the distance of closest approach between any two atoms cannot be less than the sum of their van der Waals radii. Hydrogen bond satisfaction is attractive: polar backbone atoms must form hydrogen bonds, either intramolecularly or to solvent water. To gauge the impact of these two factors on the magnitude of conformational space, we systematically enumerated and classified the disfavored conformations that restrict short polyalanyl backbone chains. Applying such restrictions to longer chains, we derived a scaling law to estimate conformational restriction as a function of chain length. Disfavored conformations predicted by the model were tested against experimentally determined structures in the coil library, a non-helix, non-strand subset of the PDB. These disfavored conformations are usually absent from the coil library, and exceptions can be uniformly rationalized.
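The steric criterion stated above (no two atoms closer than the sum of their van der Waals radii) can be sketched as a simple pairwise test. The radii below are commonly tabulated values used here purely for illustration; they are an assumption, not the parameter set of the paper, and the function name is hypothetical. In practice, covalently bonded pairs lie closer than this limit, so the caller is assumed to pass only nonbonded pairs.

```python
import math

# Illustrative van der Waals radii in angstroms (assumed, not from the paper).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20}


def has_steric_clash(atom_a, atom_b):
    """Return True if two nonbonded atoms overlap under a hard-sphere test.

    Each atom is an (element, (x, y, z)) pair with coordinates in angstroms.
    """
    (elem_a, pos_a), (elem_b, pos_b) = atom_a, atom_b
    min_allowed = VDW_RADII[elem_a] + VDW_RADII[elem_b]
    return math.dist(pos_a, pos_b) < min_allowed
```

A conformation would be classed as sterically disfavored if any nonbonded pair fails this test.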
|
36
|
Abstract
Evidence from proteins and peptides supports the conclusion that intrapeptide hydrogen bonds stabilize the folded form of proteins. Paradoxically, evidence from small molecules supports the opposite conclusion, that intrapeptide hydrogen bonds are less favorable than peptide-water hydrogen bonds. A related issue, often lost in this debate about comparing peptide-peptide to peptide-water hydrogen bonds, involves the energetic cost of an unsatisfied hydrogen bond. Here, experiment and theory agree that breaking a hydrogen bond costs between 5 and 6 kcal/mol. Accordingly, the likelihood of finding an unsatisfied hydrogen bond in a protein is insignificant. This realization establishes a powerful rule for evaluating protein conformations.
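To see why a 5 to 6 kcal/mol penalty makes an unsatisfied hydrogen bond so unlikely, one can convert the free energy into a relative population with a Boltzmann factor. The two-state comparison and the choice of 298.15 K below are assumptions added for illustration; they are not a calculation reported in the abstract.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_ratio(delta_g_kcal, temperature_k=298.15):
    """Population of a state relative to a reference state lying
    delta_g_kcal (kcal/mol) lower in free energy."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


for penalty in (5.0, 6.0):
    print(f"{penalty} kcal/mol -> relative population {boltzmann_ratio(penalty):.1e}")
```

Under these assumptions, a 5 kcal/mol penalty corresponds to a relative population on the order of 10^-4, and each additional kcal/mol lowers it further by roughly a factor of five.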
|
37
|
Abstract
Understanding the process of protein folding has been recognized as an important challenge for >70 years. It is, quintessentially, a thermodynamic problem and, arguably, thermodynamics is our most powerful discipline for understanding biological systems. Yet, despite all this, we still lack predictive understanding of protein folding. Is something missing from this picture?
|
38
|
A novel method reveals that solvent water favors polyproline II over beta-strand conformation in peptides and unfolded proteins: conditional hydrophobic accessible surface area (CHASA). Protein Sci 2004; 14:111-8. [PMID: 15576559 PMCID: PMC2253334 DOI: 10.1110/ps.041047005] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.3] [Indexed: 10/26/2022]
Abstract
In aqueous solution, the ensemble of conformations sampled by peptides and unfolded proteins is largely determined by their interaction with water. It has been a long-standing goal to capture these solute-water energetics accurately and efficiently in calculations. Historically, accessible surface area (ASA) has been used to estimate these energies, but this method breaks down when applied to amphipathic peptides and proteins. Here we introduce a novel method in which hydrophobic ASA is determined after first positioning water oxygens in hydrogen-bonded orientations proximate to all accessible peptide/protein backbone N and O atoms. This conditional hydrophobic accessible surface area is termed CHASA. The CHASA method was validated by predicting the polyproline-II (P(II)) and beta-strand conformational preferences of non-proline residues in the coil library (i.e., non-alpha-helix, non-beta-strand, non-beta-turn library derived from X-ray elucidated structures). Further, the method successfully rationalizes the previously unexplained solvation energies in polyalanyl peptides and compares favorably with published experimentally determined P(II) residue propensities. We dedicate this paper to Frederic M. Richards.
|
39
|
Abstract
The Gaussian-distributed random coil has been the dominant model for denatured proteins since the 1950s, and it has long been interpreted to mean that proteins are featureless, statistical coils in 6 M guanidinium chloride. Here, we demonstrate that random-coil statistics are not a unique signature of featureless polymers. The random-coil model does predict the experimentally determined coil dimensions of denatured proteins successfully. Yet, other equally convincing experiments have shown that denatured proteins are biased toward specific conformations, in apparent conflict with the random-coil model. We seek to resolve this paradox by introducing a contrived counterexample in which largely native protein ensembles nevertheless exhibit random-coil characteristics. Specifically, proteins of known structure were used to generate disordered conformers by varying backbone torsion angles at random for approximately 8% of the residues; the remaining approximately 92% of the residues remained fixed in their native conformation. Ensembles of these disordered structures were generated for 33 proteins by using a torsion-angle Monte Carlo algorithm with hard-sphere sterics; bulk statistics were then calculated for each ensemble. Despite this extreme degree of imposed internal structure, these ensembles have end-to-end distances and mean radii of gyration that agree well with random-coil expectations in all but two cases.
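The two bulk statistics named above, end-to-end distance and radius of gyration, can be computed from a list of coordinates as in the following sketch. It assumes one representative point per residue (for example, alpha-carbon positions) with equal weighting; the abstract does not say which atoms or weights were used, and the function names are hypothetical.

```python
import math


def end_to_end_distance(coords):
    """Distance between the first and last points of the chain."""
    return math.dist(coords[0], coords[-1])


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid
    (unweighted; mass weighting is omitted as a simplifying assumption)."""
    n = len(coords)
    centroid = tuple(sum(axis) / n for axis in zip(*coords))
    mean_sq = sum(math.dist(p, centroid) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)
```

Averaging these quantities over every conformer in an ensemble gives the bulk values that would be compared with random-coil expectations.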
|
40
|
Abstract
Does aqueous solvent discriminate among peptide conformers? To address this question, we computed the solvation free energy of a blocked, 12-residue polyalanyl-peptide in explicit water and analyzed its solvent structure. The peptide was modeled in each of 4 conformers: alpha-helix, antiparallel beta-strand, parallel beta-strand, and polyproline II helix (P(II)). Monte Carlo simulations in the canonical ensemble were performed at 300 K using the CHARMM 22 forcefield with TIP3P water. The simulations indicate that the solvation free energy of P(II) is favored over that of other conformers for reasons that defy conventional explanation. Specifically, in these 4 conformers, an almost perfect correlation is found between a residue's solvent-accessible surface area and the volume of its first solvent shell, but neither quantity is correlated with the observed differences in solvation free energy. Instead, solvation free energy tracks with the interaction energy between the peptide and its first-shell water. An additional, previously unrecognized contribution involves the conformation-dependent perturbation of first-shell solvent organization. Unlike P(II), beta-strands induce formation of entropically disfavored peptide:water bridges that order vicinal water in a manner reminiscent of the hydrophobic effect. The use of explicit water allows us to capture and characterize these dynamic water bridges that form and dissolve during our simulations.
|
41
|
|
42
|
Steric restrictions in protein folding: an alpha-helix cannot be followed by a contiguous beta-strand. Protein Sci 2004; 13:633-9. [PMID: 14767081 PMCID: PMC2286724 DOI: 10.1110/ps.03503304] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.1] [Indexed: 10/26/2022]
Abstract
Using only hard-sphere repulsion, we investigated short polyalanyl chains for the presence of sterically imposed conformational constraints beyond the dipeptide level. We found that a central residue in a helical peptide cannot adopt dihedral angles from strand regions without encountering a steric collision. Consequently, an alpha-helical segment followed by a beta-strand segment must be connected by an intervening linker. This restriction was validated both by simulations and by seeking violations within proteins of known structure. In fact, no violations were found within an extensive database of high-resolution X-ray structures. Nature's exclusion of alpha-beta hybrid segments, fashioned from an alpha-helix adjoined to a beta-strand, is built into proteins at the covalent level. This straightforward conformational constraint has far-reaching consequences in organizing unfolded proteins and limiting the number of possible protein domains.
|
43
|
|
44
|
|
45
|
Abstract
Many single-domain proteins exhibit two-state folding kinetics, with folding rates that span more than six orders of magnitude. A quantity of much recent interest for such proteins is their contact order, the average separation in sequence between contacting residue pairs. Numerous studies have reached the surprising conclusion that contact order is well-correlated with the logarithm of the folding rate for these small, well-characterized molecules. Here, we investigate the physico-chemical basis for this finding by asking whether contact order is actually a composite number that measures the fraction of local secondary structure in the protein; viz. turns, helices, and hairpins. To pursue this question, we calculated the secondary structure content for 24 two-state proteins and obtained coefficients that predict their folding rates. The predicted rates correlate strongly with experimentally determined rates, comparable to the correlation with contact order. Further, these predicted folding rates are correlated strongly with contact order. Our results suggest that the folding rate of two-state proteins is a function of their local secondary structure content, consistent with the hierarchic model of protein folding. Accordingly, it should be possible to utilize secondary structure prediction methods to predict folding rates from sequence alone.
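Contact order, as defined above, is straightforward to compute once the contacting residue pairs are known. The sketch below takes the contact list as given; how contacts are identified (for example, by a distance cutoff) is not specified in the abstract and is left to the caller. The length-normalized variant is a widely used convention included here as an assumption, and both function names are hypothetical.

```python
def contact_order(contacts):
    """Average sequence separation |i - j| over contacting residue pairs.

    contacts is a list of (i, j) residue-index pairs.
    """
    if not contacts:
        raise ValueError("at least one contact is required")
    return sum(abs(i - j) for i, j in contacts) / len(contacts)


def relative_contact_order(contacts, chain_length):
    """Contact order divided by chain length (a common normalization)."""
    return contact_order(contacts) / chain_length
```

For instance, a chain whose contacts are mostly between residues close in sequence (turns, helices, hairpins) would score low, consistent with the abstract's suggestion that contact order reflects local secondary structure content.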
|
46
|
Abstract
RNABase is a unified database of all three-dimensional structures containing RNA deposited in either the Protein Data Bank (PDB) or Nucleic Acid Data Base (NDB). For each structure, RNABase contains a brief summary as well as annotation of conformational parameters, identification of possible model errors, Ramachandran-style conformational maps and classification of ribonucleotides into conformers. These same analyses can also be performed on structures submitted by users. To facilitate access, structures are automatically placed into a variety of functional and structural categories, including: ribozymes, pseudoknots, etc. RNABase can be freely accessed on the web at http://www.rnabase.org. We are committed to maintaining this database indefinitely.
|
47
|
Abstract
Most often, the unfolded state of peptides and proteins has been modeled as a statistical random coil. Here, we suggest an alternative model based on the presence of a significant, temperature-dependent conformational bias in the unfolded population. Conformational bias is suggested by our calculations [Proc. Natl. Acad. Sci. USA 96 (1999) 14258-14263], and it is found in recent studies of both proteins and peptides. The imposition of even a modest bias would transform our assessment of the folding problem.
|
48
|
A simple model for polyproline II structure in unfolded states of alanine-based peptides. Protein Sci 2002; 11:2437-55. [PMID: 12237465 PMCID: PMC2373714 DOI: 10.1110/ps.0217402] [Citation(s) in RCA: 115] [Impact Index Per Article: 5.2] [Received: 05/30/2002] [Revised: 07/16/2002] [Accepted: 07/17/2002] [Indexed: 10/27/2022]
Abstract
The striking similarity between observed circular dichroism spectra of nonprolyl homopolymers and that of regular left-handed polyproline II (P(II)) helices prompted Tiffany and Krimm to propose in 1968 that unordered peptides and unfolded proteins are built of P(II) segments linked by sharp bends. A large body of experimental evidence, accumulated over the past three decades, provides compelling support for the original hypothesis of Tiffany and Krimm. Of particular interest are the recent experiments of Shi et al., who find significant P(II) structure in a short unfolded alanine-based peptide. What is the physical basis for P(II) helices in peptide and protein unfolded states? The widely accepted view is that favorable chain-solvent hydrogen bonds lead to a preference for dynamical fluctuations about noncooperative P(II) helices in water. Is this preference simply a consequence of hydrogen bonding, or is it a manifestation of a more general trend for unfolded states, which are appropriately viewed as chains in a good solvent? The prevalence of closely packed interiors in folded proteins suggests that under conditions that favor folding, water, which is a better solvent for itself than for any polypeptide chain, expels the chain from its midst, thereby maximizing chain packing. Implicit in this view is a complementary idea: under conditions that favor unfolding, chain-solvent interactions are preferred and, in a so-called good solvent, chain packing density is minimized. In this work we show that minimization of chain packing density leads to preferred fluctuations for short polyalanyl chains around canonical, noncooperative P(II)-like conformations. Minimization of chain packing is modeled using a purely repulsive soft-core potential between polypeptide atoms. Details of chain-solvent interactions are ignored. Remarkably, the simple model captures the essential physics behind the preference of short unfolded alanine-based peptides for P(II) helices.
Our results are based on a detailed analysis of the potential energy landscape which determines the system's structural and thermodynamic preferences. We use the inherent structure formalism of Stillinger and Weber, according to which the energy landscape is partitioned into basins of attraction around local minima. We find that the landscape for the experimentally studied seven-residue alanine-based peptide is dominated by fluctuations about two noncooperative structures: the left-handed polyproline II helix and its symmetry mate.
|
49
|
Abstract
We report the identification and characterization of a novel cytokine-like gene family using structure-based methods to search for novel four-helix-bundle cytokines in genomics databases. There are four genes in this family, FAM3A, FAM3B, FAM3C, and FAM3D, each encoding a protein (224-235 amino acids) with a hydrophobic leader sequence. Northern analysis indicates that FAM3B is highly expressed in pancreas, FAM3D in placenta, and FAM3A and FAM3C in almost all tissues. Immunohistochemistry showed that FAM3A is expressed prominently in the vascular endothelium, particularly capillaries. We found that FAM3A and FAM3B protein were both localized to the islets of Langerhans of the endocrine pancreas. Recombinant FAM3B protein has delayed effects on beta-cell function, inhibiting basal insulin secretion from a beta-cell line in a dose-dependent manner.
|
50
|
Abstract
A sequence of seven alanine residues, too short to form an alpha-helix and whose side chains do not interact with each other, is a particularly simple model for testing the common description of denatured proteins as structureless random coils. The ³J(HNα) coupling constants of individual alanine residues have been measured from 2 to 56 °C by using isotopically labeled samples. The results display a thermal transition between different backbone conformations, which is confirmed by CD spectra. The NMR results suggest that polyproline II is the dominant conformation at 2 °C and the content of beta strand is increased by approximately 10% at 55 °C relative to that at 2 °C. The polyproline II conformation is consistent with recent studies of short alanine peptides, including structure prediction by ab initio quantum mechanics and solution structures for both a blocked alanine dipeptide and an alanine tripeptide. CD and other optical spectroscopies have found structure in longer "random coil" peptides and have implicated polyproline II, which is a major backbone conformation in residues within loop regions of protein structures. Our result suggests that the backbone conformational entropy in alanine peptides is considerably smaller than estimated by the random coil model. New thermodynamic data confirm this suggestion: the entropy loss on alanine helix formation is only 2.2 entropy units per residue.
|