1
Pesce F, Bremer A, Tesei G, Hopkins JB, Grace CR, Mittag T, Lindorff-Larsen K. Design of intrinsically disordered protein variants with diverse structural properties. SCIENCE ADVANCES 2024; 10:eadm9926. [PMID: 39196930 PMCID: PMC11352843 DOI: 10.1126/sciadv.adm9926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 06/07/2024] [Indexed: 08/30/2024]
Abstract
Intrinsically disordered proteins (IDPs) perform a broad range of functions in biology, suggesting that the ability to design IDPs could help expand the repertoire of proteins with novel functions. Computational design of IDPs with specific conformational properties has, however, been difficult because of their substantial dynamics and structural complexity. We describe a general algorithm for designing IDPs with specific structural properties. We demonstrate the power of the algorithm by generating variants of naturally occurring IDPs that differ in compaction, long-range contacts, and propensity to phase separate. We experimentally tested and validated our designs and analyzed the sequence features that determine conformations. We show how our results are captured by a machine learning model, enabling us to speed up the algorithm. Our work expands the toolbox for computational protein design and will facilitate the design of proteins whose functions exploit the many properties afforded by protein disorder.
Affiliation(s)
- Francesco Pesce
- Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Anne Bremer
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Giulio Tesei
- Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jesse B. Hopkins
- BioCAT, Department of Physics, Illinois Institute of Technology, Chicago, IL 60616, USA
- Christy R. Grace
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Tanja Mittag
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
2
Zambon A, Zecchina R, Tiana G. Structure of the space of folding protein sequences defined by large language models. Phys Biol 2024; 21:026002. [PMID: 38237200 DOI: 10.1088/1478-3975/ad205c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 01/18/2024] [Indexed: 02/01/2024]
Abstract
Proteins populate a manifold in the high-dimensional sequence space whose geometrical structure guides their natural evolution. Leveraging recently developed structure prediction tools based on transformer models, we first examine the protein sequence landscape as defined by an effective energy that is a proxy of sequence foldability. This landscape shares characteristics with optimization challenges encountered in machine learning and constraint satisfaction problems. Our analysis reveals that natural proteins predominantly reside in wide, flat minima within this energy landscape. To investigate further, we employ statistical mechanics algorithms specifically designed to explore regions with high local entropy in relatively flat landscapes. Our findings indicate that these specialized algorithms can identify valleys with higher entropy than those found using traditional methods such as Monte Carlo Markov Chains. In a proof-of-concept case, we find that these highly entropic minima exhibit significant similarities to natural sequences, especially in key sites and local entropy. Additionally, evaluations through Molecular Dynamics suggest that the stability of these sequences closely resembles that of natural proteins. Our tool combines advances in machine learning and statistical physics, providing new insights into the exploration of sequence landscapes where wide, flat minima coexist alongside a majority of narrower minima.
Affiliation(s)
- A Zambon
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano, Via Celoria 16, 20133 Milano, Italy
- R Zecchina
- Bocconi University, via Roentgen 1, 20136 Milano, Italy
- G Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano, Via Celoria 16, 20133 Milano, Italy
- INFN, Sezione di Milano, Via Celoria 16, 20133 Milano, Italy
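For entry 2, the "traditional" Monte Carlo Markov Chain baseline can be sketched as single-site Metropolis sampling in sequence space. This is a minimal illustration under stated assumptions, not the authors' high-local-entropy algorithm: the energy function, the target sequence and every parameter value below are hypothetical stand-ins for a foldability score obtained from a structure predictor.

```python
import math
import random

# One-letter codes of the 20 standard amino acids.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def metropolis_sequence_search(energy, length, steps, temperature, seed=0):
    """Single-site Metropolis Monte Carlo over sequences of fixed length.

    `energy` maps a sequence string to a float (lower is better).
    Returns the final state of the chain and its energy.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    e = energy("".join(seq))
    for _ in range(steps):
        i = rng.randrange(length)
        old = seq[i]
        seq[i] = rng.choice(ALPHABET)  # propose a random point mutation
        e_new = energy("".join(seq))
        delta = e_new - e
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e = e_new
        else:
            seq[i] = old  # rejected: restore the previous residue
    return "".join(seq), e


# Hypothetical toy energy: Hamming distance to an arbitrary target sequence.
TARGET = "WKVLF"


def toy_energy(seq):
    return sum(a != b for a, b in zip(seq, TARGET))


final_seq, final_e = metropolis_sequence_search(
    toy_energy, len(TARGET), steps=5000, temperature=0.01
)
print(final_seq, final_e)
```

At this low temperature uphill moves are essentially never accepted, so the toy chain is expected to settle into the single minimum; a rugged foldability landscape would instead trap such a chain in whichever minimum it reaches first, which is the limitation the local-entropy methods address.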
3
Tajana M, Trovato A, Tiana G. Key interaction patterns in proteins revealed by cluster expansion of the partition function. THE EUROPEAN PHYSICAL JOURNAL. E, SOFT MATTER 2022; 45:95. [PMID: 36447074 DOI: 10.1140/epje/s10189-022-00250-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 11/19/2022] [Indexed: 06/16/2023]
Abstract
The native conformation of structured proteins is stabilized by a complex network of interactions. We analyzed the elementary patterns that constitute such a network and ranked them according to their importance in shaping protein sequence design. To achieve this goal, we employed a cluster expansion of the partition function in the space of sequences and evaluated numerically the statistical importance of each cluster. An important feature of this procedure is that it is applied to a dense finite system. We found that the patterns that contribute most to the partition function are cycles with even numbers of nodes, while cliques are typically detrimental. Each cluster also contributes to the sequence entropy, which is a measure of the evolutionary designability of a fold. We compared the entropies associated with different interaction patterns to their abundances in the native structures of real proteins.
Affiliation(s)
- Matteo Tajana
- Department of Physics, Università degli Studi di Milano, Via Celoria 16, 20133, Milan, Italy
- Antonio Trovato
- Department of Physics and Astronomy "G. Galilei", Università degli Studi di Padova and INFN, Via Marzolo 8, 35121, Padova, Italy
- Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, Via Celoria 16, 20133, Milan, Italy.
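The last step of entry 3, comparing interaction patterns with their abundances in native structures, starts from counting those patterns in a residue contact graph. The sketch below counts triangles (the smallest clique) and 4-cycles (the smallest even cycle). It is only an assumed illustration of the counting step: the contact list is a hypothetical input, and the cluster-expansion weights themselves are not computed here.

```python
from itertools import combinations
from math import comb


def count_triangles_and_squares(n, contacts):
    """Count triangles (3-cliques) and 4-cycles in an undirected contact graph.

    n: number of residues, labelled 0..n-1.
    contacts: iterable of (i, j) residue pairs in contact.
    A 4-cycle is counted whether or not its diagonals are also contacts.
    """
    adj = [set() for _ in range(n)]
    for i, j in contacts:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)

    # Enumerate each triangle once via its ordered vertices i < j < k.
    triangles = sum(
        1
        for i in range(n)
        for j in adj[i]
        if j > i
        for k in adj[i] & adj[j]
        if k > j
    )

    # Each 4-cycle a-b-c-d has two diagonal pairs, (a, c) and (b, d); for each
    # diagonal pair the other two vertices form one pair of common neighbours.
    diagonal_hits = sum(
        comb(len(adj[u] & adj[v]), 2) for u, v in combinations(range(n), 2)
    )
    return triangles, diagonal_hits // 2


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = list(combinations(range(4), 2))  # complete graph on 4 residues
print(count_triangles_and_squares(4, square))
print(count_triangles_and_squares(4, k4))
```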
4
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Affiliation(s)
- Giulia Magi Meconi
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Ivan R Sasselli
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Jose N Onuchic
- Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
- Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
- Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
5
Miyazawa S. Boltzmann Machine Learning and Regularization Methods for Inferring Evolutionary Fields and Couplings From a Multiple Sequence Alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:328-342. [PMID: 32396099 DOI: 10.1109/tcbb.2020.2993232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The inverse Potts problem, inferring a Boltzmann distribution for homologous protein sequences from their single-site and pairwise amino acid frequencies, has recently attracted a great deal of attention in studies of protein structure and evolution. We study regularization and learning methods and how to tune regularization parameters to correctly infer interactions in Boltzmann machine learning. Using L2 regularization for fields and group L1 for couplings is shown to be very effective for sparse couplings in comparison with L2 and L1. Two regularization parameters are tuned to yield equal values for both the sample and ensemble averages of evolutionary energy. Both averages smoothly change and converge, but their learning profiles are very different between learning methods. The Adam method is modified to make the step size proportional to the gradient for sparse couplings and to use a soft-thresholding function for group L1. It is shown by first inferring interactions from protein sequences and then from Monte Carlo samples that the fields and couplings can be well recovered, but that recovering the pairwise correlations in the resolution of a total energy is harder for the natural proteins than for the protein-like sequences. The selective temperature for folding/structural constraints in protein evolution is also estimated.
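The group-L1 soft-thresholding mentioned in entry 5 can be illustrated by the standard proximal step for a group-lasso penalty, applied to the block of couplings J_ij(a, b) belonging to one residue pair. This is a sketch of the generic operator only, assuming each block is passed as a flat list; it does not reproduce the paper's modified Adam update, and the function name is hypothetical.

```python
import math


def group_soft_threshold(block, threshold):
    """Proximal (shrinkage) step for a group-L1 penalty on one coupling block.

    block: flat list of the coupling values J_ij(a, b) for one residue pair (i, j).
    threshold: step size times regularization strength, assumed non-negative.
    The block is shrunk toward zero as a whole and becomes exactly zero when
    its Euclidean (Frobenius) norm does not exceed the threshold, which is
    what lets group L1 switch off entire residue pairs.
    """
    norm = math.sqrt(sum(x * x for x in block))
    if norm <= threshold:
        return [0.0] * len(block)
    scale = 1.0 - threshold / norm
    return [scale * x for x in block]


print(group_soft_threshold([3.0, 4.0], 1.0))  # norm 5, so scaled by 0.8
print(group_soft_threshold([3.0, 4.0], 6.0))  # norm below threshold, so zeroed
```

Unlike element-wise L1, which zeroes individual amino acid pairs, the group form keeps or removes the whole q-by-q block at once, matching the sparsity pattern of residue-residue contacts.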
6
Zhao VY, Rodrigues JV, Lozovsky ER, Hartl DL, Shakhnovich EI. Switching an active site helix in dihydrofolate reductase reveals limits to subdomain modularity. Biophys J 2021; 120:4738-4750. [PMID: 34571014 PMCID: PMC8595743 DOI: 10.1016/j.bpj.2021.09.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Revised: 09/14/2021] [Accepted: 09/22/2021] [Indexed: 11/23/2022]
Abstract
To what degree are individual structural elements within proteins modular such that similar structures from unrelated proteins can be interchanged? We study subdomain modularity by creating 20 chimeras of an enzyme, Escherichia coli dihydrofolate reductase (DHFR), in which a catalytically important, 10-residue α-helical sequence is replaced by α-helical sequences from a diverse set of proteins. The chimeras stably fold but have a range of diminished thermal stabilities and catalytic activities. Evolutionary coupling analysis indicates that the residues of this α-helix are under selection pressure to maintain catalytic activity in DHFR. Reversion to phenylalanine at key position 31 was found to partially restore catalytic activity, which could be explained by evolutionary coupling values. We performed molecular dynamics simulations using replica exchange with solute tempering. Chimeras with low catalytic activity exhibit nonhelical conformations that block the binding site and disrupt the positioning of the catalytically essential residue D27. Simulation observables and in vitro measurements of thermal stability and substrate-binding affinity are strongly correlated. Several E. coli strains with chromosomally integrated chimeric DHFRs can grow, with growth rates that follow predictions from a kinetic flux model that depends on the intracellular abundance and catalytic activity of DHFR. Our findings show that although α-helices are not universally substitutable, the molecular and fitness effects of modular segments can be predicted by the biophysical compatibility of the replacement segment.
Affiliation(s)
- Victor Y Zhao
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- João V Rodrigues
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Elena R Lozovsky
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Daniel L Hartl
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts.
7
Crippa M, Andreghetti D, Capelli R, Tiana G. Evolution of frustrated and stabilising contacts in reconstructed ancient proteins. EUROPEAN BIOPHYSICS JOURNAL 2021; 50:699-712. [PMID: 33569610 PMCID: PMC8260555 DOI: 10.1007/s00249-021-01500-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2020] [Revised: 12/14/2020] [Accepted: 01/13/2021] [Indexed: 11/30/2022]
Abstract
Energetic properties of a protein are a major determinant of its evolutionary fitness. Using a reconstruction algorithm, dating the reconstructed proteins and calculating the interaction network between their amino acids through a coevolutionary approach, we studied how the interactions that stabilise 890 proteins, belonging to five families, evolved for billions of years. In particular, we focused our attention on the network of most strongly attractive contacts and on that of poorly optimised, frustrated contacts. Our results support the idea that the cluster of most attractive interactions extends its size along evolutionary time, but from the data, we cannot conclude that protein stability or that the degree of frustration tends always to decrease.
Affiliation(s)
- Martina Crippa
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Damiano Andreghetti
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Riccardo Capelli
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy.
8
The hydrophobic effect characterises the thermodynamic signature of amyloid fibril growth. PLoS Comput Biol 2020; 16:e1007767. [PMID: 32365068 PMCID: PMC7282669 DOI: 10.1371/journal.pcbi.1007767] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 12/20/2019] [Revised: 06/09/2020] [Accepted: 03/02/2020] [Indexed: 11/19/2022]
Abstract
Many proteins have the potential to aggregate into amyloid fibrils, protein polymers associated with a wide range of human disorders such as Alzheimer’s and Parkinson’s disease. The thermodynamic stability of amyloid fibrils, in contrast to that of folded proteins, is not well understood: the balance between entropic and enthalpic terms, including the chain entropy and the hydrophobic effect, is poorly characterised. Using a combination of theory, in vitro experiments, simulations of a coarse-grained protein model and meta-data analysis, we delineate the enthalpic and entropic contributions that dominate amyloid fibril elongation. Our prediction of a characteristic temperature-dependent enthalpic signature is confirmed by calorimetric experiments and a meta-analysis of published data. From these results we are able to define the necessary conditions to observe cold denaturation of amyloid fibrils. Overall, we show that amyloid fibril elongation is associated with a negative heat capacity, the magnitude of which correlates closely with the hydrophobic surface area that is buried upon fibril formation, highlighting the importance of hydrophobicity for fibril stability.
Most proteins fold in the cell into stable, compact structures. Nevertheless, many proteins also have the ability to stick together, forming long fibrillar structures that are associated with a wide range of human disorders including Alzheimer’s and Parkinson’s disease. The exact nature of the amyloid-causing stickiness is not well understood; nevertheless, amyloid fibrils show some very specific thermodynamic characteristics. Some fibrils even destabilise at low temperatures. In this work we translate hydrophobic theory previously used to model protein folding to fibril formation. We combine this theory with experimental measurements, simulations and meta-data analysis of different types of fibrils. This allowed us to unravel the nature of the stickiness in amyloid fibrils by observing the effect of temperature changes, specifically at low temperatures, on hydrophobicity.
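For entry 8, the link between a negative heat capacity of elongation and cold denaturation follows from standard thermodynamics if the heat capacity change $\Delta C_p$ is assumed constant over the temperature range. The relations below are that textbook sketch, written with a hypothetical reference temperature $T_r$; they are not claimed to be the authors' exact model.

```latex
\begin{aligned}
\Delta H(T) &= \Delta H(T_r) + \Delta C_p\,(T - T_r),\\
\Delta S(T) &= \Delta S(T_r) + \Delta C_p \ln\frac{T}{T_r},\\
\Delta G(T) &= \Delta H(T_r) - T\,\Delta S(T_r) + \Delta C_p\left[(T - T_r) - T\ln\frac{T}{T_r}\right],\\
\frac{\partial^2 \Delta G}{\partial T^2} &= -\frac{\Delta C_p}{T}.
\end{aligned}
```

With $\Delta C_p < 0$ the curvature is positive, so $\Delta G(T)$ is convex: if elongation is favourable ($\Delta G < 0$) at some temperature, $\Delta G$ can rise back above zero on cooling as well as on heating, which is the condition for cold denaturation of the fibril.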
9
Bianco V, Franzese G, Coluzza I. In Silico Evidence That Protein Unfolding is a Precursor of Protein Aggregation. Chemphyschem 2020; 21:377-384. [DOI: 10.1002/cphc.201900904] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/17/2019] [Revised: 11/01/2019] [Indexed: 11/08/2022]
Affiliation(s)
- Valentino Bianco
- Faculty of Chemistry, Chemical Physics Department, Universidad Complutense de Madrid, Plaza de las Ciencias Ciudad Universitaria Madrid 28040 Spain
- Giancarlo Franzese
- Secció de Física Estadística i Interdisciplinària-Departament de Física de la Matèria Condensada, Facultat de Física & Institute of Nanoscience and Nanotechnology (IN2UB) Universitat de Barcelona Martí i Franquès 1 08028 Barcelona Spain
- Ivan Coluzza
- CIC biomaGUNE Paseo Miramon 182 20014 San Sebastian Spain
- IKERBASQUE, Basque Foundation for Science 48013 Bilbao Spain
10
Zhou J, Panaitiu AE, Grigoryan G. A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc Natl Acad Sci U S A 2020; 117:1059-1068. [PMID: 31892539 PMCID: PMC6969538 DOI: 10.1073/pnas.1908723117] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Indexed: 12/23/2022]
Abstract
Current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found. Here, we propose a design framework: one based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions. We carry out extensive computational analyses and an experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.
Affiliation(s)
- Jianfu Zhou
- Department of Computer Science, Dartmouth College, Hanover, NH 03755
- Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, NH 03755;
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755
11
Bianco V, Alonso-Navarro M, Di Silvio D, Moya S, Cortajarena AL, Coluzza I. Proteins are Solitary! Pathways of Protein Folding and Aggregation in Protein Mixtures. J Phys Chem Lett 2019; 10:4800-4804. [PMID: 31373499 DOI: 10.1021/acs.jpclett.9b01753] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/10/2023]
Abstract
We present a computational and experimental study on the folding and aggregation in solutions of multiple protein mixtures at different concentrations. We show how in protein mixtures each component is capable of maintaining its folded state at densities greater than the one at which they would precipitate in single-species solutions. We demonstrate the generality of our observation over many different proteins using computer simulations capable of fully characterizing the cross-aggregation phase diagram of all the mixtures. Dynamic light scattering experiments were performed to evaluate the aggregation of two proteins, bovine serum albumin (BSA) and consensus tetratricopeptide repeat (CTPR), in solutions of one or both proteins. The experiments confirm our hypothesis and the simulations. These findings elucidate critical aspects of the cross-regulation of expression and aggregation of proteins exerted by the cell and on the evolutionary selection of folding and non-aggregating protein sequences, paving the way for new experimental tests.
Affiliation(s)
- Valentino Bianco
- Faculty of Chemistry, Chemical Physics Department, Universidad Complutense de Madrid, Plaza de las Ciencias, Ciudad Universitaria, Madrid 28040, Spain
- Sergio Moya
- CIC biomaGUNE, Paseo Miramon 182, 20014 San Sebastian, Spain
- Aitziber L Cortajarena
- CIC biomaGUNE, Paseo Miramon 182, 20014 San Sebastian, Spain
- IKERBASQUE, Basque Foundation for Science, 48013 Bilbao, Spain
- Ivan Coluzza
- CIC biomaGUNE, Paseo Miramon 182, 20014 San Sebastian, Spain
- IKERBASQUE, Basque Foundation for Science, 48013 Bilbao, Spain
12
Marchi J, Galpern EA, Espada R, Ferreiro DU, Walczak AM, Mora T. Size and structure of the sequence space of repeat proteins. PLoS Comput Biol 2019; 15:e1007282. [PMID: 31415557 PMCID: PMC6733475 DOI: 10.1371/journal.pcbi.1007282] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/11/2019] [Revised: 09/09/2019] [Accepted: 07/24/2019] [Indexed: 11/18/2022]
Abstract
The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family (the total number of sequences in that family) can be estimated using models of maximum entropy trained on multiple sequence alignments of naturally occurring amino acid sequences. We analyzed and calculated the size of three abundant repeat protein families, whose members are large proteins made of many repetitions of conserved portions of ∼30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes with a hierarchical structure, reminiscent of the frustrated energy landscapes of spin glasses in physics. This clustered structure indicates a multiplicity of subtypes within each family, and suggests new strategies for protein design.
Natural protein molecules are only a small subset of the possible strings of amino acids. This naturally raises the question of how many functional protein sequences theoretically exist, and how many have already been explored by nature. To help answer this question, we developed a statistical method to calculate the total potential number of protein sequences of a given family, focusing on three families of repeat proteins, which play important roles in a variety of cellular processes. The number of sequences that we compute is limited by functional interactions between the residues of the protein, as well as its evolutionary history. Applying techniques from the physics of disordered systems, we show that the space of sequences has a rugged structure, which could hinder their evolution. Individual proteins can be organised into distinct clusters corresponding to basins of attraction of the landscape, suggesting the existence of subfamilies within each family.
Affiliation(s)
- Jacopo Marchi
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
- Ezequiel A. Galpern
- Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina
- CONICET - Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Buenos Aires, Argentina
- Rocio Espada
- Laboratoire Gulliver, Ecole supérieure de physique et chimie industrielles (PSL University) and CNRS, 75005, Paris, France
- Diego U. Ferreiro
- Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina
- CONICET - Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Buenos Aires, Argentina
- Aleksandra M. Walczak
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
- Thierry Mora
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
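Entry 12 estimates the size of a family's coding space from the entropy of a maximum-entropy model, and notes that per-position conservation explains most of the reduction relative to random sequences. The sketch below computes only that conservation term, the entropy of an independent-site (profile) model, under stated assumptions: no pseudocounts, a 21-symbol alphabet (20 amino acids plus gap), and a hypothetical toy alignment. A maximum-entropy model that also matches pairwise frequencies must still match the single-site ones, so its entropy can only be lower; this value is therefore an upper bound, not the paper's estimate.

```python
import math
from collections import Counter


def independent_site_entropy(msa):
    """Entropy in nats of an independent-site (profile) model fitted to an MSA.

    Each column is modelled by its observed symbol frequencies (no
    pseudocounts) and the total entropy is the sum over columns; exp(S) is a
    rough count of the sequences the model effectively covers.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    n = len(msa)
    total = 0.0
    for col in range(length):
        counts = Counter(seq[col] for seq in msa)
        total -= sum((c / n) * math.log(c / n) for c in counts.values())
    return total


def random_sequence_entropy(length, q=21):
    """Entropy of uniformly random sequences over q symbols (20 amino acids + gap)."""
    return length * math.log(q)


msa = ["ACDE", "ACDQ", "GCDE"]  # hypothetical toy alignment
s_model = independent_site_entropy(msa)
s_random = random_sequence_entropy(len(msa[0]))
print(s_model, s_random, s_random - s_model)  # last value: reduction in nats
```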
13
Dijkstra M, Fokkink W, Heringa J, van Dijk E, Abeln S. The characteristics of molten globule states and folding pathways strongly depend on the sequence of a protein. Mol Phys 2018. [DOI: 10.1080/00268976.2018.1496290] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Affiliation(s)
- M.J.J. Dijkstra
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- W.J. Fokkink
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- J. Heringa
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- E. van Dijk
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- S. Abeln
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
14
Kinjo AR. Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families. J Theor Biol 2018; 443:18-27. [PMID: 29355538 DOI: 10.1016/j.jtbi.2018.01.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/27/2017] [Revised: 01/16/2018] [Accepted: 01/17/2018] [Indexed: 12/23/2022]
Abstract
In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects in precisely defining protein families, which are absent in conventional sequence models.
Affiliation(s)
- Akira R Kinjo
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan.
15
Selection originating from protein stability/foldability: Relationships between protein folding free energy, sequence ensemble, and fitness. J Theor Biol 2017; 433:21-38. [DOI: 10.1016/j.jtbi.2017.08.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/27/2017] [Revised: 07/27/2017] [Accepted: 08/21/2017] [Indexed: 11/19/2022]
16
Bianco V, Pagès-Gelabert N, Coluzza I, Franzese G. How the stability of a folded protein depends on interfacial water properties and residue-residue interactions. J Mol Liq 2017. [DOI: 10.1016/j.molliq.2017.08.026] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
17
Tian P, Best RB. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis. Biophys J 2017; 113:1719-1730. [PMID: 29045866 PMCID: PMC5647607 DOI: 10.1016/j.bpj.2017.08.039] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 04/24/2017] [Revised: 08/03/2017] [Accepted: 08/08/2017] [Indexed: 12/23/2022]
Abstract
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we quantitatively estimated the sequence capacity of 10 proteins with a variety of structures, using a statistical model based on residue-residue coevolution to capture the variation of sequences within the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding Avogadro's number. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein or, more precisely, with the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., the capacity normalized by the total number of possible sequences, is extremely small and strongly anticorrelated with protein length. Thus, although larger proteins may have more foldable sequences, those sequences will be much harder to find. Finally, we correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance.
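The contrast between absolute and relative sequence capacity is plain arithmetic once the size of sequence space, 20^L for a chain of L residues, is put on a log scale. The sketch below is only that arithmetic, not the coevolution-based estimator used in the study; the 35-residue length and the use of Avogadro's number as the foldable count are illustrative assumptions (the abstract states only that the WW-domain count exceeds it).

```python
import math

AMINO_ACIDS = 20  # size of the standard amino-acid alphabet


def log10_sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> float:
    """log10 of the number of possible sequences, alphabet ** length."""
    return length * math.log10(alphabet)


def log10_relative_capacity(log10_foldable: float, length: int) -> float:
    """log10 of (number of foldable sequences / number of possible sequences)."""
    return log10_foldable - log10_sequence_space(length)


# Illustrative inputs only: a hypothetical 35-residue domain whose count of
# foldable sequences is set equal to Avogadro's number.
AVOGADRO = 6.02214076e23
rel = log10_relative_capacity(math.log10(AVOGADRO), length=35)
print(f"log10 relative capacity = {rel:.1f}")
```

Holding the foldable count fixed, every added residue lowers the relative capacity by another log10(20), about 1.3 orders of magnitude, which is why even an astronomically large absolute count can correspond to a vanishingly small fraction of sequence space.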
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Robert B Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland.
18
Rahamim G, Amir D, Haas E. Simultaneous Determination of Two Subdomain Folding Rates Using the "Transfer-Quench" Method. Biophys J 2017; 112:1786-1796. [PMID: 28494950 DOI: 10.1016/j.bpj.2017.01.037] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/10/2016] [Revised: 12/21/2016] [Accepted: 01/06/2017] [Indexed: 11/29/2022]
Abstract
The investigation of the mechanism of protein folding is complicated by the context dependence of the rates of intramolecular contact formation. Methods based on site-specific labeling and ultrafast spectroscopic detection of fluorescence signals were developed to monitor the rates of individual subdomain folding transitions in situ, in the context of the whole molecule. However, each site-specific labeling modification might affect the folding rates of neighboring structural elements and thus limit the ability to resolve fine differences in the rates at which these elements fold. It is therefore highly desirable to study the folding rates of two or more neighboring subdomain structures using a single mutant, which facilitates resolving the order and interdependence of such steps. Here, we report the development of the "Transfer-Quench" method for measuring the rates of formation of two structural elements using a single triple-labeled mutant. This method is based on Förster resonance energy transfer combined with fluorescence quenching. We placed the donor and acceptor at the ends of a loop, and a quencher on an α-helical element involved in the node that forms the loop. The folding of the triple-labeled mutant is monitored by the acceptor emission. The formation of a nonlocal contact (loop closure) increases the time-dependent acceptor emission, whereas the closure of the labeled helix turn reduces it. The method was applied to study the folding mechanism of a common model protein, the B domain of staphylococcal protein A. Only natural amino acids were used as probes, so possible structural perturbations were minimized. Tyr and Trp residues served as donor and acceptor at the ends of a long loop between helices I and II, and a Cys residue served as a quencher of the acceptor. We found that the closure of the loop (segment 14-33) occurs with the same rate constant as the nucleation of helix HII (segment 33-29), in line with the nucleation-condensation model.
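The loop-closure readout rests on the standard Förster relation, in which transfer efficiency falls off with the sixth power of the donor-acceptor distance, E = 1/(1 + (r/R0)^6). The sketch below evaluates only that relation; the Förster radius of 10 Å is a hypothetical value chosen for illustration, not the Tyr-Trp parameter used in the study, and the Cys quenching step is not modeled.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Förster transfer efficiency for donor-acceptor distance r.

    r and r0 (the Förster radius, where efficiency is 50%) must share units.
    """
    if r <= 0 or r0 <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


R0_ANGSTROM = 10.0  # hypothetical Förster radius, for illustration only

# Loop closure shortens the donor-acceptor distance, raising transfer
# efficiency and hence the sensitized acceptor emission.
for r in (20.0, 10.0, 5.0):
    print(f"r = {r:4.1f} Å  E = {fret_efficiency(r, R0_ANGSTROM):.3f}")
```

The steep sixth-power dependence is what makes the acceptor signal a sensitive switch for loop closure: halving the distance around R0 moves the efficiency from near 0.5 to near 1.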
Affiliation(s)
- Gil Rahamim
- The Goodman Faculty of Life Sciences Bar Ilan University, Ramat Gan, Israel
- Dan Amir
- The Goodman Faculty of Life Sciences Bar Ilan University, Ramat Gan, Israel
- Elisha Haas
- The Goodman Faculty of Life Sciences Bar Ilan University, Ramat Gan, Israel.
19
Coluzza I. Computational protein design: a review. J Phys Condens Matter 2017; 29:143001. [PMID: 28140371 DOI: 10.1088/1361-648x/aa5c76] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 06/06/2023]
Abstract
Proteins are one of the most versatile modular assembling systems in nature. Experimentally, more than 110 000 protein structures have been identified and more are deposited every day in the Protein Data Bank. Such an enormous structural variety is to a first approximation controlled by the sequence of amino acids along the peptide chain of each protein. Understanding how the structural and functional properties of the target can be encoded in this sequence is the main objective of protein design. Unfortunately, rational protein design remains one of the major challenges across the disciplines of biology, physics and chemistry. The implications of solving this problem are enormous and branch into materials science, drug design, evolution and even cryptography. For instance, in the field of drug design an effective computational method to design protein-based ligands for biological targets such as viruses, bacteria or tumour cells, could give a significant boost to the development of new therapies with reduced side effects. In materials science, self-assembly is a highly desired property and soon artificial proteins could represent a new class of designable self-assembling materials. The scope of this review is to describe the state of the art in computational protein design methods and give the reader an outline of what developments could be expected in the near future.
Affiliation(s)
- Ivan Coluzza
- Computational Physics, Faculty of Physics, University of Vienna, Vienna, Austria
20
Williams M. Statistical physics of the symmetric group. Phys Rev E 2017; 95:042126. [PMID: 28505735 DOI: 10.1103/physreve.95.042126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2017] [Indexed: 06/07/2023]
Abstract
Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.
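A brute-force toy makes the idea of a nonfactorizable permutation state space concrete. Here the energy is assumed, for illustration only, to be ε times the number of elements out of their correct position; this noninteracting Hamiltonian is a hypothetical choice, not necessarily the one analyzed in the paper, and exhaustive enumeration over n! states is feasible only for small n.

```python
import itertools
import math


def misplaced(perm: tuple) -> int:
    """Number of elements not at their 'correct' index (identity = correct order)."""
    return sum(1 for i, p in enumerate(perm) if p != i)


def partition_function(n: int, beta: float, eps: float = 1.0) -> float:
    """Z = sum over all n! permutations of exp(-beta * eps * misplaced)."""
    return sum(math.exp(-beta * eps * misplaced(p))
               for p in itertools.permutations(range(n)))


def mean_correct(n: int, beta: float, eps: float = 1.0) -> float:
    """Thermal average of the number of correctly placed elements."""
    z = weighted = 0.0
    for p in itertools.permutations(range(n)):
        w = math.exp(-beta * eps * misplaced(p))
        z += w
        weighted += w * (n - misplaced(p))
    return weighted / z


for beta in (0.0, 1.0, 5.0):
    print(beta, round(mean_correct(6, beta), 3))
```

At β = 0 every permutation is equally likely and the average number of correctly placed elements is 1 for any n (the fixed-point count of a random permutation); as β grows it approaches n. Unlike n independent two-state sites, no permutation has exactly one misplaced element, so Z does not factor into per-site terms.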
Affiliation(s)
- Mobolaji Williams
- Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
21
Choi JM, Gilson AI, Shakhnovich EI. Graph's Topology and Free Energy of a Spin Model on the Graph. Phys Rev Lett 2017; 118:088302. [PMID: 28282198 PMCID: PMC5668130 DOI: 10.1103/physrevlett.118.088302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/29/2016] [Indexed: 06/06/2023]
Abstract
In this Letter we investigate a direct relationship between a graph's topology and the free energy of a spin system on the graph. We develop a method of separating topological and energetic contributions to the free energy, and find that considering the topology is sufficient to qualitatively compare the free energies of different graph systems at high temperature, even when the energetics are not fully known. This method was applied to the metal lattice system with defects, and we found that it partially explains why point defects are more stable than high-dimensional defects. Given the energetics, we can even quantitatively compare free energies of different graph structures via a closed form of linear graph contributions. The closed form is applied to predict the sequence-space free energy of lattice proteins, which is a key factor determining the designability of a protein structure.
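For very small graphs, the free energy of a spin model can be computed exactly by enumeration, which gives a concrete baseline for comparing graph topologies. This sketch assumes a ferromagnetic Ising model with uniform coupling J and k_B = 1; it does not implement the authors' separation of topological and energetic contributions.

```python
import itertools
import math


def ising_free_energy(n_nodes: int, edges: list, temperature: float,
                      coupling: float = 1.0) -> float:
    """Exact F = -T ln Z for E = -J * sum over edges (i, j) of s_i * s_j (k_B = 1)."""
    beta = 1.0 / temperature
    z = 0.0
    for spins in itertools.product((-1, 1), repeat=n_nodes):
        energy = -coupling * sum(spins[i] * spins[j] for i, j in edges)
        z += math.exp(-beta * energy)
    return -temperature * math.log(z)


path = [(0, 1), (1, 2)]              # open chain of three sites
triangle = [(0, 1), (1, 2), (2, 0)]  # same sites, closed into a loop

T = 5.0  # high temperature relative to J
print("path    ", ising_free_energy(3, path, T))
print("triangle", ising_free_energy(3, triangle, T))
```

For a tree such as the path, the exact result is Z = 2^N cosh(βJ)^(number of edges), which the enumeration reproduces; closing the loop adds an edge and lowers F at this temperature.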
Affiliation(s)
- Jeong-Mo Choi
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, USA
- Amy I Gilson
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, USA
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, USA
22
Bianchi E, Capone B, Coluzza I, Rovigatti L, van Oostrum PDJ. Limiting the valence: advancements and new perspectives on patchy colloids, soft functionalized nanoparticles and biomolecules. Phys Chem Chem Phys 2017; 19:19847-19868. [DOI: 10.1039/c7cp03149a] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Indexed: 12/12/2022]
Abstract
Artistic representation of limited-valence units consisting of a soft core (in blue) and a small number of flexible bonding patches (in orange).
Affiliation(s)
- Emanuela Bianchi
- Faculty of Physics, University of Vienna, A-1090 Vienna, Austria
- Institute for Theoretical Physics
- Barbara Capone
- Faculty of Physics, University of Vienna, A-1090 Vienna, Austria
- Dipartimento di Scienze
- Ivan Coluzza
- Faculty of Physics, University of Vienna, A-1090 Vienna, Austria
- Lorenzo Rovigatti
- Faculty of Physics, University of Vienna, A-1090 Vienna, Austria
- Rudolf Peierls Centre for Theoretical Physics
- Peter D. J. van Oostrum
- Department of Nanobiotechnology, Institute for Biologically Inspired Materials, University of Natural Resources and Life Sciences, A-1190 Vienna, Austria
23
Contini A, Tiana G. A many-body term improves the accuracy of effective potentials based on protein coevolutionary data. J Chem Phys 2016; 143:025103. [PMID: 26178131 DOI: 10.1063/1.4926665] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/15/2022]
Abstract
The study of correlated mutations in alignments of homologous proteins has proved successful not only in predicting their native conformation but also in developing a two-body effective potential between pairs of amino acids. In the present work, we extend the effective potential by introducing a many-body term based on the same theoretical framework, which makes use of the principle of maximum entropy. The extended potential performs better than the two-body one in predicting the energetic effect of 308 mutations in 14 proteins (including membrane proteins). The average values of the parameters of the many-body term correlate with the degree of hydrophobicity of the corresponding residues, suggesting that this term partly reflects the effect of the solvent.
Affiliation(s)
- A Contini
- Department of Physics, Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy
- G Tiana
- Department of Physics, Università degli Studi di Milano, and INFN, via Celoria 16, 20133 Milano, Italy
24
van Dijk E, Varilly P, Knowles TPJ, Frenkel D, Abeln S. Consistent Treatment of Hydrophobicity in Protein Lattice Models Accounts for Cold Denaturation. Phys Rev Lett 2016; 116:078101. [PMID: 26943560 DOI: 10.1103/physrevlett.116.078101] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 07/24/2015] [Indexed: 05/04/2023]
Abstract
The hydrophobic effect stabilizes the native structure of proteins by minimizing the unfavorable interactions between hydrophobic residues and water through the formation of a hydrophobic core. Here, we include the entropic and enthalpic contributions of the hydrophobic effect explicitly in an implicit solvent model. This allows us to capture two important effects: a length-scale dependence and a temperature dependence for the solvation of a hydrophobic particle. This consistent treatment of the hydrophobic effect explains cold denaturation and heat capacity measurements of solvated proteins.
Affiliation(s)
- Erik van Dijk
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit, De Boelelaan 1081A, 1081 HV Amsterdam, Netherlands
- Patrick Varilly
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Tuomas P J Knowles
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Daan Frenkel
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Sanne Abeln
- Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit, De Boelelaan 1081A, 1081 HV Amsterdam, Netherlands
25
Huang YM, Banerjee S, Crone DE, Schenkelberg CD, Pitman DJ, Buck PM, Bystroff C. Toward Computationally Designed Self-Reporting Biosensors Using Leave-One-Out Green Fluorescent Protein. Biochemistry 2015; 54:6263-73. [PMID: 26397806 DOI: 10.1021/acs.biochem.5b00786] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Leave-one-out green fluorescent protein (LOOn-GFP) is a circularly permuted and truncated GFP lacking the nth β-strand element. LOO7-GFP derived from the wild-type sequence (LOO7-WT) folds and reconstitutes fluorescence upon addition of β-strand 7 (S7) as an exogenous peptide. Computational protein design may be used to modify the sequence of LOO7-GFP to fit a different peptide sequence while retaining the reconstitution activity. Here we present a computationally designed leave-one-out GFP in which wild-type strand 7 has been replaced by a 12-residue peptide (HA) from the H5 antigenic region of the Thailand strain of H5N1 influenza virus hemagglutinin. The DEEdesign software was used to generate a sequence library with mutations at 13 positions around the peptide, coding for approximately 3 × 10^5 sequence combinations. The library was coexpressed with the HA peptide in E. coli, and colonies were screened for in vivo fluorescence. Glowing colonies were sequenced, and one (LOO7-HA4) with 7 mutations was purified and characterized. LOO7-HA4 folds, fluoresces in vivo and in vitro, and binds HA. However, binding results in a decrease in fluorescence instead of the expected increase, caused by the peptide-induced dissociation of a novel, glowing oligomeric complex instead of the reconstitution of the native structure. Efforts to improve binding and recover reconstitution using in vitro evolution produced colonies that glowed brighter and matured faster. Two of these were characterized. One lost all affinity for the HA peptide but glowed more brightly in the unbound oligomeric state. The other gained affinity for the HA peptide but still did not reconstitute the fully folded state. Despite the failure to fold completely, peptide binding by the computational design was observed and was improved by directed evolution. The ratio of HA to S7 binding increased from 0.0 for the wild-type sequence (no binding) to 0.01 after computational design (weak binding) and to 0.48 (comparable binding) after in vitro evolution. The novel oligomeric state is composed of an open barrel.
Affiliation(s)
- Yao-Ming Huang
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States
26
Affiliation(s)
- Ivan Coluzza
- Department of Computational Physics, Faculty of Physics, University of Vienna, Vienna, Austria
27
Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Proc Natl Acad Sci U S A 2014; 111:12408-13. [PMID: 25114242 DOI: 10.1073/pnas.1413575111] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Indexed: 11/18/2022]
Abstract
The energy landscape used by nature over evolutionary timescales to select protein sequences is essentially the same as the one that folds these sequences into functioning proteins, sometimes in microseconds. We show that genomic data, physical coarse-grained free energy functions, and family-specific information theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. One such characteristic is the effective temperature T(sel) at which these foldable sequences have been selected in sequence space by evolution. T(sel) quantifies the importance of folded-state energetics and structural specificity for molecular evolution. Across all protein families studied, our estimates for T(sel) are well below the experimental folding temperatures, indicating that the energy landscapes of natural foldable proteins are strongly funneled toward the native state.
28
Lui S, Tiana G. The network of stabilizing contacts in proteins studied by coevolutionary data. J Chem Phys 2014; 139:155103. [PMID: 24160546 DOI: 10.1063/1.4826096] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]
Abstract
The primary structure of proteins, that is, their sequence, represents one of the most abundant sets of experimental data concerning biomolecules. The study of correlations in families of coevolving proteins by means of an inverse Ising-model approach makes it possible to obtain information on their native conformation. Following up on a recent development along this line, we optimize the algorithm to calculate effective energies between residues, validating the approach both by back-calculating interaction energies in a model system and by predicting the free energies associated with mutations in real systems. Using these effective energies, we study the network of interactions that stabilizes the native conformation of some well-studied proteins, showing that it displays properties different from those of the associated contact network.
Affiliation(s)
- Sara Lui
- Department of Physics, Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy
29
Grigoryan G. Absolute free energies of biomolecules from unperturbed ensembles. J Comput Chem 2013; 34:2726-41. [PMID: 24132787 DOI: 10.1002/jcc.23448] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/03/2013] [Revised: 07/11/2013] [Accepted: 08/31/2013] [Indexed: 01/31/2023]
Abstract
Computing the absolute free energy of a macromolecule's structural state, F, is a challenging problem of high relevance. This study presents a method that computes F using only information from an unperturbed simulation of the macromolecule in the relevant conformational state, ensemble, and environment. Absolute free energies produced by this method, dubbed Valuation of Local Configuration Integral with Dynamics (VALOCIDY), enable comparison of alternative states. For example, comparing explicitly solvated and vaporous states of amino acid side-chain analogs produces solvation free energies in good agreement with experiments. Also, comparisons between alternative conformational states of model heptapeptides (including the unfolded state) produce free energy differences in agreement with data from μs molecular-dynamics simulations and experimental propensities. The potential of using VALOCIDY in computational protein design is explored via a small design problem of stabilizing a β-turn structure. When VALOCIDY-based estimation of folding free energy is used as the design metric, the resulting sequence folds into the desired structure within the atomistic force field used in design. The VALOCIDY-based approach also recognizes the distinct status of the native sequence regardless of minor details of the starting template structure, in stark contrast with a traditional fixed-backbone approach.
Affiliation(s)
- Gevorg Grigoryan
- Department of Computer Science and Department of Biology, Dartmouth College, Hanover, New Hampshire, 03755
30
Mach P, Koehl P. Capturing protein sequence-structure specificity using computational sequence design. Proteins 2013; 81:1556-70. [DOI: 10.1002/prot.24307] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/04/2012] [Revised: 03/28/2013] [Accepted: 04/11/2013] [Indexed: 02/05/2023]
Affiliation(s)
- Paul Mach
- Department of Applied Mathematics, Genome Center, University of California, Davis, California 95616
- Patrice Koehl
- Department of Computer Science, Genome Center, University of California, Davis, California 95616
31
Orevi T, Rahamim G, Hazan G, Amir D, Haas E. The loop hypothesis: contribution of early formed specific non-local interactions to the determination of protein folding pathways. Biophys Rev 2013; 5:85-98. [PMID: 28510159 DOI: 10.1007/s12551-013-0113-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 01/15/2013] [Accepted: 03/01/2013] [Indexed: 12/12/2022]
Abstract
The extremely fast and efficient folding transition (in seconds) of globular proteins led to the search for unifying principles embedded in the physics of folding polypeptides. Most of the proposed mechanisms highlight the role of local interactions that stabilize secondary structure elements or a folding nucleus as the starting point of the folding pathway, i.e., a "bottom-up" mechanism. Non-local interactions were assumed either to stabilize the nucleus or to drive the later steps of coalescence of the secondary structure elements. An alternative, "up-down" mechanism was proposed, in which folding is assumed to start with the formation of only a few non-local interactions that close long loops at the initiation of folding. The possible biological advantage of this mechanism, the "loop hypothesis," is that the hydrophobic collapse is associated with ordered compaction, which reduces the chance of degradation and misfolding. In the present review, the experiments, simulations and theoretical considerations that directly or indirectly support this mechanism are summarized. It is argued that experiments monitoring the time-dependent formation of specifically targeted, early-formed subdomain structural elements, either long loops or secondary structure elements, are necessary. This can be achieved by the time-resolved FRET-based "double kinetics" method in combination with mutational studies. In addition, efforts to improve the time resolution at the initiation of folding should be extended down to the sub-microsecond regime in order to design experiments that would resolve the classes of proteins that fold first by local or by non-local interactions.
Affiliation(s)
- Tomer Orevi
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900
- Gil Rahamim
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900
- Gershon Hazan
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900
- Dan Amir
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900
- Elisha Haas
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900.
32
Matthies MC, Bienert S, Torda AE. Dynamics in Sequence Space for RNA Secondary Structure Design. J Chem Theory Comput 2012; 8:3663-70. [PMID: 26593011 DOI: 10.1021/ct300267j] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Abstract
We have implemented a method for the design of RNA sequences that should fold to arbitrary secondary structures. A popular energy model allows one to take the derivative with respect to composition, which can then be interpreted as a force and used for Newtonian dynamics in sequence space. Combined with a negative design term, one can rapidly sample sequences which are compatible with a desired secondary structure via simulated annealing. Results for 360 structures were compared with those from another nucleic acid design program using measures such as the probability of the target structure and an ensemble-weighted distance to the target structure.
Affiliation(s)
- Marco C Matthies
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Stefan Bienert
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany; Biozentrum, University of Basel, Klingelbergstr. 50/70, 4056 Basel, Switzerland
- Andrew E Torda
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
33
Perez-Aguilar JM, Saven JG. Computational design of membrane proteins. Structure 2012; 20:5-14. [PMID: 22244752 DOI: 10.1016/j.str.2011.12.003] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/22/2011] [Revised: 12/21/2011] [Accepted: 12/21/2011] [Indexed: 11/26/2022]
Abstract
Membrane proteins are involved in a wide variety of cellular processes, and are typically part of the first interaction a cell has with extracellular molecules. As a result, these proteins comprise a majority of known drug targets. Membrane proteins are among the most difficult proteins to obtain and characterize, and a structure-based understanding of their properties can be difficult to elucidate. Notwithstanding, the design of membrane proteins can provide stringent tests of our understanding of these crucial biological systems, as well as introduce novel or targeted functionalities. Computational design methods have been particularly helpful in addressing these issues, and this review discusses recent studies that tailor membrane proteins to display specific structures or functions and examines how redesigned membrane proteins are being used to facilitate structural and functional studies.
34
Radhakrishna M, Sharma S, Kumar SK. Enhanced Wang Landau sampling of adsorbed protein conformations. J Chem Phys 2012; 136:114114. [DOI: 10.1063/1.3691669] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 01/27/2023]
35
Tiana G, Sutto L. Equilibrium properties of realistic random heteropolymers and their relevance for globular and naturally unfolded proteins. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 84:061910. [PMID: 22304119 DOI: 10.1103/physreve.84.061910] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/05/2011] [Indexed: 05/31/2023]
Abstract
Random heteropolymers do not display the typical equilibrium properties of globular proteins, but they are the starting point for understanding the physics of proteins and, in particular, for describing their non-native states. So far, they have been studied with mean-field models in the thermodynamic limit or with computer simulations of very small chains on a lattice. After describing a self-adjusting parallel-tempering technique that efficiently samples the low-energy states of frustrated systems without the need to tune the system-dependent parameters of the algorithm, we apply it to random heteropolymers moving in continuous space. We show that if the mean interaction between monomers is negative, the usual description through the random-energy model is nearly correct, provided that it is extended to account for noncompact conformations. If the mean interaction is positive, this simple description breaks down and the system behaves more like an Ising spin glass. The former case is a model for the denatured state of globular proteins and the latter for naturally unfolded proteins, whose equilibrium properties are therefore qualitatively different.
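The self-adjusting scheme in this work tunes the temperature ladder automatically, and that part is not reproduced here. The sketch shows only the standard replica-exchange acceptance rule that parallel tempering rests on, min(1, exp[(β_i - β_j)(E_i - E_j)]), applied to a fixed, hypothetical ladder of inverse temperatures and hypothetical replica energies.

```python
import math
import random


def swap_probability(beta_i: float, beta_j: float, e_i: float, e_j: float) -> float:
    """Metropolis acceptance for exchanging the configurations held at beta_i and beta_j.

    e_i is the energy of the configuration currently at beta_i (likewise e_j).
    """
    delta = (beta_i - beta_j) * (e_i - e_j)
    return math.exp(min(0.0, delta))


def attempt_neighbor_swaps(betas: list, energies: list, rng: random.Random) -> list:
    """One sweep of swap attempts between adjacent temperatures.

    Returns the energies reordered to follow the swapped configurations;
    a real simulation would swap the configurations themselves.
    """
    e = list(energies)
    for k in range(len(betas) - 1):
        if rng.random() < swap_probability(betas[k], betas[k + 1], e[k], e[k + 1]):
            e[k], e[k + 1] = e[k + 1], e[k]
    return e


rng = random.Random(0)
betas = [1.0, 0.8, 0.6, 0.4]          # hypothetical inverse-temperature ladder
energies = [-3.0, -5.0, -2.0, -1.0]   # hypothetical current replica energies
print(attempt_neighbor_swaps(betas, energies, rng))
```

A swap that moves a lower-energy configuration to the colder replica is always accepted, while the reverse move is accepted with a Boltzmann-like probability; this is what lets frustrated systems escape low-energy traps through the hot replicas.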
Affiliation(s)
- G Tiana
- Department of Physics, Università degli Studi di Milano and Istituto Nazionale di Fisica Nucleare, via Celoria 16, I-20133 Milano, Italy
36
Xu F, Zahid S, Silva T, Nanda V. Computational design of a collagen A:B:C-type heterotrimer. J Am Chem Soc 2011; 133:15260-3. [PMID: 21902217 DOI: 10.1021/ja205597g] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/29/2022]
Abstract
We have successfully designed an A:B:C collagen peptide heterotrimer using an automated computational approach. The algorithm maximizes the energy gap between the target and competing misfolded states while enforcing a minimum target stability. Circular dichroism (CD) measurements confirm that all three peptides are required to form a stable, structured triple helix. This study highlights the power of automated computational design, providing model systems to probe the biophysics of collagen assembly and developing general methods for the design of fibrous proteins.
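The design criterion stated here, maximizing the energy gap between the target state and competing misfolded states while enforcing a minimum target stability, can be written as a constrained selection. The sketch applies it to a hypothetical, pre-scored list of candidates; the names, energies, and threshold are invented for illustration, and the study's sequence search and energy function are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    e_target: float                  # energy of the desired A:B:C state (lower = more stable)
    e_competitors: Sequence[float]   # energies of competing misfolded states


def energy_gap(c: Candidate) -> float:
    """Gap between the most stable competitor and the target (larger = more specific)."""
    return min(c.e_competitors) - c.e_target


def select_design(cands: Sequence[Candidate], max_target_energy: float) -> Optional[Candidate]:
    """Maximize the gap among candidates whose target state is stable enough."""
    feasible = [c for c in cands if c.e_target <= max_target_energy]
    return max(feasible, key=energy_gap) if feasible else None


# Hypothetical scores, for illustration only.
candidates = [
    Candidate("s1", e_target=-10.0, e_competitors=(-9.5, -8.0)),
    Candidate("s2", e_target=-6.0, e_competitors=(-1.0,)),
    Candidate("s3", e_target=-9.0, e_competitors=(-5.0, -6.0)),
]
best = select_design(candidates, max_target_energy=-8.0)
print(best.name if best else "no design meets the stability threshold")
```

In this toy, s2 has the largest gap but fails the stability threshold, and s1 is the most stable but barely specific; enforcing both criteria together selects s3.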
Affiliation(s)
- Fei Xu
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ and the Center for Advanced Biotechnology and Medicine, Piscataway, New Jersey 08854, United States
37
Abstract
The ability to engineer novel proteins using the principles of molecular structure and energetics is a stringent test of our basic understanding of how proteins fold and maintain structure. The design of protein self-assembly has the potential to impact many fields of biology from molecular recognition to cell signaling to biomaterials. Most progress in computational design of protein self-assembly has focused on α-helical systems, exploring ways to concurrently optimize the stability and specificity of a target state. Applying these methods to collagen self-assembly is very challenging, due to fundamental differences in folding and structure of α- versus triple-helices. Here, we explore various computational methods for designing stable and specific oligomeric systems, with a focus on α-helix and collagen self-assembly.
38
Biswas P, Bhattacherjee A. Role of foldability and stability in designing real protein sequences. Phys Chem Chem Phys 2011; 13:9223-31. [DOI: 10.1039/c0cp02973d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
39
Pham TT, Dünweg B, Prakash JR. Collapse Dynamics of Copolymers in a Poor Solvent: Influence of Hydrodynamic Interactions and Chain Sequence. Macromolecules 2010. [DOI: 10.1021/ma101806n] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Tri Thanh Pham
- Department of Chemical Engineering, Monash University, VIC-3800, Melbourne, Australia
- Max Planck Institute for Polymer Research, Ackermannweg 10, D-55128 Mainz, Germany
- Burkhard Dünweg
- Department of Chemical Engineering, Monash University, VIC-3800, Melbourne, Australia
- Max Planck Institute for Polymer Research, Ackermannweg 10, D-55128 Mainz, Germany
- J. Ravi Prakash
- Department of Chemical Engineering, Monash University, VIC-3800, Melbourne, Australia
40
De novo self-assembling collagen heterotrimers using explicit positive and negative design. Biochemistry 2010; 49:2307-16. [PMID: 20170197 DOI: 10.1021/bi902077d] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
Abstract
We sought to computationally design model collagen peptides that specifically associate as heterotrimers. Computational design has been successfully applied to the creation of new protein folds and functions. Despite the high abundance of collagen and its key role in numerous biological processes, fibrous proteins have received little attention as computational design targets. Collagens are composed of three polypeptide chains that wind into triple helices. We developed a discrete computational model to design heterotrimer-forming collagen-like peptides. Stability and specificity of oligomerization were concurrently targeted using a combined positive and negative design approach. The sequences of three 30-residue peptides, A, B, and C, were optimized to favor charge-pair interactions in an ABC heterotrimer, while disfavoring the 26 competing oligomers (e.g., AAA, ABB, BCA). Peptides were synthesized and characterized for thermal stability and triple-helical structure by circular dichroism and NMR. A unique A:B:C-type species was not achieved. Negative design was partially successful, with only A + B and B + C competing mixtures formed. Analysis of computed versus experimental stabilities helps to clarify the role of electrostatics and secondary-structure propensities in determining collagen stability and provides important insight into how subsequent designs can be improved.
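The figure of 26 competing oligomers is consistent with treating the three chain positions of the staggered triple helix (leading, middle, and lagging) as distinguishable: three peptides then give 3^3 = 27 ordered trimers, one of which is the ABC target. The sketch below enumerates these states and applies a positive/negative design test of the kind described here, a stability threshold on the target plus a minimum energy gap to the best competitor. The helper names, thresholds, and energies are hypothetical placeholders, not the scoring function used in the paper.

```python
from itertools import product

PEPTIDES = "ABC"  # the three designed peptides
TARGET = "ABC"    # leading, middle, lagging chain of the desired heterotrimer


def all_trimers(peptides=PEPTIDES):
    """Every ordered trimer (leading, middle, lagging chain) built from the peptides."""
    return ["".join(chains) for chains in product(peptides, repeat=3)]


def energy_gap(energies, target=TARGET):
    """Energy of the best (lowest-energy) competitor minus the target energy."""
    competitors = [e for state, e in energies.items() if state != target]
    return min(competitors) - energies[target]


def acceptable(energies, target=TARGET, max_target_energy=-10.0, min_gap=2.0):
    """Hypothetical criterion: positive design (the target is stable enough) plus
    negative design (every competitor lies at least min_gap above the target)."""
    return energies[target] <= max_target_energy and energy_gap(energies, target) >= min_gap


states = all_trimers()
print(len(states), len([s for s in states if s != TARGET]))  # 27 26

# Placeholder energies: every competitor at -8.0, the target at -12.0.
energies = {s: -8.0 for s in states}
energies[TARGET] = -12.0
print(energy_gap(energies), acceptable(energies))  # 4.0 True
```

Because the gap is taken against the single best competitor, lowering the energy of any one misfolded state (for example BCA) is enough to fail the negative-design test, which mirrors the partial success with competing A + B and B + C mixtures reported above.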
41
Abstract
Progress in understanding protein folding makes it possible to simulate, in atomic detail, the evolution of amino-acid sequences that fold to a given native conformation. A particularly attractive example is the HIV-1 protease, a main target of therapies against AIDS, which under drug pressure can develop resistance within a few months of the start of therapy. By comparing simulations of the evolution of the protease with the corresponding proteomic data, one can approximately determine the evolutionary pressure under which the enzyme has evolved and, as a consequence, map out the energy landscape of the HIV-1 protease in sequence space. Several families of sequences are found to fold to the native conformation of the enzyme. Each family is characterized by a different set of highly conserved ("hot") amino acids that play a critical role in the folding and stability of the protease. The virus has two main ways to move from one family to another: (a) in a single generation, through concerted mutations of the hot amino acids, a highly unlikely event; or (b) through a folding path (if one exists), again a very improbable event. In fact, the number of generations the virus needs to change its sequence stepwise from one family to another is astronomically large. These results point to the "hot" segments of the protease as promising targets for a nonconventional inhibition strategy that is unlikely to create resistance.
Affiliation(s)
- G Tiana
- Department of Physics, University of Milano and INFN, via Celoria 16, 20133 Milano, Italy.
42
Bhattacherjee A, Biswas P. Combinatorial design of protein sequences with applications to lattice and real proteins. J Chem Phys 2009; 131:125101. [DOI: 10.1063/1.3236519] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
43
Babor M, Kortemme T. Multi-constraint computational design suggests that native sequences of germline antibody H3 loops are nearly optimal for conformational flexibility. Proteins 2009; 75:846-58. [PMID: 19194863 DOI: 10.1002/prot.22293] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 11/07/2022]
Abstract
The germline antibody repertoire, although limited in size, has to recognize a far larger number of potential antigens. The ability of a single antibody to bind multiple ligands due to conformational flexibility in the antigen-binding site can significantly enlarge the repertoire. Among the six complementarity determining regions (CDRs) that generally comprise the binding site, the CDR H3 loop is particularly variable. Computational protein design studies showed that predicted low energy sequences compatible with a given backbone structure often have considerable similarity to the corresponding native sequences of naturally occurring proteins, indicating that native protein sequences are close to optimal for their structures. Here, we go a step further to determine whether conformational flexibility, believed to play a key functional role in germline antibodies, is also central in shaping their native sequence. In particular, we use a multi-constraint computational design strategy, along with the Rosetta scoring function, to propose that the native sequences of CDR H3 loops from germline antibodies are nearly optimal for conformational flexibility. Moreover, we find that antibody maturation may lead to sequences with a higher degree of optimization for a single conformation, while disfavoring sequences that are intrinsically flexible. In addition, this computational strategy allows us to predict mutations in the CDR H3 loop to stabilize the antigen-bound conformation, a computational mimic of affinity maturation, that may increase antigen binding affinity by preorganizing the antigen binding loop. In vivo affinity maturation data are consistent with our predictions. The method described here can be useful to design antibodies with higher selectivity and affinity by reducing conformational diversity.
Affiliation(s)
- Mariana Babor
- California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California 94158-2330, USA
44
Evaluating and optimizing computational protein design force fields using fixed composition-based negative design. Proc Natl Acad Sci U S A 2008; 105:12242-7. [PMID: 18708527 DOI: 10.1073/pnas.0805858105] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
An accurate force field is essential to computational protein design and protein fold prediction studies. Proper force field tuning is problematic, however, due in part to the incomplete modeling of the unfolded state. Here, we evaluate and optimize a protein design force field by constraining the amino acid composition of the designed sequences to that of a well behaved model protein. According to the random energy model, unfolded state energies are dependent only on amino acid composition and not the specific arrangement of amino acids. Therefore, energy discrepancies between computational predictions and experimental results, for sequences of identical composition, can be directly attributed to flaws in the force field's ability to properly account for folded state sequence energies. This aspect of fixed composition design allows for force field optimization by focusing solely on the interactions in the folded state. Several rounds of fixed composition optimization of the 56-residue beta1 domain of protein G yielded force field parameters with significantly greater predictive power: Optimized sequences exhibited higher wild-type sequence identity in critical regions of the structure, and the wild-type sequence showed an improved Z-score. Experimental studies revealed a designed 24-fold mutant to be stably folded with a melting temperature similar to that of the wild-type protein. Sequence designs using engrailed homeodomain as a scaffold produced similar results, suggesting the tuned force field parameters were not specific to protein G.
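Under the random energy model, sequences of identical composition share the same unfolded-state energy distribution, so a sequence's fit to a structure can be summarized by a Z-score that compares its threaded energy with those of composition-preserving shuffles. A minimal sketch, assuming a hypothetical two-letter (H/P) contact energy and a hypothetical contact list; it shows the bookkeeping only, is not the force field or decoy set used in the paper, and may differ from the paper's Z-score definition.

```python
import random
import statistics

# Hypothetical contact energies for a two-letter alphabet: only H-H contacts count.
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def threaded_energy(seq, contacts):
    """Folded-state energy of seq on a fixed structure given as residue contact pairs."""
    return sum(PAIR_ENERGY[(seq[i], seq[j])] for i, j in contacts)


def z_score(seq, contacts, n_decoys=1000, seed=0):
    """Z = (E_seq - mean E_decoy) / std E_decoy, where every decoy is a random
    shuffle of seq and therefore has exactly the same composition."""
    rng = random.Random(seed)
    letters = list(seq)
    decoy_energies = []
    for _ in range(n_decoys):
        rng.shuffle(letters)
        decoy_energies.append(threaded_energy(letters, contacts))
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    return (threaded_energy(seq, contacts) - mu) / sigma


# Hypothetical 8-residue structure described by its non-bonded contacts.
contacts = [(0, 3), (0, 5), (2, 7), (4, 7)]
seq = "HPHHHHPH"  # places H at every contacting position
print(threaded_energy(seq, contacts))  # -4.0
print(round(z_score(seq, contacts), 2))  # expected to be negative: seq beats its shuffles
```

Because every decoy has the same composition as the sequence being scored, composition-dependent (unfolded-state) contributions cancel in the comparison, and only the folded-state arrangement affects Z; this is the property the fixed-composition strategy relies on.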
45
Zeldovich KB, Shakhnovich EI. Understanding protein evolution: from protein physics to Darwinian selection. Annu Rev Phys Chem 2008; 59:105-27. [PMID: 17937598 DOI: 10.1146/annurev.physchem.58.032806.104449] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/09/2022]
Abstract
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
Affiliation(s)
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
46
Shakhnovich BE, Shakhnovich EI. Improvisation in evolution of genes and genomes: whose structure is it anyway? Curr Opin Struct Biol 2008; 18:375-81. [PMID: 18487041 DOI: 10.1016/j.sbi.2008.02.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/10/2008] [Accepted: 02/13/2008] [Indexed: 01/31/2023]
Abstract
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that make it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales, from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides the first robust answers to these questions.
Affiliation(s)
- Boris E Shakhnovich
- Department of Molecular and Cellular Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, United States
47
Broglia RA, Levy Y, Tiana G. HIV-1 protease folding and the design of drugs which do not create resistance. Curr Opin Struct Biol 2008; 18:60-6. [DOI: 10.1016/j.sbi.2007.10.004] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 10/08/2007] [Accepted: 10/29/2007] [Indexed: 10/22/2022]
48
Amatori A, Tiana G, Ferkinghoff-Borg J, Broglia RA. Denatured state is critical in determining the properties of model proteins designed on different folds. Proteins 2008; 70:1047-55. [PMID: 17847099 DOI: 10.1002/prot.21599] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
The thermodynamics of proteins designed on three common folds (SH3, chymotrypsin inhibitor 2 [CI2], and protein G) is studied with a simplified Cα model and compared with the thermodynamics of proteins designed on randomly generated folds. The model allows one to design sequences that fold to within a dRMSD of 1.2 to 4.2 Å from the crystallographic native conformation and to study properties that are hard to measure experimentally. The denatured state of all of them is found to be not random but, to different extents, partially structured. It is more structured for SH3 and protein G, giving rise to lower stability but more efficient folding kinetics than for CI2 and, even more so, than for the randomly generated folds. Consequently, the features of the unfolded state seem to be as important as those of the native state in determining the thermodynamic properties of these proteins.
Affiliation(s)
- A Amatori
- Department of Physics, University of Milano and INFN, 20133 Milano, Italy
49
Biswas P, Zou J, Saven JG. Statistical theory for protein ensembles with designed energy landscapes. J Chem Phys 2005; 123:154908. [PMID: 16252973 DOI: 10.1063/1.2062047] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Combinatorial protein libraries provide a promising route to investigate the determinants and features of protein folding and to identify novel folding amino acid sequences. A library of sequences based on a pool of different monomer types is screened for folding molecules, consistent with a particular foldability criterion. The number of sequences grows exponentially with the length of the polymer, making both experimental and computational tabulations of sequences infeasible. Herein a statistical theory is extended to specify the properties of sequences having particular values of global energetic quantities that specify their energy landscape. The theory yields the site-specific monomer probabilities. A foldability criterion is derived that characterizes the properties of sequences by quantifying the energetic separation of the target state from low-energy states in the unfolded ensemble and the fluctuations of the energies in the unfolded state ensemble. For a simple lattice model of proteins, excellent agreement is observed between the theory and the results of exact enumeration. The theory may be used to provide a quantitative framework for the design and interpretation of combinatorial experiments.
Affiliation(s)
- Parbati Biswas
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
50
Yang JY, Yu ZG, Anh V. Correlations between designability and various structural characteristics of protein lattice models. J Chem Phys 2007; 126:195101. [PMID: 17523837 DOI: 10.1063/1.2737042] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
Using six lattice types (4 x 4, 5 x 5, and 6 x 6 square lattices; a 3 x 3 x 3 cubic lattice; and 2+3+4+3+2 and 4+5+6+5+4 triangular lattices), three alphabet sizes (HP, HNUP, and 20 letters), and two energy functions, the designability of protein structures is calculated from random sampling of structures and common biased sampling (CBS) of protein sequence space. Three quantities defined to elucidate designability, namely stability (average energy gap), foldability, and the partnum of the structure, are then calculated. The authors find that, whatever the lattice type, alphabet size, and energy function, highly designable (preferred) structures emerge. In all cases considered, local interactions reduce degeneracy and increase designability. Designability is sensitive to the lattice type, alphabet size, energy function, and the method used to sample sequence space. Compared with random sampling, both CBS and Metropolis Monte Carlo sampling yield higher designability. The correlation coefficients between designability, stability, and foldability are mostly larger than 0.5, indicating strong correlations among them. The correlation between designability and partnum is weaker, because partnum is independent of the energy. The results are useful for practical applications of the designability principle, such as predicting protein tertiary structure.
Affiliation(s)
- Jian-Yi Yang
- School of Mathematics and Computing Science, Xiangtan University, Hunan 411105, China
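Designability counts, for each structure, the sequences that adopt it as their unique lowest-energy conformation. For the smallest case that can be enumerated exhaustively, maximally compact 9-residue chains on a 3 x 3 square lattice with the HP alphabet, the count can be sketched as follows. The lattice size, the HP energy (-1 per non-bonded H-H contact), the exhaustive sequence enumeration, and the choice to identify structures by their contact maps are assumptions for illustration; the study above uses larger lattices, other alphabets, and sampled sequence spaces.

```python
from collections import Counter
from itertools import product

N = 3  # side of the square lattice; chains fill all N*N sites (compact structures)


def neighbours(site):
    r, c = site
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < N and 0 <= c + dc < N:
            yield (r + dr, c + dc)


def compact_paths():
    """All directed self-avoiding walks that visit every lattice site exactly once."""
    paths = []

    def extend(path, seen):
        if len(path) == N * N:
            paths.append(tuple(path))
            return
        for nxt in neighbours(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                path.append(nxt)
                extend(path, seen)
                path.pop()
                seen.remove(nxt)

    for start in product(range(N), repeat=2):
        extend([start], {start})
    return paths


def contact_map(path):
    """Non-bonded residue pairs (i, j), with j > i + 1, that occupy adjacent sites."""
    index = {site: i for i, site in enumerate(path)}
    return frozenset(
        (i, index[nb])
        for i, site in enumerate(path)
        for nb in neighbours(site)
        if index[nb] > i + 1
    )


def designability():
    """Map each distinct contact map to the number of HP sequences for which
    it is the unique minimum-energy structure (-1 per H-H contact)."""
    structures = sorted({contact_map(p) for p in compact_paths()}, key=sorted)
    counts = Counter()
    for seq in product("HP", repeat=N * N):
        energies = [-sum(seq[i] == "H" and seq[j] == "H" for i, j in s) for s in structures]
        best = min(energies)
        if energies.count(best) == 1:  # degenerate ground states are not counted
            counts[structures[energies.index(best)]] += 1
    return structures, counts


structures, counts = designability()
print(len(compact_paths()))  # 40 directed compact walks on the 3 x 3 lattice
for s in structures:
    print(sorted(s), counts[s])
```

On this lattice every compact chain has exactly four non-bonded contacts (12 lattice edges minus 8 chain bonds), so structures differ only in which residue pairs are in contact; designability then reflects how many sequences single out one contact pattern unambiguously.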