1
|
Listov D, Goverde CA, Correia BE, Fleishman SJ. Opportunities and challenges in design and optimization of protein function. Nat Rev Mol Cell Biol 2024; 25:639-653. [PMID: 38565617 PMCID: PMC7616297 DOI: 10.1038/s41580-024-00718-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
The field of protein design has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application, but recent strategies that combine structure-based and sequence-based calculations, as well as machine learning tools, have dramatically improved protein engineering and design. In this Review, we discuss how these methods have enabled the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their increased reliability, computational design methods have been applied to improve therapeutics and enzymes for green chemistry and have generated vaccine antigens, antivirals and drug-delivery nano-vehicles. Moreover, the high success of design methods reflects an increased understanding of basic rules that govern the relationships among protein sequence, structure and function. However, de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging but necessary next step if we are to realize our objective of generating new-to-nature activities.
Collapse
Affiliation(s)
- Dina Listov
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
| | - Casper A Goverde
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Bruno E Correia
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
| | - Sarel Jacob Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel.
| |
Collapse
|
2
|
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Collapse
Affiliation(s)
- Giulia Magi Meconi
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
| | - Ivan R Sasselli
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
| | | | - Jose N Onuchic
- Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
| | - Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
- Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
| |
Collapse
|
3
|
Das S, Lange M, Cacciuto A. Designing active colloidal folders. J Chem Phys 2022; 156:094901. [DOI: 10.1063/5.0081071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Can active forces be exploited to drive the consistent collapse of an active polymer into a folded structure? In this paper, we introduce and perform numerical simulations of a simple model of active colloidal folders and show that a judicious inclusion of active forces into a stiff colloidal chain can generate designable and reconfigurable two-dimensional folded structures. The key feature is to organize the forces perpendicular to the chain backbone according to specific patterns (sequences). We characterize the physical properties of this model and perform, using a number of numerical techniques, an in-depth statistical analysis of structure and dynamics of the emerging conformations. We discovered a number of interesting features, including the existence of a direct correspondence between the sequence of the active forces and the structure of folded conformations, and we discover the existence of an ensemble of highly mobile compact structures capable of moving from conformation to conformation. Finally, akin to protein design problems, we discuss a method that is capable of designing specific target folds by sampling over sequences of active forces.
Collapse
Affiliation(s)
- S. Das
- Department of Chemistry, Columbia University, 3000 Broadway, New York, New York 10027, USA
- Department of Polymer Science and Engineering, University of Massachusetts Amherst, 120 Governors Drive, Amherst, Massachusetts 01003, USA
| | - M. Lange
- Department of Chemistry, Columbia University, 3000 Broadway, New York, New York 10027, USA
| | - A. Cacciuto
- Department of Chemistry, Columbia University, 3000 Broadway, New York, New York 10027, USA
| |
Collapse
|
4
|
Saikia B, Gogoi CR, Rahman A, Baruah A. Identification of an optimal foldability criterion to design misfolding resistant protein. J Chem Phys 2021; 155:144102. [PMID: 34654294 DOI: 10.1063/5.0057533] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
Proteins achieve their functional, active, and operative three dimensional native structures by overcoming the possibility of being trapped in non-native energy minima present in the energy landscape. The enormous and intricate interactions that play an important role in protein folding also determine the stability of the proteins. The large number of stabilizing/destabilizing interactions makes proteins to be only marginally stable as compared to the other competing structures. Therefore, there are some possibilities that they become trapped in the non-native conformations and thus get misfolded. These misfolded proteins lead to several debilitating diseases. This work performs a comparative study of some existing foldability criteria in the computational design of misfold resistant protein sequences based on self-consistent mean field theory. The foldability criteria selected for this study are Ef, Δ, and Φ that are commonly used in protein design procedures to determine the most efficient foldability criterion for the design of misfolding resistant proteins. The results suggest that the foldability criterion Δ is significantly better in designing a funnel energy landscape stabilizing the target state. The results also suggest that inclusion of negative design features is important for designing misfolding resistant proteins, but more information about the non-native conformations in terms of Φ leads to worse results compared to even simple positive design. The sequences designed using Δ show better resistance to misfolding in the Monte Carlo simulations performed in the study.
Collapse
Affiliation(s)
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
| | - Chimi Rekha Gogoi
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
| | - Aziza Rahman
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
| | - Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
| |
Collapse
|
5
|
Banerjee A, Pal K, Mitra P. An Evolutionary Profile Guided Greedy Parallel Replica-Exchange Monte Carlo Search Algorithm for Rapid Convergence in Protein Design. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:489-499. [PMID: 31329126 DOI: 10.1109/tcbb.2019.2928809] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Protein design, also known as the inverse protein folding problem, is the identification of a protein sequence that folds into a target protein structure. Protein design is proved as an NP-hard problem. While researchers are working on designing heuristics with an emphasis on new scoring functions, we propose a replica-exchange Monte Carlo (REMC) search algorithm that ensures faster convergence using a greedy strategy. Using biological insights, we construct an evolutionary profile to encode the amino acid variability in different positions of the target protein from its structural homologs. The evolutionary profile guides the REMC search, and the greedy approach confirms appreciable exploration and exploitation of the sequence-structure fitness surface. We allow termination of a simulation trajectory once stagnant situation is detected. A series of sequence and structure level validations establish the goodness of our design. On a benchmark dataset, our algorithm reports an average root-mean-square deviation of 1.21Å between the target and the design proteins when modeled with an existing protein folding software. Besides, our algorithm assures 6.16 times overall speedup. In Molecular Dynamics simulations, we observe that four out of selected five design proteins report better to comparable stability to the corresponding target proteins.
Collapse
|
6
|
Tian P, Best RB. Exploring the sequence fitness landscape of a bridge between protein folds. PLoS Comput Biol 2020; 16:e1008285. [PMID: 33048928 PMCID: PMC7553338 DOI: 10.1371/journal.pcbi.1008285] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022] Open
Abstract
Most foldable protein sequences adopt only a single native fold. Recent protein design studies have, however, created protein sequences which fold into different structures apon changes of environment, or single point mutation, the best characterized example being the switch between the folds of the GA and GB binding domains of streptococcal protein G. To obtain further insight into the design of sequences which can switch folds, we have used a computational model for the fitness landscape of a single fold, built from the observed sequence variation of protein homologues. We have recently shown that such coevolutionary models can be used to design novel foldable sequences. By appropriately combining two of these models to describe the joint fitness landscape of GA and GB, we are able to describe the propensity of a given sequence for each of the two folds. We have successfully tested the combined model against the known series of designed GA/GB hybrids. Using Monte Carlo simulations on this landscape, we are able to identify pathways of mutations connecting the two folds. In the absence of a requirement for domain stability, the most frequent paths go via sequences in which neither domain is stably folded, reminiscent of the propensity for certain intrinsically disordered proteins to fold into different structures according to context. Even if the folded state is required to be stable, we find that there is nonetheless still a wide range of sequences which are close to the transition region and therefore likely fold switches, consistent with recent estimates that fold switching may be more widespread than had been thought.
Collapse
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
| | - Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
| |
Collapse
|
7
|
The Marginal Stability of Proteins: How the Jiggling and Wiggling of Atoms is Connected to Neutral Evolution. J Mol Evol 2020; 88:424-426. [DOI: 10.1007/s00239-020-09940-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Accepted: 03/19/2020] [Indexed: 01/29/2023]
|
8
|
Marchi J, Galpern EA, Espada R, Ferreiro DU, Walczak AM, Mora T. Size and structure of the sequence space of repeat proteins. PLoS Comput Biol 2019; 15:e1007282. [PMID: 31415557 PMCID: PMC6733475 DOI: 10.1371/journal.pcbi.1007282] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2019] [Revised: 09/09/2019] [Accepted: 07/24/2019] [Indexed: 11/18/2022] Open
Abstract
The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family—the total number of sequences in that family—can be estimated using models of maximum entropy trained on multiple sequence alignments of naturally occuring amino acid sequences. We analyzed and calculated the size of three abundant repeat proteins families, whose members are large proteins made of many repetitions of conserved portions of ∼30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes with a hierarchical structure, reminiscent of fustrated energy landscapes of spin glass in physics. This clustered structure indicates a multiplicity of subtypes within each family, and suggests new strategies for protein design. Natural protein molecules are only a small subset of the possible strings of amino acids. This naturally calls the question of how many protein sequences theoretically exist that are functional, and how many have already been explored by nature. To help answer this question, we developed a statistical method to calculate the total potential number of protein sequences of a given family, focusing on three families of repeat proteins, which play important roles in a variety of cellular processes. The number of sequences that we compute is limited by functional interactions between the residues of the protein, as well as its evolutionary history. Applying techniques from the physics of disordered systems, we show that the space of sequences has a rugged structure, which could hinder their evolution. Individual proteins can be organised into distinct clusters corresponding to basins of attraction of the landscape, suggesting the existence of subfamilies within each family.
Collapse
Affiliation(s)
- Jacopo Marchi
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
| | - Ezequiel A. Galpern
- Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina
- CONICET - Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Buenos Aires, Argentina
| | - Rocio Espada
- Laboratoire Gulliver, Ecole supérieure de physique et chimie industrielles (PSL University) and CNRS, 75005, Paris, France
| | - Diego U. Ferreiro
- Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina
- CONICET - Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Buenos Aires, Argentina
| | - Aleksandra M. Walczak
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
- * E-mail: (AMW); (TM)
| | - Thierry Mora
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
- * E-mail: (AMW); (TM)
| |
Collapse
|
9
|
Posfai A, Zhou J, Plotkin JB, Kinney JB, McCandlish DM. Selection for Protein Stability Enriches for Epistatic Interactions. Genes (Basel) 2018; 9:E423. [PMID: 30134605 PMCID: PMC6162820 DOI: 10.3390/genes9090423] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2018] [Revised: 07/30/2018] [Accepted: 08/14/2018] [Indexed: 12/15/2022] Open
Abstract
A now classical argument for the marginal thermodynamic stability of proteins explains the distribution of observed protein stabilities as a consequence of an entropic pull in protein sequence space. In particular, most sequences that are sufficiently stable to fold will have stabilities near the folding threshold. Here, we extend this argument to consider its predictions for epistatic interactions for the effects of mutations on the free energy of folding. Although there is abundant evidence to indicate that the effects of mutations on the free energy of folding are nearly additive and conserved over evolutionary time, we show that these observations are compatible with the hypothesis that a non-additive contribution to the folding free energy is essential for observed proteins to maintain their native structure. In particular, through both simulations and analytical results, we show that even very small departures from additivity are sufficient to drive this effect.
Collapse
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
| | - Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
| | - Joshua B Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.
| | - Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
| |
Collapse
|
10
|
Cocco S, Feinauer C, Figliuzzi M, Monasson R, Weigt M. Inverse statistical physics of protein sequences: a key issues review. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2018; 81:032601. [PMID: 29120346 DOI: 10.1088/1361-6633/aa9965] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
Collapse
Affiliation(s)
- Simona Cocco
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure-UMR 8549, CNRS and PSL Research, Sorbonne Universités UPMC, Paris, France
| | | | | | | | | |
Collapse
|
11
|
Abstract
The field of disordered systems in statistical physics provides many simple models in which the competing influences of thermal and nonthermal disorder lead to new phases and nontrivial thermal behavior of order parameters. In this paper, we add a model to the subject by considering a disordered system where the state space consists of various orderings of a list. As in spin glasses, the disorder of such "permutation glasses" arises from a parameter in the Hamiltonian being drawn from a distribution of possible values, thus allowing nominally "incorrect orderings" to have lower energies than "correct orderings" in the space of permutations. We analyze a Gaussian, uniform, and symmetric Bernoulli distribution of energy costs, and, by employing Jensen's inequality, derive a simple condition requiring the permutation glass to always transition to the correctly ordered state at a temperature lower than that of the nondisordered system, provided that this correctly ordered state is accessible. We in turn find that in order for the correctly ordered state to be accessible, the probability that an incorrectly ordered component is energetically favored must be less than the inverse of the number of components in the system. We show that all of these results are consistent with a replica symmetric ansatz of the system. We conclude by arguing that there is no distinct permutation glass phase for the simplest model considered here and by discussing how to extend the analysis to more complex Hamiltonians capable of novel phase behavior and replica symmetry breaking. Finally, we outline an apparent correspondence between the presented system and a discrete-energy-level fermion gas. In all, the investigation introduces a class of exactly soluble models into statistical mechanics and provides a fertile ground to investigate statistical models of disorder.
Collapse
Affiliation(s)
- Mobolaji Williams
- Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
| |
Collapse
|
12
|
Williams PD, Pollock DD, Goldstein RA. Functionality and the Evolution of Marginal Stability in Proteins: Inferences from Lattice Simulations. Evol Bioinform Online 2017. [DOI: 10.1177/117693430600200013] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
It has been known for some time that many proteins are marginally stable. This has inspired several explanations. Having noted that the functionality of many enzymes is correlated with subunit motion, flexibility, or general disorder, some have suggested that marginally stable proteins should have an evolutionary advantage over proteins of differing stability. Others have suggested that stability and functionality are contradictory qualities, and that selection for both criteria results in marginally stable proteins, optimised to satisfy the competing design pressures. While these explanations are plausible, recent research simulating the evolution of model proteins has shown that selection for stability, ignoring any aspects of functionality, can result in marginally stable proteins because of the underlying makeup of protein sequence-space. We extend this research by simulating the evolution of proteins, using a computational protein model that equates functionality with binding and catalysis. In the model, marginal stability is not required for ligand-binding functionality and we observe no competing design pressures. The resulting proteins are marginally stable, again demonstrating that neutral evolution is sufficient for explaining marginal stability in observed proteins.
Collapse
Affiliation(s)
- Paul D. Williams
- Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
| | - David D. Pollock
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA
| | - Richard A. Goldstein
- Mathematical Biology, National Institute for Medical Sciences, The Ridgeway, Mill Hill, London MW7 1AA, UK
| |
Collapse
|
13
|
Tian P, Best RB. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis. Biophys J 2017; 113:1719-1730. [PMID: 29045866 PMCID: PMC5647607 DOI: 10.1016/j.bpj.2017.08.039] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2017] [Revised: 08/03/2017] [Accepted: 08/08/2017] [Indexed: 12/23/2022] Open
Abstract
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance.
Collapse
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
| | - Robert B Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland.
| |
Collapse
|
14
|
Towards designing new nano-scale protein architectures. Essays Biochem 2017; 60:315-324. [PMID: 27903819 DOI: 10.1042/ebc20160018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2016] [Revised: 08/11/2016] [Accepted: 08/18/2016] [Indexed: 11/17/2022]
Abstract
The complexity of designed bionano-scale architectures is rapidly increasing mainly due to the expanding field of DNA-origami technology and accurate protein design approaches. The major advantage offered by polypeptide nanostructures compared with most other polymers resides in their highly programmable complexity. Proteins allow in vivo formation of well-defined structures with a precise spatial arrangement of functional groups, providing extremely versatile nano-scale scaffolds. Extending beyond existing proteins that perform a wide range of functions in biological systems, it became possible in the last few decades to engineer and predict properties of completely novel protein folds, opening the field of protein nanostructure design. This review offers an overview on rational and computational design approaches focusing on the main achievements of novel protein nanostructure design.
Collapse
|
15
|
Williams M. Statistical physics of the symmetric group. Phys Rev E 2017; 95:042126. [PMID: 28505735 DOI: 10.1103/physreve.95.042126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2017] [Indexed: 06/07/2023]
Abstract
Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.
Collapse
Affiliation(s)
- Mobolaji Williams
- Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
| |
Collapse
|
16
|
Computational protein design with backbone plasticity. Biochem Soc Trans 2016; 44:1523-1529. [PMID: 27911735 PMCID: PMC5264498 DOI: 10.1042/bst20160155] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2016] [Revised: 08/01/2016] [Accepted: 08/03/2016] [Indexed: 11/17/2022]
Abstract
The computational algorithms used in the design of artificial proteins have become increasingly sophisticated in recent years, producing a series of remarkable successes. The most dramatic of these is the de novo design of artificial enzymes. The majority of these designs have reused naturally occurring protein structures as ‘scaffolds’ onto which novel functionality can be grafted without having to redesign the backbone structure. The incorporation of backbone flexibility into protein design is a much more computationally challenging problem due to the greatly increased search space, but promises to remove the limitations of reusing natural protein scaffolds. In this review, we outline the principles of computational protein design methods and discuss recent efforts to consider backbone plasticity in the design process.
Collapse
|
17
|
Berezovsky IN, Guarnera E, Zheng Z. Basic units of protein structure, folding, and function. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2016; 128:85-99. [PMID: 27697476 DOI: 10.1016/j.pbiomolbio.2016.09.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/29/2016] [Revised: 09/05/2016] [Accepted: 09/26/2016] [Indexed: 10/20/2022]
Abstract
Study of the hierarchy of domain structure with alternative sets of domains and analysis of discontinuous domains, consisting of remote segments of the polypeptide chain, raised a question about the minimal structural unit of the protein domain. The hypothesis on the decisive role of the polypeptide backbone in determining the elementary units of globular proteins have led to the discovery of closed loops. It is reviewed here how closed loops form the loop-n-lock structure of proteins, providing the foundation for stability and designability of protein folds/domain and underlying their co-translational folding. Simplified protein sequences are considered here with the aim to explore the basic principles that presumably dominated the folding and stability of proteins in the early stages of structural evolution. Elementary functional loops (EFLs), closed loops with one or few catalytic residues, are, in turn, units of the protein function. They are apparent descendants of the prebiotic ring-like peptides, which gave rise to the first functional folds/domains being fused in the beginning of the evolution of protein structure. It is also shown how evolutionary relations between protein functional superfamilies and folds delineated with the help of EFLs can contribute to establishing the rules for design of desired enzymatic functions. Generalized descriptors of the elementary functions are proposed to be used as basic units in the future computational design.
Collapse
Affiliation(s)
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore.
| | - Enrico Guarnera
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
| | - Zejun Zheng
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
| |
Collapse
|
18
|
Movahedi M, Zare-Mirakabad F, Arab SS. Evaluating the accuracy of protein design using native secondary sub-structures. BMC Bioinformatics 2016; 17:353. [PMID: 27597167 PMCID: PMC5011913 DOI: 10.1186/s12859-016-1199-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2016] [Accepted: 08/24/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND According to structure-dependent function of proteins, two main challenging problems called Protein Structure Prediction (PSP) and Inverse Protein Folding (IPF) are investigated. In spite of IPF essential applications, it has not been investigated as much as PSP problem. In fact, the ultimate goal of IPF problem or protein design is to create proteins with enhanced properties or even novel functions. One of the major computational challenges in protein design is its large sequence space, namely searching through all plausible sequences is impossible. Inasmuch as, protein secondary structure represents an appropriate primary scaffold of the protein conformation, undoubtedly studying the Protein Secondary Structure Inverse Folding (PSSIF) problem is a quantum leap forward in protein design, as it can reduce the search space. In this paper, a novel genetic algorithm which uses native secondary sub-structures is proposed to solve PSSIF problem. In essence, evolutionary information can lead the algorithm to design appropriate amino acid sequences respective to the target secondary structures. Furthermore, they can be folded to tertiary structures almost similar to their reference 3D structures. RESULTS The proposed algorithm called GAPSSIF benefits from evolutionary information obtained by solved proteins in the PDB. Therefore, we construct a repository of protein secondary sub-structures to accelerate convergence of the algorithm. The secondary structure of designed sequences by GAPSSIF is comparable with those obtained by Evolver and EvoDesign. Although we do not explicitly consider tertiary structure features through the algorithm, the structural similarity of native and designed sequences declares acceptable values. CONCLUSIONS Using the evolutionary information of native structures can significantly improve the quality of designed sequences. In fact, the combination of this information and effective features such as solvent accessibility and torsion angles leads IPF problem to an efficient solution. GAPSSIF can be downloaded at http://bioinformatics.aut.ac.ir/GAPSSIF/ .
Collapse
Affiliation(s)
- Marziyeh Movahedi
- Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
| | - Fatemeh Zare-Mirakabad
- Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
| | - Seyed Shahriar Arab
- Department of Biophysics, Faculty of Biological Sciences Tarbiat Modares University (TMU), Tehran, Iran
| |
Collapse
|
19
|
Venev SV, Zeldovich KB. Massively parallel sampling of lattice proteins reveals foundations of thermal adaptation. J Chem Phys 2016; 143:055101. [PMID: 26254668 DOI: 10.1063/1.4927565] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
Evolution of proteins in bacteria and archaea living in different conditions leads to significant correlations between amino acid usage and environmental temperature. The origins of these correlations are poorly understood, and an important question of protein theory, physics-based prediction of types of amino acids overrepresented in highly thermostable proteins, remains largely unsolved. Here, we extend the random energy model of protein folding by weighting the interaction energies of amino acids by their frequencies in protein sequences and predict the energy gap of proteins designed to fold well at elevated temperatures. To test the model, we present a novel scalable algorithm for simultaneous energy calculation for many sequences in many structures, targeting massively parallel computing architectures such as graphics processing unit. The energy calculation is performed by multiplying two matrices, one representing the complete set of sequences, and the other describing the contact maps of all structural templates. An implementation of the algorithm for the CUDA platform is available at http://www.github.com/kzeldovich/galeprot and calculates protein folding energies over 250 times faster than a single central processing unit. Analysis of amino acid usage in 64-mer cubic lattice proteins designed to fold well at different temperatures demonstrates an excellent agreement between theoretical and simulated values of energy gap. The theoretical predictions of temperature trends of amino acid frequencies are significantly correlated with bioinformatics data on 191 bacteria and archaea, and highlight protein folding constraints as a fundamental selection pressure during thermal adaptation in biological evolution.
Collapse
Affiliation(s)
- Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, Massachusetts 01605, USA
| | - Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, Massachusetts 01605, USA
| |
Collapse
|
20
|
Leelananda SP, Jernigan RL, Kloczkowski A. Predicting Designability of Small Proteins from Graph Features of Contact Maps. J Comput Biol 2016; 23:400-11. [PMID: 27159634 PMCID: PMC4876523 DOI: 10.1089/cmb.2015.0209] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open
Abstract
Highly designable structures can be distinguished based on certain geometric graphical features of the interactions, confirming the fact that the topology of a protein structure and its residue-residue interaction network are important determinants of its designability. The most designable structures and least designable structures obtained for sets of proteins having the same number of residues are compared. It is shown that the most designable structures predicted by the graph features of the contact diagrams are more densely packed, whereas the poorly designable structures are more open structures or structures that are loosely packed. Interestingly enough, it can also be seen that the highly designable identified are also common structural motifs found in nature.
Collapse
Affiliation(s)
| | - Robert L. Jernigan
- Iowa State University, Ames, Iowa
- Baker Center for Bioinformatics and Biological Statistics, Ames, Iowa
| | - Andrzej Kloczkowski
- Nationwide Children's Hospital, Columbus, Ohio
- The Ohio State University, Columbus, Ohio
| |
Collapse
|
21
|
Chen SH, Meller J, Elber R. Comprehensive analysis of sequences of a protein switch. Protein Sci 2016; 25:135-46. [PMID: 26073558 PMCID: PMC4815306 DOI: 10.1002/pro.2723] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2015] [Revised: 05/28/2015] [Accepted: 05/28/2015] [Indexed: 11/08/2022]
Abstract
Switches form a special class of proteins that dramatically change their three-dimensional structures upon a small perturbation. One possible perturbation that we explore is that of a single point mutation. Building on the pioneering experimental work of Alexander et al. (Alexander et al. PNAS, 2007; 104,11963-11968) that determines switch sequences between α and α+β folds we conduct a comprehensive sequence sampling by a Markov Chain with multiple fitness criteria to identify new switches given the experimental folds. We screen for switch sequences using a combination of contact potential, secondary structure prediction, and finally molecular dynamics simulations. Statistical properties of switch sequences are discussed and illustrated to be most sensitive to mutation at the N- and C- termini of the switch protein. Based on this analysis, a particularly stable putative switch pair is identified and proposed for further experimental analysis.
Collapse
Affiliation(s)
- Szu-Hua Chen
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas
- Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, Texas
| | - Jaroslaw Meller
- Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, Ohio
- Department of Electrical Engineering and Computing Systems, University of Cincinnati College of Medicine, Cincinnati, Ohio
- Department of Biomedical Informatics, University of Cincinnati College of Medicine, Cincinnati, Ohio
- Department of Informatics, Nicholas Copernicus University, Torun, Poland
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Ron Elber
- Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, Texas
- Department of Chemistry, University of Texas at Austin, Austin, Texas
| |
Collapse
|
22
|
Ferrada E. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets. PLoS Comput Biol 2014; 10:e1003946. [PMID: 25473967 PMCID: PMC4256021 DOI: 10.1371/journal.pcbi.1003946] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2014] [Accepted: 09/26/2014] [Indexed: 11/19/2022] Open
Abstract
The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.
Collapse
Affiliation(s)
- Evandro Ferrada
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
| |
Collapse
|
23
|
Krick T, Verstraete N, Alonso LG, Shub DA, Ferreiro DU, Shub M, Sánchez IE. Amino Acid metabolism conflicts with protein diversity. Mol Biol Evol 2014; 31:2905-12. [PMID: 25086000 PMCID: PMC4209132 DOI: 10.1093/molbev/msu228] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
The 20 protein-coding amino acids are found in proteomes with different relative abundances. The most abundant amino acid, leucine, is nearly an order of magnitude more prevalent than the least abundant amino acid, cysteine. Amino acid metabolic costs differ similarly, constraining their incorporation into proteins. On the other hand, a diverse set of protein sequences is necessary to build functional proteomes. Here, we present a simple model for a cost-diversity trade-off postulating that natural proteomes minimize amino acid metabolic flux while maximizing sequence entropy. The model explains the relative abundances of amino acids across a diverse set of proteomes. We found that the data are remarkably well explained when the cost function accounts for amino acid chemical decay. More than 100 organisms reach comparable solutions to the trade-off by different combinations of proteome cost and sequence diversity. Quantifying the interplay between proteome size and entropy shows that proteomes can get optimally large and diverse.
Collapse
Affiliation(s)
- Teresa Krick
- Departamento de Matemática, Facultad de Ciencias Exactas y Naturales and IMAS-CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Nina Verstraete
- Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales and IQUIBICEN-CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
| | | | - David A Shub
- Department of Biological Sciences, University at Albany, State University of New York
| | - Diego U Ferreiro
- Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales and IQUIBICEN-CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Michael Shub
- IMAS-CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Ignacio E Sánchez
- Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales and IQUIBICEN-CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
| |
Collapse
|
24
|
Zhang J, Zheng F, Grigoryan G. Design and designability of protein-based assemblies. Curr Opin Struct Biol 2014; 27:79-86. [DOI: 10.1016/j.sbi.2014.05.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2014] [Revised: 05/19/2014] [Accepted: 05/20/2014] [Indexed: 10/25/2022]
|
25
|
Yadahalli S, Hemanth Giri Rao VV, Gosavi S. Modeling Non-Native Interactions in Designed Proteins. Isr J Chem 2014. [DOI: 10.1002/ijch.201400035] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
26
|
Abstract
Protein switches are made of highly similar sequences that fold to dramatically different structures. A structural switching system with 31 sequence variants for α and α+β folds has been illustrated experimentally by He et al., Structure, 2012, 20, 283 and is investigated computationally in the present study. Methods to assign a sequence to one of the two folds are reported and analyzed. A fast and accurate protocol to identify the correct fold of the 31 sequences is based on enriching modeled structures using short molecular dynamics (MD) trajectories and scoring these structures with coarse-grained energy functions. We examine five coarse-grained energy functions and illustrate that the Hinds-Levitt potential works the best for this task. We show that enrichment by MD significantly enhances prediction accuracy. Finally, we find that melting temperature correlates well with the energy difference between the two folds (correlation coefficient ∼-0.7). The correlation reduces dramatically (∼0.4) if the absolute energy of the correct fold is considered. Moreover, prediction of melting temperature is sensitive to the structural templates. We emphasize in our analyses the use of native structures as templates since these folds are more readily available from structural biology experiments.
Collapse
Affiliation(s)
- Szu-Hua Chen
- Department of Molecular Biosciences, Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, USA
| | | |
Collapse
|
27
|
Amino acid composition of proteins reduces deleterious impact of mutations. Sci Rep 2013; 3:2919. [PMID: 24108121 PMCID: PMC3794375 DOI: 10.1038/srep02919] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2012] [Accepted: 09/24/2013] [Indexed: 12/02/2022] Open
Abstract
The evolutionary origin of amino acid occurrence frequencies in proteins (composition) is not yet fully understood. We suggest that protein composition works alongside the genetic code to minimize impact of mutations on protein structure. First, we propose a novel method for estimating thermodynamic stability of proteins whose sequence is constrained to a fixed composition. Second, we quantify the average deleterious impact of substituting one amino acid with another. Natural proteome compositions are special in at least two ways: 1) Natural compositions do not generate more stable proteins than the average random composition, however, they result in proteins that are less susceptible to damage from mutations. 2) Natural proteome compositions that result in more stable proteins (i.e. those of thermophiles) are also tuned to have a higher tolerance for mutations. This is consistent with the observation that environmental factors selecting for more stable proteins also enhance the deleterious impact of mutations.
Collapse
|
28
|
Mannige RV. Two modes of protein sequence evolution and their compositional dependencies. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2013; 87:062714. [PMID: 23848722 DOI: 10.1103/physreve.87.062714] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/14/2013] [Revised: 05/10/2013] [Indexed: 06/02/2023]
Abstract
Protein sequence evolution has resulted in a vast repertoire of molecular functionality crucial to life. Despite the central importance of sequence evolution to biology, our fundamental understanding of how sequence composition affects evolution is incomplete. This report describes the utilization of lattice model simulations of directed evolution, which indicate that, on average, peptide and protein evolvability is strongly dependent on initial sequence composition. The report also discusses two distinct regimes of sequence evolution by point mutation: (a) the "classical" mode where sequences "crawl" over free energy barriers towards acquiring a target fold, and (b) the "quantum" mode where sequences appear to "tunnel" through large energy barriers generally insurmountable by means of a crawl. Finally, the simulations indicate that oily and charged peptides are the most efficient substrates for evolution at the "classical" and "quantum" regimes, respectively, and that their respective response to temperature is commensurate with analogies made to barrier crossing in classical and quantum systems. On the whole, these results show that sequence composition can tune both the evolvability and the optimal mode of evolution of peptides and proteins.
Collapse
Affiliation(s)
- Ranjan V Mannige
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
| |
Collapse
|
29
|
Longo LM, Blaber M. Protein design at the interface of the pre-biotic and biotic worlds. Arch Biochem Biophys 2012; 526:16-21. [DOI: 10.1016/j.abb.2012.06.009] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2012] [Accepted: 06/23/2012] [Indexed: 12/01/2022]
|
30
|
Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning APJ, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjölander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S. The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci 2012; 21:769-85. [PMID: 22528593 PMCID: PMC3403413 DOI: 10.1002/pro.2071] [Citation(s) in RCA: 149] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2012] [Revised: 03/22/2012] [Accepted: 03/23/2012] [Indexed: 12/20/2022]
Abstract
Abstract The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state-of-the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.
Collapse
Affiliation(s)
- David A Liberles
- Department of Molecular Biology, University of WyomingLaramie, Wyoming 82071
| | - Sarah A Teichmann
- MRC Laboratory of Molecular BiologyHills Road, Cambridge CB2 0QH, United Kingdom
| | - Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of PittsburghPittsburgh, Pennsylvania 15213
| | - Ugo Bastolla
- Bioinformatics Unit. Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autonoma de Madrid28049 Cantoblanco Madrid, Spain
| | - Jesse Bloom
- Division of Basic Sciences, Fred Hutchinson Cancer Research CenterSeattle, Washington 98109
| | - Erich Bornberg-Bauer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of MuensterGermany
| | - Lucy J Colwell
- MRC Laboratory of Molecular BiologyHills Road, Cambridge CB2 0QH, United Kingdom
| | - A P Jason de Koning
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of ColoradoAurora, Colorado
| | - Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel HillNorth Carolina 27599
| | - Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San MartínMartín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
| | - Arne Elofsson
- Department of Biochemistry and Biophysics, Center for Biomembrane Research, Stockholm Bioinformatics Center, Science for Life Laboratory, Swedish E-science Research Center, Stockholm University106 91 Stockholm, Sweden
| | - Dietlind L Gerloff
- Biomolecular Engineering Department, University of CaliforniaSanta Cruz, California 95064
| | - Richard A Goldstein
- Division of Mathematical Biology, National Institute for Medical Research (MRC)Mill Hill, London NW7 1AA, United Kingdom
| | - Johan A Grahnen
- Department of Molecular Biology, University of WyomingLaramie, Wyoming 82071
| | - Mark T Holder
- Department of Ecology and Evolutionary Biology, University of KansasLawrence, Kansas 66045
| | - Clemens Lakner
- Bioinformatics Research Center, North Carolina State UniversityRaleigh, North Carolina 27695
| | - Nicholas Lartillot
- Département de Biochimie, Faculté de Médecine, Université de MontréalMontréal, QC H3T1J4, Canada
| | - Simon C Lovell
- Faculty of Life Sciences, University of ManchesterManchester M13 9PT, United Kingdom
| | - Gavin Naylor
- Department of Biology, College of CharlestonCharleston, South Carolina 29424
| | - Tina Perica
- MRC Laboratory of Molecular BiologyHills Road, Cambridge CB2 0QH, United Kingdom
| | - David D Pollock
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of ColoradoAurora, Colorado
| | - Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv UniversityTel Aviv, Israel
| | - Lynne Regan
- Department of Molecular Biophysics and Biochemistry, Yale UniversityNew Haven 06511
| | - Andrew Roger
- Department of Biochemistry and Molecular Biology, Dalhousie UniversityHalifax, NS, Canada
| | - Nimrod Rubinstein
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv UniversityTel Aviv, Israel
| | - Eugene Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard UniversityCambridge, Massachusetts 02138
| | - Kimmen Sjölander
- Department of Bioengineering, University of CaliforniaBerkeley, Berkeley, California 94720
| | - Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School77 Avenue Louis Pasteur, Boston, Massachusetts 02115
| | - Ashley I Teufel
- Department of Molecular Biology, University of WyomingLaramie, Wyoming 82071
| | - Jeffrey L Thorne
- Bioinformatics Research Center, North Carolina State UniversityRaleigh, North Carolina 27695
| | - Joseph W Thornton
- Howard Hughes Medical Institute and Institute for Ecology and Evolution, University of OregonEugene, Oregon 97403
- Department of Human Genetics, University of ChicagoChicago, Illinois 60637
- Department of Ecology and Evolution, University of ChicagoChicago, Illinois 60637
| | - Daniel M Weinreich
- Department of Ecology and Evolutionary Biology, and Center for Computational Molecular Biology, Brown UniversityProvidence, Rhode Island 02912
| | - Simon Whelan
- Faculty of Life Sciences, University of ManchesterManchester M13 9PT, United Kingdom
| |
Collapse
|
31
|
Burke S, Elber R. Super folds, networks, and barriers. Proteins 2012; 80:463-70. [PMID: 22095563 PMCID: PMC3290721 DOI: 10.1002/prot.23212] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2011] [Revised: 08/31/2011] [Accepted: 09/22/2011] [Indexed: 11/06/2022]
Abstract
Exhaustive enumeration of sequences and folds is conducted for a simple lattice model of conformations, sequences, and energies. Examination of all foldable sequences and their nearest connected neighbors (sequences that differ by no more than a point mutation) illustrates the following: (i) There exist unusually large number of sequences that fold into a few structures (super-folds). The same observation was made experimentally and computationally using stochastic sampling and exhaustive enumeration of related models. (ii) There exist only a few large networks of connected sequences that are not restricted to one fold. These networks cover a significant fraction of fold spaces (super-networks). (iii) There exist barriers in sequence space that prevent foldable sequences of the same structure to "connect" through a series of single point mutations (super-barrier), even in the presence of the sequence connection between folds. While there is ample experimental evidence for the existence of super-folds, evidence for a super-network is just starting to emerge. The prediction of a sequence barrier is an intriguing characteristic of sequence space, suggesting that the overall sequence space may be disconnected. The implications and limitations of these observations for evolution of protein structures are discussed.
Collapse
Affiliation(s)
- Sean Burke
- Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin TX 78712
| | - Ron Elber
- Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin TX 78712
- Department of Chemistry and Biochemistry, University of Texas at Austin, Austin TX 78712
| |
Collapse
|
32
|
GALZITSKAYA OXANAV, BOGATYREVA NATALYAS, IVANKOV DMITRYN. COMPACTNESS DETERMINES PROTEIN FOLDING TYPE. J Bioinform Comput Biol 2011; 6:667-80. [DOI: 10.1142/s0219720008003618] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2007] [Revised: 01/02/2008] [Accepted: 01/04/2008] [Indexed: 11/18/2022]
Abstract
We have demonstrated here that protein compactness, which we define as the ratio of the accessible surface area of a protein to that of the ideal sphere of the same volume, is one of the factors determining the mechanism of protein folding. Proteins with multi-state kinetics, on average, are more compact (compactness is 1.49 ± 0.02 for proteins within the size range of 101–151 amino acid residues) than proteins with two-state kinetics (compactness is 1.59 ± 0.03 for proteins within the same size range of 101–151 amino acid residues). We have shown that compactness for homologous proteins can explain both the difference in folding rates and the difference in folding mechanisms.
Collapse
Affiliation(s)
- OXANA V. GALZITSKAYA
- Institute of Protein Research, Russian Academy of Sciences, Institutskaya Str. 4, Pushchino, Moscow Region 142290, Russia
| | - NATALYA S. BOGATYREVA
- Institute of Protein Research, Russian Academy of Sciences, Institutskaya Str. 4, Pushchino, Moscow Region 142290, Russia
| | - DMITRY N. IVANKOV
- Institute of Protein Research, Russian Academy of Sciences, Institutskaya Str. 4, Pushchino, Moscow Region 142290, Russia
| |
Collapse
|
33
|
Samish I, MacDermaid CM, Perez-Aguilar JM, Saven JG. Theoretical and Computational Protein Design. Annu Rev Phys Chem 2011; 62:129-49. [DOI: 10.1146/annurev-physchem-032210-103509] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Affiliation(s)
| | | | | | - Jeffery G. Saven
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104;
| |
Collapse
|
34
|
Shukla P. Thermodynamics of protein folding: a random matrix formulation. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2010; 22:415106. [PMID: 21386596 DOI: 10.1088/0953-8984/22/41/415106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
The process of protein folding from an unfolded state to a biologically active, folded conformation is governed by many parameters, e.g. the sequence of amino acids, intermolecular interactions, the solvent, temperature and chaperon molecules. Our study, based on random matrix modeling of the interactions, shows, however, that the evolution of the statistical measures, e.g. Gibbs free energy, heat capacity, and entropy, is single parametric. The information can explain the selection of specific folding pathways from an infinite number of possible ways as well as other folding characteristics observed in computer simulation studies.
Collapse
Affiliation(s)
- Pragya Shukla
- Department of Physics, Indian Institute of Technology, Kharagpur, India
| |
Collapse
|
35
|
Begum T, Ghosh TC. Understanding the Effect of Secondary Structures and Aggregation on Human Protein Folding Class Evolution. J Mol Evol 2010; 71:60-9. [DOI: 10.1007/s00239-010-9364-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2009] [Accepted: 06/23/2010] [Indexed: 12/01/2022]
|
36
|
Bhattacherjee A, Biswas P. Neutrality and evolvability of designed protein sequences. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2010; 82:011906. [PMID: 20866647 DOI: 10.1103/physreve.82.011906] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/17/2009] [Revised: 03/25/2010] [Indexed: 05/29/2023]
Abstract
The effect of foldability on protein's evolvability is analyzed by a two-prong approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criteria are used. This generalized foldability criterion is derived using the high temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral condition. The robustness, protein's ability to tolerate random point mutations is determined with a selective pressure of stability (ΔΔG) for the theory designed sequences, which are found to be more robust than that of Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively compared to the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.
Collapse
|
37
|
Interplay between pleiotropy and secondary selection determines rise and fall of mutators in stress response. PLoS Comput Biol 2010; 6:e1000710. [PMID: 20300650 PMCID: PMC2837395 DOI: 10.1371/journal.pcbi.1000710] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2009] [Accepted: 02/08/2010] [Indexed: 11/19/2022] Open
Abstract
Mutators are clones whose mutation rate is about two to three orders of magnitude higher than the rate of wild-type clones and their roles in adaptive evolution of asexual populations have been controversial. Here we address this problem by using an ab initio microscopic model of living cells, which combines population genetics with a physically realistic presentation of protein stability and protein-protein interactions. The genome of model organisms encodes replication controlling genes (RCGs) and genes modeling the mismatch repair (MMR) complexes. The genotype-phenotype relationship posits that the replication rate of an organism is proportional to protein copy numbers of RCGs in their functional form and there is a production cost penalty for protein overexpression. The mutation rate depends linearly on the concentration of homodimers of MMR proteins. By simulating multiple runs of evolution of populations under various environmental stresses—stationary phase, starvation or temperature-jump—we find that adaptation most often occurs through transient fixation of a mutator phenotype, regardless of the nature of stress. By contrast, the fixation mechanism does depend on the nature of stress. In temperature jump stress, mutators take over the population due to loss of stability of MMR complexes. In contrast, in starvation and stationary phase stresses, a small number of mutators are supplied to the population via epigenetic stochastic noise in production of MMR proteins (a pleiotropic effect), and their net supply is higher due to reduced genetic drift in slowly growing populations under stressful environments. Subsequently, mutators in stationary phase or starvation hitchhike to fixation with a beneficial mutation in the RCGs, (second order selection) and finally a mutation stabilizing the MMR complex arrives, returning the population to a non-mutator phenotype. Our results provide microscopic insights into the rise and fall of mutators in adapting finite asexual populations. The dramatic rise of mutators has been found to accompany adaptation of bacteria in response to many kinds of stress. Two views on the evolutionary origin of this phenomenon emerged: the pleiotropic hypothesis positing that it is a byproduct of environmental stress or other specific stress response mechanisms and the second order selection which states that mutators hitchhike to fixation with unrelated beneficial alleles. Conventional population genetics models could not fully resolve this controversy because they are based on certain assumptions about fitness landscape. Here we address this problem using a microscopic multiscale model, which couples physically realistic molecular descriptions of proteins and their interactions with population genetics of carrier organisms without assuming any a priori mutational effect on fitness landscape. We found that both pleiotropy and second order selection play a crucial role at different stages of adaptation: the supply of mutators is provided through destabilization of error correction complexes or, alternatively, fluctuations of production levels of prototypic mismatch repair proteins (pleiotropic effects), while the rise and fixation of mutators occurs when there is a sufficient supply of beneficial mutations in replication-controlling genes. This general mechanism assures a robust and reliable adaptation of organisms to unforeseen challenges. This study highlights physical principles underlying biological mechanisms of stress response and adaptation.
Collapse
|
38
|
Guarnera E, Pellarin R, Caflisch A. How does a simplified-sequence protein fold? Biophys J 2009; 97:1737-46. [PMID: 19751679 PMCID: PMC2749778 DOI: 10.1016/j.bpj.2009.06.047] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2009] [Revised: 06/24/2009] [Accepted: 06/30/2009] [Indexed: 11/21/2022] Open
Abstract
To investigate a putatively primordial protein we have simplified the sequence of a 56-residue alpha/beta fold (the immunoglobulin-binding domain of protein G) by replacing it with polyalanine, polythreonine, and diglycine segments at regions of the sequence that in the folded structure are alpha-helical, beta-strand, and turns, respectively. Remarkably, multiple folding and unfolding events are observed in a 15-micros molecular dynamics simulation at 330 K. The most stable state (populated at approximately 20%) of the simplified-sequence variant of protein G has the same alpha/beta topology as the wild-type but shows the characteristics of a molten globule, i.e., loose contacts among side chains and lack of a specific hydrophobic core. The unfolded state is heterogeneous and includes a variety of alpha/beta topologies but also fully alpha-helical and fully beta-sheet structures. Transitions within the denatured state are very fast, and the molten-globule state is reached in <1 micros by a framework mechanism of folding with multiple pathways. The native structure of the wild-type is more rigid than the molten-globule conformation of the simplified-sequence variant. The difference in structural stability and the very fast folding of the simplified protein suggest that evolution has enriched the primordial alphabet of amino acids mainly to optimize protein function by stabilization of a unique structure with specific tertiary interactions.
Collapse
Affiliation(s)
| | | | - Amedeo Caflisch
- Department of Biochemistry, University of Zurich, Zurich, Switzerland
| |
Collapse
|
39
|
Ivankov DN, Bogatyreva NS, Lobanov MY, Galzitskaya OV. Coupling between properties of the protein shape and the rate of protein folding. PLoS One 2009; 4:e6476. [PMID: 19649298 PMCID: PMC2714458 DOI: 10.1371/journal.pone.0006476] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2009] [Accepted: 06/21/2009] [Indexed: 11/19/2022] Open
Abstract
There are several important questions on the coupling between properties of the protein shape and the rate of protein folding. We have studied a series of structural descriptors intended for describing protein shapes (the radius of gyration, the radius of cross-section, and the coefficient of compactness) and their possible connection with folding behavior, either rates of folding or the emergence of folding intermediates, and compared them with classical descriptors, protein chain length and contact order. It has been found that when a descriptor is normalized to eliminate the influence of the protein size (the radius of gyration normalized to the radius of gyration of a ball of equal volume, the coefficient of compactness defined as the ratio of the accessible surface area of a protein to that of an ideal ball of equal volume, and relative contact order) it completely looses its ability to predict folding rates. On the other hand, when a descriptor correlates well with protein size (the radius of cross-section and absolute contact order in our consideration) then it correlates well with the logarithm of folding rates and separates reasonably well two-state folders from multi-state ones. The critical control for the performance of new descriptors demonstrated that the radius of cross-section has a somewhat higher predictive power (the correlation coefficient is −0.74) than size alone (the correlation coefficient is −0.65). So, we have shown that the numerical descriptors of the overall shape-geometry of protein structures are one of the important determinants of the protein-folding rate and mechanism.
Collapse
Affiliation(s)
- Dmitry N. Ivankov
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
| | | | - Michail Yu Lobanov
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
| | - Oxana V. Galzitskaya
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- * E-mail:
| |
Collapse
|
40
|
Kapsokalivas L, Gan X, Albrecht AA, Steinhöfel K. Population-based local search for protein folding simulation in the MJ energy model and cubic lattices. Comput Biol Chem 2009; 33:283-94. [PMID: 19647489 DOI: 10.1016/j.compbiolchem.2009.06.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2009] [Accepted: 06/17/2009] [Indexed: 10/20/2022]
Abstract
We present experimental results on benchmark problems in 3D cubic lattice structures with the Miyazawa-Jernigan energy function for two local search procedures that utilise the pull-move set: (i) population-based local search (PLS) that traverses the energy landscape with greedy steps towards (potential) local minima followed by upward steps up to a certain level of the objective function; (ii) simulated annealing with a logarithmic cooling schedule (LSA). The parameter settings for PLS are derived from short LSA-runs executed in pre-processing and the procedure utilises tabu lists generated for each member of the population. In terms of the total number of energy function evaluations both methods perform equally well, however, PLS has the potential of being parallelised with an expected speed-up in the region of the population size. Furthermore, both methods require a significant smaller number of function evaluations when compared to Monte Carlo simulations with kink-jump moves.
Collapse
Affiliation(s)
- L Kapsokalivas
- King's College London, Department of Computer Science, London WC2R 2LS, England, United Kingdom
| | | | | | | |
Collapse
|
41
|
Lai Z, Su J, Chen W, Wang C. Uncovering the properties of energy-weighted conformation space networks with a hydrophobic-hydrophilic model. Int J Mol Sci 2009; 10:1808-1823. [PMID: 19468340 PMCID: PMC2680648 DOI: 10.3390/ijms10041808] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2009] [Revised: 03/30/2009] [Accepted: 04/07/2009] [Indexed: 11/16/2022] Open
Abstract
The conformation spaces generated by short hydrophobic-hydrophilic (HP) lattice chains are mapped to conformation space networks (CSNs). The vertices (nodes) of the network are the conformations and the links are the transitions between them. It has been found that these networks have "small-world" properties without considering the interaction energy of the monomers in the chain, i. e. the hydrophobic or hydrophilic amino acids inside the chain. When the weight based on the interaction energy of the monomers in the chain is added to the CSNs, it is found that the weighted networks show the "scale-free" characteristic. In addition, it reveals that there is a connection between the scale-free property of the weighted CSN and the folding dynamics of the chain by investigating the relationship between the scale-free structure of the weighted CSN and the noted parameter Z score. Moreover, the modular (community) structure of weighted CSNs is also studied. These results are helpful to understand the topological properties of the CSN and the underlying free-energy landscapes.
Collapse
Affiliation(s)
- Zaizhi Lai
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, P.R. China; E-Mails:
(Z.L.);
(J.S.);
(W.C.)
| | - Jiguo Su
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, P.R. China; E-Mails:
(Z.L.);
(J.S.);
(W.C.)
- College of Science, Yanshan University, Qinhuangdao, 066004, P.R. China
| | - Weizu Chen
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, P.R. China; E-Mails:
(Z.L.);
(J.S.);
(W.C.)
| | - Cunxin Wang
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, P.R. China; E-Mails:
(Z.L.);
(J.S.);
(W.C.)
| |
Collapse
|
42
|
Zeldovich KB, Shakhnovich EI. Understanding protein evolution: from protein physics to Darwinian selection. Annu Rev Phys Chem 2008; 59:105-27. [PMID: 17937598 DOI: 10.1146/annurev.physchem.58.032806.104449] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
Collapse
Affiliation(s)
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
| | | |
Collapse
|
43
|
Patel BA, Debenedetti PG, Stillinger FH, Rossky PJ. The effect of sequence on the conformational stability of a model heteropolymer in explicit water. J Chem Phys 2008; 128:175102. [PMID: 18465941 DOI: 10.1063/1.2909974] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
We investigate the properties of a two-dimensional lattice heteropolymer model for a protein in which water is explicitly represented. The model protein distinguishes between hydrophobic and polar monomers through the effect of the hydrophobic monomers on the entropy and enthalpy of the hydrogen bonding of solvation shell water molecules. As experimentally observed, model heteropolymer sequences fold into stable native states characterized by a hydrophobic core to avoid unfavorable interactions with the solvent. These native states undergo cold, pressure, and thermal denaturation into distinct configurations for each type of unfolding transition. However, the heteropolymer sequence is an important element, since not all sequences will fold into stable native states at positive pressures. Simulation of a large collection of sequences indicates that these fall into two general groups, those exhibiting highly stable native structures and those that do not. Statistical analysis of important patterns in sequences shows a strong tendency for observing long blocks of hydrophobic or polar monomers in the most stable sequences. Statistical analysis also shows that alternation of hydrophobic and polar monomers appears infrequently among the most stable sequences. These observations are not absolute design rules and, in practice, these are not sufficient to rationally design very stable heteropolymers. We also study the effect of mutations on improving the stability of the model proteins, and demonstrate that it is possible to obtain a very stable heteropolymer from directed evolution of an initially unstable heteropolymer.
Collapse
Affiliation(s)
- Bryan A Patel
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544, USA
| | | | | | | |
Collapse
|
44
|
Shakhnovich BE, Shakhnovich EI. Improvisation in evolution of genes and genomes: whose structure is it anyway? Curr Opin Struct Biol 2008; 18:375-81. [PMID: 18487041 DOI: 10.1016/j.sbi.2008.02.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2008] [Accepted: 02/13/2008] [Indexed: 01/31/2023]
Abstract
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.
Collapse
Affiliation(s)
- Boris E Shakhnovich
- Department of Molecular and Cellular Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, United States
| | | |
Collapse
|
45
|
Goldstein RA. The structure of protein evolution and the evolution of protein structure. Curr Opin Struct Biol 2008; 18:170-7. [DOI: 10.1016/j.sbi.2008.01.006] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2007] [Revised: 12/20/2007] [Accepted: 01/09/2008] [Indexed: 11/29/2022]
|
46
|
Zhou T, Drummond DA, Wilke CO. Contact density affects protein evolutionary rate from bacteria to animals. J Mol Evol 2008; 66:395-404. [PMID: 18379715 DOI: 10.1007/s00239-008-9094-4] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2007] [Revised: 02/16/2008] [Accepted: 02/25/2008] [Indexed: 12/29/2022]
Abstract
The density of contacts or the fraction of buried sites in a protein structure is thought to be related to a protein's designability, and genes encoding more designable proteins should evolve faster than other genes. Several recent studies have tested this hypothesis but have found conflicting results. Here, we investigate how a gene's evolutionary rate is affected by its protein's contact density, considering the four species Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. We find for all four species that contact density correlates positively with evolutionary rate, and that these correlations do not seem to be confounded by gene expression level. The strength of this signal, however, varies widely among species. We also study the effect of contact density on domain evolution in multidomain proteins and find that a domain's contact density influences the domain's evolutionary rate. Within the same protein, a domain with higher contact density tends to evolve faster than a domain with lower contact density. Our study provides evidence that contact density can increase evolutionary rates, and that it acts similarly on the level of entire proteins and of individual protein domains.
Collapse
Affiliation(s)
- Tong Zhou
- Center for Computational Biology and Bioinformatics, Section of Integrative Biology, University of Texas at Austin, Austin, TX 78731, USA
| | | | | |
Collapse
|
47
|
Cao Y, Liang J. Optimal enumeration of state space of finitely buffered stochastic molecular networks and exact computation of steady state landscape probability. BMC SYSTEMS BIOLOGY 2008; 2:30. [PMID: 18373871 PMCID: PMC2375859 DOI: 10.1186/1752-0509-2-30] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/12/2008] [Accepted: 03/29/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Stochasticity plays important roles in many molecular networks when molecular concentrations are in the range of 0.1 muM to 10nM (about 100 to 10 copies in a cell). The chemical master equation provides a fundamental framework for studying these networks, and the time-varying landscape probability distribution over the full microstates, i.e., the combination of copy numbers of molecular species, provide a full characterization of the network dynamics. A complete characterization of the space of the microstates is a prerequisite for obtaining the full landscape probability distribution of a network. However, there are neither closed-form solutions nor algorithms fully describing all microstates for a given molecular network. RESULTS We have developed an algorithm that can exhaustively enumerate the microstates of a molecular network of small copy numbers under the condition that the net gain in newly synthesized molecules is smaller than a predefined limit. We also describe a simple method for computing the exact mean or steady state landscape probability distribution over microstates. We show how the full landscape probability for the gene networks of the self-regulating gene and the toggle-switch in the steady state can be fully characterized. We also give an example using the MAPK cascade network. Data and server will be available at URL: http://scsb.sjtu.edu.cn/statespace. CONCLUSION Our algorithm works for networks of small copy numbers buffered with a finite copy number of net molecules that can be synthesized, regardless of the reaction stoichiometry, and is optimal in both storage and time complexity. The algorithm can also be used to calculate the rates of all transitions between microstates from given reactions and reaction rates. The buffer size is limited by the available memory or disk storage. Our algorithm is applicable to a class of biological networks when the copy numbers of molecules are small and the network is closed, or the network is open but the net gain in newly synthesized molecules does not exceed a predefined buffer capacity. For these networks, our method allows full stochastic characterization of the mean landscape probability distribution, and the steady state when it exists.
Collapse
Affiliation(s)
- Youfang Cao
- Shanghai Center for Systems Biomedicine (SCSB), Shanghai Jiao Tong University, Shanghai, 200240, China
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Jie Liang
- Shanghai Center for Systems Biomedicine (SCSB), Shanghai Jiao Tong University, Shanghai, 200240, China
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60612, USA
| |
Collapse
|
48
|
Zeldovich KB, Chen P, Shakhnovich EI. Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc Natl Acad Sci U S A 2007; 104:16152-7. [PMID: 17913881 PMCID: PMC2042177 DOI: 10.1073/pnas.0705366104] [Citation(s) in RCA: 187] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2007] [Indexed: 01/18/2023] Open
Abstract
Classical population genetics a priori assigns fitness to alleles without considering molecular or functional properties of proteins that these alleles encode. Here we study population dynamics in a model where fitness can be inferred from physical properties of proteins under a physiological assumption that loss of stability of any protein encoded by an essential gene confers a lethal phenotype. Accumulation of mutations in organisms containing Gamma genes can then be represented as diffusion within the Gamma-dimensional hypercube with adsorbing boundaries determined, in each dimension, by loss of a protein's stability and, at higher stability, by lack of protein sequences. Solving the diffusion equation whose parameters are derived from the data on point mutations in proteins, we determine a universal distribution of protein stabilities, in agreement with existing data. The theory provides a fundamental relation between mutation rate, maximal genome size, and thermodynamic response of proteins to point mutations. It establishes a universal speed limit on rate of molecular evolution by predicting that populations go extinct (via lethal mutagenesis) when mutation rate exceeds approximately six mutations per essential part of genome per replication for mesophilic organisms and one to two mutations per genome per replication for thermophilic ones. Several RNA viruses function close to the evolutionary speed limit, whereas error correction mechanisms used by DNA viruses and nonmutant strains of bacteria featuring various genome lengths and mutation rates have brought these organisms universally approximately 1,000-fold below the natural speed limit.
Collapse
Affiliation(s)
| | - Peiqiu Chen
- Departments of Chemistry and Chemical Biology and
- Physics, Harvard University, 12 Oxford Street, Cambridge, MA 02138
| | | |
Collapse
|
49
|
Meyerguz L, Kleinberg J, Elber R. The network of sequence flow between protein structures. Proc Natl Acad Sci U S A 2007; 104:11627-32. [PMID: 17596339 PMCID: PMC1913895 DOI: 10.1073/pnas.0701393104] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2007] [Indexed: 12/24/2022] Open
Abstract
Sequence-structure relationships in proteins are highly asymmetric because many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions, we compute a directed graph of sequences and structures of proteins, which is based on 2,060 experimentally determined protein shapes from the Protein Data Bank. The directed graph is highly connected at native energies with "sinks" that attract many sequences from other folds. The sinks are rich in beta-sheets. The number of sequences that transition between folds is significantly smaller than the number of sequences retained by their fold. The sequence flow into a particular protein shape from other proteins correlates with the number of sequences that matches this shape in empirically determined genomes. Properties of strongly connected components of the graph are correlated with protein length and secondary structure.
Collapse
Affiliation(s)
- Leonid Meyerguz
- Department of Computer Science, Cornell University, Ithaca, NY 14853
| | - Jon Kleinberg
- Department of Computer Science, Cornell University, Ithaca, NY 14853
| | - Ron Elber
- Department of Computer Science, Cornell University, Ithaca, NY 14853
| |
Collapse
|
50
|
Yang JY, Yu ZG, Anh V. Correlations between designability and various structural characteristics of protein lattice models. J Chem Phys 2007; 126:195101. [PMID: 17523837 DOI: 10.1063/1.2737042] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Using six kinds of lattice types (4 x 4, 5 x 5, and 6 x 6 square lattices; 3 x 3 x 3 cubic lattice; and 2+3+4+3+2 and 4+5+6+5+4 triangular lattices), three different size alphabets (HP, HNUP, and 20 letters), and two energy functions, the designability of protein structures is calculated based on random samplings of structures and common biased sampling (CBS) of protein sequence space. Then three quantities stability (average energy gap), foldability, and partnum of the structure, which are defined to elucidate the designability, are calculated. The authors find that whatever the type of lattice, alphabet size, and energy function used, there will be an emergence of highly designable (preferred) structure. For all cases considered, the local interactions reduce degeneracy and make the designability higher. The designability is sensitive to the lattice type, alphabet size, energy function, and sampling method of the sequence space. Compared with the random sampling method, both the CBS and the Metropolis Monte Carlo sampling methods make the designability higher. The correlation coefficients between the designability, stability, and foldability are mostly larger than 0.5, which demonstrate that they have strong correlation relationship. But the correlation relationship between the designability and the partnum is not so strong because the partnum is independent of the energy. The results are useful in practical use of the designability principle, such as to predict the protein tertiary structure.
Collapse
Affiliation(s)
- Jian-Yi Yang
- School of Mathematics and Computing Science, Xiangtan University, Hunan 411105, China
| | | | | |
Collapse
|