1
Yan Z, Wang J. Superfunneled Energy Landscape of Protein Evolution Unifies the Principles of Protein Evolution, Folding, and Design. Phys Rev Lett 2019; 122:018103. [PMID: 31012725 DOI: 10.1103/physrevlett.122.018103] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/06/2017] [Revised: 11/08/2018] [Indexed: 06/09/2023]
Abstract
Evolution is essential for shaping biological functions. Darwin proposed selection acting on mutations as the driving force for evolution. While mutations are well understood, quantifying the selection force is still challenging. In this study, we identified and quantified both thermodynamic stability and kinetic accessibility as the selection forces for protein evolution. Protein evolution can be viewed and quantified as a trajectory moving along a superfunneled energy landscape with a line attractor at the bottom. The resulting evolved sequences and structures show strong protein characteristics, including a hydrophobic core, high designability, and fast folding. The evolution principle uncovered here is validated on real proteins and sheds light on protein design.
Affiliation(s)
- Zhiqiang Yan
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin 130022, China
- Jin Wang
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin 130022, China
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, New York 11790, USA
2
Ferreira DC, van der Linden MG, de Oliveira LC, Onuchic JN, de Araújo AFP. Information and redundancy in the burial folding code of globular proteins within a wide range of shapes and sizes. Proteins 2016; 84:515-31. [PMID: 26815167 DOI: 10.1002/prot.24998] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/22/2015] [Revised: 12/28/2015] [Accepted: 01/19/2016] [Indexed: 11/09/2022]
Abstract
Recent ab initio folding simulations for a limited number of small proteins have corroborated a previous suggestion that atomic burial information obtainable from sequence could be sufficient for tertiary structure determination when combined with sequence-independent geometrical constraints. Here, we use simulations parameterized by native burials to investigate the required amount of information in a diverse set of globular proteins comprising different structural classes and a wide size range. Burial information is provided by a potential term pushing each atom towards one among a small number L of equiprobable concentric layers. An upper bound for the required information is provided by the minimal number of layers L(min) still compatible with correct folding behavior. We obtain L(min) between 3 and 5 for seven small to medium proteins with 50 ≤ Nr ≤ 110 residues, while for a larger protein with Nr = 141 we find that L ≥ 6 is required to maintain native stability. We additionally estimate the usable redundancy for a given L ≥ L(min) from the burial entropy associated with the largest folding-compatible fraction of "superfluous" atoms, for which the burial term can be turned off or target layers can be chosen randomly. The estimated redundancy for small proteins with L = 4 is close to 0.8. Our results are consistent with the above-average quality of burial predictions used in previous simulations and indicate that the fraction of approachable proteins could increase significantly with even a mild, plausible improvement in sequence-dependent burial prediction or in sequence-independent constraints that augment the detectable redundancy during simulations.
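Because the layers are described as equiprobable, assigning an atom to one of L layers carries at most log2(L) bits. The short sketch below only tabulates that bound for the L values mentioned in the abstract; the function name `burial_bits_per_atom` is hypothetical, and the sketch does not attempt to reproduce the paper's redundancy estimate.

```python
import math


def burial_bits_per_atom(num_layers: int) -> float:
    """Upper bound, in bits, on the burial information per atom
    when each atom is assigned to one of `num_layers` equiprobable layers."""
    if num_layers < 1:
        raise ValueError("num_layers must be a positive integer")
    # Shannon entropy of a uniform distribution over num_layers outcomes.
    return math.log2(num_layers)


if __name__ == "__main__":
    # L values quoted in the abstract: L(min) between 3 and 5, and L >= 6.
    for num_layers in (3, 4, 5, 6):
        print(num_layers, round(burial_bits_per_atom(num_layers), 3))
```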
Affiliation(s)
- Diogo C Ferreira
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Marx G van der Linden
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Leandro C de Oliveira
- Departamento de Física, IBILCE, Universidade Estadual Paulista - UNESP, São José do Rio Preto, SP, 15054-000, Brazil
- José N Onuchic
- Center for Theoretical Biological Physics and Departments of Physics and Astronomy, Chemistry and Biosciences, Rice University, 6100 Main Street, Houston, Texas, 77005
- Antônio F Pereira de Araújo
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
3
The universal statistical distributions of the affinity, equilibrium constants, kinetics and specificity in biomolecular recognition. PLoS Comput Biol 2015; 11:e1004212. [PMID: 25885453 PMCID: PMC4401658 DOI: 10.1371/journal.pcbi.1004212] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 05/29/2014] [Accepted: 02/24/2015] [Indexed: 01/01/2023]
Abstract
We uncovered the universal statistical laws for the biomolecular recognition/binding process. We quantified the statistical energy landscapes for binding, from which we can characterize the distributions of the binding free energy (affinity), the equilibrium constants, the kinetics and the specificity by exploring different ligands binding with a particular receptor. The results of the analytical studies are confirmed by microscopic flexible docking simulations. The distribution of binding affinity is Gaussian around the mean and becomes exponential near the tail. The equilibrium constants of binding follow a log-normal distribution around the mean and a power law distribution in the tail. The intrinsic specificity for biomolecular recognition measures the degree of discrimination of native versus non-native binding; optimizing it amounts to maximizing the ratio of the free energy gap between the native state and the average of non-native states to the roughness, measured by the variance of the free energy landscape around its mean. The intrinsic specificity obeys a Gaussian distribution near the mean and an exponential distribution near the tail. Furthermore, the kinetics of binding follows a log-normal distribution near the mean and a power law distribution at the tail. Our study provides new insights into the statistical nature of thermodynamics, kinetics and function from different ligands binding with a specific receptor or, equivalently, a specific ligand binding with different receptors. The elucidation of the distributions of the kinetics and free energy can guide the study of biomolecular recognition and function through small-molecule evolution and chemical genetics.
Uncovering the principles and underlying mechanisms of biomolecular recognition and the molecular binding process is crucial for understanding function and evolution, yet challenging. We meet the challenge by quantifying the statistical nature of the relevant physical variables of biomolecular recognition using an analytical model combined with microscopic flexible docking simulation methods. We uncovered the universal statistical laws obeyed by the affinity, equilibrium constant, intrinsic specificity and kinetics for biomolecular recognition. The general statistical laws based on energy landscape theory can serve as a conceptual framework for molecular recognition in biological repertoires. They can be applied to molecular selection, in vitro evolution, high-throughput screening and virtual screening for drug discovery. The statistical laws, in combination with experiments, provide quantitative signatures of a specific ligand binding to a specific receptor; as a guideline, these laws will contribute to drug design against a specific target. Our statistical methodology is general and applicable to other biomolecular recognition processes.
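The gap-to-roughness ratio described in the abstract can be illustrated with a minimal sketch. It assumes a list of binding energies for non-native poses and a single native energy, and it uses the population standard deviation (the square root of the variance) as the roughness so that the ratio is dimensionless. That normalization, the function name, and the toy numbers are assumptions for illustration, not taken from the paper.

```python
import statistics


def intrinsic_specificity_ratio(native_energy, nonnative_energies):
    """Gap between the mean non-native energy and the native energy,
    divided by the spread (population standard deviation) of the
    non-native energies. Larger values mean better discrimination."""
    mean_nonnative = statistics.fmean(nonnative_energies)
    roughness = statistics.pstdev(nonnative_energies)
    if roughness == 0:
        raise ValueError("non-native energies have zero spread")
    gap = mean_nonnative - native_energy
    return gap / roughness


if __name__ == "__main__":
    # Toy energies in arbitrary units (illustrative values only).
    print(intrinsic_specificity_ratio(-10.0, [-4.0, -2.0, 0.0, -2.0]))
```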
4
Arenas M, Sánchez-Cobos A, Bastolla U. Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability. Mol Biol Evol 2015; 32:2195-207. [PMID: 25837579 DOI: 10.1093/molbev/msv085] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/15/2022]
Abstract
Despite intense work, incorporating constraints on protein native structures into the mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter that we fix by maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models that take misfolded conformations into account yield larger likelihood than models that only consider the native state, because their average hydrophobicity is more realistic, and they produce on average stable sequences for most proteins. We evaluated the mean-field model with respect to empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented smaller Kullback-Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates by combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family. We found that the mean-field model performs better than other structurally constrained models with similar or higher complexity. With respect to the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented in the computer program Prot_Evol, which is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
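The comparison above relies on the Kullback-Leibler divergence between an observed site-specific profile and a model distribution. The following is a generic sketch of that quantity, not the Prot_Evol implementation: the function name, the dictionary representation, and the two-letter toy profiles are assumptions made for illustration.

```python
import math


def kl_divergence(observed, model):
    """Kullback-Leibler divergence D(observed || model) in nats.

    Both arguments map amino-acid symbols to probabilities.
    Terms with observed probability 0 contribute nothing; a model
    probability of 0 where the observed one is positive is an error."""
    total = 0.0
    for residue, p in observed.items():
        if p == 0:
            continue
        q = model.get(residue, 0.0)
        if q <= 0:
            raise ValueError(f"model assigns zero probability to {residue!r}")
        total += p * math.log(p / q)
    return total


if __name__ == "__main__":
    # Toy profile for one site over a reduced two-letter alphabet.
    site_profile = {"A": 0.5, "L": 0.5}
    model_a = {"A": 0.25, "L": 0.75}
    model_b = {"A": 0.45, "L": 0.55}
    # The model with the smaller divergence describes the site better.
    print(kl_divergence(site_profile, model_a), kl_divergence(site_profile, model_b))
```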
Affiliation(s)
- Miguel Arenas
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
- Agustin Sánchez-Cobos
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
- Ugo Bastolla
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
5
Wolynes PG. Evolution, energy landscapes and the paradoxes of protein folding. Biochimie 2014; 119:218-30. [PMID: 25530262 DOI: 10.1016/j.biochi.2014.12.007] [Citation(s) in RCA: 110] [Impact Index Per Article: 11.0] [Received: 09/15/2014] [Accepted: 12/11/2014] [Indexed: 01/25/2023]
Abstract
Protein folding has been viewed as a difficult problem of molecular self-organization. The search problem involved in folding, however, has been simplified through the evolution of folding energy landscapes that are funneled. The funnel hypothesis can be quantified using energy landscape theory based on the minimal frustration principle. Strong quantitative predictions that follow from energy landscape theory have been widely confirmed both through laboratory folding experiments and from detailed simulations. Energy landscape ideas have also allowed successful protein structure prediction algorithms to be developed. The selection constraint of having funneled folding landscapes has left its imprint on the sequences of existing protein structural families. Quantitative analysis of co-evolution patterns allows us to infer the statistical characteristics of the folding landscape. These turn out to be consistent with what has been obtained from laboratory physicochemical folding experiments, signaling a beautiful confluence of genomics and chemical physics.
Affiliation(s)
- Peter G Wolynes
- Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA.
6
Detecting selection on protein stability through statistical mechanical models of folding and evolution. Biomolecules 2014; 4:291-314. [PMID: 24970217 PMCID: PMC4030984 DOI: 10.3390/biom4010291] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/25/2013] [Revised: 02/13/2014] [Accepted: 02/14/2014] [Indexed: 12/31/2022]
Abstract
The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We start by discussing positive design, which strengthens the stability of the folded state against the unfolded state of proteins. Positive design justifies why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. In contrast, negative design, which consists of destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement conflicts with negative design, and evolution has to trade off between them. Finally, selection can accelerate functional movements by favoring low-frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformation change.
7
Krobath H, Shakhnovich EI, Faísca PFN. Structural and energetic determinants of co-translational folding. J Chem Phys 2014; 138:215101. [PMID: 23758397 DOI: 10.1063/1.4808044] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
We performed extensive lattice Monte Carlo simulations of ribosome-bound stalled nascent chains (RNCs) to explore the relative roles of native topology and non-native interactions in co-translational folding of small proteins. We found that the formation of a substantial part of the native structure generally occurs towards the end of protein synthesis. However, multi-domain structures, which are rich in local interactions, are able to develop gradually during chain elongation, while those with proximate chain termini require full protein synthesis to fold. A detailed assessment of the conformational ensembles populated by RNCs with different lengths reveals that the directionality of protein synthesis has a fine-tuning effect on the probability of populating low-energy conformations. In particular, if the participation of non-native interactions in folding energetics is mild, the formation of native-like conformations is largely determined by the properties of the contact map around the tethering terminus. Likewise, a pair of RNCs differing by only 1-2 residues can populate structurally well-resolved low-energy conformations with significantly different probabilities. An interesting structural feature of these low-energy conformations is that, irrespective of native structure, their non-native interactions are always long-ranged and marginally stabilizing. A comparison between the conformational spectra of RNCs and chain fragments folding freely in the bulk reveals drastic changes between the two set-ups depending on the native structure. Furthermore, the results show that the ribosome may enhance (by up to 20%) the population of low-energy conformations for chains folding to native structures dominated by local interactions. In contrast, an RNC folding to a non-local topology is forced to remain largely unstructured but can attain low-energy conformations in bulk.
Affiliation(s)
- Heinrich Krobath
- Centro de Física da Matéria Condensada and Departamento de Física, Universidade de Lisboa, Av. Prof. Gama Pinto 2, 1649-003 Lisboa, Portugal
8
Minning J, Porto M, Bastolla U. Detecting selection for negative design in proteins through an improved model of the misfolded state. Proteins 2013; 81:1102-12. [PMID: 23280507 DOI: 10.1002/prot.24244] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 11/07/2012] [Accepted: 12/17/2012] [Indexed: 11/05/2022]
Abstract
Proteins that need to be structured in their native state must be stable both against the unfolded ensemble and against incorrectly folded (misfolded) conformations with low free energy. Positive design targets the first type of stability by strengthening native interactions. The second type of stability is achieved by destabilizing interactions that occur frequently in the misfolded ensemble, a strategy called negative design. Here, we investigate negative design adopting a statistical mechanical model of the misfolded ensemble, which improves the usual Gaussian approximation by taking into account the third moment of the energy distribution and contact correlations. Applying this model, we detect and quantify selection for negative design in most natural proteins, and we analytically design protein sequences that are stable both against unfolding and against misfolding.
Affiliation(s)
- Jonas Minning
- Institut für Festkörperphysik, Technische Universität Darmstadt, Darmstadt, Germany
9
Krobath H, Faísca PFN. Interplay between native topology and non-native interactions in the folding of tethered proteins. Phys Biol 2013; 10:016002. [DOI: 10.1088/1478-3975/10/1/016002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/11/2022]
10
Bastolla U, Bruscolini P, Velasco JL. Sequence determinants of protein folding rates: Positive correlation between contact energy and contact range indicates selection for fast folding. Proteins 2012; 80:2287-304. [DOI: 10.1002/prot.24118] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/16/2011] [Revised: 05/14/2012] [Accepted: 05/17/2012] [Indexed: 11/12/2022]
11
Dal Molin JP, da Silva MAA, Caliri A. Effect of local thermal fluctuations on folding kinetics: a study from the perspective of nonextensive statistical mechanics. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 84:041903. [PMID: 22181171 DOI: 10.1103/physreve.84.041903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/08/2010] [Revised: 08/04/2011] [Indexed: 05/31/2023]
Abstract
The search through a protein's conformational space is thought of as an early, independent stage of the folding process, governed mainly by the hydrophobic effect. Because of the nanoscopic size of proteins, we assume that local thermal fluctuations act as folding assistants, controlled by the nonextensive parameter q. Using a 27-mer heteropolymer on a cubic lattice, we obtained, by Monte Carlo simulations, kinetic and thermodynamic quantities (such as the characteristic folding time and the native stability) as a function of temperature T and q for a few distinct native targets. We found that for each native structure, at a specific system temperature T, there exists an optimum q* that minimizes the characteristic folding time τ(min); for T=1, q* lies in the interval 1.15±0.05, even for native structures of significantly different topological complexity. The distribution of τ(min) obtained for a specific q>1 (nonextensive approach) and temperature T can be fully reproduced for q=1 (Boltzmann approach), but only at a higher temperature T'>T. However, assuming that the complete set of proteins of each organism is optimized to work in a narrow range of temperature, we conclude that, for the present problem, the two approaches, namely (T, q>1) and (T'>T, q=1), cannot be equivalent; it is not a simple matter of reparametrization. Finally, by associating the nonextensive parameter q with the instantaneous degree of compactness of the globule, q becomes a dynamic variable, self-adjusted along the simulation. The results obtained through the q-variable approach are fully consistent with those obtained by using a target-tuned parameter q*. However, in the q-variable approach, q is automatically adjusted by the chain's conformational evolution, eliminating the need to seek a specific optimized value of q for each case. In addition, using the q-variable approach, different target structures are promptly characterized by inherent distributions of q, which reflect the overall complexity of their corresponding native topologies and energy landscapes.
Affiliation(s)
- J P Dal Molin
- Departamento de Física e Química, FCFRP, Universidade de São Paulo, 14040-903 Ribeirão Preto, SP, Brazil.
12
13
Faísca PFN, Nunes A, Travasso RDM, Shakhnovich EI. Non-native interactions play an effective role in protein folding dynamics. Protein Sci 2011; 19:2196-209. [PMID: 20836137 DOI: 10.1002/pro.498] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 01/11/2023]
Abstract
Systematic Monte Carlo simulations of simple lattice models show that the final stage of protein folding is an ordered process where native contacts get locked (i.e., the residues come into contact and remain in contact for the duration of the folding process) in a well-defined order. The detailed study of the folding dynamics of protein-like sequences designed to exhibit different contact energy distributions, as well as different degrees of sequence optimization (i.e., participation of non-native interactions in the folding process), reveals significant differences in the corresponding locking scenarios (the collection of native contacts and their average locking times), which are largely ascribable to the dynamics of non-native contacts. Furthermore, strong evidence for a positive role played by non-native contacts at an early folding stage was also found. Interestingly, for topologically simple target structures, a positive interplay between native and non-native contacts is also observed toward the end of the folding process, suggesting that non-native contacts may indeed affect the overall folding process. For target models exhibiting clear two-state kinetics, the relation between the nucleation mechanism of folding and the locking scenario is investigated. Our results suggest that the stabilization of the folding transition state can be achieved through the establishment of a very small network of native contacts that are the first to lock during the folding process.
Affiliation(s)
- Patrícia F N Faísca
- Centro de Física da Matéria Condensada, Universidade de Lisboa, Av. Prof. Gama Pinto 2, 1649-003 Lisboa, Portugal.
14
Deeds EJ, Shakhnovich EI. A structure-centric view of protein evolution, design, and adaptation. Adv Enzymol Relat Areas Mol Biol 2010; 75:133-91, xi-xii. [PMID: 17124867 DOI: 10.1002/9780471224464.ch2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Proteins, by virtue of their central role in most biological processes, represent one of the key subjects of the study of molecular evolution. Inherent in the indispensability of proteins for living cells is the fact that a given protein can adopt a specific three-dimensional shape that is specified solely by the protein's sequence of amino acids. Over the past several decades, structural biologists have demonstrated that the array of structures that proteins may adopt is quite astounding, and this has led to a strong interest in understanding how protein structures change and evolve over time. In this review we consider a large body of recent work that attempts to illuminate this structure-centric picture of protein evolution. Much of this work has focused on the question of how completely new protein structures (i.e., new folds or topologies) are discovered by protein sequences as they evolve. Pursuant to this question of structural innovation has been a desire to describe and understand the observation that certain types of protein structures are far more abundant than others, and what this uneven distribution of proteins implies about the process through which new shapes are discovered. We consider a number of theoretical models that have been successful at explaining this heterogeneity in protein populations and discuss the increasing amount of evidence that indicates that the process of structural evolution involves the divergence of protein sequences and structures from one another. We also consider the topic of protein designability, which concerns itself with understanding how a protein's structure influences the number of sequences that can fold successfully into that structure. Understanding and quantifying the relationship between the physical features of a structure and its designability has been a long-standing goal of the study of protein structure and evolution, and we discuss a number of recent advances that have yielded a promising answer to this question. Finally, we review the relatively new field of protein structural phylogeny, an area of study in which information about the distribution of protein structures among different organisms is used to reconstruct the evolutionary relationships between them. Taken together, the work that we review presents an increasingly coherent picture of how these unique polymers have evolved over the course of life on Earth.
Affiliation(s)
- Eric J Deeds
- Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA
15
Universal distribution of protein evolution rates as a consequence of protein folding physics. Proc Natl Acad Sci U S A 2010; 107:2983-8. [PMID: 20133769 DOI: 10.1073/pnas.0910445107] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Indexed: 02/04/2023]
Abstract
The hypothesis that folding robustness is the primary determinant of the evolution rate of proteins is explored using a coarse-grained off-lattice model. The simplicity of the model allows rapid computation of the folding probability of a sequence to any folded conformation. For each robust folder, the network of sequences that share its native structure is identified. The fitness of a sequence is postulated to be a simple function of the number of misfolded molecules that have to be produced to reach a characteristic protein abundance. After fixation probabilities of mutants are computed under a simple population dynamics model, a Markov chain on the fold network is constructed, and the fold-averaged evolution rate is computed. The distribution of the logarithm of the evolution rates across distinct networks exhibits a peak with a long tail on the low rate side and resembles the universal empirical distribution of the evolutionary rates more closely than either distribution resembles the log-normal distribution. The results suggest that the universal distribution of the evolutionary rates of protein-coding genes is a direct consequence of the basic physics of protein folding.
16
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduce the true energies for high temperature Boltzmann distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity. 
Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
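The quasi-chemical approximation described in this abstract can be illustrated with a minimal sketch. The abstract does not spell out its "ideal reference state", so the reference used here (random mixing of contact ends, with the expected count for unlike pairs doubled) is an assumption, as are the function name `quasi_chemical_energies` and the toy contact list.

```python
import math
from collections import Counter

def quasi_chemical_energies(contacts, kT=1.0):
    """Estimate contact energies as e_ij = -kT * ln(observed_ij / expected_ij).

    `contacts` is an iterable of (type_a, type_b) pairs seen in a database.
    ASSUMPTION: the reference state is random mixing of contact ends; this
    may differ from the reference state used in the cited paper.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total = sum(pair_counts.values())

    # Count how many contact "ends" belong to each residue type.
    end_counts = Counter()
    for (a, b), n in pair_counts.items():
        end_counts[a] += n
        end_counts[b] += n

    energies = {}
    for (a, b), n in pair_counts.items():
        frac_a = end_counts[a] / (2 * total)
        frac_b = end_counts[b] / (2 * total)
        # Unlike pairs can form in two orders, hence the factor of 2.
        expected = total * frac_a * frac_b * (1 if a == b else 2)
        energies[(a, b)] = -kT * math.log(n / expected)
    return energies

# Hypothetical database: H-H contacts are over-represented.
toy_contacts = [("H", "H")] * 6 + [("H", "P")] * 2 + [("P", "P")] * 2
for pair, energy in sorted(quasi_chemical_energies(toy_contacts).items()):
    print(pair, round(energy, 3))
```

Pairs observed more often than random mixing predicts come out with negative (attractive) energies, and a database that matches the reference state exactly gives energies of zero. Because each pair is treated independently, the sketch shares the limitation the abstract reports for strongly interacting residues.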
Affiliation(s)
- Marcos R Betancourt
- Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
|
17
|
Lai Z, Su J, Chen W, Wang C. Uncovering the properties of energy-weighted conformation space networks with a hydrophobic-hydrophilic model. Int J Mol Sci 2009; 10:1808-1823. [PMID: 19468340 PMCID: PMC2680648 DOI: 10.3390/ijms10041808] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 01/14/2009] [Revised: 03/30/2009] [Accepted: 04/07/2009] [Indexed: 11/16/2022] Open
Abstract
The conformation spaces generated by short hydrophobic-hydrophilic (HP) lattice chains are mapped to conformation space networks (CSNs). The vertices (nodes) of the network are the conformations and the links are the transitions between them. It has been found that these networks have "small-world" properties without considering the interaction energy of the monomers in the chain, i.e., the hydrophobic or hydrophilic amino acids inside the chain. When the weight based on the interaction energy of the monomers in the chain is added to the CSNs, it is found that the weighted networks show the "scale-free" characteristic. In addition, a connection between the scale-free property of the weighted CSN and the folding dynamics of the chain is revealed by investigating the relationship between the scale-free structure of the weighted CSN and the noted parameter Z score. Moreover, the modular (community) structure of weighted CSNs is also studied. These results are helpful to understand the topological properties of the CSN and the underlying free-energy landscapes.
Affiliation(s)
- Zaizhi Lai
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, P.R. China
| | - Jiguo Su
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, P.R. China
- College of Science, Yanshan University, Qinhuangdao, 066004, P.R. China
| | - Weizu Chen
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, P.R. China
| | - Cunxin Wang
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, P.R. China
|
18
|
Peto M, Kloczkowski A, Honavar V, Jernigan RL. Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable. BMC Bioinformatics 2008; 9:487. [PMID: 19014713 PMCID: PMC2655094 DOI: 10.1186/1471-2105-9-487] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/05/2008] [Accepted: 11/18/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes and other machine learning algorithms we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations, or those folding to poorly- or non-designable conformations. RESULTS First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and by using a specified energy function, thread them through all of these compact conformations. If for a given sequence the lowest energy is obtained for a particular lattice conformation we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or no H/P sequences. We classify sequences as folding to either highly- or poorly-designable conformations. We have randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms. CONCLUSION By using these machine learning algorithms with ten-fold cross-validation we are able to classify the two classes of sequences with high accuracy -- in some cases exceeding 95%.
Affiliation(s)
- Myron Peto
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020, USA
| | - Andrzej Kloczkowski
- Laurence H Baker Center for Bioinformatics and Biological Statistics, 112 Office and Lab Bldg, Iowa State University, Ames, IA 50011-3020, USA
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020, USA
| | - Vasant Honavar
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
| | - Robert L Jernigan
- Laurence H Baker Center for Bioinformatics and Biological Statistics, 112 Office and Lab Bldg, Iowa State University, Ames, IA 50011-3020, USA
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020, USA
|
19
|
|
20
|
Zeldovich KB, Chen P, Shakhnovich BE, Shakhnovich EI. A first-principles model of early evolution: emergence of gene families, species, and preferred protein folds. PLoS Comput Biol 2008; 3:e139. [PMID: 17630830 PMCID: PMC1914367 DOI: 10.1371/journal.pcbi.0030139] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 04/04/2007] [Accepted: 06/04/2007] [Indexed: 11/19/2022] Open
Abstract
In this work we develop a microscopic physical model of early evolution where phenotype—organism life expectancy—is directly related to genotype—the stability of its proteins in their native conformations—which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the “Big Bang” scenario whereby exponential population growth ensues as soon as favorable sequence–structure combinations (precursors of stable proteins) are discovered. Upon that, random diversity of the structural space abruptly collapses into a small set of preferred proteins. We observe that protein folds remain stable and abundant in the population at timescales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary timescales between discovery of new folds and generation of new sequences gives rise to emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. On the population level we observe emergence of species—subpopulations that carry similar genomes. Further, we present a simple theory that relates stability of evolving proteins to the sizes of emerging genomes. Together, these results provide a microscopic first-principles picture of how first-gene families developed in the course of early evolution. Here, we address the question of how Darwinian evolution of organisms determines molecular evolution of their proteins and genomes. We developed a microscopic ab initio model of early biological evolution where the fitness (essentially lifetime) of an organism is explicitly related to the evolving sequences of its proteins. The main assumption of the model is that the death rate of an organism is determined by the stability of the least stable of their proteins. 
A lattice model is used to calculate stability of all proteins in a genome from their amino acid sequence. The simulation of the model starts from 100 identical organisms, each carrying the same random gene, and proceeds via random mutations, gene duplication, organism births via replication, and organism deaths. We find that exponential population growth is possible only after the discovery of a very small number of specific advantageous protein structures. The number of genes in the evolving organisms depends on the mutation rate, demonstrating the intricate relationship between the genome sizes and protein stability requirements. Further, the model explains the observed power-law distributions of protein family and superfamily sizes, as well as the scale-free character of protein structural similarity graphs. Together, these results and their analysis suggest a plausible comprehensive scenario of emergence of the protein universe in early biological evolution.
Affiliation(s)
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Peiqiu Chen
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Physics, Harvard University, Cambridge, Massachusetts, United States of America
| | - Boris E Shakhnovich
- Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- * To whom correspondence should be addressed.
|
21
|
Zeldovich KB, Shakhnovich EI. Understanding protein evolution: from protein physics to Darwinian selection. Annu Rev Phys Chem 2008; 59:105-27. [PMID: 17937598 DOI: 10.1146/annurev.physchem.58.032806.104449] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/09/2022]
Abstract
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
Affiliation(s)
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
|
22
|
Analyzing pathogenic mutations of C5 domain from cardiac myosin binding protein C through MD simulations. EUROPEAN BIOPHYSICS JOURNAL: EBJ 2008; 37:683-91. [DOI: 10.1007/s00249-008-0308-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/26/2007] [Revised: 02/04/2008] [Accepted: 03/10/2008] [Indexed: 11/26/2022]
|
23
|
Franzosa E, Xia Y. Structural Perspectives on Protein Evolution. ANNUAL REPORTS IN COMPUTATIONAL CHEMISTRY 2008. [DOI: 10.1016/s1574-1400(08)00001-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
|
24
|
Abstract
Familiar concepts for small molecules may require reinterpretation for larger systems. For example, rearrangements between geometrical isomers are usually considered in terms of transitions between the corresponding local minima on the underlying potential energy surface, V. However, transitions between bulk phases such as solid and liquid, or between the denatured and native states of a protein, are normally addressed in terms of free energy minima. To reestablish a connection with the potential energy surface we must think in terms of representative samples of local minima of V, from which a free energy surface is projected by averaging over most of the coordinates. The present contribution outlines how this connection can be developed into a tool for quantitative calculations. In particular, stepping between the local minima of V provides powerful methods for locating the global potential energy minimum, and for calculating global thermodynamic properties. When the transition states that link local minima are also sampled we can exploit statistical rate theory to obtain insight into global dynamics and rare events. Visualizing the potential energy landscape helps to explain how the network of local minima and transition states determines properties such as heat capacity features, which signify transitions between free energy minima. The organization of the landscape also reveals how certain systems can reliably locate particular structures on the experimental time scale from among an exponentially large number of local minima. Such directed searches not only enable proteins to overcome Levinthal's paradox but may also underlie the formation of "magic numbers" in molecular beams, the self-assembly of macromolecular structures, and crystallization.
Affiliation(s)
- David J Wales
- University Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW, UK.
|
25
|
Peto M, Kloczkowski A, Jernigan RL. Shape-dependent designability studies of lattice proteins. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2007; 19:285220-285230. [PMID: 18079979 PMCID: PMC2134837 DOI: 10.1088/0953-8984/19/28/285220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/25/2023]
Abstract
One important problem in computational structural biology is protein designability, that is, why protein sequences are not random strings of amino acids but instead show regular patterns that encode protein structures. Many previous studies that have attempted to solve the problem have relied upon reduced models of proteins. In particular, the 2D square and the 3D cubic lattices together with reduced amino acid alphabet models have been examined extensively and have led to interesting results that shed some light on evolutionary relationships among proteins. Here we perform designability studies on the 2D square lattice and explore the effects of variable overall shapes on protein designability using a binary hydrophobic-polar (HP) amino acid alphabet. Because we rely on a simple energy function that counts the total number of H-H interactions between non-sequential residues, we restrict our studies to protein shapes that have the same number of residues and also a constant number of non-bonded contacts. We have found that there is a marked difference in the designability between various protein shapes, with some of them accounting for a significantly larger share of the total foldable sequences.
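The energy function and the notion of designability described in this abstract can be sketched as follows. This is a toy illustration, not the authors' code: the function names and the three six-residue conformations (all filling a 2x3 rectangle, each with two non-bonded contacts) are hypothetical, and the conformation set is deliberately not exhaustive.

```python
from itertools import product

def hh_energy(sequence, coords):
    """Return -1 for every H-H pair on adjacent lattice sites, excluding chain neighbours."""
    energy = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip sequential residues
            if sequence[i] == "H" and sequence[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:  # nearest neighbours on the square lattice
                    energy -= 1
    return energy

def designability(conformations, length):
    """Count, per conformation, the HP sequences that have it as their unique ground state."""
    counts = [0] * len(conformations)
    for sequence in product("HP", repeat=length):
        energies = [hh_energy(sequence, c) for c in conformations]
        lowest = min(energies)
        if energies.count(lowest) == 1:  # degenerate ground states are discarded
            counts[energies.index(lowest)] += 1
    return counts

# Hypothetical compact conformations of a 6-residue chain on a 2x3 rectangle.
toy_conformations = [
    [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)],  # U-shaped path
    [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0), (2, 1)],  # zigzag path
    [(1, 0), (0, 0), (0, 1), (1, 1), (2, 1), (2, 0)],  # path starting mid-edge
]
print(designability(toy_conformations, 6))
```

Sequences whose lowest energy is shared by two or more conformations are not assigned to any of them, which is one common convention for "foldable" sequences. Keeping the number of residues and contacts fixed across conformations mirrors the restriction stated in the abstract.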
Affiliation(s)
- Myron Peto
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020
|
26
|
The Structurally Constrained Neutral Model of Protein Evolution. ACTA ACUST UNITED AC 2007. [DOI: 10.1007/978-3-540-35306-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
27
|
Matysiak S, Clementi C. Minimalist protein model as a diagnostic tool for misfolding and aggregation. J Mol Biol 2006; 363:297-308. [PMID: 16959265 DOI: 10.1016/j.jmb.2006.07.088] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Received: 06/02/2006] [Revised: 07/25/2006] [Accepted: 07/28/2006] [Indexed: 11/24/2022]
Abstract
We propose a realistic coarse-grained protein model and a technique to "anchor" the model to available experimental data. We apply this procedure to characterize the effect of multiple mutations on the folding mechanism of protein S6. We show that the mutation of a few "gatekeeper" residues triggers significant changes on the folding landscape of S6. These results suggest that gatekeeper residues control the flexibility of critical regions of S6, that in turn regulates the delicate balance between folding and aggregation. Although obtained with a minimalist protein model, these results are fully consistent with experimental evidence and offer a clue to understand the interplay between folding and aggregation in protein S6.
Affiliation(s)
- Silvina Matysiak
- Department of Chemistry, Rice University, 6100 Main Street, Houston, TX 77005, USA
|
28
|
Bastolla U, Porto M, Roman HE, Vendruscolo M. A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol 2006; 6:43. [PMID: 16737532 PMCID: PMC1570368 DOI: 10.1186/1471-2148-6-43] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 11/17/2005] [Accepted: 05/31/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the evolutionary history of macromolecules become computationally intractable if such correlations between sites are explicitly taken into account. RESULTS We introduce an evolutionary model with sites evolving independently under a global constraint on the conservation of structural stability. This model consists of a selection process, which depends on two hydrophobicity parameters that can be computed from protein sequences without any fit, and a mutation process for which we consider various models. It reproduces quantitatively the results of Structurally Constrained Neutral (SCN) simulations of protein evolution in which the stability of the native state is explicitly computed and conserved. We then compare the predicted site-specific amino acid distributions with those sampled from the Protein Data Bank (PDB). The parameters of the mutation model, whose number varies between zero and five, are fitted from the data. The mean correlation coefficient between predicted and observed site-specific amino acid distributions is larger than <r> = 0.70 for a mutation model with no free parameters and no genetic code. In contrast, considering only the mutation process with no selection yields a mean correlation coefficient of <r> = 0.56 with three fitted parameters. The mutation model that best fits the data takes into account increased mutation rate at CpG dinucleotides, yielding <r> = 0.90 with five parameters. CONCLUSION The effective selection process that we propose reproduces well amino acid distributions as observed in the protein sequences in the PDB. Its simplicity makes it very promising for likelihood calculations in phylogenetic studies. 
Interestingly, in this approach the mutation process influences the effective selection process, i.e. selection and mutation must be entangled in order to obtain effectively independent sites. This interdependence between mutation and selection reflects the deep influence that mutation has on the evolutionary process: The bias in the mutation influences the thermodynamic properties of the evolving proteins, in agreement with comparative studies of bacterial proteomes, and it also influences the rate of accepted mutations.
Affiliation(s)
- Ugo Bastolla
- Centro de Biología Molecular "Severo Ochoa", (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain
| | - Markus Porto
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany
| | - H Eduardo Roman
- Dipartimento di Fisica, Università di Milano Bicocca, Piazza della Scienza 3, 20126 Milano, Italy
| | - Michele Vendruscolo
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
|
29
|
Abstract
With the aim of studying the relationship between protein sequences and their native structures, we adopted vectorial representations for both sequence and structure. The structural representation was based on the principal eigenvector of the fold's contact matrix (PE). As has been recently shown, the latter encodes sufficient information for reconstructing the whole contact matrix. The sequence was represented through a hydrophobicity profile (HP), using a generalized hydrophobicity scale that we obtained from the principal eigenvector of a residue-residue interaction matrix, and denoted as interactivity scale. Using this novel scale, we defined the optimal HP of a protein fold and, by means of stability arguments, predicted it to be strongly correlated with the PE of the fold's contact matrix. This prediction was confirmed through an evolutionary analysis, which showed that the PE correlates with the HP of each individual sequence adopting the same fold and, even more strongly, with the average HP of this set of sequences. Thus, protein sequences evolve in such a way that their average HP is close to the optimal one, implying that neutral evolution can be viewed as a kind of motion in sequence space around the optimal HP. Our results indicate that the correlation coefficient between N-dimensional vectors constitutes a natural metric in the vectorial space in which we represent both protein sequences and protein structures, which we call vectorial protein space. In this way, we define a unified framework for sequence-to-sequence, sequence-to-structure and structure-to-structure alignments. We show that the interactivity scale is nearly optimal both for the comparison of sequences to sequences and sequences to structures.
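The comparison between the principal eigenvector (PE) of a contact matrix and a hydrophobicity profile (HP) can be sketched in a few lines. The contact list and the hydrophobicity values below are hypothetical placeholders; the interactivity scale itself is not given in the abstract, and the power-iteration routine is a generic choice rather than the authors' method.

```python
import math

def principal_eigenvector(contact_matrix, iterations=500):
    """Approximate the eigenvector of the largest eigenvalue by power iteration."""
    n = len(contact_matrix)
    v = [1.0] * n
    for _ in range(iterations):
        # Iterate with (C + I): it has the same eigenvectors as C but does
        # not oscillate when the contact graph happens to be bipartite.
        w = [v[i] + sum(contact_matrix[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy 5-residue fold: hypothetical contacts, not taken from a real structure.
toy_contacts = [(0, 2), (0, 3), (0, 4), (1, 3), (2, 4)]
n_residues = 5
contact_matrix = [[0] * n_residues for _ in range(n_residues)]
for i, j in toy_contacts:
    contact_matrix[i][j] = contact_matrix[j][i] = 1

pe = principal_eigenvector(contact_matrix)
# Hypothetical per-residue hydrophobicity values (placeholder scale).
hydrophobicity_profile = [0.9, 0.1, 0.6, 0.4, 0.7]
print(pearson(pe, hydrophobicity_profile))
```

In this toy fold the best-connected residue receives the largest PE component and the residue with a single contact the smallest, so a profile that places hydrophobic residues at well-connected positions yields a high correlation.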
|
30
|
Shell MS, Debenedetti PG, Panagiotopoulos AZ. Computational characterization of the sequence landscape in simple protein alphabets. Proteins 2005; 62:232-43. [PMID: 16284961 DOI: 10.1002/prot.20714] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
We characterize the "sequence landscapes" in several simple, heteropolymer models of proteins by examining their mutation properties. Using an efficient flat-histogram Monte Carlo search method, our approach involves determining the distribution in energy of all sequences of a given length when threaded through a common backbone. These calculations are performed for a number of Protein Data Bank structures using two variants of the 20-letter contact potential developed by Miyazawa and Jernigan [Miyazawa S, Jernigan WL. Macromolecules 1985;18:534], and the 2-monomer HP model of Lau and Dill [Lau KF, Dill KA. Macromolecules 1989;22:3986]. Our results indicate significant differences among the energy functions in terms of the "smoothness" of their landscapes. In particular, one of the Miyazawa-Jernigan contact potentials reveals unusual cooperative behavior among its species' interactions, resulting in what is essentially a set of phase transitions in sequence space. Our calculations suggest that model-specific features can have a profound effect on protein design algorithms, and our methods offer a number of ways by which sequence landscapes can be quantified.
Affiliation(s)
- M Scott Shell
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544, USA.
|
31
|
Bastolla U, Demetrius L. Stability constraints and protein evolution: the role of chain length, composition and disulfide bonds. Protein Eng Des Sel 2005; 18:405-15. [PMID: 16085657 DOI: 10.1093/protein/gzi045] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] Open
Abstract
Stability of the native state is an essential requirement in protein evolution and design. Here we investigated the interplay between chain length and stability constraints using a simple model of protein folding and a statistical study of the Protein Data Bank. We distinguish two types of stability of the native state: with respect to the unfolded state (unfolding stability) and with respect to misfolded configurations (misfolding stability). Several contributions to stability are evaluated and their correlations are disentangled through principal components analysis, with the following main results. (1) We show that longer proteins can fulfil more easily the requirements of unfolding and misfolding stability, because they have a higher number of native interactions per residue. Consistently, in longer proteins native interactions are weaker and they are less optimized with respect to non-native interactions. (2) Stability against misfolding is negatively correlated with the strength of native interactions, which is related to hydrophobicity. Hence there is a trade-off between unfolding and misfolding stability. This trade-off is influenced by protein length: less hydrophobic sequences are observed in very long proteins. (3) The number of disulfide bonds is positively correlated with the deficit of free energy stabilizing the native state. Chain length and the number of disulfide bonds per residue are negatively correlated in proteins with short chains and uncorrelated in proteins with long chains. (4) The number of salt bridges per residue and per native contact increases with chain length. We interpret these observations as an indication that the constraints imposed by unfolding stability are less demanding in long proteins and they are further reduced by the competing requirement for stability against misfolding. In particular, disulfide bonds appear to be positively selected in short proteins, whereas they evolve in an effectively neutral way in long proteins.
Affiliation(s)
- U Bastolla
- Centro de Astrobiología (INTA-CSIC), 28850 Torrejón de Ardoz and Centro de Biología Molecular 'Severo Ochoa', Cantoblanco, 28049 Madrid, Spain.
|
32
|
Das P, Matysiak S, Clementi C. Balancing energy and entropy: a minimalist model for the characterization of protein folding landscapes. Proc Natl Acad Sci U S A 2005; 102:10141-6. [PMID: 16006532 PMCID: PMC1177359 DOI: 10.1073/pnas.0409471102] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.3] [Received: 01/06/2005] [Indexed: 11/18/2022] Open
Abstract
Coarse-grained models have been extremely valuable in promoting our understanding of protein folding. However, the quantitative accuracy of existing simplified models is strongly hindered either from the complete removal of frustration (as in the widely used Gō-like models) or from the compromise with the minimal frustration principle and/or realistic protein geometry (as in the simple on-lattice models). We present a coarse-grained model that "naturally" incorporates sequence details and energetic frustration into an overall minimally frustrated folding landscape. The model is coupled with an optimization procedure to design the parameters of the protein Hamiltonian to fold into a desired native structure. The application to the study of src-Src homology 3 domain shows that this coarse-grained model contains the main physical-chemical ingredients that are responsible for shaping the folding landscape of this protein. The results illustrate the importance of nonnative interactions and energetic heterogeneity for a quantitative characterization of folding mechanisms.
Affiliation(s)
- Payel Das
- Department of Chemistry, Rice University, Houston, TX 77005, USA
|
33
|
Choi HS, Huh J, Jo WH. Comparison between denaturant- and temperature-induced unfolding pathways of protein: a lattice Monte Carlo simulation. Biomacromolecules 2005; 5:2289-96. [PMID: 15530044 DOI: 10.1021/bm049663p] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
Denaturant-induced unfolding of protein is simulated by using a Monte Carlo simulation with a lattice model for protein and denaturant. Following the binding theory for denaturant-induced unfolding, the denaturant molecules are modeled to interact with protein by nearest-neighbor interactions. By analyzing the conformational states on the unfolding pathway of protein, the denaturant-induced unfolding pathway is compared with the temperature-induced unfolding pathway under the same condition; that is, the free energies of unfolding under two different pathways are equal. The two unfoldings show markedly different conformational distributions in unfolded states. From the calculation of the free energy of protein as a function of the number fraction (Q0) of native contacts relative to the total number of contacts, it is found that the free energy of the largely unfolded state corresponding to low Q0 (0.1 < Q0 < 0.5) under temperature-induced unfolding is lower than that under denaturant-induced unfolding, whereas the free energy of the unfolded state close to the native state (Q0 > 0.5) is lower in denaturant-induced unfolding than in temperature-induced unfolding. A comparison of two unfolding pathways reveals that the denaturant-induced unfolding shows a wider conformational distribution than the temperature-induced unfolding, while the temperature-induced unfolding shows a more compact unfolded state than the denaturant-induced unfolding especially in the low Q0 region (0.1 < Q0 < 0.5).
Affiliation(s)
- Ho Sup Choi
- Hyperstructured Organic Materials Research Center, School of Material Science and Engineering, Seoul National University, Seoul 151-744, Korea
|
34
|
Wang J, Huang W, Lu H, Wang E. Downhill kinetics of biomolecular interface binding: globally connected scenario. Biophys J 2005; 87:2187-94. [PMID: 15454421 PMCID: PMC1304644 DOI: 10.1529/biophysj.104.042747] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022] Open
Abstract
We study the kinetics of the biomolecular binding process at the interface using energy landscape theory. The global kinetic connectivity case is considered for a downhill funneled energy landscape. By solving the kinetic master equation, the kinetic time for binding is obtained and shown to have a U-shape curve-dependence on the temperature. The kinetic minimum of the binding time monotonically decreases when the ratio of the underlying energy gap between native state and average non-native states versus the roughness or the fluctuations of the landscape increases. At intermediate temperatures, fluctuations measured by the higher moments of the binding time lead to non-Poissonian, non-exponential kinetics. At both high and very low temperatures, the kinetics is nearly Poissonian and exponential.
Affiliation(s)
- Jin Wang
- State Key Laboratory of Electro-analytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, People's Republic of China.
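For readers unfamiliar with the formalism named in the abstract above, a generic kinetic master equation and the associated mean first-passage time can be written as follows. This is a standard textbook form under assumed notation (state populations P_i, transition rates k_{i→j}, native state n); it is not the specific rate model of the paper.

```latex
% Master equation for the population P_i(t) of state i
\frac{dP_i}{dt} = \sum_{j \neq i} \left[ k_{j \to i}\, P_j(t) - k_{i \to j}\, P_i(t) \right]

% Mean first-passage time \tau_i from state i to the native (bound) state n
\sum_{j \neq i} k_{i \to j} \left( \tau_j - \tau_i \right) = -1 \quad (i \neq n),
\qquad \tau_n = 0

% Exponential (Poissonian) kinetics satisfies
\langle \tau^2 \rangle = 2 \langle \tau \rangle^2
```

Deviations of the second moment from the last relation are one way to quantify the non-exponential behavior the abstract reports at intermediate temperatures.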
|
35
|
Briones C, Bastolla U. Protein evolution in viral quasispecies under selective pressure: A thermodynamic and phylogenetic analysis. Gene 2005; 347:237-46. [PMID: 15725390 DOI: 10.1016/j.gene.2004.12.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/01/2004] [Revised: 11/23/2004] [Accepted: 12/10/2004] [Indexed: 01/21/2023]
Abstract
The evolution of RNA viruses under antiviral pressure is characterized by high mutation rates and strong selective forces that induce extremely rapid changes of protein sequences. This makes the course of molecular evolution directly observable on time scales of months. Here we study the interplay between selection for drug resistance and selection for thermodynamic stability in the protease (PR) and the reverse transcriptase (RT) of human immunodeficiency virus type 1 (HIV-1) clones extracted from two patients with complex treatment histories. This analysis shows that folding thermodynamic properties may fluctuate very strongly in the course of quasispecies evolution under selective pressure. For the first case, our data suggest that the folding efficiency of the RT is sacrificed in favor of drug resistance, while the corresponding PR seems to undergo selection for thermodynamic stability in the absence of substitutions associated with resistance. The PR of the second case is not subjected to antiviral pressure during the period analyzed and seems to undergo random fluctuations that lead to an accidental increase of its folding efficiency. In summary, joint consideration of sequence evolution and thermodynamic parameters can represent a more comprehensive approach for the study of the evolution of RNA viruses.
Affiliation(s)
- Carlos Briones
- Molecular Evolution Laboratory, Centro de Astrobiología, CSIC-INTA, Carretera de Ajalvir, Km. 4, 28850 Torrejón de Ardoz, Madrid, Spain.
|
36
|
Bastolla U, Porto M, Roman HE, Vendruscolo M. Looking at structure, stability, and evolution of proteins through the principal eigenvector of contact matrices and hydrophobicity profiles. Gene 2005; 347:219-30. [PMID: 15777696 DOI: 10.1016/j.gene.2004.12.015] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 10/15/2004] [Revised: 11/29/2004] [Accepted: 12/10/2004] [Indexed: 11/28/2022]
Abstract
We review and further develop an analytical model that describes how thermodynamic constraints on the stability of the native state influence protein evolution in a site-specific manner. To this end, we represent both protein sequences and protein structures as vectors: structures are represented by the principal eigenvector (PE) of the protein contact matrix, a quantity that closely resembles the effective connectivity of each site; sequences are represented through the "interactivity" of each amino acid type, using novel parameters that are correlated with hydropathy scales. These interactivity parameters are more strongly correlated than the other hydropathy scales that we examine with: (1) the change upon mutation of the unfolding free energy of proteins with two-state thermodynamics; (2) genomic properties such as genome size and genome-wide GC content; (3) the main eigenvectors of the substitution matrices. The evolutionary average of the interactivity vector correlates very strongly with the PE of a protein structure. Using this result, we derive an analytic expression for site-specific distributions of amino acids across protein families in the form of Boltzmann distributions whose "inverse temperature" is a function of the PE component. We show that our predictions are in agreement with site-specific amino acid distributions obtained from the Protein Data Bank, and we determine the mutational model that best fits the observed site-specific amino acid distributions. Interestingly, the optimal model almost minimizes the rate at which deleterious mutations are eliminated by natural selection.
Affiliation(s)
- Ugo Bastolla
- Centro de Astrobiología, INTA-CSIC, c.tra de Ajalvir km.4, E-28850, Torrejón de Ardoz, Madrid, Spain.
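The principal eigenvector (PE) of a contact matrix, which the abstract above uses as the structural profile, can be illustrated with a short power-iteration sketch. The toy four-site contact map and the function name are hypothetical; real contact matrices are derived from native structures with a distance cutoff that is not specified here.

```python
def principal_eigenvector(contact, n_iter=500):
    """Power iteration for the principal eigenvector of a symmetric 0/1 contact matrix.

    Iterates with (C + I) instead of C: the eigenvectors are unchanged, but the
    shift keeps the largest eigenvalue strictly dominant even for bipartite
    contact graphs. Returns a unit-norm vector with non-negative components.
    """
    n = len(contact)
    v = [1.0] * n
    for _ in range(n_iter):
        w = [v[i] + sum(contact[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical toy contact map: site 1 touches sites 0, 2 and 3.
contact = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
pe = principal_eigenvector(contact)
print([round(x, 3) for x in pe])
```

The best-connected site receives the largest PE component, which matches the abstract's description of the PE as an effective connectivity of each site.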
|
37
|
Abstract
We present a directed essential dynamics (DED) method for peptide and protein folding. DED is a molecular dynamics method based on the essential dynamics sampling and the principal component analysis. The main idea of DED is to use principal component analysis to determine the direction of the most active collective motion of peptides at short intervals of time (20 fs) during the folding process and then add an additional force along it to adjust the folding direction. This method can make the peptides avoid being trapped in the local minima for a long time and enhance the sampling efficiency in conformational space during the simulation. An S-peptide with 15 amino acids is used to demonstrate the DED method. The results show that DED can lead the S-peptide to fold quickly into the native state, whereas traditional molecular dynamics needs more time to do this.
Affiliation(s)
- Changjun Chen
- Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
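The central step described in the abstract above, finding the most active collective motion over a short window and adding a force along it, can be sketched roughly as follows. This is only an illustration under stated assumptions: the window is a list of flattened coordinate frames, the first principal component is found by power iteration on the covariance matrix, its sign is oriented along the net displacement of the window, and `strength` is a hypothetical tuning parameter. The actual DED force definition may differ.

```python
def dominant_direction(frames, n_iter=200):
    """First principal component of a short window of coordinate frames.

    frames: list of equal-length coordinate lists (one per time step).
    The sign is chosen to follow the net displacement over the window
    (an assumption made for this sketch).
    """
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    centered = [[f[k] - mean[k] for k in range(d)] for f in frames]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    # Asymmetric start vector to reduce the chance of starting orthogonal to the answer.
    v = [1.0 + 0.1 * k for k in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no motion in the window, or an unlucky start vector
            break
        v = [x / norm for x in w]
    drift = [frames[-1][k] - frames[0][k] for k in range(d)]
    if sum(x * y for x, y in zip(v, drift)) < 0:
        v = [-x for x in v]
    return v

def biased_forces(forces, direction, strength):
    """Add an extra force of magnitude `strength` along `direction`."""
    return [f + strength * e for f, e in zip(forces, direction)]

# Toy window: two coordinates, motion only along the first one.
window = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
direction = dominant_direction(window)
new_forces = biased_forces([0.0, 0.0], direction, strength=0.5)
print(direction, new_forces)
```

In a simulation loop this step would be repeated at each short interval (the abstract cites 20 fs) so that the bias follows the current collective motion.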
|
38
|
Porto M, Roman HE, Vendruscolo M, Bastolla U. Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences. Mol Biol Evol 2004; 22:630-8. [PMID: 15537801 DOI: 10.1093/molbev/msi048] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
We derive an analytic expression for site-specific stationary distributions of amino acids from the structurally constrained neutral (SCN) model of protein evolution with conservation of folding stability. The stationary distributions that we obtain have a Boltzmann-like shape, and their effective temperature parameter, measuring the limit of divergent evolutionary changes at a given site, can be predicted from a site-specific topological property, the principal eigenvector of the contact matrix of the native conformation of the protein. These analytic results, obtained without free parameters, are compared with simulations of the SCN model and with the site-specific amino acid distributions obtained from the Protein Data Bank. These results also provide new insights into how the topology of a protein fold influences its designability, i.e., the number of sequences compatible with that fold. The dependence of the effective temperature on the principal eigenvector decreases for longer proteins, as a possible consequence of the fact that selection for thermodynamic stability becomes weaker in this case.
Affiliation(s)
- Markus Porto
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany.
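The Boltzmann-like site-specific distributions mentioned in the abstract above can be illustrated with a small sketch. The form p(a) ∝ exp(-β h(a)), the sign convention, the three-letter alphabet, and the parameter values h(a) are all assumptions made for illustration; the paper derives the site dependence of the effective temperature from the principal eigenvector, which is not reproduced here.

```python
import math

def site_distribution(h, beta):
    """Boltzmann-like distribution over residue types at one site.

    h: dict mapping residue type to a hydrophobicity-like parameter (hypothetical values).
    beta: site-specific inverse effective temperature (sign convention assumed).
    """
    weights = {a: math.exp(-beta * x) for a, x in h.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

# Made-up parameters: larger h means more hydrophobic.
h = {"L": 1.0, "A": 0.5, "K": 0.0}
p_buried = site_distribution(h, beta=-2.0)   # favors hydrophobic residues
p_exposed = site_distribution(h, beta=2.0)   # favors polar residues
p_flat = site_distribution(h, beta=0.0)      # no preference
print(p_buried, p_exposed, p_flat)
```

A β close to zero corresponds to a site that tolerates almost any residue, which is the sense in which the effective temperature measures the limit of divergent change at a site.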
|
39
|
Bastolla U, Moya A, Viguera E, van Ham RCHJ. Genomic determinants of protein folding thermodynamics in prokaryotic organisms. J Mol Biol 2004; 343:1451-66. [PMID: 15491623 DOI: 10.1016/j.jmb.2004.08.086] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Received: 05/04/2004] [Revised: 08/24/2004] [Accepted: 08/27/2004] [Indexed: 02/07/2023]
Abstract
Here we investigate how thermodynamic properties of orthologous proteins are influenced by the genomic environment in which they evolve. We performed a comparative computational study of 21 protein families in 73 prokaryotic species and obtained the following main results. (i) Protein stability with respect to the unfolded state and with respect to misfolding are anticorrelated. There appears to be a trade-off between these two properties, which cannot be optimized simultaneously. (ii) Folding thermodynamic parameters are strongly correlated with two genomic features, genome size and G+C composition. In particular, the normalized energy gap, an indicator of folding efficiency in statistical mechanical models of protein folding, is smaller in proteins of organisms with a small genome size and a compositional bias towards A+T. Such genomic features are characteristic for bacteria with an intracellular lifestyle. We interpret these correlations in light of mutation pressure and natural selection. A mutational bias toward A+T at the DNA level translates into a mutational bias toward more hydrophobic (and in general more interactive) proteins, a consequence of the structure of the genetic code. Increased hydrophobicity renders proteins more stable against unfolding but less stable against misfolding. Proteins with high hydrophobicity and low stability against misfolding occur in organisms with reduced genomes, like obligate intracellular bacteria. We argue that they are fixed because these organisms experience weaker purifying selection due to their small effective population sizes. This interpretation is supported by the observation of a high expression level of chaperones in these bacteria. Our results indicate that the mutational spectrum of a genome and the strength of selection significantly influence protein folding thermodynamics.
Affiliation(s)
- Ugo Bastolla
- Centro de Astrobiología (CSIC-INTA), E-28850 Torrejón de Ardoz, Spain.
|
40
|
Abstract
The fastest simple, kinetically two-state protein folds a million times more rapidly than the slowest. Here we review many recent theories of protein folding kinetics in terms of their ability to qualitatively rationalize, if not quantitatively predict, this fundamental experimental observation.
Affiliation(s)
- Blake Gillespie
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, USA.
|
41
|
Bloom JD, Wilke CO, Arnold FH, Adami C. Stability and the evolvability of function in a model protein. Biophys J 2004; 86:2758-64. [PMID: 15111394 PMCID: PMC1304146 DOI: 10.1016/s0006-3495(04)74329-5] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.1] [Received: 11/07/2003] [Accepted: 01/12/2004] [Indexed: 11/18/2022]
Abstract
Functional proteins must fold with some minimal stability to a structure that can perform a biochemical task. Here we use a simple model to investigate the relationship between the stability requirement and the capacity of a protein to evolve the function of binding to a ligand. Although our model contains no built-in tradeoff between stability and function, proteins evolved function more efficiently when the stability requirement was relaxed. Proteins with both high stability and high function evolved more efficiently when the stability requirement was gradually increased than when there was constant selection for high stability. These results show that in our model, the evolution of function is enhanced by allowing proteins to explore sequences corresponding to marginally stable structures, and that it is easier to improve stability while maintaining high function than to improve function while maintaining high stability. Our model also demonstrates that even in the absence of a fundamental biophysical tradeoff between stability and function, the speed with which function can evolve is limited by the stability requirement imposed on the protein.
Affiliation(s)
- Jesse D Bloom
- Department of Chemistry, California Institute of Technology, Pasadena, California 91125, USA.
|
42
|
Li J, Wang J, Zhang J, Wang W. Thermodynamic stability and kinetic foldability of a lattice protein model. J Chem Phys 2004; 120:6274-87. [PMID: 15267515 DOI: 10.1063/1.1651053] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Abstract
By using serial mutations, i.e., replacing a residue in turn with each of the other 19 naturally occurring residue types, the stability of the native conformation and the folding behavior of the mutated sequences are studied. The 3 x 3 x 3 lattice protein model is used with two kinds of interaction potentials between the residues, namely the original Miyazawa and Jernigan (MJ) potentials and the modified MJ (MMJ) potentials. The effects of the various sites in the mutated sequences on stability and foldability are characterized through the Z-score and the folding time. It is found that the sites can be divided into three types, namely the hydrophobic type (H-type), the hydrophilic type (P-type) and the neutral type (N-type). These three types of sites relate to the hydrophobic core, the hydrophilic surface and the parts between them. The stability of the native conformation for the serially mutated sequences increases (or decreases) with increasing hydrophobicity of the mutated residue for the H-type sites (or the P-type sites), while it varies randomly for the N-type sites. However, the foldability of the mutated sequences is not always consistent with the thermodynamic stability, and their relationship depends on the site type. Since the hydrophobic tendency of the MJ potentials is strong, the ratio between the number of H-type sites and the number of P-type sites is found to be 1:2. In contrast, for the MMJ potentials this ratio is found to be about 1:1, which is relevant to that of real proteins. This suggests that the modification of the MJ potentials is rational in the aspect of thermodynamic stability. The folding of model proteins with the MMJ potentials is fast. However, the relationship between the foldability and the thermodynamic stability of the mutated sequences is complex.
Affiliation(s)
- Jie Li
- National Lab of Solid State Microstructure and Physics Department, Nanjing University, Nanjing 210093, China
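The Z-score used in the abstract above to characterize stability is commonly defined as the gap between the native energy and the mean energy of alternative conformations, in units of their standard deviation. A minimal sketch under that common definition follows; the decoy energies are made up, and the paper's exact ensemble of alternative conformations is not specified here.

```python
import statistics

def z_score(native_energy, decoy_energies):
    """Z = (E_native - <E_decoy>) / sigma_decoy; more negative means more stable."""
    mean = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)  # population standard deviation
    return (native_energy - mean) / sigma

# Made-up energies in arbitrary units.
decoys = [-2.0, -1.0, 0.0, 1.0, 2.0]
z = z_score(-5.0, decoys)
print(round(z, 4))
```

Repeating the calculation for each single-site mutant and each substituted residue type gives the per-site stability profile from which the H-type, P-type and N-type classification is drawn.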
|
43
|
Tiana G, Shakhnovich BE, Dokholyan NV, Shakhnovich EI. Imprint of evolution on protein structures. Proc Natl Acad Sci U S A 2004; 101:2846-51. [PMID: 14970345 PMCID: PMC365708 DOI: 10.1073/pnas.0306638101] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Received: 10/16/2003] [Accepted: 12/22/2003] [Indexed: 11/18/2022]
Abstract
We attempt to understand the evolutionary origin of protein folds by simulating their divergent evolution with a three-dimensional lattice model. Starting from an initial seed lattice structure, evolution of model proteins progresses by sequence duplication and subsequent point mutations. A new gene's ability to fold into a stable and unique structure is tested each time through direct kinetic folding simulations. Where possible, the algorithm accepts the new sequence and structure and thus a "new protein structure" is born. During the course of each run, this model evolutionary algorithm provides several thousand new proteins with diverse structures. Analysis of evolved structures shows that later evolved structures are more designable than seed structures as judged by recently developed structural determinant of protein designability, as well as direct estimate of designability for selected structures by thermodynamic sampling of their sequence space. We test the significance of this trend predicted on lattice models on real proteins and show that protein domains that are found in eukaryotic organisms only feature statistically significant higher designability than their prokaryotic counterparts. These results present a fundamental view on protein evolution highlighting the relative roles of structural selection and evolutionary dynamics on genesis of modern proteins.
Affiliation(s)
- Guido Tiana
- Department of Physics and Istituto Nazionale di Fisica Nucleare, University of Milano, Via Celoria 16, 20133 Milan, Italy
|
44
|
|
45
|
Ball RC, Fink TMA, Bowler NE. Stochastic annealing. PHYSICAL REVIEW LETTERS 2003; 91:030201. [PMID: 12906405 DOI: 10.1103/physrevlett.91.030201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/14/2003] [Indexed: 05/24/2023]
Abstract
We show how to simulate a system in thermal equilibrium when the energy cannot be evaluated exactly: the error distribution needs to be symmetric, but it does not need to be known. We also solve the Ceperley-Dewing version of this problem, where the error distribution is taken to be fully known. These underlying ideas give an effective optimization strategy for problems where the evaluation of each design can be sampled only statistically, including an application to protein folding.
Affiliation(s)
- Robin C Ball
- Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom.
|
46
|
Qin M, Wang J, Tang Y, Wang W. Folding behaviors of lattice model proteins with three kinds of contact potentials. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2003; 67:061905. [PMID: 16241259 DOI: 10.1103/physreve.67.061905] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Received: 01/30/2003] [Indexed: 05/04/2023]
Abstract
The interaction potentials between the amino acids are very important in the study of protein folding and design. In this work, the folding behaviors of lattice model protein chains are studied using three kinds of contact potentials between the beads. For these three cases, a number of sequences are designed using the Z-score method, and then their folding behaviors are obtained via Monte Carlo simulations for different sizes of the chains. It is found that the proper weakening of hydrophobicity may speed up the folding and the elimination of the mixing interaction terms may deteriorate the foldability. The different features of the foldability are discussed by comparing the characteristics of the energy landscapes of these model chains. The formations of various contacts are also analyzed, which provide us with some microscopic information on the model systems and interaction potentials.
Affiliation(s)
- Meng Qin
- National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, China
|
47
|
Abstract
Experimental studies have shown that the full sequence complexity of naturally occurring proteins is not required to generate rapidly folding and functional proteins, i.e., proteins can be designed with fewer than 20 letters. This raises the question: what is the minimum number of amino acid types required to encode complex protein folds? Here, we investigate this issue from three aspects. First, we study the minimum sequence complexity that can preserve the structural information necessary for the detection of distantly related homologues. Second, we compare the ability to design foldable model sequences over a wide range of reduced amino acid alphabets, to find the minimum number of letters with a design ability similar to that of the full 20-letter alphabet. Finally, we survey the lower bound of the alphabet size of globular proteins in a non-redundant protein database. These different approaches give a remarkably consistent view: the minimum number of letters required to fold a protein is around ten.
Affiliation(s)
- Ke Fan
- National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, People's Republic of China
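The idea of a reduced amino acid alphabet from the abstract above can be made concrete with a short sketch that maps each residue to a representative of its group. The eight-group partition below is hypothetical and chosen only for illustration; it is not one of the alphabets examined in the paper, and the example sequence is made up.

```python
# Hypothetical grouping of the 20 amino acids into 8 classes (illustration only).
GROUPS = ["LVIMC", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every residue to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}

def reduce_sequence(seq):
    """Rewrite a one-letter protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq)

def alphabet_size(seq):
    """Number of distinct letters actually used by a sequence."""
    return len(set(seq))

original = "MKVLAAGIDE"  # made-up sequence
reduced = reduce_sequence(original)
print(original, alphabet_size(original))
print(reduced, alphabet_size(reduced))
```

Counting the distinct letters used by each protein in a database, as `alphabet_size` does for one sequence, is the kind of survey the abstract's third approach refers to.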
|
48
|
Nelson E, Grishin N. Investigation of the folding profiles of evolutionarily selected model proteins. J Chem Phys 2003. [DOI: 10.1063/1.1536621] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
|
49
|
van Ham RCHJ, Kamerbeek J, Palacios C, Rausell C, Abascal F, Bastolla U, Fernández JM, Jiménez L, Postigo M, Silva FJ, Tamames J, Viguera E, Latorre A, Valencia A, Morán F, Moya A. Reductive genome evolution in Buchnera aphidicola. Proc Natl Acad Sci U S A 2003; 100:581-6. [PMID: 12522265 PMCID: PMC141039 DOI: 10.1073/pnas.0235981100] [Citation(s) in RCA: 350] [Impact Index Per Article: 16.7] [Received: 10/07/2002] [Indexed: 02/07/2023]
Abstract
We have sequenced the genome of the intracellular symbiont Buchnera aphidicola from the aphid Baizongia pistacea. This strain diverged 80-150 million years ago from the common ancestor of two previously sequenced Buchnera strains. Here, a field-collected, nonclonal sample of insects was used as source material for laboratory procedures. As a consequence, the genome assembly unveiled intrapopulational variation, consisting of approximately 1,200 polymorphic sites. Comparison of the 618-kb (kbp) genome with the two other Buchnera genomes revealed a nearly perfect gene-order conservation, indicating that the onset of genomic stasis coincided closely with establishment of the symbiosis with aphids, approximately 200 million years ago. Extensive genome reduction also predates the synchronous diversification of Buchnera and its host; but, at a slower rate, gene loss continues among the extant lineages. A computational study of protein folding predicts that proteins in Buchnera, as well as proteins of other intracellular bacteria, are generally characterized by smaller folding efficiency compared with proteins of free living bacteria. These and other degenerative genomic features are discussed in light of compensatory processes and theoretical predictions on the long-term evolutionary fate of symbionts like Buchnera.
Affiliation(s)
- Roeland C H J van Ham
- Centro de Astrobiologia, Instituto Nacional de Técnica Aeroespacial-Consejo Superior de Investigaciones Cientificas, Carretera de Ajalvir kilómetro 4, 28850 Torrejón de Ardoz, Madrid, Spain
|
50
|
Ball RC, Fink TMA. Protein design depends on the size of the amino acid alphabet. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2002; 66:031902. [PMID: 12366147 DOI: 10.1103/physreve.66.031902] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/26/2001] [Indexed: 05/23/2023]
Abstract
We consider the design of proteins to be simultaneously thermodynamically stable in multiple independent and correlated conformations. We first show that a protein can be trained to fold to multiple independent conformations and calculate its capacity. The number of configurations that it can remember is proportional to the logarithm of the number of amino acid species A, independent of chain length. Next we investigate the recognition of correlated conformations, which we apply to funnel design around a single configuration. The maximum basin of attraction, as parametrized in our model, also depends on the number of amino acid species as ln A. We argue that the extent to which the protein energy landscape can be manipulated is fixed, effecting a trade off between well breadth, well depth, and well number. This emerging picture motivates a clearer understanding of the scope and limits of protein and heteropolymer function.
Affiliation(s)
- Robin C Ball
- Department of Physics, University of Warwick, Coventry CV4 7AL, England.
|