1
Mazmanian K, Chen T, Sargsyan K, Lim C. From quantum-derived principles underlying cysteine reactivity to combating the COVID-19 pandemic. Wiley Interdisciplinary Reviews. Computational Molecular Science 2022; 12:e1607. [PMID: 35600063 PMCID: PMC9111396 DOI: 10.1002/wcms.1607] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/31/2021] [Revised: 01/31/2022] [Accepted: 02/13/2022] [Indexed: 12/20/2022]
Abstract
The COVID-19 pandemic poses a challenge in coming up with quick and effective means to counter its cause, SARS-CoV-2. Here, we show how the key factors governing cysteine reactivity in proteins, derived from combined quantum mechanical/continuum calculations, led to a novel multi-targeting strategy against SARS-CoV-2, in contrast to developing potent drugs/vaccines against a single viral target such as the spike protein. Specifically, they led to the discovery of reactive cysteines in evolutionarily conserved Zn2+-sites in several SARS-CoV-2 proteins that are crucial for viral polypeptide proteolysis as well as viral RNA synthesis, proofreading, and modification. These conserved, reactive cysteines, both free and Zn2+-bound, can be targeted using the same Zn-ejector drug (disulfiram/ebselen), which enables the use of broad-spectrum anti-virals that would otherwise be removed by the virus's proofreading mechanism. Our strategy of targeting multiple, conserved viral proteins that operate at different stages of the virus life cycle using a Zn-ejector drug combined with other broad-spectrum anti-viral drug(s) could enhance the barrier to drug resistance and antiviral effects, as compared to each drug alone. Since these functionally important nonstructural proteins containing reactive cysteines are highly conserved among coronaviruses, our proposed strategy has the potential to tackle future coronaviruses. This article is categorized under: Structure and Mechanism > Reaction Mechanisms and Catalysis; Structure and Mechanism > Computational Biochemistry and Biophysics; Electronic Structure Theory > Density Functional Theory.
Affiliation(s)
- Ting Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Karen Sargsyan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Carmay Lim
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Department of Chemistry, National Tsing Hua University, Hsinchu, Taiwan
2
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models and synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
- Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain.
- José A Cuesta
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
- Jacobo Aguirre
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
- Sebastian E Ahnert
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Alejandro V Cano
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo Catalán
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
- Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
- Santiago F Elena
- Instituto de Biología Integrativa de Sistemas, I(2)SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
- Bhavin S Khatri
- The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
- Joachim Krug
- Institute for Biological Physics, University of Cologne, Köln, Germany
- Ard A Louis
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
- Nora S Martin
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel Weiß
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
3
Tian P, Best RB. Exploring the sequence fitness landscape of a bridge between protein folds. PLoS Comput Biol 2020; 16:e1008285. [PMID: 33048928 PMCID: PMC7553338 DOI: 10.1371/journal.pcbi.1008285] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/22/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022]
Abstract
Most foldable protein sequences adopt only a single native fold. Recent protein design studies have, however, created protein sequences which fold into different structures upon changes of environment, or single point mutation, the best characterized example being the switch between the folds of the GA and GB binding domains of streptococcal protein G. To obtain further insight into the design of sequences which can switch folds, we have used a computational model for the fitness landscape of a single fold, built from the observed sequence variation of protein homologues. We have recently shown that such coevolutionary models can be used to design novel foldable sequences. By appropriately combining two of these models to describe the joint fitness landscape of GA and GB, we are able to describe the propensity of a given sequence for each of the two folds. We have successfully tested the combined model against the known series of designed GA/GB hybrids. Using Monte Carlo simulations on this landscape, we are able to identify pathways of mutations connecting the two folds. In the absence of a requirement for domain stability, the most frequent paths go via sequences in which neither domain is stably folded, reminiscent of the propensity for certain intrinsically disordered proteins to fold into different structures according to context. Even if the folded state is required to be stable, we find that there is nonetheless still a wide range of sequences which are close to the transition region and therefore likely fold switches, consistent with recent estimates that fold switching may be more widespread than had been thought.
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
- Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
4
Zabel WJ, Hagner KP, Livesey BJ, Marsh JA, Setayeshgar S, Lynch M, Higgs PG. Evolution of protein interfaces in multimers and fibrils. J Chem Phys 2019; 150:225102. [PMID: 31202237 PMCID: PMC6561775 DOI: 10.1063/1.5086042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
A majority of cellular proteins function as part of multimeric complexes of two or more subunits. Multimer formation requires interactions between protein surfaces that lead to closed structures, such as dimers and tetramers. If proteins interact in an open-ended way, uncontrolled growth of fibrils can occur, which is likely to be detrimental in most cases. We present a statistical physics model that allows aggregation of proteins as either closed dimers or open fibrils of all lengths. We use pairwise amino-acid contact energies to calculate the energies of interacting protein surfaces. The probabilities of all possible aggregate configurations can be calculated for any given sequence of surface amino acids. We link the statistical physics model to a population genetics model that describes the evolution of the surface residues. When proteins evolve neutrally, without selection for or against multimer formation, we find that a majority of proteins remain as monomers at moderate concentrations, but strong dimer-forming or fibril-forming sequences are also possible. If selection is applied in favor of dimers or in favor of fibrils, then it is easy to select either dimer-forming or fibril-forming sequences. It is also possible to select for oriented fibrils with protein subunits all aligned in the same direction. We measure the propensities of amino acids to occur at interfaces relative to noninteracting surfaces and show that the propensities in our model are strongly correlated with those that have been measured in real protein structures. We also show that there are significant differences between amino acid frequencies at isologous and heterologous interfaces in our model, and we observe that similar effects occur in real protein structures.
Affiliation(s)
- W Jeffrey Zabel
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada
- Kyle P Hagner
- Department of Physics, Indiana University, Bloomington, Indiana 47405, USA
- Benjamin J Livesey
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom
- Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom
- Sima Setayeshgar
- Department of Physics, Indiana University, Bloomington, Indiana 47405, USA
- Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona 85287, USA
- Paul G Higgs
- Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada
5
Venev SV, Zeldovich KB. Thermophilic Adaptation in Prokaryotes Is Constrained by Metabolic Costs of Proteostasis. Mol Biol Evol 2019; 35:211-224. [PMID: 29106597 PMCID: PMC5850847 DOI: 10.1093/molbev/msx282] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]
Abstract
Prokaryotes evolved to thrive in an extremely diverse set of habitats, and their proteomes bear signatures of environmental conditions. Although correlations between amino acid usage and environmental temperature are well-documented, understanding of the mechanisms of thermal adaptation remains incomplete. Here, we couple the energetic costs of protein folding and protein homeostasis to build a microscopic model explaining both the overall amino acid composition and its temperature trends. Low biosynthesis costs lead to low diversity of physical interactions between amino acid residues, which in turn makes proteins less stable and drives up chaperone activity to maintain appropriate levels of folded, functional proteins. Assuming that the cost of chaperone activity is proportional to the fraction of unfolded client proteins, we simulated thermal adaptation of model proteins subject to minimization of the total cost of amino acid synthesis and chaperone activity. For the first time, we predicted both the proteome-average amino acid abundances and their temperature trends simultaneously, and found strong correlations between model predictions and 402 genomes of bacteria and archaea. The energetic constraint on protein evolution is more apparent in highly expressed proteins, selected by codon adaptation index. We found that in bacteria, highly expressed proteins are similar in composition to thermophilic ones, whereas in archaea no correlation between predicted expression level and thermostability was observed. At the same time, thermal adaptations of highly expressed proteins in bacteria and archaea are nearly identical, suggesting that universal energetic constraints prevail over the phylogenetic differences between these domains of life.
Affiliation(s)
- Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, MA
- Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, MA
6
The direction of protein evolution is destined by the stability. Biochimie 2018; 150:100-109. [DOI: 10.1016/j.biochi.2018.05.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/22/2018] [Accepted: 05/15/2018] [Indexed: 01/29/2023]
7
Protein Evolution is Potentially Governed by Protein Stability: Directed Evolution of an Esterase from the Hyperthermophilic Archaeon Sulfolobus tokodaii. J Mol Evol 2018; 86:283-292. [DOI: 10.1007/s00239-018-9843-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/11/2018] [Accepted: 04/18/2018] [Indexed: 11/27/2022]
8
The Role of Evolutionary Selection in the Dynamics of Protein Structure Evolution. Biophys J 2017; 112:1350-1365. [PMID: 28402878 DOI: 10.1016/j.bpj.2017.02.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/31/2016] [Revised: 02/16/2017] [Accepted: 02/22/2017] [Indexed: 02/05/2023]
Abstract
Homology modeling is a powerful tool for predicting a protein's structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
9
Alotaibi M, Reyes BD, Le T, Luong P, Valafar F, Metzger RP, Fogel GB, Hecht D. Structure-based analysis of Bacilli and plasmid dihydrofolate reductase evolution. J Mol Graph Model 2017; 71:135-153. [PMID: 27914300 PMCID: PMC5203806 DOI: 10.1016/j.jmgm.2016.10.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/25/2016] [Revised: 10/04/2016] [Accepted: 10/10/2016] [Indexed: 12/15/2022]
Abstract
Dihydrofolate reductase (DHFR), a key enzyme in tetrahydrofolate-mediated biosynthetic pathways, has a structural motif known to be highly conserved over a wide range of organisms. Given its critical role in purine and amino acid synthesis, DHFR is a well-established therapeutic target for treating a wide range of prokaryotic and eukaryotic infections as well as certain types of cancer. Here we present a structure-based computer analysis of bacterial (Bacilli) and plasmid DHFR evolution. We generated a structure-based sequence alignment using 7 wild-type DHFR x-ray crystal structures obtained from the RCSB Protein Data Bank and 350 chromosomal and plasmid homology models we generated from sequences obtained from the NCBI Protein Database. We used these alignments to compare active site and non-active site conservation in terms of amino acid residues, secondary structure and amino acid residue class. With respect to amino acid sequences and residue classes, active-site positions in both plasmid and chromosomal DHFR are significantly more conserved than non-active site positions. Secondary structure conservation was similar for active site and non-active site positions. Plasmid-encoded DHFR proteins have a greater degree of sequence and residue class conservation, particularly in sequence positions associated with a network of concerted protein motions, than chromosomal-encoded DHFR proteins. These structure-based alignments were used to build DHFR-specific phylogenetic trees from which evidence for horizontal gene transfer was identified.
Affiliation(s)
- Mona Alotaibi
- Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA 92182-1030, USA; King Saud University, P.O. Box 245714, Riyadh 11312, Saudi Arabia.
- Ben Delos Reyes
- Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA 92182-1030, USA
- Tin Le
- Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA 92182-1030, USA
- Phuong Luong
- Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA 92182-1030, USA
- Faramarz Valafar
- Bioinformatics and Medical Informatics Research Center, San Diego State University, San Diego, CA 92182-7720, USA.
- Robert P Metzger
- Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA 92182-1030, USA.
- Gary B Fogel
- Natural Selection, Inc., 6480 Weathers Place, Suite 350, San Diego, CA 92121, USA.
- David Hecht
- Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA 92182-1030, USA; Department of Chemistry, Southwestern College, 900 Otay Lakes Rd., Chula Vista, CA 91910, USA.
10
Thorvaldsen S. A Mutation Model from First Principles of the Genetic Code. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:878-886. [PMID: 26485722 DOI: 10.1109/tcbb.2015.2489641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
The paper presents a neutral Codons Probability Mutations (CPM) model of molecular evolution and genetic decay of an organism. The CPM model uses a Markov process with a 20-dimensional state space of probability distributions over amino acids. The transition matrix of the Markov process includes the mutation rate and those single point mutations compatible with the genetic code. This is an alternative to the standard Point Accepted Mutation (PAM) and BLOcks of amino acid SUbstitution Matrix (BLOSUM). Genetic decay is quantified as a similarity between the amino acid distribution of proteins from a (group of) species on one hand, and the equilibrium distribution of the Markov chain on the other. Amino acid data for the eukaryote, bacterium, and archaea families are used to illustrate how both the CPM and PAM models predict their genetic decay towards the equilibrium value of 1. A family of bacteria is studied in more detail. It is found that warm environment organisms on average have a higher degree of genetic decay compared to those species that live in cold environments. The paper addresses a new codon-based approach to quantify genetic decay due to single point mutations compatible with the genetic code. The present work may be seen as a first approach to use codon-based Markov models to study how genetic entropy increases with time in an effectively neutral biological regime. Various extensions of the model are also discussed.
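The idea of a transition matrix built from "the mutation rate and those single point mutations compatible with the genetic code" can be illustrated with a minimal sketch. It is not the paper's CPM model: the function names, the equal weighting of all codons for an amino acid, and the decision to discard mutations to stop codons are assumptions made here for illustration only.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_matrix():
    """Row-normalized counts of single-point codon mutations between amino acids.

    Assumption: every sense codon is weighted equally and mutations that
    create a stop codon are ignored.
    """
    acids = sorted(set(AMINO) - {"*"})
    counts = {a: {b: 0 for b in acids} for a in acids}
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue  # not a mutation
            new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if new_aa != "*":
                counts[aa][new_aa] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}


def transition_matrix(mu):
    """One-step Markov matrix: stay with probability 1 - mu, mutate with mu."""
    q = substitution_matrix()
    return {a: {b: (1 - mu) * (a == b) + mu * q[a][b] for b in q} for a in q}


def step(dist, matrix):
    """Advance a probability distribution over amino acids by one step."""
    return {b: sum(dist[a] * matrix[a][b] for a in dist) for b in matrix}
```

Starting from any distribution over the 20 amino acids and applying `step` repeatedly moves it toward the chain's equilibrium distribution, which is the reference point the abstract uses when quantifying genetic decay.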
11
Li W, Fontanelli O, Miramontes P. Size distribution of function-based human gene sets and the split-merge model. Royal Society Open Science 2016; 3:160275. [PMID: 27853602 PMCID: PMC5108952 DOI: 10.1098/rsos.160275] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/22/2016] [Accepted: 07/01/2016] [Indexed: 06/06/2023]
Abstract
The sizes of paralogues (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families where genes are grouped by a similar function or share a common property. The size distribution of Human Gene Nomenclature Committee (HGNC) gene sets deviates from the power-law, and can be fitted much better by a beta rank function. We propose a simple mechanism to break a power-law size distribution by a combination of splitting and merging operations. The largest gene sets are split into two to account for the subfunctional categories, and a small proportion of other gene sets are merged into larger sets as new common themes might be realized. These operations are not uncommon for a curator of gene sets. A simulation shows that iteration of these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate application of the beta rank function by the example of the distribution of transcription factors and drug target genes among HGNC gene families.
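The splitting and merging operations described in this abstract can be sketched as a toy simulation. This is a simplified illustration, not the authors' procedure: splitting only the single largest set per step, choosing the cut point uniformly at random, merging randomly chosen pairs, and the names `split_merge_step`, `simulate`, and `merge_fraction` are all assumptions.

```python
import random


def split_merge_step(sizes, merge_fraction, rng):
    """Split the largest set in two, then merge a fraction of the other sets pairwise.

    `sizes` must be a non-empty list of positive integers (gene-set sizes).
    """
    sizes = sorted(sizes, reverse=True)
    largest, rest = sizes[0], sizes[1:]
    if largest >= 2:
        cut = rng.randint(1, largest - 1)  # both halves keep at least one gene
        halves = [cut, largest - cut]
    else:
        halves = [largest]
    rng.shuffle(rest)
    # Round the number of sets taking part in merges down to an even count.
    n_merge = int(merge_fraction * len(rest)) // 2 * 2
    merged = [rest[i] + rest[i + 1] for i in range(0, n_merge, 2)]
    return halves + merged + rest[n_merge:]


def simulate(sizes, steps, merge_fraction=0.1, seed=0):
    """Iterate the split-merge step and return sizes sorted by rank."""
    rng = random.Random(seed)
    for _ in range(steps):
        sizes = split_merge_step(sizes, merge_fraction, rng)
    return sorted(sizes, reverse=True)
```

Every step conserves the total number of genes, so only the shape of the rank-size distribution changes; fitting that shape with a beta rank function would be a separate step not shown here.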
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, USA
- Oscar Fontanelli
- Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, México 04510 DF, México
- Pedro Miramontes
- Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, México 04510 DF, México
- Bioinformatics Group and Interdisciplinary Center for Bioinformatics, University of Leipzig, Haertelstrasse 16–18, 04107 Leipzig, Germany
12
Venev SV, Zeldovich KB. Massively parallel sampling of lattice proteins reveals foundations of thermal adaptation. J Chem Phys 2016; 143:055101. [PMID: 26254668 DOI: 10.1063/1.4927565] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Evolution of proteins in bacteria and archaea living in different conditions leads to significant correlations between amino acid usage and environmental temperature. The origins of these correlations are poorly understood, and an important question of protein theory, physics-based prediction of types of amino acids overrepresented in highly thermostable proteins, remains largely unsolved. Here, we extend the random energy model of protein folding by weighting the interaction energies of amino acids by their frequencies in protein sequences and predict the energy gap of proteins designed to fold well at elevated temperatures. To test the model, we present a novel scalable algorithm for simultaneous energy calculation for many sequences in many structures, targeting massively parallel computing architectures such as graphics processing units. The energy calculation is performed by multiplying two matrices, one representing the complete set of sequences, and the other describing the contact maps of all structural templates. An implementation of the algorithm for the CUDA platform is available at http://www.github.com/kzeldovich/galeprot and calculates protein folding energies over 250 times faster than a single central processing unit. Analysis of amino acid usage in 64-mer cubic lattice proteins designed to fold well at different temperatures demonstrates an excellent agreement between theoretical and simulated values of energy gap. The theoretical predictions of temperature trends of amino acid frequencies are significantly correlated with bioinformatics data on 191 bacteria and archaea, and highlight protein folding constraints as a fundamental selection pressure during thermal adaptation in biological evolution.
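The statement that the energy calculation reduces to multiplying a sequence matrix by a contact-map matrix can be illustrated with a small CPU-only sketch. The matrix layout below (one column per residue pair i < j) is an assumption chosen for clarity and may differ from the layout used in the authors' CUDA code; the two-letter H/P alphabet, the toy contact maps, and all function names are hypothetical.

```python
from itertools import combinations


def pair_energy_rows(sequences, eps):
    """One row per sequence: eps[a_i][a_j] for every residue pair i < j."""
    rows = []
    for seq in sequences:
        pairs = combinations(range(len(seq)), 2)
        rows.append([eps[seq[i]][seq[j]] for i, j in pairs])
    return rows


def contact_columns(contact_maps, length):
    """One column per structure: 1 if pair (i, j) is in contact, else 0."""
    pairs = list(combinations(range(length), 2))
    return [[1 if p in cmap else 0 for cmap in contact_maps] for p in pairs]


def matmul(a, b):
    """Plain matrix product; a GPU library call would replace this in practice."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def folding_energies(sequences, contact_maps, eps):
    """energies[s][c] = sum of pair energies over the contacts of structure c."""
    length = len(sequences[0])
    return matmul(pair_energy_rows(sequences, eps),
                  contact_columns(contact_maps, length))


# Toy data: only H-H contacts are favorable; contact maps are sets of (i, j), i < j.
EPS = {"H": {"H": -1, "P": 0}, "P": {"H": 0, "P": 0}}
STRUCTURES = [{(0, 3)}, {(0, 2), (1, 3)}]
ENERGIES = folding_energies(["HPPH", "HPHP"], STRUCTURES, EPS)
```

Because every sequence-structure energy is an independent dot product, the whole table comes out of a single matrix multiplication, which is the property that makes the approach suit massively parallel hardware.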
Affiliation(s)
- Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, Massachusetts 01605, USA
- Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, Massachusetts 01605, USA
13
A Dynamic Model for the Evolution of Protein Structure. J Mol Evol 2016; 82:230-43. [PMID: 27146880 DOI: 10.1007/s00239-016-9740-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/31/2015] [Accepted: 04/12/2016] [Indexed: 10/21/2022]
Abstract
Domains are folded structures and evolutionary building blocks of protein molecules. Their three-dimensional atomic conformations, which define biological functions, can be coarse-grained into levels of a hierarchy. Here we build global dynamical models for the evolution of domains at fold and fold superfamily (FSF) levels. We fit the models with data from phylogenomic trees of domain structures and evaluate the distributions of the resulting parameters and their implications. The trees were inferred from a census of domain structures in hundreds of genomes from all three superkingdoms of life. The models used birth-death differential equations with the global abundances of structures as state variables, with one set of equations for folds and another for FSFs. Only the transitions present in the tree are assumed possible. Each fold or FSF diversifies in variants, eventually producing a new fold or FSF. The parameters specify rates of generation of variants and of new folds or FSFs. The equations were solved for the parameters by simplifying the trees to a comb-like topology, treating branches as emerging directly from a trunk. We found that the rate constants for folds and FSFs evolved similarly. These parameters showed a sharp transient change at about 1.5 Gyrs ago. This time coincides with a period in which domains massively combined in proteins and their arrangements distributed in novel lineages during the rise of organismal diversification. Our simulations suggest that exploration of protein structure space occurs through coarse-grained discoveries that undergo fine-grained elaboration.
14
Avetisov VA, Ivanov VA, Meshkov DA, Nechaev SK. Fractal globules: a new approach to artificial molecular machines. Biophys J 2015; 107:2361-8. [PMID: 25418305 DOI: 10.1016/j.bpj.2014.10.019] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/22/2014] [Revised: 09/22/2014] [Accepted: 10/14/2014] [Indexed: 02/07/2023]
Abstract
The over-damped relaxation of elastic networks constructed by contact maps of hierarchically folded fractal (crumpled) polymer globules was investigated in detail. It was found that the relaxation dynamics of an anisotropic fractal globule is very similar to the behavior of biological molecular machines like motor proteins. When it is perturbed, the system quickly relaxes to a low-dimensional manifold, M, with a large basin of attraction and then slowly approaches equilibrium, not escaping M. Taking these properties into account, it is suggested that fractal globules, even those made by synthetic polymers, are artificial molecular machines that can transform perturbations into directed quasimechanical motion along a defined path.
Affiliation(s)
- Vladik A Avetisov
- N. N. Semenov Institute of Chemical Physics, Russian Academy of Sciences, Moscow, Russia; Department of Applied Mathematics, National Research University Higher School of Economics, Moscow, Russia.
- Viktor A Ivanov
- Faculty of Physics of the M. V. Lomonosov Moscow State University, Moscow, Russia
- Dmitry A Meshkov
- N. N. Semenov Institute of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
- Sergei K Nechaev
- Université Paris-Sud/Centre National de la Recherche Scientifique, Laboratoire de Physique Theorique et Modèles Statistiques, Orsay, France; P. N. Lebedev Physical Institute, Russian Academy of Sciences, Moscow, Russia; Department of Applied Mathematics, National Research University Higher School of Economics, Moscow, Russia
15
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Indexed: 01/05/2023]
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
|
16
|
Holzgräfe C, Wallin S. Local versus global fold switching in protein evolution: insight from a three-letter continuous model. Phys Biol 2015; 12:026002. [DOI: 10.1088/1478-3975/12/2/026002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023]
|
17
|
Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics. Curr Opin Struct Biol 2014; 26:84-91. [PMID: 24952216 DOI: 10.1016/j.sbi.2014.05.005] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 02/03/2014] [Revised: 04/19/2014] [Accepted: 05/16/2014] [Indexed: 11/24/2022]
Abstract
The variation among sequences and structures in nature is determined both by physical laws and by evolutionary history. However, these two factors are traditionally investigated by disciplines with different emphases and philosophies: molecular biophysics on the one hand and evolutionary population genetics on the other. Here, we review recent theoretical and computational approaches that address the crucial need to integrate these two disciplines. We first articulate the elements of these approaches. Then, we survey their contribution to our mechanistic understanding of molecular evolution, the polymorphisms in coding regions, the distribution of fitness effects (DFE) of mutations, the observed folding stability of proteins in nature, and the distribution of protein folds in genomes.
|
18
|
Caetano-Anollés G, Sun FJ. The natural history of transfer RNA and its interactions with the ribosome. Front Genet 2014; 5:127. [PMID: 24847358 PMCID: PMC4023039 DOI: 10.3389/fgene.2014.00127] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 03/30/2014] [Accepted: 04/22/2014] [Indexed: 12/20/2022]
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, IL, USA
- Feng-Jie Sun
- School of Science and Technology, Georgia Gwinnett College, Lawrenceville, GA, USA
|
19
|
Sequence and structure space model of protein divergence driven by point mutations. J Theor Biol 2013; 330:1-8. [DOI: 10.1016/j.jtbi.2013.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/27/2012] [Revised: 03/07/2013] [Accepted: 03/18/2013] [Indexed: 12/11/2022]
|
20
|
Dixit PD, Maslov S. Evolutionary capacitance and control of protein stability in protein-protein interaction networks. PLoS Comput Biol 2013; 9:e1003023. [PMID: 23592969 PMCID: PMC3617028 DOI: 10.1371/journal.pcbi.1003023] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 10/19/2012] [Accepted: 02/20/2013] [Indexed: 11/19/2022]
Abstract
In addition to their biological function, protein complexes reduce the exposure of the constituent proteins to the risk of undesired oligomerization by reducing the concentration of the free monomeric state. We interpret this reduced risk as a stabilization of the functional state of the protein. We estimate that protein-protein interactions can account for ~2-4 k_B T of additional stabilization, a substantial contribution to intrinsic stability. We hypothesize that proteins in the interaction network act as evolutionary capacitors, which allow their binding partners to explore regions of the sequence space that correspond to less stable proteins. In the interaction network of baker's yeast, we find that, statistically, proteins that receive higher energetic benefits from the interaction network are more likely to misfold. A simplified fitness landscape wherein the fitness of an organism is inversely proportional to the total concentration of unfolded proteins provides an evolutionary justification for the proposed trends. We conclude by outlining clear biophysical experiments to test our predictions.
Affiliation(s)
- Purushottam D. Dixit
- Biology, Brookhaven National Laboratory, Upton, New York, United States of America
- Sergei Maslov
- Biology, Brookhaven National Laboratory, Upton, New York, United States of America
- Physics and Astronomy, Stony Brook University, Stony Brook, New York, United States of America
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America
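The simplified fitness landscape mentioned in the Dixit and Maslov abstract (fitness inversely proportional to the total concentration of unfolded proteins) can be illustrated with a minimal sketch. Everything beyond that one proportionality is an assumption made here for illustration: the two-state folding formula, the abundances, the intrinsic stabilities, and the function names are hypothetical and are not taken from the paper.

```python
import math

def unfolded_fraction(stability_kt: float) -> float:
    """Two-state estimate (assumed here): fraction of copies unfolded
    when the folded state is favored by `stability_kt` in units of k_B T."""
    return 1.0 / (1.0 + math.exp(stability_kt))

def fitness(abundances, stabilities_kt, scale=1.0):
    """Fitness taken as inversely proportional to the total
    concentration of unfolded protein, as stated in the abstract."""
    total_unfolded = sum(
        c * unfolded_fraction(g) for c, g in zip(abundances, stabilities_kt)
    )
    return scale / total_unfolded

# Hypothetical three-protein proteome (arbitrary concentration units).
abundances = [100.0, 50.0, 10.0]
intrinsic = [8.0, 6.0, 4.0]               # intrinsic stabilities, k_B T
boosted = [g + 3.0 for g in intrinsic]    # +3 k_B T, inside the quoted ~2-4 k_B T range

# Relative fitness gain from the extra stabilization.
ratio = fitness(abundances, boosted) / fitness(abundances, intrinsic)
print(f"fitness gain from +3 k_B T of stabilization: {ratio:.1f}x")
```

Under these assumptions the gain is bounded above by exp(3), because each protein's unfolded fraction shrinks by slightly less than that factor.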
|
21
|
Yafremava LS, Wielgos M, Thomas S, Nasir A, Wang M, Mittenthal JE, Caetano-Anollés G. A general framework of persistence strategies for biological systems helps explain domains of life. Front Genet 2013; 4:16. [PMID: 23443991 PMCID: PMC3580334 DOI: 10.3389/fgene.2013.00016] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 09/17/2012] [Accepted: 01/28/2013] [Indexed: 11/13/2022]
Abstract
The nature and cause of the division of organisms in superkingdoms is not fully understood. Assuming that environment shapes physiology, here we construct a novel theoretical framework that helps identify general patterns of organism persistence. This framework is based on Jacob von Uexküll's organism-centric view of the environment and James G. Miller's view of organisms as matter-energy-information processing molecular machines. Three concepts describe an organism's environmental niche: scope, umwelt, and gap. Scope denotes the entirety of environmental events and conditions to which the organism is exposed during its lifetime. Umwelt encompasses an organism's perception of these events. The gap is the organism's blind spot, the scope that is not covered by umwelt. These concepts bring organisms of different complexity to a common ecological denominator. Ecological and physiological data suggest organisms persist using three strategies: flexibility, robustness, and economy. All organisms use umwelt information to flexibly adapt to environmental change. They implement robustness against environmental perturbations within the gap generally through redundancy and reliability of internal constituents. Both flexibility and robustness improve survival. However, they also incur metabolic matter-energy processing costs, which otherwise could have been used for growth and reproduction. Lineages evolve unique tradeoff solutions among strategies in the space of what we call "a persistence triangle." Protein domain architecture and other evidence support the preferential use of flexibility and robustness properties. Archaea and Bacteria gravitate toward the triangle's economy vertex, with Archaea biased toward robustness. Eukarya trade economy for survivability. Protista occupy a saddle manifold separating akaryotes from multicellular organisms. Plants and the more flexible Fungi share an economic stratum, and Metazoa are locked in a positive feedback loop toward flexibility.
Affiliation(s)
- Liudmila S Yafremava
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana, IL, USA
|
22
|
Bershtein S, Mu W, Serohijos AWR, Zhou J, Shakhnovich EI. Protein quality control acts on folding intermediates to shape the effects of mutations on organismal fitness. Mol Cell 2012; 49:133-44. [PMID: 23219534 DOI: 10.1016/j.molcel.2012.11.004] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Received: 04/05/2012] [Revised: 08/08/2012] [Accepted: 11/02/2012] [Indexed: 11/26/2022]
Abstract
What are the molecular properties of proteins that fall on the radar of protein quality control (PQC)? Here we mutate the E. coli gene encoding dihydrofolate reductase (DHFR) and replace it with bacterial orthologous genes to determine how components of PQC modulate fitness effects of these genetic changes. We find that chaperonins GroEL/ES and protease Lon compete for binding to the molten globule intermediate of DHFR, resulting in a peculiar symmetry in their action: overexpression of GroEL/ES and deletion of Lon both restore growth of deleterious DHFR mutants and most of the slow-growing orthologous DHFR strains. Kinetic steady-state modeling predicts and experimentation verifies that mutations affect fitness by shifting the flux balance in the cellular milieu between protein production, folding, and degradation orchestrated by PQC through the interaction with folding intermediates.
Affiliation(s)
- Shimon Bershtein
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, USA
|
23
|
Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning APJ, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjölander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S. The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci 2012; 21:769-85. [PMID: 22528593 PMCID: PMC3403413 DOI: 10.1002/pro.2071] [Citation(s) in RCA: 149] [Impact Index Per Article: 12.4] [Received: 03/02/2012] [Revised: 03/22/2012] [Accepted: 03/23/2012] [Indexed: 12/20/2022]
Abstract
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.
Affiliation(s)
- David A Liberles
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
- Sarah A Teichmann
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
- Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Ugo Bastolla
- Bioinformatics Unit, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autonoma de Madrid, 28049 Cantoblanco Madrid, Spain
- Jesse Bloom
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109
- Erich Bornberg-Bauer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster, Germany
- Lucy J Colwell
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
- A P Jason de Koning
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Colorado, Aurora, Colorado
- Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, North Carolina 27599
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
- Arne Elofsson
- Department of Biochemistry and Biophysics, Center for Biomembrane Research, Stockholm Bioinformatics Center, Science for Life Laboratory, Swedish E-science Research Center, Stockholm University, 106 91 Stockholm, Sweden
- Dietlind L Gerloff
- Biomolecular Engineering Department, University of California, Santa Cruz, California 95064
- Richard A Goldstein
- Division of Mathematical Biology, National Institute for Medical Research (MRC), Mill Hill, London NW7 1AA, United Kingdom
- Johan A Grahnen
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
- Mark T Holder
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas 66045
- Clemens Lakner
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695
- Nicholas Lartillot
- Département de Biochimie, Faculté de Médecine, Université de Montréal, Montréal, QC H3T1J4, Canada
- Simon C Lovell
- Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, United Kingdom
- Gavin Naylor
- Department of Biology, College of Charleston, Charleston, South Carolina 29424
- Tina Perica
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
- David D Pollock
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Colorado, Aurora, Colorado
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Lynne Regan
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven 06511
- Andrew Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Nimrod Rubinstein
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Eugene Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138
- Kimmen Sjölander
- Department of Bioengineering, University of California, Berkeley, Berkeley, California 94720
- Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115
- Ashley I Teufel
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
- Jeffrey L Thorne
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695
- Joseph W Thornton
- Howard Hughes Medical Institute and Institute for Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637
- Daniel M Weinreich
- Department of Ecology and Evolutionary Biology, and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912
- Simon Whelan
- Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, United Kingdom
|
24
|
Soluble oligomerization provides a beneficial fitness effect on destabilizing mutations. Proc Natl Acad Sci U S A 2012; 109:4857-62. [PMID: 22411825 DOI: 10.1073/pnas.1118157109] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Indexed: 12/28/2022]
Abstract
Mutations create the genetic diversity on which selective pressures can act, yet also create structural instability in proteins. How, then, is it possible for organisms to ameliorate mutation-induced perturbations of protein stability while maintaining biological fitness and gaining a selective advantage? Here we used site-specific chromosomal mutagenesis to introduce a selected set of mostly destabilizing mutations into folA--an essential chromosomal gene of Escherichia coli encoding dihydrofolate reductase (DHFR)--to determine how changes in protein stability, activity, and abundance affect fitness. In total, 27 E. coli strains carrying mutant DHFR were created. We found no significant correlation between protein stability and its catalytic activity nor between catalytic activity and fitness in a limited range of variation of catalytic activity observed in mutants. The stability of these mutants is strongly correlated with their intracellular abundance, suggesting that protein homeostatic machinery plays an active role in maintaining intracellular concentrations of proteins. Fitness also shows a significant correlation with intracellular abundance of soluble DHFR in cells growing at 30 °C. At 42 °C, the picture was mixed, yet remarkable: A few strains carrying mutant DHFR proteins aggregated, rendering them nonviable, but, intriguingly, the majority exhibited fitness higher than wild type. We found that mutational destabilization of DHFR proteins in E. coli is counterbalanced at 42 °C by their soluble oligomerization, thereby restoring structural stability and protecting against aggregation.
|
25
|
Cuypers TD, Hogeweg P. Virtual genomes in flux: an interplay of neutrality and adaptability explains genome expansion and streamlining. Genome Biol Evol 2012; 4:212-29. [PMID: 22234601 PMCID: PMC3318439 DOI: 10.1093/gbe/evr141] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/18/2022]
Abstract
The picture that emerges from phylogenetic gene content reconstructions is that genomes evolve in a dynamic pattern of rapid expansion and gradual streamlining. Ancestral organisms have been estimated to possess remarkably rich gene complements, although gene loss is a driving force in subsequent lineage adaptation and diversification. Here, we study genome dynamics in a model of virtual cells evolving to maintain homeostasis. We observe a pattern of an initial rapid expansion of the genome and a prolonged phase of mutational load reduction. Generally, load reduction is achieved by the deletion of redundant genes, generating a streamlining pattern. Load reduction can also occur as a result of the generation of highly neutral genomic regions. These regions can expand and contract in a neutral fashion. Our study suggests that genome expansion and streamlining are generic patterns of evolving systems. We propose that the complex genotype to phenotype mapping in virtual cells as well as in their biological counterparts drives genome size dynamics, due to an emerging interplay between adaptation, neutrality, and evolvability.
Affiliation(s)
- Thomas D Cuypers
- Department of Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, The Netherlands.
|
26
|
Holzgräfe C, Irbäck A, Troein C. Mutation-induced fold switching among lattice proteins. J Chem Phys 2011; 135:195101. [DOI: 10.1063/1.3660691] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
|
27
|
Maruvka YE, Kessler DA, Shnerb NM. The birth-death-mutation process: a new paradigm for fat tailed distributions. PLoS One 2011; 6:e26480. [PMID: 22069453 PMCID: PMC3206027 DOI: 10.1371/journal.pone.0026480] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 09/07/2011] [Accepted: 09/27/2011] [Indexed: 11/19/2022]
Abstract
Fat tailed statistics and power-laws are ubiquitous in many complex systems. Usually the appearance of a few anomalously successful individuals (bio-species, investors, websites) is interpreted as reflecting some inherent "quality" (fitness, talent, giftedness) as in Darwin's theory of natural selection. Here we adopt the opposite, "neutral", outlook, suggesting that the main factor explaining success is merely luck. The statistics emerging from the neutral birth-death-mutation (BDM) process is shown to fit marvelously many empirical distributions. While previous neutral theories have focused on the power-law tail, our theory economically and accurately explains the entire distribution. We thus suggest the BDM distribution as a standard neutral model: effects of fitness and selection are to be identified by substantial deviations from it.
Affiliation(s)
- Nadav M. Shnerb
- Department of Physics, Bar Ilan University, Ramat-Gan, Israel
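The neutral birth-death-mutation idea in the Maruvka, Kessler and Shnerb abstract can be sketched as a small simulation. The abstract gives no update rules, so the ones below are assumptions made for illustration: at each step a random individual either dies or reproduces with equal probability, and a newborn founds a new family with probability `mu`. The function name and parameters are hypothetical.

```python
import random
from collections import Counter

def simulate_bdm(steps: int, mu: float, n0: int = 100, seed: int = 0) -> Counter:
    """Neutral birth-death-mutation sketch. Returns a Counter mapping
    family label -> number of living members after `steps` events."""
    rng = random.Random(seed)
    population = [0] * n0          # every founder starts in family 0
    next_label = 1
    for _ in range(steps):
        if not population:         # the whole population went extinct
            break
        i = rng.randrange(len(population))
        if rng.random() < 0.5:
            # Death: overwrite the chosen slot with the last entry, then pop.
            population[i] = population[-1]
            population.pop()
        elif rng.random() < mu:
            # Birth with mutation: the offspring founds a new family.
            population.append(next_label)
            next_label += 1
        else:
            # Faithful birth: the offspring joins its parent's family.
            population.append(population[i])
    return Counter(population)

# No family is favored, so any size differences arise from chance alone.
sizes = simulate_bdm(steps=20000, mu=0.05, n0=1000, seed=1)
histogram = Counter(sizes.values())    # family size -> number of families
for size in sorted(histogram)[:5]:
    print(size, histogram[size])
```

Plotting `histogram` on log-log axes for longer runs is one way to inspect the tail; how closely it matches the distribution derived in the paper depends on the assumed rules.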
|
28
|
Topology of protein interaction network shapes protein abundances and strengths of their functional and nonspecific interactions. Proc Natl Acad Sci U S A 2011; 108:4258-63. [PMID: 21368118 DOI: 10.1073/pnas.1009392108] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.5] [Indexed: 11/18/2022]
Abstract
How do living cells achieve sufficient abundances of functional protein complexes while minimizing promiscuous nonfunctional interactions? Here we study this problem using a first-principle model of the cell whose phenotypic traits are directly determined from its genome through biophysical properties of protein structures and binding interactions in a crowded cellular environment. The model cell includes three independent prototypical pathways, whose topologies of protein-protein interaction (PPI) subnetworks are different, but whose contributions to the cell fitness are equal. Model cells evolve through genotypic mutations and phenotypic protein copy number variations. We found a strong relationship between evolved physical-chemical properties of protein interactions and their abundances due to a "frustration" effect: Strengthening of functional interactions brings about hydrophobic interfaces, which make proteins prone to promiscuous binding. The balancing act is achieved by lowering concentrations of hub proteins while raising solubilities and abundances of functional monomers. On the basis of these principles we generated and analyzed a possible realization of the proteome-wide PPI network in yeast. In this simulation we found that high-throughput affinity capture-mass spectroscopy experiments can detect functional interactions with high fidelity only for high-abundance proteins while missing most interactions for low-abundance proteins.
|
29
|
Chen P, Shakhnovich EI. Thermal adaptation of viruses and bacteria. Biophys J 2010; 98:1109-18. [PMID: 20371310 DOI: 10.1016/j.bpj.2009.11.048] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 08/26/2009] [Revised: 11/10/2009] [Accepted: 11/30/2009] [Indexed: 01/24/2023]
Abstract
A previously established multiscale population genetics model posits that fitness can be inferred from the physical properties of proteins under the physiological assumption that a loss of stability by any protein confers the lethal phenotype to an organism. Here, we develop this model further by positing that replication rate (fitness) of a bacterial or viral strain directly depends on the copy number of folded proteins, which determine its replication rate. Using this model, and both numerical and analytical approaches, we studied the adaptation process of bacteria and viruses at varied environmental temperatures. We found that a broad distribution of protein stabilities observed in the model and in experiment is the key determinant of thermal response for viruses and bacteria. Our results explain most of the earlier experimental observations: the striking asymmetry of thermal response curves; the absence of evolutionary tradeoff, which was expected but not found in experiments; correlation between denaturation temperature for several protein families and the optimal growth temperature of their carrier organisms; and proximity of bacterial or viral optimal growth temperatures to their evolutionary temperatures. Our theory quantitatively and with high accuracy described thermal response curves for 35 bacterial species using, for each species, only two adjustable parameters: the number of rate-determining genes and the energy barrier for metabolic reactions.
Affiliation(s)
- Peiqiu Chen
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts; Department of Physics, Harvard University, Cambridge, Massachusetts, USA
|
30
|
Tlusty T. A colorful origin for the genetic code: Information theory, statistical mechanics and the emergence of molecular codes. Phys Life Rev 2010; 7:362-76. [DOI: 10.1016/j.plrev.2010.06.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 09/25/2009] [Revised: 01/25/2010] [Accepted: 02/06/2010] [Indexed: 10/19/2022]
|
31
|
Chen T, Vernazobres D, Yomo T, Bornberg-Bauer E, Chan HS. Evolvability and single-genotype fluctuation in phenotypic properties: a simple heteropolymer model. Biophys J 2010; 98:2487-96. [PMID: 20513392 PMCID: PMC2877360 DOI: 10.1016/j.bpj.2010.02.046] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 12/09/2009] [Revised: 02/15/2010] [Accepted: 02/26/2010] [Indexed: 11/26/2022]
Abstract
Experiment showed that the response of a genotype to mutation, i.e., the magnitude of mutational change in a phenotypic property, can be correlated with the extent of phenotypic fluctuation among genetic clones. To address a possible statistical mechanical basis for such phenomena at the protein level, we consider a simple hydrophobic-polar lattice protein-chain model with an exhaustive mapping between sequence (genotype) and conformational (phenotype) spaces. Using squared end-to-end distance, R_N^2, as an example conformational property, we study how the thermal fluctuation of a sequence's R_N^2 may be predictive of the changes in the Boltzmann average R_N^2 caused by single-point mutations on that sequence. We found that sequences with the same ground-state (R_N^2)_0 exhibit a funnel-like organization under conditions favorable to chain collapse or folding: fluctuation (standard deviation σ) of R_N^2 tends to increase with mutational distance from a prototype sequence whose R_N^2 deviates little from its (R_N^2)_0. In general, large mutational decreases in R_N^2 or in σ are only possible for some, though not all, sequences with large σ values. This finding suggests that single-genotype phenotypic fluctuation is a necessary, though not sufficient, indicator of evolvability toward genotypes with less phenotypic fluctuations.
Affiliation(s)
- Tao Chen
- Departments of Biochemistry and of Molecular Genetics, Faculty of Medicine, and Department of Physics, University of Toronto, Toronto, Ontario, Canada
- David Vernazobres
- Institute for Evolution and Biodiversity, School of Biological Sciences, University of Münster, Münster, Germany
- Tetsuya Yomo
- Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, and the Graduate School of Frontier Bioscience, Osaka University, Osaka, Japan
- Exploratory Research for Advanced Technology, Japan Science and Technology Agency, Osaka, Japan
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, School of Biological Sciences, University of Münster, Münster, Germany
- Hue Sun Chan
- Departments of Biochemistry and of Molecular Genetics, Faculty of Medicine, and Department of Physics, University of Toronto, Toronto, Ontario, Canada
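The quantities named in the Chen et al. abstract (the Boltzmann average and the thermal fluctuation σ of the squared end-to-end distance in a hydrophobic-polar lattice model) can be computed by exhaustive enumeration for short chains. The sketch below makes several assumptions that the abstract does not state: a 2D square lattice, an energy of -eps per non-bonded H-H contact, k_B = 1, and the example sequences. All function names are hypothetical.

```python
import math

def conformations(n):
    """Yield every self-avoiding walk of n >= 2 beads on the square lattice,
    one per class related by rotation or reflection: the first bond points
    along +x and the first turn, if any, goes toward +y."""
    def extend(path, occupied, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            if not turned and (dx, dy) not in ((1, 0), (0, 1)):
                continue                      # skip mirror-image branches
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue                      # enforce self-avoidance
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(path, occupied, turned or dy != 0)
            path.pop()
            occupied.remove(nxt)
    yield from extend([(0, 0), (1, 0)], {(0, 0), (1, 0)}, False)

def hh_contacts(seq, conf):
    """Count non-bonded nearest-neighbor H-H pairs, each pair once."""
    index = {p: i for i, p in enumerate(conf)}
    count = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):       # look right and up only
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                count += 1
    return count

def r2_stats(seq, temperature=1.0, eps=1.0):
    """Boltzmann mean and standard deviation of the squared end-to-end distance."""
    z = s1 = s2 = 0.0
    for conf in conformations(len(seq)):
        weight = math.exp(eps * hh_contacts(seq, conf) / temperature)
        (x0, y0), (x1, y1) = conf[0], conf[-1]
        r2 = (x1 - x0) ** 2 + (y1 - y0) ** 2
        z += weight
        s1 += weight * r2
        s2 += weight * r2 * r2
    mean = s1 / z
    return mean, math.sqrt(max(s2 / z - mean * mean, 0.0))

# Hypothetical 10-bead sequence and a single-point mutant (H -> P at index 6).
for name, seq in (("prototype", "HPPHPPHPPH"), ("mutant", "HPPHPPPPPH")):
    mean, sigma = r2_stats(seq)
    print(f"{name}: <R_N^2> = {mean:.3f}, sigma = {sigma:.3f}")
```

Comparing the two printed lines shows how a single-point mutation shifts the average and its fluctuation for one arbitrary pair; it does not reproduce the paper's sequence-space analysis.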
|
32
|
Interplay between pleiotropy and secondary selection determines rise and fall of mutators in stress response. PLoS Comput Biol 2010; 6:e1000710. [PMID: 20300650 PMCID: PMC2837395 DOI: 10.1371/journal.pcbi.1000710] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 10/19/2009] [Accepted: 02/08/2010] [Indexed: 11/19/2022]
Abstract
Mutators are clones whose mutation rate is about two to three orders of magnitude higher than the rate of wild-type clones, and their role in adaptive evolution of asexual populations has been controversial. Here we address this problem by using an ab initio microscopic model of living cells, which combines population genetics with a physically realistic presentation of protein stability and protein-protein interactions. The genome of model organisms encodes replication controlling genes (RCGs) and genes modeling the mismatch repair (MMR) complexes. The genotype-phenotype relationship posits that the replication rate of an organism is proportional to protein copy numbers of RCGs in their functional form and there is a production cost penalty for protein overexpression. The mutation rate depends linearly on the concentration of homodimers of MMR proteins. By simulating multiple runs of evolution of populations under various environmental stresses (stationary phase, starvation or temperature-jump) we find that adaptation most often occurs through transient fixation of a mutator phenotype, regardless of the nature of stress. By contrast, the fixation mechanism does depend on the nature of stress. In temperature jump stress, mutators take over the population due to loss of stability of MMR complexes. In contrast, in starvation and stationary phase stresses, a small number of mutators are supplied to the population via epigenetic stochastic noise in production of MMR proteins (a pleiotropic effect), and their net supply is higher due to reduced genetic drift in slowly growing populations under stressful environments. Subsequently, mutators in stationary phase or starvation hitchhike to fixation with a beneficial mutation in the RCGs (second order selection), and finally a mutation stabilizing the MMR complex arrives, returning the population to a non-mutator phenotype. Our results provide microscopic insights into the rise and fall of mutators in adapting finite asexual populations.

The dramatic rise of mutators has been found to accompany adaptation of bacteria in response to many kinds of stress. Two views on the evolutionary origin of this phenomenon emerged: the pleiotropic hypothesis, positing that it is a byproduct of environmental stress or other specific stress response mechanisms, and second order selection, which states that mutators hitchhike to fixation with unrelated beneficial alleles. Conventional population genetics models could not fully resolve this controversy because they are based on certain assumptions about the fitness landscape. Here we address this problem using a microscopic multiscale model, which couples physically realistic molecular descriptions of proteins and their interactions with population genetics of carrier organisms without assuming any a priori mutational effect on the fitness landscape. We found that both pleiotropy and second order selection play a crucial role at different stages of adaptation: the supply of mutators is provided through destabilization of error correction complexes or, alternatively, fluctuations of production levels of prototypic mismatch repair proteins (pleiotropic effects), while the rise and fixation of mutators occur when there is a sufficient supply of beneficial mutations in replication-controlling genes. This general mechanism assures a robust and reliable adaptation of organisms to unforeseen challenges. This study highlights physical principles underlying biological mechanisms of stress response and adaptation.
33
Karev GP. Replicator equations and the principle of minimal production of information. Bull Math Biol 2010; 72:1124-42. [PMID: 20146021 DOI: 10.1007/s11538-009-9484-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 02/08/2008] [Accepted: 11/05/2009] [Indexed: 11/29/2022]
Abstract
Many complex systems in mathematical biology and other areas can be described by the replicator equation. We show that solutions of a wide class of replicator equations minimize the KL-divergence of the initial and current distributions under time-dependent constraints, which in their turn, can be computed explicitly at every instant due to the system dynamics. Therefore, the Kullback principle of minimum discrimination information, as well as the maximum entropy principle, for systems governed by the replicator equations can be derived from the system dynamics rather than postulated. Applications to the Malthusian inhomogeneous models, global demography, and the Eigen quasispecies equation are given.
Affiliation(s)
- G P Karev
- Lockheed Martin MSD, National Institutes of Health, Bethesda, MD 20894, USA.
34
Koonin EV, Wolf YI. The fundamental units, processes and patterns of evolution, and the tree of life conundrum. Biol Direct 2009; 4:33. [PMID: 19788730 PMCID: PMC2761301 DOI: 10.1186/1745-6150-4-33] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Received: 09/03/2009] [Accepted: 09/29/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The elucidation of the dominant role of horizontal gene transfer (HGT) in the evolution of prokaryotes led to a severe crisis of the Tree of Life (TOL) concept and intense debates on this subject. CONCEPT Prompted by the crisis of the TOL, we attempt to define the primary units and the fundamental patterns and processes of evolution. We posit that replication of the genetic material is the singular fundamental biological process and that replication with an error rate below a certain threshold both enables and necessitates evolution by drift and selection. Starting from this proposition, we outline a general concept of evolution that consists of three major precepts. 1. The primary agency of evolution consists of Fundamental Units of Evolution (FUEs), that is, units of genetic material that possess a substantial degree of evolutionary independence. The FUEs include both bona fide selfish elements such as viruses, viroids, transposons, and plasmids, which encode some of the information required for their own replication, and regular genes that possess quasi-independence owing to their distinct selective value that provides for their transfer between ensembles of FUEs (genomes) and preferential replication along with the rest of the recipient genome. 2. The history of replication of a genetic element without recombination is isomorphously represented by a directed tree graph (an arborescence, in the graph theory language). Recombination within a FUE is common between very closely related sequences where homologous recombination is feasible but becomes negligible for longer evolutionary distances. In contrast, shuffling of FUEs occurs at all evolutionary distances. Thus, a tree is a natural representation of the evolution of an individual FUE on the macro scale, but not of an ensemble of FUEs such as a genome. 3. The history of life is properly represented by the "forest" of evolutionary trees for individual FUEs (Forest of Life, or FOL). 
Search for trends and patterns in the FOL is a productive direction of study that leads to the delineation of ensembles of FUEs that evolve coherently for a certain time span owing to a shared history of vertical inheritance or horizontal gene transfer; these ensembles are commonly known as genomes, taxa, or clades, depending on the level of analysis. A small set of genes (the universal genetic core of life) might show a (mostly) coherent evolutionary trend that transcends the entire history of cellular life forms. However, it might not be useful to denote this trend "the tree of life", or organismal, or species tree because neither organisms nor species are fundamental units of life. CONCLUSION A logical analysis of the units and processes of biological evolution suggests that the natural fundamental unit of evolution is a FUE, that is, a genetic element with an independent evolutionary history. Evolution of a FUE on the macro scale is naturally represented by a tree. Only the full compendium of trees for individual FUEs (the FOL) is an adequate depiction of the evolution of life. Coherent evolution of FUEs over extended evolutionary intervals is a crucial aspect of the history of life but a "species" or "organismal" tree is not a fundamental concept. REVIEWERS This article was reviewed by Valerian Dolja, W. Ford Doolittle, Nicholas Galtier, and William Martin.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
35
The continuity of protein structure space is an intrinsic property of proteins. Proc Natl Acad Sci U S A 2009; 106:15690-5. [PMID: 19805219 DOI: 10.1073/pnas.0907683106] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022]
Abstract
The classical view of the space of protein structures is that it is populated by a discrete set of protein folds. For proteins up to 200 residues long, by using structural alignments and building upon ideas of the completeness and continuity of structure space, we show that nearly any structure is significantly related to any other using a transitive set of no more than 7 intermediate structurally related proteins. This result holds for all structures in the Protein Data Bank, even when structural relationships between evolutionary related proteins (as detected by threading or functional analyses) are excluded. A similar picture holds for an artificial library of compact, hydrogen-bonded, homopolypeptide structures. The 3 sets share the global connectivity features of random graphs, in which the local connectivity of each node (i.e., the number of neighboring structures per protein) is preserved. This high connectivity supports the continuous view of single-domain protein structure space. More importantly, these results do not depend on evolution, rather just on the physics of protein structures. The fact that evolutionary divergence need not be invoked to explain the continuous nature of protein structure space has implications for how the universe of protein structures might have originated, and how function should be transferred between proteins of similar structure.
36
Sadreyev RI, Kim BH, Grishin NV. Discrete-continuous duality of protein structure space. Curr Opin Struct Biol 2009; 19:321-8. [PMID: 19482467 PMCID: PMC3688466 DOI: 10.1016/j.sbi.2009.04.009] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 04/08/2009] [Revised: 04/29/2009] [Accepted: 04/29/2009] [Indexed: 11/30/2022]
Abstract
Recently, the nature of protein structure space has been widely discussed in the literature. The traditional discrete view of protein universe as a set of separate folds has been criticized in the light of growing evidence that almost any arrangement of secondary structures is possible and the whole protein space can be traversed through a path of similar structures. Here we argue that the discrete and continuous descriptions are not mutually exclusive, but complementary: the space is largely discrete in evolutionary sense, but continuous geometrically when purely structural similarities are quantified. Evolutionary connections are mainly confined to separate structural prototypes corresponding to folds as islands of structural stability, with few remaining traceable links between the islands. However, for a geometric similarity measure, it is usually possible to find a reasonable cutoff that yields paths connecting any two structures through intermediates.
Affiliation(s)
- Ruslan I. Sadreyev
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Bong-Hyun Kim
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
37
Karev GP. On mathematical theory of selection: continuous time population dynamics. J Math Biol 2009; 60:107-29. [PMID: 19283384 DOI: 10.1007/s00285-009-0252-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 12/28/2007] [Revised: 01/27/2009] [Indexed: 10/21/2022]
Abstract
Mathematical theory of selection is developed within the frameworks of general models of inhomogeneous populations with continuous time. Methods that allow us to study the distribution dynamics under natural selection and to construct explicit solutions of the models are developed. All statistical characteristics of interest, such as the mean values of the fitness or any trait can be computed effectively, and the results depend in a crucial way on the initial distribution. The developed theory provides an effective method for solving selection systems; it reduces the initial complex model to a special system of ordinary differential equations (the escort system). Applications of the method to the Price equations are given; the solutions of some particular inhomogeneous Malthusian, Ricker and logistic-like models used but not solved in the literature are derived in explicit form.
Affiliation(s)
- Georgiy P Karev
- Lockheed Martin MSD, National Institutes of Health, Bldg. 38A, Rm. 5N511N, 8600 Rockville Pike, Bethesda, MD 20894, USA.
38
Abstract
Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. Whereas sequences change at a considerable pace, higher-order structures are constrained by the energetic landscape of protein folding, the exploration of sequence and structure space, and complex interactions mediated by the proteostasis and proteolytic machineries of the cell. The survey of architectures in the living world that was fuelled by recent structural genomic initiatives has been summarized in protein classification schemes, and the overall structure of fold space explored with novel bioinformatic approaches. However, metrics of general structural comparison have not yet unified architectural complexity using the 'shared and derived' tenet of evolutionary analysis. In contrast, a shift of focus from molecules to proteomes and a census of protein structure in fully sequenced genomes were able to uncover global evolutionary patterns in the structure of proteins. Timelines of discovery of architectures and functions unfolded episodes of specialization, reductive evolutionary tendencies of architectural repertoires in proteomes and the rise of modularity in the protein world. They revealed a biologically complex ancestral proteome and the early origin of the archaeal lineage. Studies also identified an origin of the protein world in enzymes of nucleotide metabolism harbouring the P-loop-containing triphosphate hydrolase fold and the explosive discovery of metabolic functions that recapitulated well-defined prebiotic shells and involved the recruitment of structures and functions. These observations have important implications for origins of modern biochemistry and diversification of life.
39
Noirel J, Simonson T. Neutral evolution of proteins: The superfunnel in sequence space and its relation to mutational robustness. J Chem Phys 2009; 129:185104. [PMID: 19045432 DOI: 10.1063/1.2992853] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
Following Kimura's neutral theory of molecular evolution [M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge University Press, Cambridge, 1983) (reprinted in 1986)], it has become common to assume that the vast majority of viable mutations of a gene confer little or no functional advantage. Yet, in silico models of protein evolution have shown that mutational robustness of sequences could be selected for, even in the context of neutral evolution. The evolution of a biological population can be seen as a diffusion on the network of viable sequences. This network is called a "neutral network." Depending on the mutation rate μ and the population size N, the biological population can evolve purely randomly (μN ≪ 1) or it can evolve in such a way as to select for sequences of higher mutational robustness (μN ≫ 1). The stringency of the selection depends not only on the product μN but also on the exact topology of the neutral network, the special arrangement of which was named "superfunnel." Even though the relation between mutation rate, population size, and selection was thoroughly investigated, a study of the salient topological features of the superfunnel that could affect the strength of the selection was wanting. This question is addressed in this study. We use two different models of proteins: on lattice and off lattice. We compare neutral networks computed using these models to random networks. From this, we identify two important factors of the topology that determine the stringency of the selection for mutationally robust sequences. First, the presence of highly connected nodes ("hubs") in the network increases the selection for mutationally robust sequences. Second, the stringency of the selection increases when the correlation between a sequence's mutational robustness and its neighbors' increases.
The latter finding relates a global characteristic of the neutral network to a local one, which is attainable through experiments or molecular modeling.
Affiliation(s)
- Josselin Noirel
- Laboratoire de Biochimie, Ecole Polytechnique, Route de Saclay, Palaiseau 91128 Cedex, France.
40
Abstract
Which factors govern the evolution of mutation rates and emergence of species? Here, we address this question by using a first principles model of life where population dynamics of asexual organisms is coupled to molecular properties and interactions of proteins encoded in their genomes. Simulating evolution of populations, we found that fitness increases in punctuated steps via epistatic events, leading to formation of stable and functionally interacting proteins. At low mutation rates, species form populations of organisms tightly localized in sequence space, whereas at higher mutation rates, species are lost without an apparent loss of fitness. However, when mutation rate was a selectable trait, the population initially maintained a high mutation rate until a high fitness level was reached, after which organisms with low mutation rates were gradually selected, with the population eventually reaching mutation rates comparable with those of modern DNA-based organisms. This study shows that the fitness landscape of a biophysically realistic system is extremely complex, with a huge number of local peaks rendering adaptation dynamics a glass-like process. On a more practical level, our results provide a rationale for experimental observations of the effect of mutation rate on fitness of populations of asexual organisms.
41
Zeldovich KB, Shakhnovich EI. Understanding protein evolution: from protein physics to Darwinian selection. Annu Rev Phys Chem 2008; 59:105-27. [PMID: 17937598 DOI: 10.1146/annurev.physchem.58.032806.104449] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/09/2022]
Abstract
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
Affiliation(s)
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
42
Stylus: a system for evolutionary experimentation based on a protein/proteome model with non-arbitrary functional constraints. PLoS One 2008; 3:e2246. [PMID: 18523658 PMCID: PMC2405935 DOI: 10.1371/journal.pone.0002246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2008] [Accepted: 04/15/2008] [Indexed: 11/28/2022]
Abstract
The study of protein evolution is complicated by the vast size of protein sequence space, the huge number of possible protein folds, and the extraordinary complexity of the causal relationships between protein sequence, structure, and function. Much simpler model constructs may therefore provide an attractive complement to experimental studies in this area. Lattice models, which have long been useful in studies of protein folding, have found increasing use here. However, while these models incorporate actual sequences and structures (albeit non-biological ones), they incorporate no actual functions—relying instead on largely arbitrary structural criteria as a proxy for function. In view of the central importance of function to evolution, and the impossibility of incorporating real functional constraints without real function, it is important that protein-like models be developed around real structure–function relationships. Here we describe such a model and introduce open-source software that implements it. The model is based on the structure–function relationship in written language, where structures are two-dimensional ink paths and functions are the meanings that result when these paths form legible characters. To capture something like the hierarchical complexity of protein structure, we use the traditional characters of Chinese origin. Twenty coplanar vectors, encoded by base triplets, act like amino acids in building the character forms. This vector-world model captures many aspects of real proteins, including life-size sequences, a life-size structural repertoire, a realistic genetic code, secondary, tertiary, and quaternary structure, structural domains and motifs, operon-like genetic structures, and layered functional complexity up to a level resembling bacterial genomes and proteomes. Stylus is a full-featured implementation of the vector world for Unix systems. 
To demonstrate the utility of Stylus, we generated a sample set of homologous vector proteins by evolving successive lines from a single starting gene. These homologues show sequence and structure divergence resembling those of natural homologues in many respects, suggesting that the system may be sufficiently life-like for informative comparison to biology.
43
Shakhnovich BE, Shakhnovich EI. Improvisation in evolution of genes and genomes: whose structure is it anyway? Curr Opin Struct Biol 2008; 18:375-81. [PMID: 18487041 DOI: 10.1016/j.sbi.2008.02.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/10/2008] [Accepted: 02/13/2008] [Indexed: 01/31/2023]
Abstract
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.
Affiliation(s)
- Boris E Shakhnovich
- Department of Molecular and Cellular Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, United States
44
Goldstein RA. The structure of protein evolution and the evolution of protein structure. Curr Opin Struct Biol 2008; 18:170-7. [DOI: 10.1016/j.sbi.2008.01.006] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 10/25/2007] [Revised: 12/20/2007] [Accepted: 01/09/2008] [Indexed: 11/29/2022]