101
Reia SM, Campos PRA. Analysis of statistical correlations between properties of adaptive walks in fitness landscapes. R Soc Open Sci 2020; 7:192118. [PMID: 32218986 PMCID: PMC7029893 DOI: 10.1098/rsos.192118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/03/2019] [Accepted: 01/13/2020] [Indexed: 06/10/2023]
Abstract
The fitness landscape metaphor has been central in our way of thinking about adaptation. In this scenario, adaptive walks are idealized dynamics that mimic the uphill movement of an evolving population towards a fitness peak of the landscape. Recent works in experimental evolution have demonstrated that the constraints imposed by epistasis are responsible for reducing the number of accessible mutational pathways towards fitness peaks. Here, we exhaustively analyse the statistical properties of adaptive walks for two empirical fitness landscapes and theoretical NK landscapes. Some general conclusions can be drawn from our simulation study. Regardless of the dynamics, we observe that the shortest paths are more regularly used. Although the accessibility of a given fitness peak is reasonably correlated to the number of monotonic pathways towards it, the two quantities are not exactly proportional. A negative correlation between predictability and mean path divergence is established, and so the decrease of the number of effective mutational pathways ensures the convergence of the attraction basin of fitness peaks. On the other hand, other features are not conserved among fitness landscapes, such as the relationship between accessibility and predictability.
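The adaptive-walk quantities in this abstract (walks that end at peaks, and the number of monotonic pathways towards a peak) can be illustrated with a minimal Python sketch. It assumes a standard Kauffman NK construction (each locus interacts with its K cyclic right-hand neighbours, contributions drawn uniformly from [0, 1)), a random adaptive walk that picks uniformly among fitter one-mutant neighbours, and a count of shortest paths along which fitness strictly increases. The names `nk_landscape`, `random_adaptive_walk` and `count_accessible_paths` are hypothetical; this is not the authors' simulation code.

```python
import itertools
import random
from functools import lru_cache


def nk_landscape(n, k, seed=0):
    """Map every binary genotype (tuple of 0/1) to an NK fitness value.

    Assumption: locus i interacts with the k loci to its right (cyclic), and
    each distinct (locus, allele-context) contribution is drawn uniformly
    from [0, 1) the first time it is needed.
    """
    rng = random.Random(seed)
    tables = [{} for _ in range(n)]
    fitness = {}
    for g in itertools.product((0, 1), repeat=n):
        total = 0.0
        for i in range(n):
            context = tuple(g[(i + j) % n] for j in range(k + 1))
            if context not in tables[i]:
                tables[i][context] = rng.random()
            total += tables[i][context]
        fitness[g] = total / n
    return fitness


def neighbours(g):
    """Yield all one-mutant neighbours of a binary genotype."""
    for i in range(len(g)):
        yield g[:i] + (1 - g[i],) + g[i + 1:]


def random_adaptive_walk(fitness, start, rng):
    """Step to a uniformly chosen fitter neighbour until a local peak is reached."""
    path = [start]
    while True:
        current = path[-1]
        better = [h for h in neighbours(current) if fitness[h] > fitness[current]]
        if not better:
            return path
        path.append(rng.choice(better))


def count_accessible_paths(fitness, start, target):
    """Count shortest mutational paths from start to target along which
    fitness increases strictly at every step (monotonic, accessible paths)."""
    differing = [i for i in range(len(start)) if start[i] != target[i]]

    @lru_cache(maxsize=None)
    def count(g):
        if g == target:
            return 1
        total = 0
        for i in differing:
            if g[i] != target[i]:
                h = g[:i] + (target[i],) + g[i + 1:]
                if fitness[h] > fitness[g]:
                    total += count(h)
        return total

    return count(start)
```

Comparing `count_accessible_paths` for each peak with how often repeated `random_adaptive_walk` runs end there is one simple way to probe the accessibility/path-count relationship the abstract describes.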
Affiliation(s)
- Sandro M. Reia
  - Instituto de Física de São Carlos, Universidade de São Paulo, Caixa Postal 369, 13560-970 São Carlos, São Paulo, Brazil
- Paulo R. A. Campos
  - Evolutionary Dynamics Lab, Physics Department, Federal University of Pernambuco, Recife, Brazil
102
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 125] [Impact Index Per Article: 25.0] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] [Open access]
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB (https://www.mavedb.org), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Jochen Weile
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Lea M Starita
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Anthony T Papenfuss
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  - Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
  - Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Frederick P Roth
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Alan F Rubin
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  - Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
103
J B, M M B, Chanda K. Evolutionary approaches in protein engineering towards biomaterial construction. RSC Adv 2019; 9:34720-34734. [PMID: 35530663 PMCID: PMC9074691 DOI: 10.1039/c9ra06807d] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/28/2019] [Accepted: 10/01/2019] [Indexed: 11/29/2022] [Open access]
Abstract
The tailoring of proteins for specific applications by evolutionary methods is a highly active area of research. Rational design and directed evolution are the two main strategies to reengineer proteins or create chimeric structures. Rational engineering is often limited by insufficient knowledge about proteins' structure-function relationships; directed evolution overcomes this restriction but poses challenges in the screening of candidates. A combination of these protein engineering approaches will allow us to create protein variants with a wide range of desired properties. Herein, we focus on the application of these approaches towards the generation of protein biomaterials that are known for biodegradability, biocompatibility and biofunctionality, from combinations of natural, synthetic, or engineered proteins and protein domains. Potential applications depend on the enhancement of biofunctional, mechanical, or other desired properties. Examples include scaffolds for tissue engineering, thermostable enzymes for industrial biocatalysis, and other therapeutic applications.
Affiliation(s)
- Brindha J
  - Department of Chemistry, School of Advanced Science, Vellore Institute of Technology, Chennai Campus, Vandalur-Kelambakkam Road, Chennai-600 127, Tamil Nadu, India
- Balamurali M M
  - Department of Chemistry, School of Advanced Science, Vellore Institute of Technology, Chennai Campus, Vandalur-Kelambakkam Road, Chennai-600 127, Tamil Nadu, India
- Kaushik Chanda
  - Department of Chemistry, School of Advanced Science, Vellore Institute of Technology, Vellore-632014, Tamil Nadu, India
104
Adaptive walks on high-dimensional fitness landscapes and seascapes with distance-dependent statistics. Theor Popul Biol 2019; 130:13-49. [PMID: 31605706 DOI: 10.1016/j.tpb.2019.09.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/12/2019] [Revised: 09/07/2019] [Accepted: 09/12/2019] [Indexed: 11/21/2022]
Abstract
The dynamics of evolution is intimately shaped by epistasis: interactions between genetic elements that cause the fitness effect of combinations of mutations to be non-additive. Analyzing evolutionary dynamics that involve large numbers of epistatic mutations is intrinsically difficult. A crucial feature is that the fitness landscape in the vicinity of the current genome depends on the evolutionary history. A key step is thus developing models that enable the study of the effects of past evolution on future evolution. In this work, we introduce a broad class of high-dimensional random fitness landscapes for which the correlations between fitnesses of genomes are a general function of genetic distance. Their Gaussian character allows for tractable computational as well as analytic understanding. We study the properties of these landscapes focusing on the simplest evolutionary process: random adaptive (uphill) walks. Conventional measures of "ruggedness" are shown to have little effect on such adaptive walks. Instead, the long-distance statistics of epistasis cause all properties to be highly conditional on past evolution, determining the statistics of the local landscape (the distribution of fitness effects of available mutations and combinations of these), as well as the global geometry of evolutionary trajectories. In order to further explore the effects of conditioning on past evolution, we model the effects of slowly changing environments. At long times, such fitness "seascapes" cause a statistical steady state with highly intermittent evolutionary dynamics: populations undergo bursts of rapid adaptation, interspersed with periods in which adaptive mutations are rare and the population waits for more new directions to be opened up by changes in the environment. Finally, we discuss prospects for studying more complex evolutionary dynamics and broader classes of high-dimensional landscapes and seascapes.
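As a hedged illustration of a Gaussian landscape whose correlations depend only on genetic distance, the sketch below samples a small binary landscape with the assumed covariance C(d) = exp(-d / corr_length) and runs a random uphill walk on it. The exponential form is one convenient choice (it is positive definite on the hypercube); the paper treats general functions of distance, and `gaussian_landscape`, `random_uphill_walk` and `corr_length` are illustrative names, not the authors' code.

```python
import itertools
import numpy as np


def gaussian_landscape(n, corr_length, rng):
    """Sample fitnesses of all 2**n binary genotypes from a zero-mean Gaussian
    whose covariance depends only on Hamming distance d:
    C(d) = exp(-d / corr_length)  (assumed form, for illustration only).
    """
    genotypes = np.array(list(itertools.product((0, 1), repeat=n)))
    # Pairwise Hamming distances between all genotypes (2**n x 2**n).
    dist = (genotypes[:, None, :] != genotypes[None, :, :]).sum(axis=2)
    cov = np.exp(-dist / corr_length)
    # Correlated draw: Cholesky factor of the covariance times white noise.
    chol = np.linalg.cholesky(cov)
    fitness = chol @ rng.standard_normal(len(genotypes))
    return genotypes, fitness


def random_uphill_walk(fitness, n, start, rng):
    """Random adaptive walk over genotype indices. Flipping bit j of an index
    flips exactly one locus, so the one-mutant neighbours are index ^ (1 << j)."""
    path = [start]
    while True:
        current = path[-1]
        better = [current ^ (1 << j) for j in range(n)
                  if fitness[current ^ (1 << j)] > fitness[current]]
        if not better:
            return path
        path.append(int(rng.choice(better)))
```

The dense covariance matrix grows as 4**n, so this brute-force approach only suits small n; it is meant to show the construction, not the scalable methods a study of high-dimensional landscapes would need.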
105
Wang S, Dai L. Evolving generalists in switching rugged landscapes. PLoS Comput Biol 2019; 15:e1007320. [PMID: 31574088 PMCID: PMC6771975 DOI: 10.1371/journal.pcbi.1007320] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/17/2019] [Accepted: 08/02/2019] [Indexed: 01/05/2023] [Open access]
Abstract
Evolving systems, be it an antibody repertoire in the face of mutating pathogens or a microbial population exposed to varied antibiotics, constantly search for adaptive solutions in time-varying fitness landscapes. Generalists are genotypes that remain fit across diverse selective pressures: multi-drug-resistant microbes are undesired yet prevalent, whereas broadly neutralizing antibodies are much wanted but rare. However, little is known about the conditions under which such generalists with a high capacity to adapt can be efficiently discovered by evolution. In addition, can epistasis, the source of landscape ruggedness and path constraints, play a different role if the environment varies in a non-random way? We present a generative model to estimate the propensity of evolving generalists in rugged landscapes that are tunably related and alternate relatively slowly. We find that environmental cycling can substantially facilitate the search for fit generalists by dynamically enlarging their effective basins of attraction. Importantly, these high performers are most likely to emerge at intermediate levels of ruggedness and environmental relatedness. Our approach allows one to estimate correlations across environments from the topography of experimental fitness landscapes. Our work provides a conceptual framework to study evolution in time-correlated complex environments, and offers statistical understanding that suggests general strategies for eliciting broadly neutralizing antibodies or preventing microbes from evolving multi-drug resistance.
Affiliation(s)
- Shenshen Wang
  - Department of Physics and Astronomy, University of California, Los Angeles, Los Angeles, California, United States of America
- Lei Dai
  - Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
106
Large-effect flowering time mutations reveal conditionally adaptive paths through fitness landscapes in Arabidopsis thaliana. Proc Natl Acad Sci U S A 2019; 116:17890-17899. [PMID: 31420516 PMCID: PMC6731683 DOI: 10.1073/pnas.1902731116] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/20/2022] [Open access]
Abstract
Mutations are often assumed to be largely detrimental to fitness, but they may also be beneficial, and mutations with large phenotypic effects can persist in nature. One explanation for these observations is that mutations may be beneficial in specific environments because these conditions shift trait expression toward higher fitness. This hypothesis is rarely tested due to the difficulty of replicating mutants in multiple natural environments and measuring their phenotypes. We did so by planting Arabidopsis thaliana genotypes with large-effect flowering time mutations in field sites across the species’ European climate range. We quantified the adaptive value of mutant traits, finding that certain mutations increased fitness in some environments but not in others. Contrary to previous assumptions that most mutations are deleterious, there is increasing evidence for persistence of large-effect mutations in natural populations. A possible explanation for these observations is that mutant phenotypes and fitness may depend upon the specific environmental conditions to which a mutant is exposed. Here, we tested this hypothesis by growing large-effect flowering time mutants of Arabidopsis thaliana in multiple field sites and seasons to quantify their fitness effects in realistic natural conditions. By constructing environment-specific fitness landscapes based on flowering time and branching architecture, we observed that a subset of mutations increased fitness, but only in specific environments. These mutations increased fitness via different paths: through shifting flowering time, branching, or both. Branching was under stronger selection, but flowering time was more genetically variable, pointing to the importance of indirect selection on mutations through their pleiotropic effects on multiple phenotypes. Finally, mutations in hub genes with greater connectedness in their regulatory networks had greater effects on both phenotypes and fitness. 
Together, these findings indicate that large-effect mutations may persist in populations because they influence traits that are adaptive only under specific environmental conditions. Understanding their evolutionary dynamics therefore requires measuring their effects in multiple natural environments.
107
Lebeuf-Taylor E, McCloskey N, Bailey SF, Hinz A, Kassen R. The distribution of fitness effects among synonymous mutations in a gene under directional selection. eLife 2019; 8:45952. [PMID: 31322500 PMCID: PMC6692132 DOI: 10.7554/elife.45952] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 02/10/2019] [Accepted: 07/18/2019] [Indexed: 12/21/2022] [Open access]
Abstract
The fitness effects of synonymous mutations, nucleotide changes that do not alter the encoded amino acid, have often been assumed to be neutral, but a growing body of evidence suggests otherwise. We used site-directed mutagenesis coupled with direct measures of competitive fitness to estimate the distribution of fitness effects among synonymous mutations for a gene under directional selection and capable of adapting via synonymous nucleotide changes. Synonymous mutations had highly variable fitness effects, both deleterious and beneficial, resembling those of nonsynonymous mutations in the same gene. This variation in fitness was underlain by changes in transcription linked to the creation of internal promoter sites. A positive correlation between fitness and the presence of synonymous substitutions across a phylogeny of related Pseudomonads suggests these mutations may be common in nature. Taken together, our results provide the most compelling evidence to date that synonymous mutations with non-neutral fitness effects may in fact be commonplace.
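A common way to turn head-to-head competition counts into relative fitness is the ratio of the realized (Malthusian) growth rates of mutant and reference strain. Whether this matches the estimator used in the study is an assumption. The sketch averages replicates per mutation and reports selection coefficients s = w - 1, whose spread across mutations is a simple empirical distribution of fitness effects. The names `relative_fitness`, `selection_coefficients` and `classify`, the input layout, and the neutrality tolerance are all hypothetical.

```python
import math
from statistics import mean


def relative_fitness(mut_initial, mut_final, ref_initial, ref_final):
    """Relative fitness w as the ratio of realized (Malthusian) growth rates
    of a mutant and a reference strain competed in the same culture."""
    return math.log(mut_final / mut_initial) / math.log(ref_final / ref_initial)


def selection_coefficients(assays):
    """Map each mutation to its mean selection coefficient s = w - 1.

    `assays` maps a mutation label to a list of replicate tuples
    (mut_initial, mut_final, ref_initial, ref_final) of cell counts.
    """
    return {mutation: mean(relative_fitness(*rep) for rep in reps) - 1.0
            for mutation, reps in assays.items()}


def classify(s, tolerance=0.01):
    """Label a selection coefficient; the tolerance is an arbitrary illustrative cutoff."""
    if s > tolerance:
        return "beneficial"
    if s < -tolerance:
        return "deleterious"
    return "neutral"
```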
Affiliation(s)
- Nick McCloskey
  - Department of Biology, University of Ottawa, Ottawa, Canada
- Susan F Bailey
  - Department of Biology, Clarkson University, Potsdam, United States
- Aaron Hinz
  - Department of Biology, University of Ottawa, Ottawa, Canada
- Rees Kassen
  - Department of Biology, University of Ottawa, Ottawa, Canada
108
Bendixsen DP, Collet J, Østman B, Hayden EJ. Genotype network intersections promote evolutionary innovation. PLoS Biol 2019; 17:e3000300. [PMID: 31136568 PMCID: PMC6555535 DOI: 10.1371/journal.pbio.3000300] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 12/31/2018] [Revised: 06/07/2019] [Accepted: 05/15/2019] [Indexed: 12/27/2022] [Open access]
Abstract
Evolutionary innovations are qualitatively novel traits that emerge through evolution and increase biodiversity. The genetic mechanisms of innovation remain poorly understood. A systems view of innovation requires the analysis of genotype networks—the vast networks of genetic variants that produce the same phenotype. Innovations can occur at the intersection of two different genotype networks. However, the experimental characterization of genotype networks has been hindered by the vast number of genetic variants that need to be functionally analyzed. Here, we use high-throughput sequencing to study the fitness landscape at the intersection of the genotype networks of two catalytic RNA molecules (ribozymes). We determined the ability of numerous neighboring RNA sequences to catalyze two different chemical reactions, and we use these data as a proxy for a genotype to fitness map where two functions come in close proximity. We find extensive functional overlap, and numerous genotypes can catalyze both functions. We demonstrate through evolutionary simulations that these numerous points of intersection facilitate the discovery of a new function. However, the rate of adaptation of the new function depends upon the local ruggedness around the starting location in the genotype network. As a consequence, one direction of adaptation is more rapid than the other. We find that periods of neutral evolution increase rates of adaptation to the new function by allowing populations to spread out in their genotype network. Our study reveals the properties of a fitness landscape where genotype networks intersect and the consequences for evolutionary innovations. Our results suggest that historic innovations in natural systems may have been facilitated by overlapping genotype networks. 
The determination of the empirical fitness landscape at the genotypic intersection between two different catalytic RNA (ribozyme) functions reveals details about how novel traits can emerge through evolutionary innovation.
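The idea of intersecting genotype networks can be made concrete with a small sketch. Given measured activities for two functions, each network is taken to be the set of sequences at or above an activity threshold (an assumed cutoff), their intersection holds the bifunctional genotypes, and connected components follow single-nucleotide substitutions. Function names and the threshold rule are illustrative, not the study's pipeline.

```python
from collections import deque


def genotype_network(activity, threshold):
    """Sequences whose measured activity reaches the (assumed) threshold."""
    return {g for g, a in activity.items() if a >= threshold}


def one_mutant_neighbours(seq, alphabet="ACGU"):
    """Yield all single-substitution neighbours of a sequence."""
    for i, base in enumerate(seq):
        for b in alphabet:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def components(network, alphabet="ACGU"):
    """Connected components of a genotype network under single substitutions (BFS)."""
    seen, comps = set(), []
    for start in network:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            current = queue.popleft()
            comp.add(current)
            for nb in one_mutant_neighbours(current, alphabet):
                if nb in network and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps
```

With two activity tables `act_a` and `act_b`, the bifunctional genotypes are simply `genotype_network(act_a, t) & genotype_network(act_b, t)`, and `components` shows how many separate neutral networks each function occupies.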
Affiliation(s)
- Devin P. Bendixsen
  - Biomolecular Sciences Graduate Programs, Boise State University, Boise, Idaho, United States of America
- James Collet
  - Department of Biological Science, Boise State University, Boise, Idaho, United States of America
- Bjørn Østman
  - Keck Graduate Institute, Claremont, California, United States of America
- Eric J. Hayden
  - Biomolecular Sciences Graduate Programs, Boise State University, Boise, Idaho, United States of America
  - Department of Biological Science, Boise State University, Boise, Idaho, United States of America
109
Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019; 20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Indexed: 11/09/2022]
Abstract
Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
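The simplest quantitative sequence-function model of the kind surveyed here is additive: each position contributes a value that depends only on the character at that position. A minimal sketch, assuming one-hot features and ordinary least squares via NumPy, is below; the function names are illustrative. Shifting all four values at one position by a constant and compensating in the intercept leaves every prediction unchanged, so individual parameters are not identifiable even though predictions are; `np.linalg.lstsq` simply returns the minimum-norm solution.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seqs):
    """Encode equal-length DNA sequences as flattened one-hot vectors."""
    length = len(seqs[0])
    X = np.zeros((len(seqs), length * len(ALPHABET)))
    for row, seq in enumerate(seqs):
        for pos, char in enumerate(seq):
            X[row, pos * len(ALPHABET) + ALPHABET.index(char)] = 1.0
    return X


def design_matrix(seqs):
    """Prepend an intercept column to the one-hot features."""
    return np.hstack([np.ones((len(seqs), 1)), one_hot(seqs)])


def fit_additive(seqs, y):
    """Least-squares fit of an additive (independent-site) sequence-function model."""
    theta, *_ = np.linalg.lstsq(design_matrix(seqs), np.asarray(y, dtype=float), rcond=None)
    return theta


def predict(theta, seqs):
    """Predicted phenotype for each sequence under the fitted additive model."""
    return design_matrix(seqs) @ theta
```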
Affiliation(s)
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
110
Domingo J, Baeza-Centurion P, Lehner B. The Causes and Consequences of Genetic Interactions (Epistasis). Annu Rev Genomics Hum Genet 2019; 20:433-460. [PMID: 31082279 DOI: 10.1146/annurev-genom-083118-014857] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Indexed: 11/09/2022]
Abstract
The same mutation can have different effects in different individuals. One important reason for this is that the outcome of a mutation can depend on the genetic context in which it occurs. This dependency is known as epistasis. In recent years, there has been a concerted effort to quantify the extent of pairwise and higher-order genetic interactions between mutations through deep mutagenesis of proteins and RNAs. This research has revealed two major components of epistasis: nonspecific genetic interactions caused by nonlinearities in genotype-to-phenotype maps, and specific interactions between particular mutations. Here, we provide an overview of our current understanding of the mechanisms causing epistasis at the molecular level, the consequences of genetic interactions for evolution and genetic prediction, and the applications of epistasis for understanding biology and determining macromolecular structures.
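Pairwise epistasis is often quantified as the deviation of a double mutant from the expectation built from the two single mutants. The sketch below assumes a multiplicative null model (equivalently, additive on a log scale) and the usual definitions of sign and reciprocal sign epistasis; the function names are illustrative. The choice of scale matters: a global nonlinearity in the genotype-to-phenotype map can create or remove apparent epistasis, which corresponds to the nonspecific component described in this abstract.

```python
import math


def pairwise_epistasis(f_wt, f_a, f_b, f_ab, log_scale=True):
    """Deviation of the double mutant from the single-mutant expectation.

    On the log scale this tests against a multiplicative null model;
    with log_scale=False it tests against an additive null model.
    """
    values = (f_wt, f_a, f_b, f_ab)
    if log_scale:
        values = tuple(math.log(v) for v in values)
    wt, a, b, ab = values
    return ab - a - b + wt


def epistasis_type(f_wt, f_a, f_b, f_ab):
    """Classify sign epistasis from the four fitness values of a mutational square."""
    a_flips = (f_a - f_wt) * (f_ab - f_b) < 0  # effect of A changes sign on the B background
    b_flips = (f_b - f_wt) * (f_ab - f_a) < 0  # effect of B changes sign on the A background
    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    return "magnitude or none"
```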
Affiliation(s)
- Júlia Domingo
  - Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Pablo Baeza-Centurion
  - Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Ben Lehner
  - Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra, 08003 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
111
Abstract
For nearly a century adaptive landscapes have provided overviews of the evolutionary process and yet they remain metaphors. We redefine adaptive landscapes in terms of biological processes rather than descriptive phenomenology. We focus on the underlying mechanisms that generate emergent properties such as epistasis, dominance, trade-offs and adaptive peaks. We illustrate the utility of landscapes in predicting the course of adaptation and the distribution of fitness effects. We abandon aged arguments concerning landscape ruggedness in favor of empirically determining landscape architecture. In so doing, we transform the landscape metaphor into a scientific framework within which causal hypotheses can be tested.
Affiliation(s)
- Xiao Yi
  - BioTechnology Institute, University of Minnesota, St. Paul, MN
- Antony M Dean
  - BioTechnology Institute, University of Minnesota, St. Paul, MN
  - Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN
112
Machine learning-assisted directed protein evolution with combinatorial libraries. Proc Natl Acad Sci U S A 2019; 116:8852-8858. [PMID: 30979809 DOI: 10.1073/pnas.1901979116] [Citation(s) in RCA: 290] [Impact Index Per Article: 58.0] [Indexed: 11/18/2022] [Open access]
Abstract
To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
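The core loop of machine-learning-guided directed evolution described here (fit a model on measured variants, score untested combinations in silico, carry the top predictions into the next round) can be sketched as below. This assumes a one-hot ridge-regression model over a small combinatorial site-saturation library; it does not reproduce the published workflow or its models, and `propose_next_round`, `alpha` and `top_k` are illustrative names.

```python
import itertools
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(variants):
    """One-hot encode equal-length variant strings (one block per mutated site)."""
    X = np.zeros((len(variants), len(variants[0]) * len(AMINO_ACIDS)))
    for row, variant in enumerate(variants):
        for site, aa in enumerate(variant):
            X[row, site * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return X


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression weights: (X^T X + alpha I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def propose_next_round(tested, fitness, n_sites, top_k=10, alpha=1.0):
    """Rank every untested variant in the library by predicted fitness
    and return the top_k candidates for the next experimental round."""
    y = np.asarray(fitness, dtype=float)
    weights = ridge_fit(encode(tested), y - y.mean(), alpha)  # centred targets
    tested_set = set(tested)
    untested = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=n_sites)
                if "".join(p) not in tested_set]
    scores = encode(untested) @ weights
    best = np.argsort(scores)[::-1][:top_k]
    return [untested[i] for i in best]
```

Enumerating the full library scales as 20**n_sites, so for four or more sites one would score candidates in batches rather than building one large matrix.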
113
Pressman AD, Liu Z, Janzen E, Blanco C, Müller UF, Joyce GF, Pascal R, Chen IA. Mapping a Systematic Ribozyme Fitness Landscape Reveals a Frustrated Evolutionary Network for Self-Aminoacylating RNA. J Am Chem Soc 2019; 141:6213-6223. [PMID: 30912655 PMCID: PMC6548421 DOI: 10.1021/jacs.8b13298] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/22/2022]
Abstract
Molecular evolution can be conceptualized as a walk over a “fitness landscape”, or the function of fitness (e.g., catalytic activity) over the space of all possible sequences. Understanding evolution requires knowing the structure of the fitness landscape and identifying the viable evolutionary pathways through the landscape. However, the fitness landscape for any catalytic biomolecule is largely unknown. The evolution of catalytic RNA is of special interest because RNA is believed to have been foundational to early life. In particular, an essential activity leading to the genetic code would be the reaction of ribozymes with activated amino acids, such as 5(4H)-oxazolones, to form aminoacyl-RNA. Here we combine in vitro selection with a massively parallel kinetic assay to map a fitness landscape for self-aminoacylating RNA, with nearly complete coverage of sequence space in a central 21-nucleotide region. The method (SCAPE: sequencing to measure catalytic activity paired with in vitro evolution) shows that the landscape contains three major ribozyme families (landscape peaks). An analysis of evolutionary pathways shows that, while local optimization within a ribozyme family would be possible, optimization of activity over the entire landscape would be frustrated by large valleys of low activity. The sequence motifs associated with each peak represent different solutions to the problem of catalysis, so the inability to traverse the landscape globally corresponds to an inability to restructure the ribozyme without losing activity. The frustrated nature of the evolutionary network suggests that chance emergence of a ribozyme motif would be more important than optimization by natural selection.
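Two quantities from this abstract, landscape peaks and whether uphill paths can connect a sequence to a given peak, can be computed on any measured sequence-activity table. The sketch below assumes an RNA alphabet, ignores unmeasured neighbours, calls a sequence a local peak when every measured one-mutant neighbour is strictly less active, and assigns each sequence to the peak reached by steepest ascent. It is a toy illustration with hypothetical function names, not the SCAPE analysis.

```python
ALPHABET = "ACGU"


def neighbours(seq):
    """Yield all single-substitution neighbours of an RNA sequence."""
    for i, base in enumerate(seq):
        for b in ALPHABET:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def local_peaks(activity):
    """Sequences whose measured neighbours are all strictly less active
    (unmeasured neighbours are treated as inaccessible)."""
    return {s for s, a in activity.items()
            if all(activity.get(n, float("-inf")) < a for n in neighbours(s))}


def steepest_ascent_basins(activity):
    """Map each sequence to the endpoint of a steepest-ascent walk over
    measured neighbours, i.e. to the basin of attraction it belongs to."""
    basin = {}
    for s in activity:
        current = s
        while True:
            best = max((n for n in neighbours(current) if n in activity),
                       key=activity.get, default=None)
            if best is None or activity[best] <= activity[current]:
                break
            current = best
        basin[s] = current
    return basin
```

Sequences sitting in one basin cannot reach another peak by strictly uphill single steps, which is a simple proxy for the frustration across ribozyme families described above.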
Affiliation(s)
- Abe D Pressman
  - Department of Chemistry and Biochemistry 9510, University of California, Santa Barbara, California 93106, United States
  - Program in Chemical Engineering, University of California, Santa Barbara, California 93106, United States
- Ziwei Liu
  - MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Cambridge CB2 0QH, U.K.
  - IBMM, CNRS, University of Montpellier, ENSCM, 34090 Montpellier, France
- Evan Janzen
  - Department of Chemistry and Biochemistry 9510, University of California, Santa Barbara, California 93106, United States
  - Program in Biomolecular Sciences and Engineering, University of California, Santa Barbara, California 93106, United States
- Celia Blanco
  - Department of Chemistry and Biochemistry 9510, University of California, Santa Barbara, California 93106, United States
- Ulrich F Müller
  - Department of Chemistry and Biochemistry, University of California, San Diego, California 92093, United States
- Gerald F Joyce
  - Salk Institute for Biological Studies, La Jolla, California 92037, United States
- Robert Pascal
  - IBMM, CNRS, University of Montpellier, ENSCM, 34090 Montpellier, France
- Irene A Chen
  - Department of Chemistry and Biochemistry 9510, University of California, Santa Barbara, California 93106, United States
  - Program in Biomolecular Sciences and Engineering, University of California, Santa Barbara, California 93106, United States
114
Blanco C, Janzen E, Pressman A, Saha R, Chen IA. Molecular Fitness Landscapes from High-Coverage Sequence Profiling. Annu Rev Biophys 2019; 48:1-18. [PMID: 30601678 DOI: 10.1146/annurev-biophys-052118-115333] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 11/09/2022]
Abstract
The function of fitness (or molecular activity) in the space of all possible sequences is known as the fitness landscape. Evolution is a random walk on the fitness landscape, with a bias toward climbing hills. Mapping the topography of real fitness landscapes is fundamental to understanding evolution, but previous efforts were hampered by the difficulty of obtaining large, quantitative data sets. The accessibility of high-throughput sequencing (HTS) has transformed this study, enabling large-scale enumeration of fitness for many mutants and even complete sequence spaces in some cases. We review the progress of high-throughput studies in mapping molecular fitness landscapes, both in vitro and in vivo, as well as opportunities for future research. Such studies are rapidly growing in number. HTS is expected to have a profound effect on the understanding of real molecular fitness landscapes.
Affiliation(s)
- Celia Blanco
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Evan Janzen
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA; Biomolecular Science and Engineering Program, University of California, Santa Barbara, California 93106, USA
- Abe Pressman
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA; Department of Chemical Engineering, University of California, Santa Barbara, California 93106, USA
- Ranajay Saha
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Irene A Chen
- Biomolecular Science and Engineering Program, University of California, Santa Barbara, California 93106, USA
115
Hilton SK, Bloom JD. Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral sequence divergence. Virus Evol 2018; 4:vey033. [PMID: 30425841 PMCID: PMC6220371 DOI: 10.1093/ve/vey033] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 02/06/2023]
Abstract
Molecular phylogenetics is often used to estimate the time since the divergence of modern gene sequences. For highly diverged sequences, such phylogenetic techniques sometimes estimate surprisingly recent divergence times. In the case of viruses, independent evidence indicates that the estimates of deep divergence times from molecular phylogenetics are sometimes too recent. This discrepancy is caused in part by inadequate models of purifying selection leading to branch-length underestimation. Here we examine the effect on branch-length estimation of using models that incorporate experimental measurements of purifying selection. We find that models informed by experimentally measured site-specific amino-acid preferences estimate longer deep branches on phylogenies of influenza virus hemagglutinin. This lengthening of branches is due to more realistic stationary states of the models, and is mostly independent of the branch-length extension from modeling site-to-site variation in amino-acid substitution rate. The branch-length extension from experimentally informed site-specific models is similar to that achieved by other approaches that allow the stationary state to vary across sites. However, the improvements from all of these site-specific but time-homogeneous and site-independent models are limited by the fact that a protein's amino-acid preferences gradually shift as it evolves. Overall, our work underscores the importance of modeling site-specific amino-acid preferences when estimating deep divergence times, but also shows the inherent limitations of approaches that fail to account for how these preferences shift over time.
Affiliation(s)
- Sarah K Hilton
- Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center; Department of Genome Sciences, University of Washington, USA
- Jesse D Bloom
- Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center; Department of Genome Sciences, University of Washington, USA; Howard Hughes Medical Institute, Seattle, WA, USA
116
McCandlish DM. Long-term evolution on complex fitness landscapes when mutation is weak. Heredity (Edinb) 2018; 121:449-465. [PMID: 30232363 PMCID: PMC6180110 DOI: 10.1038/s41437-018-0142-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/31/2017] [Revised: 08/04/2018] [Accepted: 08/06/2018] [Indexed: 12/25/2022]
Abstract
Understanding evolution on complex fitness landscapes is difficult both because of the large dimensionality of sequence space and the stochasticity inherent to population-genetic processes. Here, I present an integrated suite of mathematical tools for understanding evolution on time-invariant fitness landscapes when mutations occur sufficiently rarely that the population is typically monomorphic and evolution can be modeled as a sequence of well-separated fixation events. The basic intuition behind this suite of tools is that surrounding any particular genotype lies a region of the fitness landscape that is easy to evolve to, while other pieces of the fitness landscape are difficult to evolve to (due to distance, being across a fitness valley, etc.). I propose a rigorous definition for this "dynamical neighborhood" of a genotype which captures several aspects of the distribution of waiting times to evolve from one genotype to another. The neighborhood structure of the landscape as a whole can be summarized as a matrix, and I show how this matrix can be used to approximate the expected waiting time for certain evolutionary events to occur and to provide an intuitive interpretation to existing formal results on the index of dispersion of the molecular clock.
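In the weak-mutation regime, evolution becomes a Markov chain whose steps are single fixation events. The Python sketch below builds that origination-fixation chain on a small biallelic landscape and solves for the expected waiting time to reach a target genotype. It is only the chain underlying the approach, not the paper's "dynamical neighborhood" construction. The Moran fixation probability, the haploid population size `n`, the per-locus mutation rate `mu`, the toy landscape and all function names are illustrative assumptions.

```python
import numpy as np


def fixation_probability(s, n):
    """Moran-model fixation probability of one mutant with relative fitness 1 + s (s > -1) among n individuals."""
    if s == 0:
        return 1.0 / n
    r = 1.0 + s
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


def rate_matrix(fitness, n, mu):
    """Generator of the origination-fixation chain: rate(g -> h) = n * mu * P_fix for one-locus neighbours."""
    genotypes = list(fitness)
    index = {g: k for k, g in enumerate(genotypes)}
    q = np.zeros((len(genotypes), len(genotypes)))
    for g in genotypes:
        for locus in range(len(g)):
            h = g[:locus] + (1 - g[locus],) + g[locus + 1:]  # flip one biallelic locus
            s = fitness[h] / fitness[g] - 1.0
            q[index[g], index[h]] = n * mu * fixation_probability(s, n)
        q[index[g], index[g]] = -q[index[g]].sum()  # generator rows sum to zero
    return genotypes, q


def expected_waiting_time(fitness, start, target, n=100, mu=1e-6):
    """Mean first-passage time from start to target: solve -Q_TT t = 1 over the non-target states T."""
    genotypes, q = rate_matrix(fitness, n, mu)
    transient = [k for k, g in enumerate(genotypes) if g != target]
    q_tt = q[np.ix_(transient, transient)]
    times = np.linalg.solve(-q_tt, np.ones(len(transient)))
    return times[transient.index(genotypes.index(start))]


# Hypothetical two-locus landscape with a shallow valley between (0, 0) and (1, 1)
landscape = {(0, 0): 1.0, (0, 1): 0.98, (1, 0): 0.98, (1, 1): 1.05}
print(expected_waiting_time(landscape, (0, 0), (1, 1)))
```

Because every step is a well-separated fixation, first-passage quantities reduce to linear algebra on the rate matrix. This is the property that makes matrix summaries of the landscape possible.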
Affiliation(s)
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
117
Castiglione GM, Chang BS. Functional trade-offs and environmental variation shaped ancient trajectories in the evolution of dim-light vision. eLife 2018; 7:35957. [PMID: 30362942 PMCID: PMC6203435 DOI: 10.7554/elife.35957] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/14/2018] [Accepted: 09/09/2018] [Indexed: 12/11/2022]
Abstract
Trade-offs between protein stability and activity can restrict access to evolutionary trajectories, but widespread epistasis may facilitate indirect routes to adaptation. This may be enhanced by natural environmental variation, but in multicellular organisms this process is poorly understood. We investigated a paradoxical trajectory taken during the evolution of tetrapod dim-light vision, where in the rod visual pigment rhodopsin, E122 was fixed 350 million years ago, a residue associated with increased active-state (MII) stability but greatly diminished rod photosensitivity. Here, we demonstrate that high MII stability could have likely evolved without E122, but instead, selection appears to have entrenched E122 in tetrapods via epistatic interactions with nearby coevolving sites. In fishes by contrast, selection may have exploited these epistatic effects to explore alternative trajectories, but via indirect routes with low MII stability. Our results suggest that within tetrapods, E122 and high MII stability cannot be sacrificed, not even for improvements to rod photosensitivity.
Affiliation(s)
- Gianni M Castiglione
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
- Belinda Sw Chang
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada; Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Canada
118
Nelson ED, Grishin NV. Inference of epistatic effects in a key mitochondrial protein. Phys Rev E 2018; 97:062404. [PMID: 30011480 DOI: 10.1103/physreve.97.062404] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/02/2017] [Indexed: 12/17/2022]
Abstract
We use Potts model inference to predict pair epistatic effects in a key mitochondrial protein, cytochrome c oxidase subunit 2, in ray-finned fishes. We examine the effect of phylogenetic correlations on our predictions using a simple exact fitness model, and we find that, although epistatic effects are underpredicted, they maintain a roughly linear relationship to their true (model) values. After accounting for this correction, epistatic effects in the protein are still relatively weak, leading to fitness valleys of depth 2Ns ≃ -5 in compensatory double mutants. Interestingly, positive epistasis is more pronounced than negative epistasis, and the strongest positive effects capture nearly all sites subject to positive selection in fishes, similar to virus proteins evolving under selection pressure in the context of drug therapy.
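A Potts model assigns each sequence a statistical energy built from site fields and pairwise couplings. One consequence is that the epistasis of a double mutant depends only on the coupling between the two mutated sites, since every field and every other coupling cancels. The Python sketch below checks this numerically. It assumes the convention E(s) = Σ h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j) with random, purely illustrative parameters. Sign conventions differ between papers, and the phylogenetic correction discussed above is not modelled.

```python
import itertools

import numpy as np


def potts_energy(seq, h, J):
    """E(s) = sum_i h[i, s_i] + sum_{i<j} J[i, j, s_i, s_j]."""
    energy = sum(h[i, a] for i, a in enumerate(seq))
    for i, j in itertools.combinations(range(len(seq)), 2):
        energy += J[i, j, seq[i], seq[j]]
    return energy


def mutate(seq, changes):
    """Return a copy of seq with (position, state) substitutions applied."""
    out = list(seq)
    for pos, state in changes:
        out[pos] = state
    return tuple(out)


def double_mutant_epistasis(wt, i, a, j, b, h, J):
    """E(ij) - E(i) - E(j) + E(wt), computed from full sequence energies."""
    return (potts_energy(mutate(wt, [(i, a), (j, b)]), h, J)
            - potts_energy(mutate(wt, [(i, a)]), h, J)
            - potts_energy(mutate(wt, [(j, b)]), h, J)
            + potts_energy(wt, h, J))


def coupling_only_epistasis(wt, i, a, j, b, J):
    """The same quantity from the (i, j) coupling alone; requires i < j."""
    return J[i, j, a, b] - J[i, j, a, wt[j]] - J[i, j, wt[i], b] + J[i, j, wt[i], wt[j]]


rng = np.random.default_rng(0)
L, q = 6, 4  # toy numbers of sites and states per site
h = rng.normal(size=(L, q))
J = rng.normal(scale=0.5, size=(L, L, q, q))  # only entries with i < j are used
wt = tuple(int(x) for x in rng.integers(0, q, size=L))
print(double_mutant_epistasis(wt, 0, (wt[0] + 1) % q, 3, (wt[3] + 1) % q, h, J))
```

Under these assumptions, pair epistasis can be read directly off the inferred couplings. This is why the accuracy of inferred J values matters for predicting fitness valleys.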
Affiliation(s)
- Erik D Nelson
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
119
Otwinowski J. Biophysical Inference of Epistasis and the Effects of Mutations on Protein Stability and Function. Mol Biol Evol 2018; 35:2345-2354. [PMID: 30085303 PMCID: PMC6188545 DOI: 10.1093/molbev/msy141] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 12/14/2022]
Abstract
Understanding the relationship between protein sequence, function, and stability is a fundamental problem in biology. The essential function of many proteins that fold into a specific structure is their ability to bind to a ligand, which can be assayed for thousands of mutated variants. However, binding assays do not distinguish whether mutations affect the stability of the binding interface or the overall fold. Here, we introduce a statistical method to infer a detailed energy landscape of how a protein folds and binds to a ligand by combining information from many mutated variants. We fit a thermodynamic model describing the bound, unbound, and unfolded states to high quality data of protein G domain B1 binding to IgG-Fc. We infer distinct folding and binding energies for each mutation providing a detailed view of how mutations affect binding and stability across the protein. We accurately infer the folding energy of each variant in physical units, validated by independent data, whereas previous high-throughput methods could only measure indirect changes in stability. While we assume an additive sequence-energy relationship, the binding fraction is epistatic due to its nonlinear relation to energy. Despite having no epistasis in energy, our model explains much of the observed epistasis in binding fraction, with the remaining epistasis identifying conformationally dynamic regions.
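The key point above is that mutations can add up in energy while the measured binding fraction still shows epistasis. The Python sketch below makes this concrete with a three-state (unfolded / folded-unbound / folded-bound) partition function. The state energies, the folding of ligand concentration into `dg_bind`, the value of kT and all mutation effects are illustrative assumptions, not fitted values from the paper.

```python
import math

KT = 0.5925  # kcal/mol, roughly 298 K (illustrative)


def bound_fraction(dg_fold, dg_bind, kT=KT):
    """Fraction of molecules folded and bound in a three-state model.

    Free energies relative to the unfolded state: unfolded 0, folded-unbound dg_fold,
    folded-bound dg_fold + dg_bind (ligand concentration is absorbed into dg_bind).
    """
    w_unfolded = 1.0
    w_folded = math.exp(-dg_fold / kT)
    w_bound = math.exp(-(dg_fold + dg_bind) / kT)
    return w_bound / (w_unfolded + w_folded + w_bound)


def variant_fraction(wt, *effects):
    """Mutational effects (ddg_fold, ddg_bind) add in energy, so there is no epistasis at this level."""
    dg_fold = wt[0] + sum(e[0] for e in effects)
    dg_bind = wt[1] + sum(e[1] for e in effects)
    return bound_fraction(dg_fold, dg_bind)


def epistasis_in_bound_fraction(wt, eff_a, eff_b):
    """Double-mutant epistasis measured on the observable (bound fraction)."""
    return (variant_fraction(wt, eff_a, eff_b) - variant_fraction(wt, eff_a)
            - variant_fraction(wt, eff_b) + variant_fraction(wt))


wild_type = (-3.0, -2.0)  # hypothetical (dg_fold, dg_bind) in kcal/mol
mutation_a = (2.0, 0.0)   # hypothetical: destabilizes folding only
mutation_b = (1.5, 0.5)   # hypothetical: destabilizes folding and binding
print(epistasis_in_bound_fraction(wild_type, mutation_a, mutation_b))
```

With these toy numbers each single mutant barely changes the bound fraction, because the wild type has stability to spare. The double mutant drops sharply, giving negative epistasis in the observable without any epistasis in energy.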
Affiliation(s)
- Jakub Otwinowski
- Biology Department, University of Pennsylvania, Philadelphia, PA
120
The fitness landscape of the codon space across environments. Heredity (Edinb) 2018; 121:422-437. [PMID: 30127529 DOI: 10.1038/s41437-018-0125-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/22/2018] [Revised: 06/16/2018] [Accepted: 06/18/2018] [Indexed: 12/24/2022]
Abstract
Fitness landscapes map the relationship between genotypes and fitness. However, most fitness landscape studies ignore the genetic architecture imposed by the codon table and thereby neglect the potential role of synonymous mutations. To quantify the fitness effects of synonymous mutations and their potential impact on adaptation on a fitness landscape, we use new software based on Bayesian Markov chain Monte Carlo (MCMC) methods and re-estimate selection coefficients of all possible codon mutations across 9 amino acid positions in Saccharomyces cerevisiae Hsp90 across 6 environments. We quantify the distribution of fitness effects of synonymous mutations and show that it is dominated by many mutations of small or no effect and few mutations of larger effect. We then compare the shape of the codon fitness landscape across amino acid positions and environments, and quantify how the consideration of synonymous fitness effects changes the evolutionary dynamics on these fitness landscapes. Together these results highlight a possible role of synonymous mutations in adaptation and indicate the potential mis-inference when they are neglected in fitness landscape studies.
121
Abstract
The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension Df was distance-dependent: a high dimension for single and double mutants (Df = 4.0), which dropped to Df = 0.7-1.0 at 90% sequence identity, and increased to Df = 3.5-4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology.
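The network construction described above is simple to sketch: each sequence is a node, and an edge joins two sequences whose identity exceeds a threshold. The Python sketch below builds such a network and tabulates its degree distribution. It assumes the sequences are already aligned to equal length and uses the fraction of identical positions as "global sequence identity". The paper's actual alignment procedure, its fractal-dimension analysis and the toy sequences are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions; assumes pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_network(seqs, threshold):
    """Adjacency sets: nodes are sequence indices, edges join pairs whose identity exceeds threshold."""
    adjacency = {k: set() for k in range(len(seqs))}
    for i, j in combinations(range(len(seqs)), 2):
        if identity(seqs[i], seqs[j]) > threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def degree_distribution(adjacency):
    """Number of nodes having each degree (number of neighbours)."""
    return Counter(len(neighbours) for neighbours in adjacency.values())


toy = ["ACDEFG", "ACDEFH", "ACDEKH", "WWWWWW"]  # hypothetical aligned sequences
network = sequence_network(toy, threshold=0.8)
print(degree_distribution(network))
```

The all-pairs loop is quadratic in the number of sequences. That is fine for illustration, but it would need indexing or clustering heuristics at superfamily scale.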
122
Abstract
Genotype-phenotype relationships are notoriously complicated. Idiosyncratic interactions between specific combinations of mutations occur and are difficult to predict. Yet it is increasingly clear that many interactions can be understood in terms of global epistasis. That is, mutations may act additively on some underlying, unobserved trait, and this trait is then transformed via a nonlinear function to the observed phenotype as a result of subsequent biophysical and cellular processes. Here we infer the shape of such global epistasis in three proteins, based on published high-throughput mutagenesis data. To do so, we develop a maximum-likelihood inference procedure using a flexible family of monotonic nonlinear functions spanned by an I-spline basis. Our analysis uncovers dramatic nonlinearities in all three proteins; in some proteins a model with global epistasis accounts for virtually all of the measured variation, whereas in others we find substantial local epistasis as well. This method allows us to test hypotheses about the form of global epistasis and to distinguish variance components attributable to global epistasis, local epistasis, and measurement error.
123
Pairwise and higher-order genetic interactions during the evolution of a tRNA. Nature 2018; 558:117-121. [PMID: 29849145 PMCID: PMC6193533 DOI: 10.1038/s41586-018-0170-7] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 05/29/2017] [Accepted: 04/09/2018] [Indexed: 01/09/2023]
Abstract
A central question in genetics and evolution is the extent to which the outcomes of mutations change depending on the genetic context in which they occur[1-3]. Pairwise interactions between mutations have been systematically mapped within[4-18] and between[19] genes, and have been shown to contribute substantially to phenotypic variation among individuals[20]. However, the extent to which genetic interactions themselves are stable or dynamic across genotypes is unclear[21,22]. Here we quantify more than 45,000 genetic interactions between the same 87 pairs of mutations across more than 500 closely related genotypes of a yeast tRNA. Notably, all pairs of mutations interacted in at least 9% of genetic backgrounds and all pairs switched from interacting positively to interacting negatively in different genotypes (false discovery rate < 0.1). Higher-order interactions are also abundant and dynamic across genotypes. The epistasis in this tRNA means that all individual mutations switch from detrimental to beneficial, even in closely related genotypes. As a consequence, accurate genetic prediction requires mutation effects to be measured across different genetic backgrounds and the use of higher-order epistatic terms.
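Counting interactions "across genotypes", as described above, amounts to computing a double-mutant epistasis score for the same mutation pair in each genetic background and recording its sign. The Python sketch below shows that bookkeeping. It assumes log fitness as the additive scale and replaces the study's statistical test (false discovery rate < 0.1) with a fixed tolerance. The background names and fitness values are hypothetical, and higher-order terms are not computed.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Double-mutant epistasis on an additive scale (here: log fitness)."""
    return f_ab - f_a - f_b + f_wt


def classify_interactions(backgrounds, tol=0.05):
    """Label the same mutation pair as interacting positively, negatively or not at all in each background.

    backgrounds maps a background name to (f_wt, f_a, f_b, f_ab) log-fitness values.
    A fixed tolerance stands in for a proper statistical test.
    """
    labels = {}
    for name, (f_wt, f_a, f_b, f_ab) in backgrounds.items():
        e = pairwise_epistasis(f_wt, f_a, f_b, f_ab)
        labels[name] = "positive" if e > tol else "negative" if e < -tol else "none"
    return labels


# Hypothetical log-fitness values for one mutation pair in three related backgrounds
toy = {
    "bg1": (0.0, -0.3, -0.2, 0.0),
    "bg2": (0.0, 0.1, 0.1, -0.4),
    "bg3": (0.0, -0.1, -0.1, -0.2),
}
print(classify_interactions(toy))
```

A pair that comes out "positive" in one background and "negative" in another is the kind of sign switch the study reports for every pair it tested.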
124
Wang CY, Chang PM, Ary ML, Allen BD, Chica RA, Mayo SL, Olafson BD. ProtaBank: A repository for protein design and engineering data. Protein Sci 2018; 27:1113-1124. [PMID: 29575358 PMCID: PMC5980626 DOI: 10.1002/pro.3406] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 02/15/2018] [Revised: 03/13/2018] [Accepted: 03/21/2018] [Indexed: 01/01/2023]
Abstract
We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools is provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org.
Affiliation(s)
- Connie Y Wang
- Protabit LLC, 129 N. Hill Avenue, Suite 102, Pasadena, California, 91106
- Paul M Chang
- Protabit LLC, 129 N. Hill Avenue, Suite 102, Pasadena, California, 91106
- Marie L Ary
- Protabit LLC, 129 N. Hill Avenue, Suite 102, Pasadena, California, 91106
- Benjamin D Allen
- Protabit LLC, 129 N. Hill Avenue, Suite 102, Pasadena, California, 91106; Department of Biochemistry and Molecular Biology, and the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, 16802
- Roberto A Chica
- Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Stephen L Mayo
- Protabit LLC, 129 N. Hill Avenue, Suite 102, Pasadena, California, 91106; Division of Biology and Biological Engineering, and Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, 91125
- Barry D Olafson
- Protabit LLC, 129 N. Hill Avenue, Suite 102, Pasadena, California, 91106
125
Platt A, Weber CC, Liberles DA. Protein evolution depends on multiple distinct population size parameters. BMC Evol Biol 2018; 18:17. [PMID: 29422024 PMCID: PMC5806465 DOI: 10.1186/s12862-017-1085-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/25/2017] [Accepted: 11/20/2017] [Indexed: 01/08/2023]
Abstract
That population size affects the fate of new mutations arising in genomes, modulating both how frequently they arise and how efficiently natural selection is able to filter them, is well established. It is therefore clear that these distinct roles for population size that characterize different processes should affect the evolution of proteins and need to be carefully defined. Empirical evidence is consistent with a role for demography in influencing protein evolution, supporting the idea that functional constraints alone do not determine the composition of coding sequences. Given that the relationship between population size, mutant fitness and fixation probability has been well characterized, estimating fitness from observed substitutions is well within reach with well-formulated models. Molecular evolution research has, therefore, increasingly begun to leverage concepts from population genetics to quantify the selective effects associated with different classes of mutation. However, in order for this type of analysis to provide meaningful information about the intra- and inter-specific evolution of coding sequences, a clear definition of concepts of population size, what they influence, and how they are best parameterized is essential. Here, we present an overview of the many distinct concepts that “population size” and “effective population size” may refer to, what they represent for studying proteins, and how this knowledge can be harnessed to produce better specified models of protein evolution.
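Kimura's diffusion approximation for the fixation probability of a new mutant is one standard formula in which two population-size parameters play distinct roles. The census size N sets the mutant's starting frequency p = 1/(2N), while the effective size Ne sets how efficiently selection acts: u = (1 - exp(-4 Ne s p)) / (1 - exp(-4 Ne s)). The Python sketch below assumes a diploid population with additive (genic) selection, heterozygote fitness 1 + s. It is offered as an illustration, not as the specific parameterization the review recommends.

```python
import math


def kimura_fixation_probability(s, census_n, effective_n):
    """Fixation probability of a new mutant under additive selection (diffusion approximation).

    census_n sets the initial frequency p = 1 / (2N); effective_n sets the strength of drift.
    """
    p = 1.0 / (2.0 * census_n)
    if s == 0:
        return p  # neutral limit: fixation probability equals initial frequency
    # (1 - exp(-x)) / (1 - exp(-y)) written with expm1 for accuracy when x is small
    return math.expm1(-4.0 * effective_n * s * p) / math.expm1(-4.0 * effective_n * s)


# Same census size, two effective sizes, same beneficial mutation
print(kimura_fixation_probability(0.01, census_n=1000, effective_n=1000))
print(kimura_fixation_probability(0.01, census_n=1000, effective_n=100))
```

Holding N fixed and shrinking Ne lowers the fixation probability of a beneficial mutation toward the neutral value 1/(2N). This is why estimating selection from substitutions requires stating which population size enters which term. For strongly deleterious mutations in large populations, `expm1` can overflow, so a log-space form would be safer there.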
Affiliation(s)
- Alexander Platt
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- Claudia C Weber
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA.
126
Buchholz PCF, Fademrecht S, Pleiss J. Percolation in protein sequence space. PLoS One 2017; 12:e0189646. [PMID: 29261740 PMCID: PMC5738032 DOI: 10.1371/journal.pone.0189646] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/22/2017] [Accepted: 11/28/2017] [Indexed: 01/08/2023]
Abstract
The currently known protein sequences are not distributed equally in sequence space, but cluster into families. Analyzing the cluster size distribution gives a glimpse of the large and unknown extant protein sequence space, which has been explored during evolution. For six protein superfamilies with different fold and function, the cluster size distributions followed a power law with slopes between 2.4 and 3.3, which represent upper limits to the cluster distribution of extant sequences. The power law distribution of cluster sizes is in accordance with percolation theory and strongly supports connectedness of extant sequence space. Percolation of extant sequence space has three major consequences: (1) It transforms our view of sequence space as a highly connected network where each sequence has multiple neighbors, and each pair of sequences is connected by many different paths. A high degree of connectedness is a necessary condition of efficient evolution, because it overcomes the possible blockage by sign epistasis and reciprocal sign epistasis. (2) The Fisher exponent is an indicator of connectedness and saturation of sequence space of each protein superfamily. (3) All clusters are expected to be connected by extant sequences that become apparent as a higher portion of extant sequence space becomes known. Being linked to biochemically distinct homologous families, bridging sequences are promising enzyme candidates for applications in biotechnology because they are expected to have substrate ambiguity or catalytic promiscuity.
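The cluster-size exponents quoted above are power-law slopes. The Python sketch below uses one common way to estimate such a slope, the standard maximum-likelihood (Hill-type) estimator α̂ = 1 + n / Σ ln(x_i / x_min). For integer cluster sizes it applies the usual x_min - 1/2 shift as an approximation. The paper may have fitted its slopes differently (for example on a log-log histogram). The choice of x_min and the synthetic check data are assumptions.

```python
import math
import random


def power_law_exponent(sizes, x_min, discrete=True):
    """Maximum-likelihood exponent alpha for p(x) ~ x^-alpha with x >= x_min.

    For integer data the x_min - 0.5 shift is a standard approximation, not an exact discrete MLE.
    """
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    shift = 0.5 if discrete else 0.0
    log_sum = sum(math.log(x / (x_min - shift)) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, alpha, x_min, seed=0):
    """Inverse-transform samples from a continuous power law, used here only to check the estimator."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


synthetic = sample_power_law(20000, alpha=2.8, x_min=1.0, seed=1)
print(power_law_exponent(synthetic, x_min=1.0, discrete=False))
```

The estimate's standard error is roughly (α̂ - 1) / √n. Slopes from small superfamilies should therefore be read as rough bounds rather than precise values.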
Affiliation(s)
- Patrick C. F. Buchholz
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
- Silvia Fademrecht
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
- Jürgen Pleiss
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
127
Crona K, Gavryushkin A, Greene D, Beerenwinkel N. Inferring genetic interactions from comparative fitness data. eLife 2017; 6. [PMID: 29260711 PMCID: PMC5737811 DOI: 10.7554/elife.28629] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/15/2017] [Accepted: 11/21/2017] [Indexed: 01/13/2023]
Abstract
Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations.
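The idea that a fitness rank order alone can imply epistasis is easiest to see for two biallelic loci. Take epistasis as w11 + w00 - w10 - w01 on an additive scale. If the two "same-parity" genotypes (00, 11) beat the two single mutants position by position in the ranking, their sum must be larger, whatever the actual fitness values are. The Python sketch below applies this pairing rule. It is only the simplest instance of the idea, not the paper's general characterization of higher-order rank orders. The genotype labels and function names are assumptions.

```python
from itertools import permutations

GENOTYPES = ("00", "01", "10", "11")


def implied_epistasis_sign(order):
    """Sign of w11 + w00 - w10 - w01 implied by a strict rank order (fittest first).

    Returns '+' or '-' if every fitness assignment consistent with the order has that sign,
    and None if the order alone is inconclusive.
    """
    if sorted(order) != sorted(GENOTYPES):
        raise ValueError("order must be a permutation of 00, 01, 10, 11")
    rank = {g: k for k, g in enumerate(order)}
    same = sorted((rank["00"], rank["11"]))   # ranks of the double mutant and wild type
    mixed = sorted((rank["01"], rank["10"]))  # ranks of the two single mutants
    if same[0] < mixed[0] and same[1] < mixed[1]:
        return "+"
    if mixed[0] < same[0] and mixed[1] < same[1]:
        return "-"
    return None


def tally_rank_orders():
    """How many of the 24 strict rank orders imply each sign."""
    counts = {"+": 0, "-": 0, None: 0}
    for order in permutations(GENOTYPES):
        counts[implied_epistasis_sign(order)] += 1
    return counts


print(implied_epistasis_sign(["11", "10", "00", "01"]))
print(tally_rank_orders())
```

In this two-locus case, two thirds of all rank orders already fix the sign of epistasis. This illustrates why incomplete or imprecise fitness data can still support conclusions about interactions.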
Affiliation(s)
- Kristina Crona
- Department of Mathematics and Statistics, American University, Washington, DC, United States
- Alex Gavryushkin
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Devin Greene
- Department of Mathematics and Statistics, American University, Washington, DC, United States
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
128
Pressman A, Moretti JE, Campbell GW, Müller UF, Chen IA. Analysis of in vitro evolution reveals the underlying distribution of catalytic activity among random sequences. Nucleic Acids Res 2017. [PMID: 28645146 PMCID: PMC5737207 DOI: 10.1093/nar/gkx540] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 01/24/2023]
Abstract
The emergence of catalytic RNA is believed to have been a key event during the origin of life. Understanding how catalytic activity is distributed across random sequences is fundamental to estimating the probability that catalytic sequences would emerge. Here, we analyze the in vitro evolution of triphosphorylating ribozymes and translate their fitnesses into absolute estimates of catalytic activity for hundreds of ribozyme families. The analysis efficiently identified highly active ribozymes and estimated catalytic activity with good accuracy. The evolutionary dynamics follow Fisher's Fundamental Theorem of Natural Selection and a corollary, permitting retrospective inference of the distribution of fitness and activity in the random sequence pool for the first time. The frequency distribution of rate constants appears to be log-normal, with a surprisingly steep dropoff at higher activity, consistent with a mechanism for the emergence of activity as the product of many independent contributions.
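Fisher's Fundamental Theorem, invoked above, has an exact discrete-time form for asexually replicating types with constant fitness. After one round of selection, the increase in mean fitness equals the variance in fitness divided by the mean. The Python sketch below checks that identity. It treats each ribozyme family as a type whose fitness is its per-round relative enrichment, which is an assumption. The corollary and the retrospective inference procedure used in the paper are not reproduced, and the numbers are hypothetical.

```python
def select_one_round(freqs, fitness):
    """One round of selection: each type's frequency is reweighted by its fitness."""
    mean = sum(p * w for p, w in zip(freqs, fitness))
    return [p * w / mean for p, w in zip(freqs, fitness)], mean


def fisher_check(freqs, fitness):
    """Return (observed change in mean fitness, Var(w) / mean(w)) for one round of selection."""
    new_freqs, mean = select_one_round(freqs, fitness)
    new_mean = sum(p * w for p, w in zip(new_freqs, fitness))
    variance = sum(p * (w - mean) ** 2 for p, w in zip(freqs, fitness))
    return new_mean - mean, variance / mean


# Hypothetical pool: three sequence families with relative fitnesses 1, 2 and 4
print(fisher_check([0.5, 0.3, 0.2], [1.0, 2.0, 4.0]))
```

The two returned numbers agree because the new mean is E[w²]/E[w]. Tracking how fast mean fitness rises across rounds therefore carries information about the spread of fitness in the starting pool.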
Affiliation(s)
- Abe Pressman
- Department of Chemistry and Biochemistry 9510, University of California, Santa Barbara, CA 93106, USA; Program in Chemical Engineering, University of California, Santa Barbara, CA 93106, USA
- Janina E Moretti
- Department of Chemistry and Biochemistry, University of California, San Diego, CA 92093, USA
- Gregory W Campbell
- Department of Chemistry and Biochemistry 9510, University of California, Santa Barbara, CA 93106, USA; Program in Biomolecular Sciences and Engineering, University of California, Santa Barbara, CA 93106, USA
- Ulrich F Müller
- Department of Chemistry and Biochemistry, University of California, San Diego, CA 92093, USA
- Irene A Chen
- Department of Chemistry and Biochemistry 9510, University of California, Santa Barbara, CA 93106, USA; Program in Biomolecular Sciences and Engineering, University of California, Santa Barbara, CA 93106, USA
129
Gene Conversion Facilitates Adaptive Evolution on Rugged Fitness Landscapes. Genetics 2017; 207:1577-1589. [PMID: 28978673 DOI: 10.1534/genetics.117.300350] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/19/2017] [Accepted: 09/30/2017] [Indexed: 01/11/2023]
Abstract
Gene conversion is a ubiquitous phenomenon that leads to the exchange of genetic information between homologous DNA regions and maintains coevolving multi-gene families in most prokaryotic and eukaryotic organisms. In this paper, we study its implications for the evolution of a single functional gene with a silenced duplicate, using two different models of evolution on rugged fitness landscapes. Our analytical and numerical results show that, by helping to circumvent valleys of low fitness, gene conversion with a passive duplicate gene can cause a significant speedup of adaptation, which depends nontrivially on the frequency of gene conversion and the structure of the landscape. We find that stochastic effects due to finite population sizes further increase the likelihood of exploiting this evolutionary pathway. A universal feature appearing in both deterministic and stochastic analysis of our models is the existence of an optimal gene conversion rate, which maximizes the speed of adaptation. Our results reveal the potential for duplicate genes to act as a "scratch paper" that frees evolution from being limited to strictly beneficial mutations in strongly selective environments.
130
Starr TN, Picton LK, Thornton JW. Alternative evolutionary histories in the sequence space of an ancient protein. Nature 2017; 549:409-413. [PMID: 28902834 PMCID: PMC6214350 DOI: 10.1038/nature23902] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 03/23/2017] [Accepted: 08/08/2017] [Indexed: 12/28/2022]
Abstract
To understand why molecular evolution turned out as it did, we must characterize not only the path that evolution followed across the space of possible molecular sequences but also the many alternative trajectories that could have been taken but were not. A large-scale comparison of real and possible histories would establish whether the outcome of evolution represents an optimal state driven by natural selection or the contingent product of historical chance events; it would also reveal how the underlying distribution of functions across sequence space shaped historical evolution. Here we combine ancestral protein reconstruction with deep mutational scanning to characterize alternative histories in the sequence space around an ancient transcription factor, which evolved a novel biological function through well-characterized mechanisms. We find hundreds of alternative protein sequences that use diverse biochemical mechanisms to perform the derived function at least as well as the historical outcome. These alternatives all require prior permissive substitutions that do not enhance the derived function, but not all require the same permissive changes that occurred during history. We find that if evolution had begun from a different starting point within the network of sequences encoding the ancestral function, outcomes with different genetic and biochemical forms would probably have resulted; this contingency arises from the distribution of functional variants in sequence space and epistasis between residues. Our results illuminate the topology of the vast space of possibilities from which history sampled one path, highlighting how the outcome of evolution depends on a serial chain of compounding chance events.
Affiliation(s)
- Tyler N Starr
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, USA
- Lora K Picton
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Joseph W Thornton
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
131
Rubin AF, Gelman H, Lucas N, Bajjalieh SM, Papenfuss AT, Speed TP, Fowler DM. A statistical framework for analyzing deep mutational scanning data. Genome Biol 2017; 18:150. [PMID: 28784151 PMCID: PMC5547491 DOI: 10.1186/s13059-017-1272-5] [Citation(s) in RCA: 124] [Impact Index Per Article: 17.7] [Received: 04/18/2017] [Accepted: 07/06/2017] [Indexed: 11/10/2022]
Abstract
Deep mutational scanning is a widely used method for multiplex measurement of functional consequences of protein variants. We developed a new deep mutational scanning statistical model that generates error estimates for each measurement, capturing both sampling error and consistency between replicates. We apply our model to one novel and five published datasets comprising 243,732 variants and demonstrate its superiority in removing noisy variants and conducting hypothesis testing. Simulations show our model applies to scans based on cell growth or binding and handles common experimental errors. We implemented our model in Enrich2, software that can empower researchers analyzing deep mutational scanning data.
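The abstract does not give the model's equations. As a point of reference, the sketch below shows the simplest two-time-point building block for scoring a variant: a log ratio of variant counts to wild-type counts after selection versus before, with a Poisson-style standard error. It makes several assumptions. The pseudocount of 0.5 is one choice. The variance is approximated by a sum of reciprocal counts. The name `wt_normalized_log_ratio` is hypothetical, and the function is not Enrich2's interface. It also omits the replicate-consistency part of the model the abstract describes.

```python
import math

def wt_normalized_log_ratio(c_var_in, c_wt_in, c_var_sel, c_wt_sel, pseudocount=0.5):
    """Hypothetical helper: enrichment of a variant relative to wild type
    between an input library and a selected library.

    Returns (score, standard_error). Counts are sequencing read counts; the
    pseudocount keeps the log finite for variants that were never observed.
    """
    counts = (c_var_in, c_wt_in, c_var_sel, c_wt_sel)
    if any(c < 0 for c in counts):
        raise ValueError("read counts must be non-negative")
    a, b, c, d = (x + pseudocount for x in counts)
    # log(variant / wt after selection) minus log(variant / wt before selection)
    score = math.log(c / d) - math.log(a / b)
    # Var(log count) is approximated by 1 / count; the four counts are treated as independent
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return score, se
```

For example, a variant that rises from 100 to 400 reads while wild type stays at 1000 reads gets a positive score of about log(4). A variant whose ratio to wild type is unchanged scores 0.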
Affiliation(s)
- Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Hannah Gelman
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA, 98195, USA
- Nathan Lucas
- Department of Pathology, University of Washington, Seattle, WA, 98195, USA
- Sandra M Bajjalieh
- Department of Pathology, University of Washington, Seattle, WA, 98195, USA
- Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, 3010, Australia
- Terence P Speed
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, 3010, Australia
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Department of Bioengineering, University of Washington, Seattle, WA, 98195, USA
132
Pál C, Papp B. Evolution of complex adaptations in molecular systems. Nat Ecol Evol 2017; 1:1084-1092. [PMID: 28782044 PMCID: PMC5540182 DOI: 10.1038/s41559-017-0228-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 11/29/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022]
Abstract
A central challenge in evolutionary biology concerns the mechanisms by which complex adaptations arise. Such adaptations depend on the fixation of multiple, highly specific mutations, where intermediate stages of evolution seemingly provide little or no benefit. It is generally assumed that the establishment of complex adaptations is very slow in nature, as evolution of such traits demands special population genetic or environmental circumstances. However, blueprints of complex adaptations in molecular systems are pervasive, indicating that they can readily evolve. We discuss the prospects and limitations of non-adaptive scenarios, which assume multiple neutral or deleterious steps in the evolution of complex adaptations. Next, we examine how complex adaptations can evolve by natural selection in changing environments. Finally, we argue that molecular 'springboards', such as phenotypic heterogeneity and promiscuous interactions, facilitate this process by providing access to new adaptive paths.
Affiliation(s)
- Csaba Pál
- Synthetic and Systems Biology Unit, Biological Research Center, Szeged, 6726, Hungary
- Balázs Papp
- Synthetic and Systems Biology Unit, Biological Research Center, Szeged, 6726, Hungary
133
Sailer ZR, Harms MJ. High-order epistasis shapes evolutionary trajectories. PLoS Comput Biol 2017; 13:e1005541. [PMID: 28505183 PMCID: PMC5448810 DOI: 10.1371/journal.pcbi.1005541] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 12/13/2016] [Revised: 05/30/2017] [Accepted: 04/24/2017] [Indexed: 01/02/2023]
Abstract
High-order epistasis, where the effect of a mutation is determined by interactions with two or more other mutations, makes small but detectable contributions to genotype-fitness maps. While epistasis between pairs of mutations is known to be an important determinant of evolutionary trajectories, the evolutionary consequences of high-order epistasis remain poorly understood. To determine the effect of high-order epistasis on evolutionary trajectories, we computationally removed high-order epistasis from experimental genotype-fitness maps containing all binary combinations of five mutations. We then compared trajectories through maps both with and without high-order epistasis. We found that high-order epistasis strongly shapes the accessibility and probability of evolutionary trajectories. A closer analysis revealed that the magnitude of epistasis, not its order, predicts its effects on evolutionary trajectories. We further find that high-order epistasis makes it impossible to predict evolutionary trajectories from the individual and paired effects of mutations. We therefore conclude that high-order epistasis profoundly shapes evolutionary trajectories through genotype-fitness maps.
A key goal for evolutionary biologists is understanding why one evolutionary trajectory is taken rather than others. This requires understanding how individual mutations, as well as interactions between them, determine the accessibility of evolutionary pathways. We used a robust statistical analysis to reveal interactions between up to five mutations in published datasets, meaning that the effect of a mutation can depend on the presence or absence of four other mutations. Simulations reveal that these interactions strongly shape evolutionary trajectories. These interactions lead to profound unpredictability in evolution, as one cannot use the effect of a mutation in the ancestor to predict its effect later in the trajectory.
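The procedure in this abstract involves two steps. First, high-order terms are removed from a complete binary map. Second, the trajectories that remain accessible are counted. The sketch below illustrates both steps under stated assumptions, and it may differ from the authors' pipeline in how noise and measurement scale are handled. It uses a plain Walsh (Hadamard) decomposition on the phenotype values as given. A trajectory counts as accessible when fitness strictly increases at every step from the all-absent genotype to the all-present genotype. Trajectory probabilities are not modeled. All function names are hypothetical.

```python
from itertools import permutations

def walsh_coefficients(y, L):
    """Background-averaged (Walsh) epistatic coefficients of a binary map.

    y[g] is the phenotype of genotype g, an integer whose bit s is 1 when
    mutation s is present. Encoding x_s = +1 (absent) / -1 (present) gives
    y[g] = sum_S beta[S] * (-1) ** popcount(S & g), inverted below.
    """
    n = 1 << L
    return [sum(y[g] * (-1) ** bin(S & g).count("1") for g in range(n)) / n
            for S in range(n)]

def reconstruct(beta, L, max_order):
    """Rebuild the map keeping only interaction terms of order <= max_order."""
    n = 1 << L
    return [sum(b * (-1) ** bin(S & g).count("1")
                for S, b in enumerate(beta) if bin(S).count("1") <= max_order)
            for g in range(n)]

def accessible_trajectories(y, L):
    """Orders of mutation (out of L!) along which fitness rises at every step."""
    accessible = []
    for order in permutations(range(L)):
        g = 0
        for site in order:
            nxt = g | (1 << site)
            if y[nxt] <= y[g]:
                break  # a downhill or neutral step makes this path inaccessible
            g = nxt
        else:
            accessible.append(order)
    return accessible
```

To compare a map with and without high-order epistasis, count `accessible_trajectories(y, L)` and `accessible_trajectories(reconstruct(walsh_coefficients(y, L), L, 2), L)`. For five mutations, each count is out of 5! = 120 orderings.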
Affiliation(s)
- Zachary R. Sailer
- Institute of Molecular Biology, University of Oregon, Eugene, OR, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, USA
- Michael J. Harms
- Institute of Molecular Biology, University of Oregon, Eugene, OR, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, USA
134
Kumar A, Natarajan C, Moriyama H, Witt CC, Weber RE, Fago A, Storz JF. Stability-Mediated Epistasis Restricts Accessible Mutational Pathways in the Functional Evolution of Avian Hemoglobin. Mol Biol Evol 2017; 34:1240-1251. [PMID: 28201714 PMCID: PMC5400398 DOI: 10.1093/molbev/msx085] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 01/11/2023]
Abstract
If the fitness effects of amino acid mutations are conditional on genetic background, then mutations can have different effects depending on the sequential order in which they occur during evolutionary transitions in protein function. A key question concerns the fraction of possible mutational pathways connecting alternative functional states that involve transient reductions in fitness. Here we examine the functional effects of multiple amino acid substitutions that contributed to an evolutionary transition in the oxygenation properties of avian hemoglobin (Hb). The set of causative changes included mutations at intradimer interfaces of the Hb tetramer. Replacements at such sites may be especially likely to have epistatic effects on Hb function since residues at intersubunit interfaces are enmeshed in networks of salt bridges and hydrogen bonds between like and unlike subunits; mutational reconfigurations of these atomic contacts can affect allosteric transitions in quaternary structure and the propensity for tetramer-dimer dissociation. We used ancestral protein resurrection in conjunction with a combinatorial protein engineering approach to synthesize genotypes representing the complete set of mutational intermediates in all possible forward pathways that connect functionally distinct ancestral and descendent genotypes. The experiments revealed that half of all possible forward pathways included mutational intermediates with aberrant functional properties because particular combinations of mutations promoted tetramer-dimer dissociation. The subset of mutational pathways with unstable intermediates may be selectively inaccessible, representing evolutionary roads not taken. The experimental results also demonstrate how epistasis for particular functional properties of proteins may be mediated indirectly by mutational effects on quaternary structural stability.
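With n substitutions, there are n! forward pathways, one for each order in which the substitutions can occur. Each pathway passes through n - 1 intermediate genotypes. The sketch below shows how a fraction like the one reported in the abstract can be tallied once each intermediate has been flagged as aberrant or not. The flags are hypothetical inputs, and the experimental criterion (tetramer-dimer dissociation) is not modeled. The name `split_pathways` is also illustrative.

```python
from itertools import permutations

def split_pathways(n_sites, aberrant):
    """Partition the n! forward pathways by whether any intermediate is aberrant.

    Genotypes are represented as frozensets of the substitutions already made;
    `aberrant` is a set of such frozensets. Only intermediates are checked,
    not the ancestral (empty) or descendant (complete) genotype.
    """
    blocked, clear = [], []
    for order in permutations(range(n_sites)):
        present = set()
        for site in order[:-1]:  # the genotypes after 1 .. n-1 substitutions
            present.add(site)
            if frozenset(present) in aberrant:
                blocked.append(order)
                break
        else:
            clear.append(order)
    return blocked, clear
```

The fraction of pathways with at least one aberrant intermediate is then `len(blocked) / math.factorial(n_sites)`.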
Affiliation(s)
- Amit Kumar
- School of Biological Sciences, University of Nebraska, Lincoln, NE
- Hideaki Moriyama
- School of Biological Sciences, University of Nebraska, Lincoln, NE
- Christopher C. Witt
- Department of Biology, University of New Mexico, Albuquerque, NM
- Museum of Southwestern Biology, University of New Mexico, Albuquerque, NM
- Roy E. Weber
- Zoophysiology, Department of Bioscience, Aarhus University, Aarhus, Denmark
- Angela Fago
- Zoophysiology, Department of Bioscience, Aarhus University, Aarhus, Denmark
- Jay F. Storz
- School of Biological Sciences, University of Nebraska, Lincoln, NE
135
Ipe J, Swart M, Burgess KS, Skaar TC. High-Throughput Assays to Assess the Functional Impact of Genetic Variants: A Road Towards Genomic-Driven Medicine. Clin Transl Sci 2017; 10:67-77. [PMID: 28213901 PMCID: PMC5355973 DOI: 10.1111/cts.12440] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 11/16/2016] [Accepted: 01/03/2017] [Indexed: 01/08/2023]
Affiliation(s)
- J Ipe
- Indiana University School of Medicine, Department of Medicine, Division of Clinical Pharmacology, Indianapolis, Indiana, USA
- M Swart
- Indiana University School of Medicine, Department of Medicine, Division of Clinical Pharmacology, Indianapolis, Indiana, USA
- KS Burgess
- Indiana University School of Medicine, Department of Medicine, Division of Clinical Pharmacology, Indianapolis, Indiana, USA
- Indiana University School of Medicine, Department of Pharmacology and Toxicology, Indianapolis, Indiana, USA
- TC Skaar
- Indiana University School of Medicine, Department of Medicine, Division of Clinical Pharmacology, Indianapolis, Indiana, USA
136
Zagorski M, Burda Z, Waclaw B. Beyond the Hypercube: Evolutionary Accessibility of Fitness Landscapes with Realistic Mutational Networks. PLoS Comput Biol 2016; 12:e1005218. [PMID: 27935934 PMCID: PMC5147777 DOI: 10.1371/journal.pcbi.1005218] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 06/10/2016] [Accepted: 10/23/2016] [Indexed: 01/04/2023]
Abstract
Evolutionary pathways describe trajectories of biological evolution in the space of different variants of organisms (genotypes). The probability of existence and the number of evolutionary pathways that lead from a given genotype to a better-adapted genotype are important measures of accessibility of local fitness optima and the reproducibility of evolution. Both quantities have been studied in simple mathematical models where genotypes are represented as binary sequences of two types of basic units, and the network of permitted mutations between the genotypes is a hypercube graph. However, it is unclear how these results translate to the biologically relevant case in which genotypes are represented by sequences of more than two units, for example four nucleotides (DNA) or 20 amino acids (proteins), and the mutational graph is not the hypercube. Here we investigate accessibility of the best-adapted genotype in the general case of K > 2 units. Using computer-generated and experimental fitness landscapes we show that accessibility of the global fitness maximum increases with K and can be much higher than for binary sequences. The increase in accessibility comes from the increase in the number of indirect trajectories exploited by evolution for higher K. As one of the consequences, the fraction of genotypes that are accessible increases by three orders of magnitude when the number of units K increases from 2 to 16 for landscapes of size N ∼ 10⁶ genotypes. This suggests that evolution can follow many different trajectories on such landscapes and the reconstruction of evolutionary pathways from experimental data might be an extremely difficult task.
Biological evolution is driven by heritable, genetic alterations that affect the fitness of organisms. However, the pool of “fitter” variants (genotypes) is often restricted, and it is not at all obvious how evolution finds its way from low-fitness to high-fitness genotypes in complex, multidimensional “fitness landscapes” with many peaks (fit organisms) and valleys (unfit ones). To address this question we investigate how likely it is for biological evolution to find a way “uphill” from a lower-fitness organism to the best-adapted organism. We discover that the accessibility of the fittest organism depends on the number of types of basic “units” used to encode genotypes. These units can be, for example, the four DNA nucleotides A, T, C, G, or the ∼20 amino acids used for synthesizing proteins, and the choice of the most appropriate unit is dictated by how the genotypes and the fitnesses are related, a relationship that researchers have begun to unveil only recently. We find that increasing the number of units strongly increases the probability that there will be at least one uphill path to the best-adapted genotype, and the number of evolutionary pathways leading to it. Our findings suggest that biological evolution can follow many more pathways than previously thought.
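The abstract does not state an algorithm. One way to measure the quantity it describes is to compute the fraction of genotypes that have at least one uphill path to the global maximum on a Hamming graph with K units per site. The sketch below takes that approach under several assumptions. Paths may be indirect (of any length), each step changes one site to any of the other K - 1 units, fitness must strictly increase, and the global maximum is unique. The function names are hypothetical.

```python
from itertools import product

def neighbors(g, K):
    """Genotypes one substitution away: any single site changed to another unit."""
    for i, current in enumerate(g):
        for u in range(K):
            if u != current:
                yield g[:i] + (u,) + g[i + 1:]

def fraction_accessible(fitness, L, K):
    """Fraction of the K**L genotypes with a strictly uphill path to the global maximum.

    `fitness` maps each genotype (a length-L tuple over range(K)) to a number.
    """
    genotypes = list(product(range(K), repeat=L))
    # Visit genotypes from fittest to least fit. Any uphill neighbor is fitter,
    # so it has already been classified when g is examined.
    order = sorted(genotypes, key=lambda g: fitness[g], reverse=True)
    reaches_max = {order[0]}
    for g in order[1:]:
        if any(fitness[nb] > fitness[g] and nb in reaches_max
               for nb in neighbors(g, K)):
            reaches_max.add(g)
    return len(reaches_max) / len(genotypes)
```

Sorting by fitness turns the search into a single pass. Because any uphill path only visits fitter genotypes, a genotype can reach the maximum exactly when one of its fitter neighbors can. Comparing K = 2 with larger K on the same kind of random landscape is one way to explore the trend the abstract reports.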
Affiliation(s)
- Marcin Zagorski
- Institute of Science and Technology (IST) Austria, Klosterneuburg, Austria
- Institute of Physics, Jagiellonian University, Krakow, Poland
- Zdzislaw Burda
- Faculty of Physics and Applied Computer Science, AGH University of Science and Technology, Krakow, Poland
- Bartlomiej Waclaw
- School of Physics and Astronomy, The University of Edinburgh, Edinburgh, United Kingdom
- Centre for Synthetic and Systems Biology, The University of Edinburgh, Edinburgh, United Kingdom