1
Wagner A. Genotype sampling for deep-learning assisted experimental mapping of a combinatorially complete fitness landscape. Bioinformatics 2024; 40:btae317. [PMID: 38745436 PMCID: PMC11132821 DOI: 10.1093/bioinformatics/btae317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION: Experimental characterization of fitness landscapes, which map genotypes onto fitness, is important for both evolutionary biology and protein engineering. It faces a fundamental obstacle in the astronomical number of genotypes whose fitness needs to be measured for any one protein. Deep learning may help to predict the fitness of many genotypes from a smaller neural network training sample of genotypes with experimentally measured fitness. Here I use a recently published experimentally mapped fitness landscape of more than 260 000 protein genotypes to ask how such sampling is best performed.
RESULTS: I show that multilayer perceptrons, recurrent neural networks, convolutional networks, and transformers can explain more than 90% of fitness variance in the data. In addition, 90% of this performance is reached with a training sample comprising merely ≈10³ sequences. Generalization to unseen test data is best when training data is sampled randomly and uniformly, or sampled to minimize the number of synonymous sequences. In contrast, sampling to maximize sequence diversity or codon usage bias reduces performance substantially. These observations hold for more than one network architecture. Simple sampling strategies may perform best when training deep learning neural networks to map fitness landscapes from experimental data.
AVAILABILITY AND IMPLEMENTATION: The fitness landscape data analyzed here is publicly available as described previously (Papkou et al. 2023). All code used to analyze this landscape is publicly available at https://github.com/andreas-wagner-uzh/fitness_landscape_sampling.
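The random, uniform sampling strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the author's repository. The sequence length of 9 nucleotides (giving 4^9 = 262,144 genotypes, consistent with "more than 260 000") is an assumption, as are the names `all_genotypes` and `uniform_sample`.

```python
import itertools
import random

ALPHABET = "ACGT"
SEQ_LENGTH = 9  # assumption: 4**9 = 262,144 genotypes in the complete landscape


def all_genotypes(length=SEQ_LENGTH, alphabet=ALPHABET):
    """Enumerate every sequence of the given length (combinatorially complete)."""
    return ["".join(p) for p in itertools.product(alphabet, repeat=length)]


def uniform_sample(genotypes, n, seed=0):
    """Draw n distinct genotypes uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(genotypes, n)


genotypes = all_genotypes()
train = uniform_sample(genotypes, 1000)  # roughly the 10^3 training sequences
train_set = set(train)
test = [g for g in genotypes if g not in train_set]  # held-out genotypes
print(len(genotypes), len(train), len(test))
```

Every genotype has the same probability of entering the training set, so the sample carries no deliberate bias toward diverse or synonymous sequences.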
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode,1015 Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, 87501 NM, United States
2
Li J, Amado A, Bank C. Rapid adaptation of recombining populations on tunable fitness landscapes. Mol Ecol 2024; 33:e16900. [PMID: 36855836 DOI: 10.1111/mec.16900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 01/28/2023] [Accepted: 02/01/2023] [Indexed: 03/02/2023]
Abstract
How does standing genetic variation affect polygenic adaptation in recombining populations? Despite a large body of work in quantitative genetics, epistatic and weak additive fitness effects among simultaneously segregating genetic variants are difficult to capture experimentally or to predict theoretically. In this study, we simulated adaptation on fitness landscapes with tunable ruggedness driven by standing genetic variation in recombining populations. We confirmed that recombination hinders the movement of a population through a rugged fitness landscape. When surveying the effect of epistasis on the fixation of alleles, we found that the combined effects of high ruggedness and high recombination probabilities lead to preferential fixation of alleles that had a high initial frequency. This indicates that positive epistatic alleles escape from being broken down by recombination when they start at high frequency. We further extract direct selection coefficients and pairwise epistasis along the adaptive path. When taking the final fixed genotype as the reference genetic background, we observe that, along the adaptive path, beneficial direct selection appears stronger and pairwise epistasis weaker than in the underlying fitness landscape. Quantitatively, the ratio of epistasis and direct selection is smaller along the adaptive path (≈1) than expected. Thus, adaptation on a rugged fitness landscape may lead to spurious signals of direct selection generated through epistasis. Our study highlights how the interplay of epistasis and recombination constrains the adaptation of a diverse population to a new environment.
Affiliation(s)
- Juan Li
- Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Swiss Institute for Bioinformatics, Lausanne, Switzerland
- André Amado
- Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Swiss Institute for Bioinformatics, Lausanne, Switzerland
- Claudia Bank
- Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Swiss Institute for Bioinformatics, Lausanne, Switzerland
3
Weaver DT, King ES, Maltas J, Scott JG. Reinforcement learning informs optimal treatment strategies to limit antibiotic resistance. Proc Natl Acad Sci U S A 2024; 121:e2303165121. [PMID: 38607932 PMCID: PMC11032439 DOI: 10.1073/pnas.2303165121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 02/23/2024] [Indexed: 04/14/2024] Open
Abstract
Antimicrobial resistance was estimated to be associated with 4.95 million deaths worldwide in 2019. It is possible to frame the antimicrobial resistance problem as a feedback-control problem. If we could optimize this feedback-control problem and translate our findings to the clinic, we could slow, prevent, or reverse the development of high-level drug resistance. Prior work on this topic has relied on systems where the exact dynamics and parameters were known a priori. In this study, we extend this work using a reinforcement learning (RL) approach capable of learning effective drug cycling policies in a system defined by empirically measured fitness landscapes. Crucially, we show that it is possible to learn effective drug cycling policies despite the problems of noisy, limited, or delayed measurement. Given access to a panel of 15 β-lactam antibiotics with which to treat the simulated Escherichia coli population, we demonstrate that RL agents outperform two naive treatment paradigms at minimizing the population fitness over time. We also show that RL agents approach the performance of the optimal drug cycling policy. Even when stochastic noise is introduced to the measurements of population fitness, we show that RL agents are capable of maintaining evolving populations at lower growth rates compared to controls. We further tested our approach in arbitrary fitness landscapes of up to 1,024 genotypes. We show that minimization of population fitness using drug cycles is not limited by increasing genome size. Our work represents a proof-of-concept for using AI to control complex evolutionary processes.
Affiliation(s)
- Davis T. Weaver
- Case Western Reserve University School of Medicine, Cleveland, OH 44106
- Translational Hematology Oncology Research, Cleveland Clinic, Cleveland, OH 44106
- Eshan S. King
- Case Western Reserve University School of Medicine, Cleveland, OH 44106
- Translational Hematology Oncology Research, Cleveland Clinic, Cleveland, OH 44106
- Jeff Maltas
- Translational Hematology Oncology Research, Cleveland Clinic, Cleveland, OH 44106
- Jacob G. Scott
- Case Western Reserve University School of Medicine, Cleveland, OH 44106
- Translational Hematology Oncology Research, Cleveland Clinic, Cleveland, OH 44106
- Department of Physics, Case Western Reserve University, Cleveland, OH 44106
4
Sanchez A, Bajic D, Diaz-Colunga J, Skwara A, Vila JCC, Kuehn S. The community-function landscape of microbial consortia. Cell Syst 2023; 14:122-134. [PMID: 36796331 DOI: 10.1016/j.cels.2022.12.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 06/28/2022] [Revised: 10/17/2022] [Accepted: 12/21/2022] [Indexed: 02/17/2023]
Abstract
Quantitatively linking the composition and function of microbial communities is a major aspiration of microbial ecology. Microbial community functions emerge from a complex web of molecular interactions between cells, which give rise to population-level interactions among strains and species. Incorporating this complexity into predictive models is highly challenging. Inspired by a similar problem in genetics of predicting quantitative phenotypes from genotypes, an ecological community-function (or structure-function) landscape could be defined that maps community composition and function. In this piece, we present an overview of our current understanding of these community landscapes, their uses, limitations, and open questions. We argue that exploiting the parallels between both landscapes could bring powerful predictive methodologies from evolution and genetics into ecology, providing a boost to our ability to engineer and optimize microbial consortia.
Affiliation(s)
- Alvaro Sanchez
- Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA; Department of Microbial Biotechnology, CNB-CSIC, Campus de Cantoblanco, Madrid, Spain.
- Djordje Bajic
- Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Juan Diaz-Colunga
- Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Abigail Skwara
- Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Jean C C Vila
- Department of Ecology & Evolutionary Biology & Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Seppe Kuehn
- Center for the Physics of Evolving Systems, The University of Chicago, Chicago, IL, USA; Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA
5
On the sparsity of fitness functions and implications for learning. Proc Natl Acad Sci U S A 2022; 119:2109649118. [PMID: 34937698 PMCID: PMC8740588 DOI: 10.1073/pnas.2109649118] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Accepted: 11/11/2021] [Indexed: 01/05/2023] Open
Abstract
The properties of proteins and other biological molecules are encoded in large part in the sequence of amino acids or nucleotides that defines them. Increasingly, researchers estimate functions that map sequences to a particular property using machine learning and related statistical approaches. However, an important question remains unanswered: How many experimental measurements are needed in order to accurately learn these “fitness” functions? We leverage perspectives from the fields of biophysics, evolutionary biology, and signal processing to develop a theoretical framework that enables us to make progress on answering this question. We demonstrate that this framework can be used to make useful calculations on real-world data and suggest how these calculations may be used to guide experiments.
Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model’s interpretable parameters (sequence length, alphabet size, and assumed interactions between sequence positions) on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.
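The link between NK-style interaction neighborhoods and sparsity in the epistatic representation can be sketched on a toy example. This is an illustration under stated assumptions, not the paper's GNK framework: the alphabet is binary, neighborhoods are cyclic windows of adjacent positions, sub-fitness values are standard normal draws, and the function names are hypothetical.

```python
import itertools
import random


def make_nk_landscape(length, k, seed=0):
    """Return (neighborhoods, fitness) for a binary NK-style landscape.

    Position i interacts with itself and the next k - 1 positions (cyclically).
    """
    rng = random.Random(seed)
    neighborhoods = [
        tuple(sorted((i + j) % length for j in range(k))) for i in range(length)
    ]
    # One random lookup table per position, indexed by the neighborhood's states.
    tables = [
        {s: rng.gauss(0.0, 1.0) for s in itertools.product((0, 1), repeat=k)}
        for _ in range(length)
    ]

    def fitness(genotype):
        return sum(
            tables[i][tuple(genotype[p] for p in neighborhoods[i])]
            for i in range(length)
        )

    return neighborhoods, fitness


def walsh_hadamard(fitness, length):
    """Epistatic (Walsh-Hadamard) coefficient for every subset of positions."""
    genotypes = list(itertools.product((0, 1), repeat=length))
    values = [fitness(g) for g in genotypes]
    coeffs = {}
    for subset in genotypes:  # a 0/1 vector marking the positions in the subset
        total = 0.0
        for g, v in zip(genotypes, values):
            parity = sum(s * x for s, x in zip(subset, g)) % 2
            total += -v if parity else v
        coeffs[subset] = total / len(genotypes)
    return coeffs


LENGTH, K = 6, 2
neighborhoods, fitness = make_nk_landscape(LENGTH, K)
coeffs = walsh_hadamard(fitness, LENGTH)
nonzero = [s for s, c in coeffs.items() if abs(c) > 1e-9]
print(f"{len(nonzero)} of {len(coeffs)} coefficients are nonzero")
```

In this toy setting a coefficient can be nonzero only if its positions all fall inside one neighborhood, so at most 13 of the 64 coefficients survive (the constant term, 6 single positions, and 6 adjacent pairs). Larger neighborhoods admit more nonzero coefficients, which is the sense in which the interaction structure controls sparsity.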
6
Baquero F, Martínez JL, Lanza VF, Rodríguez-Beltrán J, Galán JC, San Millán A, Cantón R, Coque TM. Evolutionary Pathways and Trajectories in Antibiotic Resistance. Clin Microbiol Rev 2021; 34:e0005019. [PMID: 34190572 PMCID: PMC8404696 DOI: 10.1128/cmr.00050-19] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Indexed: 12/15/2022] Open
Abstract
Evolution is the hallmark of life. Descriptions of the evolution of microorganisms have provided a wealth of information, but knowledge regarding "what happened" has precluded a deeper understanding of "how" evolution has proceeded, as in the case of antimicrobial resistance. The difficulty in answering the "how" question lies in the multihierarchical dimensions of evolutionary processes, nested in complex networks, encompassing all units of selection, from genes to communities and ecosystems. At the simplest ontological level (as resistance genes), evolution proceeds by random (mutation and drift) and directional (natural selection) processes; however, sequential pathways of adaptive variation can occasionally be observed, and under fixed circumstances (particular fitness landscapes), evolution is predictable. At the highest level (such as that of plasmids, clones, species, microbiotas), the systems' degrees of freedom increase dramatically, related to the variable dispersal, fragmentation, relatedness, or coalescence of bacterial populations, depending on heterogeneous and changing niches and selective gradients in complex environments. Evolutionary trajectories of antibiotic resistance find their way in these changing landscapes subjected to random variations, becoming highly entropic and therefore unpredictable. However, experimental, phylogenetic, and ecogenetic analyses reveal preferential frequented paths (highways) where antibiotic resistance flows and propagates, allowing some understanding of evolutionary dynamics, modeling and designing interventions. Studies on antibiotic resistance have an applied aspect in improving individual health, One Health, and Global Health, as well as an academic value for understanding evolution. Most importantly, they have a heuristic significance as a model to reduce the negative influence of anthropogenic effects on the environment.
Affiliation(s)
- F. Baquero
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- J. L. Martínez
- National Center for Biotechnology (CNB-CSIC), Madrid, Spain
- V. F. Lanza
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Central Bioinformatics Unit, Ramón y Cajal Institute for Health Research (IRYCIS), Madrid, Spain
- J. Rodríguez-Beltrán
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- J. C. Galán
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- A. San Millán
- National Center for Biotechnology (CNB-CSIC), Madrid, Spain
- R. Cantón
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- T. M. Coque
- Department of Microbiology, Ramón y Cajal University Hospital, Ramón y Cajal Institute for Health Research (IRYCIS), Network Center for Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
7
Castle SD, Grierson CS, Gorochowski TE. Towards an engineering theory of evolution. Nat Commun 2021; 12:3326. [PMID: 34099656 PMCID: PMC8185075 DOI: 10.1038/s41467-021-23573-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 01/05/2021] [Accepted: 05/04/2021] [Indexed: 02/07/2023] Open
Abstract
Biological technologies are fundamentally unlike any other because biology evolves. Bioengineering therefore requires novel design methodologies with evolution at their core. Knowledge about evolution is currently applied to the design of biosystems ad hoc. Unless we have an engineering theory of evolution, we will neither be able to meet evolution's potential as an engineering tool, nor understand or limit its unintended consequences for our biological designs. Here, we propose the evotype as a helpful concept for engineering the evolutionary potential of biosystems, or other self-adaptive technologies, potentially beyond the realm of biology.
Affiliation(s)
- Simeon D Castle
- School of Biological Sciences, University of Bristol, Bristol, UK
- Claire S Grierson
- School of Biological Sciences, University of Bristol, Bristol, UK
- BrisSynBio, University of Bristol, Bristol, UK
- Thomas E Gorochowski
- School of Biological Sciences, University of Bristol, Bristol, UK.
- BrisSynBio, University of Bristol, Bristol, UK.
8
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
- Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain.
- José A Cuesta
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
- Jacobo Aguirre
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
- Sebastian E Ahnert
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Alejandro V Cano
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo Catalán
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
- Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
- Santiago F Elena
- Instituto de Biología Integrativa de Sistemas, I(2)SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
- Bhavin S Khatri
- The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
- Joachim Krug
- Institute for Biological Physics, University of Cologne, Köln, Germany
- Ard A Louis
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
- Nora S Martin
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel Weiß
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
9
Gilman J, Walls L, Bandiera L, Menolascina F. Statistical Design of Experiments for Synthetic Biology. ACS Synth Biol 2021; 10:1-18. [PMID: 33406821 DOI: 10.1021/acssynbio.0c00385] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Indexed: 02/08/2023]
Abstract
The design and optimization of biological systems is an inherently complex undertaking that requires careful balancing of myriad synergistic and antagonistic variables. However, despite this complexity, much synthetic biology research is predicated on One Factor at A Time (OFAT) experimentation; the genetic and environmental variables affecting the activity of a system of interest are sequentially altered while all other variables are held constant. Beyond being time and resource intensive, OFAT experimentation crucially ignores the effect of interactions between factors. Given the ubiquity of interacting genetic and environmental factors in biology this failure to account for interaction effects in OFAT experimentation can result in the development of suboptimal systems. To address these limitations, an increasing number of studies have turned to Design of Experiments (DoE), a suite of methods that enable efficient, systematic exploration and exploitation of complex design spaces. This review provides an overview of DoE for synthetic biologists. Key concepts and commonly used experimental designs are introduced, and we discuss the advantages of DoE as compared to OFAT experimentation. We dissect the applicability of DoE in the context of synthetic biology and review studies which have successfully employed these methods, illustrating the potential of statistical experimental design to guide the design, characterization, and optimization of biological protocols, pathways, and processes.
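The contrast between OFAT and a factorial design drawn in the abstract can be made concrete with a small sketch. The response function, its coefficients, and the two factors `a` and `b` are hypothetical and chosen only so that the interaction term is large; nothing here comes from the review itself.

```python
import itertools


def response(a, b):
    """Hypothetical system output for coded factor levels a, b in {-1, +1}."""
    return 10 + 2 * a + 3 * b + 4 * a * b  # the a*b term is the interaction


# OFAT: start at the low/low baseline and change one factor at a time.
baseline = response(-1, -1)
ofat_gain_a = response(+1, -1) - baseline
ofat_gain_b = response(-1, +1) - baseline
ofat_prediction = baseline + ofat_gain_a + ofat_gain_b  # assumes no interaction

# Full factorial: run every combination of levels (4 runs for 2 factors).
runs = {(a, b): response(a, b) for a, b in itertools.product((-1, +1), repeat=2)}


def effect(contrast):
    """Average response at contrast = +1 minus average at contrast = -1."""
    half = len(runs) / 2
    return sum(contrast(a, b) * y for (a, b), y in runs.items()) / half


main_a = effect(lambda a, b: a)
main_b = effect(lambda a, b: b)
interaction_ab = effect(lambda a, b: a * b)

print("OFAT prediction for high/high:", ofat_prediction)
print("Measured high/high response:  ", runs[(1, 1)])
print("Effects (A, B, AxB):", main_a, main_b, interaction_ab)
```

In this made-up system, each factor looks harmful when raised alone from the baseline, so OFAT predicts a poor high/high result, while the factorial runs show that the high/high combination is the best of the four and that the interaction effect is the largest of the three.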
Affiliation(s)
- James Gilman
- Institute for Bioengineering, School of Engineering, University of Edinburgh, Edinburgh EH8 9YL, U.K
- Laura Walls
- Institute for Bioengineering, School of Engineering, University of Edinburgh, Edinburgh EH8 9YL, U.K
- Lucia Bandiera
- Institute for Bioengineering, School of Engineering, University of Edinburgh, Edinburgh EH8 9YL, U.K
- Filippo Menolascina
- Institute for Bioengineering, School of Engineering, University of Edinburgh, Edinburgh EH8 9YL, U.K
10
Sailer ZR, Shafik SH, Summers RL, Joule A, Patterson-Robert A, Martin RE, Harms MJ. Inferring a complete genotype-phenotype map from a small number of measured phenotypes. PLoS Comput Biol 2020; 16:e1008243. [PMID: 32991585 PMCID: PMC7546491 DOI: 10.1371/journal.pcbi.1008243] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/05/2019] [Revised: 10/09/2020] [Accepted: 08/13/2020] [Indexed: 01/02/2023] Open
Abstract
Understanding evolution requires detailed knowledge of genotype-phenotype maps; however, it can be a herculean task to measure every phenotype in a combinatorial map. We have developed a computational strategy to predict the missing phenotypes from an incomplete, combinatorial genotype-phenotype map. As a test case, we used an incomplete genotype-phenotype dataset previously generated for the malaria parasite’s ‘chloroquine resistance transporter’ (PfCRT). Wild-type PfCRT (PfCRT3D7) lacks significant chloroquine (CQ) transport activity, but the introduction of the eight mutations present in the ‘Dd2’ isoform of PfCRT (PfCRTDd2) enables the protein to transport CQ away from its site of antimalarial action. This gain of a transport function imparts CQ resistance to the parasite. A combinatorial map between PfCRT3D7 and PfCRTDd2 consists of 256 genotypes, of which only 52 have had their CQ transport activities measured through expression in the Xenopus laevis oocyte. We trained a statistical model with these 52 measurements to infer the CQ transport activity for the remaining 204 combinatorial genotypes between PfCRT3D7 and PfCRTDd2. Our best-performing model incorporated a binary classifier, a nonlinear scale, and additive effects for each mutation. The addition of specific pairwise- and high-order-epistatic coefficients decreased the predictive power of the model. We evaluated our predictions by experimentally measuring the CQ transport activities of 24 additional PfCRT genotypes. The R² value between our predicted and newly-measured phenotypes was 0.90. We then used the model to probe the accessibility of evolutionary trajectories through the map. Approximately 1% of the possible trajectories between PfCRT3D7 and PfCRTDd2 are accessible; however, none of the trajectories entailed eight successive increases in CQ transport activity. These results demonstrate that phenotypes can be inferred with known uncertainty from a partial genotype-phenotype dataset. We also validated our approach against a collection of previously published genotype-phenotype maps. The model therefore appears general and should be applicable to a large number of genotype-phenotype maps.
Biological macromolecules are built from chains of building blocks. The function of a macromolecule depends on the specific chemical properties of the building blocks that make it up. Macromolecules evolve through mutations that swap one building block for another. Understanding how biomolecules work and evolve therefore requires knowledge of the effects of mutations. The effects of mutations can be measured experimentally; however, because there are a vast number of possible combinations of mutations, it is often difficult to make enough measurements to understand biomolecular function and evolution. In this paper, we describe a simple method to predict the effects of mutations on biomolecules from a small number of measurements. This method works by appropriately averaging the effects of mutations seen in different contexts. We test the method by predicting the effects of mutations on PfCRT, a macromolecule from the malarial parasite that confers drug resistance. We find that our method is fast and effective. Using a small number of measurements, we were able to gain insight into the evolutionary steps by which this macromolecule conferred drug resistance. To make this method accessible to other researchers, we have released it as an open-source software package: https://gpseer.readthedocs.io.
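The additive component of such a model can be sketched for an eight-site binary map like the one described above (2^8 = 256 genotypes, 52 measured, 204 inferred). This is a minimal illustration, not the gpseer implementation: it omits the binary classifier and nonlinear scale, the phenotypes are synthetic and purely additive, and the variable names are hypothetical.

```python
import itertools

import numpy as np

N_SITES = 8       # eight mutations separate the two isoforms
N_MEASURED = 52   # number of genotypes with a measured phenotype

# All 2**8 = 256 combinatorial genotypes, coded 0 (absent) / 1 (present) per mutation.
genotypes = np.array(list(itertools.product([0, 1], repeat=N_SITES)), dtype=float)

# Synthetic, purely additive phenotypes (an assumption made for illustration only).
rng = np.random.default_rng(0)
true_effects = rng.normal(size=N_SITES)
phenotypes = 1.0 + genotypes @ true_effects

# Pretend only a random subset of genotypes was measured.
measured = rng.choice(len(genotypes), size=N_MEASURED, replace=False)
unmeasured = np.setdiff1d(np.arange(len(genotypes)), measured)


def design(g):
    """Design matrix: an intercept column plus one column per mutation."""
    return np.column_stack([np.ones(len(g)), g])


# Least-squares fit of baseline + per-mutation effects on the measured subset.
coef, *_ = np.linalg.lstsq(design(genotypes[measured]), phenotypes[measured], rcond=None)
predicted = design(genotypes[unmeasured]) @ coef

print("genotypes predicted:", len(unmeasured))
print("max abs error:", float(np.max(np.abs(predicted - phenotypes[unmeasured]))))
```

Because the synthetic data contain no epistasis or noise, the nine fitted parameters recover the unmeasured phenotypes essentially exactly; real measurements would leave residual error, which is where the nonlinear scale and uncertainty estimates mentioned in the abstract come in.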
Affiliation(s)
- Zachary R. Sailer
- Institute for Molecular Biology, University of Oregon, Eugene, OR, United States of America
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, United States of America
- Sarah H. Shafik
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Robert L. Summers
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Alex Joule
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Rowena E. Martin
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Michael J. Harms
- Institute for Molecular Biology, University of Oregon, Eugene, OR, United States of America
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, United States of America
11
Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape. Nat Commun 2020; 11:377. [PMID: 31953427 PMCID: PMC6969152 DOI: 10.1038/s41467-019-14174-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/25/2018] [Accepted: 12/16/2019] [Indexed: 01/08/2023] Open
Abstract
Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized sub-optimally immunized populations. Poliovirus has a higher mutation rate than HIV, yet has been almost eradicated by vaccination while an effective vaccine against HIV does not exist. Here, the authors develop a fitness model for poliovirus viral protein 1 to show that it is subject to stringent evolutionary constraints that limit its ability to avoid vaccine-induced immune responses.
|
12
|
Abstract
Evolvability is the ability of a biological system to produce phenotypic variation that is both heritable and adaptive. It has long been the subject of anecdotal observations and theoretical work. In recent years, however, the molecular causes of evolvability have been an increasing focus of experimental work. Here, we review recent experimental progress in areas as different as the evolution of drug resistance in cancer cells and the rewiring of transcriptional regulation circuits in vertebrates. This research reveals the importance of three major themes: multiple genetic and non-genetic mechanisms to generate phenotypic diversity, robustness in genetic systems, and adaptive landscape topography. We also discuss the mounting evidence that evolvability can evolve and the question of whether it evolves adaptively.
|
13
|
Domingo J, Baeza-Centurion P, Lehner B. The Causes and Consequences of Genetic Interactions (Epistasis). Annu Rev Genomics Hum Genet 2019; 20:433-460. [PMID: 31082279 DOI: 10.1146/annurev-genom-083118-014857] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
The same mutation can have different effects in different individuals. One important reason for this is that the outcome of a mutation can depend on the genetic context in which it occurs. This dependency is known as epistasis. In recent years, there has been a concerted effort to quantify the extent of pairwise and higher-order genetic interactions between mutations through deep mutagenesis of proteins and RNAs. This research has revealed two major components of epistasis: nonspecific genetic interactions caused by nonlinearities in genotype-to-phenotype maps, and specific interactions between particular mutations. Here, we provide an overview of our current understanding of the mechanisms causing epistasis at the molecular level, the consequences of genetic interactions for evolution and genetic prediction, and the applications of epistasis for understanding biology and determining macromolecular structures.
Affiliation(s)
- Júlia Domingo
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Pablo Baeza-Centurion
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Ben Lehner
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
|
14
|
Pokusaeva VO, Usmanova DR, Putintseva EV, Espinar L, Sarkisyan KS, Mishin AS, Bogatyreva NS, Ivankov DN, Akopyan AV, Avvakumov SY, Povolotskaya IS, Filion GJ, Carey LB, Kondrashov FA. An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape. PLoS Genet 2019; 15:e1008079. [PMID: 30969963 PMCID: PMC6476524 DOI: 10.1371/journal.pgen.1008079] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Revised: 04/22/2019] [Accepted: 03/11/2019] [Indexed: 11/18/2022] Open
Abstract
Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and negative effects in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interaction of multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible.
An intuitive understanding of protein evolution dictates that, with the exception of adaptive substitutions, amino acid states should be freely exchangeable between the same gene from different species. However, the extent to which this assertion holds true has not been tested in a controlled experiment. Here, we show that whether an amino acid state can be exchanged between orthologues depends on other amino acid states in the same protein. Furthermore, we show that the mode of interaction of amino acid states is multidimensional. Assuming that amino acid replacements influence the protein in several independent ways substantially improves our ability to predict the effect of an amino acid state in a protein sequence that has not been observed in nature.
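The abstract above distinguishes sign epistasis from reciprocal sign epistasis without defining them. The sketch below uses the common pairwise (double-mutant cycle) definition, which is an assumption here: the paper classifies sites across many genetic backgrounds and may apply different criteria. The function name, the tolerance, and the example fitness values are hypothetical.

```python
def classify_epistasis(f_ab, f_Ab, f_aB, f_AB, tol=1e-9):
    """Classify the interaction between two substitutions, a->A and b->B.

    f_ab is the fitness of the starting genotype, f_Ab and f_aB are the
    single mutants, and f_AB is the double mutant.
    """
    # Effect of a->A measured in each background of the second site.
    dA_in_b = f_Ab - f_ab
    dA_in_B = f_AB - f_aB
    # Effect of b->B measured in each background of the first site.
    dB_in_a = f_aB - f_ab
    dB_in_A = f_AB - f_Ab

    def sign_flips(x, y):
        # True when both effects are non-negligible and have opposite signs.
        return abs(x) > tol and abs(y) > tol and x * y < 0

    flip_A = sign_flips(dA_in_b, dA_in_B)
    flip_B = sign_flips(dB_in_a, dB_in_A)
    if flip_A and flip_B:
        return "reciprocal sign"
    if flip_A or flip_B:
        return "sign"
    # No sign change: check for any deviation from additivity.
    if abs(f_AB - f_Ab - f_aB + f_ab) > tol:
        return "magnitude"
    return "none"


# Hypothetical fitness values for four double-mutant cycles.
print(classify_epistasis(1.0, 2.0, 3.0, 4.0))
print(classify_epistasis(1.0, 2.0, 3.0, 6.0))
print(classify_epistasis(1.0, 2.0, 3.0, 2.5))
print(classify_epistasis(1.0, 0.5, 0.5, 2.0))
```

Under this definition, reciprocal sign epistasis is the case in which each substitution changes the sign of the other's effect, which is the pattern that can make the shortest paths between fit genotypes inaccessible.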
Affiliation(s)
- Dinara R. Usmanova
- Department of Systems Biology, Columbia University, New York, NY, United States of America
- Lorena Espinar
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Karen S. Sarkisyan
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Medical Research Council London Institute of Medical Sciences, Imperial College London, London, United Kingdom
- Natalya S. Bogatyreva
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia
- Dmitry N. Ivankov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia
- Arseniy V. Akopyan
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- Sergey Ya. Avvakumov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- Inna S. Povolotskaya
- Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Moscow, Russia
- Guillaume J. Filion
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Lucas B. Carey
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Center for Quantitative Biology and Peking-Tsinghua Joint Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- * E-mail: (LBC); (FAK)
- Fyodor A. Kondrashov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- * E-mail: (LBC); (FAK)
|
15
|
Vasylenko L, Feldman MW, Papadimitriou C, Livnat A. Sex: The power of randomization. Theor Popul Biol 2019; 129:41-53. [PMID: 30638926 DOI: 10.1016/j.tpb.2018.11.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2018] [Revised: 10/11/2018] [Accepted: 11/01/2018] [Indexed: 10/27/2022]
Abstract
In evolutionary biology, randomness has been perceived as a force that, in and of itself, is capable of inventing: mutation creates new genetic information at random across the genome which leads to phenotypic change, which is then subject to selection. However, in science in general and in computer science in particular, the widespread use of randomness takes a different form. Here, randomization allows for the breaking of pattern, as seen for example in its removal of biases (patterns) by random sampling or random assignment to conditions. Combined with various forms of evaluation, this breaking of pattern becomes an extraordinarily powerful tool, as also seen in many randomized algorithms in computer science. Here we show that this power of randomness is harnessed in nature by sex and recombination. In a finite population, and under the assumption of interactions between genetic variants, sex and recombination allow selection to test how well an allele will perform in a sample of combinations of interacting genetic partners drawn at random from all possible such combinations; consequently, even a small number of tests of genotypes such as takes place in a finite population favors alleles that will most likely perform well in a vast number of yet unrealized genetic combinations. This power of randomization is not manifest in asexual populations.
Affiliation(s)
- Liudmyla Vasylenko
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, 3498838, Israel
- Adi Livnat
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, 3498838, Israel
|
16
|
Fragata I, Blanckaert A, Dias Louro MA, Liberles DA, Bank C. Evolution in the light of fitness landscape theory. Trends Ecol Evol 2019; 34:69-82. [DOI: 10.1016/j.tree.2018.10.009] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2018] [Revised: 10/16/2018] [Accepted: 10/17/2018] [Indexed: 01/28/2023]
|
17
|
Otwinowski J. Biophysical Inference of Epistasis and the Effects of Mutations on Protein Stability and Function. Mol Biol Evol 2018; 35:2345-2354. [PMID: 30085303 PMCID: PMC6188545 DOI: 10.1093/molbev/msy141] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Understanding the relationship between protein sequence, function, and stability is a fundamental problem in biology. The essential function of many proteins that fold into a specific structure is their ability to bind to a ligand, which can be assayed for thousands of mutated variants. However, binding assays do not distinguish whether mutations affect the stability of the binding interface or the overall fold. Here, we introduce a statistical method to infer a detailed energy landscape of how a protein folds and binds to a ligand by combining information from many mutated variants. We fit a thermodynamic model describing the bound, unbound, and unfolded states to high-quality data of protein G domain B1 binding to IgG-Fc. We infer distinct folding and binding energies for each mutation, providing a detailed view of how mutations affect binding and stability across the protein. We accurately infer the folding energy of each variant in physical units, validated by independent data, whereas previous high-throughput methods could only measure indirect changes in stability. While we assume an additive sequence-energy relationship, the binding fraction is epistatic due to its nonlinear relation to energy. Despite having no epistasis in energy, our model explains much of the observed epistasis in binding fraction, with the remaining epistasis identifying conformationally dynamic regions.
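The claim that additive energies can still produce epistasis in the binding fraction can be illustrated with a generic three-state Boltzmann model. This is a minimal sketch and not necessarily the parameterization used in the paper: it assumes energies in units of kT measured relative to the unfolded state, and the function name and all numerical values are hypothetical.

```python
import math


def bound_fraction(e_fold, e_bind):
    """Fraction of molecules in the folded-and-bound state.

    Assumes three states (unfolded, folded-unbound, folded-bound) with
    energies in units of kT relative to the unfolded state; lower energy
    means a more favorable state.
    """
    w_unfolded = 1.0
    w_folded_unbound = math.exp(-e_fold)
    w_bound = math.exp(-(e_fold + e_bind))
    return w_bound / (w_unfolded + w_folded_unbound + w_bound)


# Hypothetical wild-type energies and a destabilizing mutational effect.
e_fold_wt, e_bind_wt = -2.0, -3.0
ddg_fold = 1.5  # each of two mutations raises the folding energy by this much

# Energies add exactly: the double mutant carries twice the single effect.
p_wt = bound_fraction(e_fold_wt, e_bind_wt)
p_single = bound_fraction(e_fold_wt + ddg_fold, e_bind_wt)
p_double = bound_fraction(e_fold_wt + 2 * ddg_fold, e_bind_wt)

# Pairwise epistasis measured on the observed (binding fraction) scale.
epistasis = p_double - 2 * p_single + p_wt
print(p_wt, p_single, p_double, epistasis)
```

With these illustrative values the double mutant binds less than the sum of the single-mutant effects would predict, even though no interaction term exists at the level of energy.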
Affiliation(s)
- Jakub Otwinowski
- Biology Department, University of Pennsylvania, Philadelphia, PA
|
18
|
Abstract
Genotype-phenotype relationships are notoriously complicated. Idiosyncratic interactions between specific combinations of mutations occur and are difficult to predict. Yet it is increasingly clear that many interactions can be understood in terms of global epistasis. That is, mutations may act additively on some underlying, unobserved trait, and this trait is then transformed via a nonlinear function to the observed phenotype as a result of subsequent biophysical and cellular processes. Here we infer the shape of such global epistasis in three proteins, based on published high-throughput mutagenesis data. To do so, we develop a maximum-likelihood inference procedure using a flexible family of monotonic nonlinear functions spanned by an I-spline basis. Our analysis uncovers dramatic nonlinearities in all three proteins; in some proteins a model with global epistasis accounts for virtually all of the measured variation, whereas in others we find substantial local epistasis as well. This method allows us to test hypotheses about the form of global epistasis and to distinguish variance components attributable to global epistasis, local epistasis, and measurement error.
|
19
|
Obolski U, Ram Y, Hadany L. Key issues review: evolution on rugged adaptive landscapes. Rep Prog Phys 2018; 81:012602. [PMID: 29051394 DOI: 10.1088/1361-6633/aa94d4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Adaptive landscapes represent a mapping between genotype and fitness. Rugged adaptive landscapes contain two or more adaptive peaks: allele combinations with higher fitness than any of their neighbors in the genetic space. How do populations evolve on such rugged landscapes? Evolutionary biologists have struggled with this question since it was first introduced in the 1930s by Sewall Wright. Discoveries in the fields of genetics and biochemistry inspired various mathematical models of adaptive landscapes. The development of landscape models led to numerous theoretical studies analyzing evolution on rugged landscapes under different biological conditions. The large body of theoretical work suggests that adaptive landscapes are major determinants of the progress and outcome of evolutionary processes. Recent technological advances in molecular biology and microbiology allow experimenters to measure adaptive values of large sets of allele combinations and construct empirical adaptive landscapes for the first time. Such empirical landscapes have already been generated in bacteria, yeast, viruses, and fungi, and are contributing to new insights about evolution on adaptive landscapes. In this Key Issues Review we will: (i) introduce the concept of adaptive landscapes; (ii) review the major theoretical studies of evolution on rugged landscapes; (iii) review some of the recently obtained empirical adaptive landscapes; (iv) discuss recent mathematical and statistical analyses motivated by empirical adaptive landscapes, as well as provide the reader with instructions and source code to implement simulations of evolution on adaptive landscapes; and (v) discuss possible future directions for this exciting field.
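The definition of an adaptive peak given above (an allele combination fitter than all of its neighbors in genetic space) translates directly into a short search. The sketch below assumes biallelic loci encoded as tuples of 0 and 1 and treats genotypes that differ at one locus as neighbors. It is not the source code that the review provides, and the function names and fitness values are hypothetical.

```python
from itertools import product


def neighbors(genotype):
    """Yield every genotype that differs from `genotype` at exactly one locus."""
    for i in range(len(genotype)):
        yield genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]


def adaptive_peaks(fitness):
    """Return genotypes whose fitness exceeds that of all one-mutant neighbors."""
    return [g for g, f in fitness.items()
            if all(f > fitness[n] for n in neighbors(g))]


# Hypothetical two-locus landscape with reciprocal sign epistasis.
rugged = {(0, 0): 1.0, (1, 0): 0.8, (0, 1): 0.7, (1, 1): 1.2}

# Hypothetical additive three-locus landscape for comparison.
effects = (0.1, 0.2, 0.3)
smooth = {g: 1.0 + sum(a * e for a, e in zip(g, effects))
          for g in product((0, 1), repeat=3)}

rugged_peaks = adaptive_peaks(rugged)
smooth_peaks = adaptive_peaks(smooth)
print(rugged_peaks, smooth_peaks)
```

The first landscape has two peaks and is therefore rugged in the sense used above, while the additive landscape has a single peak at the genotype that carries every beneficial allele.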
|
20
|
Garcia V, Feldman MW. Within-Epitope Interactions Can Bias CTL Escape Estimation in Early HIV Infection. Front Immunol 2017; 8:423. [PMID: 28507544 PMCID: PMC5410659 DOI: 10.3389/fimmu.2017.00423] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2016] [Accepted: 03/27/2017] [Indexed: 01/03/2023] Open
Abstract
As human immunodeficiency virus (HIV) begins to replicate within hosts, immune responses are elicited against it. Escape mutations in viral epitopes—immunogenic peptide parts presented on the surface of infected cells—allow HIV to partially evade these responses, and thus rapidly go to fixation. The faster they go to fixation, i.e., the higher their escape rate, the larger the selective pressure exerted by the immune system is assumed to be. This relation underpins the rationale for using escapes to assess the strength of immune responses. However, escape rate estimates are often obtained by employing an aggregation procedure, where several mutations that affect the same epitope are aggregated into a single, composite epitope mutation. The aggregation procedure thus rests upon the assumption that all within-epitope mutations have indistinguishable effects on immune recognition. In this study, we investigate how violation of this assumption affects escape rate estimates. To this end, we extend a previously developed simulation model of HIV that accounts for mutation, selection, and recombination to include different distributions of fitness effects (DFEs) and inter-mutational genomic distances. We use this discrete time Wright–Fisher based model to simulate early within-host evolution of HIV for DFEs and apply standard estimation methods to infer the escape rates. We then compare true with estimated escape rate values. We also compare escape rate values obtained by applying the aggregation procedure with values estimated without use of that procedure. We find that across the DFEs analyzed, the aggregation procedure alters the detectability of escape mutations: large-effect mutations are overrepresented while small-effect mutations are concealed. The effect of the aggregation procedure is similar to extracting the largest-effect mutation appearing within an epitope. 
Furthermore, the more pronounced the over-exponential decay of the DFEs, the more severely true escape rates are underestimated. We conclude that the aggregation procedure has two main consequences. On the one hand, it leads to a misrepresentation of the DFE of fixed mutations. On the other hand, it conceals within-epitope interactions that may generate irregularities in mutation frequency trajectories that are thus left unexplained.
Affiliation(s)
- Victor Garcia
- Department of Biology, Stanford University, Stanford, CA, USA
|
21
|
|