1
Lipsh-Sokolik R, Fleishman SJ. Addressing epistasis in the design of protein function. Proc Natl Acad Sci U S A 2024; 121:e2314999121. [PMID: 39133844 PMCID: PMC11348311 DOI: 10.1073/pnas.2314999121] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024] Open
Abstract
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
Affiliation(s)
- Rosalie Lipsh-Sokolik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
2
Metzger BPH, Park Y, Starr TN, Thornton JW. Epistasis facilitates functional evolution in an ancient transcription factor. eLife 2024; 12:RP88737. [PMID: 38767330 PMCID: PMC11105156 DOI: 10.7554/elife.88737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024] Open
Abstract
A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest - excluding the vast majority of possible genotypes and evolutionary trajectories - and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
Affiliation(s)
- Brian PH Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
- Yeonwoo Park
  - Program in Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, United States
- Tyler N Starr
  - Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, United States
- Joseph W Thornton
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
  - Department of Human Genetics, University of Chicago, Chicago, United States
3
Zheng T, Zhang C. Engineering strategies and challenges of endolysin as an antibacterial agent against Gram-negative bacteria. Microb Biotechnol 2024; 17:e14465. [PMID: 38593316 PMCID: PMC11003714 DOI: 10.1111/1751-7915.14465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/09/2024] [Accepted: 03/21/2024] [Indexed: 04/11/2024] Open
Abstract
Bacteriophage endolysin is a novel antibacterial agent that has attracted much attention in the prevention and control of drug-resistant bacteria due to its unique mechanism of hydrolysing peptidoglycans. Although endolysin exhibits excellent bactericidal effects on Gram-positive bacteria, the presence of the outer membrane of Gram-negative bacteria makes it difficult to lyse them extracellularly, thus limiting their application field. To enhance the extracellular activity of endolysin and facilitate its crossing through the outer membrane of Gram-negative bacteria, researchers have adopted physical, chemical, and molecular methods. This review summarizes the characterization of endolysin targeting Gram-negative bacteria, strategies for endolysin modification, and the challenges and future of engineering endolysin against Gram-negative bacteria in clinical applications, to promote the application of endolysin in the prevention and control of Gram-negative bacteria.
Affiliation(s)
- Tianyu Zheng
  - Bathurst Future Agri-Tech Institute, Qingdao Agricultural University, Qingdao, China
- Can Zhang
  - College of Veterinary Medicine, Qingdao Agricultural University, Qingdao, China
4
Liu Y, Luo Y, Lu X, Gao H, He R, Zhang X, Zhang X, Li Y. Genotypic-phenotypic landscape computation based on first principle and deep learning. Brief Bioinform 2024; 25:bbae191. [PMID: 38701420 PMCID: PMC11066946 DOI: 10.1093/bib/bbae191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 03/03/2024] [Accepted: 04/10/2024] [Indexed: 05/05/2024] Open
Abstract
The relationship between genotype and fitness is fundamental to evolution, but quantitatively mapping genotypes to fitness has remained challenging. We propose the Phenotypic-Embedding theorem (P-E theorem) that bridges genotype-phenotype through an encoder-decoder deep learning framework. Inspired by this, we proposed a more general first principle for correlating genotype-phenotype, and the P-E theorem provides a computable basis for the application of first principle. As an application example of the P-E theorem, we developed the Co-attention based Transformer model to bridge Genotype and Fitness model, a Transformer-based pre-train foundation model with downstream supervised fine-tuning that can accurately simulate the neutral evolution of viruses and predict immune escape mutations. Accordingly, following the calculation path of the P-E theorem, we accurately obtained the basic reproduction number ($R_0$) of SARS-CoV-2 from first principles, quantitatively linked immune escape to viral fitness and plotted the genotype-fitness landscape. The theoretical system we established provides a general and interpretable method to construct genotype-phenotype landscapes, providing a new paradigm for studying theoretical and computational biology.
Affiliation(s)
- Yuexing Liu
  - Guangzhou Laboratory, Guangzhou, Guangdong Province 510005, China
- Yao Luo
  - National University of Singapore, 21 Lower Kent Ridge Road, 119077, Singapore
- Xin Lu
  - Guangzhou Laboratory, Guangzhou, Guangdong Province 510005, China
- Hao Gao
  - Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200030, China
- Ruikun He
  - Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200030, China
- Xin Zhang
  - Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200030, China
- Xuguang Zhang
  - Mengniu Institute of Nutrition Science, Shanghai 200126, China
- Yixue Li
  - Guangzhou Laboratory, Guangzhou, Guangdong Province 510005, China
  - Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200030, China
  - GZMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Medical University, Guangzhou 511436, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200433, China
  - Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai 200032, China
5
Kolesnik VV, Nurtdinov RF, Oloruntimehin ES, Karabelsky AV, Malogolovkin AS. Optimization strategies and advances in the research and development of AAV-based gene therapy to deliver large transgenes. Clin Transl Med 2024; 14:e1607. [PMID: 38488469 PMCID: PMC10941601 DOI: 10.1002/ctm2.1607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 02/07/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024] Open
Abstract
Adeno-associated virus (AAV)-based therapies are recognized as one of the most potent next-generation treatments for inherited and genetic diseases. However, several biological and technological aspects of AAV vectors remain a critical issue for their widespread clinical application. Among them, the limited capacity of the AAV genome significantly hinders the development of AAV-based gene therapy. In this context, genetically modified transgenes compatible with AAV are opening up new opportunities for unlimited gene therapies for many genetic disorders. Recent advances in de novo protein design and remodelling are paving the way for new, more efficient and targeted gene therapeutics. Using computational and genetic tools, AAV expression cassette and transgenic DNA can be split, miniaturized, shuffled or created from scratch to mediate efficient gene transfer into targeted cells. In this review, we highlight recent advances in AAV-based gene therapy with a focus on its use in translational research. We summarize recent research and development in gene therapy, with an emphasis on large transgenes (>4.8 kb) and optimizing strategies applied by biomedical companies in the research pipeline. We critically discuss the prospects for AAV-based treatment and some emerging challenges. We anticipate that the continued development of novel computational tools will lead to rapid advances in basic gene therapy research and translational studies.
Affiliation(s)
- Valeria V. Kolesnik
  - Martsinovsky Institute of Medical Parasitology, Tropical and Vector-Borne Diseases, Sechenov University, Moscow, Russia
- Ruslan F. Nurtdinov
  - Martsinovsky Institute of Medical Parasitology, Tropical and Vector-Borne Diseases, Sechenov University, Moscow, Russia
- Ezekiel Sola Oloruntimehin
  - Martsinovsky Institute of Medical Parasitology, Tropical and Vector-Borne Diseases, Sechenov University, Moscow, Russia
- Alexander S. Malogolovkin
  - Martsinovsky Institute of Medical Parasitology, Tropical and Vector-Borne Diseases, Sechenov University, Moscow, Russia
  - Center for Translational Medicine, Sirius University of Science and Technology, Sochi, Russia
6
Ding D, Shaw AY, Sinai S, Rollins N, Prywes N, Savage DF, Laub MT, Marks DS. Protein design using structure-based residue preferences. Nat Commun 2024; 15:1639. [PMID: 38388493 PMCID: PMC10884402 DOI: 10.1038/s41467-024-45621-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 01/29/2024] [Indexed: 02/24/2024] Open
Abstract
Recent developments in protein design rely on large neural networks with up to hundreds of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R² ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural contexts around a residue could be sufficient to predict mutation preferences, and developed an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES not only outperforms model-free methods but also performs similarly to complex models for creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.
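The claim that site-wise preferences alone can explain most combinatorial mutation effects can be made concrete with a small sketch. The code below is not CoVES and is not taken from the paper; it only illustrates an additive (no-interaction) baseline and the variance it explains. It assumes a complete, balanced combinatorial dataset, in which case the marginal means used here coincide with the least-squares additive fit. The function name `additive_r2` and all measurement values are hypothetical.

```python
from statistics import mean

def additive_r2(measurements):
    """Fraction of variance explained by a site-wise additive model.

    `measurements` maps genotype tuples (one residue per mutated site) to a
    measured value. Assumes every combination of residues is present once.
    """
    genotypes = list(measurements)
    grand = mean(measurements.values())
    n_sites = len(genotypes[0])

    # Site-wise preference: mean value of all variants carrying a residue
    # at a site, expressed relative to the grand mean.
    pref = {}
    for site in range(n_sites):
        for aa in {g[site] for g in genotypes}:
            vals = [v for g, v in measurements.items() if g[site] == aa]
            pref[(site, aa)] = mean(vals) - grand

    # Additive prediction: grand mean plus the sum of site-wise preferences.
    predicted = {
        g: grand + sum(pref[(i, aa)] for i, aa in enumerate(g))
        for g in genotypes
    }
    ss_res = sum((measurements[g] - predicted[g]) ** 2 for g in genotypes)
    ss_tot = sum((v - grand) ** 2 for v in measurements.values())
    return 1.0 - ss_res / ss_tot

# Hypothetical two-site landscapes: one purely additive, one with an interaction.
additive_data = {("A", "L"): 1.0, ("G", "L"): 0.6, ("A", "V"): 0.7, ("G", "V"): 0.3}
epistatic_data = {("A", "L"): 1.0, ("G", "L"): 0.6, ("A", "V"): 0.7, ("G", "V"): 0.9}

r2_additive = additive_r2(additive_data)
r2_epistatic = additive_r2(epistatic_data)
```

In this toy setting the additive landscape is explained almost perfectly, whereas the landscape with a strong pairwise interaction is explained poorly; real datasets would typically call for a regression on held-out variants rather than marginal means.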
Affiliation(s)
- David Ding
  - Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- Ada Y Shaw
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Sam Sinai
  - Dyno Therapeutics, Watertown, MA, 02472, USA
- Nathan Rollins
  - Seismic Therapeutics, Lab Central, Cambridge, MA, 02142, USA
- Noam Prywes
  - Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- David F Savage
  - Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA, 94720, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
7
Avizemer Z, Martí-Gómez C, Hoch SY, McCandlish DM, Fleishman SJ. Evolutionary paths that link orthogonal pairs of binding proteins. Research Square 2023:rs.3.rs-2836905. [PMID: 37131620 PMCID: PMC10153392 DOI: 10.21203/rs.3.rs-2836905/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Some protein binding pairs exhibit extreme specificities that functionally insulate them from homologs. Such pairs evolve mostly by accumulating single-point mutations, and mutants are selected if their affinity exceeds the threshold required for function [1-4]. Thus, homologous and high-specificity binding pairs bring to light an evolutionary conundrum: how does a new specificity evolve while maintaining the required affinity in each intermediate [5,6]? Until now, a fully functional single-mutation path that connects two orthogonal pairs has only been described where the pairs were mutationally close, thus enabling experimental enumeration of all intermediates [2]. We present an atomistic and graph-theoretical framework for discovering low molecular strain single-mutation paths that connect two extant pairs, enabling enumeration beyond experimental capability. We apply it to two orthogonal bacterial colicin endonuclease-immunity pairs separated by 17 interface mutations [7]. We were not able to find a strain-free and functional path in the sequence space defined by the two extant pairs. But including mutations that bridge amino acids that cannot be exchanged through single-nucleotide mutations led us to a strain-free 19-mutation trajectory that is completely viable in vivo. Our experiments show that the specificity switch is remarkably abrupt, resulting from only one radical mutation on each partner. Furthermore, each of the critical specificity-switch mutations increases fitness, demonstrating that functional divergence could be driven by positive Darwinian selection. These results reveal how even radical functional changes in an epistatic fitness landscape may evolve.
Affiliation(s)
- Ziv Avizemer
  - Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001, Rehovot, Israel
- Carlos Martí-Gómez
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724
- Shlomo Yakir Hoch
  - Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001, Rehovot, Israel
- David M. McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724
- Sarel J. Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001, Rehovot, Israel
8
Charest N, Shen Y, Lai YC, Chen IA, Shea JE. Discovering pathways through ribozyme fitness landscapes using information theoretic quantification of epistasis. RNA (New York, N.Y.) 2023; 29:1644-1657. [PMID: 37580126 PMCID: PMC10578471 DOI: 10.1261/rna.079541.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 07/29/2023] [Indexed: 08/16/2023]
Abstract
The identification of catalytic RNAs is typically achieved through primarily experimental means. However, only a small fraction of sequence space can be analyzed even with high-throughput techniques. Methods to extrapolate from a limited data set to predict additional ribozyme sequences, particularly in a human-interpretable fashion, could be useful both for designing new functional RNAs and for generating greater understanding about a ribozyme fitness landscape. Using information theory, we express the effects of epistasis (i.e., deviations from additivity) on a ribozyme. This representation was incorporated into a simple model of the epistatic fitness landscape, which identified potentially exploitable combinations of mutations. We used this model to theoretically predict mutants of high activity for a self-aminoacylating ribozyme, identifying potentially active triple and quadruple mutants beyond the experimental data set of single and double mutants. The predictions were validated experimentally, with nine out of nine sequences being accurately predicted to have high activity. This set of sequences included mutants that form a previously unknown evolutionary "bridge" between two ribozyme families that share a common motif. Individual steps in the method could be examined, understood, and guided by a human, combining interpretability and performance in a simple model to predict ribozyme sequences by extrapolation.
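The definition of epistasis given in this abstract, a deviation from additivity, can be illustrated with a short sketch. This is the textbook pairwise definition on a log-activity scale, not the information-theoretic quantity developed in the paper; the function name `pairwise_epistasis` and the activity values are hypothetical.

```python
import math

def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of a double mutant from additivity on a log scale.

    All arguments are positive activity (or fitness) values for the wild type,
    the two single mutants, and the double mutant.
    """
    # Additive expectation: the two single-mutant log effects simply add up.
    expected = (math.log(f_a) - math.log(f_wt)) + (math.log(f_b) - math.log(f_wt))
    observed = math.log(f_ab) - math.log(f_wt)
    return observed - expected

# Hypothetical case: each single mutation halves activity, yet the double
# mutant is as active as the wild type (positive, compensatory epistasis).
eps_compensatory = pairwise_epistasis(1.0, 0.5, 0.5, 1.0)

# Hypothetical case: the double mutant matches the additive expectation.
eps_additive = pairwise_epistasis(1.0, 0.5, 0.5, 0.25)
```

A value near zero indicates that the two mutations act independently on this scale; positive and negative values indicate that the double mutant is, respectively, more or less active than the single-mutant effects predict.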
Affiliation(s)
- Nathaniel Charest
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Yuning Shen
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
- Yei-Chen Lai
  - Department of Chemistry, National Chung Hsing University, Taichung City 40227, Taiwan
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California 90095, USA
- Irene A Chen
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California 90095, USA
- Joan-Emma Shea
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
9
Romero-Romero S, Lindner S, Ferruz N. Exploring the Protein Sequence Space with Global Generative Models. Cold Spring Harb Perspect Biol 2023; 15:a041471. [PMID: 37848247 PMCID: PMC10626256 DOI: 10.1101/cshperspect.a041471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2023]
Abstract
Recent advancements in specialized large-scale architectures for training images and language have profoundly impacted the field of computer vision and natural language processing (NLP). Language models, such as the recent ChatGPT and GPT-4, have demonstrated exceptional capabilities in processing, translating, and generating human language. These breakthroughs have also been reflected in protein research, leading to the rapid development of numerous new methods in a short time, with unprecedented performance. Several of these models have been developed with the goal of generating sequences in novel regions of the protein space. In this work, we provide an overview of the use of protein generative models, reviewing (1) language models for the design of novel artificial proteins, (2) works that use non-transformer architectures, and (3) applications in directed evolution approaches.
Affiliation(s)
- Noelia Ferruz
  - Barcelona Institute of Molecular Biology, 08028 Barcelona, Spain
10
Radford F, Rinehart J, Isaacs FJ. Mapping the in vivo fitness landscape of a tethered ribosome. Science Advances 2023; 9:eade8934. [PMID: 37115918 PMCID: PMC10146877 DOI: 10.1126/sciadv.ade8934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Fitness landscapes are models of the sequence space of a genetic element that map how each sequence corresponds to its activity and can be used to guide laboratory evolution. The ribosome is a macromolecular machine that is essential for protein synthesis in all organisms. Because of the prevalence of dominant lethal mutations, a comprehensive fitness landscape of the ribosomal peptidyl transfer center (PTC) has not yet been attained. Here, we develop a method to functionally map an orthogonal tethered ribosome (oRiboT), which permits complete mutagenesis of nucleotides located in the PTC and the resulting epistatic interactions. We found that most nucleotides studied showed flexibility to mutation, and identified epistatic interactions between them, which compensate for deleterious mutations. This work provides a basis for a deeper understanding of ribosome function and malleability and could be used to inform design of engineered ribosomes with applications to synthesize next-generation biomaterials and therapeutics.
Affiliation(s)
- Felix Radford
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Jesse Rinehart
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, CT 06520, USA
- Farren J. Isaacs (corresponding author)
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT 06520, USA
11
Lipsh-Sokolik R, Khersonsky O, Schröder SP, de Boer C, Hoch SY, Davies GJ, Overkleeft HS, Fleishman SJ. Combinatorial assembly and design of enzymes. Science 2023; 379:195-201. [PMID: 36634164 DOI: 10.1126/science.ade9434] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Indexed: 01/13/2023]
Abstract
The design of structurally diverse enzymes is constrained by long-range interactions that are necessary for accurate folding. We introduce an atomistic and machine learning strategy for the combinatorial assembly and design of enzymes (CADENZ) to design fragments that combine with one another to generate diverse, low-energy structures with stable catalytic constellations. We applied CADENZ to endoxylanases and used activity-based protein profiling to recover thousands of structurally diverse enzymes. Functional designs exhibit high active-site preorganization and more stable and compact packing outside the active site. Implementing these lessons into CADENZ led to a 10-fold improved hit rate and more than 10,000 recovered enzymes. This design-test-learn loop can be applied, in principle, to any modular protein family, yielding huge diversity and general lessons on protein design principles.
Affiliation(s)
- R Lipsh-Sokolik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001 Rehovot, Israel
- O Khersonsky
  - Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001 Rehovot, Israel
- S P Schröder
  - Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2300 RA Leiden, Netherlands
- C de Boer
  - Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2300 RA Leiden, Netherlands
- S-Y Hoch
  - Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001 Rehovot, Israel
- G J Davies
  - York Structural Biology Laboratory, Department of Chemistry, The University of York, Heslington, York YO10 5DD, UK
- H S Overkleeft
  - Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2300 RA Leiden, Netherlands
- S J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001 Rehovot, Israel
12
Segredo-Otero E, Sanjuán R. Genetic complementation fosters evolvability in complex fitness landscapes. Sci Rep 2023; 13:662. [PMID: 36635310 PMCID: PMC9837146 DOI: 10.1038/s41598-022-26588-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 12/16/2022] [Indexed: 01/14/2023] Open
Abstract
The ability of natural selection to optimize traits depends on the topology of the genotype-fitness map (fitness landscape). Epistatic interactions produce rugged fitness landscapes, where adaptation is constrained by the presence of low-fitness intermediates. Here, we used simulations to explore how evolvability in rugged fitness landscapes is influenced by genetic complementation, a process whereby different sequence variants mutually compensate for their deleterious mutations. We designed our model inspired by viral populations, in which genetic variants are known to interact frequently through coinfection. Our simulations indicate that genetic complementation enables a more efficient exploration of rugged fitness landscapes. Although this benefit may be undermined by genetic parasites, its overall effect on evolvability remains positive in populations that exhibit strong relatedness between interacting sequences. Similar processes could operate in contexts other than viral coinfection, such as in the evolution of ploidy.
Affiliation(s)
- Ernesto Segredo-Otero
  - Institute for Integrative Systems Biology (I2SysBio), Consejo Superior de Investigaciones Científicas-Universitat de València, C/ Catedrático Agustín Escardino 9, 46980 Paterna, València, Spain
- Rafael Sanjuán
  - Institute for Integrative Systems Biology (I2SysBio), Consejo Superior de Investigaciones Científicas-Universitat de València, C/ Catedrático Agustín Escardino 9, 46980 Paterna, València, Spain
13
Conflicting effects of recombination on the evolvability and robustness in neutrally evolving populations. PLoS Comput Biol 2022; 18:e1010710. [DOI: 10.1371/journal.pcbi.1010710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 12/05/2022] [Accepted: 11/04/2022] [Indexed: 11/22/2022] Open
Abstract
Understanding the benefits and costs of recombination under different scenarios of evolutionary adaptation remains an open problem for theoretical and experimental research. In this study, we focus on finite populations evolving on neutral networks comprising viable and unfit genotypes. We provide a comprehensive overview of the effects of recombination by jointly considering different measures of evolvability and mutational robustness over a broad parameter range, such that many evolutionary regimes are covered. We find that several of these measures vary non-monotonically with the rates of mutation and recombination. Moreover, the presence of unfit genotypes that introduce inhomogeneities in the network of viable states qualitatively alters the effects of recombination. We conclude that conflicting trends induced by recombination can be explained by an emerging trade-off between evolvability on the one hand, and mutational robustness on the other. Finally, we discuss how different implementations of the recombination scheme in theoretical models can affect the observed dependence on recombination rate through a coupling between recombination and genetic drift.
14
Azbukina N, Zharikova A, Ramensky V. Intragenic compensation through the lens of deep mutational scanning. Biophys Rev 2022; 14:1161-1182. [PMID: 36345285 PMCID: PMC9636336 DOI: 10.1007/s12551-022-01005-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Accepted: 09/26/2022] [Indexed: 12/20/2022] Open
Abstract
A significant fraction of mutations in proteins are deleterious and result in adverse consequences for protein function, stability, or interaction with other molecules. Intragenic compensation is a specific case of positive epistasis in which a neutral missense mutation cancels the effect of a deleterious mutation in the same protein. Permissive compensatory mutations facilitate protein evolution, since without them all sequences would be extremely conserved. Understanding compensatory mechanisms is an important scientific challenge at the intersection of protein biophysics and evolution. In human genetics, intragenic compensatory interactions are important since they may result in variable penetrance of pathogenic mutations or fixation of pathogenic human alleles in orthologous proteins from related species. The latter phenomenon complicates computational and clinical inference of an allele's pathogenicity. Deep mutational scanning is a relatively new technique that enables experimental studies of functional effects of thousands of mutations in proteins. We review the important aspects of the field and discuss existing limitations of current datasets. We reviewed ten published DMS datasets with quantified functional effects of single and double mutations and described rates and patterns of intragenic compensation in eight of them.
Affiliation(s)
- Nadezhda Azbukina
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- Anastasia Zharikova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
- Vasily Ramensky
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
|
15
|
Abstract
One core goal of genetics is to systematically understand the mapping between the DNA sequence of an organism (genotype) and its measurable characteristics (phenotype). Understanding this mapping is often challenging because of interactions between mutations, where the result of combining several different mutations can be very different than the sum of their individual effects. Here we provide a statistical framework for modeling complex genetic interactions of this type. The key idea is to ask how fast the effects of mutations change when introducing the same mutation in increasingly distant genetic backgrounds. We then propose a model for phenotypic prediction that takes into account this tendency for the effects of mutations to be more similar in nearby genetic backgrounds. Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype–phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. 
This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype–phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5′ splice sites, for which we also validate our model predictions via additional low-throughput experiments.
|
16
|
Interpretable modeling of genotype-phenotype landscapes with state-of-the-art predictive power. Proc Natl Acad Sci U S A 2022; 119:e2114021119. [PMID: 35733251 PMCID: PMC9245639 DOI: 10.1073/pnas.2114021119] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022] Open
Abstract
Large-scale measurements linking genetic background to biological function have driven a need for models that can incorporate these data for reliable predictions and insight into the underlying biophysical system. Recent modeling efforts, however, prioritize predictive accuracy at the expense of model interpretability. Here, we present LANTERN (landscape interpretable nonparametric model, https://github.com/usnistgov/lantern), a hierarchical Bayesian model that distills genotype-phenotype landscape (GPL) measurements into a low-dimensional feature space that represents the fundamental biological mechanisms of the system while also enabling straightforward, explainable predictions. Across a benchmark of large-scale datasets, LANTERN equals or outperforms all alternative approaches, including deep neural networks. LANTERN furthermore extracts useful insights of the landscape, including its inherent dimensionality, a latent space of additive mutational effects, and metrics of landscape structure. LANTERN facilitates straightforward discovery of fundamental mechanisms in GPLs, while also reliably extrapolating to unexplored regions of genotypic space.
|
17
|
Yang CH, Scarpino SV. A Family of Fitness Landscapes Modeled through Gene Regulatory Networks. ENTROPY (BASEL, SWITZERLAND) 2022; 24:622. [PMID: 35626507 PMCID: PMC9141513 DOI: 10.3390/e24050622] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/02/2021] [Revised: 04/11/2022] [Accepted: 04/26/2022] [Indexed: 02/01/2023]
Abstract
Fitness landscapes are a powerful metaphor for understanding the evolution of biological systems. These landscapes describe how genotypes are connected to each other through mutation and related through fitness. Empirical studies of fitness landscapes have increasingly revealed conserved topographical features across diverse taxa, e.g., the accessibility of genotypes and "ruggedness". As a result, theoretical studies are needed to investigate how evolution proceeds on fitness landscapes with such conserved features. Here, we develop and study a model of evolution on fitness landscapes using the lens of Gene Regulatory Networks (GRNs), where the regulatory products are computed from multiple genes and collectively treated as phenotypes. With the assumption that regulation is a binary process, we prove the existence of empirically observed, topographical features such as accessibility and connectivity. We further show that these results hold across arbitrary fitness functions and that a trade-off between accessibility and ruggedness need not exist. Then, using graph theory and a coarse-graining approach, we deduce a mesoscopic structure underlying GRN fitness landscapes where the information necessary to predict a population's evolutionary trajectory is retained with minimal complexity. Using this coarse-graining, we develop a bottom-up algorithm to construct such mesoscopic backbones, which does not require computing the genotype network and is therefore far more efficient than brute-force approaches. Altogether, this work provides mathematical results of high-dimensional fitness landscapes and a path toward connecting theory to empirical studies.
Affiliation(s)
- Chia-Hung Yang
- Network Science Institute, Northeastern University, Boston, MA 02115, USA
- Samuel V. Scarpino
- Network Science Institute, Northeastern University, Boston, MA 02115, USA
- Physics Department, Northeastern University, Boston, MA 02115, USA
- Roux Institute, Northeastern University, Boston, MA 02115, USA
- Institute for Experiential AI, Northeastern University, Boston, MA 02115, USA
- Santa Fe Institute, Santa Fe, NM 87501, USA
- Vermont Complex Systems Center, University of Vermont, Burlington, VT 05405, USA
|
18
|
The evolution, evolvability and engineering of gene regulatory DNA. Nature 2022; 603:455-463. [PMID: 35264797 DOI: 10.1038/s41586-022-04506-6] [Citation(s) in RCA: 88] [Impact Index Per Article: 44.0] [Received: 02/08/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Abstract
Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness [1-3]. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces [4-6]. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.
|
19
|
Scheele RA, Lindenburg LH, Petek M, Schober M, Dalby KN, Hollfelder F. Droplet-based screening of phosphate transfer catalysis reveals how epistasis shapes MAP kinase interactions with substrates. Nat Commun 2022; 13:844. [PMID: 35149678 PMCID: PMC8837617 DOI: 10.1038/s41467-022-28396-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/07/2021] [Accepted: 01/10/2022] [Indexed: 11/20/2022] Open
Abstract
The combination of ultrahigh-throughput screening and sequencing informs on function and intragenic epistasis within combinatorial protein mutant libraries. Establishing a droplet-based, in vitro compartmentalised approach for robust expression and screening of protein kinase cascades (>10^7 variants/day) allowed us to dissect the intrinsic molecular features of the MKK-ERK signalling pathway, without interference from endogenous cellular components. In a six-residue combinatorial library of the MKK1 docking domain, we identified 29,563 sequence permutations that allow MKK1 to efficiently phosphorylate and activate its downstream target kinase ERK2. A flexibly placed hydrophobic sequence motif emerges which is defined by higher order epistatic interactions between six residues, suggesting synergy that enables high connectivity in the sequence landscape. Through positive epistasis, MKK1 maintains function during mutagenesis, establishing the importance of co-dependent residues in mammalian protein kinase-substrate interactions, and creating a scenario for the evolution of diverse human signalling networks. Here, the authors use a droplet-based screen for phosphate transfer catalysis, testing variants of the human protein kinase MKK1 for its ability to activate its downstream target ERK2. Data reveal a flexible motif in the MKK1 docking domain that promotes efficient activation of ERK2, and suggest epistasis between the residues within that sequence.
Affiliation(s)
- Remkes A Scheele
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Maya Petek
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK; Faculty of Medicine, University of Maribor, SI-2000, Maribor, Slovenia
- Markus Schober
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Kevin N Dalby
- Division of Chemical Biology and Medicinal Chemistry, The University of Texas at Austin, Austin, TX, 78712, USA
- Florian Hollfelder
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK.
|
20
|
Gonzalez Somermeyer L, Fleiss A, Mishin AS, Bozhanova NG, Igolkina AA, Meiler J, Alaball Pujol ME, Putintseva EV, Sarkisyan KS, Kondrashov FA. Heterogeneity of the GFP fitness landscape and data-driven protein design. eLife 2022; 11:75842. [PMID: 35510622 PMCID: PMC9119679 DOI: 10.7554/elife.75842] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/25/2021] [Accepted: 03/25/2022] [Indexed: 11/24/2022] Open
Abstract
Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited due to the scarcity of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design; instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work provides insights for the practical application of fitness landscape heterogeneity in protein engineering.
Affiliation(s)
- Aubin Fleiss
- Synthetic Biology Group, MRC London Institute of Medical Sciences, London, United Kingdom; Institute of Clinical Sciences, Faculty of Medicine and Imperial College Centre for Synthetic Biology, Imperial College London, London, United Kingdom
- Alexander S Mishin
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
- Nina G Bozhanova
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, United States
- Anna A Igolkina
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna BioCenter, Vienna, Austria
- Jens Meiler
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, United States; Institute for Drug Discovery, Medical School, Leipzig University, Leipzig, Germany
- Maria-Elisenda Alaball Pujol
- Synthetic Biology Group, MRC London Institute of Medical Sciences, London, United Kingdom; Institute of Clinical Sciences, Faculty of Medicine and Imperial College Centre for Synthetic Biology, Imperial College London, London, United Kingdom
- Karen S Sarkisyan
- Synthetic Biology Group, MRC London Institute of Medical Sciences, London, United Kingdom; Institute of Clinical Sciences, Faculty of Medicine and Imperial College Centre for Synthetic Biology, Imperial College London, London, United Kingdom; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
- Fyodor A Kondrashov
- Institute of Science and Technology Austria, Klosterneuburg, Austria; Evolutionary and Synthetic Biology Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
|
21
|
Currin A, Parker S, Robinson CJ, Takano E, Scrutton NS, Breitling R. The evolving art of creating genetic diversity: From directed evolution to synthetic biology. Biotechnol Adv 2021; 50:107762. [PMID: 34000294 PMCID: PMC8299547 DOI: 10.1016/j.biotechadv.2021.107762] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/23/2020] [Revised: 04/21/2021] [Accepted: 04/25/2021] [Indexed: 12/31/2022]
Abstract
The ability to engineer biological systems, whether to introduce novel functionality or improved performance, is a cornerstone of biotechnology and synthetic biology. Typically, this requires the generation of genetic diversity to explore variations in phenotype, a process that can be performed at many levels, from single molecule targets (i.e., in directed evolution of enzymes) to whole organisms (e.g., in chassis engineering). Recent advances in DNA synthesis technology and automation have enhanced our ability to create variant libraries with greater control and throughput. This review highlights the latest developments in approaches to create such a hierarchy of diversity from the enzyme level to entire pathways in vitro, with a focus on the creation of combinatorial libraries that are required to navigate a target's vast design space successfully to uncover significant improvements in function.
Affiliation(s)
- Andrew Currin
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom.
- Steven Parker
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- Christopher J Robinson
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- Eriko Takano
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- Nigel S Scrutton
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- Rainer Breitling
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom.
|
22
|
Ecology shapes epistasis in a genotype-phenotype-fitness map for stick insect colour. Nat Ecol Evol 2020; 4:1673-1684. [PMID: 32929238 DOI: 10.1038/s41559-020-01305-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/13/2020] [Accepted: 08/19/2020] [Indexed: 01/06/2023]
Abstract
Genetic interactions such as epistasis are widespread in nature and can shape evolutionary dynamics. Epistasis occurs due to nonlinearity in biological systems, which can arise via cellular processes that convert genotype to phenotype and via selective processes that connect phenotype to fitness. Few studies in nature have connected genotype to phenotype to fitness for multiple potentially interacting genetic variants. Thus, the causes of epistasis in the wild remain poorly understood. Here, we show that epistasis for fitness is an emergent and predictable property of nonlinear selective processes. We do so by measuring the genetic basis of cryptic colouration and survival in a field experiment with stick insects. We find that colouration shows a largely additive genetic basis but with some effects of epistasis that enhance differentiation between colour morphs. In terms of fitness, different combinations of loci affecting colouration confer high survival in one host-plant treatment. Specifically, nonlinear correlational selection for specific combinations of colour traits in this treatment drives the emergence of pairwise and higher-order epistasis for fitness at loci underlying colour. In turn, this results in a rugged fitness landscape for genotypes. In contrast, fitness epistasis was dampened in another treatment, where selection was weaker. Patterns of epistasis that are shaped by ecologically based selection could be common and central to understanding fitness landscapes, the dynamics of evolution and potentially other complex systems.
|
23
|
Kurahashi R, Tanaka SI, Takano K. Highly active enzymes produced by directed evolution with stability-based selection. Enzyme Microb Technol 2020; 140:109626. [DOI: 10.1016/j.enzmictec.2020.109626] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/09/2020] [Revised: 06/12/2020] [Accepted: 06/12/2020] [Indexed: 12/22/2022]
|
24
|
Láruson ÁJ, Yeaman S, Lotterhos KE. The Importance of Genetic Redundancy in Evolution. Trends Ecol Evol 2020; 35:809-822. [DOI: 10.1016/j.tree.2020.04.009] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 12/04/2019] [Revised: 04/21/2020] [Accepted: 04/24/2020] [Indexed: 12/20/2022]
|
25
|
Das SG, Direito SOL, Waclaw B, Allen RJ, Krug J. Predictable properties of fitness landscapes induced by adaptational tradeoffs. eLife 2020; 9:e55155. [PMID: 32423531 PMCID: PMC7297540 DOI: 10.7554/elife.55155] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/14/2020] [Accepted: 05/05/2020] [Indexed: 02/06/2023] Open
Abstract
Fitness effects of mutations depend on environmental parameters. For example, mutations that increase fitness of bacteria at high antibiotic concentration often decrease fitness in the absence of antibiotic, exemplifying a tradeoff between adaptation to environmental extremes. We develop a mathematical model for fitness landscapes generated by such tradeoffs, based on experiments that determine the antibiotic dose-response curves of Escherichia coli strains, and previous observations on antibiotic resistance mutations. Our model generates a succession of landscapes with predictable properties as antibiotic concentration is varied. The landscape is nearly smooth at low and high concentrations, but the tradeoff induces a high ruggedness at intermediate antibiotic concentrations. Despite this high ruggedness, however, all the fitness maxima in the landscapes are evolutionarily accessible from the wild type. This implies that selection for antibiotic resistance in multiple mutational steps is relatively facile despite the complexity of the underlying landscape.
Affiliation(s)
- Suman G Das
- Institute for Biological Physics, University of Cologne, Cologne, Germany
- Susana OL Direito
- School of Physics and Astronomy, University of Edinburgh, Edinburgh, United Kingdom
- Bartlomiej Waclaw
- School of Physics and Astronomy, University of Edinburgh, Edinburgh, United Kingdom
- Rosalind J Allen
- School of Physics and Astronomy, University of Edinburgh, Edinburgh, United Kingdom
- Joachim Krug
- Institute for Biological Physics, University of Cologne, Cologne, Germany
|
26
|
Zhou J, McCandlish DM. Minimum epistasis interpolation for sequence-function relationships. Nat Commun 2020; 11:1782. [PMID: 32286265 PMCID: PMC7156698 DOI: 10.1038/s41467-020-15512-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/02/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open
Abstract
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
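As a rough illustration of the idea in this abstract, the smallest possible case (two biallelic sites, one unmeasured double mutant) can be sketched as follows. This is a toy sketch and not the authors' implementation: the function names are hypothetical, and it assumes that "least epistatic" is formalized as minimizing the squared pairwise epistasis coefficient, which in this two-site case is minimized at zero.

```python
def pairwise_epistasis(f00: float, f10: float, f01: float, f11: float) -> float:
    """Deviation of the double mutant from the sum of the single-mutant effects."""
    return f11 - f10 - f01 + f00


def impute_double_mutant(f00: float, f10: float, f01: float) -> float:
    """Hypothetical helper: the f11 value that minimizes squared pairwise epistasis.

    With a single unknown, (f11 - f10 - f01 + f00)**2 is minimized where the
    epistasis term is zero, i.e. at the purely additive prediction.
    """
    return f10 + f01 - f00


# Illustrative phenotype values (made up for this sketch): wild type and two single mutants.
wild_type, mutant_a, mutant_b = 1.0, 1.5, 0.75
imputed = impute_double_mutant(wild_type, mutant_a, mutant_b)
print(imputed, pairwise_epistasis(wild_type, mutant_a, mutant_b, imputed))
```

In this toy case the imputed value reduces to the additive prediction, consistent with the abstract's statement that the models approach additivity where data are sparse or absent.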
Affiliation(s)
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
|
27
|
Lespinats S, De Clerck O, Colange B, Gorelova V, Grando D, Maréchal E, Van Der Straeten D, Rébeillé F, Bastien O. Phylogeny and Sequence Space: A Combined Approach to Analyze the Evolutionary Trajectories of Homologous Proteins. The Case Study of Aminodeoxychorismate Synthase. Acta Biotheor 2020; 68:139-156. [PMID: 31312977 DOI: 10.1007/s10441-019-09352-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/08/2019] [Accepted: 07/10/2019] [Indexed: 11/27/2022]
Abstract
During the course of evolution, variation of a protein sequence is an ongoing phenomenon, limited, however, by the need to maintain the protein's structural and functional integrity. Deciphering the evolutionary path of a protein is thus of fundamental interest. With the development of new methods to visualize high-dimensional spaces and the improvement of phylogenetic analysis tools, it is possible to study the evolutionary trajectories of proteins in the sequence space. Using the data-driven high-dimensional scaling method, we show that it is possible to predict and represent potential evolutionary trajectories by embedding phylogenetic trees in a 3D projection of the sequence space. With the case of the aminodeoxychorismate synthase, an enzyme involved in folate synthesis, we show that this representation raises interesting questions about the complexity of the evolution of a given biological function, in particular concerning its capacity to explore the sequence space.
Affiliation(s)
- Olivier De Clerck
- Department of Biology, Phycology Research Group, Ghent University, Krijgslaan 281, 9000, Ghent, Belgium
- Benoît Colange
- Univ. Grenoble Alpes, INES, 73375, Le Bourget du Lac, France
- Vera Gorelova
- Department of Biology, Laboratory of Functional Plant Biology, Ghent University, K.L Ledeganckstraat 35, 9000, Ghent, Belgium
- Department of Botany and Plant Biology, Laboratory of Plant Biochemistry and Physiology, University of Geneva, Quai E. Ansermet 30, 1211, Geneva, Switzerland
- Delphine Grando
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
- Eric Maréchal
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
- Dominique Van Der Straeten
- Department of Biology, Laboratory of Functional Plant Biology, Ghent University, K.L Ledeganckstraat 35, 9000, Ghent, Belgium
- Fabrice Rébeillé
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
- Olivier Bastien
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France.
- Laboratoire de Physiologie Cellulaire Végétale, Département Réponse et Dynamique Cellulaire, CEA Grenoble, UMR 5168, CNRS-CEA-INRA-Université J. Fourier, 17 rue des Martyrs, 38054, Grenoble Cedex 09, France.
|
28
|
Evolution Rapidly Optimizes Stability and Aggregation in Lattice Proteins Despite Pervasive Landscape Valleys and Mazes. Genetics 2020; 214:1047-1057. [PMID: 32107278 DOI: 10.1534/genetics.120.302815] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2019] [Accepted: 02/18/2020] [Indexed: 11/18/2022] Open
Abstract
The "fitness" landscapes of genetic sequences are characterized by high dimensionality and "ruggedness" due to sign epistasis. Ascending from low to high fitness on such landscapes can be difficult because adaptive trajectories get stuck at low-fitness local peaks. Compounding matters, recent theoretical arguments have proposed that extremely long, winding adaptive paths may be required to reach even local peaks: a "maze-like" landscape topography. The extent to which peaks and mazes shape the mode and tempo of evolution is poorly understood, due to empirical limitations and the abstractness of many landscape models. We explore the prevalence, scale, and evolutionary consequences of landscape mazes in a biophysically grounded computational model of protein evolution that captures the "frustration" between "stability" and aggregation propensity. Our stability-aggregation landscape exhibits extensive sign epistasis and local peaks galore. Although this frequently obstructs adaptive ascent to high fitness and virtually eliminates reproducibility of evolutionary outcomes, many adaptive paths do successfully complete the ascent from low to high fitness, with hydrophobicity a critical mediator of success. These successful paths exhibit maze-like properties on a global landscape scale, in which taking an indirect path helps to avoid low-fitness local peaks. This delicate balance of "hard but possible" adaptation could occur more broadly in other biological settings where competing interactions and frustration are important.
|
29
|
Allele-specific nonstationarity in evolution of influenza A virus surface proteins. Proc Natl Acad Sci U S A 2019; 116:21104-21112. [PMID: 31578251 DOI: 10.1073/pnas.1904246116] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/19/2022] Open
Abstract
Influenza A virus (IAV) is a major public health problem and a pandemic threat. Its evolution is largely driven by diversifying positive selection so that relative fitness of different amino acid variants changes with time due to changes in herd immunity or genomic context, and novel amino acid variants attain fitness advantage. Here, we hypothesize that diversifying selection also has another manifestation: the fitness associated with a particular amino acid variant should decline with time since its origin, as the herd immunity adapts to it. By tracing the evolution of antigenic sites at IAV surface proteins, we show that an amino acid variant becomes progressively more likely to become replaced by another variant with time since its origin, a phenomenon we call "senescence." Senescence is particularly pronounced at experimentally validated antigenic sites, implying that it is largely driven by host immunity. By contrast, at internal sites, existing variants become more favorable with time, probably due to arising contingent mutations at other epistatically interacting sites. Our findings reveal a previously undescribed facet of adaptive evolution and suggest approaches for prediction of evolutionary dynamics of pathogens.
|
30
|
Currin A, Kwok J, Sadler JC, Bell EL, Swainston N, Ababi M, Day P, Turner NJ, Kell DB. GeneORator: An Effective Strategy for Navigating Protein Sequence Space More Efficiently through Boolean OR-Type DNA Libraries. ACS Synth Biol 2019; 8:1371-1378. [PMID: 31132850 PMCID: PMC7007284 DOI: 10.1021/acssynbio.9b00063] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
Directed evolution requires the creation of genetic diversity and subsequent screening or selection for improved variants. For DNA mutagenesis, conventional site-directed methods implicitly utilize the Boolean AND operator (creating all mutations simultaneously), producing a combinatorial explosion in the number of genetic variants as the number of mutations increases. We introduce GeneORator, a novel strategy for creating DNA libraries based on the Boolean logical OR operator. Here, a single library is divided into many subsets, each containing different combinations of the desired mutations. Consequently, the effect of adding more mutations on the number of genetic combinations is additive (Boolean OR logic) and not exponential (AND logic). We demonstrate this strategy with large-scale mutagenesis studies, using monoamine oxidase-N (Aspergillus niger) as the exemplar target. First, we mutated every residue in the secondary structure-containing regions (276 out of a total 495 amino acids) to screen for improvements in kcat. Second, combinatorial OR-type libraries permitted screening of diverse mutation combinations in the enzyme active site to detect activity toward novel substrates. In both examples, OR-type libraries effectively reduced the number of variants searched up to 10^10-fold, dramatically reducing the screening effort required to discover variants with improved and/or novel activity. Importantly, this approach enables the screening of a greater diversity of mutation combinations, accessing a larger area of a protein's sequence space. OR-type libraries can be applied to any biological engineering objective requiring DNA mutagenesis, and the approach has wide ranging applications in, for example, enzyme engineering, antibody engineering, and synthetic biology.
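The exponential-versus-additive contrast described in this abstract can be illustrated with a small counting sketch. It is a deliberately simplified model, not the GeneORator design procedure: it assumes every targeted site has the same number of candidate variants and that each OR-type library member is mutated at exactly one of the targeted sites. The function names are hypothetical.

```python
def and_library_size(variants_per_site: int, n_sites: int) -> int:
    """AND-type library: every site is varied simultaneously,
    so the number of combinations multiplies across sites."""
    return variants_per_site ** n_sites


def or_library_size(variants_per_site: int, n_sites: int) -> int:
    """OR-type library under the simplifying assumption that each
    member carries a change at exactly one site, so sizes add up."""
    return variants_per_site * n_sites


if __name__ == "__main__":
    for n_sites in (2, 4, 8):
        and_size = and_library_size(20, n_sites)
        or_size = or_library_size(20, n_sites)
        print(f"{n_sites} sites: AND = {and_size}, OR = {or_size}")
```

Under these assumptions, adding one more site multiplies the AND-type count by the number of variants per site but only adds that number to the OR-type count.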
Affiliation(s)
- Andrew Currin
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Jane Kwok
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Joanna C. Sadler
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Elizabeth L. Bell
- Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Neil Swainston
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Maria Ababi
- Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, United Kingdom
- School of Computer Science, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Philip Day
- Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Nicholas J. Turner
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Douglas B. Kell
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
|
31
|
Abstract
The average fitness difference between adjacent sites in a fitness landscape is an important descriptor that impacts in particular the dynamics of selection/mutation processes on the landscape. Of particular interest is its connection to the error threshold phenomenon. We show here that this parameter is intimately tied to the ruggedness through the landscape's amplitude spectrum. For the NK model, a surprisingly simple analytical estimate explains simulation data with high precision.
|
32
|
Pokusaeva VO, Usmanova DR, Putintseva EV, Espinar L, Sarkisyan KS, Mishin AS, Bogatyreva NS, Ivankov DN, Akopyan AV, Avvakumov SY, Povolotskaya IS, Filion GJ, Carey LB, Kondrashov FA. An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape. PLoS Genet 2019; 15:e1008079. [PMID: 30969963 PMCID: PMC6476524 DOI: 10.1371/journal.pgen.1008079] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 08/21/2018] [Revised: 04/22/2019] [Accepted: 03/11/2019] [Indexed: 11/18/2022] Open
Abstract
Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily-relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and negative effect in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interaction of multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible. An intuitive understanding of protein evolution dictates that, with the exception of adaptive substitutions, amino acid states should be freely exchangeable between the same gene from different species. However, the extent to which this assertion holds true has not been tested in a controlled experiment. Here, we show that whether an amino acid state can be exchanged between orthologues depends on other amino acid states in the same protein. Furthermore, we show that the mode of interaction of amino acid states is multidimensional. Assuming that amino acid replacements influence the protein in several independent ways substantially improves our ability to predict the effect of an amino acid state in a protein sequence that has not been observed in nature.
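The distinction between sign epistasis and reciprocal sign epistasis used in this abstract can be made concrete for the simplest case of two sites with two states each. The sketch below is a generic textbook-style classification, not the analysis pipeline of the paper; the function name, the tolerance, and the fitness values in the usage example are all hypothetical.

```python
def classify_epistasis(f00, f10, f01, f11, tol=1e-9):
    """Classify a two-site landscape from four fitness values.

    f00 is the reference genotype, f10 and f01 the single mutants,
    f11 the double mutant. A mutation shows sign epistasis when its
    effect changes sign depending on the state of the other site.
    """
    # Effect of mutation A without and with mutation B present.
    a_flips = (f10 - f00) * (f11 - f01) < 0
    # Effect of mutation B without and with mutation A present.
    b_flips = (f01 - f00) * (f11 - f10) < 0

    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    # No sign change: check whether effects simply add up.
    deviation = (f11 - f00) - ((f10 - f00) + (f01 - f00))
    return "none" if abs(deviation) < tol else "magnitude"


if __name__ == "__main__":
    # Hypothetical fitness values chosen only to exercise each branch.
    print(classify_epistasis(1.0, 0.5, 0.5, 1.5))
    print(classify_epistasis(1.0, 1.5, 1.2, 1.4))
    print(classify_epistasis(1.0, 1.5, 1.25, 1.75))
```

In the reciprocal case both single mutants are less fit than either the reference or the double mutant, which is the configuration that creates a fitness valley between two peaks.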
Affiliation(s)
| | - Dinara R. Usmanova
- Department of Systems Biology, Columbia University, New York, NY, United States of America
| | | | - Lorena Espinar
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Karen S. Sarkisyan
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Medical Research Council London Institute of Medical Sciences, Imperial College London, London, United Kingdom
| | | | - Natalya S. Bogatyreva
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia
| | - Dmitry N. Ivankov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia
| | - Arseniy V. Akopyan
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
| | - Sergey Ya. Avvakumov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
| | - Inna S. Povolotskaya
- Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Moscow, Russia
| | - Guillaume J. Filion
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Lucas B. Carey
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Center for Quantitative Biology and Peking-Tsinghua Joint Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Fyodor A. Kondrashov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
|
33
|
Evolutionary transitions in controls reconcile adaptation with continuity of evolution. Semin Cell Dev Biol 2019; 88:36-45. [DOI: 10.1016/j.semcdb.2018.05.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/09/2017] [Revised: 02/19/2018] [Accepted: 05/15/2018] [Indexed: 12/14/2022]
|
34
|
Fragata I, Blanckaert A, Dias Louro MA, Liberles DA, Bank C. Evolution in the light of fitness landscape theory. Trends Ecol Evol 2019; 34:69-82. [DOI: 10.1016/j.tree.2018.10.009] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 06/07/2018] [Revised: 10/16/2018] [Accepted: 10/17/2018] [Indexed: 01/28/2023]
|
35
|
de Visser JAGM, Elena SF, Fragata I, Matuszewski S. The utility of fitness landscapes and big data for predicting evolution. Heredity (Edinb) 2018; 121:401-405. [PMID: 30127530 PMCID: PMC6180140 DOI: 10.1038/s41437-018-0128-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 06/25/2018] [Revised: 07/13/2018] [Accepted: 07/13/2018] [Indexed: 11/25/2022] Open
Affiliation(s)
| | - Santiago F Elena
- Instituto de Biología Molecular y Celular de Plantas (IBMCP), Consejo Superior de Investigaciones Científicas-Universitat Politècnica de València, València, Spain; Instituto de Biología Integrativa de Sistemas (I2SysBio), Consejo Superior de Investigaciones Científicas-Universitat de València, València, Spain; The Santa Fe Institute, Santa Fe, NM, 87501, USA.
| | - Inês Fragata
- Instituto Gulbenkian de Ciência, Oeiras, Portugal.
|
36
|
Abstract
Genotype-phenotype relationships are notoriously complicated. Idiosyncratic interactions between specific combinations of mutations occur and are difficult to predict. Yet it is increasingly clear that many interactions can be understood in terms of global epistasis. That is, mutations may act additively on some underlying, unobserved trait, and this trait is then transformed via a nonlinear function to the observed phenotype as a result of subsequent biophysical and cellular processes. Here we infer the shape of such global epistasis in three proteins, based on published high-throughput mutagenesis data. To do so, we develop a maximum-likelihood inference procedure using a flexible family of monotonic nonlinear functions spanned by an I-spline basis. Our analysis uncovers dramatic nonlinearities in all three proteins; in some proteins a model with global epistasis accounts for virtually all of the measured variation, whereas in others we find substantial local epistasis as well. This method allows us to test hypotheses about the form of global epistasis and to distinguish variance components attributable to global epistasis, local epistasis, and measurement error.
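The idea of global epistasis summarized in this abstract (additive effects on a latent trait, followed by a nonlinear map to the observed phenotype) can be shown with a toy calculation. The sketch below uses a logistic function purely as a stand-in for the monotonic nonlinearity; the paper itself fits an I-spline family by maximum likelihood, which is not reproduced here. The effect sizes and function names are hypothetical.

```python
import math


def latent_trait(effects, genotype):
    """Sum the effects of the mutations present in the genotype.
    The latent trait is strictly additive by construction."""
    return sum(e for e, present in zip(effects, genotype) if present)


def observed_phenotype(phi):
    """Monotonic nonlinear map from latent trait to phenotype.
    A logistic curve is an illustrative assumption only."""
    return 1.0 / (1.0 + math.exp(-phi))


# Hypothetical additive effects of two mutations on the latent scale.
effects = [2.0, 2.0]

wild_type = observed_phenotype(latent_trait(effects, (0, 0)))
single_a = observed_phenotype(latent_trait(effects, (1, 0)))
single_b = observed_phenotype(latent_trait(effects, (0, 1)))
double = observed_phenotype(latent_trait(effects, (1, 1)))

# What the double mutant would score if effects added on the observed scale.
expected_additive = wild_type + (single_a - wild_type) + (single_b - wild_type)
apparent_epistasis = double - expected_additive

print(f"observed double mutant: {double:.4f}")
print(f"additive expectation:   {expected_additive:.4f}")
print(f"apparent epistasis:     {apparent_epistasis:.4f}")
```

Although the two mutations do not interact on the latent scale, the saturating map makes the double mutant fall short of the additive expectation on the observed scale, which is the signature the abstract calls global epistasis.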
|
37
|
Storz JF. Compensatory mutations and epistasis for protein function. Curr Opin Struct Biol 2018; 50:18-25. [PMID: 29100081 PMCID: PMC5936477 DOI: 10.1016/j.sbi.2017.10.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/06/2017] [Revised: 10/05/2017] [Accepted: 10/12/2017] [Indexed: 01/09/2023]
Abstract
Adaptive protein evolution may be facilitated by neutral amino acid mutations that confer no benefit when they first arise but which potentiate subsequent function-altering mutations via direct or indirect structural mechanisms. Theoretical and empirical results indicate that such compensatory interactions (intramolecular epistasis) can exert a strong influence on trajectories of protein evolution. For this reason, assessing the form and prevalence of intramolecular epistasis and characterizing biophysical mechanisms of compensatory interaction are important research goals at the nexus of structural biology and molecular evolution. Here I review recent insights derived from protein-engineering studies, and I describe an approach for identifying and characterizing mechanisms of epistasis that integrates experimental data on structure-function relationships with analyses of comparative sequence data.
Affiliation(s)
- Jay F Storz
- University of Nebraska, School of Biological Sciences, Lincoln, NE 68588-0114, United States.
|
38
|
Abstract
We show that genetic recombination can be a powerful mechanism for escaping suboptimal peaks. Recent studies of empirical fitness landscapes reveal complex gene interactions and multiple peaks. However, classical work on recombination largely ignores the effect of complex gene interactions. Briefly, we restrict to fitness landscapes where the global peak is difficult to access. If the optimal genotype can be generated by shuffling genes present in the population, then recombination will produce the genotype. If, in addition, recombination is sufficiently rare, then the proportion of the genotype is expected to increase. Specifically, we consider landscapes where shuffling of suboptimal peak genotypes can produce the global peak genotype. The advantage of recombination we identify has no correspondence for 2-locus systems or for smooth landscapes. The effect of recombination indicated is sometimes extreme, also for rare recombination, in the sense that shutting off recombination could result in the organism failing to adapt. A standard question about recombination is whether the mechanism tends to accelerate or decelerate adaptation. However, we argue that extreme effects may be more important than how the majority falls. In a limited sense, our result can be considered a support for Sewall Wright’s view that adaptation sometimes works better in subdivided populations.
Affiliation(s)
- Kristina Crona
- American University, Washington DC, United States of America
|
39
|
Learning epistatic interactions from sequence-activity data to predict enantioselectivity. J Comput Aided Mol Des 2017; 31:1085-1096. [DOI: 10.1007/s10822-017-0090-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/10/2017] [Accepted: 12/04/2017] [Indexed: 10/18/2022]
|
40
|
Affiliation(s)
- Diarmaid Hughes
- Department of Medical Biochemistry and Microbiology, Uppsala University, 751 23 Uppsala, Sweden
| | - Dan I. Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, 751 23 Uppsala, Sweden
|
41
|
Sommer MOA, Munck C, Toft-Kehler RV, Andersson DI. Prediction of antibiotic resistance: time for a new preclinical paradigm? Nat Rev Microbiol 2017; 15:689-696. [DOI: 10.1038/nrmicro.2017.75] [Citation(s) in RCA: 164] [Impact Index Per Article: 23.4] [Indexed: 02/06/2023]
|
42
|
Lagator M, Paixão T, Barton NH, Bollback JP, Guet CC. On the mechanistic nature of epistasis in a canonical cis-regulatory element. eLife 2017; 6. [PMID: 28518057 PMCID: PMC5481185 DOI: 10.7554/elife.25192] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 01/16/2017] [Accepted: 05/17/2017] [Indexed: 01/02/2023] Open
Abstract
Understanding the relation between genotype and phenotype remains a major challenge. The difficulty of predicting individual mutation effects, and particularly the interactions between them, has prevented the development of a comprehensive theory that links genotypic changes to their phenotypic effects. We show that a general thermodynamic framework for gene regulation, based on a biophysical understanding of protein-DNA binding, accurately predicts the sign of epistasis in a canonical cis-regulatory element consisting of overlapping RNA polymerase and repressor binding sites. Sign and magnitude of individual mutation effects are sufficient to predict the sign of epistasis and its environmental dependence. Thus, the thermodynamic model offers the correct null prediction for epistasis between mutations across DNA-binding sites. Our results indicate that a predictive theory for the effects of cis-regulatory mutations is possible from first principles, as long as the essential molecular mechanisms and the constraints these impose on a biological system are accounted for. DOI: http://dx.doi.org/10.7554/eLife.25192.001
Mutations are changes to DNA that provide the raw material upon which evolution can act. Therefore, to understand evolution, we need to know the effects of mutations, and how those mutations interact with each other (a phenomenon referred to as epistasis). So far, few mathematical models allow scientists to predict the effects of mutations, and even fewer are able to predict epistasis. Biological systems are complex and consist of many proteins and other molecules. Genes are the sections of DNA that provide the instructions needed to produce these molecules, and some genes encode proteins that can bind to DNA to control whether other genes are switched on or off. Lagator, Paixão et al. have now used mathematical models and experiments to understand how the environment inside the cells of a bacterium known as E. coli, specifically the amount of particular proteins, affects epistasis. These mathematical models are able to predict interactions between mutations in the most abundant class of DNA-binding sites in proteins. This approach found that the nature of the interaction between mutations can be explained through biophysical laws, combined with the basic knowledge of the logic of how genes regulate each other’s activities. Furthermore, the models allow Lagator, Paixão et al. to predict interactions between mutations in several different environments, such as the presence of a new food source or a toxin, defined by the amounts of relevant DNA-binding proteins in cells. By providing new ways of understanding how genes are regulated in bacteria, and how gene regulation is affected by mutations, these findings contribute to our understanding of how organisms evolve. In addition, this work may help us to build artificial networks of genes that interact with each other to produce a desired response, such as more efficient production of fuel from ethanol or the breakdown of hazardous chemicals. DOI: http://dx.doi.org/10.7554/eLife.25192.002
Affiliation(s)
- Mato Lagator
- Institute of Science and Technology Austria, Klosterneuburg, Austria
| | - Tiago Paixão
- Institute of Science and Technology Austria, Klosterneuburg, Austria
| | - Nicholas H Barton
- Institute of Science and Technology Austria, Klosterneuburg, Austria
| | - Jonathan P Bollback
- Institute of Science and Technology Austria, Klosterneuburg, Austria; Department of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| | - Călin C Guet
- Institute of Science and Technology Austria, Klosterneuburg, Austria
|
43
|
Kumar A, Natarajan C, Moriyama H, Witt CC, Weber RE, Fago A, Storz JF. Stability-Mediated Epistasis Restricts Accessible Mutational Pathways in the Functional Evolution of Avian Hemoglobin. Mol Biol Evol 2017; 34:1240-1251. [PMID: 28201714 PMCID: PMC5400398 DOI: 10.1093/molbev/msx085] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 01/11/2023] Open
Abstract
If the fitness effects of amino acid mutations are conditional on genetic background, then mutations can have different effects depending on the sequential order in which they occur during evolutionary transitions in protein function. A key question concerns the fraction of possible mutational pathways connecting alternative functional states that involve transient reductions in fitness. Here we examine the functional effects of multiple amino acid substitutions that contributed to an evolutionary transition in the oxygenation properties of avian hemoglobin (Hb). The set of causative changes included mutations at intradimer interfaces of the Hb tetramer. Replacements at such sites may be especially likely to have epistatic effects on Hb function since residues at intersubunit interfaces are enmeshed in networks of salt bridges and hydrogen bonds between like and unlike subunits; mutational reconfigurations of these atomic contacts can affect allosteric transitions in quaternary structure and the propensity for tetramer-dimer dissociation. We used ancestral protein resurrection in conjunction with a combinatorial protein engineering approach to synthesize genotypes representing the complete set of mutational intermediates in all possible forward pathways that connect functionally distinct ancestral and descendent genotypes. The experiments revealed that 1/2 of all possible forward pathways included mutational intermediates with aberrant functional properties because particular combinations of mutations promoted tetramer-dimer dissociation. The subset of mutational pathways with unstable intermediates may be selectively inaccessible, representing evolutionary roads not taken. The experimental results also demonstrate how epistasis for particular functional properties of proteins may be mediated indirectly by mutational effects on quaternary structural stability.
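The notion of enumerating all forward pathways and asking which ones pass through a problematic intermediate can be sketched in a few lines. This is an illustration of the counting logic only: the mutation labels and the choice of which intermediate is unstable are hypothetical and do not correspond to the hemoglobin genotypes or measurements in the study.

```python
from itertools import permutations


def forward_pathways(mutations):
    """Yield each ordering of the mutations as the list of cumulative
    genotypes visited, from the first single mutant to the full set."""
    for order in permutations(mutations):
        yield [frozenset(order[:i]) for i in range(1, len(order) + 1)]


def count_blocked(mutations, unstable):
    """Count pathways that visit at least one genotype in `unstable`.
    Returns (number of blocked pathways, total number of pathways)."""
    paths = list(forward_pathways(mutations))
    blocked = sum(any(g in unstable for g in path) for path in paths)
    return blocked, len(paths)


# Hypothetical example: three mutations, one unstable double mutant.
mutations = ("a", "b", "c")
unstable = {frozenset({"a", "b"})}

blocked, total = count_blocked(mutations, unstable)
print(f"{blocked} of {total} forward pathways pass through an unstable intermediate")
```

With n mutations there are n! forward orderings, so a single unstable intermediate can remove several pathways at once because every ordering that reaches that genotype is affected.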
Affiliation(s)
- Amit Kumar
- School of Biological Sciences, University of Nebraska, Lincoln, NE
| | | | - Hideaki Moriyama
- School of Biological Sciences, University of Nebraska, Lincoln, NE
| | - Christopher C. Witt
- Department of Biology, University of New Mexico, Albuquerque, NM
- Museum of Southwestern Biology, University of New Mexico, Albuquerque, NM
| | - Roy E. Weber
- Zoophysiology, Department of Bioscience, Aarhus University, Aarhus, Denmark
| | - Angela Fago
- Zoophysiology, Department of Bioscience, Aarhus University, Aarhus, Denmark
| | - Jay F. Storz
- School of Biological Sciences, University of Nebraska, Lincoln, NE
|
44
|
Chan YH, Venev SV, Zeldovich KB, Matthews CR. Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints. Nat Commun 2017; 8:14614. [PMID: 28262665 PMCID: PMC5343507 DOI: 10.1038/ncomms14614] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 09/06/2016] [Accepted: 01/11/2017] [Indexed: 02/07/2023] Open
Abstract
Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. The TIM barrel fold is an evolutionarily conserved motif found in proteins with a variety of enzymatic functions. Here the authors explore the fitness landscape of the TIM barrel protein IGPS and uncover evolutionary constraints on both sequence and structure, accompanied by long range allosteric interactions.
Affiliation(s)
- Yvonne H Chan
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
| | - Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts 01605, USA
| | - Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts 01605, USA
| | - C Robert Matthews
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
|
45
|
|
46
|
Cheema J, Faraldos JA, O'Maille PE. REVIEW: Epistasis and dominance in the emergence of catalytic function as exemplified by the evolution of plant terpene synthases. Plant Sci 2017; 255:29-38. [PMID: 28131339 DOI: 10.1016/j.plantsci.2016.11.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/17/2016] [Revised: 10/17/2016] [Accepted: 11/12/2016] [Indexed: 06/06/2023]
Abstract
Epistasis, the interaction between mutations and the genetic background, is a pervasive force in evolution that is difficult to predict yet derives from a simple principle - biological systems are interconnected. Therefore, one effect may be intimately linked to another, hence interdependent. Untangling epistatic interactions between and within genes is a vibrant area of research. Deriving a mechanistic understanding of epistasis is a major challenge. Particularly, elucidating how epistasis can attenuate the effects of otherwise dominant mutations that control phenotypes. Using the emergence of terpene cyclization in specialized metabolism as an excellent example, this review describes the process of discovery and interpretation of dominance and epistasis in relation to current efforts. Specifically, we outline experimental approaches to isolating epistatic networks of mutations in protein structure, formally quantifying epistatic interactions, then building biochemical models with chemical mechanisms in efforts to achieve an understanding of the physical basis for epistasis. From these models we describe informed conjectures about past evolutionary events that underlie the emergence, divergence and specialization of terpene synthases to illustrate key principles of the constraining forces of epistasis in enzyme function.
Affiliation(s)
- Jitender Cheema
- John Innes Centre, Computational and Systems Biology, Norwich Research Park, Norwich NR4 7UH, UK.
| | - Juan A Faraldos
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich NR4 7UH, UK.
| | - Paul E O'Maille
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich NR4 7UH, UK; Institute of Food Research, Food & Health Programme, Norwich Research Park, Norwich NR4 7UA, UK.
|
47
|
Spiraling Complexity: A Test of the Snowball Effect in a Computational Model of RNA Folding. Genetics 2016; 206:377-388. [PMID: 28007889 DOI: 10.1534/genetics.116.196030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/19/2016] [Accepted: 03/03/2017] [Indexed: 01/07/2023] Open
Abstract
Genetic incompatibilities can emerge as a byproduct of genetic divergence. According to Dobzhansky and Muller, an allele that fixes in one population may be incompatible with an allele at a different locus in another population when the two alleles are brought together in hybrids. Orr showed that the number of Dobzhansky-Muller incompatibilities (DMIs) should accumulate faster than linearly (i.e., snowball) as two lineages diverge. Several studies have attempted to test the snowball effect using data from natural populations. One limitation of these studies is that they have focused on predictions of the Orr model, but not on its underlying assumptions. Here, we use a computational model of RNA folding to test both predictions and assumptions of the Orr model. Two populations are allowed to evolve in allopatry on a holey fitness landscape. We find that the number of inviable introgressions (an indicator for the number of DMIs) snowballs, but does so more slowly than expected. We show that this pattern is explained, in part, by the fact that DMIs can disappear after they have arisen, contrary to the assumptions of the Orr model. This occurs because DMIs become progressively more complex (i.e., involve alleles at more loci) as a result of later substitutions. We also find that most DMIs involve >2 loci, i.e., they are complex. Reproductive isolation does not snowball because DMIs do not act independently of each other. We conclude that the RNA model supports the central prediction of the Orr model that the number of DMIs snowballs, but challenges other predictions, as well as some of its underlying assumptions.
|
48
|
Bazykin GA. Changing preferences: deformation of single position amino acid fitness landscapes and evolution of proteins. Biol Lett 2016; 11:rsbl.2015.0315. [PMID: 26445980 DOI: 10.1098/rsbl.2015.0315] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 01/20/2023] Open
Abstract
The fitness landscape (the function that relates genotypes to fitness) and its role in directing evolution are a central object of evolutionary biology. However, its huge dimensionality precludes understanding of even the basic aspects of its shape. One way to approach it is to ask a simpler question: what are the properties of a function that assigns fitness to each possible variant at just one particular site (a single position fitness landscape), and how does it change in the course of evolution? Analyses of genomic data from multiple species and multiple individuals within a species have proved beyond reasonable doubt that fitness functions of positions throughout the genome do themselves change with time, thus shaping protein evolution. Here, I will briefly review the literature that addresses these dynamics, focusing on recent genome-scale analyses of fitness functions of amino acid sites, i.e. vectors of fitnesses of 20 individual amino acid variants at a given position of a protein. The set of amino acids that confer high fitness at a particular position changes with time, and the rate of this change is comparable with the rate at which a position evolves, implying that this process plays a major role in evolutionary dynamics. However, the causes of these changes remain largely unclear.
Affiliation(s)
- Georgii A Bazykin
- Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow 127051, Russia; Faculty of Bioengineering and Bioinformatics and Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow 119234, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia
|
49
|
Wu NC, Dai L, Olson CA, Lloyd-Smith JO, Sun R. Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife 2016; 5. [PMID: 27391790 PMCID: PMC4985287 DOI: 10.7554/elife.16965] [Citation(s) in RCA: 128] [Impact Index Per Article: 16.0] [Received: 04/18/2016] [Accepted: 07/07/2016] [Indexed: 12/11/2022]
Abstract
The structure of fitness landscapes is critical for understanding adaptive protein evolution. Previous empirical studies on fitness landscapes were confined to either the neighborhood around the wild type sequence, involving mostly single and double mutants, or a combinatorially complete subgraph involving only two amino acids at each site. In reality, the dimensionality of protein sequence space is higher (20^L) and there may be higher-order interactions among more than two sites. Here we experimentally characterized the fitness landscape of four sites in protein GB1, containing 20^4 = 160,000 variants. We found that while reciprocal sign epistasis blocked many direct paths of adaptation, such evolutionary traps could be circumvented by indirect paths through genotype space involving gain and subsequent loss of mutations. These indirect paths alleviate the constraint on adaptive protein evolution, suggesting that the heretofore neglected dimensions of sequence space may change our views on how proteins evolve. DOI: http://dx.doi.org/10.7554/eLife.16965.001
Proteins can evolve over time by changing their component parts, which are called amino acids. These changes usually happen one at a time and natural selection tends to preserve those changes that make the protein more efficient at its specific tasks, while discarding those that impair the protein’s activity. However the effect of each change depends on the protein as a whole, and so two changes that separately make the protein worse can make it much better if they occur together. This phenomenon is called epistasis and in some cases it can trap proteins in a sub-optimal form and prevent them from improving further. Proteins are made from twenty different kinds of amino acid, and there are millions of different combinations of amino acids that could, in theory, make a protein of a given length.
Studying protein evolution involves making variants of the same protein, each with just a few changes, and comparing how efficient, or “fit”, they are. Previous studies only measured the fitness of a few variants and showed that epistasis could block protein evolution by requiring the protein to lose some fitness before it could improve further. However, new techniques have now made it easier to study protein evolution by testing many more protein variants. Wu, Dai et al. focused on four amino acids in part of a protein called GB1 and tested the efficiency of every possible combination of these four amino acids, a total of 160,000 (20^4) variants. Contrary to expectations, the results suggested that the protein could evolve quickly to maximise fitness despite there being epistasis between the four amino acids. Overcoming epistasis typically involved making a change to one amino acid that paved the way for further changes while avoiding the need to lose fitness. The original change could then be reversed once the epistasis was overcome. The complexity of this solution means it can only be seen by studying a large number of protein variants that represent many alternative sequences of protein changes. Wu, Dai et al. conclude that proteins are able to achieve a higher level of fitness through evolution by exploring a large number of changes. There are many possible changes for each protein and it is this variety that, despite epistasis, allows proteins to become naturally optimised for the tasks that they perform. While the full complexity of protein evolution cannot be explored at the moment, as technology advances it will become possible to study more protein variants. Such advances would therefore hopefully allow researchers to discover even more about the natural mechanisms of protein evolution. DOI: http://dx.doi.org/10.7554/eLife.16965.002
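The idea of an indirect path around reciprocal sign epistasis can be sketched on a toy landscape. The Python example below is a minimal illustration, not the authors' analysis: the two-site genotypes, the three-letter alphabet, all fitness values, and the function names are hypothetical. Each single mutation from "AA" toward "BB" lowers fitness, so no direct uphill path exists, but a third allele "C" opens a longer path on which fitness never decreases.

```python
from collections import deque

# Hypothetical fitness values; genotypes not listed get DEFAULT_FITNESS.
FITNESS = {"AA": 1.0, "AB": 0.5, "BA": 0.5, "BB": 2.0, "CA": 1.2, "CB": 1.5}
DEFAULT_FITNESS = 0.1


def fitness(genotype):
    return FITNESS.get(genotype, DEFAULT_FITNESS)


def neighbors(genotype, alphabet):
    """Yield all genotypes that differ from `genotype` at exactly one site."""
    for i, old in enumerate(genotype):
        for new in alphabet:
            if new != old:
                yield genotype[:i] + new + genotype[i + 1:]


def shortest_uphill_path(start, target, alphabet):
    """Breadth-first search over single mutations that strictly raise fitness."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        for nxt in neighbors(current, alphabet):
            if nxt not in seen and fitness(nxt) > fitness(current):
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # target is not reachable by uphill steps


print(20 ** 4)                                  # size of a four-site landscape
print(shortest_uphill_path("AA", "BB", "AB"))   # two alleles per site
print(shortest_uphill_path("AA", "BB", "ABC"))  # extra allele allowed
```

With only the alleles A and B the search finds no uphill path, whereas allowing C yields a three-step path that gains C at the first site and later loses it, one step longer than the two-mutation direct route.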
Affiliation(s)
- Nicholas C Wu
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, United States
- Lei Dai
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- C Anders Olson
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States
- James O Lloyd-Smith
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Ren Sun
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, United States
|
50
|
Steinberg B, Ostermeier M. Shifting Fitness and Epistatic Landscapes Reflect Trade-offs along an Evolutionary Pathway. J Mol Biol 2016; 428:2730-43. [DOI: 10.1016/j.jmb.2016.04.033] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 02/01/2016] [Revised: 04/18/2016] [Accepted: 04/29/2016] [Indexed: 01/04/2023]
|