1
Westmann CA, Goldbach L, Wagner A. The highly rugged yet navigable regulatory landscape of the bacterial transcription factor TetR. Nat Commun 2024; 15:10745. [PMID: 39737967 DOI: 10.1038/s41467-024-54723-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 11/19/2024] [Indexed: 01/01/2025] Open
Abstract
Transcription factor binding sites (TFBSs) are important sources of evolutionary innovations. Understanding how evolution navigates the sequence space of such sites can be achieved by mapping TFBS adaptive landscapes. In such a landscape, an individual location corresponds to a TFBS bound by a transcription factor. The elevation at that location corresponds to the strength of transcriptional regulation conveyed by the sequence. Here, we develop an in vivo massively parallel reporter assay to map the landscape of bacterial TFBSs. We apply this assay to the TetR repressor, for which few TFBSs are known. We quantify the strength of transcriptional repression for 17,765 TFBSs and show that the resulting landscape is highly rugged, with 2092 peaks. Only a few peaks convey stronger repression than the wild type. Non-additive (epistatic) interactions between mutations are frequent. Despite these hallmarks of ruggedness, most high peaks are evolutionarily accessible. They have large basins of attraction and are reached by around 20% of populations evolving on the landscape. Which high peak is reached during evolution is unpredictable and contingent on the mutational path taken. This in-depth analysis of a prokaryotic gene regulator reveals a landscape that is navigable but much more rugged than the landscapes of eukaryotic regulators.
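The notion of a peak in such a landscape can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the toy landscape, its values, and the helper names `neighbors` and `find_peaks` are hypothetical, and a peak is taken to be a sequence whose value is strictly higher than that of every measured one-mutant neighbor. The criterion used in the paper (for example, how measurement noise is handled) may differ.

```python
from itertools import product

ALPHABET = "ACGT"

def neighbors(seq):
    """Yield all sequences that differ from seq at exactly one position."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def find_peaks(landscape):
    """Return sequences whose value exceeds that of every measured one-mutant neighbor."""
    peaks = []
    for seq, value in landscape.items():
        if all(landscape[n] < value for n in neighbors(seq) if n in landscape):
            peaks.append(seq)
    return sorted(peaks)

# Toy landscape over all 2-nt sequences; the values are invented for illustration.
toy = {"".join(p): 0.0 for p in product(ALPHABET, repeat=2)}
toy["AA"] = 1.0
toy["CC"] = 0.8
toy["AC"] = 0.2
toy["CA"] = 0.1

peaks = find_peaks(toy)
```

In this toy example two sequences qualify as peaks, and they are separated by lower-valued intermediates, which is the kind of structure that makes a landscape rugged.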
Affiliation(s)
- Cauã Antunes Westmann
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
- Leander Goldbach
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, NM, 87501, USA
2
Mackay TFC, Anholt RRH. Pleiotropy, epistasis and the genetic architecture of quantitative traits. Nat Rev Genet 2024; 25:639-657. [PMID: 38565962 PMCID: PMC11330371 DOI: 10.1038/s41576-024-00711-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/14/2024] [Indexed: 04/04/2024]
Abstract
Pleiotropy (whereby one genetic polymorphism affects multiple traits) and epistasis (whereby non-linear interactions between genetic polymorphisms affect the same trait) are fundamental aspects of the genetic architecture of quantitative traits. Recent advances in the ability to characterize the effects of polymorphic variants on molecular and organismal phenotypes in human and model organism populations have revealed the prevalence of pleiotropy and unexpected shared molecular genetic bases among quantitative traits, including diseases. By contrast, epistasis is common between polymorphic loci associated with quantitative traits in model organisms, such that alleles at one locus have different effects in different genetic backgrounds, but is rarely observed for human quantitative traits and common diseases. Here, we review the concepts and recent inferences about pleiotropy and epistasis, and discuss factors that contribute to similarities and differences between the genetic architecture of quantitative traits in model organisms and humans.
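As a minimal sketch of what epistasis means quantitatively, the snippet below computes pairwise epistasis as the deviation of a double mutant from the additive expectation built from the two single mutants. The function name and the trait values are hypothetical, and the additive scale is an assumption; other scales (for example, multiplicative) are also used in practice.

```python
def pairwise_epistasis(w_ref, w_a, w_b, w_ab):
    """Deviation of the double mutant from the additive expectation.

    w_ref: trait value of the reference genotype
    w_a, w_b: trait values of the two single mutants
    w_ab: trait value of the double mutant
    """
    expected_ab = w_ref + (w_a - w_ref) + (w_b - w_ref)
    return w_ab - expected_ab

# Hypothetical trait values chosen for illustration only.
w_ref, w_a, w_b, w_ab = 0.0, 1.0, 2.0, 5.0
eps = pairwise_epistasis(w_ref, w_a, w_b, w_ab)

# Effect of allele a in the reference background versus in the b background.
effect_a_in_ref = w_a - w_ref
effect_a_in_b = w_ab - w_b
```

A non-zero `eps` is equivalent to allele a having a different effect in the two backgrounds, which is the sense in which the abstract describes alleles at one locus having different effects in different genetic backgrounds.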
Affiliation(s)
- Trudy F C Mackay
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Robert R H Anholt
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
3
Hong Z, Shimagaki KS, Barton JP. popDMS infers mutation effects from deep mutational scanning data. Bioinformatics 2024; 40:btae499. [PMID: 39115383 PMCID: PMC11335369 DOI: 10.1093/bioinformatics/btae499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 07/10/2024] [Accepted: 08/06/2024] [Indexed: 08/22/2024] Open
Abstract
SUMMARY: Deep mutational scanning (DMS) experiments provide a powerful method to measure the functional effects of genetic mutations at massive scales. However, the data generated from these experiments can be difficult to analyze, with significant variation between experimental replicates. To overcome this challenge, we developed popDMS, a computational method based on population genetics theory, to infer the functional effects of mutations from DMS data. Through extensive tests, we found that the functional effects of single mutations and epistasis inferred by popDMS are highly consistent across replicates, comparing favorably with existing methods. Our approach is flexible and can be widely applied to DMS data that includes multiple time points, multiple replicates, and different experimental conditions. AVAILABILITY AND IMPLEMENTATION: popDMS is implemented in Python and Julia, and is freely available on GitHub at https://github.com/bartonlab/popDMS.
Affiliation(s)
- Zhenchen Hong
  - Department of Physics and Astronomy, University of California, Riverside, CA 92521, United States
- Kai S Shimagaki
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, PA 15260, United States
- John P Barton
  - Department of Physics and Astronomy, University of California, Riverside, CA 92521, United States
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, PA 15260, United States
  - Department of Physics and Astronomy, University of Pittsburgh, PA 15260, United States
4
Chen SK, Liu J, Van Nynatten A, Tudor-Price BM, Chang BSW. Sampling Strategies for Experimentally Mapping Molecular Fitness Landscapes Using High-Throughput Methods. J Mol Evol 2024:10.1007/s00239-024-10179-8. [PMID: 38886207 DOI: 10.1007/s00239-024-10179-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 05/20/2024] [Indexed: 06/20/2024]
Abstract
Empirical studies of genotype-phenotype-fitness maps of proteins are fundamental to understanding the evolutionary process, in elucidating the space of possible genotypes accessible through mutations in a landscape of phenotypes and fitness effects. Yet, comprehensively mapping molecular fitness landscapes remains challenging since all possible combinations of amino acid substitutions for even a few protein sites are encoded by an enormous genotype space. High-throughput mapping of genotype space can be achieved using large-scale screening experiments known as multiplexed assays of variant effect (MAVEs). However, to accommodate such multi-mutational studies, the size of MAVEs has grown to the point where a priori determination of sampling requirements is needed. To address this problem, we propose calculations and simulation methods to approximate minimum sampling requirements for multi-mutational MAVEs, which we combine with a new library construction protocol to experimentally validate our approximation approaches. Analysis of our simulated data reveals how sampling trajectories differ between simulations of nucleotide versus amino acid variants and among mutagenesis schemes. For this, we show quantitatively that marginal gains in sampling efficiency demand increasingly greater sampling effort when sampling for nucleotide sequences over their encoded amino acid equivalents. We present a new library construction protocol that efficiently maximizes sequence variation, and demonstrate using ultradeep sequencing that the library encodes virtually all possible combinations of mutations within the experimental design. Insights learned from our analyses together with the methodological advances reported herein are immediately applicable toward pooled experimental screens of arbitrary design, enabling further assay upscaling and expanded testing of genotype space.
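To give a rough sense of why sampling requirements grow with library size, the sketch below uses a simple coupon-collector-style approximation. It is not the authors' calculation: it assumes every variant in the library is sampled independently and with equal probability, and the function names and the three-site example are hypothetical.

```python
import math

def expected_coverage(library_size, n_samples):
    """Expected fraction of distinct variants seen after n_samples uniform draws."""
    return 1.0 - (1.0 - 1.0 / library_size) ** n_samples

def samples_for_coverage(library_size, target_fraction):
    """Smallest number of uniform draws whose expected coverage reaches target_fraction."""
    per_draw = math.log1p(-1.0 / library_size)  # log(1 - 1/N)
    return math.ceil(math.log1p(-target_fraction) / per_draw)

# Hypothetical design: three fully randomized codon positions.
nucleotide_variants = 64 ** 3   # all codon combinations
amino_acid_variants = 20 ** 3   # all amino acid combinations

n_nt = samples_for_coverage(nucleotide_variants, 0.95)
n_aa = samples_for_coverage(amino_acid_variants, 0.95)
```

Under these assumptions the required number of draws scales roughly linearly with library size, so covering all nucleotide combinations takes far more sampling than covering their amino acid equivalents. Real libraries are rarely uniform, so such figures are best read as lower bounds.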
Affiliation(s)
- Steven K Chen
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Jing Liu
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Alexander Van Nynatten
  - Department of Biological Science, University of Toronto Scarborough, Toronto, ON, Canada
- Belinda S W Chang
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, ON, Canada
  - Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
5
Wagner A. Genotype sampling for deep-learning assisted experimental mapping of a combinatorially complete fitness landscape. Bioinformatics 2024; 40:btae317. [PMID: 38745436 PMCID: PMC11132821 DOI: 10.1093/bioinformatics/btae317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION: Experimental characterization of fitness landscapes, which map genotypes onto fitness, is important for both evolutionary biology and protein engineering. It faces a fundamental obstacle in the astronomical number of genotypes whose fitness needs to be measured for any one protein. Deep learning may help to predict the fitness of many genotypes from a smaller neural network training sample of genotypes with experimentally measured fitness. Here I use a recently published experimentally mapped fitness landscape of more than 260 000 protein genotypes to ask how such sampling is best performed. RESULTS: I show that multilayer perceptrons, recurrent neural networks, convolutional networks, and transformers can explain more than 90% of fitness variance in the data. In addition, 90% of this performance is reached with a training sample comprising merely ≈10^3 sequences. Generalization to unseen test data is best when training data is sampled randomly and uniformly, or sampled to minimize the number of synonymous sequences. In contrast, sampling to maximize sequence diversity or codon usage bias reduces performance substantially. These observations hold for more than one network architecture. Simple sampling strategies may perform best when training deep learning neural networks to map fitness landscapes from experimental data. AVAILABILITY AND IMPLEMENTATION: The fitness landscape data analyzed here is publicly available as described previously (Papkou et al. 2023). All code used to analyze this landscape is publicly available at https://github.com/andreas-wagner-uzh/fitness_landscape_sampling.
Affiliation(s)
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, 87501 NM, United States
6
Lou J, Liang W, Cao L, Hu I, Zhao S, Chen Z, Chan RWY, Cheung PPH, Zheng H, Liu C, Li Q, Chong MKC, Zhang Y, Yeoh EK, Chan PKS, Zee BCY, Mok CKP, Wang MH. Predictive evolutionary modelling for influenza virus by site-based dynamics of mutations. Nat Commun 2024; 15:2546. [PMID: 38514647 PMCID: PMC10958014 DOI: 10.1038/s41467-024-46918-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 03/12/2024] [Indexed: 03/23/2024] Open
Abstract
Influenza virus continuously evolves to escape human adaptive immunity and generates seasonal epidemics. Therefore, influenza vaccine strains need to be updated annually for the upcoming flu season to ensure vaccine effectiveness. We develop a computational approach, beth-1, to forecast virus evolution and select representative virus for influenza vaccine. The method involves modelling site-wise mutation fitness. Informed by virus genome and population sero-positivity, we calibrate transition time of mutations and project the fitness landscape to future time, based on which beth-1 selects the optimal vaccine strain. In season-to-season prediction in historical data for the influenza A pH1N1 and H3N2 viruses, beth-1 demonstrates superior genetic matching compared to existing approaches. In prospective validations, the model shows superior or non-inferior genetic matching and neutralization against circulating virus in mice immunization experiments compared to the current vaccine. The method offers a promising and ready-to-use tool to facilitate vaccine strain selection for the influenza virus through capturing heterogeneous evolutionary dynamics over genome space-time and linking molecular variants to population immune response.
Affiliation(s)
- Jingzhi Lou
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - Beth Bioinformatics Co. Ltd, Hong Kong SAR, China
- Weiwen Liang
  - HKU-Pasteur Research Pole, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Lirong Cao
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - CUHK Shenzhen Research Institute, Shenzhen, China
- Inchi Hu
  - Department of Statistics, George Mason University, Fairfax, VA, USA
- Shi Zhao
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - School of Public Health, Tianjin Medical University, Tianjin, China
- Zigui Chen
  - Department of Microbiology, CUHK, Hong Kong SAR, China
- Renee Wan Yi Chan
  - Department of Paediatrics, CUHK, Hong Kong SAR, China
  - Hong Kong Hub of Paediatric Excellence, CUHK, Hong Kong SAR, China
- Hong Zheng
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
- Caiqi Liu
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
- Qi Li
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
- Marc Ka Chun Chong
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - CUHK Shenzhen Research Institute, Shenzhen, China
- Yexian Zhang
  - Beth Bioinformatics Co. Ltd, Hong Kong SAR, China
  - CUHK Shenzhen Research Institute, Shenzhen, China
- Eng-Kiong Yeoh
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - Centre for Health Systems and Policy Research, CUHK, Hong Kong SAR, China
- Paul Kay-Sheung Chan
  - Department of Microbiology, CUHK, Hong Kong SAR, China
  - Stanley Ho Centre for Emerging Infectious Diseases, CUHK, Hong Kong SAR, China
- Benny Chung Ying Zee
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - CUHK Shenzhen Research Institute, Shenzhen, China
- Chris Ka Pun Mok
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, Faculty of Medicine, CUHK, Hong Kong SAR, China
- Maggie Haitian Wang
  - JC School of Public Health and Primary Care (JCSPHPC), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - CUHK Shenzhen Research Institute, Shenzhen, China
7
Sumi S, Hamada M, Saito H. Deep generative design of RNA family sequences. Nat Methods 2024; 21:435-443. [PMID: 38238559 DOI: 10.1038/s41592-023-02148-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 12/07/2023] [Indexed: 03/13/2024]
Abstract
RNA engineering has immense potential to drive innovation in biotechnology and medicine. Despite its importance, a versatile platform for the automated design of functional RNA is still lacking. Here, we propose RNA family sequence generator (RfamGen), a deep generative model that designs RNA family sequences in a data-efficient manner by explicitly incorporating alignment and consensus secondary structure information. RfamGen can generate novel and functional RNA family sequences by sampling points from a semantically rich and continuous representation. We have experimentally demonstrated the versatility of RfamGen using diverse RNA families. Furthermore, we confirmed the high success rate of RfamGen in designing functional ribozymes through a quantitative massively parallel assay. Notably, RfamGen successfully generates artificial sequences with higher activity than natural sequences. Overall, RfamGen significantly improves our ability to design functional RNA and opens up new potential for generative RNA engineering in synthetic biology.
Affiliation(s)
- Shunsuke Sumi
  - Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Michiaki Hamada
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo, Japan
- Hirohide Saito
  - Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
8
Hong Z, Barton JP. popDMS infers mutation effects from deep mutational scanning data. bioRxiv 2024:2024.01.29.577759. [PMID: 38352383 PMCID: PMC10862717 DOI: 10.1101/2024.01.29.577759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
Deep mutational scanning (DMS) experiments provide a powerful method to measure the functional effects of genetic mutations at massive scales. However, the data generated from these experiments can be difficult to analyze, with significant variation between experimental replicates. To overcome this challenge, we developed popDMS, a computational method based on population genetics theory, to infer the functional effects of mutations from DMS data. Through extensive tests, we found that the functional effects of single mutations and epistasis inferred by popDMS are highly consistent across replicates, comparing favorably with existing methods. Our approach is flexible and can be widely applied to DMS data that includes multiple time points, multiple replicates, and different experimental conditions.
Affiliation(s)
- Zhenchen Hong
  - Department of Physics and Astronomy, University of California, Riverside, USA
- John P. Barton
  - Department of Physics and Astronomy, University of California, Riverside, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, USA
  - Department of Physics and Astronomy, University of Pittsburgh, USA
9
McRae EKS, Wan CJK, Kristoffersen EL, Hansen K, Gianni E, Gallego I, Curran JF, Attwater J, Holliger P, Andersen ES. Cryo-EM structure and functional landscape of an RNA polymerase ribozyme. Proc Natl Acad Sci U S A 2024; 121:e2313332121. [PMID: 38207080 PMCID: PMC10801858 DOI: 10.1073/pnas.2313332121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 11/29/2023] [Indexed: 01/13/2024] Open
Abstract
The emergence of an RNA replicase capable of self-replication is considered an important stage in the origin of life. RNA polymerase ribozymes (PR) - including a variant that uses trinucleotide triphosphates (triplets) as substrates - have been created by in vitro evolution and are the closest functional analogues of the replicase, but the structural basis for their function is poorly understood. Here we use single-particle cryogenic electron microscopy (cryo-EM) and high-throughput mutation analysis to obtain the structure of a triplet polymerase ribozyme (TPR) apoenzyme and map its functional landscape. The cryo-EM structure at 5-Å resolution reveals the TPR as an RNA heterodimer comprising a catalytic subunit and a noncatalytic, auxiliary subunit, resembling the shape of a left hand with thumb and fingers at a 70° angle. The two subunits are connected by two distinct kissing-loop (KL) interactions that are essential for polymerase function. Our combined structural and functional data suggest a model for templated RNA synthesis by the TPR holoenzyme, whereby heterodimer formation and KL interactions preorganize the TPR for optimal primer-template duplex binding, triplet substrate discrimination, and templated RNA synthesis. These results provide a better understanding of TPR structure and function and should aid the engineering of more efficient PRs.
Affiliation(s)
- Ewan K. S. McRae
  - Interdisciplinary Nanoscience Center, Department of Molecular Biology and Genetics, Aarhus University, Aarhus 8000, Denmark
  - Division of Protein and Nucleic Acid Chemistry, Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Christopher J. K. Wan
  - Division of Protein and Nucleic Acid Chemistry, Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Emil L. Kristoffersen
  - Interdisciplinary Nanoscience Center, Department of Molecular Biology and Genetics, Aarhus University, Aarhus 8000, Denmark
  - Division of Protein and Nucleic Acid Chemistry, Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Kalinka Hansen
  - Interdisciplinary Nanoscience Center, Department of Molecular Biology and Genetics, Aarhus University, Aarhus 8000, Denmark
- Edoardo Gianni
  - Division of Protein and Nucleic Acid Chemistry, Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Isaac Gallego
  - Division of Protein and Nucleic Acid Chemistry, Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Joseph F. Curran
  - Division of Protein and Nucleic Acid Chemistry, Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- James Attwater
  - Division of Protein and Nucleic Acid Chemistry, Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Philipp Holliger
  - Division of Protein and Nucleic Acid Chemistry, Medical Research Council, Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Ebbe S. Andersen
  - Interdisciplinary Nanoscience Center, Department of Molecular Biology and Genetics, Aarhus University, Aarhus 8000, Denmark
10
Papkou A, Garcia-Pastor L, Escudero JA, Wagner A. A rugged yet easily navigable fitness landscape. Science 2023; 382:eadh3860. [PMID: 37995212 DOI: 10.1126/science.adh3860] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/01/2023] [Accepted: 09/29/2023] [Indexed: 11/25/2023]
Abstract
Fitness landscape theory predicts that rugged landscapes with multiple peaks impair Darwinian evolution, but experimental evidence is limited. In this study, we used genome editing to map the fitness of >260,000 genotypes of the key metabolic enzyme dihydrofolate reductase in the presence of the antibiotic trimethoprim, which targets this enzyme. The resulting landscape is highly rugged and harbors 514 fitness peaks. However, its highest peaks are accessible to evolving populations via abundant fitness-increasing paths. Different peaks share large basins of attraction that render the outcome of adaptive evolution highly contingent on chance events. Our work shows that ruggedness need not be an obstacle to Darwinian evolution but can reduce its predictability. If true in general, the complexity of optimization problems on realistic landscapes may require reappraisal.
Affiliation(s)
- Andrei Papkou
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Lucia Garcia-Pastor
  - Departamento de Sanidad Animal and VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
- José Antonio Escudero
  - Departamento de Sanidad Animal and VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, NM, USA
11
Martin NS, Ahnert SE. The Boltzmann distributions of molecular structures predict likely changes through random mutations. Biophys J 2023; 122:4467-4475. [PMID: 37897043 PMCID: PMC10698324 DOI: 10.1016/j.bpj.2023.10.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/19/2023] [Accepted: 10/20/2023] [Indexed: 10/29/2023] Open
Abstract
New folded molecular structures can only evolve after arising through mutations. This aspect is modeled using genotype-phenotype maps, which connect sequence changes through mutations to changes in molecular structures. Previous work has shown that the likelihood of appearing through mutations can differ by orders of magnitude from structure to structure and that this can affect the outcomes of evolutionary processes. Thus, we focus on the phenotypic mutation probabilities φ_qp, i.e., the likelihood that a random mutation changes structure p into structure q. For both RNA secondary structures and the HP protein model, we show that a simple biophysical principle can explain and predict how this likelihood depends on the new structure q: φ_qp is high if sequences that fold into p as the minimum-free-energy structure are likely to have q as an alternative structure with high Boltzmann frequency. This generalizes the existing concept of plastogenetic congruence from individual sequences to the entire neutral spaces of structures. Our result helps us understand why some structural changes are more likely than others, may be useful for estimating these likelihoods via sampling and makes a connection to alternative structures with high Boltzmann frequency, which could be relevant in evolutionary processes.
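For readers unfamiliar with the term, the Boltzmann frequency of a structure can be sketched as follows. This is a generic illustration rather than the authors' code: the free energies are hypothetical, the structure labels p, q and r are placeholders, and the thermal energy assumes 37 °C with energies in kcal/mol.

```python
import math

# Thermal energy at 37 °C in kcal/mol (gas constant times absolute temperature).
KT = 0.0019872 * 310.15

def boltzmann_frequencies(energies):
    """Map each structure to its Boltzmann frequency, given free energies in kcal/mol."""
    e_min = min(energies.values())
    # Shift by the minimum energy for numerical stability; the ratios are unchanged.
    weights = {s: math.exp(-(e - e_min) / KT) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

# Hypothetical free energies of three candidate structures of a single sequence.
energies = {"p": -5.0, "q": -4.5, "r": -1.0}
freqs = boltzmann_frequencies(energies)
```

Here p is the minimum-free-energy structure, while q is an alternative structure that still carries a substantial share of the ensemble. In the abstract's terms, such a q would be a likely outcome of a random mutation of sequences folding into p.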
Affiliation(s)
- Nora S Martin
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, United Kingdom
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, United Kingdom
  - Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
- Sebastian E Ahnert
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
  - The Alan Turing Institute, London, United Kingdom
12
Derbel H, Zhao Z, Liu Q. Accurate prediction of functional effect of single amino acid variants with deep learning. Comput Struct Biotechnol J 2023; 21:5776-5784. [PMID: 38074467 PMCID: PMC10709104 DOI: 10.1016/j.csbj.2023.11.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 02/12/2024] Open
Abstract
The assessment of functional effect of amino acid variants is a critical biological problem in proteomics for clinical medicine and protein engineering. Although natively occurring variants offer insights into deleterious variants, high-throughput deep mutational experiments enable comprehensive investigation of amino acid variants for a given protein. However, these mutational experiments are too expensive to dissect millions of variants on thousands of proteins. Thus, computational approaches have been proposed, but they heavily rely on hand-crafted evolutionary conservation, limiting their accuracy. Recent advancement in transformers provides a promising solution to precisely estimate the functional effects of protein variants on high-throughput experimental data. Here, we introduce a novel deep learning model, namely Rep2Mut-V2, which leverages learned representation from transformer models. Rep2Mut-V2 significantly enhances the prediction accuracy for 27 types of measurements of functional effects of protein variants. In the evaluation of 38 protein datasets with 118,933 single amino acid variants, Rep2Mut-V2 achieved an average Spearman's correlation coefficient of 0.7. This surpasses the performance of six state-of-the-art methods, including the recently released methods ESM, DeepSequence and EVE. Even with limited training data, Rep2Mut-V2 outperforms ESM and DeepSequence, showing its potential to extend high-throughput experimental analysis for more protein variants to reduce experimental cost. In conclusion, Rep2Mut-V2 provides accurate predictions of the functional effects of single amino acid variants of protein coding sequences. This tool can significantly aid in the interpretation of variants in human disease studies.
Affiliation(s)
- Houssemeddine Derbel
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Zhongming Zhao
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Qian Liu
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
  - School of Life Sciences, College of Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
13
|
Lobinska G, Pilpel Y, Ram Y. Phenotype switching of the mutation rate facilitates adaptive evolution. Genetics 2023; 225:iyad111. [PMID: 37293818 PMCID: PMC10471227 DOI: 10.1093/genetics/iyad111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 02/05/2023] [Accepted: 05/25/2023] [Indexed: 06/10/2023] Open
Abstract
The mutation rate plays an important role in adaptive evolution. It can be modified by mutator and anti-mutator alleles. Recent empirical evidence hints that the mutation rate may vary among genetically identical individuals: evidence from bacteria suggests that the mutation rate can be affected by expression noise of a DNA repair protein and potentially also by translation errors in various proteins. Importantly, this non-genetic variation may be heritable via a transgenerational epigenetic mode of inheritance, giving rise to a mutator phenotype that is independent from mutator alleles. Here, we investigate mathematically how the rate of adaptive evolution is affected by the rate of mutation rate phenotype switching. We model an asexual population with two mutation rate phenotypes, non-mutator and mutator. An offspring may switch from its parental phenotype to the other phenotype. We find that switching rates that correspond to so-far empirically described non-genetic systems of inheritance of the mutation rate lead to higher rates of adaptation on both artificial and natural fitness landscapes. These switching rates can maintain within the same individuals both a mutator phenotype and intermediary mutations, a combination that facilitates adaptation. Moreover, non-genetic inheritance increases the proportion of mutators in the population, which in turn increases the probability of hitchhiking of the mutator phenotype with adaptive mutations. This in turn facilitates the acquisition of additional adaptive mutations. Our results rationalize recently observed noise in the expression of proteins that affect the mutation rate and suggest that non-genetic inheritance of this phenotype may facilitate evolutionary adaptive processes.
Affiliation(s)
- Gabriela Lobinska
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Yitzhak Pilpel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Yoav Ram
- School of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
|
14
|
Wagner A. Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nat Commun 2023; 14:3624. [PMID: 37336901 PMCID: PMC10279741 DOI: 10.1038/s41467-023-39321-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 06/05/2023] [Indexed: 06/21/2023] Open
Abstract
Can evolvability, the ability to produce adaptive heritable variation, itself evolve through adaptive Darwinian evolution? If so, then Darwinian evolution may help create the conditions that enable Darwinian evolution. Here I propose a framework that is suitable to address this question with available experimental data on adaptive landscapes. I introduce the notion of an evolvability-enhancing mutation, which increases the likelihood that subsequent mutations in an evolving organism, protein, or RNA molecule are adaptive. I search for such mutations in the experimentally characterized and combinatorially complete fitness landscapes of a protein and an RNA molecule. I find that such evolvability-enhancing mutations indeed exist. They constitute a small fraction of all mutations, which shift the distribution of fitness effects of subsequent mutations towards less deleterious mutations, and increase the incidence of beneficial mutations. Evolving populations which experience such mutations can evolve significantly higher fitness. The study of evolvability-enhancing mutations opens many avenues of investigation into the evolution of evolvability.
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland.
- The Santa Fe Institute, Santa Fe, NM, USA.
|
15
|
Baier F, Gauye F, Perez-Carrasco R, Payne JL, Schaerli Y. Environment-dependent epistasis increases phenotypic diversity in gene regulatory networks. Sci Adv 2023; 9:eadf1773. [PMID: 37224262 DOI: 10.1126/sciadv.adf1773] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/07/2022] [Accepted: 04/17/2023] [Indexed: 05/26/2023]
Abstract
Mutations to gene regulatory networks can be maladaptive or a source of evolutionary novelty. Epistasis confounds our understanding of how mutations affect the expression patterns of gene regulatory networks, a challenge exacerbated by the dependence of epistasis on the environment. We used the toolkit of synthetic biology to systematically assay the effects of pairwise and triplet combinations of mutant genotypes on the expression pattern of a gene regulatory network expressed in Escherichia coli that interprets an inducer gradient across a spatial domain. We uncovered a preponderance of epistasis that can switch in magnitude and sign across the inducer gradient to produce a greater diversity of expression pattern phenotypes than would be possible in the absence of such environment-dependent epistasis. We discuss our findings in the context of the evolution of hybrid incompatibilities and evolutionary novelties.
Affiliation(s)
- Florian Baier
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
- Florence Gauye
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Yolanda Schaerli
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
|
16
|
Li XC, Fuqua T, van Breugel ME, Crocker J. Mutational scans reveal differential evolvability of Drosophila promoters and enhancers. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220054. [PMID: 37004721 PMCID: PMC10067265 DOI: 10.1098/rstb.2022.0054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 04/04/2023] Open
Abstract
Rapid enhancer and slow promoter evolution have been demonstrated through comparative genomics. However, it is not clear how this information is encoded genetically and if this can be used to place evolution in a predictive context. Part of the challenge is that our understanding of the potential for regulatory evolution is biased primarily toward natural variation or limited experimental perturbations. Here, to explore the evolutionary capacity of promoter variation, we surveyed an unbiased mutation library for three promoters in Drosophila melanogaster. We found that mutations in promoters had limited to no effect on spatial patterns of gene expression. Compared to developmental enhancers, promoters are more robust to mutations and have more access to mutations that can increase gene expression, suggesting that their low activity might be a result of selection. Consistent with these observations, increasing the promoter activity at the endogenous locus of shavenbaby led to increased transcription yet limited phenotypic changes. Taken together, developmental promoters may encode robust transcriptional outputs allowing evolvability through the integration of diverse developmental enhancers. This article is part of the theme issue ‘Interdisciplinary approaches to predicting evolutionary biology’.
Affiliation(s)
- Xueying C. Li
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
- Timothy Fuqua
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
- Justin Crocker
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
|
17
|
Radford F, Rinehart J, Isaacs FJ. Mapping the in vivo fitness landscape of a tethered ribosome. Sci Adv 2023; 9:eade8934. [PMID: 37115918 PMCID: PMC10146877 DOI: 10.1126/sciadv.ade8934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Fitness landscapes are models of the sequence space of a genetic element that map how each sequence corresponds to its activity and can be used to guide laboratory evolution. The ribosome is a macromolecular machine that is essential for protein synthesis in all organisms. Because of the prevalence of dominant lethal mutations, a comprehensive fitness landscape of the ribosomal peptidyl transfer center (PTC) has not yet been attained. Here, we develop a method to functionally map an orthogonal tethered ribosome (oRiboT), which permits complete mutagenesis of nucleotides located in the PTC and the resulting epistatic interactions. We found that most nucleotides studied showed flexibility to mutation, and identified epistatic interactions between them, which compensate for deleterious mutations. This work provides a basis for a deeper understanding of ribosome function and malleability and could be used to inform design of engineered ribosomes with applications to synthesize next-generation biomaterials and therapeutics.
Affiliation(s)
- Felix Radford
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
- Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Jesse Rinehart
- Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, CT 06520, USA
- Farren J. Isaacs
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
- Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Department of Biomedical Engineering, Yale University, New Haven, CT 06520, USA
- Corresponding author.
|
18
|
Ang RML, Chen SAA, Kern AF, Xie Y, Fraser HB. Widespread epistasis among beneficial genetic variants revealed by high-throughput genome editing. Cell Genom 2023; 3:100260. [PMID: 37082144 PMCID: PMC10112194 DOI: 10.1016/j.xgen.2023.100260] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/06/2022] [Revised: 09/27/2022] [Accepted: 01/06/2023] [Indexed: 04/22/2023]
Abstract
The phenotypic effect of any genetic variant can be altered by variation at other genomic loci. Known as epistasis, these genetic interactions shape the genotype-phenotype map of every species, yet their origins remain poorly understood. To investigate this, we employed high-throughput genome editing to measure the fitness effects of 1,826 naturally polymorphic variants in four strains of Saccharomyces cerevisiae. About 31% of variants affect fitness, of which 24% have strain-specific fitness effects indicative of epistasis. We found that beneficial variants are more likely to exhibit genetic interactions and that these interactions can be mediated by specific traits such as flocculation ability. This work suggests that adaptive evolution will often involve trade-offs where a variant is only beneficial in some genetic backgrounds, potentially explaining why many beneficial variants remain polymorphic. In sum, we provide a framework to understand the factors influencing epistasis with single-nucleotide resolution, revealing widespread epistasis among beneficial variants.
Affiliation(s)
- Roy Moh Lik Ang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Shi-An A. Chen
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Alexander F. Kern
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Yihua Xie
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Hunter B. Fraser
- Department of Biology, Stanford University, Stanford, CA 94305, USA
|
19
|
Zhang J. What Has Genomics Taught An Evolutionary Biologist? Genomics Proteomics Bioinformatics 2023; 21:1-12. [PMID: 36720382 PMCID: PMC10373158 DOI: 10.1016/j.gpb.2023.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 01/06/2023] [Accepted: 01/19/2023] [Indexed: 01/30/2023]
Abstract
Genomics, an interdisciplinary field of biology on the structure, function, and evolution of genomes, has revolutionized many subdisciplines of life sciences, including my field of evolutionary biology, by supplying huge data, bringing high-throughput technologies, and offering a new approach to biology. In this review, I describe what I have learned from genomics and highlight the fundamental knowledge and mechanistic insights gained. I focus on three broad topics that are central to evolutionary biology and beyond (variation, interaction, and selection) and use primarily my own research and study subjects as examples. In the next decade or two, I expect that the most important contributions of genomics to evolutionary biology will be to provide genome sequences of nearly all known species on Earth, facilitate high-throughput phenotyping of natural variants and systematically constructed mutants for mapping genotype-phenotype-fitness landscapes, and assist the determination of causality in evolutionary processes using experimental evolution.
Affiliation(s)
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA.
|
20
|
Roberts JM, Beck JD, Pollock TB, Bendixsen DP, Hayden EJ. RNA sequence to structure analysis from comprehensive pairwise mutagenesis of multiple self-cleaving ribozymes. eLife 2023; 12:80360. [PMID: 36655987 PMCID: PMC9901934 DOI: 10.7554/elife.80360] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/18/2022] [Accepted: 12/28/2022] [Indexed: 01/20/2023] Open
Abstract
Self-cleaving ribozymes are RNA molecules that catalyze the cleavage of their own phosphodiester backbones. These ribozymes are found in all domains of life and are also a tool for biotechnical and synthetic biology applications. Self-cleaving ribozymes are also an important model of sequence-to-function relationships for RNA because their small size simplifies synthesis of genetic variants and self-cleaving activity is an accessible readout of the functional consequence of the mutation. Here, we used a high-throughput experimental approach to determine the relative activity for every possible single and double mutant of five self-cleaving ribozymes. From this data, we comprehensively identified non-additive effects between pairs of mutations (epistasis) for all five ribozymes. We analyzed how changes in activity and trends in epistasis map to the ribozyme structures. The variety of structures studied provided opportunities to observe several examples of common structural elements, and the data was collected under identical experimental conditions to enable direct comparison. Heatmap-based visualization of the data revealed patterns indicating structural features of the ribozymes including paired regions, unpaired loops, non-canonical structures, and tertiary structural contacts. The data also revealed signatures of functionally critical nucleotides involved in catalysis. The results demonstrate that the data sets provide structural information similar to chemical or enzymatic probing experiments, but with additional quantitative functional information. The large-scale data sets can be used for models predicting structure and function and for efforts to engineer self-cleaving ribozymes.
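The abstract above defines epistasis as non-additive effects between pairs of mutations but does not give the formula used. One common convention, assumed here purely for illustration, scores a double mutant against the expectation that single-mutant effects add on a log scale, with activities expressed relative to the wild type (wild type = 1.0). The function name and the example activities are hypothetical.

```python
import math

def pairwise_epistasis(act_a, act_b, act_ab):
    """Log-scale pairwise epistasis for relative activities (wild type = 1.0).

    Assumed null model: single-mutant effects add in log10 space, so the
    expected double-mutant activity is act_a * act_b. The return value is
    positive when the double mutant is more active than expected and
    negative when it is less active.
    """
    expected = math.log10(act_a) + math.log10(act_b)
    return math.log10(act_ab) - expected

# Hypothetical relative activities (not measurements from the study).
no_interaction = pairwise_epistasis(0.5, 0.4, 0.2)   # double mutant matches 0.5 * 0.4
compensatory = pairwise_epistasis(0.1, 0.1, 1.0)     # two harmful mutations restore activity
print(round(no_interaction, 6), round(compensatory, 6))
```

A compensatory pair of this kind is the signature one would expect from two positions that form a base pair, which is consistent with the abstract's point that epistasis patterns can reveal paired regions.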
Affiliation(s)
- Jessica M Roberts
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, United States
- James D Beck
- Computing PhD Program, Boise State University, Boise, United States
- Tanner B Pollock
- Department of Biological Science, Boise State University, Boise, United States
- Devin P Bendixsen
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, United States
- Eric J Hayden
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, United States
- Computing PhD Program, Boise State University, Boise, United States
- Department of Biological Science, Boise State University, Boise, United States
|
21
|
Wei H, Li X. Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes. Front Genet 2023; 14:1087267. [PMID: 36713072 PMCID: PMC9878224 DOI: 10.3389/fgene.2023.1087267] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/02/2022] [Accepted: 01/02/2023] [Indexed: 01/13/2023] Open
Abstract
Unveiling how genetic variations lead to phenotypic variations is one of the key questions in evolutionary biology, genetics, and biomedical research. Deep mutational scanning (DMS) technology has allowed the mapping of tens of thousands of genetic variations to phenotypic variations efficiently and economically. Since its first systematic introduction about a decade ago, we have witnessed the use of deep mutational scanning in many research areas leading to scientific breakthroughs. Also, the methods in each step of deep mutational scanning have become much more versatile thanks to oligo-synthesizing technology, high-throughput phenotyping methods and deep sequencing technology. However, each possible step of deep mutational scanning has its pros and cons, and some limitations still await further technological development. Here, we discuss recent scientific accomplishments achieved through deep mutational scanning and describe widely used methods in each step of deep mutational scanning. We also compare these different methods and analyze their advantages and disadvantages, providing insight into how to design a deep mutational scanning study that best suits the aims of the readers' projects.
Affiliation(s)
- Huijin Wei
- Zhejiang University-University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
- Xianghua Li
- Zhejiang University-University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, United Kingdom
- The Second Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang, China
- Biomedical and Health Translational Centre of Zhejiang Province, Haining, Zhejiang, China
|
22
|
Abstract
One core goal of genetics is to systematically understand the mapping between the DNA sequence of an organism (genotype) and its measurable characteristics (phenotype). Understanding this mapping is often challenging because of interactions between mutations, where the result of combining several different mutations can be very different than the sum of their individual effects. Here we provide a statistical framework for modeling complex genetic interactions of this type. The key idea is to ask how fast the effects of mutations change when introducing the same mutation in increasingly distant genetic backgrounds. We then propose a model for phenotypic prediction that takes into account this tendency for the effects of mutations to be more similar in nearby genetic backgrounds. Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype–phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. 
This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype–phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5′ splice sites, for which we also validate our model predictions via additional low-throughput experiments.
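The abstract above rests on measuring how phenotypic correlations change with mutational distance. The following is a simplified sketch of that idea only, not the authors' estimator or their Gaussian process model: it groups all genotype pairs by Hamming distance and reports the mean product of their centered phenotypes, divided by the overall variance. The function names and the toy two-site landscape are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_correlation(phenotypes):
    """Map each Hamming distance d to the phenotypic correlation at distance d.

    `phenotypes` maps genotype strings to measured values. The correlation is
    the mean of (f(x) - mean) * (f(y) - mean) over pairs at distance d,
    normalized by the variance across all genotypes.
    """
    seqs = list(phenotypes)
    mu = sum(phenotypes[s] for s in seqs) / len(seqs)
    var = sum((phenotypes[s] - mu) ** 2 for s in seqs) / len(seqs)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, b in combinations(seqs, 2):
        d = hamming(a, b)
        sums[d] += (phenotypes[a] - mu) * (phenotypes[b] - mu)
        counts[d] += 1
    return {d: sums[d] / counts[d] / var for d in sorted(sums)}

# Toy, purely additive landscape on two biallelic sites (illustrative values).
toy = {"AA": 0.0, "AB": 1.0, "BA": 1.0, "BB": 2.0}
print(distance_correlation(toy))
```

For this additive toy landscape the correlation is 0 at distance 1 and -1 at distance 2, a linear decay with distance. Departures from such a linear decay are what indicate pairwise and higher-order interactions in this kind of analysis.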
|
23
|
Gabzi T, Pilpel Y, Friedlander T. Fitness landscape analysis of a tRNA gene reveals that the wild type allele is sub-optimal, yet mutationally robust. Mol Biol Evol 2022; 39:6670756. [PMID: 35976926 PMCID: PMC9447856 DOI: 10.1093/molbev/msac178] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022] Open
Abstract
Fitness landscape mapping and the prediction of evolutionary trajectories on these landscapes are major tasks in evolutionary biology research. Evolutionary dynamics is tightly linked to the landscape topography, but this relation is not straightforward. Here, we analyze a fitness landscape of a yeast tRNA gene, previously measured under four different conditions. We find that the wild type allele is sub-optimal, and 8–10% of its variants are fitter. We rule out the possibilities that the wild type is fittest on average across these four conditions or located at a local fitness maximum. Notwithstanding, we cannot exclude the possibility that the wild type might be fittest in some of the many conditions in the complex ecology in which yeast lives. Instead, we find that the wild type is mutationally robust (“flat”), while more fit variants are typically mutationally fragile. Similar observations of mutational robustness or flatness have so far been made in very few cases, predominantly in viral genomes.
Affiliation(s)
- Tzahi Gabzi
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Yitzhak Pilpel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Tamar Friedlander
- The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture Faculty of Agriculture, Hebrew University of Jerusalem, 229 Herzl St., Rehovot 7610001, Israel
|
24
|
Rotrattanadumrong R, Yokobayashi Y. Experimental exploration of a ribozyme neutral network using evolutionary algorithm and deep learning. Nat Commun 2022; 13:4847. [PMID: 35977956 PMCID: PMC9385714 DOI: 10.1038/s41467-022-32538-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/15/2022] [Accepted: 08/03/2022] [Indexed: 11/18/2022] Open
Abstract
A neutral network connects all genotypes with equivalent phenotypes in a fitness landscape and plays an important role in the mutational robustness and evolvability of biomolecules. In contrast to earlier theoretical works, evidence of large neutral networks has been lacking in recent experimental studies of fitness landscapes. This suggests that evolution could be constrained globally. Here, we demonstrate that a deep learning-guided evolutionary algorithm can efficiently identify neutral genotypes within the sequence space of an RNA ligase ribozyme. Furthermore, we measure the activities of all 2^16 (65,536) variants connecting two active ribozymes that differ by 16 mutations and analyze mutational interactions (epistasis) up to the 16th order. We discover an extensive network of neutral paths linking the two genotypes and reveal that these paths might be predicted using only information from lower-order interactions. Our experimental evaluation of over 120,000 ribozyme sequences provides important empirical evidence that neutral networks can increase the accessibility and predictability of the fitness landscape. Neutral networks, which are sets of genotypes connected via single mutations that share the same phenotype, are important for evolvability. Here, the authors provide experimental evidence of a neutral network in an RNA enzyme using a high-throughput assay and deep learning.
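Two genotypes that differ at 16 positions are connected by 2^16 = 65,536 combinatorial intermediates, because each differing site can carry either parent's nucleotide independently. As a minimal sketch of that enumeration (the function name and the placeholder sequences are hypothetical and are not the ribozymes from the study):

```python
from itertools import product

def intermediates(seq_a, seq_b):
    """Yield every sequence that carries, at each differing site, one parent's base.

    Sites where the two equal-length parents agree are left unchanged, so two
    parents that differ at n sites give 2**n sequences, parents included.
    """
    diff = [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]
    for choice in product((0, 1), repeat=len(diff)):
        seq = list(seq_a)
        for pos, take_b in zip(diff, choice):
            if take_b:
                seq[pos] = seq_b[pos]
        yield "".join(seq)

# Placeholder parents that differ at 16 of 18 positions (illustrative only).
parent_a = "A" * 16 + "GC"
parent_b = "U" * 16 + "GC"
count = sum(1 for _ in intermediates(parent_a, parent_b))
print(count)  # 65536
```

The same count explains why interactions can be analyzed up to the 16th order: a combinatorially complete set over 16 sites contains every subset of the 16 mutations exactly once.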
Affiliation(s)
- Rachapun Rotrattanadumrong
- Nucleic Acid Chemistry and Engineering Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, 9040495, Japan
- Yohei Yokobayashi
- Nucleic Acid Chemistry and Engineering Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, 9040495, Japan.
|
25
|
Beck JD, Roberts JM, Kitzhaber JM, Trapp A, Serra E, Spezzano F, Hayden EJ. Predicting higher-order mutational effects in an RNA enzyme by machine learning of high-throughput experimental data. Front Mol Biosci 2022; 9:893864. [PMID: 36046603 PMCID: PMC9421044 DOI: 10.3389/fmolb.2022.893864] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/10/2022] [Accepted: 06/28/2022] [Indexed: 11/13/2022] Open
Abstract
Ribozymes are RNA molecules that catalyze biochemical reactions. Self-cleaving ribozymes are a common naturally occurring class of ribozymes that catalyze site-specific cleavage of their own phosphodiester backbone. In addition to their natural functions, self-cleaving ribozymes have been used to engineer control of gene expression because they can be designed to alter RNA processing and stability. However, the rational design of ribozyme activity remains challenging, and many ribozyme-based systems are engineered or improved by random mutagenesis and selection (in vitro evolution). Improving a ribozyme-based system often requires several mutations to achieve the desired function, but extensive pairwise and higher-order epistasis prevent a simple prediction of the effect of multiple mutations that is needed for rational design. Recently, high-throughput sequencing-based approaches have produced data sets on the effects of numerous mutations in different ribozymes (RNA fitness landscapes). Here we used such high-throughput experimental data from variants of the CPEB3 self-cleaving ribozyme to train a predictive model through machine learning approaches. We trained models using either a random forest or long short-term memory (LSTM) recurrent neural network approach. We found that models trained on a comprehensive set of pairwise mutant data could predict active sequences at higher mutational distances, but the correlation between predicted and experimentally observed self-cleavage activity decreased with increasing mutational distance. Adding sequences with increasingly higher numbers of mutations to the training data improved the correlation at increasing mutational distances. Systematically reducing the size of the training data set suggests that a wide distribution of ribozyme activity may be the key to accurate predictions. 
Because the model predictions are based only on sequence and activity data, the results demonstrate that this machine learning approach allows readily obtainable experimental data to be used for RNA design efforts even for RNA molecules with unknown structures. The accurate prediction of RNA functions will enable a more comprehensive understanding of RNA fitness landscapes for studying evolution and for guiding RNA-based engineering efforts.
Affiliation(s)
- Jessica M. Roberts
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, United States
- Joey M. Kitzhaber
- Department of Computer Science, Boise State University, Boise, ID, United States
- Ashlyn Trapp
- Department of Biological Sciences, Boise State University, Boise, ID, United States
- Eric J. Hayden
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, United States
- Department of Computer Science, Boise State University, Boise, ID, United States
- *Correspondence: Eric J. Hayden,
|
26
|
Synonymous mutations in representative yeast genes are mostly strongly non-neutral. Nature 2022; 606:725-731. [PMID: 35676473 PMCID: PMC9650438 DOI: 10.1038/s41586-022-04823-w] [Citation(s) in RCA: 144] [Impact Index Per Article: 48.0] [Received: 08/03/2021] [Accepted: 04/28/2022] [Indexed: 01/12/2023]
Abstract
Synonymous mutations in protein-coding genes do not alter protein sequences and are thus generally presumed to be neutral or nearly neutral. Here, to experimentally verify this presumption, we constructed 8,341 yeast mutants each carrying a synonymous, nonsynonymous or nonsense mutation in one of 21 endogenous genes with diverse functions and expression levels and measured their fitness relative to the wild type in a rich medium. Three-quarters of synonymous mutations resulted in a significant reduction in fitness, and the distribution of fitness effects was overall similar, albeit nonidentical, between synonymous and nonsynonymous mutations. Both synonymous and nonsynonymous mutations frequently disturbed the level of mRNA expression of the mutated gene, and the extent of the disturbance partially predicted the fitness effect. Investigations in additional environments revealed greater across-environment fitness variations for nonsynonymous mutants than for synonymous mutants despite their similar fitness distributions in each environment, suggesting that a smaller proportion of nonsynonymous mutants than synonymous mutants are always non-deleterious in a changing environment to permit fixation, potentially explaining the common observation of substantially lower nonsynonymous than synonymous substitution rates. The strong non-neutrality of most synonymous mutations, if it holds true for other genes and in other organisms, would require re-examination of numerous biological conclusions about mutation, selection, effective population size, divergence time and disease mechanisms that rely on the assumption that synonymous mutations are neutral.
27
Yang CH, Scarpino SV. A Family of Fitness Landscapes Modeled through Gene Regulatory Networks. Entropy (Basel, Switzerland) 2022; 24:622. [PMID: 35626507 PMCID: PMC9141513 DOI: 10.3390/e24050622] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/02/2021] [Revised: 04/11/2022] [Accepted: 04/26/2022] [Indexed: 02/01/2023]
Abstract
Fitness landscapes are a powerful metaphor for understanding the evolution of biological systems. These landscapes describe how genotypes are connected to each other through mutation and related through fitness. Empirical studies of fitness landscapes have increasingly revealed conserved topographical features across diverse taxa, e.g., the accessibility of genotypes and "ruggedness". As a result, theoretical studies are needed to investigate how evolution proceeds on fitness landscapes with such conserved features. Here, we develop and study a model of evolution on fitness landscapes using the lens of Gene Regulatory Networks (GRNs), where the regulatory products are computed from multiple genes and collectively treated as phenotypes. With the assumption that regulation is a binary process, we prove the existence of empirically observed, topographical features such as accessibility and connectivity. We further show that these results hold across arbitrary fitness functions and that a trade-off between accessibility and ruggedness need not exist. Then, using graph theory and a coarse-graining approach, we deduce a mesoscopic structure underlying GRN fitness landscapes where the information necessary to predict a population's evolutionary trajectory is retained with minimal complexity. Using this coarse-graining, we develop a bottom-up algorithm to construct such mesoscopic backbones, which does not require computing the genotype network and is therefore far more efficient than brute-force approaches. Altogether, this work provides mathematical results of high-dimensional fitness landscapes and a path toward connecting theory to empirical studies.
Affiliation(s)
- Chia-Hung Yang
  - Network Science Institute, Northeastern University, Boston, MA 02115, USA
- Samuel V. Scarpino
  - Network Science Institute, Northeastern University, Boston, MA 02115, USA
  - Physics Department, Northeastern University, Boston, MA 02115, USA
  - Roux Institute, Northeastern University, Boston, MA 02115, USA
  - Institute for Experiential AI, Northeastern University, Boston, MA 02115, USA
  - Santa Fe Institute, Santa Fe, NM 87501, USA
  - Vermont Complex Systems Center, University of Vermont, Burlington, VT 05405, USA
28
Li C, Haller G, Weihl CC. Current and Future Approaches to Classify VUSs in LGMD-Related Genes. Genes (Basel) 2022; 13:382. [PMID: 35205425 PMCID: PMC8871643 DOI: 10.3390/genes13020382] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/16/2022] [Revised: 02/11/2022] [Accepted: 02/16/2022] [Indexed: 01/09/2023] Open
Abstract
Next-generation sequencing (NGS) has revealed large numbers of genetic variants in LGMD-related genes, with most of them classified as variants of uncertain significance (VUSs). VUSs are genetic changes with unknown pathological impact and present a major challenge in genetic test interpretation and disease diagnosis. Understanding the phenotypic consequences of VUSs can provide clinical guidance regarding LGMD risk and therapy. In this review, we provide a brief overview of the subtypes of LGMD, disease diagnosis, current classification systems for investigating VUSs, and a potential deep mutational scanning approach to classify VUSs in LGMD-related genes.
Affiliation(s)
- Chengcheng Li
  - Department of Neurology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Gabe Haller
  - Department of Neurology, Washington University School of Medicine, Saint Louis, MO 63110, USA
  - Department of Neurological Surgery, Washington University School of Medicine, Saint Louis, MO 63110, USA
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Conrad C. Weihl (corresponding author)
  - Department of Neurology, Washington University School of Medicine, Saint Louis, MO 63110, USA
29
Expression level is a major modifier of the fitness landscape of a protein coding gene. Nat Ecol Evol 2021; 6:103-115. [PMID: 34795386 DOI: 10.1038/s41559-021-01578-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/22/2021] [Accepted: 10/01/2021] [Indexed: 11/09/2022]
Abstract
The phenotypic consequence of a genetic mutation depends on many factors including the expression level of a gene. However, a comprehensive quantification of this expression effect is still lacking, as is a further general mechanistic understanding of the effect. Here, we measured the fitness effect of almost all (>97.5%) single-nucleotide mutations in GFP, an exogenous gene with no physiological function, and URA3, a conditionally essential gene. Both genes were driven by two promoters whose expression levels differed by around tenfold. The resulting fitness landscapes revealed that the fitness effects of at least 42% of all single-nucleotide mutations within the genes were expression dependent. Although only a small fraction of variation in fitness effects among different mutations can be explained by biophysical properties of the protein and messenger RNA of the gene, our analyses revealed that the avoidance of stochastic molecular errors generally underlies the expression dependency of mutational effects and suggested protein misfolding as the most important type of molecular error among those examined. Our results therefore directly explained the slower evolution of highly expressed genes and highlighted cytotoxicity due to stochastic molecular errors as a non-negligible component for understanding the phenotypic consequence of mutations.
30
Sesta L, Uguzzoni G, Fernandez-de-Cossio-Diaz J, Pagnani A. AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational Approximated Landscape. Int J Mol Sci 2021; 22:10908. [PMID: 34681569 PMCID: PMC8535593 DOI: 10.3390/ijms222010908] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 01/12/2023] Open
Abstract
We present Annealed Mutational approximated Landscape (AMaLa), a new method to infer fitness landscapes from Directed Evolution experiments sequencing data. Such experiments typically start from a single wild-type sequence, which undergoes Darwinian in vitro evolution via multiple rounds of mutation and selection for a target phenotype. In the last years, Directed Evolution is emerging as a powerful instrument to probe fitness landscapes under controlled experimental conditions and as a relevant testing ground to develop accurate statistical models and inference algorithms (thanks to high-throughput screening and sequencing). Fitness landscape modeling either uses the enrichment of variants abundances as input, thus requiring the observation of the same variants at different rounds or assuming the last sequenced round as being sampled from an equilibrium distribution. AMaLa aims at effectively leveraging the information encoded in the whole time evolution. To do so, while assuming statistical sampling independence between sequenced rounds, the possible trajectories in sequence space are gauged with a time-dependent statistical weight consisting of two contributions: (i) an energy term accounting for the selection process and (ii) a generalized Jukes-Cantor model for the purely mutational step. This simple scheme enables accurately describing the Directed Evolution dynamics and inferring a fitness landscape that correctly reproduces the measures of the phenotype under selection (e.g., antibiotic drug resistance), notably outperforming widely used inference strategies. In addition, we assess the reliability of AMaLa by showing how the inferred statistical model could be used to predict relevant structural properties of the wild-type sequence.
Affiliation(s)
- Luca Sesta
  - Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Guido Uguzzoni
  - Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Jorge Fernandez-de-Cossio-Diaz
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Sorbonne Université, 24 rue Lhomond, 75005 Paris, France
  - Center of Molecular Immunology, Systems Biology Department, Playa, Havana CP 11600, Cuba
- Andrea Pagnani
  - Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo, Italy
  - INFN, Sezione di Torino, I-10125 Torino, Italy
31
Song S, Zhang J. Unbiased inference of the fitness landscape ruggedness from imprecise fitness estimates. Evolution 2021; 75:2658-2671. [PMID: 34554581 DOI: 10.1111/evo.14363] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/22/2020] [Accepted: 09/14/2021] [Indexed: 01/17/2023]
Abstract
Fitness landscapes map genotypes to their corresponding fitness under given environments and allow explaining and predicting evolutionary trajectories. Of particular interest is the landscape ruggedness or the unevenness of the landscape, because it impacts many aspects of evolution such as the likelihood that a population is trapped in a local fitness peak. Although the ruggedness has been inferred from a number of empirically mapped fitness landscapes, it is unclear to what extent this inference is affected by fitness estimation error, which is inevitable in the experimental determination of fitness landscapes. Here, we address this question by simulating fitness landscapes under various theoretical models, with or without fitness estimation error. We find that all eight examined measures of landscape ruggedness are overestimated due to imprecise fitness quantification, but different measures are affected to different degrees. We devise a method to use replicate fitness measures to correct this bias and show that our method performs well under realistic conditions. We conclude that previously reported fitness landscape ruggedness is likely upward biased owing to the negligence of fitness estimation error and advise that future fitness landscape mapping should include at least three biological replicates to permit an unbiased inference of the ruggedness.
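The upward bias described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation or their correction method: the purely additive landscape, the number of sites, the effect sizes, the deliberately large noise level, and every function name are assumptions chosen for illustration. An additive landscape with all-positive effects has exactly one peak, yet counting peaks on noisy fitness estimates of the same landscape reports more.

```python
import itertools
import random


def additive_landscape(n_sites, rng):
    """Fitness = sum of independent, positive per-site effects (assumed model)."""
    effects = [rng.uniform(0.5, 1.5) for _ in range(n_sites)]
    return {
        g: sum(e for e, bit in zip(effects, g) if bit)
        for g in itertools.product((0, 1), repeat=n_sites)
    }


def add_noise(fitness, sd, rng):
    """Simulate measurement error as Gaussian noise on each genotype."""
    return {g: f + rng.gauss(0.0, sd) for g, f in fitness.items()}


def count_peaks(fitness, n_sites):
    """A peak is a genotype fitter than all of its one-mutant neighbours."""
    peaks = 0
    for g, f in fitness.items():
        neighbours = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(n_sites))
        if all(f > fitness[n] for n in neighbours):
            peaks += 1
    return peaks


rng = random.Random(0)
n_sites = 8
true_fitness = additive_landscape(n_sites, rng)
noisy_fitness = add_noise(true_fitness, sd=3.0, rng=rng)

true_peaks = count_peaks(true_fitness, n_sites)
noisy_peaks = count_peaks(noisy_fitness, n_sites)
print("peaks without noise:", true_peaks)
print("peaks with noise:   ", noisy_peaks)
```

The number of peaks is only one of several ruggedness measures; the abstract reports that all eight measures examined are inflated, to different degrees.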
Affiliation(s)
- Siliang Song
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109
32
Garcia DM, Campbell EA, Jakobson CM, Tsuchiya M, Shaw EA, DiNardo AL, Kaeberlein M, Jarosz DF. A prion accelerates proliferation at the expense of lifespan. eLife 2021; 10:e60917. [PMID: 34545808 PMCID: PMC8455135 DOI: 10.7554/elife.60917] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/09/2020] [Accepted: 08/12/2021] [Indexed: 12/23/2022] Open
Abstract
In fluctuating environments, switching between different growth strategies, such as those affecting cell size and proliferation, can be advantageous to an organism. Trade-offs arise, however. Mechanisms that aberrantly increase cell size or proliferation, such as mutations or chemicals that interfere with growth regulatory pathways, can also shorten lifespan. Here we report a natural example of how the interplay between growth and lifespan can be epigenetically controlled. We find that a highly conserved RNA-modifying enzyme, the pseudouridine synthase Pus4/TruB, can act as a prion, endowing yeast with greater proliferation rates at the cost of a shortened lifespan. Cells harboring the prion grow larger and exhibit altered protein synthesis. This epigenetic state, [BIG+] (better in growth), allows cells to heritably yet reversibly alter their translational program, leading to the differential synthesis of dozens of proteins, including many that regulate proliferation and aging. Our data reveal a new role for prion-based control of an RNA-modifying enzyme in driving heritable epigenetic states that transform cell growth and survival.
Affiliation(s)
- David M Garcia
  - Department of Chemical & Systems Biology, Stanford University School of Medicine, Stanford, United States
  - Institute of Molecular Biology, Department of Biology, University of Oregon, Eugene, United States
- Edgar A Campbell
  - Department of Chemical & Systems Biology, Stanford University School of Medicine, Stanford, United States
- Christopher M Jakobson
  - Department of Chemical & Systems Biology, Stanford University School of Medicine, Stanford, United States
- Mitsuhiro Tsuchiya
  - Department of Pathology, University of Washington, Seattle, United States
- Ethan A Shaw
  - Institute of Molecular Biology, Department of Biology, University of Oregon, Eugene, United States
- Acadia L DiNardo
  - Institute of Molecular Biology, Department of Biology, University of Oregon, Eugene, United States
- Matt Kaeberlein
  - Department of Pathology, University of Washington, Seattle, United States
- Daniel F Jarosz
  - Department of Chemical & Systems Biology, Stanford University School of Medicine, Stanford, United States
  - Department of Developmental Biology, Stanford University School of Medicine, Stanford, United States
33
Bendixsen DP, Pollock TB, Peri G, Hayden EJ. Experimental Resurrection of Ancestral Mammalian CPEB3 Ribozymes Reveals Deep Functional Conservation. Mol Biol Evol 2021; 38:2843-2853. [PMID: 33720319 PMCID: PMC8233481 DOI: 10.1093/molbev/msab074] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022] Open
Abstract
Self-cleaving ribozymes are genetic elements found in all domains of life, but their evolution remains poorly understood. A ribozyme located in the second intron of the cytoplasmic polyadenylation binding protein 3 gene (CPEB3) shows high sequence conservation in mammals, but little is known about the functional conservation of self-cleaving ribozyme activity across the mammalian tree of life or during the course of mammalian evolution. Here, we use a phylogenetic approach to design a mutational library and a deep sequencing assay to evaluate the in vitro self-cleavage activity of numerous extant and resurrected CPEB3 ribozymes that span over 100 My of mammalian evolution. We found that the predicted sequence at the divergence of placentals and marsupials is highly active, and this activity has been conserved in most lineages. A reduction in ribozyme activity appears to have occurred multiple different times throughout the mammalian tree of life. The in vitro activity data allow an evaluation of the predicted mutational pathways leading to extant ribozyme as well as the mutational landscape surrounding these ribozymes. The results demonstrate that in addition to sequence conservation, the self-cleavage activity of the CPEB3 ribozyme has persisted over millions of years of mammalian evolution.
Affiliation(s)
- Devin P. Bendixsen
  - Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Tanner B. Pollock
  - Department of Biological Science, Boise State University, Boise, ID, USA
- Gianluca Peri
  - Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Eric J. Hayden
  - Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
  - Department of Biological Science, Boise State University, Boise, ID, USA
34
Burton TD, Eyre NS. Applications of Deep Mutational Scanning in Virology. Viruses 2021; 13:1020. [PMID: 34071591 PMCID: PMC8227372 DOI: 10.3390/v13061020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/29/2021] [Revised: 05/26/2021] [Accepted: 05/26/2021] [Indexed: 12/20/2022] Open
Abstract
Several recently developed high-throughput techniques have changed the field of molecular virology. For example, proteomics studies reveal complete interactomes of a viral protein, genome-wide CRISPR knockout and activation screens probe the importance of every single human gene in aiding or fighting a virus, and ChIP-seq experiments reveal genome-wide epigenetic changes in response to infection. Deep mutational scanning is a relatively novel form of protein science which allows the in-depth functional analysis of every nucleotide within a viral gene or genome, revealing regions of importance, flexibility, and mutational potential. In this review, we discuss the application of this technique to RNA viruses including members of the Flaviviridae family, Influenza A Virus and Severe Acute Respiratory Syndrome Coronavirus 2. We also briefly discuss the reverse genetics systems which allow for analysis of viral replication cycles, next-generation sequencing technologies and the bioinformatics tools that facilitate this research.
Affiliation(s)
- Nicholas S. Eyre
  - College of Medicine and Public Health, Flinders University, Bedford Park, SA 5042, Australia
35
Lai YC, Liu Z, Chen IA. Encapsulation of ribozymes inside model protocells leads to faster evolutionary adaptation. Proc Natl Acad Sci U S A 2021; 118:e2025054118. [PMID: 34001592 PMCID: PMC8166191 DOI: 10.1073/pnas.2025054118] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 02/07/2023] Open
Abstract
Functional biomolecules, such as RNA, encapsulated inside a protocellular membrane are believed to have comprised a very early, critical stage in the evolution of life, since membrane vesicles allow selective permeability and create a unit of selection enabling cooperative phenotypes. The biophysical environment inside a protocell would differ fundamentally from bulk solution due to the microscopic confinement. However, the effect of the encapsulated environment on ribozyme evolution has not been previously studied experimentally. Here, we examine the effect of encapsulation inside model protocells on the self-aminoacylation activity of tens of thousands of RNA sequences using a high-throughput sequencing assay. We find that encapsulation of these ribozymes generally increases their activity, giving encapsulated sequences an advantage over nonencapsulated sequences in an amphiphile-rich environment. In addition, highly active ribozymes benefit disproportionately more from encapsulation. The asymmetry in fitness gain broadens the distribution of fitness in the system. Consistent with Fisher's fundamental theorem of natural selection, encapsulation therefore leads to faster adaptation when the RNAs are encapsulated inside a protocell during in vitro selection. Thus, protocells would not only provide a compartmentalization function but also promote activity and evolutionary adaptation during the origin of life.
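The link this abstract draws between a broader fitness distribution and faster adaptation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis: it uses the textbook discrete-generation form of selection, in which one round of selection raises mean fitness by the variance in fitness divided by the mean, and the two fitness distributions and all names are made up for the example.

```python
def one_generation_gain(fitness, freqs):
    """Return (observed gain in mean fitness, gain predicted as variance / mean)."""
    w_bar = sum(p * w for p, w in zip(freqs, fitness))
    # Selection reweights each type in proportion to its fitness.
    new_freqs = [p * w / w_bar for p, w in zip(freqs, fitness)]
    new_w_bar = sum(p * w for p, w in zip(new_freqs, fitness))
    variance = sum(p * (w - w_bar) ** 2 for p, w in zip(freqs, fitness))
    return new_w_bar - w_bar, variance / w_bar


freqs = [0.25, 0.25, 0.25, 0.25]
narrow = [0.9, 1.0, 1.0, 1.1]   # hypothetical: low variance in activity
broad = [0.4, 0.8, 1.2, 1.6]    # hypothetical: same mean, higher variance

narrow_gain, narrow_pred = one_generation_gain(narrow, freqs)
broad_gain, broad_pred = one_generation_gain(broad, freqs)
print("narrow distribution gain:", narrow_gain)
print("broad distribution gain: ", broad_gain)
```

Both populations start from the same mean fitness, so the larger gain of the second one comes only from its wider spread.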
Affiliation(s)
- Yei-Chen Lai
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095
- Ziwei Liu
  - Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Cambridge CB2 0QH, United Kingdom
- Irene A Chen
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095
36
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
  - Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
- Jacobo Aguirre
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
- Sebastian E Ahnert
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Alejandro V Cano
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo Catalán
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
- Ramon Diaz-Uriarte
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
- Santiago F Elena
  - Instituto de Biología Integrativa de Sistemas, I(2)SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Paulien Hogeweg
  - Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
- Bhavin S Khatri
  - The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
- Joachim Krug
  - Institute for Biological Physics, University of Cologne, Köln, Germany
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
- Nora S Martin
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Joshua L Payne
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel Weiß
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
37
Abstract
RNA viruses, such as hepatitis C virus (HCV), influenza virus, and SARS-CoV-2, are notorious for their ability to evolve rapidly under selection in novel environments. It is known that the high mutation rate of RNA viruses can generate huge genetic diversity to facilitate viral adaptation. However, less attention has been paid to the underlying fitness landscape that represents the selection forces on viral genomes, especially under different selection conditions. Here, we systematically quantified the distribution of fitness effects of about 1,600 single amino acid substitutions in the drug-targeted region of NS5A protein of HCV. We found that the majority of nonsynonymous substitutions incur large fitness costs, suggesting that NS5A protein is highly optimized. The replication fitness of viruses is correlated with the pattern of sequence conservation in nature, and viral evolution is constrained by the need to maintain protein stability. We characterized the adaptive potential of HCV by subjecting the mutant viruses to selection by the antiviral drug daclatasvir at multiple concentrations. Both the relative fitness values and the number of beneficial mutations were found to increase with the increasing concentrations of daclatasvir. The changes in the spectrum of beneficial mutations in NS5A protein can be explained by a pharmacodynamics model describing viral fitness as a function of drug concentration. Overall, our results show that the distribution of fitness effects of mutations is modulated by both the constraints on the biophysical properties of proteins (i.e., selection pressure for protein stability) and the level of environmental stress (i.e., selection pressure for drug resistance).
IMPORTANCE: Many viruses adapt rapidly to novel selection pressures, such as antiviral drugs. Understanding how pathogens evolve under drug selection is critical for the success of antiviral therapy against human pathogens. By combining deep sequencing with selection experiments in cell culture, we have quantified the distribution of fitness effects of mutations in hepatitis C virus (HCV) NS5A protein. Our results indicate that the majority of single amino acid substitutions in NS5A protein incur large fitness costs. Simulation of protein stability suggests viral evolution is constrained by the need to maintain protein stability. By subjecting the mutant viruses to selection under an antiviral drug, we find that the adaptive potential of viral proteins in a novel environment is modulated by the level of environmental stress, which can be explained by a pharmacodynamics model. Our comprehensive characterization of the fitness landscapes of NS5A can potentially guide the design of effective strategies to limit viral evolution.
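How a dose-response relationship can make more mutations beneficial at higher drug concentrations is sketched below in Python. The abstract does not state the form of its pharmacodynamics model, so the Hill-type curve used here is an assumption, and the parameter values, mutant names and function names are hypothetical. Each variant has a drug-free fitness `w0` and a half-inhibitory concentration `ic50`; a mutant counts as beneficial when its fitness relative to the wild type exceeds 1.

```python
def replication_fitness(conc, w0, ic50, hill=1.0):
    """Assumed Hill-type curve: fitness falls from w0 as drug concentration rises."""
    return w0 / (1.0 + (conc / ic50) ** hill)


def relative_fitness(conc, mutant, wild_type):
    """Mutant fitness divided by wild-type fitness at the same concentration."""
    return replication_fitness(conc, **mutant) / replication_fitness(conc, **wild_type)


# Hypothetical parameters, in arbitrary concentration units.
wild_type = {"w0": 1.0, "ic50": 1.0}
mutants = {
    "costly_resistant": {"w0": 0.6, "ic50": 50.0},  # large cost, strong resistance
    "mild_resistant": {"w0": 0.9, "ic50": 3.0},     # small cost, weak resistance
    "deleterious": {"w0": 0.5, "ic50": 1.0},        # cost without resistance
}


def beneficial(conc):
    """Names of mutants that outcompete the wild type at this concentration."""
    return sorted(
        name for name, m in mutants.items()
        if relative_fitness(conc, m, wild_type) > 1.0
    )


for conc in (0.0, 0.5, 5.0):
    print(conc, beneficial(conc))
```

In this toy setting no mutant is beneficial without drug, one is beneficial at the intermediate concentration and two at the highest, because a fitness cost paid in `w0` is outweighed by resistance only once the drug suppresses the wild type strongly enough.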
38
Routh S, Acharyya A, Dhar R. A two-step PCR assembly for construction of gene variants across large mutational distances. Biol Methods Protoc 2021; 6:bpab007. [PMID: 33928191 PMCID: PMC8062255 DOI: 10.1093/biomethods/bpab007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/13/2021] [Revised: 03/09/2021] [Accepted: 04/01/2021] [Indexed: 11/14/2022] Open
Abstract
Construction of empirical fitness landscapes has transformed our understanding of genotype-phenotype relationships across genes. However, most empirical fitness landscapes have been constrained to the local genotype neighbourhood of a gene primarily due to our limited ability to systematically construct genotypes that differ by a large number of mutations. Although a few methods have been proposed in the literature, these techniques are complex owing to several steps of construction or contain a large number of amplification cycles that increase chances of non-specific mutations. A few other described methods require amplification of the whole vector, thereby increasing the chances of vector backbone mutations that can have unintended consequences for study of fitness landscapes. Thus, this has substantially constrained us from traversing large mutational distances in the genotype network, thereby limiting our understanding of the interactions between multiple mutations and the role these interactions play in evolution of novel phenotypes. In the current work, we present a simple but powerful approach that allows us to systematically and accurately construct gene variants at large mutational distances. Our approach relies on building-up small fragments containing targeted mutations in the first step followed by assembly of these fragments into the complete gene fragment by polymerase chain reaction (PCR). We demonstrate the utility of our approach by constructing variants that differ by up to 11 mutations in a model gene. Our work thus provides an accurate method for construction of multi-mutant variants of genes and therefore will transform the studies of empirical fitness landscapes by enabling exploration of genotypes that are far away from a starting genotype.
Affiliation(s)
- Shreya Routh
  - Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
- Anamika Acharyya
  - Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
- Riddhiman Dhar
  - Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
39
Tack DS, Tonner PD, Pressman A, Olson ND, Levy SF, Romantseva EF, Alperovich N, Vasilyeva O, Ross D. The genotype-phenotype landscape of an allosteric protein. Mol Syst Biol 2021; 17:e10179. [PMID: 33784029 PMCID: PMC8009258 DOI: 10.15252/msb.202010179] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/14/2020] [Revised: 02/15/2021] [Accepted: 02/18/2021] [Indexed: 12/18/2022] Open
Abstract
Allostery is a fundamental biophysical mechanism that underlies cellular sensing, signaling, and metabolism. Yet a quantitative understanding of allosteric genotype-phenotype relationships remains elusive. Here, we report the large-scale measurement of the genotype-phenotype landscape for an allosteric protein: the lac repressor from Escherichia coli, LacI. Using a method that combines long-read and short-read DNA sequencing, we quantitatively measure the dose-response curves for nearly 10^5 variants of the LacI genetic sensor. The resulting data provide a quantitative map of the effect of amino acid substitutions on LacI allostery and reveal systematic sequence-structure-function relationships. We find that in many cases, allosteric phenotypes can be quantitatively predicted with additive or neural-network models, but unpredictable changes also occur. For example, we were surprised to discover a new band-stop phenotype that challenges conventional models of allostery and that emerges from combinations of nearly silent amino acid substitutions.
Affiliation(s)
- Drew S Tack
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Peter D Tonner
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Abe Pressman
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan D Olson
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Sasha F Levy
- SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Joint Initiative for Metrology in Biology, Stanford, CA, USA
| | | | - Nina Alperovich
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Olga Vasilyeva
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - David Ross
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| |
|
40
|
Specificity of RNA Folding and Its Association with Evolutionarily Adaptive mRNA Secondary Structures. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:882-900. [PMID: 33607297 PMCID: PMC9403030 DOI: 10.1016/j.gpb.2019.11.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2018] [Revised: 08/03/2019] [Accepted: 11/08/2019] [Indexed: 11/23/2022]
Abstract
The secondary structure is a fundamental feature of both noncoding and messenger RNAs. However, our understanding of the secondary structure of mRNA, especially that of the coding regions, remains elusive, likely due to translation and the lack of RNA-binding proteins that sustain the consensus structure, such as those that bind to noncoding RNA. Indeed, mRNA has recently been found to adopt diverse alternative structures, the overall functional significance of which remains untested. We hereby approached this problem by estimating the folding specificity, i.e., the probability that a fragment of RNA folds back to the same partner once refolded. We showed that the folding specificity of mRNA is lower than that of noncoding RNA and exhibits moderate evolutionary conservation. Notably, we found that specific rather than alternative folding is likely evolutionarily adaptive since specific folding is frequently associated with functionally important genes or sites within a gene. Additional analysis in combination with ribosome density suggests the ability to modulate ribosome movement as one potential functional advantage provided by specific folding. Our findings revealed a novel facet of the RNA structurome with important functional and evolutionary implications and indicated a potential method for distinguishing the mRNA secondary structures maintained by natural selection from molecular noise.
|
41
|
Soo VWC, Swadling JB, Faure AJ, Warnecke T. Fitness landscape of a dynamic RNA structure. PLoS Genet 2021; 17:e1009353. [PMID: 33524037 PMCID: PMC7877785 DOI: 10.1371/journal.pgen.1009353] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/29/2020] [Revised: 02/11/2021] [Accepted: 01/12/2021] [Indexed: 11/24/2022] Open
Abstract
RNA structures are dynamic. As a consequence, mutational effects can be hard to rationalize with reference to a single static native structure. We reasoned that deep mutational scanning experiments, which couple molecular function to fitness, should capture mutational effects across multiple conformational states simultaneously. Here, we provide a proof-of-principle that this is indeed the case, using the self-splicing group I intron from Tetrahymena thermophila as a model system. We comprehensively mutagenized two 4-bp segments of the intron. These segments first come together to form the P1 extension (P1ex) helix at the 5' splice site. Following cleavage at the 5' splice site, the two halves of the helix dissociate to allow formation of an alternative helix (P10) at the 3' splice site. Using an in vivo reporter system that couples splicing activity to fitness in E. coli, we demonstrate that fitness is driven jointly by constraints on P1ex and P10 formation. We further show that patterns of epistasis can be used to infer the presence of intramolecular pleiotropy. Using a machine learning approach that allows quantification of mutational effects in a genotype-specific manner, we demonstrate that the fitness landscape can be deconvoluted to implicate P1ex or P10 as the effective genetic background in which molecular fitness is compromised or enhanced. Our results highlight deep mutational scanning as a tool to study alternative conformational states, with the capacity to provide critical insights into the structure, evolution and evolvability of RNAs as dynamic ensembles. Our findings also suggest that, in the future, deep mutational scanning approaches might help reverse-engineer multiple alternative or successive conformations from a single fitness landscape.
Affiliation(s)
- Valerie W. C. Soo
- Medical Research Council London Institute of Medical Sciences, London, United Kingdom
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Jacob B. Swadling
- Medical Research Council London Institute of Medical Sciences, London, United Kingdom
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Andre J. Faure
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Tobias Warnecke
- Medical Research Council London Institute of Medical Sciences, London, United Kingdom
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| |
|
42
|
Lyons DM, Zou Z, Xu H, Zhang J. Idiosyncratic epistasis creates universals in mutational effects and evolutionary trajectories. Nat Ecol Evol 2020; 4:1685-1693. [PMID: 32895516 PMCID: PMC7710555 DOI: 10.1038/s41559-020-01286-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 02/15/2020] [Accepted: 07/23/2020] [Indexed: 01/06/2023]
Abstract
Patterns of epistasis and shapes of fitness landscapes are of wide interest because of their bearings on a number of evolutionary theories. The common phenomena of slowing fitness increases during adaptations and diminishing returns from beneficial mutations are believed to reflect a concave fitness landscape and a preponderance of negative epistasis. Paradoxically, fitness decreases tend to decelerate and harm from deleterious mutations shrinks during the accumulation of random mutations, patterns thought to indicate a convex fitness landscape and a predominance of positive epistasis. Current theories cannot resolve this apparent contradiction. Here, we show that the phenotypic effect of a mutation varies substantially depending on the specific genetic background and that this idiosyncrasy in epistasis creates all of the above trends without requiring a biased distribution of epistasis. The idiosyncratic epistasis theory explains the universalities in mutational effects and evolutionary trajectories as emerging from randomness due to biological complexity.
Affiliation(s)
| | | | | | - Jianzhi Zhang
- Correspondence to Jianzhi Zhang, Department of Ecology and Evolutionary Biology, University of Michigan, 4018 Biological Sciences Building, 1105 North University Avenue, Ann Arbor, MI 48109, USA, Phone: 734-763-0527,
| |
|
43
|
DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol 2020; 21:207. [PMID: 32799905 PMCID: PMC7429474 DOI: 10.1186/s13059-020-02091-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/04/2019] [Accepted: 07/05/2020] [Indexed: 12/30/2022] Open
Abstract
Deep mutational scanning (DMS) enables multiplexed measurement of the effects of thousands of variants of proteins, RNAs, and regulatory elements. Here, we present a customizable pipeline, DiMSum, that represents an end-to-end solution for obtaining variant fitness and error estimates from raw sequencing data. A key innovation of DiMSum is the use of an interpretable error model that captures the main sources of variability arising in DMS workflows, outperforming previous methods. DiMSum is available as an R/Bioconda package and provides summary reports to help researchers diagnose common DMS pathologies and take remedial steps in their analyses.
|
44
|
Genotype networks of 80 quantitative Arabidopsis thaliana phenotypes reveal phenotypic evolvability despite pervasive epistasis. PLoS Comput Biol 2020; 16:e1008082. [PMID: 32790763 PMCID: PMC7447023 DOI: 10.1371/journal.pcbi.1008082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/19/2019] [Revised: 08/25/2020] [Accepted: 06/22/2020] [Indexed: 12/23/2022] Open
Abstract
We study the genotype-phenotype maps of 80 quantitative phenotypes in the model plant Arabidopsis thaliana, by representing the genotypes affecting each phenotype as a genotype network. In such a network, each vertex or node corresponds to an individual's genotype at all those genomic loci that affect a given phenotype. Two vertices are connected by an edge if the associated genotypes differ in exactly one nucleotide. The 80 genotype networks we analyze are based on data from genome-wide association studies of 199 A. thaliana accessions. They form connected graphs whose topography differs substantially among phenotypes. We focus our analysis on the incidence of epistasis (non-additive interactions among mutations) because a high incidence of epistasis can reduce the accessibility of evolutionary paths towards high or low phenotypic values. We find epistatic interactions in 67 phenotypes, and in 51 phenotypes every pairwise mutant interaction is epistatic. Moreover, we find phenotype-specific differences in the fraction of accessible mutational paths to maximum phenotypic values. However, even though epistasis affects the accessibility of maximum phenotypic values, the relationships between genotypic and phenotypic change of our analyzed phenotypes are sufficiently smooth that some evolutionary paths remain accessible for most phenotypes, even where epistasis is pervasive. The genotype network representation we use can complement existing approaches to understand the genetic architecture of polygenic traits in many different organisms.
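The genotype network construction described in this abstract (one vertex per genotype, an edge wherever two genotypes differ in exactly one nucleotide) can be illustrated with a minimal sketch. The function names and the toy sequences below are hypothetical and are not taken from the paper or its data; the pairwise comparison is a simple quadratic approach chosen for readability, not the authors' implementation.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def genotype_network(genotypes):
    """Adjacency sets: two genotypes are neighbors if they differ at exactly one site."""
    unique = sorted(set(genotypes))
    edges = {g: set() for g in unique}
    for a, b in combinations(unique, 2):
        if hamming(a, b) == 1:
            edges[a].add(b)
            edges[b].add(a)
    return edges


# Hypothetical toy genotypes (illustration only, not real accessions).
toy = ["AAC", "AAG", "ATG", "TTG", "CCC"]
net = genotype_network(toy)
for genotype, neighbors in net.items():
    print(genotype, sorted(neighbors))
```

In this toy set, `CCC` has no single-nucleotide neighbor and therefore forms its own component, which is the kind of disconnection that would limit mutational accessibility in a real network.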
|
45
|
Csörgő B, Nyerges A, Pál C. Targeted mutagenesis of multiple chromosomal regions in microbes. Curr Opin Microbiol 2020; 57:22-30. [PMID: 32599531 PMCID: PMC7613694 DOI: 10.1016/j.mib.2020.05.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/02/2020] [Revised: 05/18/2020] [Accepted: 05/20/2020] [Indexed: 12/20/2022]
Abstract
Directed evolution allows the effective engineering of proteins, biosynthetic pathways, and cellular functions. Traditional plasmid-based methods generally subject one or occasionally multiple genes-of-interest to mutagenesis, require time-consuming manual interventions, and the genes that are subjected to mutagenesis are outside of their native genomic context. Other methods mutagenize the whole genome unselectively which may distort the outcome. Recent recombineering- and CRISPR-based technologies radically change this field by allowing exceedingly high mutation rates at multiple, predefined loci in their native genomic context. In this review, we focus on recent technologies that potentially allow accelerated tunable mutagenesis at multiple genomic loci in the native genomic context of these target sequences. These technologies will be compared by four main criteria, including the scale of mutagenesis, portability to multiple microbial species, off-target mutagenesis, and cost-effectiveness. Finally, we discuss how these technical advances open new avenues in basic research and biotechnology.
Affiliation(s)
- Bálint Csörgő
- Department of Microbiology and Immunology, University of California, San Francisco, 94143, San Francisco, CA, USA; Genome Biology Unit, European Molecular Biology Laboratory, 69117, Heidelberg, Germany.
| | - Akos Nyerges
- Synthetic and Systems Biology Unit, Biological Research Centre, 6726, Szeged, Hungary; Department of Genetics, Harvard Medical School, 02115, Boston, MA, USA
| | - Csaba Pál
- Synthetic and Systems Biology Unit, Biological Research Centre, 6726, Szeged, Hungary.
| |
|
46
|
Kuo ST, Jahn RL, Cheng YJ, Chen YL, Lee YJ, Hollfelder F, Wen JD, Chou HHD. Global fitness landscapes of the Shine-Dalgarno sequence. Genome Res 2020; 30:711-723. [PMID: 32424071 PMCID: PMC7263185 DOI: 10.1101/gr.260182.119] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/13/2019] [Accepted: 04/21/2020] [Indexed: 01/06/2023]
Abstract
Shine-Dalgarno sequences (SD) in prokaryotic mRNA facilitate protein translation by pairing with rRNA in ribosomes. Although conventionally defined as AG-rich motifs, recent genomic surveys reveal great sequence diversity, questioning how SD functions. Here, we determined the molecular fitness (i.e., translation efficiency) of 49 synthetic 9-nt SD genotypes in three distinct mRNA contexts in Escherichia coli. We uncovered generic principles governing the SD fitness landscapes: (1) Guanine contents, rather than canonical SD motifs, best predict the fitness of both synthetic and endogenous SD; (2) the genotype-fitness correlation of SD promotes its evolvability by steadily supplying beneficial mutations across fitness landscapes; and (3) the frequency and magnitude of deleterious mutations increase with background fitness, and adjacent nucleotides in SD show stronger epistasis. Epistasis results from disruption of the continuous base pairing between SD and rRNA. This “chain-breaking” epistasis creates sinkholes in SD fitness landscapes and may profoundly impact the evolution and function of prokaryotic translation initiation and other RNA-mediated processes. Collectively, our work yields functional insights into the SD sequence variation in prokaryotic genomes, identifies a simple design principle to guide bioengineering and bioinformatic analysis of SD, and illuminates the fundamentals of fitness landscapes and molecular evolution.
Affiliation(s)
- Syue-Ting Kuo
- Department of Life Science, National Taiwan University, Taipei 10617, Taiwan
| | - Ruey-Lin Jahn
- Department of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan
| | - Yuan-Ju Cheng
- Department of Life Science, National Taiwan University, Taipei 10617, Taiwan
| | - Yi-Lan Chen
- Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei 10617, Taiwan
| | - Yun-Ju Lee
- Department of Life Science, National Taiwan University, Taipei 10617, Taiwan
| | - Florian Hollfelder
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
| | - Jin-Der Wen
- Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei 10617, Taiwan; Institute of Molecular and Cellular Biology, National Taiwan University, Taipei 10617, Taiwan
| | - Hsin-Hung David Chou
- Department of Life Science, National Taiwan University, Taipei 10617, Taiwan; Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei 10617, Taiwan
| |
|
47
|
Zhou J, McCandlish DM. Minimum epistasis interpolation for sequence-function relationships. Nat Commun 2020; 11:1782. [PMID: 32286265 PMCID: PMC7156698 DOI: 10.1038/s41467-020-15512-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/02/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open
Abstract
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
Affiliation(s)
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
| |
|
48
|
Zhang Z, Xiong P, Zhang T, Wang J, Zhan J, Zhou Y. Accurate inference of the full base-pairing structure of RNA by deep mutational scanning and covariation-induced deviation of activity. Nucleic Acids Res 2020; 48:1451-1465. [PMID: 31872260 PMCID: PMC7026644 DOI: 10.1093/nar/gkz1192] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/17/2019] [Revised: 12/10/2019] [Accepted: 12/11/2019] [Indexed: 11/12/2022] Open
Abstract
Despite the large number of noncoding RNAs in the human genome and their roles in many diseases, including cancer, we know very little about them due to a lack of structural clues. The centerpiece of the structural clues is the full RNA base-pairing structure of secondary and tertiary contacts, which can be precisely obtained only from costly and time-consuming 3D structure determination. Here, we performed deep mutational scanning of the self-cleaving CPEB3 ribozyme by error-prone PCR and showed that a library of <5 × 10^4 single-to-triple mutants is sufficient to infer 25 of 26 base pairs, including non-nested, nonhelical, and noncanonical base pairs, with both sensitivity and precision at 96%. Such accurate inference was further confirmed by a twister ribozyme at 100% precision with only noncanonical base pairs as false negatives. The performance resulted from analyzing covariation-induced deviation of activity by utilizing both functional and nonfunctional variants for unsupervised classification, followed by Monte Carlo (MC) simulated annealing with mutation-derived scores. Highly accurate inference can also be obtained by combining MC with evolution/direct coupling analysis, R-scape or epistasis analysis. The results highlight the usefulness of deep mutational scanning for high-accuracy structural inference of self-cleaving ribozymes with implications for other structured RNAs that permit high-throughput functional selections.
Affiliation(s)
- Zhe Zhang
- High Magnetic Field Laboratory, Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, Anhui, P. R. China
- University of Chinese Academy of Sciences, Beijing 101408, P. R. China
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
| | - Peng Xiong
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
| | - Tongchuan Zhang
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
| | - Junfeng Wang
- High Magnetic Field Laboratory, Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, Anhui, P. R. China
- Institute of Physical Science and Information Technology, Anhui University, Hefei 230031, Anhui, P. R. China
| | - Jian Zhan
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
| | - Yaoqi Zhou
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
| |
|
49
|
Esteban L, Lonishin LR, Bobrovskiy DM, Leleytner G, Bogatyreva NS, Kondrashov FA, Ivankov DN. HypercubeME: two hundred million combinatorially complete datasets from a single experiment. Bioinformatics 2019; 36:btz841. [PMID: 31742320 PMCID: PMC7703787 DOI: 10.1093/bioinformatics/btz841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/29/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. RESULTS: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. AVAILABILITY: https://github.com/ivankovlab/HypercubeME.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
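The four-genotype requirement and the notion of a combinatorially complete hypercube can be made concrete with a short sketch. This is not the recursive HypercubeME algorithm; it is a brute-force check for one chosen set of substitutions. All names and fitness values below are hypothetical, and the epistasis formula assumes fitness is on a scale where independent effects add (for example, log fitness).

```python
from itertools import product


def pairwise_epistasis(f_ref, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation."""
    return f_ab - f_a - f_b + f_ref


def hypercube_genotypes(reference, substitutions):
    """All 2^n genotypes spanned by n substitutions (dict: position -> mutant residue)."""
    positions = sorted(substitutions)
    cube = []
    for mask in product((False, True), repeat=len(positions)):
        seq = list(reference)
        for use, pos in zip(mask, positions):
            if use:
                seq[pos] = substitutions[pos]
        cube.append("".join(seq))
    return cube


def is_combinatorially_complete(reference, substitutions, measured):
    """True if every vertex of the hypercube has a measurement."""
    return all(g in measured for g in hypercube_genotypes(reference, substitutions))


# Hypothetical measurements (illustration only).
measured = {"AAA": 1.0, "CAA": 1.2, "AGA": 0.9, "CGA": 1.5}
subs = {0: "C", 1: "G"}

complete = is_combinatorially_complete("AAA", subs, measured)
eps = pairwise_epistasis(measured["AAA"], measured["CAA"],
                         measured["AGA"], measured["CGA"])
print(complete, round(eps, 3))
```

A nonzero value of `eps` indicates that the two substitutions do not combine additively in this toy example; for order n, the same completeness check covers all 2^n vertices.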
Affiliation(s)
| | - Lyubov R Lonishin
- Faculty of Medical Physics, Institute of Biomedical System and Technologies, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg 195251, Russia
| | - Daniil M Bobrovskiy
- Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow 119234, Russia
| | - Gregory Leleytner
- Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Moscow 141701, Russia
| | - Natalya S Bogatyreva
- Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
- Bioinformatics and Genomics Programme, Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Moscow 142290, Russia
| | | | - Dmitry N Ivankov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
| |
|
50
|
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 138] [Impact Index Per Article: 23.0] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB (https://www.mavedb.org), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
| | - Jochen Weile
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Frederick P Roth
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
| | - Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia.
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.
| |
|