101
|
Abstract
RNA viruses, such as hepatitis C virus (HCV), influenza virus, and SARS-CoV-2, are notorious for their ability to evolve rapidly under selection in novel environments. It is known that the high mutation rate of RNA viruses can generate huge genetic diversity to facilitate viral adaptation. However, less attention has been paid to the underlying fitness landscape that represents the selection forces on viral genomes, especially under different selection conditions. Here, we systematically quantified the distribution of fitness effects of about 1,600 single amino acid substitutions in the drug-targeted region of NS5A protein of HCV. We found that the majority of nonsynonymous substitutions incur large fitness costs, suggesting that NS5A protein is highly optimized. The replication fitness of viruses is correlated with the pattern of sequence conservation in nature, and viral evolution is constrained by the need to maintain protein stability. We characterized the adaptive potential of HCV by subjecting the mutant viruses to selection by the antiviral drug daclatasvir at multiple concentrations. Both the relative fitness values and the number of beneficial mutations were found to increase with the increasing concentrations of daclatasvir. The changes in the spectrum of beneficial mutations in NS5A protein can be explained by a pharmacodynamics model describing viral fitness as a function of drug concentration. Overall, our results show that the distribution of fitness effects of mutations is modulated by both the constraints on the biophysical properties of proteins (i.e., selection pressure for protein stability) and the level of environmental stress (i.e., selection pressure for drug resistance).
IMPORTANCE Many viruses adapt rapidly to novel selection pressures, such as antiviral drugs. Understanding how pathogens evolve under drug selection is critical for the success of antiviral therapy against human pathogens. By combining deep sequencing with selection experiments in cell culture, we have quantified the distribution of fitness effects of mutations in hepatitis C virus (HCV) NS5A protein. Our results indicate that the majority of single amino acid substitutions in NS5A protein incur large fitness costs. Simulation of protein stability suggests viral evolution is constrained by the need to maintain protein stability. By subjecting the mutant viruses to selection under an antiviral drug, we find that the adaptive potential of viral proteins in a novel environment is modulated by the level of environmental stress, which can be explained by a pharmacodynamics model. Our comprehensive characterization of the fitness landscapes of NS5A can potentially guide the design of effective strategies to limit viral evolution.
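The abstract does not give the form of its pharmacodynamics model. As a minimal sketch, assuming a generic Hill-type dose-response curve (a common choice, not necessarily the authors'), the following shows how a mutant that is costly without drug can become beneficial as drug concentration rises. The function names and all parameter values are hypothetical.

```python
def replication_rate(conc, r_max, ic50, hill=1.0):
    """Hill-type dose response: replication falls as drug concentration rises."""
    return r_max / (1.0 + (conc / ic50) ** hill)


def relative_fitness(conc, mutant, wild_type):
    """Fitness of a mutant relative to wild type at one drug concentration."""
    return replication_rate(conc, *mutant) / replication_rate(conc, *wild_type)


# Hypothetical (r_max, ic50) pairs; these are NOT values from the paper.
WILD_TYPE = (1.0, 1.0)
RESISTANT = (0.5, 50.0)  # replicates worse without drug, but is far less sensitive

for conc in (0.0, 1.0, 10.0, 100.0):
    print(conc, round(relative_fitness(conc, RESISTANT, WILD_TYPE), 2))
```

Under these assumed parameters the mutant's relative fitness is below 1 without drug and rises above 1 at higher concentrations, which is the qualitative pattern the abstract reports.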
|
102
|
Bhasin M, Varadarajan R. Prediction of Function Determining and Buried Residues Through Analysis of Saturation Mutagenesis Datasets. Front Mol Biosci 2021; 8:635425. [PMID: 33778004 PMCID: PMC7991590 DOI: 10.3389/fmolb.2021.635425] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/30/2020] [Accepted: 01/25/2021] [Indexed: 11/13/2022] Open
Abstract
Mutational scanning can be used to probe effects of large numbers of point mutations on protein function. Positions affected by mutation are primarily at either buried or at exposed residues directly involved in function, hereafter designated as active-site residues. In the absence of prior structural information, it has not been easy to distinguish between these two categories of residues. We curated and analyzed a set of twelve published deep mutational scanning datasets. The analysis revealed differential patterns of mutational sensitivity and substitution preferences at buried and exposed positions. Prediction of buried sites solely from the mutational sensitivity data was facilitated by incorporating predicted sequence-based accessibility values. For active-site residues we observed mean sensitivity, specificity and accuracy of 61, 90 and 88% respectively. For buried residues the corresponding figures were 59, 90 and 84% while for exposed non active-site residues these were 98, 44 and 82% respectively. We also identified positions which did not follow these general trends and might require further experimental re-validation. This analysis highlights the ability of deep mutational scans to provide important structural and functional insights, even in the absence of three-dimensional structures determined using conventional structure determination techniques, and also discusses some limitations of the methodology.
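For readers less familiar with the three figures quoted per residue class, the following sketch shows how sensitivity, specificity and accuracy are conventionally derived from a confusion matrix. The function name and the example labels are hypothetical and are not data from the paper.

```python
def classification_metrics(predicted, actual):
    """Return (sensitivity, specificity, accuracy) for boolean label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    sensitivity = tp / (tp + fn)  # fraction of real positives that were found
    specificity = tn / (tn + fp)  # fraction of real negatives correctly rejected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical per-residue calls, e.g. "is this position buried?"
predicted = [True, True, False, False, False]
actual = [True, False, True, False, False]
print(classification_metrics(predicted, actual))
```

A class with few true members (such as active-site residues) can show high accuracy alongside modest sensitivity, which is consistent with the pattern of figures reported above.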
Affiliation(s)
- Munmun Bhasin
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Raghavan Varadarajan
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
  - Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
|
103
|
Nedrud D, Coyote-Maestas W, Schmidt D. A large-scale survey of pairwise epistasis reveals a mechanism for evolutionary expansion and specialization of PDZ domains. Proteins 2021; 89:899-914. [PMID: 33620761 DOI: 10.1002/prot.26067] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/15/2020] [Revised: 02/02/2021] [Accepted: 02/18/2021] [Indexed: 12/21/2022]
Abstract
Deep mutational scanning (DMS) facilitates data-driven models of protein structure and function. Here, we adapted Saturated Programmable Insertion Engineering (SPINE) as a programmable DMS technique. We validate SPINE with a reference single mutant dataset in the PSD95 PDZ3 domain and then characterize most pairwise double mutants to study epistasis. We observe widespread proximal negative epistasis, which we attribute to mutations affecting thermodynamic stability, and strong long-range positive epistasis, which is enriched in an evolutionarily conserved and function-defining network of "sector" and clade-specifying residues. Conditional neutrality of mutations in clade-specifying residues compensates for deleterious mutations in sector positions. This suggests that epistatic interactions between these position pairs facilitated the evolutionary expansion and specialization of PDZ domains. We propose that SPINE provides easy experimental access to reveal epistasis signatures in proteins that will improve our understanding of the structural basis for protein function and adaptation.
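The abstract does not state how epistasis was scored. One widely used convention, assumed here purely for illustration, measures the deviation of the double mutant's log-fitness from the sum of the single-mutant log-fitness effects. The function names and fitness values below are hypothetical.

```python
import math


def pairwise_epistasis(w_a, w_b, w_ab, w_wt=1.0):
    """Observed minus expected log-fitness of the double mutant.

    The expectation assumes the two single-mutant effects combine
    additively on a log scale (multiplicatively on a linear scale).
    """
    expected = math.log(w_a / w_wt) + math.log(w_b / w_wt)
    observed = math.log(w_ab / w_wt)
    return observed - expected


def classify(eps, tol=1e-9):
    """Label the sign of an epistasis score."""
    if eps > tol:
        return "positive"
    if eps < -tol:
        return "negative"
    return "none"


# Hypothetical fitness values relative to wild type (not from the paper).
print(classify(pairwise_epistasis(0.8, 0.5, 0.2)))  # double mutant below 0.8 * 0.5
print(classify(pairwise_epistasis(0.8, 0.5, 0.6)))  # double mutant above 0.8 * 0.5
```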
Affiliation(s)
- David Nedrud
  - Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Willow Coyote-Maestas
  - Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Daniel Schmidt
  - Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, Minnesota, USA
|
104
|
Rauscher R, Bampi GB, Guevara-Ferrer M, Santos LA, Joshi D, Mark D, Strug LJ, Rommens JM, Ballmann M, Sorscher EJ, Oliver KE, Ignatova Z. Positive epistasis between disease-causing missense mutations and silent polymorphism with effect on mRNA translation velocity. Proc Natl Acad Sci U S A 2021; 118:e2010612118. [PMID: 33468668 PMCID: PMC7848603 DOI: 10.1073/pnas.2010612118] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 01/27/2023] Open
Abstract
Epistasis refers to the dependence of a mutation on other mutation(s) and the genetic context in general. In the context of human disorders, epistasis complicates the spectrum of disease symptoms and has been proposed as a major contributor to variations in disease outcome. The nonadditive relationship between mutations and the lack of complete understanding of the underlying physiological effects limit our ability to predict phenotypic outcome. Here, we report positive epistasis between intragenic mutations in the cystic fibrosis transmembrane conductance regulator (CFTR), the gene responsible for cystic fibrosis (CF) pathology. We identified a synonymous single-nucleotide polymorphism (sSNP) that is invariant for the CFTR amino acid sequence but inverts translation speed at the affected codon. This sSNP in cis exhibits positive epistatic effects on some CF disease-causing missense mutations. Individually, both mutations alter CFTR structure and function, yet when combined, they lead to enhanced protein expression and activity. The most robust effect was observed when the sSNP was present in combination with missense mutations that, along with the primary amino acid change, also alter the speed of translation at the affected codon. Functional studies revealed that synergistic alteration in ribosomal velocity is the underlying mechanism; alteration of translation speed likely increases the time window for establishing crucial domain-domain interactions that are otherwise perturbed by each individual mutation.
Affiliation(s)
- Robert Rauscher
  - Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
- Giovana B Bampi
  - Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
- Marta Guevara-Ferrer
  - Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
- Leonardo A Santos
  - Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
- Disha Joshi
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322
  - Children's Healthcare of Atlanta, Atlanta, GA 30322
- David Mark
  - Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
- Lisa J Strug
  - Program in Genetics & Genome Biology, The Hospital for Sick Children, Toronto M5G 0A4, Canada
  - Department of Statistical Sciences, Computer Science and Division of Biostatistics, University of Toronto, Toronto M5G 0A4, Canada
- Johanna M Rommens
  - Program in Genetics & Genome Biology, The Hospital for Sick Children, Toronto M5G 0A4, Canada
- Eric J Sorscher
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322
  - Children's Healthcare of Atlanta, Atlanta, GA 30322
- Kathryn E Oliver
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322
  - Children's Healthcare of Atlanta, Atlanta, GA 30322
- Zoya Ignatova
  - Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
|
105
|
Neverov AD, Popova AV, Fedonin GG, Cheremukhin EA, Klink GV, Bazykin GA. Episodic evolution of coadapted sets of amino acid sites in mitochondrial proteins. PLoS Genet 2021; 17:e1008711. [PMID: 33493156 PMCID: PMC7861529 DOI: 10.1371/journal.pgen.1008711] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/03/2020] [Revised: 02/04/2021] [Accepted: 12/07/2020] [Indexed: 11/19/2022] Open
Abstract
The rate of evolution differs between protein sites and changes with time. However, the link between these two phenomena remains poorly understood. Here, we design a phylogenetic approach for distinguishing pairs of amino acid sites that evolve concordantly, i.e., such that substitutions at one site trigger subsequent substitutions at the other; and also pairs of sites that evolve discordantly, so that substitutions at one site impede subsequent substitutions at the other. We distinguish groups of amino acid sites that undergo coordinated evolution and evolve discordantly from other such groups. In mitochondrion-encoded proteins of metazoans and fungi, we show that concordantly evolving sites are clustered in protein structures. By analysing the phylogenetic patterns of substitutions at concordantly and discordantly evolving site pairs, we find that concordant evolution has two distinct causes: epistatic interactions between amino acid substitutions and episodes of selection independently affecting substitutions at different sites. The rate of substitutions at concordantly evolving groups of protein sites changes in the course of evolution, indicating episodes of selection limited to some of the lineages. The phylogenetic positions of these changes are consistent between proteins, suggesting common selective forces underlying them.
The mode and rate of evolution of a protein site depends on the effect of its mutations on protein fitness. The fitness effect of a mutation itself can change in the course of evolution for at least two reasons. First, it can be modulated by substitutions occurring at other sites, a phenomenon called epistasis. Second, changes in selection can be non-epistatic, affecting sites independently of one another. Here, we analyse substitutions accumulated by the evolving lineages of the five proteins encoded by the mitochondrial genomes of thousands of species of metazoans and fungi. We show that substitutions at different amino acid sites occur in a coordinated fashion, and this coordination is caused both by epistasis and by episodes of selection affecting groups of sites. We partition each protein into several groups of concordantly evolving sites such that evolution of sites from different groups is discordant, and show that the proteins encoded by the mitochondrial genome consist of coevolving structural blocks. Some of these blocks have a clear functional specialization, e.g. are associated with interfaces between proteins composing respiratory complexes. Together, our results reveal a previously unrecognized complexity in the causes of variation in evolutionary rates between protein sites.
Affiliation(s)
- Alexey D. Neverov
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Anfisa V. Popova
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Gennady G. Fedonin
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow region, Russia
- Galya V. Klink
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Georgii A. Bazykin
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
|
106
|
Strokach A, Lu TY, Kim PM. ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations. J Mol Biol 2021; 433:166810. [PMID: 33450251 DOI: 10.1016/j.jmb.2021.166810] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/26/2020] [Revised: 12/19/2020] [Accepted: 01/03/2021] [Indexed: 12/21/2022]
Abstract
The ELASPIC web server allows users to evaluate the effect of mutations on protein folding and protein-protein interaction on a proteome-wide scale. It uses homology models of proteins and protein-protein interactions, which have been precalculated for several proteomes, and machine learning models, which integrate structural information with sequence conservation scores, in order to make its predictions. Since the original publication of the ELASPIC web server, several advances have motivated a revisiting of the problem of mutation effect prediction. First, progress in neural network architectures and self-supervised pre-training has resulted in models which provide more informative embeddings of protein sequence and structure than those used by the original version of ELASPIC. Second, the amount of training data has increased several-fold, largely driven by advances in deep mutation scanning and other multiplexed assays of variant effect. Here, we describe two machine learning models which leverage the recent advances in order to achieve superior accuracy in predicting the effect of mutation on protein folding and protein-protein interaction. The models incorporate features generated using pre-trained transformer- and graph convolution-based neural networks, and are trained to optimize a ranking objective function, which permits the use of heterogeneous training data. The outputs from the new models have been incorporated into the ELASPIC web server, available at http://elaspic.kimlab.org.
Affiliation(s)
- Alexey Strokach
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
- Tian Yu Lu
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
- Philip M Kim
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
|
107
|
Munro D, Singh M. DeMaSk: a deep mutational scanning substitution matrix and its use for variant impact prediction. Bioinformatics 2020; 36:5322-5329. [PMID: 33325500 PMCID: PMC8016454 DOI: 10.1093/bioinformatics/btaa1030] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/21/2020] [Revised: 10/16/2020] [Accepted: 11/30/2020] [Indexed: 01/27/2023] Open
Abstract
Motivation: Accurately predicting the quantitative impact of a substitution on a protein’s molecular function would be a great aid in understanding the effects of observed genetic variants across populations. While this remains a challenging task, new approaches can leverage data from the increasing numbers of comprehensive deep mutational scanning (DMS) studies that systematically mutate proteins and measure fitness. Results: We introduce DeMaSk, an intuitive and interpretable method based only upon DMS datasets and sequence homologs that predicts the impact of missense mutations within any protein. DeMaSk first infers a directional amino acid substitution matrix from DMS datasets and then fits a linear model that combines these substitution scores with measures of per-position evolutionary conservation and variant frequency across homologs. Despite its simplicity, DeMaSk has state-of-the-art performance in predicting the impact of amino acid substitutions, and can easily and rapidly be applied to any protein sequence. Availability and implementation: https://demask.princeton.edu generates fitness impact predictions and visualizations for any user-submitted protein sequence. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Munro
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544, USA
- Mona Singh
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544, USA
  - Department of Computer Science, Princeton University, Princeton, 08544, USA
|
108
|
Lyons DM, Zou Z, Xu H, Zhang J. Idiosyncratic epistasis creates universals in mutational effects and evolutionary trajectories. Nat Ecol Evol 2020; 4:1685-1693. [PMID: 32895516 PMCID: PMC7710555 DOI: 10.1038/s41559-020-01286-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 02/15/2020] [Accepted: 07/23/2020] [Indexed: 01/06/2023]
Abstract
Patterns of epistasis and shapes of fitness landscapes are of wide interest because of their bearings on a number of evolutionary theories. The common phenomena of slowing fitness increases during adaptations and diminishing returns from beneficial mutations are believed to reflect a concave fitness landscape and a preponderance of negative epistasis. Paradoxically, fitness decreases tend to decelerate and harm from deleterious mutations shrinks during the accumulation of random mutations, patterns thought to indicate a convex fitness landscape and a predominance of positive epistasis. Current theories cannot resolve this apparent contradiction. Here, we show that the phenotypic effect of a mutation varies substantially depending on the specific genetic background and that this idiosyncrasy in epistasis creates all of the above trends without requiring a biased distribution of epistasis. The idiosyncratic epistasis theory explains the universalities in mutational effects and evolutionary trajectories as emerging from randomness due to biological complexity.
Affiliation(s)
- Jianzhi Zhang
  - Correspondence to Jianzhi Zhang, Department of Ecology and Evolutionary Biology, University of Michigan, 4018 Biological Sciences Building, 1105 North University Avenue, Ann Arbor, MI 48109, USA, Phone: 734-763-0527
|
109
|
Zurek PJ, Knyphausen P, Neufeld K, Pushpanath A, Hollfelder F. UMI-linked consensus sequencing enables phylogenetic analysis of directed evolution. Nat Commun 2020; 11:6023. [PMID: 33243970 PMCID: PMC7691348 DOI: 10.1038/s41467-020-19687-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/22/2020] [Accepted: 10/12/2020] [Indexed: 11/09/2022] Open
Abstract
The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein's amino acids ('intra-gene epistasis'). Our limited understanding of such epistasis hinders the correct prediction of the functional contributions and adaptive potential of mutations. Here we present a straightforward unique molecular identifier (UMI)-linked consensus sequencing workflow (UMIC-seq) that simplifies mapping of evolutionary trajectories based on full-length sequences. Attaching UMIs to gene variants allows accurate consensus generation for closely related genes with nanopore sequencing. We exemplify the utility of this approach by reconstructing the artificial phylogeny emerging in three rounds of directed evolution of an amine dehydrogenase biocatalyst via ultrahigh throughput droplet screening. Uniquely, we are able to identify lineages and their founding variant, as well as non-additive interactions between mutations within a full gene showing sign epistasis. Access to deep and accurate long reads will facilitate prediction of key beneficial mutations and adaptive potential based on in silico analysis of large sequence datasets.
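The core idea of UMI-linked consensus generation can be illustrated with a simplified sketch: reads sharing a UMI are grouped, and a per-position majority vote suppresses random sequencing errors. This is not the UMIC-seq implementation. It assumes exact UMI matches and pre-aligned reads of equal length, whereas real nanopore data would need UMI clustering and alignment to handle indels. The function name and example reads are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Build one consensus sequence per UMI by per-position majority vote.

    reads: iterable of (umi, sequence) pairs. Sequences sharing a UMI are
    assumed to be aligned and of equal length (a simplifying assumption).
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # zip(*seqs) yields one column of bases per position.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Hypothetical reads: the second read for UMI "AAT" carries one error.
reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGA"),
    ("AAT", "ACGT"),
    ("GGC", "TTGA"),
    ("GGC", "TTGA"),
]
print(umi_consensus(reads))
```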
Affiliation(s)
- Paul Jannis Zurek
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
  - Johnson Matthey Plc, Cambridge, CB4 0WE, UK
- Philipp Knyphausen
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Katharina Neufeld
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
  - Johnson Matthey Plc, Cambridge, CB4 0WE, UK
- Florian Hollfelder
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
|
110
|
Alejaldre L, Lemay-St-Denis C, Perez Lopez C, Sancho Jodar F, Guallar V, Pelletier JN. Known Evolutionary Paths Are Accessible to Engineered β-Lactamases Having Altered Protein Motions at the Timescale of Catalytic Turnover. Front Mol Biosci 2020; 7:599298. [PMID: 33330628 PMCID: PMC7716773 DOI: 10.3389/fmolb.2020.599298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2020] [Accepted: 10/23/2020] [Indexed: 11/26/2022] Open
Abstract
The evolution of new protein functions is dependent upon inherent biophysical features of proteins. Whereas it has been shown that changes in protein dynamics can occur in the course of directed molecular evolution trajectories and contribute to new function, it is not known whether varying protein dynamics modify the course of evolution. We investigate this question using three related β-lactamases displaying dynamics that differ broadly at the slow timescale that corresponds to catalytic turnover yet have similar fast dynamics, thermal stability, catalytic, and substrate recognition profiles. Introduction of substitutions E104K and G238S, which are known to have a synergistic effect on function in the parent β-lactamase, showed similar increases in catalytic efficiency toward cefotaxime in the related β-lactamases. Molecular simulations using Protein Energy Landscape Exploration reveal that this results from stabilizing the catalytically productive conformations, demonstrating the dominance of the synergistic effect of the E104K and G238S substitutions in vitro in contexts that vary in terms of sequence and dynamics. Furthermore, three rounds of directed molecular evolution demonstrated that known cefotaximase-enhancing mutations were accessible regardless of the differences in dynamics. Interestingly, specific sequence differences between the related β-lactamases were shown to have a higher effect in evolutionary outcomes than did differences in dynamics. Overall, these β-lactamase models show tolerance to protein dynamics at the timescale of catalytic turnover in the evolution of a new function.
Affiliation(s)
- Lorea Alejaldre
  - Biochemistry Department, Université de Montréal, Montréal, QC, Canada
  - PROTEO, The Québec Network for Research on Protein, Function, Engineering and Applications, Quebec City, QC, Canada
  - CGCC, Center in Green Chemistry and Catalysis, Montréal, QC, Canada
- Claudèle Lemay-St-Denis
  - Biochemistry Department, Université de Montréal, Montréal, QC, Canada
  - PROTEO, The Québec Network for Research on Protein, Function, Engineering and Applications, Quebec City, QC, Canada
  - CGCC, Center in Green Chemistry and Catalysis, Montréal, QC, Canada
- Victor Guallar
  - Barcelona Supercomputing Center, Barcelona, Spain
  - ICREA: Institució Catalana de Recerca i Estudis Avancats, Barcelona, Spain
- Joelle N. Pelletier
  - Biochemistry Department, Université de Montréal, Montréal, QC, Canada
  - PROTEO, The Québec Network for Research on Protein, Function, Engineering and Applications, Quebec City, QC, Canada
  - CGCC, Center in Green Chemistry and Catalysis, Montréal, QC, Canada
  - Chemistry Department, Université de Montréal, Montréal, QC, Canada
|
111
|
Song H, Bremer BJ, Hinds EC, Raskutti G, Romero PA. Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning. Cell Syst 2020; 12:92-101.e8. [PMID: 33212013 DOI: 10.1016/j.cels.2020.10.007] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 11/02/2019] [Revised: 08/13/2020] [Accepted: 10/22/2020] [Indexed: 10/22/2022]
Abstract
Machine learning can infer how protein sequence maps to function without requiring a detailed understanding of the underlying physical or biological mechanisms. It is challenging to apply existing supervised learning frameworks to large-scale experimental data generated by deep mutational scanning (DMS) and related methods. DMS data often contain high-dimensional and correlated sequence variables, experimental sampling error and bias, and the presence of missing data. Notably, most DMS data do not contain examples of negative sequences, making it challenging to directly estimate how sequence affects function. Here, we develop a positive-unlabeled (PU) learning framework to infer sequence-function relationships from large-scale DMS data. Our PU learning method displays excellent predictive performance across ten large-scale sequence-function datasets, representing proteins of different folds, functions, and library types. The estimated parameters pinpoint key residues that dictate protein structure and function. Finally, we apply our statistical sequence-function model to design highly stabilized enzymes.
Affiliation(s)
- Hyebin Song
  - Department of Statistics, The Pennsylvania State University, State College, PA 16802, USA; Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Bennett J Bremer
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Emily C Hinds
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Garvesh Raskutti
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Philip A Romero
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
|
112
|
High-Throughput Protein Engineering by Massively Parallel Combinatorial Mutagenesis. Methods Mol Biol 2020. [PMID: 33125641 DOI: 10.1007/978-1-0716-0892-0_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Exploring how combinatorial mutations can be combined to optimize protein functions is important to guide protein engineering. Given the vast combinatorial space of changing multiple amino acids, identifying the top-performing variants from a large number of mutants might not be possible without a high-throughput gene assembly and screening strategy. Here we describe the CombiSEAL platform, a strategy that allows for modularization of any protein sequence into multiple segments for mutagenesis and barcoding, and seamless single-pot ligations of different segments to generate a library of combination mutants linked with concatenated barcodes at one end. By reading the barcodes using next-generation sequencing, activities of each protein variant during the protein selection process can be easily tracked in a high-throughput manner. CombiSEAL not only allows the identification of better protein variants but also enables the systematic analyses to distinguish the beneficial, deleterious, and neutral effects of combining different mutations on protein functions.
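As a rough illustration of how barcode reads can track variants through selection, the sketch below counts concatenated barcodes before and after selection and reports a log2 enrichment score per variant. The scoring formula, the pseudocount, the barcode naming and the read lists are all assumptions made for illustration; the abstract does not specify how CombiSEAL scores activity.

```python
import math
from collections import Counter


def log2_enrichment(pre_reads, post_reads, pseudocount=0.5):
    """Log2 change in barcode frequency between two sequencing samples.

    A pseudocount keeps variants that vanish after selection from
    producing log(0). Only barcodes seen before selection are scored.
    """
    pre, post = Counter(pre_reads), Counter(post_reads)
    n_pre, n_post = sum(pre.values()), sum(post.values())
    scores = {}
    for barcode in pre:
        f_pre = (pre[barcode] + pseudocount) / n_pre
        f_post = (post[barcode] + pseudocount) / n_post
        scores[barcode] = math.log2(f_post / f_pre)
    return scores


# Hypothetical concatenated barcodes: one segment-A barcode joined to
# one of two segment-B barcodes.
pre_reads = ["A1-B1"] * 4 + ["A1-B2"] * 4
post_reads = ["A1-B1"] * 6 + ["A1-B2"] * 2
scores = log2_enrichment(pre_reads, post_reads)
print(scores)
```

Positive scores mark combinations that were enriched by selection and negative scores mark depleted ones, which corresponds to the beneficial and deleterious classes mentioned in the abstract.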
|
113
|
Lite TLV, Grant RA, Nocedal I, Littlehale ML, Guo MS, Laub MT. Uncovering the basis of protein-protein interaction specificity with a combinatorially complete library. eLife 2020; 9:e60924. [PMID: 33107822 PMCID: PMC7669267 DOI: 10.7554/elife.60924] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 07/10/2020] [Accepted: 10/26/2020] [Indexed: 12/27/2022] Open
Abstract
Protein-protein interaction specificity is often encoded at the primary sequence level. However, the contributions of individual residues to specificity are usually poorly understood and often obscured by mutational robustness, sequence degeneracy, and epistasis. Using bacterial toxin-antitoxin systems as a model, we screened a combinatorially complete library of antitoxin variants at three key positions against two toxins. This library enabled us to measure the effect of individual substitutions on specificity in hundreds of genetic backgrounds. These distributions allow inferences about the general nature of interface residues in promoting specificity. We find that positive and negative contributions to specificity are neither inherently coupled nor mutually exclusive. Further, a wild-type antitoxin appears optimized for specificity as no substitutions improve discrimination between cognate and non-cognate partners. By comparing crystal structures of paralogous complexes, we provide a rationale for our observations. Collectively, this work provides a generalizable approach to understanding the logic of molecular recognition.
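To make "combinatorially complete" and "hundreds of genetic backgrounds" concrete, the sketch below enumerates every variant at three positions, assuming all 20 standard amino acids are allowed at each one (an assumption, since the abstract does not state the alphabet). Each single-position residue then appears in 20 × 20 = 400 backgrounds. The names used are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Every three-residue combination: 20 ** 3 variants.
library = ["".join(variant) for variant in product(AMINO_ACIDS, repeat=3)]


def backgrounds(position, residue):
    """All library variants that carry `residue` at `position` (0-based)."""
    return [variant for variant in library if variant[position] == residue]


print(len(library))
print(len(backgrounds(0, "A")))
```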
Affiliation(s)
- Thuy-Lan V Lite
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
- Robert A Grant
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
- Isabel Nocedal
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
- Megan L Littlehale
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
- Monica S Guo
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, United States

114
Vihinen M. Functional effects of protein variants. Biochimie 2020; 180:104-120. [PMID: 33164889 DOI: 10.1016/j.biochi.2020.10.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 01/30/2020] [Revised: 10/15/2020] [Accepted: 10/19/2020] [Indexed: 12/11/2022]
Abstract
Genetic and other variations frequently affect protein functions. Scientific articles can contain confusing descriptions about which function or property is affected, and in many cases the statements are pure speculation without any experimental evidence. To clarify functional effects of protein variations of genetic or non-genetic origin, a systematic conceptualisation and framework are introduced. This framework describes protein functional effects on abundance, activity, specificity and affinity, along with countermeasures, which allow cells, tissues and organisms to tolerate, avoid, repair, attenuate or resist (TARAR) the effects. Effects on abundance discussed include gene dosage, restricted expression, mis-localisation and degradation. Enzymopathies, effects on kinetics, allostery and regulation of protein activity are subtopics for the effects of variants on activity. Variation outcomes on specificity and affinity comprise promiscuity, specificity, affinity and moonlighting. TARAR mechanisms redress variations with active and passive processes including chaperones, redundancy, robustness, canalisation and metabolic and signalling rewiring. A framework for pragmatic protein function analysis and presentation is introduced. All of the mechanisms and effects are described along with representative examples, most often in relation to diseases. In addition, protein function is discussed from evolutionary point of view. Application of the presented framework facilitates unambiguous, detailed and specific description of functional effects and their systematic study.
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184, Lund, Sweden

115
Li X, Lehner B. Biophysical ambiguities prevent accurate genetic prediction. Nat Commun 2020; 11:4923. [PMID: 33004824 PMCID: PMC7529754 DOI: 10.1038/s41467-020-18694-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/22/2020] [Accepted: 09/04/2020] [Indexed: 12/27/2022] Open
Abstract
A goal of biology is to predict how mutations combine to alter phenotypes, fitness and disease. It is often assumed that mutations combine additively or with interactions that can be predicted. Here, we show using simulations that, even for the simple example of the lambda phage transcription factor CI repressing a gene, this assumption is incorrect and that perfect measurements of the effects of mutations on a trait and mechanistic understanding can be insufficient to predict what happens when two mutations are combined. This apparent paradox arises because mutations can have different biophysical effects to cause the same change in a phenotype and the outcome in a double mutant depends upon what these hidden biophysical changes actually are. Pleiotropy and non-monotonic functions further confound prediction of how mutations interact. Accurate prediction of phenotypes and disease will sometimes not be possible unless these biophysical ambiguities can be resolved using additional measurements.
Affiliation(s)
- Xianghua Li
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain

116
Zhang TH, Dai L, Barton JP, Du Y, Tan Y, Pang W, Chakraborty AK, Lloyd-Smith JO, Sun R. Predominance of positive epistasis among drug resistance-associated mutations in HIV-1 protease. PLoS Genet 2020; 16:e1009009. [PMID: 33085662 PMCID: PMC7605711 DOI: 10.1371/journal.pgen.1009009] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/13/2019] [Revised: 11/02/2020] [Accepted: 07/24/2020] [Indexed: 12/12/2022] Open
Abstract
Drug-resistant mutations often have deleterious impacts on replication fitness, posing a fitness cost that can only be overcome by compensatory mutations. However, the role of fitness cost in the evolution of drug resistance has often been overlooked in clinical studies or in vitro selection experiments, as these observations only capture the outcome of drug selection. In this study, we systematically profile the fitness landscape of resistance-associated sites in HIV-1 protease using deep mutational scanning. We construct a mutant library covering combinations of mutations at 11 sites in HIV-1 protease, all of which are associated with resistance to protease inhibitors in clinic. Using deep sequencing, we quantify the fitness of thousands of HIV-1 protease mutants after multiple cycles of replication in human T cells. Although the majority of resistance-associated mutations have deleterious effects on viral replication, we find that epistasis among resistance-associated mutations is predominantly positive. Furthermore, our fitness data are consistent with genetic interactions inferred directly from HIV sequence data of patients. Fitness valleys formed by strong positive epistasis reduce the likelihood of reversal of drug resistance mutations. Overall, our results support the view that strong compensatory effects are involved in the emergence of clinically observed resistance mutations and provide insights to understanding fitness barriers in the evolution and reversion of drug resistance.
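As an illustration of how pairwise epistasis is commonly quantified from fitness measurements of this kind, a minimal sketch follows. It assumes relative fitness values on a log scale and defines epistasis as the deviation of the double mutant from the additive expectation of the two single mutants. The function name and the numbers are hypothetical and are not taken from the study.

```python
import math

def pairwise_epistasis(w_a, w_b, w_ab, w_wt=1.0):
    """Return log-scale epistasis for a pair of mutations.

    w_a, w_b : fitness of the two single mutants (hypothetical inputs)
    w_ab     : fitness of the double mutant
    w_wt     : fitness of the wild type (defaults to 1.0)
    A positive value means the double mutant is fitter than expected
    from the single mutants alone (positive epistasis).
    """
    f_a = math.log(w_a / w_wt)
    f_b = math.log(w_b / w_wt)
    f_ab = math.log(w_ab / w_wt)
    return f_ab - (f_a + f_b)

# Two costly single mutations whose combination is no worse than either alone
eps_positive = pairwise_epistasis(0.5, 0.5, 0.5)

# A double mutant that matches the multiplicative expectation (0.5 * 0.4 = 0.2)
eps_none = pairwise_epistasis(0.5, 0.4, 0.2)
```

Under this definition the first example gives a positive value (a compensatory interaction), while the second gives approximately zero.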
Affiliation(s)
- Tian-hao Zhang
  - Molecular Biology Institute, University of California, Los Angeles, CA 90095, USA
- Lei Dai
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- John P. Barton
  - Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
- Yushen Du
  - School of Medicine, ZheJiang University, Hangzhou, 210000, China
  - Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA
- Yuxiang Tan
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Wenwen Pang
  - Department of Public Health Laboratory Science, West China School of Public Health, Sichuan University, Chengdu 610041, China
- Arup K. Chakraborty
  - Institute for Medical Engineering and Science, Departments of Chemical Engineering, Physics, & Chemistry, Massachusetts Institute of Technology, MA 21309, USA
  - Ragon Institute of MGH, MIT, & Harvard, Cambridge, MA 21309, USA
- James O. Lloyd-Smith
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA
- Ren Sun
  - Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA

117
Abstract
Living systems evolve one mutation at a time, but a single mutation can alter the effect of subsequent mutations. The underlying mechanistic determinants of such epistasis are unclear. Here, we demonstrate that the physical dynamics of a biological system can generically constrain epistasis. We analyze models and experimental data on proteins and regulatory networks. In each, we find that if the long-time physical dynamics is dominated by a slow, collective mode, then the dimensionality of mutational effects is reduced. Consequently, epistatic coefficients for different combinations of mutations are no longer independent, even if individually strong. Such epistasis can be summarized as resulting from a global nonlinearity applied to an underlying linear trait, that is, as global epistasis. This constraint, in turn, reduces the ruggedness of the sequence-to-function map. By providing a generic mechanistic origin for experimentally observed global epistasis, our work suggests that slow collective physical modes can make biological systems evolvable.
Affiliation(s)
- Kabir Husain
  - Department of Physics, University of Chicago, Chicago, IL
- Arvind Murugan
  - Department of Physics, University of Chicago, Chicago, IL

118
Jakobson CM, Jarosz DF. What Has a Century of Quantitative Genetics Taught Us About Nature's Genetic Tool Kit? Annu Rev Genet 2020; 54:439-464. [PMID: 32897739 DOI: 10.1146/annurev-genet-021920-102037] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
The complexity of heredity has been appreciated for decades: Many traits are controlled not by a single genetic locus but instead by polymorphisms throughout the genome. The importance of complex traits in biology and medicine has motivated diverse approaches to understanding their detailed genetic bases. Here, we focus on recent systematic studies, many in budding yeast, which have revealed that large numbers of all kinds of molecular variation, from noncoding to synonymous variants, can make significant contributions to phenotype. Variants can affect different traits in opposing directions, and their contributions can be modified by both the environment and the epigenetic state of the cell. The integration of prospective (synthesizing and analyzing variants) and retrospective (examining standing variation) approaches promises to reveal how natural selection shapes quantitative traits. Only by comprehensively understanding nature's genetic tool kit can we predict how phenotypes arise from the complex ensembles of genetic variants in living organisms.
Affiliation(s)
- Christopher M Jakobson
  - Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California 94305, USA
- Daniel F Jarosz
  - Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Developmental Biology, Stanford University School of Medicine, Stanford, California 94305, USA

119
Procko E. Deep mutagenesis in the study of COVID-19: a technical overview for the proteomics community. Expert Rev Proteomics 2020; 17:633-638. [PMID: 33084449 PMCID: PMC7594187 DOI: 10.1080/14789450.2020.1833721] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/26/2020] [Accepted: 10/05/2020] [Indexed: 12/14/2022]
Abstract
INTRODUCTION The spike (S) of SARS coronavirus 2 (SARS-CoV-2) engages angiotensin-converting enzyme 2 (ACE2) on a host cell to trigger viral-cell membrane fusion and infection. The extracellular region of ACE2 can be administered as a soluble decoy to compete for binding sites on the receptor-binding domain (RBD) of S, but it has only moderate affinity and efficacy. The RBD, which is targeted by neutralizing antibodies, may also change and adapt through mutation as SARS-CoV-2 becomes endemic, posing challenges for therapeutic and vaccine development. AREAS COVERED Deep mutagenesis is a Big Data approach to characterizing sequence variants. A deep mutational scan of ACE2 expressed on human cells identified mutations that increase S affinity and guided the engineering of a potent and broad soluble receptor decoy. A deep mutational scan of the RBD displayed on the surface of yeast has revealed residues tolerant of mutational changes that may act as a source for drug resistance and antigenic drift. EXPERT OPINION Deep mutagenesis requires a selection of diverse sequence variants; an in vitro evolution experiment that is tracked with next-generation sequencing. The choice of expression system, diversity of the variant library and selection strategy have important consequences for data quality and interpretation.
Affiliation(s)
- Erik Procko
  - Department of Biochemistry and Cancer Center at Illinois, University of Illinois, Urbana, IL, USA

120
DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol 2020; 21:207. [PMID: 32799905 PMCID: PMC7429474 DOI: 10.1186/s13059-020-02091-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/04/2019] [Accepted: 07/05/2020] [Indexed: 12/30/2022] Open
Abstract
Deep mutational scanning (DMS) enables multiplexed measurement of the effects of thousands of variants of proteins, RNAs, and regulatory elements. Here, we present a customizable pipeline, DiMSum, that represents an end-to-end solution for obtaining variant fitness and error estimates from raw sequencing data. A key innovation of DiMSum is the use of an interpretable error model that captures the main sources of variability arising in DMS workflows, outperforming previous methods. DiMSum is available as an R/Bioconda package and provides summary reports to help researchers diagnose common DMS pathologies and take remedial steps in their analyses.
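To make the idea of "variant fitness and error estimates from raw sequencing data" concrete, here is a minimal sketch of a generic count-based calculation. It is not DiMSum's error model: it assumes a single input and a single output sample, defines fitness as the log enrichment of a variant relative to wild type, and uses only a Poisson approximation for the uncertainty of each count. The function name and the counts are hypothetical.

```python
import math

def fitness_from_counts(in_var, out_var, in_wt, out_wt):
    """Log-enrichment fitness of a variant relative to wild type.

    in_var, out_var : variant read counts before and after selection
    in_wt, out_wt   : wild-type read counts before and after selection
    Returns (fitness, sigma), where sigma is a Poisson-based standard
    error that ignores all other sources of experimental variability.
    """
    fitness = math.log(out_var / in_var) - math.log(out_wt / in_wt)
    # For a Poisson count n, the variance of log(n) is approximately 1/n
    sigma = math.sqrt(1 / in_var + 1 / out_var + 1 / in_wt + 1 / out_wt)
    return fitness, sigma

# Hypothetical counts: the variant halves in frequency, wild type is unchanged
fit, err = fitness_from_counts(in_var=100, out_var=50, in_wt=1000, out_wt=1000)
```

In this sketch, low-count variants automatically receive larger error estimates. Pipelines such as the one described above account for further sources of variability between replicates that a pure counting model misses.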

121
Yang G, Miton CM, Tokuriki N. A mechanistic view of enzyme evolution. Protein Sci 2020; 29:1724-1747. [PMID: 32557882 PMCID: PMC7380680 DOI: 10.1002/pro.3901] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 05/05/2020] [Revised: 06/14/2020] [Accepted: 06/16/2020] [Indexed: 12/15/2022]
Abstract
New enzyme functions often evolve through the recruitment and optimization of latent promiscuous activities. How do mutations alter the molecular architecture of enzymes to enhance their activities? Can we infer general mechanisms that are common to most enzymes, or does each enzyme require a unique optimization process? The ability to predict the location and type of mutations necessary to enhance an enzyme's activity is critical to protein engineering and rational design. In this review, via the detailed examination of recent studies that have shed new light on the molecular changes underlying the optimization of enzyme function, we provide a mechanistic perspective of enzyme evolution. We first present a global survey of the prevalence of activity-enhancing mutations and their distribution within protein structures. We then delve into the molecular solutions that mediate functional optimization, specifically highlighting several common mechanisms that have been observed across multiple examples. As distinct protein sequences encounter different evolutionary bottlenecks, different mechanisms are likely to emerge along evolutionary trajectories toward improved function. Identifying the specific mechanism(s) that need to be improved upon, and tailoring our engineering efforts to each sequence, may considerably improve our chances to succeed in generating highly efficient catalysts in the future.
Affiliation(s)
- Gloria Yang
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Charlotte M. Miton
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Nobuhiko Tokuriki
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada

122
Rogers JM. Peptide Folding and Binding Probed by Systematic Non-canonical Mutagenesis. Front Mol Biosci 2020; 7:100. [PMID: 32671094 PMCID: PMC7326784 DOI: 10.3389/fmolb.2020.00100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/25/2020] [Accepted: 05/04/2020] [Indexed: 12/20/2022] Open
Abstract
Many proteins and peptides fold upon binding another protein. Mutagenesis has proved an essential tool in the study of these multi-step molecular recognition processes. By comparing the biophysical behavior of carefully selected mutants, the concert of interactions and conformational changes that occur during folding and binding can be separated and assessed. Recently, this mutagenesis approach has been radically expanded by deep mutational scanning methods, which allow for many thousands of mutations to be examined in parallel. Furthermore, these high-throughput mutagenesis methods have been expanded to include mutations to non-canonical amino acids, returning peptide structure-activity relationships with unprecedented depth and detail. These developments are timely, as the insights they provide can guide the optimization of de novo cyclic peptides, a promising new modality for chemical probes and therapeutic agents.
Affiliation(s)
- Joseph M Rogers
  - Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark

123
Wu NC, Thompson AJ, Lee JM, Su W, Arlian BM, Xie J, Lerner RA, Yen HL, Bloom JD, Wilson IA. Different genetic barriers for resistance to HA stem antibodies in influenza H3 and H1 viruses. Science 2020; 368:1335-1340. [PMID: 32554590 PMCID: PMC7412937 DOI: 10.1126/science.aaz5143] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 09/16/2019] [Accepted: 04/14/2020] [Indexed: 12/19/2022]
Abstract
The discovery and characterization of broadly neutralizing human antibodies (bnAbs) to the highly conserved stem region of influenza hemagglutinin (HA) have contributed to considerations of a universal influenza vaccine. However, the potential for resistance to stem bnAbs also needs to be more thoroughly evaluated. Using deep mutational scanning, with a focus on epitope residues, we found that the genetic barrier to resistance to stem bnAbs is low for the H3 subtype but substantially higher for the H1 subtype owing to structural differences in the HA stem. Several strong resistance mutations in H3 can be observed in naturally circulating strains and do not reduce in vitro viral fitness and in vivo pathogenicity. This study highlights a potential challenge for development of a truly universal influenza vaccine.
Affiliation(s)
- Nicholas C Wu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Andrew J Thompson
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA
- Juhye M Lee
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Medical Scientist Training Program, University of Washington, Seattle, WA 98195, USA
- Wen Su
  - School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Britni M Arlian
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA
- Jia Xie
  - Department of Chemistry, The Scripps Research Institute, La Jolla, CA 92037, USA
- Richard A Lerner
  - Department of Chemistry, The Scripps Research Institute, La Jolla, CA 92037, USA
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Hui-Ling Yen
  - School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Jesse D Bloom
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Ian A Wilson
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA 92037, USA

124
Allostery and Epistasis: Emergent Properties of Anisotropic Networks. Entropy 2020; 22:e22060667. [PMID: 33286439 PMCID: PMC7517209 DOI: 10.3390/e22060667] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/24/2020] [Revised: 06/02/2020] [Accepted: 06/08/2020] [Indexed: 11/17/2022]
Abstract
Understanding the underlying mechanisms behind protein allostery and non-additivity of substitution outcomes (i.e., epistasis) is critical when attempting to predict the functional impact of mutations, particularly at non-conserved sites. In an effort to model these two biological properties, we extend the framework of our metric to calculate dynamic coupling between residues, the Dynamic Coupling Index (DCI) to two new metrics: (i) EpiScore, which quantifies the difference between the residue fluctuation response of a functional site when two other positions are perturbed with random Brownian kicks simultaneously versus individually to capture the degree of cooperativity of these two other positions in modulating the dynamics of the functional site and (ii) DCIasym, which measures the degree of asymmetry between the residue fluctuation response of two sites when one or the other is perturbed with a random force. Applied to four independent systems, we successfully show that EpiScore and DCIasym can capture important biophysical properties in dual mutant substitution outcomes. We propose that allosteric regulation and the mechanisms underlying non-additive amino acid substitution outcomes (i.e., epistasis) can be understood as emergent properties of an anisotropic network of interactions where the inclusion of the full network of interactions is critical for accurate modeling. Consequently, mutations which drive towards a new function may require a fine balance between functional site asymmetry and strength of dynamic coupling with the functional sites. These two tools will provide mechanistic insight into both understanding and predicting the outcome of dual mutations.

125
Du Y, Hultquist JF, Zhou Q, Olson A, Tseng Y, Zhang TH, Hong M, Tang K, Chen L, Meng X, McGregor MJ, Dai L, Gong D, Martin-Sancho L, Chanda S, Li X, Bensenger S, Krogan NJ, Sun R. mRNA display with library of even-distribution reveals cellular interactors of influenza virus NS1. Nat Commun 2020; 11:2449. [PMID: 32415096 PMCID: PMC7229031 DOI: 10.1038/s41467-020-16140-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/19/2019] [Accepted: 04/13/2020] [Indexed: 12/19/2022] Open
Abstract
A comprehensive examination of protein-protein interactions (PPIs) is fundamental for the understanding of cellular machineries. However, limitations in current methodologies often prevent the detection of PPIs with low abundance proteins. To overcome this challenge, we develop a mRNA display with library of even-distribution (md-LED) method that facilitates the detection of low abundance binders with high specificity and sensitivity. As a proof-of-principle, we apply md-LED to IAV NS1 protein. Complementary to AP-MS, md-LED enables us to validate previously described PPIs as well as to identify novel NS1 interactors. We show that interacting with FASN allows NS1 to directly regulate the synthesis of cellular fatty acids. We also use md-LED to identify a mutant of NS1, D92Y, results in a loss of interaction with CPSF1. The use of high-throughput sequencing as the readout for md-LED enables sensitive quantification of interactions, ultimately enabling massively parallel experimentation for the investigation of PPIs.
Affiliation(s)
- Yushen Du
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
  - Cancer Institute, ZJU-UCLA Joint Center for Medical Education and Research, School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Judd F Hultquist
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, 94158, USA
  - California Institute for Quantitative Biosciences, QB3, University of California, San Francisco, San Francisco, CA, 94158, USA
  - J. David Gladstone Institutes, San Francisco, CA, 94158, USA
  - Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Quan Zhou
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
- Anders Olson
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
- Yenwen Tseng
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
- Tian-Hao Zhang
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
  - Molecular Biology Institute, University of California, Los Angeles, CA, 90095, USA
- Mengying Hong
  - Cancer Institute, ZJU-UCLA Joint Center for Medical Education and Research, School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Kejun Tang
  - Cancer Institute, ZJU-UCLA Joint Center for Medical Education and Research, School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Liubo Chen
  - Cancer Institute, ZJU-UCLA Joint Center for Medical Education and Research, School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Xiangzhi Meng
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
- Michael J McGregor
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, 94158, USA
  - California Institute for Quantitative Biosciences, QB3, University of California, San Francisco, San Francisco, CA, 94158, USA
  - J. David Gladstone Institutes, San Francisco, CA, 94158, USA
- Lei Dai
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
- Danyang Gong
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
- Laura Martin-Sancho
  - Sanford Burnham Prebys Medical Discovery Institute, 10901 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Sumit Chanda
  - Sanford Burnham Prebys Medical Discovery Institute, 10901 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Xinming Li
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, L, Los Angeles, CA, 90095, USA
- Steve Bensenger
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
  - Molecular Biology Institute, University of California, Los Angeles, CA, 90095, USA
- Nevan J Krogan
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, 94158, USA
  - California Institute for Quantitative Biosciences, QB3, University of California, San Francisco, San Francisco, CA, 94158, USA
  - J. David Gladstone Institutes, San Francisco, CA, 94158, USA
- Ren Sun
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, 90095, USA
  - Molecular Biology Institute, University of California, Los Angeles, CA, 90095, USA

126
Nagano M, Suga H. Expansion of Modality: Peptides to Pseudo-Natural Macrocyclic Peptides. J SYN ORG CHEM JPN 2020. [DOI: 10.5059/yukigoseikyokaishi.78.516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Hiroaki Suga
  - Department of Chemistry, Graduate School of Science, The University of Tokyo

127
Lesk AM. Not Enough Natural Data? Sequence and Ye Shall Find. Front Mol Biosci 2020; 7:65. [PMID: 32373628 PMCID: PMC7186298 DOI: 10.3389/fmolb.2020.00065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2020] [Accepted: 03/25/2020] [Indexed: 11/28/2022] Open
Affiliation(s)
- Arthur M Lesk
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States

128
Zhou J, McCandlish DM. Minimum epistasis interpolation for sequence-function relationships. Nat Commun 2020; 11:1782. [PMID: 32286265 PMCID: PMC7156698 DOI: 10.1038/s41467-020-15512-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/02/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open
Abstract
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
Affiliation(s)
- Juannan Zhou
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
|
129
|
Fantini M, Lisi S, De Los Rios P, Cattaneo A, Pastore A. Protein Structural Information and Evolutionary Landscape by In Vitro Evolution. Mol Biol Evol 2020; 37:1179-1192. [PMID: 31670785 PMCID: PMC7086169 DOI: 10.1093/molbev/msz256]
Abstract
Protein structure is tightly intertwined with function according to the laws of evolution. Understanding how structure determines function has been the aim of structural biology for decades. Here, we asked instead whether it is possible to exploit the function for which a protein was evolutionarily selected to gain information on protein structure and on the landscape explored during the early stages of molecular and natural evolution. To answer this question, we developed a new methodology, which we named CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), that is able to obtain the in vitro evolution of a protein from an artificial selection based on function. With CAMELS, we were able to observe many features of the TEM-1 beta-lactamase local fold exclusively by generating and sequencing large libraries of mutational variants. We demonstrated that we can, whenever a functional phenotypic selection of a protein is available, sketch the structural and evolutionary landscape of a protein without utilizing purified proteins, collecting physical measurements, or relying on the pool of natural protein variants.
Affiliation(s)
- Marco Fantini
  - BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
- Simonetta Lisi
  - BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
- Paolo De Los Rios
  - Institute of Physics, School of Basic Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Antonino Cattaneo
  - BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
  - European Brain Research Institute, Rome, Italy
- Annalisa Pastore
  - Department of Clinical and Basic Neuroscience, Maurice Wohl Institute, King's College London, London, United Kingdom
  - Dementia Research Institute, King's College London, London, United Kingdom
|
130
|
Blanco C, Verbanic S, Seelig B, Chen IA. High throughput sequencing of in vitro selections of mRNA-displayed peptides: data analysis and applications. Phys Chem Chem Phys 2020; 22:6492-6506. [PMID: 31967131 PMCID: PMC8219182 DOI: 10.1039/c9cp05912a]
Abstract
In vitro selection using mRNA display is currently a widely used method to isolate functional peptides with desired properties. The analysis of high throughput sequencing (HTS) data from in vitro evolution experiments has proven to be a powerful technique but only recently has it been applied to mRNA display selections. In this Perspective, we introduce aspects of mRNA display and HTS that may be of interest to physical chemists. We highlight the potential of HTS to analyze in vitro selections of peptides and review recent advances in the application of HTS analysis to mRNA display experiments. We discuss some possible issues involved with HTS analysis and summarize some strategies to alleviate them. Finally, the potential for future impact of advancing HTS analysis on mRNA display experiments is discussed.
Affiliation(s)
- Celia Blanco
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA 93106, USA
|
131
|
Bravi B, Ravasio R, Brito C, Wyart M. Direct coupling analysis of epistasis in allosteric materials. PLoS Comput Biol 2020; 16:e1007630. [PMID: 32119660 PMCID: PMC7067494 DOI: 10.1371/journal.pcbi.1007630] [Received: 01/15/2019] [Revised: 03/12/2020] [Accepted: 01/03/2020]
Abstract
In allosteric proteins, the binding of a ligand modifies function at a distant active site. Such allosteric pathways can be used as targets for drug design, generating considerable interest in inferring them from sequence alignment data. Currently, different methods lead to conflicting results, in particular on the existence of long-range evolutionary couplings between distant amino acids mediating allostery. Here we propose a resolution of this conundrum, by studying epistasis and its inference in models where an allosteric material is evolved in silico to perform a mechanical task. We find in our model the four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range and have a simple mechanical interpretation. We perform a Direct Coupling Analysis (DCA) and find that DCA predicts well the cost of point mutations but is a rather poor generative model. Strikingly, it can predict short-range epistasis but fails to capture long-range epistasis, consistent with empirical findings. We propose that such failure is generic when function requires subparts to work in concert. We illustrate this idea with a simple model, which suggests that other methods may be better suited to capture long-range effects.
Allostery in proteins is the property of highly specific responses to ligand binding at a distant site. To inform protocols of de novo drug design, it is fundamental to understand the impact of mutations on allosteric regulation and whether it can be predicted from evolutionary correlations. In this work we consider allosteric architectures artificially evolved to optimize the cooperativity of binding at the allosteric and active sites. We first characterize the emergent pattern of epistasis as well as the underlying mechanical phenomena, finding the four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range. The numerical evolution of these allosteric architectures allows us to benchmark Direct Coupling Analysis, a method which relies on co-evolution in sequence data to infer direct evolutionary couplings, in connection to allostery. We show that Direct Coupling Analysis quantitatively predicts point mutation costs but underestimates strong long-range epistasis. We provide an argument, based on a simplified model, illustrating the reasons for this discrepancy. Our analysis suggests neural networks as a more promising tool to measure epistasis.
Affiliation(s)
- Barbara Bravi
  - Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Riccardo Ravasio
  - Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Carolina Brito
  - Instituto de Física, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Matthieu Wyart
  - Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|
132
|
Penn WD, McKee AG, Kuntz CP, Woods H, Nash V, Gruenhagen TC, Roushar FJ, Chandak M, Hemmerich C, Rusch DB, Meiler J, Schlebach JP. Probing biophysical sequence constraints within the transmembrane domains of rhodopsin by deep mutational scanning. Sci Adv 2020; 6:eaay7505. [PMID: 32181350 PMCID: PMC7056298 DOI: 10.1126/sciadv.aay7505] [Received: 07/15/2019] [Accepted: 12/11/2019]
Abstract
Membrane proteins must balance the sequence constraints associated with folding and function against the hydrophobicity required for solvation within the bilayer. We recently found the expression and maturation of rhodopsin are limited by the hydrophobicity of its seventh transmembrane domain (TM7), which contains polar residues that are essential for function. On the basis of these observations, we hypothesized that rhodopsin's expression should be less tolerant of mutations in TM7 relative to those within hydrophobic TM domains. To test this hypothesis, we used deep mutational scanning to compare the effects of 808 missense mutations on the plasma membrane expression of rhodopsin in HEK293T cells. Our results confirm that a higher proportion of mutations within TM7 (37%) decrease rhodopsin's plasma membrane expression relative to those within a hydrophobic TM domain (TM2, 25%). These results in conjunction with an evolutionary analysis suggest solvation energetics likely restricts the evolutionary sequence space of polar TM domains.
Affiliation(s)
- Wesley D. Penn
  - Department of Chemistry, Indiana University, Bloomington, IN 47405, USA
- Andrew G. McKee
  - Department of Chemistry, Indiana University, Bloomington, IN 47405, USA
- Charles P. Kuntz
  - Department of Chemistry, Indiana University, Bloomington, IN 47405, USA
- Hope Woods
  - Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
  - Chemical and Physical Biology Program, Vanderbilt University, Nashville, TN 37235, USA
- Veronica Nash
  - Department of Chemistry, Indiana University, Bloomington, IN 47405, USA
- Mahesh Chandak
  - Department of Chemistry, Indiana University, Bloomington, IN 47405, USA
- Chris Hemmerich
  - Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
- Douglas B. Rusch
  - Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- Jonathan P. Schlebach
  - Department of Chemistry, Indiana University, Bloomington, IN 47405, USA
  - Corresponding author
|
133
|
Affiliation(s)
- Melissa Chiasson
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
  - Genetic Networks Program, CIFAR, Toronto, Ontario, Canada
|
134
|
Miton CM, Chen JZ, Ost K, Anderson DW, Tokuriki N. Statistical analysis of mutational epistasis to reveal intramolecular interaction networks in proteins. Methods Enzymol 2020; 643:243-280. [DOI: 10.1016/bs.mie.2020.07.012]
|
135
|
Atkinson JT, Jones AM, Nanda V, Silberg JJ. Protein tolerance to random circular permutation correlates with thermostability and local energetics of residue-residue contacts. Protein Eng Des Sel 2019; 32:489-501. [PMID: 32626892 PMCID: PMC7462040 DOI: 10.1093/protein/gzaa012] [Received: 02/18/2020] [Revised: 04/13/2020] [Accepted: 04/15/2020]
Abstract
Adenylate kinase (AK) orthologs with a range of thermostabilities were subjected to random circular permutation, and deep mutational scanning was used to evaluate where new protein termini were nondisruptive to activity. The fraction of circularly permuted variants that retained function in each library correlated with AK thermostability. In addition, analysis of the positional tolerance to new termini, which increase local conformational flexibility, showed that bonds were either functionally sensitive to cleavage across all homologs, differentially sensitive, or uniformly tolerant. The mobile AMP-binding domain, which displays the highest calculated contact energies, presented the greatest tolerance to new termini across all AKs. In contrast, retention of function in the lid and core domains was more dependent upon AK melting temperature. These results show that family permutation profiling identifies primary structure that has been selected by evolution for dynamics that are critical to activity within an enzyme family. These findings also illustrate how deep mutational scanning can be applied to protein homologs in parallel to differentiate how topology, stability, and local energetics govern mutational tolerance.
Affiliation(s)
- Joshua T Atkinson
  - Systems, Synthetic, and Physical Biology Graduate Program, Rice University, 6100 Main Street, MS-180, Houston, TX 77005, USA
  - Department of BioSciences, Rice University, 6100 Main Street, MS-140, Houston, TX 77005, USA
- Alicia M Jones
  - Biochemistry and Cell Biology Graduate Program, Rice University, 6100 Main Street, MS-140, Houston, TX 77005, USA
- Vikas Nanda
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jonathan J Silberg
  - Department of BioSciences, Rice University, 6100 Main Street, MS-140, Houston, TX 77005, USA
  - Department of Bioengineering, Rice University, 6100 Main Street, MS-142, Houston, TX 77005, USA
  - Department of Chemical and Biomolecular Engineering, Rice University, 6100 Main Street, MS-362, Houston, TX 77005, USA
|
136
|
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Received: 03/31/2019] [Accepted: 10/01/2019]
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB ( https://www.mavedb.org ), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Jochen Weile
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Lea M Starita
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Anthony T Papenfuss
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  - Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
  - Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Frederick P Roth
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Alan F Rubin
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  - Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
|
137
|
Kemble H, Nghe P, Tenaillon O. Recent insights into the genotype-phenotype relationship from massively parallel genetic assays. Evol Appl 2019; 12:1721-1742. [PMID: 31548853 PMCID: PMC6752143 DOI: 10.1111/eva.12846] [Received: 03/27/2019] [Revised: 06/21/2019] [Accepted: 07/02/2019]
Abstract
With the molecular revolution in biology, a mechanistic understanding of the genotype-phenotype relationship became possible. Recently, advances in DNA synthesis and sequencing have enabled the development of deep mutational scanning assays, capable of scoring comprehensive libraries of genotypes for fitness and a variety of phenotypes in massively parallel fashion. The resulting empirical genotype-fitness maps pave the way to predictive models, potentially accelerating our ability to anticipate the behaviour of pathogen and cancerous cell populations from sequencing data. Besides cellular fitness, phenotypes of direct application in industry (e.g. enzyme activity) and medicine (e.g. antibody binding) can be quantified and even selected directly by these assays. This review discusses the technological basis of and recent developments in massively parallel genetics, along with the trends it is uncovering in the genotype-phenotype relationship (distribution of mutation effects, epistasis), their possible mechanistic bases and future directions for advancing towards the goal of predictive genetics.
Affiliation(s)
- Harry Kemble
  - Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137, Université Paris Diderot, Université Paris Nord, Paris, France
  - École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS-ESPCI CBI 8231, PSL Research University, Paris Cedex 05, France
- Philippe Nghe
  - École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS-ESPCI CBI 8231, PSL Research University, Paris Cedex 05, France
- Olivier Tenaillon
  - Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137, Université Paris Diderot, Université Paris Nord, Paris, France
|
138
|
Wang S, Dai L. Evolving generalists in switching rugged landscapes. PLoS Comput Biol 2019; 15:e1007320. [PMID: 31574088 PMCID: PMC6771975 DOI: 10.1371/journal.pcbi.1007320] [Received: 01/17/2019] [Accepted: 08/02/2019]
Abstract
Evolving systems, be it an antibody repertoire in the face of mutating pathogens or a microbial population exposed to varied antibiotics, constantly search for adaptive solutions in time-varying fitness landscapes. Generalists refer to genotypes that remain fit across diverse selective pressures; while multi-drug resistant microbes are undesired yet prevalent, broadly-neutralizing antibodies are much wanted but rare. However, little is known about the conditions under which such generalists with a high capacity to adapt can be efficiently discovered by evolution. In addition, can epistasis (the source of landscape ruggedness and path constraints) play a different role if the environment varies in a non-random way? We present a generative model to estimate the propensity of evolving generalists in rugged landscapes that are tunably related and alternating relatively slowly. We find that environmental cycling can substantially facilitate the search for fit generalists by dynamically enlarging their effective basins of attraction. Importantly, these high performers are most likely to emerge at intermediate levels of ruggedness and environmental relatedness. Our approach allows one to estimate correlations across environments from the topography of experimental fitness landscapes. Our work provides a conceptual framework to study evolution in time-correlated complex environments, and offers statistical understanding that suggests general strategies for eliciting broadly neutralizing antibodies or preventing microbes from evolving multi-drug resistance.
Affiliation(s)
- Shenshen Wang
  - Department of Physics and Astronomy, University of California, Los Angeles, Los Angeles, California, United States of America
- Lei Dai
  - Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
|
139
|
Li X, Lalić J, Baeza-Centurion P, Dhar R, Lehner B. Changes in gene expression predictably shift and switch genetic interactions. Nat Commun 2019; 10:3886. [PMID: 31467279 PMCID: PMC6715729 DOI: 10.1038/s41467-019-11735-3] [Received: 04/10/2019] [Accepted: 07/29/2019]
Abstract
Non-additive interactions between mutations occur extensively and also change across conditions, making genetic prediction a difficult challenge. To better understand the plasticity of genetic interactions (epistasis), we combine mutations in a single protein performing a single function (a transcriptional repressor inhibiting a target gene). Even in this minimal system, genetic interactions switch from positive (suppressive) to negative (enhancing) as the expression of the gene changes. These seemingly complicated changes can be predicted using a mathematical model that propagates the effects of mutations on protein folding to the cellular phenotype. More generally, changes in gene expression should be expected to alter the effects of mutations and how they interact whenever the relationship between expression and a phenotype is nonlinear, which is the case for most genes. These results have important implications for understanding genotype-phenotype maps and illustrate how changes in genetic interactions can often—but not always—be predicted by hierarchical mechanistic models. Non-additive genetic interactions are plastic and can complicate genetic prediction. Here, using deep mutagenesis of the lambda repressor, Li et al. reveal that changes in gene expression can alter the strength and direction of genetic interactions between mutations in many genes and develop mathematical models for predicting them.
Affiliation(s)
- Xianghua Li
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Jasna Lalić
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Pablo Baeza-Centurion
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Riddhiman Dhar
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain
|
140
|
Ferrada E. Gene Families, Epistasis and the Amino Acid Preferences of Protein Homologs. Evol Bioinform Online 2019; 15:1176934319870485. [PMID: 31452598 PMCID: PMC6698995 DOI: 10.1177/1176934319870485] [Received: 07/19/2019] [Accepted: 07/27/2019]
Abstract
In order to preserve structure and function, proteins tend to preferentially conserve amino acids at particular sites along the sequence. Because mutations can affect structure and function, the question arises whether the preference of a protein site for a particular amino acid varies between protein homologs, and to what extent that variation depends on sequence divergence. Answering these questions can help in the development of models of sequence evolution, as well as provide insights on the dependence of the fitness effects of mutations on the genetic background of sequences, a phenomenon known as epistasis. Here, I comment on recent computational work providing a systematic analysis of the extent to which the amino acid preferences of proteins depend on the background mutations of protein homologs.
Affiliation(s)
- Evandro Ferrada
  - Center for Genomics and Bioinformatics, Faculty of Science, Universidad Mayor, Santiago, Chile
|
141
|
Atkinson JT, Jones AM, Zhou Q, Silberg JJ. Circular permutation profiling by deep sequencing libraries created using transposon mutagenesis. Nucleic Acids Res 2019; 46:e76. [PMID: 29912470 PMCID: PMC6061844 DOI: 10.1093/nar/gky255] [Received: 12/04/2017] [Accepted: 03/28/2018]
Abstract
Deep mutational scanning has been used to create high-resolution DNA sequence maps that illustrate the functional consequences of large numbers of point mutations. However, this approach has not yet been applied to libraries of genes created by random circular permutation, an engineering strategy that is used to create open reading frames that express proteins with altered contact order. We describe a new method, termed circular permutation profiling with DNA sequencing (CPP-seq), which combines a one-step transposon mutagenesis protocol for creating libraries with a functional selection, deep sequencing and computational analysis to obtain unbiased insight into a protein's tolerance to circular permutation. Application of this method to an adenylate kinase revealed that CPP-seq creates two types of vectors encoding each circularly permuted gene, which differ in their ability to express proteins. Functional selection of this library revealed that >65% of the sampled vectors that express proteins are enriched relative to those that cannot translate proteins. Mapping enriched sequences onto structure revealed that the mobile AMP binding and rigid core domains display greater tolerance to backbone fragmentation than the mobile lid domain, illustrating how CPP-seq can be used to relate a protein's biophysical characteristics to the retention of activity upon permutation.
Affiliation(s)
- Joshua T Atkinson
  - Systems, Synthetic, and Physical Biology Graduate Program, Rice University, 6100 Main MS-180, Houston, TX 77005, USA
- Alicia M Jones
  - Department of BioSciences, Rice University, MS-140, 6100 Main Street, Houston, TX 77005, USA
- Quan Zhou
  - Department of Statistics, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Jonathan J Silberg
  - Department of BioSciences, Rice University, MS-140, 6100 Main Street, Houston, TX 77005, USA
  - Department of Bioengineering, Rice University, 6100 Main Street, Houston, TX 77005, USA
|
142
|
Choi GCG, Zhou P, Yuen CTL, Chan BKC, Xu F, Bao S, Chu HY, Thean D, Tan K, Wong KH, Zheng Z, Wong ASL. Combinatorial mutagenesis en masse optimizes the genome editing activities of SpCas9. Nat Methods 2019; 16:722-730. [PMID: 31308554 DOI: 10.1038/s41592-019-0473-0] [Received: 07/27/2018] [Accepted: 06/03/2019]
Abstract
The combined effect of multiple mutations on protein function is hard to predict; thus, the ability to functionally assess a vast number of protein sequence variants would be practically useful for protein engineering. Here we present a high-throughput platform that enables scalable assembly and parallel characterization of barcoded protein variants with combinatorial modifications. We demonstrate this platform, which we name CombiSEAL, by systematically characterizing a library of 948 combination mutants of the widely used Streptococcus pyogenes Cas9 (SpCas9) nuclease to optimize its genome-editing activity in human cells. The ease with which the editing activities of the pool of SpCas9 variants can be assessed at multiple on- and off-target sites accelerates the identification of optimized variants and facilitates the study of mutational epistasis. We successfully identify Opti-SpCas9, which possesses enhanced editing specificity without sacrificing potency and broad targeting range. This platform is broadly applicable for engineering proteins through combinatorial modifications en masse.
Collapse
Affiliation(s)
- Gigi C G Choi
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, China
- Peng Zhou
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, China
- Chaya T L Yuen
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, China
- Becky K C Chan
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, China
- Feng Xu
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, China
- Siyu Bao
  - Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, China
- Hoi Yee Chu
  - Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, China
- Dawn Thean
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, China
- Kaeling Tan
  - Faculty of Health Sciences, University of Macau, Macau, China
  - Genomics, Bioinformatics and Single Cell Analysis Core, Faculty of Health Sciences, University of Macau, Macau, China
- Koon Ho Wong
  - Faculty of Health Sciences, University of Macau, Macau, China
  - Institute of Translational Medicine, University of Macau, Macau, China
- Zongli Zheng
  - Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet, Hong Kong, China
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
  - Biotechnology and Health Centre, City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
- Alan S L Wong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong, China
  - Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
143
Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc Natl Acad Sci U S A 2019; 116:16367-16377. [PMID: 31371509 DOI: 10.1073/pnas.1903888116] [Citation(s) in RCA: 113] [Impact Index Per Article: 18.8] [Indexed: 12/21/2022]
Abstract
The accurate prediction of protein stability upon sequence mutation is an important but unsolved challenge in protein engineering. Large mutational datasets are required to train computational predictors, but traditional methods for collecting stability data are either low-throughput or measure protein stability indirectly. Here, we develop an automated method to generate thermodynamic stability data for nearly every single mutant in a small 56-residue protein. Analysis reveals that most single mutants have a neutral effect on stability, mutational sensitivity is largely governed by residue burial, and unexpectedly, hydrophobics are the best tolerated amino acid type. Correlating the output of various stability-prediction algorithms against our data shows that nearly all perform better on boundary and surface positions than for those in the core and are better at predicting large-to-small mutations than small-to-large ones. We show that the most stable variants in the single-mutant landscape are better identified using combinations of 2 prediction algorithms and including more algorithms can provide diminishing returns. In most cases, poor in silico predictions were tied to compositional differences between the data being analyzed and the datasets used to train the algorithm. Finally, we find that strategies to extract stabilities from high-throughput fitness data such as deep mutational scanning are promising and that data produced by these methods may be applicable toward training future stability-prediction tools.
144
Abstract
Evolvability is the ability of a biological system to produce phenotypic variation that is both heritable and adaptive. It has long been the subject of anecdotal observations and theoretical work. In recent years, however, the molecular causes of evolvability have been an increasing focus of experimental work. Here, we review recent experimental progress in areas as different as the evolution of drug resistance in cancer cells and the rewiring of transcriptional regulation circuits in vertebrates. This research reveals the importance of three major themes: multiple genetic and non-genetic mechanisms to generate phenotypic diversity, robustness in genetic systems, and adaptive landscape topography. We also discuss the mounting evidence that evolvability can evolve and the question of whether it evolves adaptively.
145
Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA, Gauthier NP, Sander C, Marks DS. Inferring protein 3D structure from deep mutation scans. Nat Genet 2019; 51:1170-1176. [PMID: 31209393 PMCID: PMC7295002 DOI: 10.1038/s41588-019-0432-9] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 10/09/2018] [Accepted: 04/29/2019] [Indexed: 11/09/2022]
Abstract
We describe an experimental method of three-dimensional (3D) structure determination that exploits the increasing ease of high-throughput mutational scans. Inspired by the success of using natural, evolutionary sequence covariation to compute protein and RNA folds, we explored whether 'laboratory', synthetic sequence variation might also yield 3D structures. We analyzed five large-scale mutational scans and discovered that the pairs of residues with the largest positive epistasis in the experiments are sufficient to determine the 3D fold. We show that the strongest epistatic pairings from genetic screens of three proteins, a ribozyme and a protein interaction reveal 3D contacts within and between macromolecules. Using these experimental epistatic pairs, we compute ab initio folds for a GB1 domain (within 1.8 Å of the crystal structure) and a WW domain (2.1 Å). We propose strategies that reduce the number of mutants needed for contact prediction, suggesting that genomics-based techniques can efficiently predict 3D structure.
Affiliation(s)
- Nathan J Rollins
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Kelly P Brock
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Frank J Poelwijk
  - cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Michael A Stiffler
  - cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Nicholas P Gauthier
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Chris Sander
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
146
Schmiedel JM, Lehner B. Determining protein structures using deep mutagenesis. Nat Genet 2019; 51:1177-1186. [PMID: 31209395 PMCID: PMC7610650 DOI: 10.1038/s41588-019-0431-x] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 10/09/2018] [Accepted: 04/29/2019] [Indexed: 12/12/2022]
Abstract
Determining the three-dimensional structures of macromolecules is a major goal of biological research, because of the close relationship between structure and function; however, thousands of protein domains still have unknown structures. Structure determination usually relies on physical techniques including X-ray crystallography, NMR spectroscopy and cryo-electron microscopy. Here we present a method that allows the high-resolution three-dimensional backbone structure of a biological macromolecule to be determined only from measurements of the activity of mutant variants of the molecule. This genetic approach to structure determination relies on the quantification of genetic interactions (epistasis) between mutations and the discrimination of direct from indirect interactions. This provides an alternative experimental strategy for structure determination, with the potential to reveal functional and in vivo structures.
Affiliation(s)
- Jörn M Schmiedel
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Barcelona, Spain
147
Jakobson CM, Jarosz DF. Molecular Origins of Complex Heritability in Natural Genotype-to-Phenotype Relationships. Cell Syst 2019; 8:363-379.e3. [PMID: 31054809 PMCID: PMC6560647 DOI: 10.1016/j.cels.2019.04.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/24/2019] [Revised: 03/25/2019] [Accepted: 04/05/2019] [Indexed: 01/09/2023]
Abstract
The statistical complexity of heredity has long been evident, but its molecular origins remain elusive. To investigate, we charted 90 comprehensive genotype-to-phenotype maps in a large population of wild diploid yeast. In contrast to long-standing assumptions, all types of genetic variation contributed similarly to phenotype. Causal synonymous and regulatory variants exhibited distinct molecular signatures, as did nonlinearities in heterozygote fitness that likely contribute to hybrid vigor. Highly pleiotropic variants altered disordered sequences within signaling hubs, and their effects correlated across environments, even when antagonistic, suggesting that large fitness gains bring concomitant costs. Natural genetic networks defined by the causal loci differed from those determined by precise gene deletions or protein-protein interactions. Finally, we found that traits that would appear omnigenic in less powered studies do in fact have finite genetic determinants. Integrating these molecular principles will be crucial as genome reading and writing become routine in research, industry, and medicine.
Affiliation(s)
- Christopher M Jakobson
  - Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Daniel F Jarosz
  - Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
148
Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019; 20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Indexed: 11/09/2022]
Abstract
Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
Affiliation(s)
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
149
Domingo J, Baeza-Centurion P, Lehner B. The Causes and Consequences of Genetic Interactions (Epistasis). Annu Rev Genomics Hum Genet 2019; 20:433-460. [PMID: 31082279 DOI: 10.1146/annurev-genom-083118-014857] [Citation(s) in RCA: 137] [Impact Index Per Article: 22.8] [Indexed: 11/09/2022]
Abstract
The same mutation can have different effects in different individuals. One important reason for this is that the outcome of a mutation can depend on the genetic context in which it occurs. This dependency is known as epistasis. In recent years, there has been a concerted effort to quantify the extent of pairwise and higher-order genetic interactions between mutations through deep mutagenesis of proteins and RNAs. This research has revealed two major components of epistasis: nonspecific genetic interactions caused by nonlinearities in genotype-to-phenotype maps, and specific interactions between particular mutations. Here, we provide an overview of our current understanding of the mechanisms causing epistasis at the molecular level, the consequences of genetic interactions for evolution and genetic prediction, and the applications of epistasis for understanding biology and determining macromolecular structures.
Affiliation(s)
- Júlia Domingo
  - Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Pablo Baeza-Centurion
  - Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Ben Lehner
  - Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra, 08003 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
150
Gaiha GD, Rossin EJ, Urbach J, Landeros C, Collins DR, Nwonu C, Muzhingi I, Anahtar MN, Waring OM, Piechocka-Trocha A, Waring M, Worrall DP, Ghebremichael MS, Newman RM, Power KA, Allen TM, Chodosh J, Walker BD. Structural topology defines protective CD8+ T cell epitopes in the HIV proteome. Science 2019; 364:480-484. [PMID: 31048489 PMCID: PMC6855781 DOI: 10.1126/science.aav5095] [Citation(s) in RCA: 104] [Impact Index Per Article: 17.3] [Received: 09/23/2018] [Accepted: 03/25/2019] [Indexed: 12/26/2022]
Abstract
Mutationally constrained epitopes of variable pathogens represent promising targets for vaccine design but are not reliably identified by sequence conservation. In this study, we employed structure-based network analysis, which applies network theory to HIV protein structure data to quantitate the topological importance of individual amino acid residues. Mutation of residues at important network positions disproportionately impaired viral replication and occurred with high frequency in epitopes presented by protective human leukocyte antigen (HLA) class I alleles. Moreover, CD8+ T cell targeting of highly networked epitopes distinguished individuals who naturally control HIV, even in the absence of protective HLA alleles. This approach thereby provides a mechanistic basis for immune control and a means to identify CD8+ T cell epitopes of topological importance for rational immunogen design, including a T cell-based HIV vaccine.
Affiliation(s)
- Gaurav D Gaiha
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
  - Gastrointestinal Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Elizabeth J Rossin
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA 02114, USA
- Jonathan Urbach
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
- David R Collins
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Chioma Nwonu
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
- Itai Muzhingi
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
- Melis N Anahtar
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
  - Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
- Olivia M Waring
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Alicja Piechocka-Trocha
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Michael Waring
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Daniel P Worrall
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
- Ruchi M Newman
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
- Karen A Power
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
- Todd M Allen
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
- James Chodosh
  - Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA 02114, USA
- Bruce D Walker
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139, USA
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA