201
Wheeler LC, Anderson JA, Morrison AJ, Wong CE, Harms MJ. Conservation of Specificity in Two Low-Specificity Proteins. Biochemistry 2017; 57:684-695. [PMID: 29240404 DOI: 10.1021/acs.biochem.7b01086] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/25/2023]
Abstract
Many regulatory proteins bind peptide regions of target proteins and modulate their activity. Such regulatory proteins can often interact with highly diverse target peptides. In many instances, it is not known if the peptide-binding interface discriminates targets in a biological context, or whether biological specificity is achieved exclusively through external factors such as subcellular localization. We used an evolutionary biochemical approach to distinguish these possibilities for two such low-specificity proteins: S100A5 and S100A6. We used isothermal titration calorimetry to study the binding of peptides with diverse sequence and biochemistry to human S100A5 and S100A6. These proteins bound distinct, but overlapping, sets of peptide targets. We then studied the peptide binding properties of orthologs sampled from across five amniote species. Binding specificity was conserved along all lineages, for the last 320 million years, despite the low specificity of each protein. We used ancestral sequence reconstruction to determine the binding specificity of the last common ancestor of the paralogs. The ancestor bound the entire set of peptides bound by modern S100A5 and S100A6 proteins, suggesting that paralog specificity evolved via subfunctionalization. To rule out the possibility that specificity is conserved because it is difficult to modify, we identified a single historical mutation that, when reverted in human S100A5, gave it the ability to bind an S100A6-specific peptide. These results reveal strong evolutionary constraints on peptide binding specificity. Despite being able to bind a large number of targets, the specificity of S100 peptide interfaces is likely important for the biology of these proteins.
Affiliation(s)
- Lucas C Wheeler
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, United States; Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403, United States
- Jeremy A Anderson
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, United States; Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403, United States
- Anneliese J Morrison
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, United States; Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403, United States
- Caitlyn E Wong
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, United States; Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403, United States
- Michael J Harms
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, United States; Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403, United States
202
Weile J, Sun S, Cote AG, Knapp J, Verby M, Mellor JC, Wu Y, Pons C, Wong C, van Lieshout N, Yang F, Tasan M, Tan G, Yang S, Fowler DM, Nussbaum R, Bloom JD, Vidal M, Hill DE, Aloy P, Roth FP. A framework for exhaustively mapping functional missense variants. Mol Syst Biol 2017; 13:957. [PMID: 29269382 PMCID: PMC5740498 DOI: 10.15252/msb.20177908] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Indexed: 01/10/2023]
Abstract
Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin‐like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and confidently identify pathogenic variation. Assays potentially amenable to deep mutational scanning are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.
Affiliation(s)
- Jochen Weile
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Song Sun
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Atina G Cote
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Jennifer Knapp
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Marta Verby
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Joseph C Mellor
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada; SeqWell Inc, Boston, MA, USA
- Yingzhou Wu
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain
- Cassandra Wong
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Fan Yang
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Murat Tasan
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Guihong Tan
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Shan Yang
- Invitae Corp., San Francisco, CA, USA
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA
- David E Hill
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain; Institució Catalana de Recerca I Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
- Frederick P Roth
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; Canadian Institute for Advanced Research, Toronto, ON, Canada
203
Arai R. Hierarchical design of artificial proteins and complexes toward synthetic structural biology. Biophys Rev 2017; 10:391-410. [PMID: 29243094 DOI: 10.1007/s12551-017-0376-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/25/2017] [Accepted: 11/23/2017] [Indexed: 12/14/2022]
Abstract
In multiscale structural biology, synthetic approaches are important to demonstrate biophysical principles and mechanisms underlying the structure, function, and action of bio-nanomachines. A central goal of "synthetic structural biology" is the design and construction of artificial proteins and protein complexes as desired. In this paper, I review recent remarkable progress in an array of approaches for the hierarchical design of artificial proteins and complexes that signpost the path forward toward synthetic structural biology as an emerging interdisciplinary field. Topics covered include combinatorial and protein-engineering approaches for directed evolution of artificial binding proteins and membrane proteins, binary code strategy for structural and functional de novo proteins, protein nanobuilding block strategy for constructing nano-architectures, protein-metal-organic frameworks for 3D protein complex crystals, and rational and computational approaches for design/creation of artificial proteins and complexes, novel protein folds, ideal/optimized protein structures, novel binding proteins for targeted therapeutics, and self-assembling nanomaterials. Protein designers and engineers look toward a bright future in synthetic structural biology for the next generation of biophysics and biotechnology.
Affiliation(s)
- Ryoichi Arai
- Department of Applied Biology, Faculty of Textile Science and Technology, Shinshu University, Ueda, Nagano 386-8567, Japan; Department of Supramolecular Complexes, Research Center for Fungal and Microbial Dynamism, Shinshu University, Minamiminowa, Nagano 399-4598, Japan; Institute for Biomedical Sciences, Interdisciplinary Cluster for Cutting Edge Research, Shinshu University, Matsumoto, Nagano 390-8621, Japan; Division of Structural and Synthetic Biology, RIKEN Center for Life Science Technologies, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
204
Sharma P, Kranz DM. Subtle changes at the variable domain interface of the T-cell receptor can strongly increase affinity. J Biol Chem 2017; 293:1820-1834. [PMID: 29229779 DOI: 10.1074/jbc.m117.814152] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/25/2017] [Revised: 12/03/2017] [Indexed: 11/06/2022]
Abstract
Most affinity-maturation campaigns for antibodies and T-cell receptors (TCRs) operate on the residues at the binding site, located within the loops known as complementarity-determining regions (CDRs). Accordingly, mutations in contact residues, or so-called "second shell" residues, that increase affinity are typically identified by directed evolution involving combinatorial libraries. To determine the impact of residues located at a distance from the binding site, here we used single-codon libraries of both CDR and non-CDR residues to generate a deep mutational scan of a human TCR against the cancer antigen MART-1·HLA-A2. Non-CDR residues included those at the interface of the TCR variable domains (Vα and Vβ) and surface-exposed framework residues. Mutational analyses showed that both Vα/Vβ interface and CDR residues were important in maintaining binding to MART-1·HLA-A2, probably due to either structural requirements for proper Vα/Vβ association or direct contact with the ligand. More surprisingly, many Vα/Vβ interface substitutions yielded improved binding to MART-1·HLA-A2. To further explore this finding, we constructed interface libraries and selected them for improved stability or affinity. Among the variants identified, one conservative substitution (F45βY) was most prevalent. Further analysis of F45βY showed that it enhanced thermostability and increased affinity by 60-fold. Thus, introducing a single hydroxyl group at the Vα/Vβ interface, at a significant distance from the TCR·peptide·MHC-binding site, remarkably affected ligand binding. The variant retained a high degree of specificity for MART-1·HLA-A2, indicating that our approach provides a general strategy for engineering improvements in either soluble or cell-based TCRs for therapeutic purposes.
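As a rough worked example, the 60-fold affinity gain of F45βY can be converted into a change in binding free energy. This assumes that "affinity" means a 60-fold lower K_D and that binding was measured near 25 °C (T = 298.15 K, R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹); neither assumption is stated in the abstract.

```latex
\Delta\Delta G_{\text{bind}}
  = RT \ln\frac{K_D^{\text{F45}\beta\text{Y}}}{K_D^{\text{WT}}}
  = RT \ln\frac{1}{60}
  \approx \left(0.593~\text{kcal mol}^{-1}\right)(-4.09)
  \approx -2.4~\text{kcal mol}^{-1}
```

A negative value indicates more favorable binding, so under these assumptions a single added hydroxyl group at the Vα/Vβ interface shifts binding by roughly 2.4 kcal/mol.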
Affiliation(s)
- Preeti Sharma
- Department of Biochemistry, University of Illinois, Urbana, Illinois 61801
- David M Kranz
- Department of Biochemistry, University of Illinois, Urbana, Illinois 61801
205
Higgins SA, Savage DF. Protein Science by DNA Sequencing: How Advances in Molecular Biology Are Accelerating Biochemistry. Biochemistry 2017; 57:38-46. [DOI: 10.1021/acs.biochem.7b00886] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]
Affiliation(s)
- Sean A. Higgins
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, United States
- David F. Savage
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, United States
- Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, United States
206
Abstract
A long-standing goal in evolutionary biology is predicting evolution. Here, we show that the architecture of macromolecules fundamentally limits evolutionary predictability. Under physiological conditions, macromolecules, like proteins, flip between multiple structures, forming an ensemble of structures. A mutation affects all of these structures in slightly different ways, redistributing the relative probabilities of structures in the ensemble. As a result, mutations that follow the first mutation have a different effect than they would if introduced before. This implies that knowing the effects of every mutation in an ancestor would be insufficient to predict evolutionary trajectories past the first few steps, leading to profound unpredictability in evolution. We, therefore, conclude that detailed evolutionary predictions are not possible given the chemistry of macromolecules.
Evolutionary prediction is of deep practical and philosophical importance. Here we show, using a simple computational protein model, that protein evolution remains unpredictable, even if one knows the effects of all mutations in an ancestral protein background. We performed a virtual deep mutational scan, revealing the individual and pairwise epistatic effects of every mutation to our model protein, and then used this information to predict evolutionary trajectories. Our predictions were poor. This is a consequence of statistical thermodynamics. Proteins exist as ensembles of similar conformations. The effect of a mutation depends on the relative probabilities of conformations in the ensemble, which, in turn, depend on the exact amino acid sequence of the protein. Accumulating substitutions alter the relative probabilities of conformations, thereby changing the effects of future mutations. This manifests itself as subtle but pervasive high-order epistasis. Uncertainty in the effect of each mutation accumulates and undermines prediction. Because conformational ensembles are an inevitable feature of proteins, this is likely universal.
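The ensemble argument can be illustrated with a much simpler model than the computational protein model the authors used: a hypothetical two-conformation ensemble in which each mutation shifts the free energy of each conformation additively, and the measured phenotype is the Boltzmann fraction of the "active" conformation. All energies and mutation names below are invented for illustration, and RT assumes roughly 25 °C. Even though the energies add exactly, the observable effect of a mutation depends on which mutations came before it.

```python
import math

RT = 0.593  # kcal/mol near 25 C (assumed temperature)

# Hypothetical wild-type free energies (kcal/mol) of two conformations.
WT_ENERGIES = {"active": 0.0, "inactive": 0.0}

# Hypothetical mutations: each adds a fixed energy shift to each conformation.
MUTATIONS = {
    "m1": {"active": -1.5, "inactive": 0.0},
    "m2": {"active": -1.0, "inactive": 0.0},
}


def fraction_active(energies):
    """Boltzmann probability of the 'active' conformation in the ensemble."""
    weights = {state: math.exp(-g / RT) for state, g in energies.items()}
    return weights["active"] / sum(weights.values())


def phenotype(mutations):
    """Apply energy shifts additively per conformation, then read out the ensemble."""
    energies = dict(WT_ENERGIES)
    for name in mutations:
        for state, shift in MUTATIONS[name].items():
            energies[state] += shift
    return fraction_active(energies)


# Effect of m2 measured in the ancestral (wild-type) background...
effect_in_ancestor = phenotype(["m2"]) - phenotype([])
# ...and after m1 has already been substituted.
effect_after_m1 = phenotype(["m1", "m2"]) - phenotype(["m1"])

if __name__ == "__main__":
    print(f"m2 effect in ancestor: {effect_in_ancestor:.3f}")
    print(f"m2 effect after m1:    {effect_after_m1:.3f}")
```

The energy shift of m2 is identical in both backgrounds; only the ensemble readout changes, so m2's effect shrinks substantially once m1 is present. This toy shows only pairwise epistasis; with more mutations and conformations, the same mechanism produces the higher-order epistasis the abstract describes.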
207
Starr TN, Picton LK, Thornton JW. Alternative evolutionary histories in the sequence space of an ancient protein. Nature 2017; 549:409-413. [PMID: 28902834 PMCID: PMC6214350 DOI: 10.1038/nature23902] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 03/23/2017] [Accepted: 08/08/2017] [Indexed: 12/28/2022]
Abstract
To understand why molecular evolution turned out as it did, we must characterize not only the path that evolution followed across the space of possible molecular sequences but also the many alternative trajectories that could have been taken but were not. A large-scale comparison of real and possible histories would establish whether the outcome of evolution represents an optimal state driven by natural selection or the contingent product of historical chance events; it would also reveal how the underlying distribution of functions across sequence space shaped historical evolution. Here we combine ancestral protein reconstruction with deep mutational scanning to characterize alternative histories in the sequence space around an ancient transcription factor, which evolved a novel biological function through well-characterized mechanisms. We find hundreds of alternative protein sequences that use diverse biochemical mechanisms to perform the derived function at least as well as the historical outcome. These alternatives all require prior permissive substitutions that do not enhance the derived function, but not all require the same permissive changes that occurred during history. We find that if evolution had begun from a different starting point within the network of sequences encoding the ancestral function, outcomes with different genetic and biochemical forms would probably have resulted; this contingency arises from the distribution of functional variants in sequence space and epistasis between residues. Our results illuminate the topology of the vast space of possibilities from which history sampled one path, highlighting how the outcome of evolution depends on a serial chain of compounding chance events.
Affiliation(s)
- Tyler N Starr
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, USA
- Lora K Picton
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Joseph W Thornton
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
208
Rubin AF, Gelman H, Lucas N, Bajjalieh SM, Papenfuss AT, Speed TP, Fowler DM. A statistical framework for analyzing deep mutational scanning data. Genome Biol 2017; 18:150. [PMID: 28784151 PMCID: PMC5547491 DOI: 10.1186/s13059-017-1272-5] [Citation(s) in RCA: 119] [Impact Index Per Article: 17.0] [Received: 04/18/2017] [Accepted: 07/06/2017] [Indexed: 11/10/2022]
Abstract
Deep mutational scanning is a widely used method for multiplex measurement of functional consequences of protein variants. We developed a new deep mutational scanning statistical model that generates error estimates for each measurement, capturing both sampling error and consistency between replicates. We apply our model to one novel and five published datasets comprising 243,732 variants and demonstrate its superiority in removing noisy variants and conducting hypothesis testing. Simulations show our model applies to scans based on cell growth or binding and handles common experimental errors. We implemented our model in Enrich2, software that can empower researchers analyzing deep mutational scanning data.
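For readers new to count-based scoring, a simplified wild-type-normalized log-ratio score for a two-time-point (input vs. selected) experiment is sketched below. It is a sketch under assumptions, not the Enrich2 implementation: it adds a 0.5 pseudocount to each count, treats counts as Poisson to get a standard error, and omits the time-series regression scoring and replicate model that the abstract's error estimates rely on. The function name and read counts are hypothetical.

```python
import math


def log_ratio_score(var_input, var_selected, wt_input, wt_selected, pseudocount=0.5):
    """Wild-type-normalized log ratio of a variant's counts before and after selection.

    Returns (score, standard_error). A score of 0 means the variant behaved like
    wild type; positive scores mean it was enriched relative to wild type.
    """
    v_in, v_sel, w_in, w_sel = (
        c + pseudocount for c in (var_input, var_selected, wt_input, wt_selected)
    )
    score = math.log(v_sel / w_sel) - math.log(v_in / w_in)
    # Treating each count as Poisson, the variance of a log count is ~1/count.
    standard_error = math.sqrt(1 / v_in + 1 / v_sel + 1 / w_in + 1 / w_sel)
    return score, standard_error


if __name__ == "__main__":
    # Hypothetical read counts for one variant and for wild type.
    score, se = log_ratio_score(var_input=200, var_selected=50, wt_input=1000, wt_selected=1200)
    print(f"score = {score:.2f} +/- {se:.2f}")
```

The standard error grows as counts shrink, which is why low-count variants are noisy; a framework like the one described above goes further by folding such per-measurement error together with replicate disagreement into each final estimate.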
Affiliation(s)
- Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia; Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia; Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Hannah Gelman
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA; Institute for Protein Design, University of Washington, Seattle, WA, 98195, USA
- Nathan Lucas
- Department of Pathology, University of Washington, Seattle, WA, 98195, USA
- Sandra M Bajjalieh
- Department of Pathology, University of Washington, Seattle, WA, 98195, USA
- Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia; Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, 3010, Australia; Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, 3010, Australia
- Terence P Speed
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia; Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, 3010, Australia
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA; Department of Bioengineering, University of Washington, Seattle, WA, 98195, USA
209
Wrenbeck EE, Faber MS, Whitehead TA. Deep sequencing methods for protein engineering and design. Curr Opin Struct Biol 2017; 45:36-44. [PMID: 27886568 PMCID: PMC5440218 DOI: 10.1016/j.sbi.2016.11.001] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 08/02/2016] [Accepted: 11/01/2016] [Indexed: 11/27/2022]
Abstract
The advent of next-generation sequencing (NGS) has revolutionized protein science, and complementary methods enabling NGS-driven protein engineering have followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner, using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information-rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and the engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with a discussion of technological advances.
Affiliation(s)
- Emily E Wrenbeck
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, United States
- Matthew S Faber
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, United States
- Timothy A Whitehead
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, United States; Department of Biosystems and Agricultural Engineering, Michigan State University, East Lansing, MI 48824, United States
210
Analysis of Large-Scale Mutagenesis Data To Assess the Impact of Single Amino Acid Substitutions. Genetics 2017; 207:53-61. [PMID: 28751422 PMCID: PMC5586385 DOI: 10.1534/genetics.117.300064] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 05/20/2017] [Accepted: 07/24/2017] [Indexed: 11/18/2022]
Abstract
Mutagenesis is a widely used method for identifying protein positions that are important for function or ligand binding. Advances in high-throughput DNA sequencing and mutagenesis techniques have enabled measurement of the effects of nearly all possible amino acid substitutions in many proteins. The resulting large-scale mutagenesis data sets offer a unique opportunity to draw general conclusions about the effects of different amino acid substitutions. Thus, we analyzed 34,373 mutations in 14 proteins whose effects were measured using large-scale mutagenesis approaches. Methionine was the most tolerated substitution, while proline was the least tolerated. We found that several substitutions, including histidine and asparagine, best recapitulated the effects of other substitutions, even when the identity of the wild-type amino acid was considered. The effects of histidine and asparagine substitutions also correlated best with the effects of other substitutions in different structural contexts. Furthermore, highly disruptive substitutions like aspartic and glutamic acid had the most discriminatory power for detecting ligand interface positions. Our work highlights the utility of large-scale mutagenesis data, and our conclusions can help guide future single substitution mutational scans.
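A summary such as "methionine was the most tolerated substitution, proline the least" amounts to grouping variant scores by the incoming amino acid and comparing averages. The sketch below assumes scores are normalized so that 1.0 is wild-type-like and 0.0 is loss of function; the handful of values is invented purely so the example runs and carries no evidential weight about any real protein.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical normalized scores: (position, substituted amino acid) -> score,
# where 1.0 is wild-type-like and 0.0 is complete loss of function.
SCORES = {
    (10, "M"): 0.95, (10, "P"): 0.10, (10, "H"): 0.60,
    (11, "M"): 0.85, (11, "P"): 0.05, (11, "H"): 0.70,
    (12, "M"): 0.90, (12, "P"): 0.30, (12, "H"): 0.55,
}


def mean_tolerance_by_substitution(scores):
    """Average score for each substituted amino acid across all positions."""
    by_aa = defaultdict(list)
    for (_position, aa), score in scores.items():
        by_aa[aa].append(score)
    return {aa: mean(values) for aa, values in by_aa.items()}


tolerance = mean_tolerance_by_substitution(SCORES)
# Most tolerated substitution first.
ranked = sorted(tolerance, key=tolerance.get, reverse=True)

if __name__ == "__main__":
    for aa in ranked:
        print(aa, round(tolerance[aa], 3))
```

A real analysis across many proteins would also need to normalize scores per dataset before pooling, since different assays use different scales.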
211
Koenig P, Sanowar S, Lee CV, Fuh G. Tuning the specificity of a Two-in-One Fab against three angiogenic antigens by fully utilizing the information of deep mutational scanning. MAbs 2017; 9:959-967. [PMID: 28585908 PMCID: PMC5540083 DOI: 10.1080/19420862.2017.1337618] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/03/2017] [Revised: 05/24/2017] [Accepted: 05/27/2017] [Indexed: 10/19/2022]
Abstract
Monoclonal antibodies developed for therapeutic or diagnostic purposes need to demonstrate highly defined binding specificity profiles. Engineering an antibody to enhance or reduce binding to related antigens is often needed to achieve the desired biologic activity without safety concerns. Here, we describe a deep sequencing-aided engineering strategy to fine-tune the specificity of 5A12.1, an angiopoietin-2 (Ang2)/vascular endothelial growth factor (VEGF) dual-action Fab, for the treatment of age-related macular degeneration. This antibody uses overlapping complementarity-determining region (CDR) sites for dual Ang2/VEGF interaction, with KD in the sub-nanomolar range. However, it also exhibits significant binding (KD of 4 nM) to angiopoietin-1 (Ang1), which has high sequence identity with Ang2. We generated a large phage-displayed library of 5A12.1 Fab variants with all possible single mutations in the 6 CDRs. By tracking the change in prevalence of each mutation under various selection conditions, we identified 35 mutations predicted to decrease the affinity for Ang1 while maintaining the affinity for Ang2 and VEGF. We confirmed the specificity profiles of 25 of these single mutations as Fab protein. Structural analysis showed that some of the Fab mutations cluster near a potential Ang1/2 epitope residue that differs between the 2 proteins, while others are up to 15 Å away from the antigen-binding site and likely influence the binding interaction remotely. The approach presented here provides a robust and efficient method for specificity engineering that does not require prior knowledge of the antigen-antibody interaction and can be broadly applied to antibody specificity engineering projects.
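The idea of tracking each mutation's prevalence across selections can be sketched as comparing its frequency after selection on each antigen with its frequency in the unselected library. This is a hypothetical reconstruction, not the authors' pipeline: the log2-enrichment statistic, the thresholds, the mutation names (mutA to mutC), and the read counts are all assumptions chosen for illustration.

```python
import math

CONDITIONS = ("Ang1", "Ang2", "VEGF")


def log2_enrichment(selected, selected_total, unselected, unselected_total, pseudocount=0.5):
    """log2 change in a mutation's frequency after selection vs. the unselected library."""
    selected_freq = (selected + pseudocount) / selected_total
    unselected_freq = (unselected + pseudocount) / unselected_total
    return math.log2(selected_freq / unselected_freq)


def specificity_candidates(counts, totals, depleted_below=-1.0, retained_above=-0.5):
    """Return mutations depleted by Ang1 selection but retained under Ang2 and VEGF."""
    hits = []
    for mutation, c in counts.items():
        enrichment = {
            cond: log2_enrichment(c[cond], totals[cond], c["unselected"], totals["unselected"])
            for cond in CONDITIONS
        }
        if (enrichment["Ang1"] <= depleted_below
                and enrichment["Ang2"] >= retained_above
                and enrichment["VEGF"] >= retained_above):
            hits.append(mutation)
    return hits


# Hypothetical read counts per mutation and selection condition.
COUNTS = {
    "mutA": {"unselected": 100, "Ang1": 20, "Ang2": 110, "VEGF": 95},
    "mutB": {"unselected": 100, "Ang1": 15, "Ang2": 10, "VEGF": 90},
    "mutC": {"unselected": 100, "Ang1": 95, "Ang2": 100, "VEGF": 100},
}
TOTALS = {"unselected": 10_000, "Ang1": 10_000, "Ang2": 10_000, "VEGF": 10_000}

if __name__ == "__main__":
    print(specificity_candidates(COUNTS, TOTALS))
```

Here mutB is excluded because it also loses Ang2 binding, and mutC because Ang1 selection does not deplete it. In practice, thresholds would more plausibly be set from replicate noise than fixed by hand, and candidates would still need confirmation as purified Fab, as the abstract describes.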
Affiliation(s)
- Patrick Koenig
- Department of Antibody Engineering, Genentech Inc., South San Francisco, CA, USA
- Sarah Sanowar
- Department of Antibody Engineering, Genentech Inc., South San Francisco, CA, USA
- Chingwei V. Lee
- Department of Antibody Engineering, Genentech Inc., South San Francisco, CA, USA
- Germaine Fuh
- Department of Antibody Engineering, Genentech Inc., South San Francisco, CA, USA
212
Abstract
Human genetics has historically depended on the identification of individuals whose natural genetic variation underlies an observable trait or disease risk. Here we argue that new technologies now augment this historical approach by allowing the use of massively parallel assays in model systems to measure the functional effects of genetic variation in many human genes. These studies will help establish the disease risk of both observed and potential genetic variants and overcome the problem of "variants of uncertain significance."
213
Sosa-Pagán JO, Iversen ES, Grandl J. TRPV1 temperature activation is specifically sensitive to strong decreases in amino acid hydrophobicity. Sci Rep 2017; 7:549. [PMID: 28373693 PMCID: PMC5428820 DOI: 10.1038/s41598-017-00636-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/17/2016] [Accepted: 03/07/2017] [Indexed: 12/15/2022]
Abstract
Several transient receptor potential (TRP) ion channels can be directly activated by hot or cold temperature with high sensitivity. However, the structures and molecular mechanism giving rise to their high temperature sensitivity are not fully understood. One hypothesized mechanism assumes that temperature activation is driven by the exposure of hydrophobic residues to solvent. This mechanism further predicts that residues are exposed to solvent in a coordinated fashion, but without necessarily being located in close proximity to each other. However, there is little experimental evidence supporting this mechanism in TRP channels. Here, we combined high-throughput mutagenesis, functional screening, and deep sequencing to identify mutations from a total of ~7,300 TRPV1 random mutant clones. We found that strong decreases in hydrophobicity of amino acids are better tolerated for activation by capsaicin than for activation by hot temperature, suggesting that strong hydrophobicity might be specifically required for temperature activation. Altogether, our work provides initial correlative support for a previously hypothesized temperature mechanism in TRP ion channels.
Affiliation(s)
- Jason O Sosa-Pagán
- Department of Neurobiology, Duke University Medical Center, Durham, NC 27710, USA
- Edwin S Iversen
- Department of Statistical Science, Duke University, Durham, NC 27710, USA
- Jörg Grandl
- Department of Neurobiology, Duke University Medical Center, Durham, NC 27710, USA
214
Rinaldi S, Gori A, Annovazzi C, Ferrandi EE, Monti D, Colombo G. Unraveling Energy and Dynamics Determinants to Interpret Protein Functional Plasticity: The Limonene-1,2-epoxide-hydrolase Case Study. J Chem Inf Model 2017; 57:717-725. [DOI: 10.1021/acs.jcim.6b00504] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Affiliation(s)
- Silvia Rinaldi
- Istituto di Chimica del Riconoscimento Molecolare, C.N.R., Via Mario Bianco 9, 20131 Milano, Italy
| | - Alessandro Gori
- Istituto di Chimica del Riconoscimento Molecolare, C.N.R., Via Mario Bianco 9, 20131 Milano, Italy
| | - Celeste Annovazzi
- Istituto di Chimica del Riconoscimento Molecolare, C.N.R., Via Mario Bianco 9, 20131 Milano, Italy
| | - Erica E. Ferrandi
- Istituto di Chimica del Riconoscimento Molecolare, C.N.R., Via Mario Bianco 9, 20131 Milano, Italy
| | - Daniela Monti
- Istituto di Chimica del Riconoscimento Molecolare, C.N.R., Via Mario Bianco 9, 20131 Milano, Italy
| | - Giorgio Colombo
- Istituto di Chimica del Riconoscimento Molecolare, C.N.R., Via Mario Bianco 9, 20131 Milano, Italy
| |
Collapse
|
215
|
Chan YH, Venev SV, Zeldovich KB, Matthews CR. Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints. Nat Commun 2017; 8:14614. [PMID: 28262665 PMCID: PMC5343507 DOI: 10.1038/ncomms14614] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 09/06/2016] [Accepted: 01/11/2017] [Indexed: 02/07/2023]
Abstract
Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. The TIM barrel fold is an evolutionarily conserved motif found in proteins with a variety of enzymatic functions. Here the authors explore the fitness landscape of the TIM barrel protein IGPS and uncover evolutionary constraints on both sequence and structure, accompanied by long range allosteric interactions.
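The central quantitative claim here is a correlation between the fitness landscapes of orthologs. A minimal sketch, assuming each landscape is stored as a mapping from an aligned (position, amino acid) mutation to a fitness score, computes the Pearson correlation over the mutations measured in both orthologs. The data layout and the choice of Pearson's r are assumptions; the abstract does not state which statistic the authors used.

```python
import math
from typing import Dict, Tuple

Mutation = Tuple[int, str]  # (aligned position, substituted amino acid)


def landscape_correlation(a: Dict[Mutation, float], b: Dict[Mutation, float]) -> float:
    """Pearson correlation of fitness scores over mutations present in both landscapes."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared mutations")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical scores, not data from the study.
    ortholog_1 = {(10, "A"): 0.9, (10, "P"): 0.1, (25, "G"): 0.5, (40, "W"): 0.2}
    ortholog_2 = {(10, "A"): 0.8, (10, "P"): 0.2, (25, "G"): 0.6, (33, "K"): 0.7}
    print(landscape_correlation(ortholog_1, ortholog_2))
```

Restricting to shared, aligned mutations matters because orthologs differ in sequence; mutations measured in only one background cannot contribute to the comparison.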
Affiliation(s)
- Yvonne H Chan
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
- Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts 01605, USA
- Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts 01605, USA
- C Robert Matthews
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
216
Inference of the Distribution of Selection Coefficients for New Nonsynonymous Mutations Using Large Samples. Genetics 2017; 206:345-361. [PMID: 28249985 PMCID: PMC5419480 DOI: 10.1534/genetics.116.197145] [Citation(s) in RCA: 113] [Impact Index Per Article: 16.1] [Received: 10/23/2016] [Accepted: 02/14/2017] [Indexed: 12/23/2022]
Abstract
The distribution of fitness effects (DFE) has considerable importance in population genetics. To date, estimates of the DFE come from studies using a small number of individuals. Thus, estimates of the proportion of moderately to strongly deleterious new mutations may be unreliable because such variants are unlikely to be segregating in the data. Additionally, the true functional form of the DFE is unknown, and estimates of the DFE differ significantly between studies. Here we present a flexible and computationally tractable method, called Fit∂a∂i, to estimate the DFE of new mutations using the site frequency spectrum from a large number of individuals. We apply our approach to the frequency spectrum of 1300 Europeans from the Exome Sequencing Project ESP6400 data set, 1298 Danes from the LuCamp data set, and 432 Europeans from the 1000 Genomes Project to estimate the DFE of deleterious nonsynonymous mutations. We infer significantly fewer (0.38-0.84 fold) strongly deleterious mutations with selection coefficient |s| > 0.01 and more (1.24-1.43 fold) weakly deleterious mutations with selection coefficient |s| < 0.001 compared to previous estimates. Furthermore, a DFE that is a mixture distribution of a point mass at neutrality plus a gamma distribution fits better than a gamma distribution in two of the three data sets. Our results suggest that nearly neutral forces play a larger role in human evolution than previously thought.
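The mixture DFE described above places a fraction of new mutations exactly at neutrality (|s| = 0) and draws the remaining |s| values from a gamma distribution. The sketch below computes the fractions of new mutations in the two classes the abstract compares (|s| > 0.01 and |s| < 0.001) for a given mixture. The parameter names `p_neutral`, `shape` and `scale` are hypothetical, and any values passed in are illustrative rather than the paper's estimates. The gamma CDF uses the standard power series for the regularized lower incomplete gamma function, so only the standard library is needed.

```python
import math


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """CDF of a gamma(shape, scale) distribution via the series for P(a, z).

    P(a, z) = z^a e^(-z) / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n)).
    Adequate for the small z = x / scale used here.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape  # n = 0 term of the series
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (shape + n)
        total += term
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def mixture_class_fractions(p_neutral: float, shape: float, scale: float,
                            strong: float = 0.01, weak: float = 0.001):
    """Fractions of new mutations with |s| > strong and with |s| < weak.

    The neutral point mass (|s| = 0) always counts toward the weak class.
    """
    strong_frac = (1.0 - p_neutral) * (1.0 - gamma_cdf(strong, shape, scale))
    weak_frac = p_neutral + (1.0 - p_neutral) * gamma_cdf(weak, shape, scale)
    return strong_frac, weak_frac
```

For example, `mixture_class_fractions(0.2, 0.2, 0.05)` returns the two class fractions for a hypothetical DFE in which 20% of new mutations are neutral. Setting `p_neutral=0` recovers the pure gamma model, which is the comparison the abstract describes.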
217
Najar TA, Khare S, Pandey R, Gupta SK, Varadarajan R. Mapping Protein Binding Sites and Conformational Epitopes Using Cysteine Labeling and Yeast Surface Display. Structure 2017; 25:395-406. [PMID: 28132782 DOI: 10.1016/j.str.2016.12.016] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 09/22/2016] [Revised: 12/10/2016] [Accepted: 12/28/2016] [Indexed: 11/16/2022]
Abstract
We describe a facile method for mapping protein:ligand binding sites and conformational epitopes. The method uses a combination of Cys scanning mutagenesis, chemical labeling, and yeast surface display. While Ala scanning is widely used for similar purposes, often mutation to Ala (or other amino acids) has little effect on binding, except at hotspot residues. Many residues in physical contact with a binding partner are insensitive to substitution with Ala. In contrast, we show that labeling of Cys residues in a binding site consistently abrogates binding. We couple this methodology to yeast surface display and deep sequencing to map conformational epitopes targeted by both monoclonal antibodies and polyclonal sera as well as a protein:ligand binding site. The method does not require purified protein, can distinguish buried and exposed residues, and can be extended to other display formats, including mammalian cells and viruses, emphasizing its wide applicability.
Affiliation(s)
- Tariq Ahmad Najar
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
- Shruti Khare
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
- Rajesh Pandey
- CSIR-Institute of Genomics and Integrative Biology, Mathura Road, New Delhi 110 020, India
- Satish K Gupta
- National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi 110 067, India
- Raghavan Varadarajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India; Jawaharlal Nehru Center for Advanced Scientific Research, Jakkur, Bangalore 560 064, India
218
Mutational landscape of antibody variable domains reveals a switch modulating the interdomain conformational dynamics and antigen binding. Proc Natl Acad Sci U S A 2017; 114:E486-E495. [PMID: 28057863 DOI: 10.1073/pnas.1613231114] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Indexed: 11/18/2022]
Abstract
Somatic mutations within the antibody variable domains are critical to the immense capacity of the immune repertoire. Here, via a deep mutational scan, we dissect how mutations at all positions of the variable domains of a high-affinity anti-VEGF antibody G6.31 impact its antigen-binding function. The resulting mutational landscape demonstrates that large portions of antibody variable domain positions are open to mutation, and that beneficial mutations can be found throughout the variable domains. We determine the role of one antigen-distal light chain position 83, demonstrating that mutation at this site optimizes both antigen affinity and thermostability by modulating the interdomain conformational dynamics of the antigen-binding fragment. Furthermore, by analyzing a large number of human antibody sequences and structures, we demonstrate that somatic mutations occur frequently at position 83, with corresponding domain conformations observed for G6.31. Therefore, the modulation of interdomain dynamics represents an important mechanism during antibody maturation in vivo.
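A deep mutational scan such as the one on G6.31 described above scores each variant by how its frequency changes through antigen selection. The sketch below computes a generic log2 enrichment ratio normalized to wild type from read counts before and after selection. The pseudocount of 0.5, the dictionary input format and the variant names in any example are assumptions for illustration; the abstract does not say which scoring scheme the authors used.

```python
import math
from typing import Dict


def log2_enrichment(pre: Dict[str, int], post: Dict[str, int],
                    wild_type: str = "WT", pseudocount: float = 0.5) -> Dict[str, float]:
    """log2 enrichment of each variant through selection, relative to wild type.

    Total read depth cancels out because every ratio is divided by the
    wild-type ratio from the same two libraries.
    """
    def ratio(variant: str) -> float:
        # The pseudocount avoids division by zero and log(0) for unseen variants.
        return (post.get(variant, 0) + pseudocount) / (pre.get(variant, 0) + pseudocount)

    wt_ratio = ratio(wild_type)
    return {v: math.log2(ratio(v) / wt_ratio) for v in pre if v != wild_type}
```

Positive scores mark variants enriched relative to wild type by selection for antigen binding; negative scores mark depleted ones.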
219
Adams RM, Mora T, Walczak AM, Kinney JB. Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves. eLife 2016; 5. [PMID: 28035901 PMCID: PMC5268739 DOI: 10.7554/elife.23156] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 11/10/2016] [Accepted: 12/27/2016] [Indexed: 11/30/2022]
Abstract
Despite the central role that antibodies play in the adaptive immune system and in biotechnology, much remains unknown about the quantitative relationship between an antibody’s amino acid sequence and its antigen binding affinity. Here we describe a new experimental approach, called Tite-Seq, that is capable of measuring binding titration curves and corresponding affinities for thousands of variant antibodies in parallel. The measurement of titration curves eliminates the confounding effects of antibody expression and stability that arise in standard deep mutational scanning assays. We demonstrate Tite-Seq on the CDR1H and CDR3H regions of a well-studied scFv antibody. Our data shed light on the structural basis for antigen binding affinity and suggest a role for secondary CDR loops in establishing antibody stability. Tite-Seq fills a large gap in the ability to measure critical aspects of the adaptive immune system, and can be readily used for studying sequence-affinity landscapes in other protein systems. DOI: http://dx.doi.org/10.7554/eLife.23156.001
Antibodies are proteins produced by cells of the immune system to tag or neutralize potential threats to the body, such as foreign substances and disease-causing microbes. Antibodies do this by binding to target molecules called antigens. An antibody’s ability to bind to an antigen depends on the sequence of amino acids – the building blocks of proteins – that make up the antibody. Through a process that randomizes this sequence of amino acids, the immune system generates a vast pool of antibodies that are able to target almost any foreign antigen that exists in nature. Currently, little is understood about how the sequence of amino acids in an antibody determines how strongly that antibody binds to its antigen target – a property referred to as the antibody’s binding affinity. Answering this fundamental question requires techniques that can measure the affinities of many different antibodies at the same time.
However, previous high-throughput methods have been unable to provide quantitative measurements of binding affinities. These kinds of measurements are difficult because an antibody’s amino acid sequence governs more than just binding affinity: it also affects how easy it is to produce that antibody, and what fraction of antibody molecules work properly. Adams et al. now describe a new method, named “Tite-Seq”, that overcomes these issues. First, thousands of different antibodies are displayed on the surface of yeast cells, with each cell carrying a single kind of antibody. These cells are then incubated with fluorescently labeled antigen at a wide range of different concentrations. Next, the yeast cells are sorted based on how brightly they glow; brighter cells have more antigen bound to them, and so it is possible to calculate how much of the antigen is bound to each kind of antibody at each concentration. Plotting these data provides a “binding curve” for each antibody, which is then used to read off the antibody’s binding affinity in a way that is not affected by the factors that have plagued other high-throughput methods. Tite-Seq is thus able to measure the binding affinities for thousands of different antibodies at the same time. This will potentially allow researchers to address many fundamental and yet unanswered questions about how the immune system works. Tite-Seq can also be used to measure how amino acid sequence affects the binding affinity of proteins other than antibodies. DOI: http://dx.doi.org/10.7554/eLife.23156.002
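The key step described above is reading an affinity off a binding curve. A minimal sketch, assuming a simple one-site (Langmuir) model in which the signal at antigen concentration c is A·c/(c + K_D) + B, fits each curve by scanning log10 K_D over a grid and solving for the amplitude A and background B by linear least squares at every grid point. This is a generic curve-fitting illustration, not the inference procedure Adams et al. used, and the grid and any data passed in are assumptions.

```python
from typing import Sequence, Tuple


def fit_titration(conc: Sequence[float], signal: Sequence[float],
                  log10_kd_grid: Sequence[float]) -> Tuple[float, float, float]:
    """Return (K_D, A, B) minimizing the squared error of A*c/(c + K_D) + B."""
    n = len(conc)
    if n < 3 or n != len(signal):
        raise ValueError("need at least three (concentration, signal) pairs")
    ms = sum(signal) / n
    best = None
    for log_kd in log10_kd_grid:
        kd = 10.0 ** log_kd
        occ = [c / (c + kd) for c in conc]  # fractional occupancy at each concentration
        mo = sum(occ) / n
        var = sum((o - mo) ** 2 for o in occ)
        if var == 0.0:
            continue  # occupancy is flat, so A and B are not identifiable
        # Closed-form simple linear regression of signal on occupancy.
        a = sum((o - mo) * (s - ms) for o, s in zip(occ, signal)) / var
        b = ms - a * mo
        sse = sum((a * o + b - s) ** 2 for o, s in zip(occ, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, a, b)
    if best is None:
        raise ValueError("no grid point gave an identifiable fit")
    _, kd, a, b = best
    return kd, a, b
```

Fitting the whole curve, rather than reading binding at a single concentration, is what separates affinity (K_D) from expression and stability effects, which mostly change the amplitude A.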
Affiliation(s)
- Rhys M Adams
- Laboratoire de Physique Théorique, UMR8549, CNRS, École Normale Supérieure, Paris, France; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Thierry Mora
- Laboratoire de Physique Statistique, UMR8550, CNRS, École Normale Supérieure, Paris, France
- Aleksandra M Walczak
- Laboratoire de Physique Théorique, UMR8549, CNRS, École Normale Supérieure, Paris, France
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
220
Stolz A, Putyrski M, Kutle I, Huber J, Wang C, Major V, Sidhu SS, Youle RJ, Rogov VV, Dötsch V, Ernst A, Dikic I. Fluorescence-based ATG8 sensors monitor localization and function of LC3/GABARAP proteins. EMBO J 2016; 36:549-564. [PMID: 28028054 DOI: 10.15252/embj.201695063] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 06/20/2016] [Revised: 11/23/2016] [Accepted: 11/27/2016] [Indexed: 12/25/2022]
Abstract
Autophagy is a cellular surveillance pathway that balances metabolic and energy resources and transports specific cargos, including damaged mitochondria, other broken organelles, or pathogens, to the lysosome for degradation. Central components of autophagosomal biogenesis are six members of the LC3 and GABARAP family of ubiquitin-like proteins (mATG8s). We used phage display to isolate peptides that possess bona fide LIR (LC3-interacting region) properties and are selective for individual mATG8 isoforms. Sensitivity of the developed sensors was optimized by multiplication, charge distribution, and fusion with a membrane recruitment (FYVE) or an oligomerization (PB1) domain. We demonstrate the use of the engineered peptides as intracellular sensors that specifically recognize GABARAP, GABL1, GABL2, and LC3C, as well as a bispecific sensor for LC3A and LC3B. By using an LC3C-specific sensor, we were able to monitor recruitment of endogenous LC3C to Salmonella during xenophagy, as well as to mitochondria during mitophagy. The sensors are general tools to monitor the fate of mATG8s and will be valuable in decoding the biological functions of the individual LC3/GABARAPs.
Affiliation(s)
- Alexandra Stolz
- Institute of Biochemistry II, Goethe University, Frankfurt am Main, Germany
- Mateusz Putyrski
- Institute of Biochemistry II, Goethe University, Frankfurt am Main, Germany; Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Project Group Translational Medicine and Pharmacology TMP, Frankfurt am Main, Germany
- Ivana Kutle
- Buchmann Institute for Molecular Life Sciences, Frankfurt am Main, Germany
- Jessica Huber
- Institute of Biophysical Chemistry, Goethe University, Frankfurt am Main, Germany
- Chunxin Wang
- Biochemistry Section, Surgical Neurology Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Viktória Major
- Institute of Biochemistry II, Goethe University, Frankfurt am Main, Germany
- Sachdev S Sidhu
- Banting and Best Department of Medical Research, The Donnelly Centre, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Richard J Youle
- Biochemistry Section, Surgical Neurology Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Vladimir V Rogov
- Institute of Biophysical Chemistry, Goethe University, Frankfurt am Main, Germany
- Volker Dötsch
- Institute of Biophysical Chemistry, Goethe University, Frankfurt am Main, Germany
- Andreas Ernst
- Institute of Biochemistry II, Goethe University, Frankfurt am Main, Germany; Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Project Group Translational Medicine and Pharmacology TMP, Frankfurt am Main, Germany
- Ivan Dikic
- Institute of Biochemistry II, Goethe University, Frankfurt am Main, Germany; Buchmann Institute for Molecular Life Sciences, Frankfurt am Main, Germany
221
Press O, Zvagelsky T, Vyazmensky M, Kleinau G, Engel S. Construction of Structural Mimetics of the Thyrotropin Receptor Intracellular Domain. Biophys J 2016; 111:2620-2628. [PMID: 28002738 PMCID: PMC5192603 DOI: 10.1016/j.bpj.2016.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2016] [Revised: 10/26/2016] [Accepted: 11/02/2016] [Indexed: 10/20/2022]
Abstract
The signaling of a G protein-coupled receptor (GPCR) is dictated by the complementary responsiveness of interacting intracellular effectors such as G proteins. Many GPCRs are known to couple to more than one G protein subtype and induce a multitude of signaling pathways, although the in vivo relevance of particular pathways is mostly unrecognized. Dissecting GPCR signaling in terms of the pathways that are activated will boost our understanding of the molecular fundamentals of hormone action. The structural determinants governing the selectivity of GPCR/G protein coupling, however, remain obscure. Here, we describe the design of soluble GPCR mimetics to study the details of the interplay between G proteins and activators. We constructed functional mimetics of the intracellular domain of a model GPCR, the thyrotropin receptor. We based the construction on a unique scaffold, 6-Helix, an artificial protein that was derived from the elements of the trimer-of-hairpins structure of HIV gp41 and represents a bundle of six α-helices. The 6-Helix scaffold, which endowed the substituted thyrotropin receptor intracellular domain elements with spatial constraints analogous to those found in native receptors, enabled the reconstitution of a microdomain that consists of intracellular loops 2 and 3, and is capable of binding and activating Gαs. The 6-Helix-based mimetics could be used as a platform to study the molecular basis of GPCR/G protein recognition. Such knowledge could help investigators develop novel therapeutic strategies for GPCR-related disorders by targeting the GPCR/G protein interfaces and counteracting cellular dysfunctions via focused tuning of GPCR signaling.
Affiliation(s)
- Olga Press
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Tatiana Zvagelsky
- Department of Chemistry, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Maria Vyazmensky
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Gunnar Kleinau
- Institute of Experimental Pediatric Endocrinology, Charité-Universitätsmedizin, Berlin, Germany
- Stanislav Engel
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel; National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
222
Kowalsky CA, Whitehead TA. Determination of binding affinity upon mutation for type I dockerin-cohesin complexes from Clostridium thermocellum and Clostridium cellulolyticum using deep sequencing. Proteins 2016; 84:1914-1928. [PMID: 27699856 DOI: 10.1002/prot.25175] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/18/2016] [Revised: 09/05/2016] [Accepted: 09/27/2016] [Indexed: 12/27/2022]
Abstract
The comprehensive sequence determinants of binding affinity for type I cohesin toward dockerin from Clostridium thermocellum and Clostridium cellulolyticum were evaluated using deep mutational scanning coupled to yeast surface display. We measured the relative binding affinity to dockerin for 2970 and 2778 single point mutants of C. thermocellum and C. cellulolyticum, respectively, representing over 96% of all possible single point mutants. The interface ΔΔG for each variant was reconstructed from sequencing counts and compared with three independent experimental methods. This reconstruction results in a narrow dynamic range of -0.8 to 0.5 kcal/mol. The computational software packages FoldX and Rosetta were used to predict mutations that disrupt binding by more than 0.4 kcal/mol. The area under the receiver operating characteristic (ROC) curve was 0.82 for FoldX and 0.77 for Rosetta, showing reasonable agreement between predictions and experimental results. Destabilizing mutations to core and rim positions were predicted with higher accuracy than those to support positions. This benchmark dataset may be useful for developing new computational tools that predict the effect of mutations on the binding affinities of protein-protein interactions. Experimental considerations to improve precision and range of the reconstruction method are discussed.
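The FoldX and Rosetta benchmarks above are summarized as the area under an ROC curve. The sketch below computes that area directly as the probability that a randomly chosen disruptive mutation receives a higher predicted score than a randomly chosen non-disruptive one, counting ties as one half, which equals the ROC AUC. It assumes that a larger ΔΔG means more strongly disrupted binding and applies the abstract's 0.4 kcal/mol cutoff to label mutations; that sign convention and any data passed in are assumptions, not taken from the paper.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2.

    Pairwise O(P * N) version for clarity; rank-based formulas are faster.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def disruptive_labels(experimental_ddg: Sequence[float], cutoff: float = 0.4) -> List[bool]:
    """Label a mutation disruptive if its measured ddG (kcal/mol) exceeds the cutoff."""
    return [ddg > cutoff for ddg in experimental_ddg]
```

Usage: `roc_auc(predicted_ddg, disruptive_labels(measured_ddg))`, where both lists are ordered by mutation. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.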
Affiliation(s)
- Caitlin A Kowalsky
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan, 48824
- Timothy A Whitehead
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan, 48824
- Department of Biosystems and Agricultural Engineering, Michigan State University, East Lansing, Michigan, 48824
223
Majithia AR, Tsuda B, Agostini M, Gnanapradeepan K, Rice R, Peloso G, Patel KA, Zhang X, Broekema MF, Patterson N, Duby M, Sharpe T, Kalkhoven E, Rosen ED, Barroso I, Ellard S, Kathiresan S, O’Rahilly S, Chatterjee K, Florez JC, Mikkelsen T, Savage DB, Altshuler D. Prospective functional classification of all possible missense variants in PPARG. Nat Genet 2016; 48:1570-1575. [PMID: 27749844 PMCID: PMC5131844 DOI: 10.1038/ng.3700] [Citation(s) in RCA: 158] [Impact Index Per Article: 19.8] [Received: 03/11/2016] [Accepted: 09/23/2016] [Indexed: 12/13/2022]
Abstract
Clinical exome sequencing routinely identifies missense variants in disease-related genes, but functional characterization is rarely undertaken, leading to diagnostic uncertainty. For example, mutations in PPARG cause Mendelian lipodystrophy and increase risk of type 2 diabetes (T2D). Although approximately 1 in 500 people harbor missense variants in PPARG, most are of unknown consequence. To prospectively characterize PPARγ variants, we used highly parallel oligonucleotide synthesis to construct a library encoding all 9,595 possible single-amino acid substitutions. We developed a pooled functional assay in human macrophages, experimentally evaluated all protein variants, and used the experimental data to train a variant classifier by supervised machine learning. When applied to 55 new missense variants identified in population-based and clinical sequencing, the classifier annotated 6 variants as pathogenic; these were subsequently validated by single-variant assays. Saturation mutagenesis and prospective experimental characterization can support immediate diagnostic interpretation of newly discovered missense variants in disease-related genes.
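The library size in this abstract follows from simple counting: each position of a protein of length L can be changed to any of the 19 other standard amino acids, giving 19·L single-amino-acid substitutions, and 9,595 = 19 × 505. The sketch below enumerates such substitutions for a sequence in the common "wild type, position, mutant" notation. It assumes the sequence uses only the 20 standard one-letter codes, and the example sequences are made up, not PPARγ.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(sequence: str) -> List[str]:
    """All missense variants as strings like 'M1A' (wild type, 1-based position, mutant)."""
    variants = []
    for pos, wt in enumerate(sequence, start=1):
        for aa in AMINO_ACIDS:
            if aa != wt:  # skip the synonymous "substitution" to itself
                variants.append(f"{wt}{pos}{aa}")
    return variants


if __name__ == "__main__":
    print(len(single_substitutions("MKT")))  # 3 positions x 19 alternatives
```

Enumerating variants explicitly like this is also how a saturation-mutagenesis library can be checked for completeness against sequencing data.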
Affiliation(s)
- Amit R. Majithia
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Diabetes Research Center, Diabetes Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Ben Tsuda
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Maura Agostini
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council Institute of Metabolic Science, Cambridge CB2 0QQ, United Kingdom
- Keerthana Gnanapradeepan
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Robert Rice
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Gina Peloso
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Kashyap A. Patel
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Xiaolan Zhang
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Marjoleine F. Broekema
- Molecular Cancer Research and Center for Molecular Medicine, University Medical Centre Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands
- Nick Patterson
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Marc Duby
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Ted Sharpe
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Eric Kalkhoven
- Molecular Cancer Research and Center for Molecular Medicine, University Medical Centre Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands
- Evan D. Rosen
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Division of Endocrinology and Metabolism, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02115, USA
- Inês Barroso
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council Institute of Metabolic Science, Cambridge CB2 0QQ, United Kingdom
- Sian Ellard
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Department of Molecular Genetics, Royal Devon and Exeter National Health Service Foundation Trust, Exeter, UK
- Sekar Kathiresan
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stephen O’Rahilly
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council Institute of Metabolic Science, Cambridge CB2 0QQ, United Kingdom
- Krishna Chatterjee
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council Institute of Metabolic Science, Cambridge CB2 0QQ, United Kingdom
- Jose C. Florez
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Diabetes Research Center, Diabetes Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Tarjei Mikkelsen
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David B. Savage
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust-Medical Research Council Institute of Metabolic Science, Cambridge CB2 0QQ, United Kingdom
- David Altshuler
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Diabetes Research Center, Diabetes Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
224
Haddox HK, Dingens AS, Bloom JD. Experimental Estimation of the Effects of All Amino-Acid Mutations to HIV's Envelope Protein on Viral Replication in Cell Culture. PLoS Pathog 2016; 12:e1006114. [PMID: 27959955 PMCID: PMC5189966 DOI: 10.1371/journal.ppat.1006114] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 10/06/2016] [Revised: 12/27/2016] [Accepted: 12/07/2016] [Indexed: 11/18/2022]
Abstract
HIV is notorious for its capacity to evade immunity and anti-viral drugs through rapid sequence evolution. Knowledge of the functional effects of mutations to HIV is critical for understanding this evolution. HIV's most rapidly evolving protein is its envelope (Env). Here we use deep mutational scanning to experimentally estimate the effects of all amino-acid mutations to Env on viral replication in cell culture. Most mutations are under purifying selection in our experiments, although a few sites experience strong selection for mutations that enhance HIV's replication in cell culture. We compare our experimental measurements of each site's preference for each amino acid to the actual frequencies of these amino acids in naturally occurring HIV sequences. Our measured amino-acid preferences correlate with amino-acid frequencies in natural sequences for most sites. However, our measured preferences are less concordant with natural amino-acid frequencies at surface-exposed sites that are subject to pressures absent from our experiments such as antibody selection. Our data enable us to quantify the inherent mutational tolerance of each site in Env. We show that the epitopes of broadly neutralizing antibodies have a significantly reduced inherent capacity to tolerate mutations, rigorously validating a pervasive idea in the field. Overall, our results help disentangle the role of inherent functional constraints and external selection pressures in shaping Env's evolution.
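The abstract quantifies each site's inherent mutational tolerance from its measured amino-acid preferences. One common way to reduce a site's preferences (a probability vector over the 20 amino acids) to a single tolerance number is its Shannon entropy, which is 0 when only one amino acid is tolerated and ln 20 ≈ 3.0 when all are equally preferred. The sketch below assumes that measure; the abstract does not state the exact tolerance metric the authors used, and any example preferences are invented.

```python
import math
from typing import Dict


def site_entropy(preferences: Dict[str, float]) -> float:
    """Shannon entropy (natural log) of one site's amino-acid preferences."""
    total = sum(preferences.values())
    if total <= 0:
        raise ValueError("preferences must have a positive sum")
    entropy = 0.0
    for value in preferences.values():
        p = value / total  # renormalize in case inputs do not sum to exactly 1
        if p > 0:
            entropy -= p * math.log(p)
    return entropy
```

Comparing the average entropy of epitope sites with that of other sites is one way to test the abstract's claim that broadly neutralizing antibody epitopes have reduced inherent tolerance.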
Collapse
Affiliation(s)
- Hugh K. Haddox
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Molecular and Cellular Biology PhD Program, University of Washington, Seattle, Washington, United States of America
| | - Adam S. Dingens
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Molecular and Cellular Biology PhD Program, University of Washington, Seattle, Washington, United States of America
- Jesse D. Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
225
Payen C, Sunshine AB, Ong GT, Pogachar JL, Zhao W, Dunham MJ. High-Throughput Identification of Adaptive Mutations in Experimentally Evolved Yeast Populations. PLoS Genet 2016; 12:e1006339. [PMID: 27727276 PMCID: PMC5065121 DOI: 10.1371/journal.pgen.1006339] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 03/14/2016] [Accepted: 09/05/2016] [Indexed: 11/19/2022]
Abstract
High-throughput sequencing has enabled genetic screens that can rapidly identify mutations that occur during experimental evolution. The presence of a mutation in an evolved lineage does not, however, constitute proof that the mutation is adaptive, given the well-known and widespread phenomenon of genetic hitchhiking, in which a non-adaptive or even detrimental mutation can co-occur in a genome with a beneficial mutation and the combined genotype is carried to high frequency by selection. We approximated the spectrum of possible beneficial mutations in Saccharomyces cerevisiae using sets of single-gene deletions and amplifications of almost all the genes in the S. cerevisiae genome. We determined the fitness effects of each mutation in three different nutrient-limited conditions using pooled competitions followed by barcode sequencing. Although most of the mutations were neutral or deleterious, ~500 of them increased fitness. We then compared those results to the mutations that actually occurred during experimental evolution in the same three nutrient-limited conditions. On average, ~35% of the mutations that occurred during experimental evolution were predicted by the systematic screen to be beneficial. We found that the distribution of fitness effects depended on the selective conditions. In the phosphate-limited and glucose-limited conditions, a large number of beneficial mutations of nearly equivalent, small effects drove the fitness increases. In the sulfate-limited condition, one type of mutation, the amplification of the high-affinity sulfate transporter, dominated. In the absence of that mutation, evolution in the sulfate-limited condition involved mutations in other genes that were not observed previously, but were predicted by the systematic screen. Thus, gross functional screens have the potential to predict and identify adaptive mutations that occur during experimental evolution.
Experimental evolution allows us to observe evolution in real time. New advances in genome sequencing make it trivial to discover the mutations that have arisen in evolved cultures; however, linking those mutations to particular adaptive traits remains difficult. We evaluated the fitness impacts of thousands of single-gene losses and amplifications in yeast. We discovered that only a fraction of the hundreds of possible beneficial mutations were actually detected in evolution experiments performed previously. Our results provide evidence that 35% of the mutations identified in experimentally evolved populations are advantageous and that the distribution of beneficial fitness effects depends on the genetic background and the selective conditions. Furthermore, we show that it is possible to select for alternative mutations that improve fitness by blocking particularly high-fitness routes to adaptation.
Affiliation(s)
- Celia Payen
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Anna B. Sunshine
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Giang T. Ong
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Jamie L. Pogachar
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Wei Zhao
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Maitreya J. Dunham
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
226
Plasmid-based one-pot saturation mutagenesis. Nat Methods 2016; 13:928-930. [PMID: 27723752 DOI: 10.1038/nmeth.4029] [Citation(s) in RCA: 107] [Impact Index Per Article: 13.4] [Received: 03/04/2016] [Accepted: 09/09/2016] [Indexed: 01/12/2023]
Abstract
Deep mutational scanning is a foundational tool for addressing the functional consequences of large numbers of mutants, but a more efficient and accessible method for construction of user-defined mutagenesis libraries is needed. Here we present nicking mutagenesis, a robust, single-day, one-pot saturation mutagenesis method performed on routinely prepped plasmid dsDNA. The method can be used to produce comprehensive or single- or multi-site saturation mutagenesis libraries.
227
Au L, Green DF. Direct Calculation of Protein Fitness Landscapes through Computational Protein Design. Biophys J 2016; 110:75-84. [PMID: 26745411 DOI: 10.1016/j.bpj.2015.11.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/25/2015] [Revised: 11/03/2015] [Accepted: 11/16/2015] [Indexed: 11/24/2022]
Abstract
Naturally selected amino-acid sequences or experimentally derived ones are often the basis for understanding how protein three-dimensional conformation and function are determined by primary structure. Such sequences for a protein family comprise only a small fraction of all possible variants, however, representing the fitness landscape with limited scope. Explicitly sampling and characterizing alternative, unexplored protein sequences would directly identify fundamental reasons for sequence robustness (or variability), and we demonstrate that computational methods offer an efficient mechanism toward this end, on a large scale. The dead-end elimination and A* search algorithms were used here to find all low-energy single mutant variants, and corresponding structures of a G-protein heterotrimer, to measure changes in structural stability and binding interactions to define a protein fitness landscape. We established consistency between these algorithms with known biophysical and evolutionary trends for amino-acid substitutions, and could thus recapitulate known protein side-chain interactions and predict novel ones.
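Dead-end elimination (DEE) prunes rotamers that provably cannot belong to the global minimum-energy conformation. The abstract does not say which DEE variant was used, so this sketch assumes the original Desmet-style criterion. Rotamer r at position i is discarded if some alternative t at i satisfies E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s). The energy-table layout and function names are hypothetical. The A* step that enumerates low-energy conformations from the surviving rotamers is not shown.

```python
from typing import Dict, List, Tuple

# self_e[i][r]: energy of rotamer r at position i against the fixed backbone
SelfEnergies = List[List[float]]
# pair_e[(i, j)][r][s]: interaction of rotamer r at i with rotamer s at j (i < j)
PairEnergies = Dict[Tuple[int, int], List[List[float]]]


def pair_energy(pair_e: PairEnergies, i: int, r: int, j: int, s: int) -> float:
    """Look up a pairwise energy regardless of argument order."""
    if i < j:
        return pair_e[(i, j)][r][s]
    return pair_e[(j, i)][s][r]


def dead_end_elimination(self_e: SelfEnergies, pair_e: PairEnergies) -> List[List[int]]:
    """Apply the original DEE criterion until no more rotamers can be pruned.

    Returns the indices of the surviving rotamers at each position.
    """
    n = len(self_e)
    alive = [list(range(len(rotamers))) for rotamers in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in list(alive[i]):
                # Best case for r: every neighbor adopts its most favorable partner.
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j]) for j in others
                )
                for t in alive[i]:
                    if t == r:
                        continue
                    # Worst case for the competitor t.
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j]) for j in others
                    )
                    if best_r > worst_t:  # r can never beat t, so it is a dead end
                        alive[i].remove(r)
                        changed = True
                        break
    return alive


if __name__ == "__main__":
    self_e = [[0.0, 10.0], [0.0, 0.0]]
    pair_e = {(0, 1): [[0.0, 1.0], [0.0, 0.0]]}
    print(dead_end_elimination(self_e, pair_e))  # [[0], [0]]
```

Each comparison pits the best case for r against the worst case for t, so a pruned rotamer can never be part of the minimum-energy solution. Repeating the sweeps until nothing changes lets earlier eliminations tighten later bounds.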
Affiliation(s)
- Loretta Au
- Department of Statistics, The University of Chicago, Chicago, Illinois.
- David F Green
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York
228
Harris DT, Wang N, Riley TP, Anderson SD, Singh NK, Procko E, Baker BM, Kranz DM. Deep Mutational Scans as a Guide to Engineering High Affinity T Cell Receptor Interactions with Peptide-bound Major Histocompatibility Complex. J Biol Chem 2016; 291:24566-24578. [PMID: 27681597 DOI: 10.1074/jbc.m116.748681] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 07/15/2016] [Revised: 09/15/2016] [Indexed: 11/06/2022]
Abstract
Proteins are often engineered to have higher affinity for their ligands to achieve therapeutic benefit. For example, many studies have used phage or yeast display libraries of mutants within complementarity-determining regions to affinity mature antibodies and T cell receptors (TCRs). However, these approaches do not allow rapid assessment or evolution across the entire interface. By combining directed evolution with deep sequencing, it is now possible to generate sequence fitness landscapes that survey the impact of every amino acid substitution across the entire protein-protein interface. Here we used the results of deep mutational scans of a TCR-peptide-MHC interaction to guide mutational strategies. The approach yielded stable TCRs with affinity increases of >200-fold. The substitutions with the greatest enrichments based on the deep sequencing were validated to have higher affinity and could be combined to yield additional improvements. We also conducted in silico binding analyses for every substitution to compare them with the fitness landscape. Computational modeling did not effectively predict the impacts of mutations distal to the interface and did not account for yeast display results that depended on combinations of affinity and protein stability. However, computation accurately predicted affinity changes for mutations within or near the interface, highlighting the complementary strengths of computational modeling and yeast surface display coupled with deep mutational scanning for engineering high affinity TCRs.
Affiliation(s)
- Daniel T Harris
- Department of Biochemistry, University of Illinois, Urbana, Illinois 61801
- Ningyan Wang
- Department of Biochemistry, University of Illinois, Urbana, Illinois 61801
- Timothy P Riley
- Department of Chemistry and Biochemistry and the Harper Cancer Research Institute, University of Notre Dame, South Bend, Indiana 46557
- Scott D Anderson
- Department of Biochemistry, University of Illinois, Urbana, Illinois 61801
- Nishant K Singh
- Department of Chemistry and Biochemistry and the Harper Cancer Research Institute, University of Notre Dame, South Bend, Indiana 46557
- Erik Procko
- Department of Biochemistry, University of Illinois, Urbana, Illinois 61801
- Brian M Baker
- Department of Chemistry and Biochemistry and the Harper Cancer Research Institute, University of Notre Dame, South Bend, Indiana 46557
- David M Kranz
- Department of Biochemistry, University of Illinois, Urbana, Illinois 61801
229
Sun Z, Mehta SC, Adamski CJ, Gibbs RA, Palzkill T. Deep Sequencing of Random Mutant Libraries Reveals the Active Site of the Narrow Specificity CphA Metallo-β-Lactamase is Fragile to Mutations. Sci Rep 2016; 6:33195. [PMID: 27616327 PMCID: PMC5018959 DOI: 10.1038/srep33195] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 05/09/2016] [Accepted: 08/23/2016] [Indexed: 11/17/2022]
Abstract
CphA is a Zn2+-dependent metallo-β-lactamase that efficiently hydrolyzes only carbapenem antibiotics. To understand the sequence requirements for CphA function, single codon random mutant libraries were constructed for residues in and near the active site and mutants were selected for E. coli growth on increasing concentrations of imipenem, a carbapenem antibiotic. At high concentrations of imipenem that select for phenotypically wild-type mutants, the active-site residues exhibit stringent sequence requirements in that nearly all residues in positions that contact zinc, the substrate, or the catalytic water do not tolerate amino acid substitutions. In addition, at high imipenem concentrations a number of residues that do not directly contact zinc or substrate are also essential and do not tolerate substitutions. Biochemical analysis confirmed that amino acid substitutions at essential positions decreased the stability or catalytic activity of the CphA enzyme. Therefore, the CphA active site is fragile to substitutions, suggesting active-site residues are optimized for imipenem hydrolysis. These results also suggest that resistance to inhibitors targeted to the CphA active site would be slow to develop because of the strong sequence constraints on function.
Affiliation(s)
- Zhizeng Sun
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Shrenik C Mehta
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Carolyn J Adamski
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Timothy Palzkill
- Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
230
Protein stability: computation, sequence statistics, and new experimental methods. Curr Opin Struct Biol 2016; 33:161-8. [PMID: 26497286 DOI: 10.1016/j.sbi.2015.09.002] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Received: 07/22/2015] [Revised: 09/22/2015] [Accepted: 09/24/2015] [Indexed: 11/22/2022]
Abstract
Calculating protein stability and predicting stabilizing mutations remain exceedingly difficult tasks, largely due to the inadequacy of potential functions, the difficulty of modeling entropy and the unfolded state, and challenges of sampling, particularly of backbone conformations. Yet, computational design has produced some remarkably stable proteins in recent years, apparently owing to near ideality in structure and sequence features. With caveats, computational prediction of stability can be used to guide mutation, and mutations derived from consensus sequence analysis, especially improved by recent co-variation filters, are very likely to stabilize without sacrificing function. The combination of computational and statistical approaches with library approaches, including new technologies such as deep sequencing and high-throughput stability measurements, points to a very exciting near-term future for stability engineering, even with difficult computational issues remaining.
231
Wong LH, Sinha S, Bergeron JR, Mellor JC, Giaever G, Flaherty P, Nislow C. Reverse Chemical Genetics: Comprehensive Fitness Profiling Reveals the Spectrum of Drug Target Interactions. PLoS Genet 2016; 12:e1006275. [PMID: 27588687 PMCID: PMC5010250 DOI: 10.1371/journal.pgen.1006275] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/12/2016] [Accepted: 08/03/2016] [Indexed: 01/22/2023]
Abstract
The emergence and prevalence of drug resistance demands streamlined strategies to identify drug resistant variants in a fast, systematic and cost-effective way. Methods commonly used to understand and predict drug resistance rely on limited clinical studies from patients who are refractory to drugs or on laborious evolution experiments with poor coverage of the gene variants. Here, we report an integrative functional variomics methodology combining deep sequencing and a Bayesian statistical model to provide a comprehensive list of drug resistance alleles from complex variant populations. Dihydrofolate reductase, the target of methotrexate chemotherapy drug, was used as a model to identify functional mutant alleles correlated with methotrexate resistance. This systematic approach identified previously reported resistance mutations, as well as novel point mutations that were validated in vivo. Use of this systematic strategy as a routine diagnostics tool widens the scope of successful drug research and development.
One of the most profound outcomes of fast, reliable genome sequencing is the ability to tailor drug therapy to an individual’s genotype. This ‘personalized’ or ‘precision medicine’ is the realization of a decades-long effort to maximize drug effect and limit unwanted side effects. An undesirable consequence of such targeted therapies, however, is the emergence of drug resistance. This outcome is the result of an evolutionary process where mutations in the drug target render the drug perturbation allow such mutant cells to proliferate. Because of the unbiased and stochastic nature of the emergence of drug resistance, it is impossible to predict. We developed a test where hundreds of thousands of mutant cells are exposed to a drug simultaneously and those cells that modulate resistance survive. This method is innovative because it partners a high-throughput experimental protocol with a tailored statistical model to identify all mutations that modulate resistance. Finally, we used synthetic biology to re-create these mutations and demonstrate that they were, in fact, bona fide drug-resistant variants. These mutations were further extended and confirmed to also be resistant in the human orthologue. This combined biological-computational approach allows one to identify drug’s degree of resistance to both guide treatments and future drug discovery.
Affiliation(s)
- Lai H. Wong
- Department of Pharmaceutical Sciences, University of British Columbia, Vancouver, Canada
- Sunita Sinha
- Department of Pharmaceutical Sciences, University of British Columbia, Vancouver, Canada
- Julien R. Bergeron
- Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- Guri Giaever
- Department of Pharmaceutical Sciences, University of British Columbia, Vancouver, Canada
- Patrick Flaherty
- Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts, United States of America
- Corey Nislow
- Department of Pharmaceutical Sciences, University of British Columbia, Vancouver, Canada
232
The power of multiplexed functional analysis of genetic variants. Nat Protoc 2016; 11:1782-7. [PMID: 27583640 DOI: 10.1038/nprot.2016.135] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 06/08/2016] [Accepted: 07/13/2016] [Indexed: 12/30/2022]
Abstract
New technologies have recently enabled saturation mutagenesis and functional analysis of nearly all possible variants of regulatory elements or proteins of interest in single experiments. Here we discuss the past, present, and future of such multiplexed (functional) assays for variant effects (MAVEs). MAVEs provide detailed insight into sequence-function relationships, and they may prove critical for the prospective clinical interpretation of genetic variants.
233
Tripathi A, Gupta K, Khare S, Jain PC, Patel S, Kumar P, Pulianmackal AJ, Aghera N, Varadarajan R. Molecular Determinants of Mutant Phenotypes, Inferred from Saturation Mutagenesis Data. Mol Biol Evol 2016; 33:2960-2975. [PMID: 27563054 PMCID: PMC5062330 DOI: 10.1093/molbev/msw182] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022]
Abstract
Understanding how mutations affect protein activity and organismal fitness is a major challenge. We used saturation mutagenesis combined with deep sequencing to determine mutational sensitivity scores for 1,664 single-site mutants of the 101 residue Escherichia coli cytotoxin, CcdB at seven different expression levels. Active-site residues could be distinguished from buried ones, based on their differential tolerance to aliphatic and charged amino acid substitutions. At nonactive-site positions, the average mutational tolerance correlated better with depth from the protein surface than with accessibility. Remarkably, similar results were observed for two other small proteins, PDZ domain (PSD95pdz3) and IgG-binding domain of protein G (GB1). Mutational sensitivity data obtained with CcdB were used to derive a procedure for predicting functional effects of mutations. Results compared favorably with those of two widely used computational predictors. In vitro characterization of 80 single, nonactive-site mutants of CcdB showed that activity in vivo correlates moderately with thermal stability and solubility. The inability to refold reversibly, as well as a decreased folding rate in vitro, is associated with decreased activity in vivo. Upon probing the effect of modulating expression of various proteases and chaperones on mutant phenotypes, most deleterious mutants showed an increased in vivo activity and solubility only upon over-expression of either Trigger factor or SecB ATP-independent chaperones. Collectively, these data suggest that folding kinetics rather than protein stability is the primary determinant of activity in vivo. This study enhances our understanding of how mutations affect phenotype, as well as the ability to predict fitness effects of point mutations.
Affiliation(s)
- Arti Tripathi
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Kritika Gupta
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Shruti Khare
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Pankaj C Jain
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Siddharth Patel
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Prasanth Kumar
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Nilesh Aghera
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Raghavan Varadarajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India
234
Rapid construction of metabolite biosensors using domain-insertion profiling. Nat Commun 2016; 7:12266. [PMID: 27470466 PMCID: PMC4974565 DOI: 10.1038/ncomms12266] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Received: 12/18/2015] [Accepted: 06/15/2016] [Indexed: 12/15/2022]
Abstract
Single-fluorescent protein biosensors (SFPBs) are an important class of probes that enable the single-cell quantification of analytes in vivo. Despite advantages over other detection technologies, their use has been limited by the inherent challenges of their construction. Specifically, the rational design of green fluorescent protein (GFP) insertion into a ligand-binding domain, generating the requisite allosteric coupling, remains a rate-limiting step. Here, we describe an unbiased approach, termed domain-insertion profiling with DNA sequencing (DIP-seq), that combines the rapid creation of diverse libraries of potential SFPBs and high-throughput activity assays to identify functional biosensors. As a proof of concept, we construct an SFPB for the important regulatory sugar trehalose. DIP-seq analysis of a trehalose-binding-protein reveals allosteric hotspots for GFP insertion and results in high-dynamic range biosensors that function robustly in vivo. Taken together, DIP-seq simultaneously accelerates metabolite biosensor construction and provides a novel tool for interrogating protein allostery.
In the construction of single fluorescent protein biosensors, selection of the insertion point of a fluorescent protein into a ligand-binding domain is a rate-limiting step. Here, the authors develop an unbiased, high-throughput approach, called domain insertion profiling with DNA sequencing (DIP-seq), to generate a novel trehalose biosensor.
235
A Statistical Guide to the Design of Deep Mutational Scanning Experiments. Genetics 2016; 204:77-87. [PMID: 27412710 DOI: 10.1534/genetics.116.190462] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 04/14/2016] [Accepted: 06/29/2016] [Indexed: 12/21/2022]
Abstract
The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
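One common way to analyze a time-sampled bulk competition is to regress the log ratio of mutant to wild-type read counts on time and take the slope as the selection coefficient. This sketch assumes that simple model: ordinary least squares, a pseudocount for zero counts, and time measured in generations. It is not the specific estimator or confidence-interval formula from the paper, and the function name is hypothetical.

```python
import math
from typing import Sequence


def selection_coefficient(
    times: Sequence[float],
    mutant_counts: Sequence[float],
    wildtype_counts: Sequence[float],
    pseudocount: float = 0.5,
) -> float:
    """Estimate s as the OLS slope of ln(mutant / wild type) against time."""
    if not (len(times) == len(mutant_counts) == len(wildtype_counts)):
        raise ValueError("inputs must have the same length")
    if len(times) < 2:
        raise ValueError("need at least two time points")
    # Pseudocounts keep the log ratio finite when a variant drops to zero reads.
    y = [math.log((m + pseudocount) / (w + pseudocount))
         for m, w in zip(mutant_counts, wildtype_counts)]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    return sxy / sxx


if __name__ == "__main__":
    generations = [0, 2, 4, 6]
    wt = [10_000, 10_000, 10_000, 10_000]
    mut = [1_000 * math.exp(0.05 * t) for t in generations]  # simulated s = 0.05
    print(round(selection_coefficient(generations, mut, wt, pseudocount=0.0), 4))  # 0.05
```

Under ordinary least squares the slope's variance scales as σ²/Σ(t − t̄)². Placing the sampled time points near the start and end of the experiment widens that spread and shrinks the error, which matches the abstract's advice to cluster samples at the beginning and end.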
236
Abstract
Human genome sequencing is routine and will soon be a staple in research and clinical genetics. However, the promise of sequencing is often just that, with genome data routinely failing to reveal useful insights about disease in general or a person's health in particular. Nowhere is this chasm between promise and progress more evident than in the designation, "variant of uncertain significance" (VUS). Although it serves an important role, careful consideration of VUS reveals it to be a nebulous description of genomic information and its relationship to disease, symptomatic of our inability to make even crude quantitative assertions about the disease risks conferred by many genetic variants. In this perspective, I discuss the challenge of "variant interpretation" and the value of comparative and functional genomic information in meeting that challenge. Although already essential, genomic annotations will become even more important as our analytical focus widens beyond coding exons. Combined with more genotype and phenotype data, they will help facilitate more quantitative and insightful assessments of the contributions of genetic variants to disease.
Affiliation(s)
- Gregory M Cooper
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
237
Wu NC, Dai L, Olson CA, Lloyd-Smith JO, Sun R. Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife 2016; 5. [PMID: 27391790 PMCID: PMC4985287 DOI: 10.7554/elife.16965] [Citation(s) in RCA: 123] [Impact Index Per Article: 15.4] [Received: 04/18/2016] [Accepted: 07/07/2016] [Indexed: 12/11/2022]
Abstract
The structure of fitness landscapes is critical for understanding adaptive protein evolution. Previous empirical studies on fitness landscapes were confined to either the neighborhood around the wild type sequence, involving mostly single and double mutants, or a combinatorially complete subgraph involving only two amino acids at each site. In reality, the dimensionality of protein sequence space is higher (20^L) and there may be higher-order interactions among more than two sites. Here we experimentally characterized the fitness landscape of four sites in protein GB1, containing 20^4 = 160,000 variants. We found that while reciprocal sign epistasis blocked many direct paths of adaptation, such evolutionary traps could be circumvented by indirect paths through genotype space involving gain and subsequent loss of mutations. These indirect paths alleviate the constraint on adaptive protein evolution, suggesting that the heretofore neglected dimensions of sequence space may change our views on how proteins evolve. DOI:http://dx.doi.org/10.7554/eLife.16965.001
Proteins can evolve over time by changing their component parts, which are called amino acids. These changes usually happen one at a time and natural selection tends to preserve those changes that make the protein more efficient at its specific tasks, while discarding those that impair the protein’s activity. However the effect of each change depends on the protein as a whole, and so two changes that separately make the protein worse can make it much better if they occur together. This phenomenon is called epistasis and in some cases it can trap proteins in a sub-optimal form and prevent them from improving further. Proteins are made from twenty different kinds of amino acid, and there are millions of different combinations of amino acids that could, in theory, make a protein of a given length. Studying protein evolution involves making variants of the same protein, each with just a few changes, and comparing how efficient, or “fit”, they are. Previous studies only measured the fitness of a few variants and showed that epistasis could block protein evolution by requiring the protein to lose some fitness before it could improve further. However, new techniques have now made it easier to study protein evolution by testing many more protein variants. Wu, Dai et al. focused on four amino acids in part of a protein called GB1 and tested the efficiency of every possible combination of these four amino acids, a total of 160,000 (20^4) variants. Contrary to expectations, the results suggested that the protein could evolve quickly to maximise fitness despite there being epistasis between the four amino acids. Overcoming epistasis typically involved making a change to one amino acid that paved the way for further changes while avoiding the need to lose fitness. The original change could then be reversed once the epistasis was overcome. The complexity of this solution means it can only be seen by studying a large number of protein variants that represent many alternative sequences of protein changes. Wu, Dai et al. conclude that proteins are able to achieve a higher level of fitness through evolution by exploring a large number of changes. There are many possible changes for each protein and it is this variety that, despite epistasis, allows proteins to become naturally optimised for the tasks that they perform. While the full complexity of protein evolution cannot be explored at the moment, as technology advances it will become possible to study more protein variants. Such advances would therefore hopefully allow researchers to discover even more about the natural mechanisms of protein evolution. DOI:http://dx.doi.org/10.7554/eLife.16965.002
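Two quantities in this abstract are easy to make concrete. The first is the size of a four-site landscape, 20^4 = 160,000 variants. The second is reciprocal sign epistasis, in which the sign of each mutation's fitness effect flips depending on whether the other mutation is present. When both single mutants are less fit than the starting genotype but the double mutant is fitter, both direct one-step paths pass through a fitness valley. This sketch enumerates the variants and tests one two-mutation square for that pattern. The function names are illustrative, and real fitness values would come from measurements.

```python
from itertools import product
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_variants(n_sites: int) -> Iterator[str]:
    """Yield every combination of the 20 amino acids at n_sites positions."""
    for combo in product(AMINO_ACIDS, repeat=n_sites):
        yield "".join(combo)


def is_reciprocal_sign_epistasis(f_ab: float, f_Ab: float, f_aB: float, f_AB: float) -> bool:
    """True if each mutation's fitness effect changes sign with the other's presence.

    f_ab: starting genotype; f_Ab and f_aB: single mutants; f_AB: double mutant.
    """
    a_flips = (f_Ab - f_ab) * (f_AB - f_aB) < 0  # effect of A without vs. with B
    b_flips = (f_aB - f_ab) * (f_AB - f_Ab) < 0  # effect of B without vs. with A
    return a_flips and b_flips


if __name__ == "__main__":
    print(sum(1 for _ in all_variants(4)))                   # 160000
    print(is_reciprocal_sign_epistasis(1.0, 0.5, 0.6, 2.0))  # True: both single steps lose fitness
```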
Affiliation(s)
- Nicholas C Wu
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, United States
- Lei Dai
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- C Anders Olson
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States
- James O Lloyd-Smith
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Ren Sun
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, United States
| |
238
Mannakee BK, Gutenkunst RN. Selection on Network Dynamics Drives Differential Rates of Protein Domain Evolution. PLoS Genet 2016; 12:e1006132. [PMID: 27380265 PMCID: PMC4933380 DOI: 10.1371/journal.pgen.1006132] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/05/2016] [Accepted: 05/27/2016]
Abstract
The long-held principle that functionally important proteins evolve slowly has recently been challenged by studies in mice and yeast showing that the severity of a protein knockout only weakly predicts that protein's rate of evolution. However, the relevance of these studies to evolutionary changes within proteins is unknown, because amino acid substitutions, unlike knockouts, often only slightly perturb protein activity. To quantify the phenotypic effect of small biochemical perturbations, we developed an approach that uses computational systems biology models to measure the influence of individual reaction rate constants on network dynamics. We show that this dynamical influence is predictive of protein domain evolutionary rate within networks in vertebrates and yeast, even after controlling for expression level and breadth, network topology, and knockout effect. Thus, our results demonstrate not only the importance of protein domain function in determining evolutionary rate but also the power of systems biology modeling to uncover unanticipated evolutionary forces.
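The abstract does not spell out how "dynamical influence" is computed. As a simpler, related illustration only, the sketch below estimates a local log-sensitivity, d ln(output) / d ln(k), of a model output to each rate constant by central finite differences. The model (constant production at rate k_syn, first-order decay at rate k_deg, starting from zero) is a hypothetical toy, not one of the paper's networks, and this is not the paper's measure.

```python
import math


def concentration(k_syn: float, k_deg: float, t: float) -> float:
    """Toy model dx/dt = k_syn - k_deg * x with x(0) = 0, solved in closed form."""
    return (k_syn / k_deg) * (1.0 - math.exp(-k_deg * t))


def log_sensitivity(f, params: dict[str, float], name: str, h: float = 1e-4) -> float:
    """Central-difference estimate of d ln f / d ln params[name]."""
    up = dict(params, **{name: params[name] * math.exp(h)})
    down = dict(params, **{name: params[name] * math.exp(-h)})
    return (math.log(f(**up)) - math.log(f(**down))) / (2.0 * h)


params = {"k_syn": 2.0, "k_deg": 0.5, "t": 100.0}
for rate in ("k_syn", "k_deg"):
    print(rate, round(log_sensitivity(concentration, params, rate), 3))
```

At t = 100 the toy system is essentially at steady state (x close to k_syn / k_deg), so the sensitivities come out near +1 and -1; at very early times the output barely depends on k_deg. How strongly a rate constant matters can therefore depend on which part of the dynamics is examined.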
Affiliation(s)
- Brian K. Mannakee
- Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, Arizona, United States of America
- Ryan N. Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona, United States of America
239
Stiffler MA, Subramanian SK, Salinas VH, Ranganathan R. A Protocol for Functional Assessment of Whole-Protein Saturation Mutagenesis Libraries Utilizing High-Throughput Sequencing. J Vis Exp 2016. [PMID: 27403811 DOI: 10.3791/54119] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5]
Abstract
Site-directed mutagenesis has long been used as a method to interrogate protein structure, function and evolution. Recent advances in massively parallel sequencing technology have opened up the possibility of assessing the functional or fitness effects of large numbers of mutations simultaneously. Here, we present a protocol for experimentally determining the effects of all possible single amino acid mutations in a protein of interest using high-throughput sequencing technology, with the 263-amino-acid antibiotic resistance enzyme TEM-1 β-lactamase as an example. In this approach, a whole-protein saturation mutagenesis library is constructed by site-directed mutagenic PCR, randomizing each position individually to all possible amino acids. The library is then transformed into bacteria and selected for the ability to confer resistance to β-lactam antibiotics. The fitness effect of each mutation is then determined by deep sequencing of the library before and after selection. Importantly, this protocol introduces methods that maximize sequencing read depth and permit the simultaneous selection of the entire mutation library: adjacent positions are pooled into groups whose length fits within the high-throughput sequencing read length, and orthogonal primers are used to barcode each group. Representative results using this protocol are provided by assessing the fitness effects of all single amino acid mutations in TEM-1 at a clinically relevant dosage of ampicillin. The method should be easily extendable to other proteins for which a high-throughput selection assay is in place.
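The before/after-selection comparison can be sketched as a log2 enrichment ratio normalized to wild type. This is a common way to score such data and is assumed here for illustration; the abstract does not give the protocol's exact scoring formula. The pseudocount, variant names, and read counts are hypothetical.

```python
import math


def log2_enrichment(pre: dict[str, int], post: dict[str, int],
                    wt: str = "WT", pseudocount: float = 0.5) -> dict[str, float]:
    """log2 change in each variant's frequency across selection, relative to wild type.

    pre/post map variant name -> read count before/after selection. The pseudocount
    keeps variants that vanish after selection from producing log(0).
    """
    pre_total = sum(pre.values())
    post_total = sum(post.values())

    def ratio(variant: str) -> float:
        f_pre = (pre.get(variant, 0) + pseudocount) / pre_total
        f_post = (post.get(variant, 0) + pseudocount) / post_total
        return f_post / f_pre

    wt_ratio = ratio(wt)
    return {v: math.log2(ratio(v) / wt_ratio) for v in pre}


# Hypothetical read counts for wild type and three mutants.
pre = {"WT": 1000, "mut_neutral": 1000, "mut_weak": 1000, "mut_stop": 1000}
post = {"WT": 2000, "mut_neutral": 2000, "mut_weak": 1000, "mut_stop": 0}
for variant, score in log2_enrichment(pre, post).items():
    print(f"{variant}: {score:+.2f}")
```

Normalizing to wild type puts neutral mutations near 0 and strongly deleterious ones (such as a variant that disappears after selection) far below it, independent of how much the library as a whole was enriched.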
Affiliation(s)
- Michael A Stiffler
- Green Center for Systems Biology, University of Texas Southwestern Medical Center
- Subu K Subramanian
- Green Center for Systems Biology, University of Texas Southwestern Medical Center
- Victor H Salinas
- Green Center for Systems Biology, University of Texas Southwestern Medical Center
- Rama Ranganathan
- Green Center for Systems Biology, University of Texas Southwestern Medical Center
240
Starr TN, Thornton JW. Epistasis in protein evolution. Protein Sci 2016; 25:1204-18. [PMID: 26833806 PMCID: PMC4918427 DOI: 10.1002/pro.2897] [Citation(s) in RCA: 296] [Impact Index Per Article: 37.0] [Received: 12/02/2015] [Revised: 01/25/2016] [Accepted: 01/27/2016]
Abstract
The structure, function, and evolution of proteins depend on physical and genetic interactions among amino acids. Recent studies have used new strategies to explore the prevalence, biochemical mechanisms, and evolutionary implications of these interactions, called epistasis, within proteins. Here we describe an emerging picture of pervasive epistasis in which the physical and biological effects of mutations change over the course of evolution in a lineage-specific fashion. Epistasis can restrict the trajectories available to an evolving protein or open new paths to sequences and functions that would otherwise have been inaccessible. We describe two broad classes of epistatic interactions, which arise from different physical mechanisms and have different effects on evolutionary processes. Specific epistasis, in which one mutation influences the phenotypic effect of few other mutations, is caused by direct and indirect physical interactions between mutations, which nonadditively change the protein's physical properties, such as conformation, stability, or affinity for ligands. In contrast, nonspecific epistasis describes mutations that modify the effect of many others; these typically behave additively with respect to the physical properties of a protein but exhibit epistasis because of a nonlinear relationship between the physical properties and their biological effects, such as function or fitness. Both types of interaction are rampant, but specific epistasis has stronger effects on the rate and outcomes of evolution, because it imposes stricter constraints and modulates evolutionary potential more dramatically; it therefore makes evolution more contingent on low-probability historical events and leaves stronger marks on the sequences, structures, and functions of protein families.
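Nonspecific epistasis, as described above, can be illustrated with a toy two-state folding model that is assumed here for illustration rather than taken from the review. Each mutation adds its ΔΔG to the folding free energy, so the physical effects are additive by construction, yet the phenotype (fraction folded) is a nonlinear function of ΔG and shows epistasis. All energies are hypothetical.

```python
import math

RT = 0.593  # kcal/mol, approximately RT at 298 K


def fraction_folded(dg: float) -> float:
    """Two-state folding: dg = G(folded) - G(unfolded), kcal/mol; negative means stable."""
    return 1.0 / (1.0 + math.exp(dg / RT))


DG_WT = -3.0                # hypothetical wild-type folding free energy
DDG = {"a": 2.0, "b": 2.0}  # hypothetical destabilizing effects, additive in energy


def phenotype(mutations: list[str]) -> float:
    """Fraction folded after summing the ddG values of the given mutations."""
    return fraction_folded(DG_WT + sum(DDG[m] for m in mutations))


f_wt, f_a, f_b, f_ab = (phenotype(m) for m in ([], ["a"], ["b"], ["a", "b"]))

# Log-scale epistasis: 0 if the mutations' effects on the phenotype combined multiplicatively.
epsilon = math.log(f_ab / f_wt) - math.log(f_a / f_wt) - math.log(f_b / f_wt)
print(f"wt={f_wt:.3f} a={f_a:.3f} b={f_b:.3f} ab={f_ab:.3f} epsilon={epsilon:.2f}")
```

Although the ΔΔG values add exactly, the double mutant is far less folded than the single mutants predict, because the folding curve is steep once stability is nearly exhausted. A specific interaction, by contrast, would show up as a nonadditive ΔΔG itself.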
Affiliation(s)
- Tyler N Starr
- Graduate Program in Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, 60637
- Joseph W Thornton
- Departments of Ecology and Evolution and Human Genetics, University of Chicago, Chicago, Illinois, 60637
241
Abriata LA, Bovigny C, Dal Peraro M. Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server. BMC Bioinformatics 2016; 17:242. [PMID: 27315797 PMCID: PMC4912743 DOI: 10.1186/s12859-016-1124-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/01/2016] [Accepted: 06/07/2016]
Abstract
Background: Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or to positions deemed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Results: Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html) helps to detect, quantify, and map onto sequence and structure the biophysical and biochemical traits that shape amino acid preferences throughout a protein, as determined by deep sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. Discussion: We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints on variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. Conclusion: We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1124-4) contains supplementary material, which is available to authorized users.
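A minimal sketch of the kind of trait mapping described, assuming one simple approach: correlate the amino acid preferences measured at a single position with a physicochemical scale. The Kyte-Doolittle hydropathy scale stands in for one trait; the preference values are hypothetical, and this is not PsychoProt's code or scoring scheme.

```python
import math

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trait_correlation(preferences: dict[str, float],
                      scale: dict[str, float] = KYTE_DOOLITTLE) -> float:
    """Correlate one site's amino acid preferences with a per-residue trait scale.

    Amino acids missing from `preferences` are treated as having preference 0.
    """
    residues = sorted(scale)
    return pearson([preferences.get(aa, 0.0) for aa in residues],
                   [scale[aa] for aa in residues])


# Hypothetical preferences at a buried site (fractions summing to 1).
buried_site = {"L": 0.30, "I": 0.25, "V": 0.20, "F": 0.10, "M": 0.05, "A": 0.05, "C": 0.05}
print(round(trait_correlation(buried_site), 2))
```

A clearly positive correlation would suggest that hydropathy constrains that position; repeating the calculation with other scales (volume, charge, metabolic cost) is one way to ask which trait best explains each site.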
Affiliation(s)
- Luciano A Abriata
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland.
- Christophe Bovigny
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland; present address: Molecular Modeling Group, Swiss Institute of Bioinformatics, UNIL, Bâtiment Génopode, Lausanne, 1015, Switzerland
- Matteo Dal Peraro
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland
242
Local fitness landscape of the green fluorescent protein. Nature 2016; 533:397-401. [PMID: 27193686 PMCID: PMC4968632 DOI: 10.1038/nature17995] [Citation(s) in RCA: 282] [Impact Index Per Article: 35.3] [Received: 04/28/2015] [Accepted: 04/07/2016]
Abstract
Fitness landscapes[1,2], depictions of how genotypes manifest at the phenotypic level, form the basis for our understanding of many areas of biology[2–7], yet their properties remain elusive. Studies addressing this issue often consider specific genes and their function as a proxy for fitness[2,4], experimentally assessing the impact on function of single mutations and their combinations in a specific sequence[2,5,8–15] or in different sequences[2,3,5,16–18]. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here, we chart an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function, fluorescence, of tens of thousands of derivative genotypes of avGFP. We find that its fitness landscape is narrow, with half of genotypes with two mutations showing reduced fluorescence and half of genotypes with five mutations being completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations and arose mostly through the cumulative impact of slightly deleterious mutations causing a threshold-like decrease of protein stability and a concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales. The characterization of the local fitness landscape of avGFP has important implications for a number of fields including molecular evolution, population genetics and protein design.
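The threshold-like behavior can be illustrated with a toy stability model that is assumed here and not taken from the paper. Each random mutation costs an exponentially distributed amount of stability (mean `mean_ddg`), and a genotype loses fluorescence once the summed cost exceeds the wild type's stability margin. The sum of k such costs follows an Erlang distribution, so the fraction of k-mutant genotypes past the threshold has a closed form. The margin and mean below are arbitrary.

```python
import math


def fraction_nonfunctional(k: int, margin: float, mean_ddg: float) -> float:
    """P(sum of k destabilizing costs > margin), each cost ~ Exponential(mean_ddg).

    The sum of k such costs is Erlang(k, mean_ddg), whose survival function at
    the margin is exp(-x) * sum_{i<k} x**i / i!, with x = margin / mean_ddg.
    For k = 0 the sum is empty, so no genotype is past the threshold.
    """
    x = margin / mean_ddg
    return math.exp(-x) * sum(x**i / math.factorial(i) for i in range(k))


MARGIN = 3.0    # hypothetical wild-type stability margin, kcal/mol
MEAN_DDG = 0.7  # hypothetical mean destabilization per mutation, kcal/mol

for k in range(9):
    print(k, round(fraction_nonfunctional(k, MARGIN, MEAN_DDG), 3))
```

The fraction stays near zero for the first mutation or two and then rises steeply, the qualitative shape of a threshold. Allowing stabilizing mutations or a different cost distribution would change the curve.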
243
Boyer S, Biswas D, Kumar Soshee A, Scaramozzino N, Nizak C, Rivoire O. Hierarchy and extremes in selections from pools of randomized proteins. Proc Natl Acad Sci U S A 2016; 113:3482-7. [PMID: 26969726 PMCID: PMC4822605 DOI: 10.1073/pnas.1517813113] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8]
Abstract
Variation and selection are the core principles of Darwinian evolution, but quantitatively relating the diversity of a population to its capacity to respond to selection is challenging. Here, we examine this problem at a molecular level in the context of populations of partially randomized proteins selected for binding to well-defined targets. We built several minimal protein libraries, screened them in vitro by phage display, and analyzed their response to selection by high-throughput sequencing. A statistical analysis of the results reveals two main findings. First, libraries with the same sequence diversity but built around different "frameworks" typically have vastly different responses; second, the distribution of responses of the best binders in a library follows a simple scaling law. We show how an elementary probabilistic model based on extreme value theory rationalizes the latter finding. Our results have implications for designing synthetic protein libraries, estimating the density of functional biomolecules in sequence space, characterizing diversity in natural populations, and experimentally investigating evolvability (i.e., the potential for future evolution).
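As a generic extreme-value illustration, not the paper's model: if the scores of library members were independent exponential draws with unit mean, the expected best of n members would be the harmonic number H_n, which grows like ln n + γ (Euler's constant). A tenfold larger library then raises the expected top score by only about ln 10 ≈ 2.3 units. The exponential assumption is ours; other score distributions give different extreme-value behavior. The sketch checks the closed form against a seeded simulation.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def expected_max_exponential(n: int) -> float:
    """E[max of n iid Exponential(1) draws], which equals the harmonic number H_n."""
    return sum(1.0 / i for i in range(1, n + 1))


def simulated_max(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation, for comparison."""
    rng = random.Random(seed)
    return sum(max(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)) / trials


for n in (10, 100, 1000):
    print(n, round(expected_max_exponential(n), 3), round(math.log(n) + EULER_GAMMA, 3))
print("simulated, n = 100:", round(simulated_max(100), 3))
```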
Affiliation(s)
- Sébastien Boyer
- Laboratoire Interdisciplinaire de Physique, CNRS and Université Grenoble Alpes, 38000 Grenoble, France
- Dipanwita Biswas
- Laboratoire Interdisciplinaire de Physique, CNRS and Université Grenoble Alpes, 38000 Grenoble, France
- Ananda Kumar Soshee
- Laboratoire Interdisciplinaire de Physique, CNRS and Université Grenoble Alpes, 38000 Grenoble, France
- Natale Scaramozzino
- Laboratoire Interdisciplinaire de Physique, CNRS and Université Grenoble Alpes, 38000 Grenoble, France
- Clément Nizak
- Laboratoire de Biochimie, Chimie-Biologie-Innovation UMR8231, CNRS and Ecole Supérieure de Physique et Chimie Industrielles ParisTech, Paris Sciences & Lettres Research University, 75005 Paris, France
- Olivier Rivoire
- Laboratoire Interdisciplinaire de Physique, CNRS and Université Grenoble Alpes, 38000 Grenoble, France
244
Peterman N, Levine E. Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations. BMC Genomics 2016; 17:206. [PMID: 26956374 PMCID: PMC4784318 DOI: 10.1186/s12864-016-2533-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 11/06/2015] [Accepted: 02/25/2016]
Abstract
Background: Sort-seq is an effective approach for simultaneous activity measurements in a large-scale library, combining flow cytometry, deep sequencing, and statistical inference. Such assays enable the characterization of functional landscapes at unprecedented scale for a wide-reaching array of biological molecules and functionalities in vivo. Applications of sort-seq range from footprinting to establishing quantitative models of biological systems and rational design of synthetic genetic elements. Nearly as diverse are implementations of this technique, reflecting key design choices with extensive impact on the scope and accuracy of the results. Yet how to make these choices remains unclear. Here we investigate the effects of alternative sort-seq designs and inference methods on the information output using mathematical formulation and simulations. Results: We identify key intrinsic properties of any system of interest with practical implications for sort-seq assays, depending on the experimental goals. The fluorescence range and cell-to-cell variability specify the number of sorted populations needed for quantitative measurements that are precise and unbiased. These factors also indicate cases where an enrichment-based approach that uses a single sorted population can offer satisfactory results. These predictions of our model are corroborated by re-analysis of published data. We explore implications of these results for quantitative modeling and library design. Conclusions: Sort-seq assays can be streamlined by reducing the number of sorted populations, saving considerable resources. Simple preliminary experiments can guide optimal experiment design, minimizing cost while maintaining the maximal information output and avoiding latent biases. These insights can facilitate future applications of this highly adaptable technique.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2533-5) contains supplementary material, which is available to authorized users.
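A minimal sketch of one simple way to turn sort-seq read counts into an activity estimate, assumed here for illustration rather than taken from the paper's inference methods: rescale each bin's reads to an estimated number of cells of that variant sorted into the bin, then take the weighted mean of bin centers. Bin edges, cell numbers, and read counts are hypothetical, and this estimator ignores within-bin spread and sequencing noise, which are among the design issues the paper analyzes.

```python
def mean_log_fluorescence(reads_by_bin: list[int], total_reads: list[int],
                          cells_sorted: list[int], bin_edges: list[float]) -> float:
    """Estimate one variant's mean log10 fluorescence from sort-seq read counts.

    reads_by_bin[b]  reads of this variant sequenced from bin b
    total_reads[b]   all reads sequenced from bin b
    cells_sorted[b]  cells sorted into bin b
    bin_edges        log10 fluorescence bin edges (one more entry than bins)
    """
    centers = [(lo + hi) / 2 for lo, hi in zip(bin_edges, bin_edges[1:])]
    # Rescale reads to estimated cell counts, since bins may be sequenced to different depths.
    cells = [r / t * c if t > 0 else 0.0
             for r, t, c in zip(reads_by_bin, total_reads, cells_sorted)]
    total = sum(cells)
    if total == 0:
        raise ValueError("variant was not observed in any bin")
    return sum(w * x for w, x in zip(cells, centers)) / total


# Hypothetical four-bin experiment with equal sequencing depth and equal sorting.
estimate = mean_log_fluorescence(
    reads_by_bin=[0, 10, 30, 0],
    total_reads=[100_000] * 4,
    cells_sorted=[1_000_000] * 4,
    bin_edges=[1.0, 2.0, 3.0, 4.0, 5.0],
)
print(round(estimate, 3))
```

With a single sorted population this reduces to an enrichment ratio, which is why the number of bins, set by the fluorescence range and cell-to-cell variability, governs how much quantitative information survives.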
Affiliation(s)
- Neil Peterman
- Department of Physics and FAS Center for Systems Biology, Harvard University, 17 Oxford St., Cambridge, MA, USA
- Erel Levine
- Department of Physics and FAS Center for Systems Biology, Harvard University, 17 Oxford St., Cambridge, MA, USA
245
Phillips AM, Shoulders MD. The Path of Least Resistance: Mechanisms to Reduce Influenza's Sensitivity to Oseltamivir. J Mol Biol 2016; 428:533-537. [PMID: 26748011 DOI: 10.1016/j.jmb.2015.12.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Affiliation(s)
- Angela M Phillips
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Matthew D Shoulders
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
246
A benchmark study on error-correction by read-pairing and tag-clustering in amplicon-based deep sequencing. BMC Genomics 2016; 17:108. [PMID: 26868371 PMCID: PMC4751728 DOI: 10.1186/s12864-016-2388-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 09/17/2015] [Accepted: 01/08/2016]
Abstract
Background: The high error rate of next-generation sequencing (NGS) restricts some of its applications, such as monitoring virus mutations and detecting rare mutations in tumors. Two library preparation strategies are commonly employed to improve sequencing accuracy by correcting sequencing errors: the read-pairing method and the tag-clustering method (i.e. primer ID or UID). Here, we constructed a homogeneous library from a single clone and compared the variant-calling accuracy of these error-correction methods. Results: We comprehensively described the strengths and pitfalls of these methods. We found that both read-pairing and tag-clustering methods significantly decreased the sequencing error rate. While the read-pairing method was more effective than the tag-clustering method at correcting insertion and deletion errors, it was not as effective as the tag-clustering method at correcting substitution errors. In addition, we observed that when read quality was poor, the tag-clustering method led to a large loss of coverage. We also tested the effect of applying quality score filtering to the error-correction methods and demonstrated that quality score filtering gave a minor, yet statistically significant, improvement to the error-correction methods tested in this study. Conclusion: Our study provides a benchmark for researchers to select suitable error-correction methods based on the goal of the experiment by balancing the trade-off between sequencing cost (i.e. sequencing coverage requirement) and detection sensitivity.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2388-9) contains supplementary material, which is available to authorized users.
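The tag-clustering (primer ID / UID) idea can be sketched as follows: group reads by their molecular tag, then call a per-position majority consensus within each group, so that a random sequencing error present in a minority of a tag's reads is outvoted. This assumes equal-length reads with substitution errors only and a hypothetical minimum family size; it is not the benchmark's pipeline, and handling indels would require aligning reads first.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads, min_family_size: int = 3) -> dict[str, str]:
    """reads: iterable of (tag, sequence) pairs; returns {tag: consensus sequence}.

    Families smaller than min_family_size are dropped, which is one way tag
    clustering can lose coverage.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        if len({len(s) for s in seqs}) != 1:
            continue  # this sketch handles substitutions only, not indels
        # Majority vote at each position across all reads sharing the tag.
        consensus[tag] = "".join(Counter(column).most_common(1)[0][0]
                                 for column in zip(*seqs))
    return consensus


# Hypothetical reads: tag "AAT" has one substitution error; tag "GGC" has too few reads.
reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),
         ("GGC", "TTTT"), ("GGC", "TTTA")]
print(consensus_by_tag(reads))
```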
247
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 176] [Impact Index Per Article: 22.0]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific substitution rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
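Site-specific rates are usually estimated from alignments with phylogenetic models. As a much cruder, alignment-only proxy (and, to be clear, not a rate estimator), per-column Shannon entropy already separates invariant sites from variable ones. The toy alignment below is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, gaps excluded."""
    residues = [c for c in column if c != gap]
    n = len(residues)
    if n == 0:
        return 0.0
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


def site_entropies(alignment: list[str]) -> list[float]:
    """Entropy of each column of an alignment of equal-length sequences."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Hypothetical four-sequence alignment: site 1 invariant, site 2 highly variable.
alignment = ["MKLV", "MRLV", "MKIV", "MEFI"]
print([round(h, 2) for h in site_entropies(alignment)])
```

Unlike a phylogenetic rate, entropy ignores the tree, so closely related sequences inflate apparent conservation; this is one reason dedicated rate-inference methods are preferred.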
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
- Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
248
Tinberg CE, Khare SD. Improving Binding Affinity and Selectivity of Computationally Designed Ligand-Binding Proteins Using Experiments. Methods Mol Biol 2016; 1414:155-171. [PMID: 27094290 DOI: 10.1007/978-1-4939-3569-7_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
Abstract
The ability to design proteins de novo that can bind small molecules has wide implications for synthetic biology and medicine. Combining computational protein design with the high-throughput screening of mutagenic libraries of computationally designed proteins is emerging as a general approach for creating binding proteins with programmable binding modes, affinities, and selectivities. The computational step enables the creation of a binding site in a protein that otherwise does not (measurably) bind the intended ligand, and targeted mutagenic screening allows the computational model to be validated and refined and provides orders-of-magnitude increases in binding affinity. Deep sequencing of mutagenic libraries can provide insights into the mutagenic binding landscape and enable further affinity improvements. Moreover, in such a combined computational-experimental approach, where the binding mode is preprogrammed and iteratively refined, selectivity can be achieved (and modulated) by placing specified amino acid side-chain groups around the ligand in defined orientations. Here, we describe the experimental aspects of a combined computational-experimental approach for designing proteins that bind a small molecule of choice, using the software suite Rosetta, and for engineering high affinity and ligand selectivity, using fluorescence-activated cell sorting and high-throughput yeast surface display. We illustrate the utility of this approach with the design of a selective digoxigenin (DIG)-binding protein that, after affinity maturation, binds DIG with picomolar affinity and high selectivity over structurally related steroids.
Affiliation(s)
- Christine E Tinberg
- Department of Biochemistry, University of Washington, Seattle, WA, 98109, USA.
- Amgen, South San Francisco, CA, 94080, USA.
- Sagar D Khare
- Department of Chemistry and Chemical Biology, Rutgers State University of New Jersey, Piscataway, NJ, 08854, USA
- Center for Integrative Proteomics Research, Rutgers State University of New Jersey, Piscataway, NJ, 08854, USA
249
Generating High-Accuracy Peptide-Binding Data in High Throughput with Yeast Surface Display and SORTCERY. Methods Mol Biol 2016; 1414:233-47. [PMID: 27094295 DOI: 10.1007/978-1-4939-3569-7_14] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1]
Abstract
Library methods are widely used to study protein-protein interactions, and high-throughput screening or selection followed by sequencing can identify a large number of peptide ligands for a protein target. In this chapter, we describe a procedure called "SORTCERY" that can rank the affinities of library members for a target with high accuracy. SORTCERY follows a three-step protocol. First, fluorescence-activated cell sorting (FACS) is used to sort a library of yeast-displayed peptide ligands according to their affinities for a target. Second, all sorted pools are deep sequenced. Third, the resulting data are analyzed to create a ranking. We demonstrate an application of SORTCERY to the problem of ranking peptide ligands for the anti-apoptotic regulator Bcl-xL.
250
Sahoo A, Khare S, Devanarayanan S, Jain PC, Varadarajan R. Residue proximity information and protein model discrimination using saturation-suppressor mutagenesis. eLife 2015; 4. [PMID: 26716404 PMCID: PMC4758949 DOI: 10.7554/elife.09532] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 06/18/2015] [Accepted: 12/29/2015]
Abstract
Identification of residue-residue contacts from primary sequence can be used to guide protein structure prediction. Using Escherichia coli CcdB as the test case, we describe an experimental method termed saturation-suppressor mutagenesis to acquire residue contact information. In this methodology, exhaustive screens for suppressors were performed for each of five inactive CcdB mutants. Proximal suppressors were accurately discriminated from distal suppressors based on their phenotypes when present as single mutants. Experimentally identified putative proximal pairs formed spatial constraints to recover >98% of native-like models of CcdB from a decoy dataset. Suppressor methodology was also applied to the integral membrane protein diacylglycerol kinase A, for which the structures determined by X-ray crystallography and NMR were significantly different. Suppressor as well as sequence co-variation data clearly point to the X-ray structure being the functional one adopted in vivo. The methodology is applicable to any macromolecular system for which a convenient phenotypic assay exists. DOI:http://dx.doi.org/10.7554/eLife.09532.001
Common techniques to determine the three-dimensional structures of proteins can help researchers to understand these molecules’ activities, but they are often time-consuming and do not work for all proteins. Proteins are made of chains of amino acids. When a protein chain folds, some of these amino acids interact with other amino acids, and these contacts dictate the overall shape of the protein. This means that identifying the pairs of contacting amino acids could make it possible to predict the protein’s structure. Interactions between pairs of contacting amino acids tend to remain conserved throughout evolution, and if a mutation alters one of the amino acids in a pair, then a 'compensatory' change often occurs to alter the second amino acid as well.
Compensatory mutations can suggest that two amino acids are close to each other in the three-dimensional shape of a protein, but the computational methods used to identify such amino acid pairs can sometimes be inaccurate. In 2012, researchers generated mutants of a bacterial protein called CcdB with changes to single amino acids that caused the protein to fail to fold correctly. Now, Sahoo et al., who include two of the researchers involved in the 2012 work, have developed an experimental method to identify contacting amino acids, using the CcdB protein as a test case. The approach involved searching for additional mutations that could restore the activity of five of the original mutant proteins when the proteins were produced in yeast cells. The rationale was that any secondary mutation that restored activity must have corrected the folding defect caused by the original mutation. Sahoo et al. then predicted how close the amino acids affected by the secondary mutations were to the amino acids altered by the original mutations. This information was used to select reliable three-dimensional models of CcdB from a large set of possible structures that had previously been generated using computer models. Next, the technique was applied to a protein called diacylglycerol kinase A. The structure of this protein had previously been inferred using techniques such as X-ray crystallography and nuclear magnetic resonance, but the two methods gave mismatched structures. Sahoo et al. found that the amino acid contacts derived from their experimental method matched those found in the crystal structure, suggesting that the functional protein structure in living cells is similar to the crystal structure. In the future, the experimental approach developed in this work could be combined with existing methods to reliably guide protein structure prediction. DOI:http://dx.doi.org/10.7554/eLife.09532.002
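Using putative proximal pairs as spatial constraints to pick native-like models can be sketched as scoring each candidate model by the fraction of suggested residue pairs that lie within a distance cutoff, then ranking the models. The 8 Å cutoff on one representative atom per residue, the residue numbers, the coordinates, and the model names are all assumptions for illustration; the paper's own scoring may differ.

```python
import math


def fraction_satisfied(coords: dict[int, tuple], pairs: list[tuple[int, int]],
                       cutoff: float = 8.0) -> float:
    """coords: residue number -> (x, y, z) of a representative atom, in angstroms."""
    satisfied = sum(1 for i, j in pairs if math.dist(coords[i], coords[j]) <= cutoff)
    return satisfied / len(pairs)


def rank_models(models: dict[str, dict], pairs: list[tuple[int, int]],
                cutoff: float = 8.0) -> list[str]:
    """Model names ordered from most to fewest satisfied proximity constraints."""
    return sorted(models, key=lambda name: fraction_satisfied(models[name], pairs, cutoff),
                  reverse=True)


# Hypothetical constraints (residue pairs) and two hypothetical decoy models.
pairs = [(10, 20), (15, 30)]
models = {
    "model_a": {10: (0, 0, 0), 20: (5, 0, 0), 15: (0, 5, 0), 30: (0, 10, 0)},
    "model_b": {10: (0, 0, 0), 20: (20, 0, 0), 15: (0, 5, 0), 30: (0, 30, 0)},
}
print(rank_models(models, pairs))
```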
Affiliation(s)
- Anusmita Sahoo
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Shruti Khare
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Pankaj C Jain
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Raghavan Varadarajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India; Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India