1
|
Faure AJ, Martí-Aranda A, Hidalgo-Carcedo C, Beltran A, Schmiedel JM, Lehner B. The genetic architecture of protein stability. Nature 2024; 634:995-1003. [PMID: 39322666 PMCID: PMC11499273 DOI: 10.1038/s41586-024-07966-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Accepted: 08/20/2024] [Indexed: 09/27/2024]
Abstract
There are more ways to synthesize a 100-amino acid (aa) protein (20100) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces1. However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 1010, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
Collapse
Affiliation(s)
- Andre J Faure
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- ALLOX, Barcelona, Spain.
| | - Aina Martí-Aranda
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Cristina Hidalgo-Carcedo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Antoni Beltran
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Jörn M Schmiedel
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- factorize.bio, Berlin, Germany
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
- Universitat Pompeu Fabra (UPF), Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
| |
Collapse
|
2
|
Duan B, Qiu C, Lockless SW, Sze SH, Kaplan CD. Higher-order epistasis within Pol II trigger loop haplotypes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.20.576280. [PMID: 38293233 PMCID: PMC10827151 DOI: 10.1101/2024.01.20.576280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2024]
Abstract
RNA polymerase II (Pol II) has a highly conserved domain, the trigger loop (TL), that controls transcription fidelity and speed. We previously probed pairwise genetic interactions between residues within and surrounding the TL for the purpose of understand functional interactions between residues and to understand how individual mutants might alter TL function. We identified widespread incompatibility between TLs of different species when placed in the Saccharomyces cerevisiae Pol II context, indicating species-specific interactions between otherwise highly conserved TLs and its surroundings. These interactions represent epistasis between TL residues and the rest of Pol II. We sought to understand why certain TL sequences are incompatible with S. cerevisiae Pol II and to dissect the nature of genetic interactions within multiply substituted TLs as a window on higher order epistasis in this system. We identified both positive and negative higher-order residue interactions within example TL haplotypes. Intricate higher-order epistasis formed by TL residues was sometimes only apparent from analysis of intermediate genotypes, emphasizing complexity of epistatic interactions. Furthermore, we distinguished TL substitutions with distinct classes of epistatic patterns, suggesting specific TL residues that potentially influence TL evolution. Our examples of complex residue interactions suggest possible pathways for epistasis to facilitate Pol II evolution.
Collapse
Affiliation(s)
- Bingbing Duan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
| | - Chenxi Qiu
- Department of Genetics, Harvard Medical School, Boston, MA 02215
| | - Steve W Lockless
- Department of Biology, Texas A&M University, College Station, TX 77843
| | - Sing-Hoi Sze
- Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843
| | - Craig D Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
| |
Collapse
|
3
|
Park Y, Metzger BPH, Thornton JW. The simplicity of protein sequence-function relationships. Nat Commun 2024; 15:7953. [PMID: 39261454 PMCID: PMC11390738 DOI: 10.1038/s41467-024-51895-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Accepted: 08/20/2024] [Indexed: 09/13/2024] Open
Abstract
How complex are the rules by which a protein's sequence determines its function? High-order epistatic interactions among residues are thought to be pervasive, suggesting an idiosyncratic and unpredictable sequence-function relationship. But many prior studies may have overestimated epistasis, because they analyzed sequence-function relationships relative to a single reference sequence-which causes measurement noise and local idiosyncrasies to snowball into high-order epistasis-or they did not fully account for global nonlinearities. Here we present a reference-free method that jointly infers specific epistatic interactions and global nonlinearity using a bird's-eye view of sequence space. This technique yields the simplest explanation of sequence-function relationships and is more robust than existing methods to measurement noise, missing data, and model misspecification. We reanalyze 20 experimental datasets and find that context-independent amino acid effects and pairwise interactions, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of phenotypic variance and over 92% in every case. Only a tiny fraction of genotypes are strongly affected by higher-order epistasis. Sequence-function relationships are also sparse: a miniscule fraction of amino acids and interactions account for 90% of phenotypic variance. Sequence-function causality across these datasets is therefore simple, opening the way for tractable approaches to characterize proteins' genetic architecture.
Collapse
Affiliation(s)
- Yeonwoo Park
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA
- Center for RNA Research, Institute for Basic Science, Seoul, Republic of Korea
| | - Brian P H Metzger
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Joseph W Thornton
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA.
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.
| |
Collapse
|
4
|
Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024; 34:630-647. [PMID: 38969803 PMCID: PMC11369238 DOI: 10.1038/s41422-024-00989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024] Open
Abstract
Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that renders deoxyadenosine deaminase activity to TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.
Collapse
Affiliation(s)
- Peng Cheng
- Bioinformatics Center of AMMS, Beijing, China
| | - Cong Mao
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Jin Tang
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Sen Yang
- Bioinformatics Center of AMMS, Beijing, China
| | - Yu Cheng
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Wuke Wang
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Qiuxi Gu
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Wei Han
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Hao Chen
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Sihan Li
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | | | | | - Wuju Li
- Bioinformatics Center of AMMS, Beijing, China
| | - Aimin Pan
- Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Suwen Zhao
- iHuman Institute, ShanghaiTech University, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Xingxu Huang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | | | - Jun Zhang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China.
| | - Wenjie Shu
- Bioinformatics Center of AMMS, Beijing, China.
| | | |
Collapse
|
5
|
Ouyang WO, Lv H, Liu W, Mou Z, Lei R, Pholcharee T, Wang Y, Dailey KE, Gopal AB, Choi D, Ardagh MR, Talmage L, Rodriguez LA, Dai X, Wu NC. Rapid synthesis and screening of natively paired antibodies against influenza hemagglutinin stem via oPool + display. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.30.610421. [PMID: 39257766 PMCID: PMC11383711 DOI: 10.1101/2024.08.30.610421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/12/2024]
Abstract
Antibody discovery is crucial for developing therapeutics and vaccines as well as understanding adaptive immunity. However, the lack of approaches to synthesize antibodies with defined sequences in a high-throughput manner represents a major bottleneck in antibody discovery. Here, we presented oPool+ display, which combines oligo pool synthesis and mRNA display to construct and characterize many natively paired antibodies in parallel. As a proof-of-concept, we applied oPool+ display to rapidly screen the binding activity of >300 natively paired influenza hemagglutinin (HA) antibodies against the conserved HA stem domain. Structural analysis of 16.ND.92, one of the identified HA stem antibodies, revealed a unique binding mode distinct from other known broadly neutralizing HA stem antibodies with convergent sequence features. Yet, despite such differences, 16.ND.92 remained broadly reactive and conferred in vivo protection. Overall, this study not only established an experimental platform that can be applied in both research and therapeutics to accelerate antibody discovery, but also provides molecular insights into antibody responses to the influenza HA stem, which is a major target for universal influenza vaccine development.
Collapse
Affiliation(s)
- Wenhao O. Ouyang
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Huibin Lv
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Wenkan Liu
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Zongjun Mou
- Department of Physiology and Biophysics, Case Western Reserve University, School of Medicine, Cleveland, OH 44106, USA
| | - Ruipeng Lei
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Tossapol Pholcharee
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Yiquan Wang
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Katrine E. Dailey
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Akshita B. Gopal
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Danbi Choi
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Madison R. Ardagh
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Logan Talmage
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Lucia A. Rodriguez
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Xinghong Dai
- Department of Physiology and Biophysics, Case Western Reserve University, School of Medicine, Cleveland, OH 44106, USA
| | - Nicholas C. Wu
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| |
Collapse
|
6
|
Guclu TF, Atilgan AR, Atilgan C. Deciphering GB1's Single Mutational Landscape: Insights from MuMi Analysis. J Phys Chem B 2024; 128:7987-7996. [PMID: 39115184 DOI: 10.1021/acs.jpcb.4c04916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/23/2024]
Abstract
Mutational changes that affect the binding of the C2 fragment of Streptococcal protein G (GB1) to the Fc domain of human IgG (IgG-Fc) have been extensively studied using deep mutational scanning (DMS), and the binding affinity of all single mutations has been measured experimentally in the literature. To investigate the underlying molecular basis, we perform in silico mutational scanning for all possible single mutations, along with 2 μs-long molecular dynamics (WT-MD) of the wild-type (WT) GB1 in both unbound and IgG-Fc bound forms. We compute the hydrogen bonds between GB1 and IgG-Fc in WT-MD to identify the dominant hydrogen bonds for binding, which we then assess in conformations produced by Mutation and Minimization (MuMi) to explain the fitness landscape of GB1 and IgG-Fc binding. Furthermore, we analyze MuMi and WT-MD to investigate the dynamics of binding, focusing on the relative solvent accessibility of residues and the probability of residues being located at the binding interface. With these analyses, we explain the interactions between GB1 and IgG-Fc and display the structural features of binding. In sum, our findings highlight the potential of MuMi as a reliable and computationally efficient tool for predicting protein fitness landscapes, offering significant advantages over traditional methods. The methodologies and results presented in this study pave the way for improved predictive accuracy in protein stability and interaction studies, which are crucial for advancements in drug design and synthetic biology.
Collapse
Affiliation(s)
- Tandac F Guclu
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
| | - Ali Rana Atilgan
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
| | - Canan Atilgan
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
| |
Collapse
|
7
|
Johnston KE, Almhjell PJ, Watkins-Dulaney EJ, Liu G, Porter NJ, Yang J, Arnold FH. A combinatorially complete epistatic fitness landscape in an enzyme active site. Proc Natl Acad Sci U S A 2024; 121:e2400439121. [PMID: 39074291 PMCID: PMC11317637 DOI: 10.1073/pnas.2400439121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2024] [Accepted: 06/17/2024] [Indexed: 07/31/2024] Open
Abstract
Protein engineering often targets binding pockets or active sites which are enriched in epistasis-nonadditive interactions between amino acid substitutions-and where the combined effects of multiple single substitutions are difficult to predict. Few existing sequence-fitness datasets capture epistasis at large scale, especially for enzyme catalysis, limiting the development and assessment of model-guided enzyme engineering approaches. We present here a combinatorially complete, 160,000-variant fitness landscape across four residues in the active site of an enzyme. Assaying the native reaction of a thermostable β-subunit of tryptophan synthase (TrpB) in a nonnative environment yielded a landscape characterized by significant epistasis and many local optima. These effects prevent simulated directed evolution approaches from efficiently reaching the global optimum. There is nonetheless wide variability in the effectiveness of different directed evolution approaches, which together provide experimental benchmarks for computational and machine learning workflows. The most-fit TrpB variants contain a substitution that is nearly absent in natural TrpB sequences-a result that conservation-based predictions would not capture. Thus, although fitness prediction using evolutionary data can enrich in more-active variants, these approaches struggle to identify and differentiate among the most-active variants, even for this near-native function. Overall, this work presents a large-scale testing ground for model-guided enzyme engineering and suggests that efficient navigation of epistatic fitness landscapes can be improved by advances in both machine learning and physical modeling.
Collapse
Affiliation(s)
- Kadina E. Johnston
- Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA91125
| | - Patrick J. Almhjell
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA91125
| | - Ella J. Watkins-Dulaney
- Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA91125
| | - Grace Liu
- Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA91125
| | - Nicholas J. Porter
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA91125
| | - Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA91125
| | - Frances H. Arnold
- Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA91125
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA91125
| |
Collapse
|
8
|
Freschlin CR, Fahlberg SA, Heinzelman P, Romero PA. Neural network extrapolation to distant regions of the protein fitness landscape. Nat Commun 2024; 15:6405. [PMID: 39080282 PMCID: PMC11289474 DOI: 10.1038/s41467-024-50712-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Accepted: 07/13/2024] [Indexed: 08/02/2024] Open
Abstract
Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. We also find that implementing a simple ensemble of convolutional neural networks enables robust design of high-performing variants in the local landscape. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape and how a simple ensembling approach makes protein engineering more robust.
Collapse
Affiliation(s)
- Chase R Freschlin
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Sarah A Fahlberg
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Pete Heinzelman
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Philip A Romero
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI, USA.
| |
Collapse
|
9
|
Chamness LM, Kuntz CP, McKee AG, Penn WD, Hemmerich CM, Rusch DB, Woods H, Dyotima, Meiler J, Schlebach JP. Divergent folding-mediated epistasis among unstable membrane protein variants. eLife 2024; 12:RP92406. [PMID: 39078397 PMCID: PMC11288631 DOI: 10.7554/elife.92406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/31/2024] Open
Abstract
Many membrane proteins are prone to misfolding, which compromises their functional expression at the plasma membrane. This is particularly true for the mammalian gonadotropin-releasing hormone receptor GPCRs (GnRHR). We recently demonstrated that evolutionary GnRHR modifications appear to have coincided with adaptive changes in cotranslational folding efficiency. Though protein stability is known to shape evolution, it is unclear how cotranslational folding constraints modulate the synergistic, epistatic interactions between mutations. We therefore compared the pairwise interactions formed by mutations that disrupt the membrane topology (V276T) or tertiary structure (W107A) of GnRHR. Using deep mutational scanning, we evaluated how the plasma membrane expression of these variants is modified by hundreds of secondary mutations. An analysis of 251 mutants in three genetic backgrounds reveals that V276T and W107A form distinct epistatic interactions that depend on both the severity and the mechanism of destabilization. V276T forms predominantly negative epistatic interactions with destabilizing mutations in soluble loops. In contrast, W107A forms positive interactions with mutations in both loops and transmembrane domains that reflect the diminishing impacts of the destabilizing mutations in variants that are already unstable. These findings reveal how epistasis is remodeled by conformational defects in membrane proteins and in unstable proteins more generally.
Collapse
Affiliation(s)
- Laura M Chamness
- Department of Chemistry, Indiana UniversityBloomingtonUnited States
| | - Charles P Kuntz
- The James Tarpo Jr. and Margaret Tarpo Department of Chemistry, Purdue UniversityWest LafayetteUnited States
| | - Andrew G McKee
- Department of Chemistry, Indiana UniversityBloomingtonUnited States
| | - Wesley D Penn
- Department of Chemistry, Indiana UniversityBloomingtonUnited States
| | | | - Douglas B Rusch
- Center for Genomics and Bioinformatics, Indiana UniversityBloomingtonUnited States
| | - Hope Woods
- Department of Chemistry, Vanderbilt UniversityNashvilleUnited States
- Chemical and Physical Biology Program, Vanderbilt UniversityNashvilleUnited States
| | - Dyotima
- Department of Chemistry, Indiana UniversityBloomingtonUnited States
| | - Jens Meiler
- Department of Chemistry, Vanderbilt UniversityNashvilleUnited States
- Institute for Drug Discovery, Leipzig UniversityLeipzigGermany
| | - Jonathan P Schlebach
- The James Tarpo Jr. and Margaret Tarpo Department of Chemistry, Purdue UniversityWest LafayetteUnited States
| |
Collapse
|
10
|
Zhou Z, Zhang L, Yu Y, Wu B, Li M, Hong L, Tan P. Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning. Nat Commun 2024; 15:5566. [PMID: 38956442 PMCID: PMC11219809 DOI: 10.1038/s41467-024-49798-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2024] [Accepted: 06/11/2024] [Indexed: 07/04/2024] Open
Abstract
Accurately modeling the protein fitness landscapes holds great importance for protein engineering. Pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without wet-lab experimental data, but their accuracy and interpretability remain limited. On the other hand, traditional supervised deep learning models require abundant labeled training examples for performance improvements, posing a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models under extreme data scarcity for fitness prediction. By combining meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. In silico benchmarks across 87 deep mutational scanning datasets demonstrate FSFP's superiority over both unsupervised and supervised baselines. Furthermore, we successfully apply FSFP to engineer the Phi29 DNA polymerase through wet-lab experiments, achieving a 25% increase in the positive rate. These results underscore the potential of our approach in aiding AI-guided protein engineering.
Collapse
Affiliation(s)
- Ziyi Zhou
- School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai National Center for Applied Mathematics (SJTU Center) & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Liang Zhang
- School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yuanxi Yu
- School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Banghao Wu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Mingchen Li
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
| | - Liang Hong
- School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai National Center for Applied Mathematics (SJTU Center) & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China.
- Zhang Jiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, 201203, China.
| | - Pan Tan
- School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai National Center for Applied Mathematics (SJTU Center) & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China.
| |
Collapse
|
11
|
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
Collapse
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Department of Biology, University of Florida, Gainesville, FL, 32611
| | - David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| |
Collapse
|
12
|
Lei R, Liang W, Ouyang WO, Hernandez Garcia A, Kikuchi C, Wang S, McBride R, Tan TJC, Sun Y, Chen C, Graham CS, Rodriguez LA, Shen IR, Choi D, Bruzzone R, Paulson JC, Nair SK, Mok CKP, Wu NC. Epistasis mediates the evolution of the receptor binding mode in recent human H3N2 hemagglutinin. Nat Commun 2024; 15:5175. [PMID: 38890325 PMCID: PMC11189414 DOI: 10.1038/s41467-024-49487-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2023] [Accepted: 06/05/2024] [Indexed: 06/20/2024] Open
Abstract
The receptor-binding site of influenza A virus hemagglutinin partially overlaps with major antigenic sites and constantly evolves. In this study, we observe that mutations G186D and D190N in the hemagglutinin receptor-binding site have coevolved in two recent human H3N2 clades. X-ray crystallography results show that these mutations coordinately drive the evolution of the hemagglutinin receptor binding mode. Epistasis between G186D and D190N is further demonstrated by glycan binding and thermostability analyses. Immunization and neutralization experiments using mouse and human samples indicate that the evolution of receptor binding mode is accompanied by a change in antigenicity. Besides, combinatorial mutagenesis reveals that G186D and D190N, along with other natural mutations in recent H3N2 strains, alter the compatibility with a common egg-adaptive mutation in seasonal influenza vaccines. Overall, our findings elucidate the role of epistasis in shaping the recent evolution of human H3N2 hemagglutinin and substantiate the high evolvability of its receptor-binding mode.
Collapse
MESH Headings
- Humans
- Influenza A Virus, H3N2 Subtype/genetics
- Influenza A Virus, H3N2 Subtype/metabolism
- Hemagglutinin Glycoproteins, Influenza Virus/genetics
- Hemagglutinin Glycoproteins, Influenza Virus/chemistry
- Hemagglutinin Glycoproteins, Influenza Virus/metabolism
- Epistasis, Genetic
- Animals
- Evolution, Molecular
- Mice
- Binding Sites
- Influenza, Human/virology
- Mutation
- Crystallography, X-Ray
- Influenza Vaccines
- Protein Binding
- Receptors, Virus/metabolism
- Receptors, Virus/genetics
- Receptors, Virus/chemistry
- Female
Collapse
Affiliation(s)
- Ruipeng Lei
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Weiwen Liang
- HKU-Pasteur Research Pole, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Wenhao O Ouyang
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | | | - Chika Kikuchi
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Shengyang Wang
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Ryan McBride
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Timothy J C Tan
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Yuanxin Sun
- The Jockey Club School of Public Health and Primary Care, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
- Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Chunke Chen
- The Jockey Club School of Public Health and Primary Care, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
- Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Claire S Graham
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Lucia A Rodriguez
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Ivana R Shen
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Danbi Choi
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Roberto Bruzzone
- HKU-Pasteur Research Pole, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Department of Cell Biology and Infection, Institut Pasteur, Paris, Cedex, 75015, France
- Centre for Immunology and Infection, Hong Kong Science Park, Hong Kong SAR, China
| | - James C Paulson
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Satish K Nair
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Chris K P Mok
- The Jockey Club School of Public Health and Primary Care, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China.
- Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China.
- S.H. Ho Research Centre for Infectious Diseases, The Chinese University of Hong Kong, Hong Kong SAR, China.
- School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China.
| | - Nicholas C Wu
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA.
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA.
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA.
- Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA.
| |
Collapse
|
13
|
Chen SK, Liu J, Van Nynatten A, Tudor-Price BM, Chang BSW. Sampling Strategies for Experimentally Mapping Molecular Fitness Landscapes Using High-Throughput Methods. J Mol Evol 2024:10.1007/s00239-024-10179-8. [PMID: 38886207 DOI: 10.1007/s00239-024-10179-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Accepted: 05/20/2024] [Indexed: 06/20/2024]
Abstract
Empirical studies of genotype-phenotype-fitness maps of proteins are fundamental to understanding the evolutionary process, in elucidating the space of possible genotypes accessible through mutations in a landscape of phenotypes and fitness effects. Yet, comprehensively mapping molecular fitness landscapes remains challenging since all possible combinations of amino acid substitutions for even a few protein sites are encoded by an enormous genotype space. High-throughput mapping of genotype space can be achieved using large-scale screening experiments known as multiplexed assays of variant effect (MAVEs). However, to accommodate such multi-mutational studies, the size of MAVEs has grown to the point where a priori determination of sampling requirements is needed. To address this problem, we propose calculations and simulation methods to approximate minimum sampling requirements for multi-mutational MAVEs, which we combine with a new library construction protocol to experimentally validate our approximation approaches. Analysis of our simulated data reveals how sampling trajectories differ between simulations of nucleotide versus amino acid variants and among mutagenesis schemes. For this, we show quantitatively that marginal gains in sampling efficiency demand increasingly greater sampling effort when sampling for nucleotide sequences over their encoded amino acid equivalents. We present a new library construction protocol that efficiently maximizes sequence variation, and demonstrate using ultradeep sequencing that the library encodes virtually all possible combinations of mutations within the experimental design. Insights learned from our analyses together with the methodological advances reported herein are immediately applicable toward pooled experimental screens of arbitrary design, enabling further assay upscaling and expanded testing of genotype space.
Collapse
Affiliation(s)
- Steven K Chen
- Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
| | - Jing Liu
- Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
| | - Alexander Van Nynatten
- Department of Biological Science, University of Toronto Scarborough, Toronto, ON, Canada
| | | | - Belinda S W Chang
- Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada.
- Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, ON, Canada.
- Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada.
| |
Collapse
|
14
|
Gonzales J, Kim I, Hwang W, Cho JH. Evolutionary rewiring of the dynamic network underpinning allosteric epistasis in NS1 of influenza A virus. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.24.595776. [PMID: 38826371 PMCID: PMC11142230 DOI: 10.1101/2024.05.24.595776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2024]
Abstract
Viral proteins frequently mutate to evade or antagonize host innate immune responses, yet the impact of these mutations on the molecular energy landscape remains unclear. Epistasis, the intramolecular communications between mutations, often renders the combined mutational effects unpredictable. Nonstructural protein 1 (NS1) is a major virulence factor of the influenza A virus (IAV) that activates host PI3K by binding to its p85β subunit. Here, we present the deep analysis for the impact of evolutionary mutations in NS1 that emerged between the 1918 pandemic IAV strain and its descendant PR8 strain. Our analysis reveal how the mutations rewired inter-residue communications which underlies long-range allosteric and epistatic networks in NS1. Our findings show that PR8 NS1 binds to p85β with approximately 10-fold greater affinity than 1918 NS1 due to allosteric mutational effects. Notably, these mutations also exhibited long-range epistatic effects. NMR chemical shift perturbation and methyl-axis order parameter analyses revealed that the mutations induced long-range structural and dynamic changes in PR8 NS1, enhancing its affinity to p85β. Complementary MD simulations and graph-based network analysis uncover how these mutations rewire dynamic residue interaction networks, which underlies the long-range epistasis and allosteric effects on p85β-binding affinity. Significantly, we find that conformational dynamics of residues with high betweenness centrality play a crucial role in communications between network communities and are highly conserved across influenza A virus evolution. These findings advance our mechanistic understanding of the allosteric and epistatic communications between distant residues and provides insight into their role in the molecular evolution of NS1.
Collapse
|
15
|
Metzger BPH, Park Y, Starr TN, Thornton JW. Epistasis facilitates functional evolution in an ancient transcription factor. eLife 2024; 12:RP88737. [PMID: 38767330 PMCID: PMC11105156 DOI: 10.7554/elife.88737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/22/2024] Open
Abstract
A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest - excluding the vast majority of possible genotypes and evolutionary trajectories - and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
Collapse
Affiliation(s)
- Brian PH Metzger
- Department of Ecology and Evolution, University of ChicagoChicagoUnited States
| | - Yeonwoo Park
- Program in Genetics, Genomics, and Systems Biology, University of ChicagoChicagoUnited States
| | - Tyler N Starr
- Department of Biochemistry and Molecular Biophysics, University of ChicagoChicagoUnited States
| | - Joseph W Thornton
- Department of Ecology and Evolution, University of ChicagoChicagoUnited States
- Department of Human Genetics, University of ChicagoChicagoUnited States
| |
Collapse
|
16
|
Lei R, Qing E, Odle A, Yuan M, Gunawardene CD, Tan TJC, So N, Ouyang WO, Wilson IA, Gallagher T, Perlman S, Wu NC, Wong LYR. Functional and antigenic characterization of SARS-CoV-2 spike fusion peptide by deep mutational scanning. Nat Commun 2024; 15:4056. [PMID: 38744813 PMCID: PMC11094058 DOI: 10.1038/s41467-024-48104-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Accepted: 04/16/2024] [Indexed: 05/16/2024] Open
Abstract
The fusion peptide of SARS-CoV-2 spike protein is functionally important for membrane fusion during virus entry and is part of a broadly neutralizing epitope. However, sequence determinants at the fusion peptide and its adjacent regions for pathogenicity and antigenicity remain elusive. In this study, we perform a series of deep mutational scanning (DMS) experiments on an S2 region spanning the fusion peptide of authentic SARS-CoV-2 in different cell lines and in the presence of broadly neutralizing antibodies. We identify mutations at residue 813 of the spike protein that reduced TMPRSS2-mediated entry with decreased virulence. In addition, we show that an F823Y mutation, present in bat betacoronavirus HKU9 spike protein, confers resistance to broadly neutralizing antibodies. Our findings provide mechanistic insights into SARS-CoV-2 pathogenicity and also highlight a potential challenge in developing broadly protective S2-based coronavirus vaccines.
Collapse
Affiliation(s)
- Ruipeng Lei
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Enya Qing
- Department of Microbiology and Immunology, Loyola University Chicago, Maywood, IL, 60153, USA
| | - Abby Odle
- Department of Microbiology and Immunology, University of Iowa, Iowa City, IA, 52242, USA
| | - Meng Yuan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Chaminda D Gunawardene
- Center for Virus-Host Innate Immunity, Rutgers New Jersey Medical School, Newark, NJ, 07103, USA
- Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, NJ, 07103, USA
| | - Timothy J C Tan
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Natalie So
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Wenhao O Ouyang
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Ian A Wilson
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Tom Gallagher
- Department of Microbiology and Immunology, Loyola University Chicago, Maywood, IL, 60153, USA.
| | - Stanley Perlman
- Department of Microbiology and Immunology, University of Iowa, Iowa City, IA, 52242, USA.
- Department of Pediatrics, University of Iowa, Iowa City, IA, 52242, USA.
| | - Nicholas C Wu
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
| | - Lok-Yin Roy Wong
- Department of Microbiology and Immunology, University of Iowa, Iowa City, IA, 52242, USA.
- Center for Virus-Host Innate Immunity, Rutgers New Jersey Medical School, Newark, NJ, 07103, USA.
- Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, NJ, 07103, USA.
| |
Collapse
|
17
|
Wagner A. Genotype sampling for deep-learning assisted experimental mapping of a combinatorially complete fitness landscape. Bioinformatics 2024; 40:btae317. [PMID: 38745436 PMCID: PMC11132821 DOI: 10.1093/bioinformatics/btae317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION Experimental characterization of fitness landscapes, which map genotypes onto fitness, is important for both evolutionary biology and protein engineering. It faces a fundamental obstacle in the astronomical number of genotypes whose fitness needs to be measured for any one protein. Deep learning may help to predict the fitness of many genotypes from a smaller neural network training sample of genotypes with experimentally measured fitness. Here I use a recently published experimentally mapped fitness landscape of more than 260 000 protein genotypes to ask how such sampling is best performed. RESULTS I show that multilayer perceptrons, recurrent neural networks, convolutional networks, and transformers, can explain more than 90% of fitness variance in the data. In addition, 90% of this performance is reached with a training sample comprising merely ≈103 sequences. Generalization to unseen test data is best when training data is sampled randomly and uniformly, or sampled to minimize the number of synonymous sequences. In contrast, sampling to maximize sequence diversity or codon usage bias reduces performance substantially. These observations hold for more than one network architecture. Simple sampling strategies may perform best when training deep learning neural networks to map fitness landscapes from experimental data. AVAILABILITY AND IMPLEMENTATION The fitness landscape data analyzed here is publicly available as described previously (Papkou et al. 2023). All code used to analyze this landscape is publicly available at https://github.com/andreas-wagner-uzh/fitness_landscape_sampling.
Collapse
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode,1015 Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, 87501 NM, United States
| |
Collapse
|
18
|
Rozhoňová H, Martí-Gómez C, McCandlish DM, Payne JL. Robust genetic codes enhance protein evolvability. PLoS Biol 2024; 22:e3002594. [PMID: 38754362 PMCID: PMC11098591 DOI: 10.1371/journal.pbio.3002594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Accepted: 03/19/2024] [Indexed: 05/18/2024] Open
Abstract
The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability-the ability of mutation to bring forth adaptive variation in protein function. One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability. To answer this question, we use data from massively parallel sequence-to-function assays to construct and analyze 6 empirical adaptive landscapes under hundreds of thousands of rewired genetic codes, including those of codon compression schemes relevant to protein engineering and synthetic biology. We find that robust genetic codes tend to enhance protein evolvability by rendering smooth adaptive landscapes with few peaks, which are readily accessible from throughout sequence space. However, the standard genetic code is rarely exceptional in this regard, because many alternative codes render smoother landscapes than the standard code. By constructing low-dimensional visualizations of these landscapes, which each comprise more than 16 million mRNA sequences, we show that such alternative codes radically alter the topological features of the network of high-fitness genotypes. Whereas the genetic codes that optimize evolvability depend to some extent on the detailed relationship between amino acid sequence and protein function, we also uncover general design principles for engineering nonstandard genetic codes for enhanced and diminished evolvability, which may facilitate directed protein evolution experiments and the bio-containment of synthetic organisms, respectively.
Collapse
Affiliation(s)
- Hana Rozhoňová
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Carlos Martí-Gómez
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - Joshua L. Payne
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
Collapse
|
19
|
Judge A, Sankaran B, Hu L, Palaniappan M, Birgy A, Prasad BVV, Palzkill T. Network of epistatic interactions in an enzyme active site revealed by large-scale deep mutational scanning. Proc Natl Acad Sci U S A 2024; 121:e2313513121. [PMID: 38483989 PMCID: PMC10962969 DOI: 10.1073/pnas.2313513121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2023] [Accepted: 02/14/2024] [Indexed: 03/19/2024] Open
Abstract
Cooperative interactions between amino acids are critical for protein function. A genetic reflection of cooperativity is epistasis, which is when a change in the amino acid at one position changes the sequence requirements at another position. To assess epistasis within an enzyme active site, we utilized CTX-M β-lactamase as a model system. CTX-M hydrolyzes β-lactam antibiotics to provide antibiotic resistance, allowing a simple functional selection for rapid sorting of modified enzymes. We created all pairwise mutations across 17 active site positions in the β-lactamase enzyme and quantitated the function of variants against two β-lactam antibiotics using next-generation sequencing. Context-dependent sequence requirements were determined by comparing the antibiotic resistance function of double mutations across the CTX-M active site to their predicted function based on the constituent single mutations, revealing both positive epistasis (synergistic interactions) and negative epistasis (antagonistic interactions) between amino acid substitutions. The resulting trends demonstrate that positive epistasis is present throughout the active site, that epistasis between residues is mediated through substrate interactions, and that residues more tolerant to substitutions serve as generic compensators which are responsible for many cases of positive epistasis. Additionally, we show that a key catalytic residue (Glu166) is amenable to compensatory mutations, and we characterize one such double mutant (E166Y/N170G) that acts by an altered catalytic mechanism. These findings shed light on the unique biochemical factors that drive epistasis within an enzyme active site and will inform enzyme engineering efforts by bridging the gap between amino acid sequence and catalytic function.
Collapse
Affiliation(s)
- Allison Judge
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX77030
| | - Banumathi Sankaran
- Department of Molecular Biophysics and Integrated Bioimaging, Berkeley Center for Structural Biology Lawrence Berkeley National Laboratory, Berkeley, CA94720
| | - Liya Hu
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX77030
| | - Murugesan Palaniappan
- Department of Pathology and Immunology, Center for Drug Discovery, Baylor College of Medicine, Houston, TX77030
| | - André Birgy
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX77030
- Infections, Antimicrobials, Modelling, Evolution, UMR 1137, French Insitute for Medical Research (INSERM), Faculty of Health, Université Paris Cité, Paris75006, France
| | - B. V. Venkataram Prasad
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX77030
| | - Timothy Palzkill
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX77030
| |
Collapse
|
20
|
Gelman S, Johnson B, Freschlin C, D'Costa S, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
Collapse
Affiliation(s)
- Sam Gelman
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
| | - Bryce Johnson
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
| | | | - Sameer D'Costa
- Department of Biochemistry, University of Wisconsin-Madison
| | - Anthony Gitter
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
| | | |
Collapse
|
21
|
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS CENTRAL SCIENCE 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency-or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Collapse
Affiliation(s)
- Jason Yang
- Division
of Chemistry and Chemical Engineering, California
Institute of Technology, Pasadena, California 91125, United States
| | - Francesca-Zhoufan Li
- Division
of Biology and Biological Engineering, California
Institute of Technology, Pasadena, California 91125, United States
| | - Frances H. Arnold
- Division
of Chemistry and Chemical Engineering, California
Institute of Technology, Pasadena, California 91125, United States
- Division
of Biology and Biological Engineering, California
Institute of Technology, Pasadena, California 91125, United States
| |
Collapse
|
22
|
Ding D, Shaw AY, Sinai S, Rollins N, Prywes N, Savage DF, Laub MT, Marks DS. Protein design using structure-based residue preferences. Nat Commun 2024; 15:1639. [PMID: 38388493 PMCID: PMC10884402 DOI: 10.1038/s41467-024-45621-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2023] [Accepted: 01/29/2024] [Indexed: 02/24/2024] Open
Abstract
Recent developments in protein design rely on large neural networks with up to 100s of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues-without accounting for mutation interactions-explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R2 ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural contexts around a residue could be sufficient to predict mutation preferences, and develop an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES outperforms not just model-free methods but also similarly to complex models for creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.
Collapse
Affiliation(s)
- David Ding
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA.
| | - Ada Y Shaw
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
| | - Sam Sinai
- Dyno Therapeutics, Watertown, MA, 02472, USA
| | - Nathan Rollins
- Seismic Therapeutics, Lab Central, Cambridge, MA, 02142, USA
| | - Noam Prywes
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
| | - David F Savage
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, 94720, USA
| | - Michael T Laub
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA.
| |
Collapse
|
23
|
Radojković M, Ubbink M. Positive epistasis drives clavulanic acid resistance in double mutant libraries of BlaC β-lactamase. Commun Biol 2024; 7:197. [PMID: 38368480 PMCID: PMC10874438 DOI: 10.1038/s42003-024-05868-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2023] [Accepted: 01/26/2024] [Indexed: 02/19/2024] Open
Abstract
Phenotypic effects of mutations are highly dependent on the genetic backgrounds in which they occur, due to epistatic effects. To test how easily the loss of enzyme activity can be compensated for, we screen mutant libraries of BlaC, a β-lactamase from Mycobacterium tuberculosis, for fitness in the presence of carbenicillin and the inhibitor clavulanic acid. Using a semi-rational approach and deep sequencing, we prepare four double-site saturation libraries and determine the relative fitness effect for 1534/1540 (99.6%) of the unique library members at two temperatures. Each library comprises variants of a residue known to be relevant for clavulanic acid resistance as well as residue 105, which regulates access to the active site. Variants with greatly improved fitness were identified within each library, demonstrating that compensatory mutations for loss of activity can be readily found. In most cases, the fittest variants are a result of positive epistasis, indicating strong synergistic effects between the chosen residue pairs. Our study sheds light on a role of epistasis in the evolution of functional residues and underlines the highly adaptive potential of BlaC.
Collapse
Affiliation(s)
- Marko Radojković
- Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Marcellus Ubbink
- Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands.
| |
Collapse
|
24
|
Zeng M, Sarker B, Rondthaler SN, Vu V, Andrews LB. Identifying LasR Quorum Sensors with Improved Signal Specificity by Mapping the Sequence-Function Landscape. ACS Synth Biol 2024; 13:568-589. [PMID: 38206199 DOI: 10.1021/acssynbio.3c00543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2024]
Abstract
Programmable intercellular signaling using components of naturally occurring quorum sensing can allow for coordinated functions to be engineered in microbial consortia. LuxR-type transcriptional regulators are widely used for this purpose and are activated by homoserine lactone (HSL) signals. However, they often suffer from imperfect molecular discrimination of structurally similar HSLs, causing misregulation within engineered consortia containing multiple HSL signals. Here, we studied one such example, the regulator LasR from Pseudomonas aeruginosa. We elucidated its sequence-function relationship for ligand specificity using targeted protein engineering and multiplexed high-throughput biosensor screening. A pooled combinatorial saturation mutagenesis library (9,486 LasR DNA sequences) was created by mutating six residues in LasR's β5 sheet with single, double, or triple amino acid substitutions. Sort-seq assays were performed in parallel using cognate and noncognate HSLs to quantify each corresponding sensor's response to each HSL signal, which identified hundreds of highly specific variants. Sensor variants identified were individually assayed and exhibited up to 60.6-fold (p = 0.0013) improved relative activation by the cognate signal compared to the wildtype. Interestingly, we uncovered prevalent mutational epistasis and previously unidentified residues contributing to signal specificity. The resulting sensors with negligible signal crosstalk could be broadly applied to engineer bacteria consortia.
Collapse
Affiliation(s)
- Min Zeng
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| | - Biprodev Sarker
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| | - Stephen N Rondthaler
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| | - Vanessa Vu
- Department of Biochemistry and Molecular Biology, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| | - Lauren B Andrews
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
- Molecular and Cellular Biology Graduate Program, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
- Biotechnology Training Program, University of Massachusetts Amherst, Amherst, Massachusetts 01003, United States
| |
Collapse
|
25
|
Park Y, Metzger BP, Thornton JW. The simplicity of protein sequence-function relationships. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.02.556057. [PMID: 37732229 PMCID: PMC10508729 DOI: 10.1101/2023.09.02.556057] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/22/2023]
Abstract
How complicated is the genetic architecture of proteins - the set of causal effects by which sequence determines function? High-order epistatic interactions among residues are thought to be pervasive, making a protein's function difficult to predict or understand from its sequence. Most studies, however, used methods that overestimate epistasis, because they analyze genetic architecture relative to a designated reference sequence - causing measurement noise and small local idiosyncrasies to propagate into pervasive high-order interactions - or have not effectively accounted for global nonlinearity in the sequence-function relationship. Here we present a new reference-free method that jointly estimates global nonlinearity and specific epistatic interactions across a protein's entire genotype-phenotype map. This method yields a maximally efficient explanation of a protein's genetic architecture and is more robust than existing methods to measurement noise, partial sampling, and model misspecification. We reanalyze 20 combinatorial mutagenesis experiments from a diverse set of proteins and find that additive and pairwise effects, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of total variance in measured phenotypes (and >92% in every case). Only a tiny fraction of genotypes are strongly affected by third- or higher-order epistasis. Genetic architecture is also sparse: the number of terms required to explain the vast majority of variance is smaller than the number of genotypes by many orders of magnitude. The sequence-function relationship in most proteins is therefore far simpler than previously thought, opening the way for new and tractable approaches to characterize it.
Collapse
Affiliation(s)
- Yeonwoo Park
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637
- Current affiliation: Center for RNA Research, Institute for Basic Science, Seoul, Republic of Korea 08826
| | - Brian P.H. Metzger
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
- Current affiliation: Department of Biological Sciences, Purdue University, West Lafayette, IN 47907
| | - Joseph W. Thornton
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
- Department of Human Genetics, University of Chicago, Chicago, IL 60637
| |
Collapse
|
26
|
Sesta L, Pagnani A, Fernandez-de-Cossio-Diaz J, Uguzzoni G. Inference of annealed protein fitness landscapes with AnnealDCA. PLoS Comput Biol 2024; 20:e1011812. [PMID: 38377054 PMCID: PMC10878520 DOI: 10.1371/journal.pcbi.1011812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The design of proteins with specific tasks is a major challenge in molecular biology with important diagnostic and therapeutic applications. High-throughput screening methods have been developed to systematically evaluate protein activity, but only a small fraction of possible protein variants can be tested using these techniques. Computational models that explore the sequence space in-silico to identify the fittest molecules for a given function are needed to overcome this limitation. In this article, we propose AnnealDCA, a machine-learning framework to learn the protein fitness landscape from sequencing data derived from a broad range of experiments that use selection and sequencing to quantify protein activity. We demonstrate the effectiveness of our method by applying it to antibody Rep-Seq data of immunized mice and screening experiments, assessing the quality of the fitness landscape reconstructions. Our method can be applied to several experimental cases where a population of protein variants undergoes various rounds of selection and sequencing, without relying on the computation of variants enrichment ratios, and thus can be used even in cases of disjoint sequence samples.
Collapse
Affiliation(s)
- Luca Sesta
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
| | - Andrea Pagnani
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
- Italian Institute for Genomic Medicine, Torino, Italy
- INFN, Sezione di Torino, Torino, Italy
| | | | | |
Collapse
|
27
|
Chamness LM, Kuntz CP, McKee AG, Penn WD, Hemmerich CM, Rusch DB, Woods H, Dyotima, Meiler J, Schlebach JP. Divergent Folding-Mediated Epistasis Among Unstable Membrane Protein Variants. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.08.25.554866. [PMID: 37662415 PMCID: PMC10473758 DOI: 10.1101/2023.08.25.554866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2023]
Abstract
Many membrane proteins are prone to misfolding, which compromises their functional expression at the plasma membrane. This is particularly true for the mammalian gonadotropin-releasing hormone receptor GPCRs (GnRHR). We recently demonstrated that evolutionary GnRHR modifications appear to have coincided with adaptive changes in cotranslational folding efficiency. Though protein stability is known to shape evolution, it is unclear how cotranslational folding constraints modulate the synergistic, epistatic interactions between mutations. We therefore compared the pairwise interactions formed by mutations that disrupt the membrane topology (V276T) or tertiary structure (W107A) of GnRHR. Using deep mutational scanning, we evaluated how the plasma membrane expression of these variants is modified by hundreds of secondary mutations. An analysis of 251 mutants in three genetic backgrounds reveals that V276T and W107A form distinct epistatic interactions that depend on both the severity and the mechanism of destabilization. V276T forms predominantly negative epistatic interactions with destabilizing mutations in soluble loops. In contrast, W107A forms positive interactions with mutations in both loops and transmembrane domains that reflect the diminishing impacts of the destabilizing mutations in variants that are already unstable. These findings reveal how epistasis is remodeled by conformational defects in membrane proteins and in unstable proteins more generally.
Collapse
Affiliation(s)
- Laura M. Chamness
- Department of Chemistry, Indiana University, Bloomington, Indiana, USA
| | - Charles P. Kuntz
- Department of Chemistry, Purdue University, West Lafayette, Indiana, USA
| | - Andrew G. McKee
- Department of Chemistry, Indiana University, Bloomington, Indiana, USA
| | - Wesley D. Penn
- Department of Chemistry, Indiana University, Bloomington, Indiana, USA
| | | | - Douglas B. Rusch
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
| | - Hope Woods
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, USA
- Chemical and Physical Biology Program, Vanderbilt University, Nashville, Tennessee, USA
| | - Dyotima
- Department of Chemistry, Indiana University, Bloomington, Indiana, USA
| | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, USA
- Institute for Drug Discovery, Leipzig University, Leipzig, SAC, Germany
| | - Jonathan P. Schlebach
- Department of Chemistry, Indiana University, Bloomington, Indiana, USA
- Department of Chemistry, Purdue University, West Lafayette, Indiana, USA
| |
Collapse
|
28
|
Hayes RL, Nixon CF, Marqusee S, Brooks CL. Selection pressures on evolution of ribonuclease H explored with rigorous free-energy-based design. Proc Natl Acad Sci U S A 2024; 121:e2312029121. [PMID: 38194446 PMCID: PMC10801872 DOI: 10.1073/pnas.2312029121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Accepted: 11/22/2023] [Indexed: 01/11/2024] Open
Abstract
Understanding natural protein evolution and designing novel proteins are motivating interest in development of high-throughput methods to explore large sequence spaces. In this work, we demonstrate the application of multisite λ dynamics (MSλD), a rigorous free energy simulation method, and chemical denaturation experiments to quantify evolutionary selection pressure from sequence-stability relationships and to address questions of design. This study examines a mesophilic phylogenetic clade of ribonuclease H (RNase H), furthering its extensive characterization in earlier studies, focusing on E. coli RNase H (ecRNH) and a more stable consensus sequence (AncCcons) differing at 15 positions. The stabilities of 32,768 chimeras between these two sequences were computed using the MSλD framework. The most stable and least stable chimeras were predicted and tested along with several other sequences, revealing a designed chimera with approximately the same stability increase as AncCcons, but requiring only half the mutations. Comparing the computed stabilities with experiment for 12 sequences reveals a Pearson correlation of 0.86 and root mean squared error of 1.18 kcal/mol, an unprecedented level of accuracy well beyond less rigorous computational design methods. We then quantified selection pressure using a simple evolutionary model in which sequences are selected according to the Boltzmann factor of their stability. Selection temperatures from 110 to 168 K are estimated in three ways by comparing experimental and computational results to evolutionary models. These estimates indicate selection pressure is high, which has implications for evolutionary dynamics and for the accuracy required for design, and suggests accurate high-throughput computational methods like MSλD may enable more effective protein design.
Collapse
Affiliation(s)
- Ryan L. Hayes
- Department of Chemical and Biomolecular Engineering, University of California, Irvine, CA92697
- Department of Chemistry, University of Michigan, Ann Arbor, MI48109
| | - Charlotte F. Nixon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA94720
| | - Susan Marqusee
- Department of Molecular and Cell Biology, University of California, Berkeley, CA94720
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA94720
- Department of Chemistry, University of California, Berkeley, CA94720
| | - Charles L. Brooks
- Department of Chemistry, University of Michigan, Ann Arbor, MI48109
- Biophysics Program, University of Michigan, Ann Arbor, MI48109
| |
Collapse
|
29
|
Nemoto T, Ocari T, Planul A, Tekinsoy M, Zin EA, Dalkara D, Ferrari U. ACIDES: on-line monitoring of forward genetic screens for protein engineering. Nat Commun 2023; 14:8504. [PMID: 38148337 PMCID: PMC10751290 DOI: 10.1038/s41467-023-43967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2023] [Accepted: 11/24/2023] [Indexed: 12/28/2023] Open
Abstract
Forward genetic screens of mutated variants are a versatile strategy for protein engineering and investigation, which has been successfully applied to various studies like directed evolution (DE) and deep mutational scanning (DMS). While next-generation sequencing can track millions of variants during the screening rounds, the vast and noisy nature of the sequencing data impedes the estimation of the performance of individual variants. Here, we propose ACIDES that combines statistical inference and in-silico simulations to improve performance estimation in the library selection process by attributing accurate statistical scores to individual variants. We tested ACIDES first on a random-peptide-insertion experiment and then on multiple public datasets from DE and DMS studies. ACIDES allows experimentalists to reliably estimate variant performance on the fly and can aid protein engineering and research pipelines in a range of applications, including gene therapy.
Collapse
Affiliation(s)
- Takahiro Nemoto
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
- Graduate School of Informatics, Kyoto University, Yoshida Hon-machi, Sakyo-ku, Kyoto, 606-8501, Japan.
- Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, Osaka, 565-0871, Japan.
| | - Tommaso Ocari
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Arthur Planul
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Muge Tekinsoy
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Emilia A Zin
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
| | - Deniz Dalkara
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
| | - Ulisse Ferrari
- Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France.
| |
Collapse
|
30
|
Eccleston RC, Manko E, Campino S, Clark TG, Furnham N. A computational method for predicting the most likely evolutionary trajectories in the stepwise accumulation of resistance mutations. eLife 2023; 12:e84756. [PMID: 38132182 PMCID: PMC10807863 DOI: 10.7554/elife.84756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2022] [Accepted: 12/21/2023] [Indexed: 12/23/2023] Open
Abstract
Pathogen evolution of drug resistance often occurs in a stepwise manner via the accumulation of multiple mutations that in combination have a non-additive impact on fitness, a phenomenon known as epistasis. The evolution of resistance via the accumulation of point mutations in the DHFR genes of Plasmodium falciparum (Pf) and Plasmodium vivax (Pv) has been studied extensively and multiple studies have shown epistatic interactions between these mutations determine the accessible evolutionary trajectories to highly resistant multiple mutations. Here, we simulated these evolutionary trajectories using a model of molecular evolution, parameterised using Rosetta Flex ddG predictions, where selection acts to reduce the target-drug binding affinity. We observe strong agreement with pathways determined using experimentally measured IC50 values of pyrimethamine binding, which suggests binding affinity is strongly predictive of resistance and epistasis in binding affinity strongly influences the order of fixation of resistance mutations. We also infer pathways directly from the frequency of mutations found in isolate data, and observe remarkable agreement with the most likely pathways predicted by our mechanistic model, as well as those determined experimentally. This suggests mutation frequency data can be used to intuitively infer evolutionary pathways, provided sufficient sampling of the population.
Collapse
Affiliation(s)
- Ruth Charlotte Eccleston
- Department of Infection Biology, London School of Hygiene and Tropical MedicineLondonUnited Kingdom
| | - Emilia Manko
- Department of Infection Biology, London School of Hygiene and Tropical MedicineLondonUnited Kingdom
| | - Susana Campino
- Department of Infection Biology, London School of Hygiene and Tropical MedicineLondonUnited Kingdom
| | - Taane G Clark
- Department of Infection Biology, London School of Hygiene and Tropical MedicineLondonUnited Kingdom
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical MedicineLondonUnited Kingdom
| | - Nicholas Furnham
- Department of Infection Biology, London School of Hygiene and Tropical MedicineLondonUnited Kingdom
| |
Collapse
|
31
|
Buda K, Miton CM, Tokuriki N. Pervasive epistasis exposes intramolecular networks in adaptive enzyme evolution. Nat Commun 2023; 14:8508. [PMID: 38129396 PMCID: PMC10739712 DOI: 10.1038/s41467-023-44333-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023] Open
Abstract
Enzyme evolution is characterized by constant alterations of the intramolecular residue networks supporting their functions. The rewiring of these network interactions can give rise to epistasis. As mutations accumulate, the epistasis observed across diverse genotypes may appear idiosyncratic, that is, exhibit unique effects in different genetic backgrounds. Here, we unveil a quantitative picture of the prevalence and patterns of epistasis in enzyme evolution by analyzing 41 fitness landscapes generated from seven enzymes. We show that >94% of all mutational and epistatic effects appear highly idiosyncratic, which greatly distorted the functional prediction of the evolved enzymes. By examining seemingly idiosyncratic changes in epistasis along adaptive trajectories, we expose several instances of higher-order, intramolecular rewiring. Using complementary structural data, we outline putative molecular mechanisms explaining higher-order epistasis along two enzyme trajectories. Our work emphasizes the prevalence of epistasis and provides an approach to exploring this phenomenon through a molecular lens.
Collapse
Affiliation(s)
- Karol Buda
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
| | - Charlotte M Miton
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
| | - Nobuhiko Tokuriki
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada.
| |
Collapse
|
32
|
Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.07.570727. [PMID: 38106144 PMCID: PMC10723403 DOI: 10.1101/2023.12.07.570727] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]
Abstract
Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (eg., alignment-based, inverse folding) into a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, model predictions and develop a user-friendly website that facilitates data access and analysis.
Collapse
Affiliation(s)
| | | | | | | | | | | | | | - Ada Shaw
- Applied Mathematics, Harvard University
| | | | | | - Mafalda Dias
- Centre for Genomic Regulation, Universitat Pompeu Fabra
| | | | | | - Yarin Gal
- Computer Science, University of Oxford
| | | |
Collapse
|
33
|
Bertrand Q, Coquille S, Iorio A, Sterpone F, Madern D. Biochemical, structural and dynamical characterizations of the lactate dehydrogenase from Selenomonas ruminantium provide information about an intermediate evolutionary step prior to complete allosteric regulation acquisition in the super family of lactate and malate dehydrogenases. J Struct Biol 2023; 215:108039. [PMID: 37884067 DOI: 10.1016/j.jsb.2023.108039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 09/26/2023] [Accepted: 10/22/2023] [Indexed: 10/28/2023]
Abstract
In this work, we investigated the lactate dehydrogenase (LDH) from Selenomonas ruminantium (S. rum), an enzyme that differs at key amino acid positions from canonical allosteric LDHs. The wild type (Wt) of this enzyme recognises pyuvate as all LDHs. However, introducing a single point mutation in the active site loop (I85R) allows S. Rum LDH to recognize the oxaloacetate substrate as a typical malate dehydrogenase (MalDH), whilst maintaining homotropic activation as an LDH. We report the tertiary structure of the Wt and I85RLDH mutant. The Wt S. rum enzyme structure binds NADH and malonate, whilst also resembling the typical compact R-active state of canonical LDHs. The structure of the mutant with I85R was solved in the Apo State (without ligand), and shows no large conformational reorganization such as that observed with canonical allosteric LDHs in Apo state. This is due to a local structural feature typical of S. rum LDH that prevents large-scale conformational reorganization. The S. rum LDH was also studied using Molecular Dynamics simulations, probing specific local deformations of the active site that allow the S. rum LDH to sample the T-inactive state. We propose that, with respect to the LDH/MalDH superfamily, the S. rum enzyme possesses a specificstructural and dynamical way to ensure homotropic activation.
Collapse
Affiliation(s)
- Quentin Bertrand
- Univ. Grenoble Alpes, CEA, CNRS, IBS, 38000 Grenoble, France; Laboratory of Biomolecular Research, Biology and Chemistry Division, Paul Scherrer Institut, Villigen, Switzerland
| | | | - Antonio Iorio
- CNRS, Université de Paris, UPR 9080, Laboratoire de Biochimie Théorique, Paris, France; Institut de Biologie Physico-Chimique-Fondation Edmond de Rothschild, PSL Research University, Paris, France
| | - Fabio Sterpone
- CNRS, Université de Paris, UPR 9080, Laboratoire de Biochimie Théorique, Paris, France; Institut de Biologie Physico-Chimique-Fondation Edmond de Rothschild, PSL Research University, Paris, France
| | | |
Collapse
|
34
|
Lei R, Qing E, Odle A, Yuan M, Tan TJ, So N, Ouyang WO, Wilson IA, Gallagher T, Perlman S, Wu NC, Wong LYR. Functional and antigenic characterization of SARS-CoV-2 spike fusion peptide by deep mutational scanning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.28.569051. [PMID: 38076875 PMCID: PMC10705381 DOI: 10.1101/2023.11.28.569051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]
Abstract
The fusion peptide of SARS-CoV-2 spike protein is functionally important for membrane fusion during virus entry and is part of a broadly neutralizing epitope. However, sequence determinants at the fusion peptide and its adjacent regions for pathogenicity and antigenicity remain elusive. In this study, we performed a series of deep mutational scanning (DMS) experiments on an S2 region spanning the fusion peptide of authentic SARS-CoV-2 in different cell lines and in the presence of broadly neutralizing antibodies. We identified mutations at residue 813 of the spike protein that reduced TMPRSS2-mediated entry with decreased virulence. In addition, we showed that an F823Y mutation, present in bat betacoronavirus HKU9 spike protein, confers resistance to broadly neutralizing antibodies. Our findings provide mechanistic insights into SARS-CoV-2 pathogenicity and also highlight a potential challenge in developing broadly protective S2-based coronavirus vaccines.
Collapse
Affiliation(s)
- Ruipeng Lei
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Enya Qing
- Department of Microbiology and Immunology, Loyola University Chicago, Maywood, IL 60153, USA
| | - Abby Odle
- Department of Microbiology and Immunology, University of Iowa, Iowa City, IA 52242, USA
| | - Meng Yuan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Timothy J.C. Tan
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Natalie So
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Wenhao O. Ouyang
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Ian A. Wilson
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Tom Gallagher
- Department of Microbiology and Immunology, Loyola University Chicago, Maywood, IL 60153, USA
| | - Stanley Perlman
- Department of Microbiology and Immunology, University of Iowa, Iowa City, IA 52242, USA
- Department of Pediatrics, University of Iowa, Iowa City, IA 52242, USA
| | - Nicholas C. Wu
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Lok-Yin Roy Wong
- Department of Microbiology and Immunology, University of Iowa, Iowa City, IA 52242, USA
- Center for Virus-Host-Innate Immunity, Rutgers New Jersey Medical School, Newark, NJ 07103, USA
- Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, NJ 07103, USA
| |
Collapse
|
35
|
Papkou A, Garcia-Pastor L, Escudero JA, Wagner A. A rugged yet easily navigable fitness landscape. Science 2023; 382:eadh3860. [PMID: 37995212 DOI: 10.1126/science.adh3860] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Accepted: 09/29/2023] [Indexed: 11/25/2023]
Abstract
Fitness landscape theory predicts that rugged landscapes with multiple peaks impair Darwinian evolution, but experimental evidence is limited. In this study, we used genome editing to map the fitness of >260,000 genotypes of the key metabolic enzyme dihydrofolate reductase in the presence of the antibiotic trimethoprim, which targets this enzyme. The resulting landscape is highly rugged and harbors 514 fitness peaks. However, its highest peaks are accessible to evolving populations via abundant fitness-increasing paths. Different peaks share large basins of attraction that render the outcome of adaptive evolution highly contingent on chance events. Our work shows that ruggedness need not be an obstacle to Darwinian evolution but can reduce its predictability. If true in general, the complexity of optimization problems on realistic landscapes may require reappraisal.
Collapse
Affiliation(s)
- Andrei Papkou
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| | - Lucia Garcia-Pastor
- Departamento de Sanidad Animal and VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
| | - José Antonio Escudero
- Departamento de Sanidad Animal and VISAVET Health Surveillance Centre, Universidad Complutense de Madrid, Madrid, Spain
| | - Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, NM, USA
| |
Collapse
|
36
|
Maes S, Deploey N, Peelman F, Eyckerman S. Deep mutational scanning of proteins in mammalian cells. CELL REPORTS METHODS 2023; 3:100641. [PMID: 37963462 PMCID: PMC10694495 DOI: 10.1016/j.crmeth.2023.100641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Revised: 07/06/2023] [Accepted: 10/20/2023] [Indexed: 11/16/2023]
Abstract
Protein mutagenesis is essential for unveiling the molecular mechanisms underlying protein function in health, disease, and evolution. In the past decade, deep mutational scanning methods have evolved to support the functional analysis of nearly all possible single-amino acid changes in a protein of interest. While historically these methods were developed in lower organisms such as E. coli and yeast, recent technological advancements have resulted in the increased use of mammalian cells, particularly for studying proteins involved in human disease. These advancements will aid significantly in the classification and interpretation of variants of unknown significance, which are being discovered at large scale due to the current surge in the use of whole-genome sequencing in clinical contexts. Here, we explore the experimental aspects of deep mutational scanning studies in mammalian cells and report the different methods used in each step of the workflow, ultimately providing a useful guide toward the design of such studies.
Collapse
Affiliation(s)
- Stefanie Maes
- VIB Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biochemistry and Microbiology, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
| | - Nick Deploey
- VIB Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
| | - Frank Peelman
- VIB Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
| | - Sven Eyckerman
- VIB Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium.
| |
Collapse
|
37
|
Fahlberg SA, Freschlin CR, Heinzelman P, Romero PA. Neural network extrapolation to distant regions of the protein fitness landscape. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.08.566287. [PMID: 37987009 PMCID: PMC10659313 DOI: 10.1101/2023.11.08.566287] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]
Abstract
Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape.
Collapse
Affiliation(s)
- Sarah A Fahlberg
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Chase R Freschlin
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Pete Heinzelman
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Philip A Romero
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
- Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI, USA
| |
Collapse
|
38
|
Ogbunugafor CB, Guerrero RF, Miller-Dickson MD, Shakhnovich EI, Shoulders MD. Epistasis and pleiotropy shape biophysical protein subspaces associated with drug resistance. Phys Rev E 2023; 108:054408. [PMID: 38115433 PMCID: PMC10935598 DOI: 10.1103/physreve.108.054408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2023] [Accepted: 09/19/2023] [Indexed: 12/21/2023]
Abstract
Protein space is a rich analogy for genotype-phenotype maps, where amino acid sequence is organized into a high-dimensional space that highlights the connectivity between protein variants. It is a useful abstraction for understanding the process of evolution, and for efforts to engineer proteins towards desirable phenotypes. Few mentions of protein space consider how protein phenotypes can be described in terms of their biophysical components, nor do they rigorously interrogate how forces like epistasis-describing the nonlinear interaction between mutations and their phenotypic consequences-manifest across these components. In this study, we deconstruct a low-dimensional protein space of a bacterial enzyme (dihydrofolate reductase; DHFR) into "subspaces" corresponding to a set of kinetic and thermodynamic traits [k_{cat}, K_{M}, K_{i}, and T_{m} (melting temperature)]. We then examine how combinations of three mutations (eight alleles in total) display pleiotropy, or unique effects on individual subspace traits. We examine protein spaces across three orthologous DHFR enzymes (Escherichia coli, Listeria grayi, and Chlamydia muridarum), adding a genotypic context dimension through which epistasis occurs across subspaces. In doing so, we reveal that protein space is a deceptively complex notion, and that future applications to bioengineering should consider how interactions between amino acid substitutions manifest across different phenotypic subspaces.
Collapse
Affiliation(s)
- C. Brandon Ogbunugafor
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
| | - Rafael F. Guerrero
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Eugene I. Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Matthew D. Shoulders
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
Collapse
|
39
|
Wang S, Zhang TH, Hu M, Tang K, Sheng L, Hong M, Chen D, Chen L, Shi Y, Feng J, Qian J, Sun L, Ding K, Sun R, Du Y. Deep mutational scanning of influenza A virus neuraminidase facilitates the identification of drug resistance mutations in vivo. mSystems 2023; 8:e0067023. [PMID: 37772870 PMCID: PMC10654105 DOI: 10.1128/msystems.00670-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Accepted: 08/09/2023] [Indexed: 09/30/2023] Open
Abstract
IMPORTANCE NA is a crucial surface antigen and drug target of influenza A virus. A comprehensive understanding of NA's mutational effect and drug resistance profiles in vivo is essential for comprehending the evolutionary constraints and making informed choices regarding drug selection to combat resistance in clinical settings. In the current study, we established an efficient deep mutational screening system in mouse lung tissues and systematically evaluated the fitness effect and drug resistance to three neuraminidase inhibitors of NA single-nucleotide mutations. The fitness of NA mutants is generally correlated with a natural mutation in the database. The fitness of NA mutants is influenced by biophysical factors such as protein stability, complex formation, and the immune response triggered by viral infection. In addition to confirming previously reported drug-resistant mutations, novel mutations were identified. Interestingly, we identified an allosteric drug-resistance mutation that is not located within the drug-binding pocket but potentially affects drug binding by interfering with NA tetramerization. The dual assessments performed in this study provide a more accurate assessment of the evolutionary potential of drug-resistant mutations and offer guidance for the rational selection of antiviral drugs.
Collapse
Affiliation(s)
- Sihan Wang
- Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Affiliated Hangzhou First People’s Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Tian-hao Zhang
- Molecular Biology Institute, University of California, Los Angeles, California, USA
| | - Menglong Hu
- Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Kejun Tang
- Department of Surgery, Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Li Sheng
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, California, USA
- School of Biomedical Sciences, LKS Faculty of Medicine, The Hong Kong University, Hong Kong, China
| | - Mengying Hong
- Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Dongdong Chen
- Department of Ultrasound in Medicine, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China
| | - Liubo Chen
- Department of Medical Oncology, The Second Affiliated Hospital of Zhejiang University, School of Medicine, Hangzhou, China
| | - Yuan Shi
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, California, USA
| | - Jun Feng
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, California, USA
| | - Jing Qian
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Lifeng Sun
- Department of Colorectal Surgery and Oncology, Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Kefeng Ding
- Department of Colorectal Surgery and Oncology, Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Ren Sun
- Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, California, USA
- Center for Infectious Disease Research, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
| | - Yushen Du
- Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| |
Collapse
|
40
|
Busia A, Listgarten J. MBE: model-based enrichment estimation and prediction for differential sequencing data. Genome Biol 2023; 24:218. [PMID: 37784130 PMCID: PMC10544408 DOI: 10.1186/s13059-023-03058-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Accepted: 09/14/2023] [Indexed: 10/04/2023] Open
Abstract
Characterizing differences in sequences between two conditions, such as with and without drug exposure, using high-throughput sequencing data is a prevalent problem involving quantifying changes in sequence abundances, and predicting such differences for unobserved sequences. A key shortcoming of current approaches is their extremely limited ability to share information across related but non-identical reads. Consequently, they cannot use sequencing data effectively, nor be directly applied in many settings of interest. We introduce model-based enrichment (MBE) to overcome this shortcoming. We evaluate MBE using both simulated and real data. Overall, MBE improves accuracy compared to current differential analysis methods.
Collapse
Affiliation(s)
- Akosua Busia
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, 94720, CA, USA.
| | - Jennifer Listgarten
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, 94720, CA, USA.
| |
Collapse
|
41
|
Xie X, Sun X, Wang Y, Lehner B, Li X. Dominance vs epistasis: the biophysical origins and plasticity of genetic interactions within and between alleles. Nat Commun 2023; 14:5551. [PMID: 37689712 PMCID: PMC10492795 DOI: 10.1038/s41467-023-41188-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 08/25/2023] [Indexed: 09/11/2023] Open
Abstract
An important challenge in genetics, evolution and biotechnology is to understand and predict how mutations combine to alter phenotypes, including molecular activities, fitness and disease. In diploids, mutations in a gene can combine on the same chromosome or on different chromosomes as a "heteroallelic combination". However, a direct comparison of the extent, sign, and stability of the genetic interactions between variants within and between alleles is lacking. Here we use thermodynamic models of protein folding and ligand-binding to show that interactions between mutations within and between alleles are expected in even very simple biophysical systems. Protein folding alone generates within-allele interactions and a single molecular interaction is sufficient to cause between-allele interactions and dominance. These interactions change differently, quantitatively and qualitatively as a system becomes more complex. Altering the concentration of a ligand can, for example, switch alleles from dominant to recessive. Our results show that intra-molecular epistasis and dominance should be widely expected in even the simplest biological systems but also reinforce the view that they are plastic system properties and so a formidable challenge to predict. Accurate prediction of both intra-molecular epistasis and dominance will require either detailed mechanistic understanding and experimental parameterization or brute-force measurement and learning.
Collapse
Affiliation(s)
- Xuan Xie
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
| | - Xia Sun
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK
| | - Yuheng Wang
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK
| | - Ben Lehner
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, 08003, Spain.
- ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain.
- Wellcome Sanger Institute, Wellcome Genome Campus Hinxton, Cambridge, CB10 1SA, UK.
| | - Xianghua Li
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China.
- Wellcome Sanger Institute, Wellcome Genome Campus Hinxton, Cambridge, CB10 1SA, UK.
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK.
- Biomedical and Health Translational Centre of Zhejiang Province, Haizhou East Road 718, Haining, 314400, P. R. China.
| |
Collapse
|
42
|
Chen L, Zhang Z, Li Z, Li R, Huo R, Chen L, Wang D, Luo X, Chen K, Liao C, Zheng M. Learning protein fitness landscapes with deep mutational scanning data from multiple sources. Cell Syst 2023; 14:706-721.e5. [PMID: 37591206 DOI: 10.1016/j.cels.2023.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 05/30/2023] [Accepted: 07/18/2023] [Indexed: 08/19/2023]
Abstract
One of the key points of machine learning-assisted directed evolution (MLDE) is the accurate learning of the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages the existing deep mutational scanning data from diverse proteins to aid in understanding the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three aspects: random and positional extrapolation for single-variant effects, zero-shot fitness predictions for new proteins, and extrapolation for higher-order variant effects from single-variant effects. Moreover, our study identified previously overlooked strong baselines, and their unexpectedly good performance brings our attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Collapse
Affiliation(s)
- Lin Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zehong Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhenghao Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
| | - Rui Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
| | - Ruifeng Huo
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Lifan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | | | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
| | - Cangsong Liao
- University of Chinese Academy of Sciences, Beijing 100049, China; Chemical Biology Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Science, Shanghai 201203, China.
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China.
| |
Collapse
|
43
|
Hauser BM, Luo Y, Nathan A, Gaiha GD, Vavvas D, Comander J, Pierce EA, Place EM, Bujakowska KM, Rossin EJ. Structure-based network analysis predicts mutations associated with inherited retinal disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.05.23292247. [PMID: 37461650 PMCID: PMC10350150 DOI: 10.1101/2023.07.05.23292247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 07/27/2023]
Abstract
With continued advances in gene sequencing technologies comes the need to develop better tools to understand which mutations cause disease. Here we validate structure-based network analysis (SBNA)1,2 in well-studied human proteins and report results of using SBNA to identify critical amino acids that may cause retinal disease if subject to missense mutation. We computed SBNA scores for genes with high-quality structural data, starting with validating the method using 4 well-studied human disease-associated proteins. We then analyzed 47 inherited retinal disease (IRD) genes. We compared SBNA scores to phenotype data from the ClinVar database and found a significant difference between benign and pathogenic mutations with respect to network score. Finally, we applied this approach to 65 patients at Massachusetts Eye and Ear (MEE) who were diagnosed with IRD but for whom no genetic cause was found. Multivariable logistic regression models built using SBNA scores for IRD-associated genes successfully predicted pathogenicity of novel mutations, allowing us to identify likely causative disease variants in 37 patients with IRD from our clinic. In conclusion, SBNA can be meaningfully applied to human proteins and may help predict mutations causative of IRD.
Collapse
Affiliation(s)
| | - Yuyang Luo
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Anusha Nathan
- Ragon Institute of Mass General, MIT, and Harvard, Cambridge, MA
| | - Gaurav D. Gaiha
- Ragon Institute of Mass General, MIT, and Harvard, Cambridge, MA
| | - Demetrios Vavvas
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Jason Comander
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Eric A. Pierce
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Emily M. Place
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Kinga M. Bujakowska
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| | - Elizabeth J. Rossin
- Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
| |
Collapse
|
44
|
Konecki DM, Hamrick S, Wang C, Agosto MA, Wensel TG, Lichtarge O. CovET: A covariation-evolutionary trace method that identifies protein structure-function modules. J Biol Chem 2023; 299:104896. [PMID: 37290531 PMCID: PMC10338321 DOI: 10.1016/j.jbc.2023.104896] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/10/2023] Open
Abstract
Measuring the relative effect that any two sequence positions have on each other may improve protein design or help better interpret coding variants. Current approaches use statistics and machine learning but rarely consider phylogenetic divergences which, as shown by Evolutionary Trace studies, provide insight into the functional impact of sequence perturbations. Here, we reframe covariation analyses in the Evolutionary Trace framework to measure the relative tolerance to perturbation of each residue pair during evolution. This approach (CovET) systematically accounts for phylogenetic divergences: at each divergence event, we penalize covariation patterns that belie evolutionary coupling. We find that while CovET approximates the performance of existing methods to predict individual structural contacts, it performs significantly better at finding structural clusters of coupled residues and ligand binding sites. For example, CovET found more functionally critical residues when we examined the RNA recognition motif and WW domains. It correlates better with large-scale epistasis screen data. In the dopamine D2 receptor, top CovET residue pairs recovered accurately the allosteric activation pathway characterized for Class A G protein-coupled receptors. These data suggest that CovET ranks highest the sequence position pairs that play critical functional roles through epistatic and allosteric interactions in evolutionarily relevant structure-function motifs. CovET complements current methods and may shed light on fundamental molecular mechanisms of protein structure and function.
Collapse
Affiliation(s)
- Daniel M Konecki
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Spencer Hamrick
- Chemical, Physical, and Structural Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Chen Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
| | - Theodore G Wensel
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Olivier Lichtarge
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas, USA.
| |
Collapse
|
45
|
Wagner A. Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nat Commun 2023; 14:3624. [PMID: 37336901 PMCID: PMC10279741 DOI: 10.1038/s41467-023-39321-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 06/05/2023] [Indexed: 06/21/2023] Open
Abstract
Can evolvability-the ability to produce adaptive heritable variation-itself evolve through adaptive Darwinian evolution? If so, then Darwinian evolution may help create the conditions that enable Darwinian evolution. Here I propose a framework that is suitable to address this question with available experimental data on adaptive landscapes. I introduce the notion of an evolvability-enhancing mutation, which increases the likelihood that subsequent mutations in an evolving organism, protein, or RNA molecule are adaptive. I search for such mutations in the experimentally characterized and combinatorially complete fitness landscapes of a protein and an RNA molecule. I find that such evolvability-enhancing mutations indeed exist. They constitute a small fraction of all mutations, which shift the distribution of fitness effects of subsequent mutations towards less deleterious mutations, and increase the incidence of beneficial mutations. Evolving populations which experience such mutations can evolve significantly higher fitness. The study of evolvability-enhancing mutations opens many avenues of investigation into the evolution of evolvability.
Collapse
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland.
- The Santa Fe Institute, Santa Fe, NM, USA.
| |
Collapse
|
46
|
Yee SW, Macdonald C, Mitrovic D, Zhou X, Koleske ML, Yang J, Silva DB, Grimes PR, Trinidad D, More SS, Kachuri L, Witte JS, Delemotte L, Giacomini KM, Coyote-Maestas W. The full spectrum of OCT1 (SLC22A1) mutations bridges transporter biophysics to drug pharmacogenomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.06.543963. [PMID: 37333090 PMCID: PMC10274788 DOI: 10.1101/2023.06.06.543963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/20/2023]
Abstract
Membrane transporters play a fundamental role in the tissue distribution of endogenous compounds and xenobiotics and are major determinants of efficacy and side effects profiles. Polymorphisms within these drug transporters result in inter-individual variation in drug response, with some patients not responding to the recommended dosage of drug whereas others experience catastrophic side effects. For example, variants within the major hepatic Human organic cation transporter OCT1 (SLC22A1) can change endogenous organic cations and many prescription drug levels. To understand how variants mechanistically impact drug uptake, we systematically study how all known and possible single missense and single amino acid deletion variants impact expression and substrate uptake of OCT1. We find that human variants primarily disrupt function via folding rather than substrate uptake. Our study revealed that the major determinants of folding reside in the first 300 amino acids, including the first 6 transmembrane domains and the extracellular domain (ECD) with a stabilizing and highly conserved stabilizing helical motif making key interactions between the ECD and transmembrane domains. Using the functional data combined with computational approaches, we determine and validate a structure-function model of OCT1s conformational ensemble without experimental structures. Using this model and molecular dynamic simulations of key mutants, we determine biophysical mechanisms for how specific human variants alter transport phenotypes. We identify differences in frequencies of reduced function alleles across populations with East Asians vs European populations having the lowest and highest frequency of reduced function variants, respectively. Mining human population databases reveals that reduced function alleles of OCT1 identified in this study associate significantly with high LDL cholesterol levels. Our general approach broadly applied could transform the landscape of precision medicine by producing a mechanistic basis for understanding the effects of human mutations on disease and drug response.
Collapse
Affiliation(s)
- Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Christian Macdonald
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Darko Mitrovic
- Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, 12121 Solna, Sweden
| | - Xujia Zhou
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Megan L Koleske
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Jia Yang
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Dina Buitrago Silva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Patrick Rockefeller Grimes
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Donovan Trinidad
- Department of Medicine, Division of Infectious Disease, University of California, San Francisco, United States
| | - Swati S More
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Current address: Center for Drug Design (CDD), College of Pharmacy, University of Minnesota, Minnesota, United States
| | - Linda Kachuri
- Epidemiology and Population Health, Stanford University, California, United States
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
| | - John S Witte
- Epidemiology and Population Health, Stanford University, California, United States
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
| | - Lucie Delemotte
- Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, 12121 Solna, Sweden
| | - Kathleen M Giacomini
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
| | - Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Quantitative Biosciences Institute, University of California, San Francisco, United States
| |
Collapse
|
47
|
Baier F, Gauye F, Perez-Carrasco R, Payne JL, Schaerli Y. Environment-dependent epistasis increases phenotypic diversity in gene regulatory networks. SCIENCE ADVANCES 2023; 9:eadf1773. [PMID: 37224262 DOI: 10.1126/sciadv.adf1773] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/07/2022] [Accepted: 04/17/2023] [Indexed: 05/26/2023]
Abstract
Mutations to gene regulatory networks can be maladaptive or a source of evolutionary novelty. Epistasis confounds our understanding of how mutations affect the expression patterns of gene regulatory networks, a challenge exacerbated by the dependence of epistasis on the environment. We used the toolkit of synthetic biology to systematically assay the effects of pairwise and triplet combinations of mutant genotypes on the expression pattern of a gene regulatory network expressed in Escherichia coli that interprets an inducer gradient across a spatial domain. We uncovered a preponderance of epistasis that can switch in magnitude and sign across the inducer gradient to produce a greater diversity of expression pattern phenotypes than would be possible in the absence of such environment-dependent epistasis. We discuss our findings in the context of the evolution of hybrid incompatibilities and evolutionary novelties.
Collapse
Affiliation(s)
- Florian Baier
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
| | - Florence Gauye
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
| | | | - Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
| | - Yolanda Schaerli
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
| |
Collapse
|
48
|
Chen Y, Hu R, Li K, Zhang Y, Fu L, Zhang J, Si T. Deep Mutational Scanning of an Oxygen-Independent Fluorescent Protein CreiLOV for Comprehensive Profiling of Mutational and Epistatic Effects. ACS Synth Biol 2023; 12:1461-1473. [PMID: 37066862 PMCID: PMC10204710 DOI: 10.1021/acssynbio.2c00662] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2022] [Indexed: 04/18/2023]
Abstract
Oxygen-independent, flavin mononucleotide-based fluorescent proteins (FbFPs) are promising alternatives to green fluorescent protein in anaerobic contexts. Deep mutational scanning performs systematic profiling of protein sequence-function relationships but has not been applied to FbFPs. Focusing on CreiLOV from Chlamydomonas reinhardtii, we created and analyzed two comprehensive mutant collections: (1) single-residue, site-saturation mutagenesis libraries covering all 118 residues; and (2) a full combinatorial metagenesis library among 20 mutations at 15 residues, where mutation and residue selection was based on single-site mutagenesis results. Notably, the second type of library is indispensable to study higher-order epistasis but underrepresented in the literature. Using optimized FACS-seq assays, 2,185 (>92.5%) out of 2,360 possible single-site mutants and 165,428 (>89.7%) out of 184,320 possible combinatorial mutants were reliably assigned with fitness values. We constructed statistical and machine-learning models to analyze the CreiLOV data set, enabling accurate fitness prediction of higher-order mutants using lower-order mutagenesis data. In addition, we successfully isolated CreiLOV variants with improved fluorescence quantum yield and thermostability. This work provides new empirical data and design rules to engineer combinatorial protein variants.
Collapse
Affiliation(s)
- Yongcan Chen
- CAS
Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute
of Synthetic Biology, Shenzhen Institute
of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Ruyun Hu
- CAS
Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute
of Synthetic Biology, Shenzhen Institute
of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Keyi Li
- CAS
Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute
of Synthetic Biology, Shenzhen Institute
of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Yating Zhang
- CAS
Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute
of Synthetic Biology, Shenzhen Institute
of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Lihao Fu
- CAS
Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute
of Synthetic Biology, Shenzhen Institute
of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University
of Chinese Academy of Sciences, Beijing 100049, China
| | - Jianzhi Zhang
- CAS
Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute
of Synthetic Biology, Shenzhen Institute
of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Tong Si
- CAS
Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute
of Synthetic Biology, Shenzhen Institute
of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- BGI-Shenzhen, Shenzhen 518083, China
- University
of Chinese Academy of Sciences, Beijing 100049, China
| |
Collapse
|
49
|
Moulana A, Dupic T, Phillips AM, Desai MM. Genotype-phenotype landscapes for immune-pathogen coevolution. Trends Immunol 2023; 44:384-396. [PMID: 37024340 PMCID: PMC10147585 DOI: 10.1016/j.it.2023.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2023] [Revised: 03/08/2023] [Accepted: 03/09/2023] [Indexed: 04/07/2023]
Abstract
Our immune systems constantly coevolve with the pathogens that challenge them, as pathogens adapt to evade our defense responses, with our immune repertoires shifting in turn. These coevolutionary dynamics take place across a vast and high-dimensional landscape of potential pathogen and immune receptor sequence variants. Mapping the relationship between these genotypes and the phenotypes that determine immune-pathogen interactions is crucial for understanding, predicting, and controlling disease. Here, we review recent developments applying high-throughput methods to create large libraries of immune receptor and pathogen protein sequence variants and measure relevant phenotypes. We describe several approaches that probe different regions of the high-dimensional sequence space and comment on how combinations of these methods may offer novel insight into immune-pathogen coevolution.
Collapse
Affiliation(s)
- Alief Moulana
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Thomas Dupic
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Angela M Phillips
- Department of Microbiology and Immunology, University of California at San Francisco, San Francisco, CA 94143, USA
| | - Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA; Department of Physics, Harvard University, Cambridge, MA 02138, USA; NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, MA 02138, USA; Quantitative Biology Initiative, Harvard University, Cambridge, MA 02138, USA.
| |
Collapse
|
50
|
Tan TJC, Mou Z, Lei R, Ouyang WO, Yuan M, Song G, Andrabi R, Wilson IA, Kieffer C, Dai X, Matreyek KA, Wu NC. High-throughput identification of prefusion-stabilizing mutations in SARS-CoV-2 spike. Nat Commun 2023; 14:2003. [PMID: 37037866 PMCID: PMC10086000 DOI: 10.1038/s41467-023-37786-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Accepted: 03/31/2023] [Indexed: 04/12/2023] Open
Abstract
Designing prefusion-stabilized SARS-CoV-2 spike is critical for the effectiveness of COVID-19 vaccines. All COVID-19 vaccines in the US encode spike with K986P/V987P mutations to stabilize its prefusion conformation. However, contemporary methods on engineering prefusion-stabilized spike immunogens involve tedious experimental work and heavily rely on structural information. Here, we establish a systematic and unbiased method of identifying mutations that concomitantly improve expression and stabilize the prefusion conformation of the SARS-CoV-2 spike. Our method integrates a fluorescence-based fusion assay, mammalian cell display technology, and deep mutational scanning. As a proof-of-concept, we apply this method to a region in the S2 domain that includes the first heptad repeat and central helix. Our results reveal that besides K986P and V987P, several mutations simultaneously improve expression and significantly lower the fusogenicity of the spike. As prefusion stabilization is a common challenge for viral immunogen design, this work will help accelerate vaccine development against different viruses.
Collapse
Affiliation(s)
- Timothy J C Tan
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Zongjun Mou
- Department of Physiology and Biophysics, Case Western Reserve University School of Medicine, Cleveland, OH, 44106, USA
| | - Ruipeng Lei
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Wenhao O Ouyang
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Meng Yuan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Ge Song
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Raiees Andrabi
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Ian A Wilson
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA, 92037, USA
- The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Collin Kieffer
- Department of Microbiology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA
| | - Xinghong Dai
- Department of Physiology and Biophysics, Case Western Reserve University School of Medicine, Cleveland, OH, 44106, USA
| | - Kenneth A Matreyek
- Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH, 44106, USA
| | - Nicholas C Wu
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA.
- Department of Biochemistry, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA.
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA.
- Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA.
| |
Collapse
|