1. Vila JA. The origin of mutational epistasis. Eur Biophys J 2024; 53:473-480. [PMID: 39443382 DOI: 10.1007/s00249-024-01725-9] [Received: 06/20/2024] [Revised: 10/03/2024] [Accepted: 10/06/2024] [Indexed: 10/25/2024]
Abstract
The interconnected processes of protein folding, mutations, epistasis, and evolution have been analyzed extensively over the years because of their significance for structural and evolutionary biology. Nonetheless, the origin (molecular basis) of epistasis, the non-additive interaction between mutations, is still unknown. A new perspective on protein folding, a problem that needs to be conceived as an 'analytic whole', will enable us to shed light on the origin of mutational epistasis at the simplest level (within proteins) while also uncovering why the genetic background in which mutations occur, a key component of molecular evolution, could foster changes in epistatic effects. Additionally, because mutations are the source of epistasis, more research is needed to determine how post-translational modifications, which can potentially increase the proteome's diversity by several orders of magnitude, affect mutational epistasis and protein evolvability. Finally, a thermodynamics-based analysis of protein evolution that does not consider specific mutational steps or epistatic effects will be briefly discussed. Our study explores the complex processes behind protein evolution upon mutation, clears up some previously unresolved issues, and provides direction for further research.
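For reference, a standard way to quantify pairwise epistasis is shown below. This is a textbook definition, not one taken from this article. Epistasis is the deviation of the double mutant from the sum of the single-mutant effects, measured on a chosen phenotype scale P such as folding free energy:

```latex
\varepsilon_{AB} = \Delta P_{AB} - \left(\Delta P_{A} + \Delta P_{B}\right),
\qquad \Delta P_{X} = P_{X} - P_{\mathrm{WT}}
```

A value of ε_AB = 0 means the two mutations act additively on that scale. Because ε depends on the scale chosen, mutations that are additive in free energy can still appear epistatic on a nonlinear readout such as the fraction of folded protein.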
Affiliation(s)
- Jorge A Vila
  - IMASL-CONICET, Ejército de Los Andes 950, 5700, San Luis, Argentina
2. Simon JJ, Fowler DM, Maly DJ. Multiplexed profiling of intracellular protein abundance, activity, interactions and druggability with LABEL-seq. Nat Methods 2024; 21:2094-2106. [PMID: 39433876 DOI: 10.1038/s41592-024-02456-7] [Received: 05/01/2024] [Accepted: 09/10/2024] [Indexed: 10/23/2024]
Abstract
Here we describe labeling with barcodes and enrichment for biochemical analysis by sequencing (LABEL-seq), an assay for massively parallel profiling of pooled protein variants in human cells. By leveraging the intracellular self-assembly of an RNA-binding domain (RBD) with a stable, variant-encoding RNA barcode, LABEL-seq facilitates the direct measurement of protein properties and functions using simple affinity enrichments of RBD protein fusions, followed by high-throughput sequencing of co-enriched barcodes. Measurement of ~20,000 variant effects for ~1,600 BRaf variants revealed that variation at positions frequently mutated in cancer minimally impacted intracellular abundance but could dramatically alter activity, protein-protein interactions and druggability. Integrative analysis identified networks of positions with similar biochemical roles and enabled modeling of variant effects on cell proliferation and small molecule-promoted degradation. Thus, LABEL-seq enables direct measurement of multiple biochemical properties in a native cellular context, providing insights into protein function, disease mechanisms and druggability.
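LABEL-seq reads out protein properties by counting co-enriched barcodes. The abstract does not give the authors' scoring scheme. A common way to turn pooled-assay counts into per-variant scores, assumed here for illustration, is a log2 enrichment. It takes the variant's barcode frequency after enrichment, divides it by the variant's frequency in the input, and normalizes to wild type. The function name, the pseudocount of 0.5, and the counts are hypothetical.

```python
import math


def log2_enrichment(input_counts, enriched_counts, wt_key="WT", pseudocount=0.5):
    """Hypothetical scoring helper: log2 enrichment of each variant relative to wild type.

    input_counts / enriched_counts map variant name -> barcode read count before and
    after an affinity enrichment. A pseudocount keeps zero counts finite.
    """
    in_total = sum(input_counts.values())
    enr_total = sum(enriched_counts.values())

    def ratio(variant):
        # Barcode frequency after enrichment divided by frequency in the input pool.
        f_in = (input_counts.get(variant, 0) + pseudocount) / in_total
        f_enr = (enriched_counts.get(variant, 0) + pseudocount) / enr_total
        return f_enr / f_in

    wt_ratio = ratio(wt_key)
    return {v: math.log2(ratio(v) / wt_ratio) for v in input_counts}


# Hypothetical counts: var_A co-enriches more than wild type, var_B less.
input_counts = {"WT": 1000, "var_A": 1000, "var_B": 1000}
enriched_counts = {"WT": 2000, "var_A": 4000, "var_B": 500}
scores = log2_enrichment(input_counts, enriched_counts)
print(scores)
```

Positive scores mean a variant co-enriches more than wild type, and negative scores mean it co-enriches less. With replicate measurements, one would also estimate an uncertainty for each score.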
Affiliation(s)
- Jessica J Simon
  - Department of Chemistry, University of Washington, Seattle, WA, USA
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Dustin J Maly
  - Department of Chemistry, University of Washington, Seattle, WA, USA
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
3. Measuring multiple intracellular biochemical properties of proteins with next-generation sequencing. Nat Methods 2024; 21:1988-1989. [PMID: 39433881 DOI: 10.1038/s41592-024-02461-w] [Indexed: 10/23/2024]
4. Dou Z, He J, Han C, Wu X, Wan L, Yang J, Zheng Y, Gong B, Wang L. qProtein: Exploring Physical Features of Protein Thermostability Based on Structural Proteomics. J Chem Inf Model 2024; 64:7885-7894. [PMID: 39375829 DOI: 10.1021/acs.jcim.4c01303] [Indexed: 10/09/2024]
Abstract
Thermostability, which is essential for the functional performance of enzymes, is largely determined by intramolecular physical interactions. Although many tools have been developed, existing computational methods have struggled to find universal principles of protein thermostability. Recent advancements in structural proteomics have been driven by the introduction of deep neural networks such as AlphaFold2 and ESMFold. These innovations have enabled the characterization of protein structures with unprecedented speed and accuracy. Here, we introduce qProtein, a Python-implemented workflow designed for the quantitative analysis of physical interactions on the scale of structural proteomics. This platform accepts protein sequences as input and produces four structural features: hydrophobic clusters, hydrogen bonds, electrostatic interactions, and disulfide bonds. To demonstrate the use of qProtein, we investigate the structural features related to protein thermostability in six glycoside hydrolase (GH) families, comprising a total of 3,811 protein structures. Our results indicate that in five enzyme families (GH11, GH12, GH5_2, GH10, and GH48), the thermophilic enzymes have a larger average area of hydrophobic clusters than the nonthermophilic enzymes within each family. Furthermore, our analysis of the local-structure regions reveals that the hydrophobic clusters are predominantly distributed in the distal regions of the GH11 enzymes. In addition, the average hydrophobic cluster area of the thermophilic enzymes is significantly higher than that of the nonthermophilic enzymes in the distal regions of the GH11 enzymes. Therefore, qProtein is a well-suited platform for analyzing the structural features of thermal stability at the level of structural proteomics. We provide the source code for qProtein at https://github.com/bj600800/qProtein, and the web server is available at http://qProtein.sdu.edu.cn:8888.
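The sketch below makes the idea of a "hydrophobic cluster" concrete using one simple definition. This definition is assumed for illustration and is not necessarily the one qProtein uses. Hydrophobic residues are linked when their side-chain centroids lie within a distance cutoff, and the connected components of the resulting contact graph are the clusters. The residue set, the 6.5 Å cutoff, and the input format are all hypothetical. qProtein reports cluster areas, which this sketch does not compute.

```python
import math
from itertools import combinations

# Assumed set of hydrophobic residue types (three-letter codes).
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}


def hydrophobic_clusters(residues, cutoff=6.5):
    """Connected components of a contact graph between hydrophobic residues.

    residues: list of (residue_id, residue_name, (x, y, z)), where (x, y, z) is a
    side-chain centroid in angstroms (hypothetical input format).
    """
    hydro = [(rid, xyz) for rid, name, xyz in residues if name in HYDROPHOBIC]
    parent = {rid: rid for rid, _ in hydro}

    def find(r):
        # Union-find root lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (r1, p1), (r2, p2) in combinations(hydro, 2):
        if math.dist(p1, p2) <= cutoff:
            parent[find(r1)] = find(r2)

    clusters = {}
    for rid, _ in hydro:
        clusters.setdefault(find(rid), []).append(rid)
    return sorted(clusters.values(), key=len, reverse=True)


toy = [
    (1, "LEU", (0.0, 0.0, 0.0)),
    (2, "VAL", (4.0, 0.0, 0.0)),
    (3, "ILE", (8.0, 0.0, 0.0)),
    (4, "LYS", (2.0, 0.0, 0.0)),   # polar, ignored
    (5, "PHE", (30.0, 0.0, 0.0)),  # too far from the others
]
print(hydrophobic_clusters(toy))
```

Residues 1 and 3 are farther apart than the cutoff but still end up in one cluster, because both contact residue 2. This transitivity is the reason for using connected components rather than pairwise contacts alone.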
Affiliation(s)
- Zhixin Dou
  - State Key Laboratory of Microbial Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Jiaxin He
  - School of Computer Science and Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Chao Han
  - Shandong Key Laboratory of Agricultural Microbiology, Shandong Agricultural University, Tai'an 271018, China
- Xiuyun Wu
  - State Key Laboratory of Microbial Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Lin Wan
  - School of Software, Shandong University, Shunhua Road, Jinan 250101, P.R. China
- Jian Yang
  - School of Computer Science and Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Yanwei Zheng
  - School of Computer Science and Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
- Bin Gong
  - School of Software, Shandong University, Shunhua Road, Jinan 250101, P.R. China
- Lushan Wang
  - State Key Laboratory of Microbial Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, P.R. China
5. Xu Y, Liu D, Gong H. Improving the prediction of protein stability changes upon mutations by geometric learning and a pre-training strategy. Nat Comput Sci 2024:10.1038/s43588-024-00716-2. [PMID: 39455825 DOI: 10.1038/s43588-024-00716-2] [Received: 08/29/2023] [Accepted: 10/03/2024] [Indexed: 10/28/2024]
Abstract
Accurate prediction of the effects of protein mutations is of great importance in protein engineering and design. Here we propose GeoStab-suite, a suite of three geometric learning-based models (GeoFitness, GeoDDG and GeoDTm) for predicting the fitness score, ΔΔG and ΔTm of a protein upon mutation, respectively. GeoFitness uses a specialized loss function that allows supervised training of a unified model on the large amount of multi-labeled fitness data in the deep mutational scanning database. To further improve the downstream tasks of ΔΔG and ΔTm prediction, the encoder of GeoFitness is reused as a pre-trained module in GeoDDG and GeoDTm to overcome the lack of sufficient labeled data. This pre-training strategy, combined with data expansion, markedly improves model performance and generalizability. In the benchmark test, GeoDDG and GeoDTm outperform other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient.
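The benchmark is scored with the Spearman correlation coefficient, which is the Pearson correlation between the ranks of the predicted and measured values. A model is therefore rewarded for ordering mutations correctly (by ΔΔG or ΔTm) regardless of absolute scale. Below is a minimal pure-Python sketch that assigns average ranks to ties. In practice, `scipy.stats.spearmanr` computes the same quantity. The example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted vs. measured ddG values (kcal/mol) for four mutations.
predicted = [0.2, 1.5, 0.9, 2.8]
measured = [0.1, 1.1, 1.3, 3.0]
print(spearman(predicted, measured))
```

Because only ranks enter the calculation, any monotonic rescaling of the predictions leaves ρ unchanged.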
Affiliation(s)
- Yunxin Xu
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
  - Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Di Liu
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
  - Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Haipeng Gong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
  - Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
6. Chillón-Pino D, Badonyi M, Semple CA, Marsh JA. Protein structural context of cancer mutations reveals molecular mechanisms and candidate driver genes. Cell Rep 2024; 43:114905. [PMID: 39441719 DOI: 10.1016/j.celrep.2024.114905] [Received: 04/25/2024] [Revised: 08/23/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024]
Abstract
Advances in protein structure determination and modeling allow us to study the structural context of human genetic variants on an unprecedented scale. Here, we analyze millions of cancer-associated missense mutations based on their structural locations and predicted perturbative effects. By considering the collective properties of mutations at the level of individual proteins, we identify distinct patterns associated with tumor suppressors and oncogenes. Tumor suppressors are enriched in structurally damaging mutations, consistent with loss-of-function mechanisms, while oncogene mutations tend to be structurally mild, reflecting selection for gain-of-function driver mutations and against loss-of-function mutations. Although oncogenes are difficult to distinguish from genes with no role in cancer using only structural damage, we find that the three-dimensional clustering of mutations is highly predictive. These observations allow us to identify candidate driver genes and speculate about their molecular roles, which we expect will have general utility in the analysis of cancer sequencing data.
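The abstract does not say which 3D clustering statistic was used. As a generic illustration, assumed here rather than taken from the paper, the sketch asks whether mutated residues sit closer together in a structure than randomly chosen residues would. It uses the mean pairwise distance and a permutation test. The coordinates (for example, C-alpha positions), the function names, and the toy structure are hypothetical.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(coords, positions):
    """Mean Euclidean distance over all pairs of the given residue positions."""
    pairs = list(combinations(positions, 2))
    return sum(math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


def clustering_p_value(coords, mutated, n_perm=2000, seed=0):
    """Permutation test: are the mutated residues closer in 3D than random residue sets?

    coords: residue number -> (x, y, z), e.g. C-alpha coordinates (hypothetical input).
    mutated: at least two distinct residue numbers carrying mutations.
    Returns an empirical one-sided p-value with an add-one correction.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, mutated)
    residues = list(coords)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(residues, len(mutated))
        if mean_pairwise_distance(coords, sample) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy "structure": 100 residues on a straight line, 3.8 angstroms apart.
coords = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 101)}
p_clustered = clustering_p_value(coords, [10, 11, 12, 13])
p_dispersed = clustering_p_value(coords, [1, 34, 67, 100])
print(p_clustered, p_dispersed)
```

A small p-value indicates that the mutations are more tightly clustered in space than chance would predict. Real analyses would use actual structures and would also need to account for recurrent mutations at the same residue.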
Affiliation(s)
- Diego Chillón-Pino
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mihaly Badonyi
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Colin A Semple
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Joseph A Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
7. Sun M, Stoltzfus A, McCandlish DM. A fitness distribution law for amino-acid replacements. bioRxiv 2024:2024.10.11.617952. [PMID: 39464166 PMCID: PMC11507765 DOI: 10.1101/2024.10.11.617952] [Indexed: 10/29/2024]
Abstract
The effect of replacing the amino acid at a given site in a protein is difficult to predict. Yet, evolutionary comparisons have revealed highly regular patterns of interchangeability between pairs of amino acids, and such patterns have proved enormously useful in a range of applications in bioinformatics, evolutionary inference, and protein design. Here we reconcile these apparently contradictory observations using fitness data from over 350,000 experimental amino acid replacements. Almost one-quarter of the 20 × 19 = 380 types of replacements have broad distributions of fitness effects (DFEs) that closely resemble the background DFE for random changes, indicating an overwhelming influence of protein context in determining mutational effects. However, we also observe that the 380 pair-specific DFEs closely follow a maximum entropy distribution, specifically a truncated exponential distribution. The shape of this distribution is determined entirely by its mean, which is equivalent to the chance that a replacement of the given type is fitter than a random replacement. In this type of distribution, modest deviations in the mean correspond to much larger changes in the probability of falling in the far right tail, so that modest differences in mean exchangeability may result in much larger differences in the chance of a highly fit mutation. Indeed, we show that under the assumption that purifying selection filters out the vast majority of mutations, the maximum entropy distributions of fitness effects inferred from deep mutational scanning experiments predict the characteristic patterns of amino acid change observed in molecular evolution. These maximum entropy distributions of mutational effects not only provide a tuneable model for molecular evolution, but also have implications for mutational effect prediction and protein engineering.
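The following is a small numerical sketch of the truncated-exponential argument, built on two assumptions. First, fitness effects are placed on a [0, 1] scale as quantiles of the background DFE. A random replacement's quantile U is then uniform, so P(X > U) = E[X], which is why the mean equals the chance of beating a random replacement. Second, the pair-specific DFE has a density proportional to exp(λx) on that interval. This parameterization is assumed for illustration and may differ from the paper's. The code solves for λ from the mean and then compares upper-tail probabilities.

```python
import math


def mean_of_rate(lam):
    """Mean of the truncated exponential f(x) = lam*exp(lam*x)/(exp(lam)-1) on [0, 1]."""
    if abs(lam) < 1e-4:
        return 0.5 + lam / 12  # series expansion avoids cancellation near lam = 0
    return 1.0 / (-math.expm1(-lam)) - 1.0 / lam


def rate_for_mean(m, lo=-200.0, hi=200.0, iters=200):
    """Invert mean_of_rate by bisection; the mean increases monotonically with lam."""
    if not 0.0 < m < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_of_rate(mid) < m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def upper_tail(lam, t):
    """P(X > t) for the truncated exponential on [0, 1]."""
    if abs(lam) < 1e-9:
        return 1.0 - t  # uniform limit
    return (math.expm1(lam) - math.expm1(lam * t)) / math.expm1(lam)


# Chance of landing in the top 1% of the background DFE for two nearby means.
tail_05 = upper_tail(rate_for_mean(0.5), 0.99)
tail_06 = upper_tail(rate_for_mean(0.6), 0.99)
print(tail_05, tail_06, tail_06 / tail_05)
```

Under these assumptions, raising the mean from 0.5 to 0.6, a 20% change, increases the probability of falling in the top 1% roughly 1.7-fold. This illustrates the tail amplification the abstract describes.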
Affiliation(s)
- Mengyi Sun
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Arlin Stoltzfus
  - Office of Data and Informatics, Material Measurement Laboratory, NIST, Gaithersburg, MD
  - Institute for Bioscience and Biotechnology Research, Rockville, USA
- David M. McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
8. Hou C, Shen Y. SeqDance: A Protein Language Model for Representing Protein Dynamic Properties. bioRxiv 2024:2024.10.11.617911. [PMID: 39464109 PMCID: PMC11507661 DOI: 10.1101/2024.10.11.617911] [Indexed: 10/29/2024]
Abstract
Proteins perform their functions by folding amino acid sequences into dynamic structural ensembles. Despite the important role of protein dynamics, their complexity and the absence of efficient representation methods have limited their integration into studies of protein function and mutation fitness, especially in deep learning applications. To address this, we present SeqDance, a protein language model designed to learn representations of protein dynamic properties directly from sequence alone. SeqDance is pre-trained on dynamic biophysical properties derived from over 30,400 molecular dynamics trajectories and 28,600 normal mode analyses. Our results show that SeqDance effectively captures local dynamic interactions, co-movement patterns, and global conformational features, even for proteins lacking homologs in the pre-training set. Additionally, we show that SeqDance enhances the prediction of protein fitness landscapes, disorder-to-order transition binding regions, and phase-separating proteins. By learning dynamic properties from sequence, SeqDance complements conventional evolution- and static structure-based methods, offering new insights into protein behavior and function.
Affiliation(s)
- Chao Hou
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Yufeng Shen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY 10032
9. Zhao J, Kong D, Zhang G, Zhang S, Wu Y, Dai C, Chen Y, Yang Y, Liu Y, Wei D. An Efficient CRISPR/Cas Cooperative Shearing Platform for Clinical Diagnostics Applications. Angew Chem Int Ed Engl 2024:e202411705. [PMID: 39394860 DOI: 10.1002/anie.202411705] [Received: 06/21/2024] [Revised: 10/05/2024] [Accepted: 10/07/2024] [Indexed: 10/14/2024]
Abstract
The CRISPR/Cas system is a powerful genome-editing tool with widespread applications in molecular diagnostics, therapeutics, and genetic engineering. However, target sequences fold easily, which markedly reduces recognition and shearing efficiency when a single Cas-CRISPR RNA (crRNA) duplex is used. Here, we develop a CRISPR/Cas cooperative shearing (CRISPR-CS) system. Unlike the traditional CRISPR/Cas system, it uses two CRISPR/Cas-crRNA duplexes that simultaneously recognize different sites in the target sequence, increasing the probability of recognition and the shearing efficiency. Cooperative shearing cuts more methylene blue-labeled ssDNA reporters on the electrode. This enables an unamplified electrochemical nucleic acid assay that takes less than 5 minutes and has a detection limit of 9.5 × 10⁻²⁰ M, 2 to 9 orders of magnitude lower than those of other electrochemical assays. The CRISPR-CS platform detects monkeypox, human papillomavirus, and amyotrophic lateral sclerosis with an accuracy of up to 98.1 %, demonstrating the potential of efficient cooperative shearing.
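The reported detection limit can be put in concrete terms by multiplying the molar concentration by Avogadro's constant to obtain a copy number per volume. This is simple unit arithmetic on the stated value. The sample volume used in the assay is not given in the abstract.

```latex
9.5\times10^{-20}\,\mathrm{mol\,L^{-1}} \times 6.022\times10^{23}\,\mathrm{mol^{-1}}
\approx 5.7\times10^{4}\ \mathrm{copies\,L^{-1}}
\approx 57\ \mathrm{copies\,mL^{-1}}
```

In other words, the stated limit corresponds to a few tens of target molecules per millilitre of sample.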
Affiliation(s)
- Junhong Zhao
  - State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Derong Kong
  - State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Guanghui Zhang
  - Shenzhen Hengsheng Hospital, Department of Laboratory Medicine, Shenzhen, Guangdong, 518102, P. R. China
- Shen Zhang
  - State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Yungen Wu
  - State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Changhao Dai
  - State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Yiheng Chen
  - State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Yuetong Yang
  - State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Yunqi Liu
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Dacheng Wei
  - State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
  - Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
10. Faure AJ, Martí-Aranda A, Hidalgo-Carcedo C, Beltran A, Schmiedel JM, Lehner B. The genetic architecture of protein stability. Nature 2024; 634:995-1003. [PMID: 39322666 PMCID: PMC11499273 DOI: 10.1038/s41586-024-07966-0] [Received: 10/27/2023] [Accepted: 08/20/2024] [Indexed: 09/27/2024]
Abstract
There are more ways to synthesize a 100-amino acid (aa) protein (20¹⁰⁰) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces¹. However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 10¹⁰, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
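The structure of "additive free energies plus a nonlinear map to phenotype" can be sketched with a two-state folding model. The two-state model is an assumption made here, and the models fitted in the paper may use different or additional terms. Free energies add, with an optional sparse pairwise coupling, and the fraction folded is a Boltzmann function of ΔG. All energies below are hypothetical.

```python
import math
from itertools import combinations

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def folding_energy(dg_wt, mutations, ddg, coupling):
    """Free energy of folding for a genotype: additive terms plus sparse pairwise couplings.

    dg_wt: wild-type folding free energy in kcal/mol (negative = stable).
    ddg: mutation -> single-mutant ddG in kcal/mol (positive = destabilizing).
    coupling: frozenset({m1, m2}) -> pairwise coupling energy; absent pairs count as 0.
    """
    dg = dg_wt + sum(ddg[m] for m in mutations)
    for m1, m2 in combinations(mutations, 2):
        dg += coupling.get(frozenset((m1, m2)), 0.0)
    return dg


def fraction_folded(dg, temp_k=298.15):
    """Two-state model: nonlinear (Boltzmann) map from folding free energy to phenotype."""
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))


# Hypothetical example: two destabilizing mutations with no energetic coupling.
ddg = {"A": 1.0, "B": 1.5}
coupling = {}
dg_wt = -2.0


def phenotype(mutations):
    return fraction_folded(folding_energy(dg_wt, mutations, ddg, coupling))


wt, a, b, ab = phenotype([]), phenotype(["A"]), phenotype(["B"]), phenotype(["A", "B"])
additive_prediction = (a - wt) + (b - wt)
observed = ab - wt
print(f"additive prediction {additive_prediction:.3f}, observed {observed:.3f}")
```

Even with zero coupling, the double mutant's change in fraction folded differs from the sum of the single-mutant changes. Here the non-additivity comes entirely from the nonlinear map, while the energies themselves stay additive.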
Affiliation(s)
- Andre J Faure
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - ALLOX, Barcelona, Spain
- Aina Martí-Aranda
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Cristina Hidalgo-Carcedo
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Antoni Beltran
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Jörn M Schmiedel
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - factorize.bio, Berlin, Germany
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
11. Zhang J, Chen J, Sha Y, Deng J, Wu J, Yang P, Zou F, Ying H, Zhuang W. Water-mediated active conformational transitions of lipase on organic solvent interfaces. Int J Biol Macromol 2024; 277:134056. [PMID: 39074702 DOI: 10.1016/j.ijbiomac.2024.134056] [Received: 04/15/2024] [Revised: 05/31/2024] [Accepted: 07/19/2024] [Indexed: 07/31/2024]
Abstract
Enzyme biocatalysis has emerged as a popular substitute for conventional chemical processes, and its use in organic solvents makes enzyme stability a central concern. However, obtaining enzymes with improved stability remains a persistent challenge, because organic solvents can significantly alter enzyme properties and thereby limit their practical application. This study focuses on Thermomyces lanuginosus lipase. Through molecular dynamics simulations and experiments, we quantified the effect of different solvent-lipase interfaces on the interfacial activation of the lipase. The minimum distance distribution function provided molecular views of the complex solvation processes, and solvent-protein interactions were used to interpret the factors influencing changes in lipase conformation and enzyme activity. We found that water content is crucial for enzyme stability: at the benzene-water interface, the optimum water content for lipase activity was 35 %, which is closely related to an increase in the interfacial activation angle from 78° to 102°. Methanol induces interfacial activation, in addition to significant competitive inhibition and denaturation at low water content. Our findings highlight the importance of understanding solvent effects on enzyme function and provide practical insights for enzyme engineering and optimization at various solvent-lipase interfaces.
Affiliation(s)
- Jihang Zhang
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
- Jiale Chen
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
- Yu Sha
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
- Jiawei Deng
  - State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
- Jinglan Wu
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
- Pengpeng Yang
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
- Fengxia Zou
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
- Hanjie Ying
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
  - State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
- Wei Zhuang
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
  - State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, No. 30, Puzhu South Road, Nanjing 211816, China
12. Bradley D, Hogrebe A, Dandage R, Dubé AK, Leutert M, Dionne U, Chang A, Villén J, Landry CR. The fitness cost of spurious phosphorylation. EMBO J 2024; 43:4720-4751. [PMID: 39256561 PMCID: PMC11480408 DOI: 10.1038/s44318-024-00200-7] [Received: 10/25/2023] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 09/12/2024]
Abstract
The fidelity of signal transduction requires the binding of regulatory molecules to their cognate targets. However, the crowded cell interior risks off-target interactions between proteins that are functionally unrelated. How such off-target interactions impact fitness is not generally known. Here, we use Saccharomyces cerevisiae to inducibly express tyrosine kinases. Because yeast lacks bona fide tyrosine kinases, the resulting tyrosine phosphorylation is biologically spurious. We engineered 44 yeast strains each expressing a tyrosine kinase, and quantitatively analysed their phosphoproteomes. This analysis resulted in ~30,000 phosphosites mapping to ~3500 proteins. The number of spurious pY sites generated correlates strongly with decreased growth, and we predict over 1000 pY events to be deleterious. However, we also find that many of the spurious pY sites have a negligible effect on fitness, possibly because of their low stoichiometry. This result is consistent with our evolutionary analyses demonstrating a lack of phosphotyrosine counter-selection in species with tyrosine kinases. Our results suggest that, alongside the risk for toxicity, the cell can tolerate a large degree of non-functional crosstalk as interaction networks evolve.
Affiliation(s)
- David Bradley
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexander Hogrebe
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Rohan Dandage
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexandre K Dubé
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Mario Leutert
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Ugo Dionne
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexis Chang
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Judit Villén
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Christian R Landry
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
13
Scrima S, Lambrughi M, Tiberti M, Fadda E, Papaleo E. ASM variants in the spotlight: A structure-based atlas for unraveling pathogenic mechanisms in lysosomal acid sphingomyelinase. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167260. [PMID: 38782304 DOI: 10.1016/j.bbadis.2024.167260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 04/30/2024] [Accepted: 05/18/2024] [Indexed: 05/25/2024]
Abstract
Lysosomal acid sphingomyelinase (ASM), a critical enzyme in lipid metabolism encoded by the SMPD1 gene, plays a crucial role in sphingomyelin hydrolysis in lysosomes. ASM deficiency leads to acid sphingomyelinase deficiency, a rare genetic disorder with diverse clinical manifestations, and the protein can be found mutated in other diseases. We employed a structure-based framework to comprehensively understand the functional implications of ASM variants, integrating pathogenicity predictions with molecular insights derived from a molecular dynamics simulation in a lysosomal membrane environment. Our analysis, encompassing over 400 variants, establishes a structural atlas of missense variants of lysosomal ASM, associating mechanistic indicators with pathogenic potential. Our study highlights variants that influence structural stability or exert local and long-range effects at functional sites. To validate our predictions, we compared them to available experimental data on residual catalytic activity in 135 ASM variants. Notably, our findings also suggest applications of the resulting data for identifying cases suited for enzyme replacement therapy. This comprehensive approach enhances the understanding of ASM variants and provides valuable insights for potential therapeutic interventions.
Affiliation(s)
- Simone Scrima
- Cancer Structural Biology, Center for Autophagy, Recycling and Disease, Danish Cancer Institute, 2100 Copenhagen, Denmark; Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Lambrughi
- Cancer Structural Biology, Center for Autophagy, Recycling and Disease, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Matteo Tiberti
- Cancer Structural Biology, Center for Autophagy, Recycling and Disease, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elisa Fadda
- Department of Chemistry and Hamilton Institute, Maynooth University, Maynooth, Co. Kildare, Ireland
- Elena Papaleo
- Cancer Structural Biology, Center for Autophagy, Recycling and Disease, Danish Cancer Institute, 2100 Copenhagen, Denmark; Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark.
14
Shorthouse D, Lister H, Freeman GS, Hall BA. Understanding large scale sequencing datasets through changes to protein folding. Brief Funct Genomics 2024; 23:517-524. [PMID: 38521964 PMCID: PMC11428155 DOI: 10.1093/bfgp/elae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 02/26/2024] [Accepted: 03/01/2024] [Indexed: 03/25/2024]
Abstract
The expansion of high-quality, low-cost sequencing has created an enormous opportunity to understand how genetic variants alter cellular behaviour in disease. The high diversity of observed mutations has, however, drawn attention to the need for predictive modelling of the phenotypic effects of variants of uncertain significance. This is particularly important in the clinic because of its potential value in guiding diagnosis and patient treatment. Recent computational modelling has highlighted the importance of mutation-induced protein misfolding as a common mechanism for loss of protein or domain function, aided by developments in methods that make large computational screens tractable. Here we review recent applications of this approach to different genes, and how they have enabled and supported subsequent studies. We further discuss developments in the approach and its role in light of increasingly high-throughput experimental approaches.
Affiliation(s)
- David Shorthouse
- School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
- Harris Lister
- Department of Medical Physics and Biomedical Engineering, Malet Place Engineering Building, University College London, Gower Street, London WC1E 6BT, UK
- Gemma S Freeman
- Department of Medical Physics and Biomedical Engineering, Malet Place Engineering Building, University College London, Gower Street, London WC1E 6BT, UK
- Benjamin A Hall
- Department of Medical Physics and Biomedical Engineering, Malet Place Engineering Building, University College London, Gower Street, London WC1E 6BT, UK
15
Aita T, Nemoto N. Mathematical consideration of massive estimation of dissociation rate constant for genotype-phenotype linking molecules bound to targets through washing/selection and next-generation sequencing. J Theor Biol 2024; 595:111944. [PMID: 39306325 DOI: 10.1016/j.jtbi.2024.111944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 08/29/2024] [Accepted: 09/07/2024] [Indexed: 10/03/2024]
Abstract
As a method for in vitro selection, a flow-reactor-type washing/selection system appears to be effective when the ligand library is composed of "genotype-phenotype linking molecules". In this system, high-affinity ligands are selected by their respective "residual ratio", given by exp(-k_off × t), where k_off is the dissociation rate constant and t is the washing time. In this paper, we mathematically considered the following possibility. When the washing/selection dynamics obey the residual ratio exp(-k_off × t) deterministically and the mole fractions of sampled sequences are measured ideally by next-generation sequencing (NGS), the "relative value" of k_off for each of the high-ranking sequences can be estimated simultaneously. In addition, when the residual ratio of the whole ligand population is measured correctly, the "absolute value" for each sequence can be estimated. We derived formulas for the relative and absolute estimates, and analyzed mathematically, in detail, the effect of fluctuations in the number of NGS reads on the estimates. These results were confirmed by numerical simulations.
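The relative and absolute estimates can be illustrated with a short sketch. It rests on our own derivation from the deterministic model stated in the abstract, not on the authors' published formulas: if sequence i has NGS mole fraction x_i(0) before washing and x_i(t) after, then x_i(t) = x_i(0)·exp(-k_i t)/R, where R = Σ_j x_j(0)·exp(-k_j t) is the residual ratio of the whole population. Dividing by a reference sequence cancels R and yields differences in k_off; a measured R yields absolute k_off values. The helper names and the noise-free inputs are assumptions for illustration.

```python
import math
from typing import Dict


def relative_koff(before: Dict[str, float], after: Dict[str, float],
                  t: float, ref: str) -> Dict[str, float]:
    """k_off(i) - k_off(ref) for every sequence, from mole fractions alone.

    Hypothetical helper: assumes the deterministic residual ratio
    exp(-k_off * t) and ideal (noise-free) NGS mole fractions.
    """
    ref_enrichment = after[ref] / before[ref]
    return {
        seq: -math.log((after[seq] / before[seq]) / ref_enrichment) / t
        for seq in before
    }


def absolute_koff(before: Dict[str, float], after: Dict[str, float],
                  t: float, population_residual: float) -> Dict[str, float]:
    """Absolute k_off per sequence, given the measured whole-population residual ratio R."""
    return {
        seq: -math.log(population_residual * after[seq] / before[seq]) / t
        for seq in before
    }


if __name__ == "__main__":
    # Toy library with made-up k_off values, used only to check the algebra.
    true_k = {"A": 0.01, "B": 0.05, "C": 0.10}
    before = {"A": 0.5, "B": 0.3, "C": 0.2}
    t = 30.0
    residual = {s: before[s] * math.exp(-true_k[s] * t) for s in before}
    R = sum(residual.values())                    # whole-population residual ratio
    after = {s: residual[s] / R for s in before}  # renormalised mole fractions
    print(absolute_koff(before, after, t, R))     # recovers true_k (up to rounding)
    print(relative_koff(before, after, t, "A"))   # true_k minus k_off of "A"
```

Without a measurement of R, only the differences returned by `relative_koff` are identifiable, which mirrors the abstract's distinction between "relative" and "absolute" values.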
Affiliation(s)
- Takuyo Aita
- Epsilon Molecular Engineering, Inc, 255 Shimo-Okubo, Sakura-ku, Saitama 338-8570, Japan
- Naoto Nemoto
- Epsilon Molecular Engineering, Inc, 255 Shimo-Okubo, Sakura-ku, Saitama 338-8570, Japan; Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama 338-8570, Japan.
16
Savinov A, Swanson S, Keating AE, Li GW. High-throughput discovery of inhibitory protein fragments with AlphaFold. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.19.572389. [PMID: 38187731 PMCID: PMC10769210 DOI: 10.1101/2023.12.19.572389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Peptides can bind to specific sites on larger proteins and thereby function as inhibitors and regulatory elements. Peptide fragments of larger proteins are particularly attractive for achieving these functions due to their inherent potential to form native-like binding interactions. Recently developed experimental approaches allow for high-throughput measurement of protein fragment inhibitory activity in living cells. However, it has thus far not been possible to predict de novo which of the many possible protein fragments bind to protein targets, let alone act as inhibitors. We have developed a computational method, FragFold, that employs AlphaFold to predict protein fragment binding to full-length proteins in a high-throughput manner. Applying FragFold to thousands of fragments tiling across diverse proteins revealed peaks of predicted binding along each protein sequence. Comparisons with experimental measurements establish that our approach is a sensitive predictor of fragment function: Evaluating inhibitory fragments from known protein-protein interaction interfaces, we find 87% are predicted by FragFold to bind in a native-like mode. Across full protein sequences, 68% of FragFold-predicted binding peaks match experimentally measured inhibitory peaks. Deep mutational scanning experiments support the predicted binding modes and uncover superior inhibitory peptides in high throughput. Further, FragFold is able to predict previously unknown protein binding modes, explaining prior genetic and biochemical data. The success rate of FragFold demonstrates that this computational approach should be broadly applicable for discovering inhibitory protein fragments across proteomes.
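The comparison of predicted and measured binding peaks along a tiled sequence can be illustrated with a toy sketch. Here a "peak" is assumed to be a maximal run of consecutive fragment positions whose score exceeds a threshold, and a predicted peak "matches" if it overlaps any experimental peak; the threshold, the overlap rule and the function names are hypothetical and are not taken from FragFold.

```python
from typing import List, Optional, Tuple

Peak = Tuple[int, int]  # inclusive (start, end) positions along the fragment tiling


def call_peaks(scores: List[float], threshold: float) -> List[Peak]:
    """Return maximal runs of consecutive positions with score > threshold."""
    peaks: List[Peak] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i                      # a run begins
        elif s <= threshold and start is not None:
            peaks.append((start, i - 1))   # the run ended at the previous position
            start = None
    if start is not None:                  # run that reaches the last position
        peaks.append((start, len(scores) - 1))
    return peaks


def fraction_matched(predicted: List[Peak], observed: List[Peak]) -> float:
    """Fraction of predicted peaks that overlap at least one observed peak."""
    if not predicted:
        return 0.0

    def overlaps(a: Peak, b: Peak) -> bool:
        return a[0] <= b[1] and b[0] <= a[1]

    hits = sum(any(overlaps(p, o) for o in observed) for p in predicted)
    return hits / len(predicted)
```

For example, `call_peaks([0.1, 0.6, 0.7, 0.2, 0.1, 0.8, 0.9, 0.3], 0.5)` returns `[(1, 2), (5, 6)]`, and only the first of those overlaps an observed peak at `(0, 1)`, giving a matched fraction of 0.5.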
Affiliation(s)
- Andrew Savinov
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sebastian Swanson
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Amy E. Keating
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Koch Center for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Gene-Wei Li
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
17
Gantz M, Mathis SV, Nintzel FEH, Lio P, Hollfelder F. On synergy between ultrahigh throughput screening and machine learning in biocatalyst engineering. Faraday Discuss 2024; 252:89-114. [PMID: 39133073 PMCID: PMC11318516 DOI: 10.1039/d4fd00065j] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/25/2024] [Accepted: 04/23/2024] [Indexed: 08/13/2024]
Abstract
Protein design and directed evolution have separately contributed enormously to protein engineering. Without being mutually exclusive, the former relies on computation from first principles, while the latter is a combinatorial approach based on chance. Advances in ultrahigh-throughput (uHT) screening, next-generation sequencing and machine learning may create alternative routes to engineered proteins, where functional information linked to specific sequences is interpreted and extrapolated in silico. In particular, the miniaturisation of functional tests in water-in-oil emulsion droplets with picoliter volumes and their rapid generation and analysis (>1 kHz) allows screening of >10⁷-membered libraries in a day. Subsequently, decoding the selected clones by short- or long-read sequencing methods leads to large sequence-function datasets that may allow extrapolation from experimental directed evolution to further improved mutants beyond the observed hits. In this work, we explore experimental strategies for how to draw up 'fitness landscapes' in sequence space with uHT droplet microfluidics, review the current state of AI/ML in enzyme engineering and discuss how uHT datasets may be combined with AI/ML to make meaningful predictions and accelerate biocatalyst engineering.
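The claim that a library of more than ten million members can be screened in a day follows directly from the droplet rate. A back-of-the-envelope sketch, assuming one droplet per library member per pass at a sustained 1 kHz; the `oversampling` factor is a hypothetical parameter for analysing each variant more than once:

```python
def screening_hours(library_size: float, rate_hz: float, oversampling: float = 1.0) -> float:
    """Hours needed to analyse library_size * oversampling droplets at rate_hz droplets per second."""
    droplets = library_size * oversampling   # one droplet per member per pass
    return droplets / rate_hz / 3600.0       # seconds -> hours


if __name__ == "__main__":
    # 1e7 members at 1 kHz corresponds to 1e4 seconds
    print(f"{screening_hours(1e7, 1_000):.2f} h")   # prints 2.78 h
```

At the same rate, analysing each member ten times (`oversampling=10`) would take about 28 hours.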
Affiliation(s)
- Maximilian Gantz
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Simon V Mathis
- Department of Computer Science, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Friederike E H Nintzel
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Pietro Lio
- Department of Computer Science, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Florian Hollfelder
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
18
Velecký J, Berezný M, Musil M, Damborsky J, Bednar D, Mazurenko S. BenchStab: a tool for automated querying of web-based stability predictors. BIOINFORMATICS (OXFORD, ENGLAND) 2024; 40:btae553. [PMID: 39259175 PMCID: PMC11427696 DOI: 10.1093/bioinformatics/btae553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Revised: 08/02/2024] [Accepted: 09/10/2024] [Indexed: 09/12/2024]
Abstract
Summary: Protein design requires information about how mutations affect protein stability. Many web-based predictors are available for this purpose, yet comparing them or using them en masse is difficult. Here, we present BenchStab, a console tool/Python package for easy and quick execution of 19 predictors and result collection on a list of mutants. Moreover, the tool is easily extensible with additional predictors. We created an independent dataset derived from the FireProtDB and evaluated 24 different prediction methods.
Availability and implementation: BenchStab is an open-source Python package available at https://github.com/loschmidt/BenchStab with a detailed README and example usage at https://loschmidt.chemi.muni.cz/benchstab. The BenchStab dataset is available on Zenodo: https://zenodo.org/records/10637728.
Affiliation(s)
- Jan Velecký
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- Matej Berezný
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 00 Brno, Czech Republic
- Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 00 Brno, Czech Republic
- International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
19
Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024; 34:630-647. [PMID: 38969803 PMCID: PMC11369238 DOI: 10.1038/s41422-024-00989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024]
Abstract
Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that confers deoxyadenosine deaminase activity on TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.
Affiliation(s)
- Peng Cheng
- Bioinformatics Center of AMMS, Beijing, China
- Cong Mao
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Jin Tang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Sen Yang
- Bioinformatics Center of AMMS, Beijing, China
- Yu Cheng
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuke Wang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Qiuxi Gu
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wei Han
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Hao Chen
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Sihan Li
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuju Li
- Bioinformatics Center of AMMS, Beijing, China
- Aimin Pan
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Suwen Zhao
- iHuman Institute, ShanghaiTech University, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Xingxu Huang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Jun Zhang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China.
- Wenjie Shu
- Bioinformatics Center of AMMS, Beijing, China.
20
Thornton EL, Paterson SM, Stam MJ, Wood CW, Laohakunakorn N, Regan L. Applications of cell free protein synthesis in protein design. Protein Sci 2024; 33:e5148. [PMID: 39180484 PMCID: PMC11344276 DOI: 10.1002/pro.5148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 07/26/2024] [Accepted: 08/02/2024] [Indexed: 08/26/2024]
Abstract
In protein design, the ultimate test of success is that the designs function as desired. Here, we discuss the utility of cell-free protein synthesis (CFPS) as a rapid, convenient and versatile method to screen for activity. We champion the use of CFPS in screening potential designs. Compared to in vivo protein screening, a wider range of different activities can be evaluated using CFPS, and the scale on which it can easily be used (screening tens to hundreds of designed proteins) is ideally suited to current needs. Protein design using physics-based strategies tended to have a relatively low success rate compared with current machine-learning-based methods. Screening steps (such as yeast display) were often used to identify proteins that displayed the desired activity from many designs that were highly ranked computationally. We also describe how CFPS is well suited to identifying the reasons designs fail, which may include problems with transcription, translation, and solubility, in addition to not achieving the desired structure and function.
Affiliation(s)
- Ella Lucille Thornton
- Centre for Engineering Biology, Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Sarah Maria Paterson
- Centre for Engineering Biology, Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Michael J. Stam
- Centre for Engineering Biology, Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Christopher W. Wood
- Centre for Engineering Biology, Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Nadanai Laohakunakorn
- Centre for Engineering Biology, Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Lynne Regan
- Centre for Engineering Biology, Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
21
McBride JM, Tlusty T. AI-Predicted Protein Deformation Encodes Energy Landscape Perturbation. PHYSICAL REVIEW LETTERS 2024; 133:098401. [PMID: 39270162 DOI: 10.1103/physrevlett.133.098401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 02/27/2024] [Accepted: 07/24/2024] [Indexed: 09/15/2024]
Abstract
AI algorithms have proven to be excellent predictors of protein structure, but whether and how much these algorithms can capture the underlying physics remains an open question. Here, we address this question using the AlphaFold2 (AF) algorithm: we use AF to predict the subtle structural deformation induced by single mutations, quantified by strain, and compare it with experimental datasets of the corresponding perturbations in folding free energy, ΔΔG. Unexpectedly, we find that physical strain alone, without any additional data or computation, correlates almost as well with ΔΔG as state-of-the-art energy-based and machine-learning predictors. This indicates that the AF-predicted structures alone encode fine details about the energy landscape. In particular, the structures encode significant information on stability, enough to estimate (de-)stabilizing effects of mutations, thus paving the way for the development of novel, structure-based stability predictors for protein design and evolution.
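To make "structural deformation quantified by strain" concrete, the sketch below uses a deliberately simplified proxy: for each residue, the mean relative change in Cα distance to its wild-type neighbours (within a cutoff), summed over residues, followed by a Spearman rank correlation against ΔΔG across mutants. This is not necessarily the strain definition used by McBride and Tlusty; the 10 Å cutoff, the function names and the toy coordinates are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def strain_proxy(wt: Sequence[Coord], mut: Sequence[Coord], cutoff: float = 10.0) -> float:
    """Simplified deformation score (hypothetical; not the paper's strain).

    For each residue i, average |d_mut - d_wt| / d_wt over residues j whose
    wild-type C-alpha distance to i is below `cutoff` (angstrom), then sum
    over residues. Only pairwise distances are used, so the two models need
    no superposition; residue k in `wt` must correspond to residue k in `mut`.
    """
    total = 0.0
    for i in range(len(wt)):
        changes = []
        for j in range(len(wt)):
            if i == j:
                continue
            d_wt = math.dist(wt[i], wt[j])
            if d_wt < cutoff:
                d_mut = math.dist(mut[i], mut[j])
                changes.append(abs(d_mut - d_wt) / d_wt)
        if changes:
            total += sum(changes) / len(changes)
    return total


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In use, one would compute `strain_proxy(wt_model, mutant_model)` for each mutant's predicted structure and pass the resulting scores, together with the matching experimental ΔΔG values, to `spearman`. The sign of the expected correlation depends on the ΔΔG convention of the dataset.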
Affiliation(s)
- John M McBride
- Center for Algorithmic and Robotized Synthesis, Institute for Basic Science, Ulsan 44919, South Korea
22
Chen Y, Xu Y, Liu D, Xing Y, Gong H. An end-to-end framework for the prediction of protein structure and fitness from single sequence. Nat Commun 2024; 15:7400. [PMID: 39191788 DOI: 10.1038/s41467-024-51776-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 08/19/2024] [Indexed: 08/29/2024]
Abstract
Significant research progress has been made in the field of protein structure and fitness prediction. Particularly, single-sequence-based structure prediction methods like ESMFold and OmegaFold achieve a balance between inference speed and prediction accuracy, showing promise for many downstream prediction tasks. Here, we propose SPIRED, a single-sequence-based structure prediction model that exhibits comparable performance to the state-of-the-art methods but with approximately 5-fold acceleration in inference and at least one order of magnitude reduction in training consumption. By integrating SPIRED with downstream neural networks, we compose an end-to-end framework named SPIRED-Fitness for the rapid prediction of both protein structure and fitness from single sequence with satisfactory accuracy. Moreover, SPIRED-Stab, the derivative of SPIRED-Fitness, achieves state-of-the-art performance in predicting the mutational effects on protein stability.
Affiliation(s)
- Yinghui Chen
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Yunxin Xu
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Di Liu
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Yaoguang Xing
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China.
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China.
23
Guclu TF, Atilgan AR, Atilgan C. Deciphering GB1's Single Mutational Landscape: Insights from MuMi Analysis. J Phys Chem B 2024; 128:7987-7996. [PMID: 39115184 DOI: 10.1021/acs.jpcb.4c04916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2024]
Abstract
Mutational changes that affect the binding of the C2 fragment of Streptococcal protein G (GB1) to the Fc domain of human IgG (IgG-Fc) have been extensively studied using deep mutational scanning (DMS), and the binding affinity of all single mutations has been measured experimentally in the literature. To investigate the underlying molecular basis, we perform in silico mutational scanning for all possible single mutations, along with 2 μs-long molecular dynamics (WT-MD) of the wild-type (WT) GB1 in both unbound and IgG-Fc bound forms. We compute the hydrogen bonds between GB1 and IgG-Fc in WT-MD to identify the dominant hydrogen bonds for binding, which we then assess in conformations produced by Mutation and Minimization (MuMi) to explain the fitness landscape of GB1 and IgG-Fc binding. Furthermore, we analyze MuMi and WT-MD to investigate the dynamics of binding, focusing on the relative solvent accessibility of residues and the probability of residues being located at the binding interface. With these analyses, we explain the interactions between GB1 and IgG-Fc and display the structural features of binding. In sum, our findings highlight the potential of MuMi as a reliable and computationally efficient tool for predicting protein fitness landscapes, offering significant advantages over traditional methods. The methodologies and results presented in this study pave the way for improved predictive accuracy in protein stability and interaction studies, which are crucial for advancements in drug design and synthetic biology.
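Identifying "dominant hydrogen bonds" from the wild-type trajectory can be sketched as an occupancy calculation: the fraction of frames in which a donor-acceptor pair satisfies a geometric criterion. The 3.5 Å distance cutoff, the 120° angle cutoff and the 50% occupancy threshold are common conventions assumed here; the abstract does not state the criteria used, and the pair labels and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One frame: hypothetical pair label -> (donor-acceptor distance in angstrom,
# donor-H...acceptor angle in degrees), e.g. as extracted from an MD trajectory.
Frame = Dict[str, Tuple[float, float]]


def hbond_occupancy(frames: List[Frame], max_dist: float = 3.5,
                    min_angle: float = 120.0) -> Dict[str, float]:
    """Fraction of frames in which each pair meets the geometric H-bond criterion.

    A pair absent from a frame is counted as not hydrogen-bonded in that frame.
    """
    if not frames:
        return {}
    formed_counts: Dict[str, int] = {}
    for frame in frames:
        for pair, (dist, angle) in frame.items():
            formed = dist <= max_dist and angle >= min_angle
            formed_counts[pair] = formed_counts.get(pair, 0) + int(formed)
    return {pair: n / len(frames) for pair, n in formed_counts.items()}


def dominant_hbonds(occupancy: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Pairs whose occupancy reaches `threshold`, most persistent first."""
    kept = [pair for pair, occ in occupancy.items() if occ >= threshold]
    return sorted(kept, key=lambda pair: occupancy[pair], reverse=True)
```

Occupancies from the wild-type trajectory would give a reference set of interface bonds, whose presence can then be checked in each minimized mutant conformation.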
Affiliation(s)
- Tandac F Guclu
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Ali Rana Atilgan
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Canan Atilgan
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
24
Dieckhaus H, Kuhlman B. Protein stability models fail to capture epistatic interactions of double point mutations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.20.608844. [PMID: 39229177 PMCID: PMC11370451 DOI: 10.1101/2024.08.20.608844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
There is strong interest in accurate methods for predicting changes in protein stability resulting from amino acid mutations to the protein sequence. Recombinant proteins must often be stabilized to be used as therapeutics or reagents, and destabilizing mutations are implicated in a variety of diseases. Due to increased data availability and improved modeling techniques, recent studies have shown advancements in predicting changes in protein stability when a single point mutation is made. Less focus has been directed toward predicting changes in protein stability when there are two or more mutations, despite the significance of mutation clusters for disease pathways and protein design studies. Here, we analyze the largest available dataset of double point mutation stability and benchmark several widely used protein stability models on this and other datasets. We identify a blind spot in how predictors are typically evaluated on multiple mutations, finding that, contrary to assumptions in the field, current stability models are unable to consistently capture epistatic interactions between double mutations. We observe one notable deviation from this trend, which is that epistasis-aware models provide marginally better predictions on stabilizing double point mutations. We develop an extension of the ThermoMPNN framework for double mutant modeling as well as a novel data augmentation scheme which mitigates some of the limitations in available datasets. Collectively, our findings indicate that current protein stability models fail to capture the nuanced epistatic interactions between concurrent mutations due to several factors, including training dataset limitations and insufficient model sensitivity.
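The epistatic interaction between two point mutations is usually quantified as the non-additive part of the double mutant's stability change. A minimal sketch using the conventional definition ε = ΔΔG_AB - (ΔΔG_A + ΔΔG_B); the helper names and the toy values are invented, and ΔΔG sign conventions differ between datasets, so inputs must share one convention.

```python
from typing import Sequence, Tuple


def epistasis(ddg_a: float, ddg_b: float, ddg_ab: float) -> float:
    """Non-additive part of a double mutant: ddG_AB - (ddG_A + ddG_B)."""
    return ddg_ab - (ddg_a + ddg_b)


def additive_model(ddg_a: float, ddg_b: float) -> float:
    """Epistasis-blind baseline: the double mutant is the sum of the two singles."""
    return ddg_a + ddg_b


def mean_abs_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between paired predictions and measurements."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


if __name__ == "__main__":
    # Invented (ddG_A, ddG_B, ddG_AB) triples in kcal/mol, for illustration only.
    doubles: Sequence[Tuple[float, float, float]] = [
        (1.0, 0.5, 2.5), (-0.3, 0.8, 0.1), (1.2, 1.1, 1.6),
    ]
    true_eps = [epistasis(a, b, ab) for a, b, ab in doubles]
    # A purely additive predictor implies zero epistasis for every pair, so
    # scoring it on epsilon (rather than on ddG_AB) exposes what it cannot capture.
    pred_eps = [epistasis(a, b, additive_model(a, b)) for a, b, _ in doubles]
    print("epistasis:", true_eps)
    print("MAE on epistasis:", mean_abs_error(pred_eps, true_eps))
```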
Affiliation(s)
- Henry Dieckhaus
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, North Carolina, USA
- Brian Kuhlman
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
25
Lipsh-Sokolik R, Fleishman SJ. Addressing epistasis in the design of protein function. Proc Natl Acad Sci U S A 2024; 121:e2314999121. [PMID: 39133844 PMCID: PMC11348311 DOI: 10.1073/pnas.2314999121] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
Affiliation(s)
- Rosalie Lipsh-Sokolik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
26
Johnston KE, Almhjell PJ, Watkins-Dulaney EJ, Liu G, Porter NJ, Yang J, Arnold FH. A combinatorially complete epistatic fitness landscape in an enzyme active site. Proc Natl Acad Sci U S A 2024; 121:e2400439121. [PMID: 39074291 PMCID: PMC11317637 DOI: 10.1073/pnas.2400439121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2024] [Accepted: 06/17/2024] [Indexed: 07/31/2024]
Abstract
Protein engineering often targets binding pockets or active sites, which are enriched in epistasis (nonadditive interactions between amino acid substitutions) and where the combined effects of multiple single substitutions are difficult to predict. Few existing sequence-fitness datasets capture epistasis at large scale, especially for enzyme catalysis, limiting the development and assessment of model-guided enzyme engineering approaches. We present here a combinatorially complete, 160,000-variant fitness landscape across four residues in the active site of an enzyme. Assaying the native reaction of a thermostable β-subunit of tryptophan synthase (TrpB) in a nonnative environment yielded a landscape characterized by significant epistasis and many local optima. These effects prevent simulated directed evolution approaches from efficiently reaching the global optimum. There is nonetheless wide variability in the effectiveness of different directed evolution approaches, which together provide experimental benchmarks for computational and machine learning workflows. The most-fit TrpB variants contain a substitution that is nearly absent in natural TrpB sequences, a result that conservation-based predictions would not capture. Thus, although fitness prediction using evolutionary data can enrich in more-active variants, these approaches struggle to identify and differentiate among the most-active variants, even for this near-native function. Overall, this work presents a large-scale testing ground for model-guided enzyme engineering and suggests that efficient navigation of epistatic fitness landscapes can be improved by advances in both machine learning and physical modeling.
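The landscape size follows from allowing all 20 amino acids at each of the four positions (20⁴ = 160,000). The sketch below enumerates such a combinatorially complete landscape and counts local optima. Here a local optimum is assumed to be a variant with no single-substitution neighbor of higher fitness. The toy two-position landscape in the usage example is hypothetical and far smaller than the TrpB dataset.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def all_variants(n_positions: int, alphabet: str = AMINO_ACIDS) -> list[str]:
    """Enumerate every combination of residues at n_positions sites."""
    return ["".join(combo) for combo in product(alphabet, repeat=n_positions)]


def single_mutant_neighbors(variant: str, alphabet: str) -> list[str]:
    """All variants that differ from `variant` at exactly one position."""
    neighbors = []
    for i, current in enumerate(variant):
        for aa in alphabet:
            if aa != current:
                neighbors.append(variant[:i] + aa + variant[i + 1:])
    return neighbors


def local_optima(fitness: dict[str, float], alphabet: str) -> list[str]:
    """Variants whose fitness is not exceeded by any single-substitution neighbor."""
    return sorted(
        v for v in fitness
        if all(fitness[n] <= fitness[v] for n in single_mutant_neighbors(v, alphabet))
    )


if __name__ == "__main__":
    print(len(all_variants(4)))  # 20**4 = 160000 variants for four positions
    # Hypothetical two-site, two-letter landscape with reciprocal sign epistasis (two peaks).
    toy = {"AA": 1.0, "AB": 0.2, "BA": 0.3, "BB": 0.9}
    print(local_optima(toy, "AB"))  # ['AA', 'BB']
```

In the toy landscape, each substitution is deleterious from `AA` but beneficial from the other single mutant. A greedy walk started from `AB` or `BA` can therefore end on either peak. This is a small-scale version of why many local optima hinder simulated directed evolution.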
Affiliation(s)
- Kadina E. Johnston
- Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA 91125
- Patrick J. Almhjell
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125
- Ella J. Watkins-Dulaney
- Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA 91125
- Grace Liu
- Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA 91125
- Nicholas J. Porter
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125
- Frances H. Arnold
- Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA 91125
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125
27
Layek S, Sengupta N. Response of Foldable Protein Conformations to Non-Physiological Perturbations: Interplay of Thermal Factors and Confinement. Chemphyschem 2024:e202400618. [PMID: 39104119 DOI: 10.1002/cphc.202400618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Revised: 07/14/2024] [Accepted: 08/01/2024] [Indexed: 08/07/2024]
Abstract
Technological advances frequently interface biomolecules with nanomaterials under non-physiological conditions, which makes it necessary to characterize how key processes respond. Similar encounters are expected in cellular contexts. We report in silico investigations of the response of diverse protein conformational states to lowering of temperature and imposition of spatial constraints. Conformational states are represented by the folded form of the albumin-binding domain (ABD) protein, its compact denatured form, and structurally disordered nascent folding elements. Data from extensive simulations are evaluated to elicit structural, thermodynamic and dynamic responses of the states and their associated environment. Analyses reveal alterations to folding propensity with reduced thermal energy and confinement, with signatures of trend reversal in highly disordered states. Across temperatures, confinement restricts volume and energy fluctuations, narrowing the differences in isothermal compressibility (κ) and heat capacity (Cp). While the excess (over ideal gas) entropy of the hydration layer depends on the conformational state in bulk, confinement erases these differences. These observations are largely consistent with the timescales of protein-water hydrogen bonding dynamics. The results implicate multi-factorial associations within a simple bio-nano complex. We expect the current study to motivate investigations of more biologically relevant interfaces towards mechanistic understanding and potential applications.
Affiliation(s)
- Sarbajit Layek
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur, West Bengal, 741246, India
- Neelanjana Sengupta
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur, West Bengal, 741246, India
28
Listov D, Goverde CA, Correia BE, Fleishman SJ. Opportunities and challenges in design and optimization of protein function. Nat Rev Mol Cell Biol 2024; 25:639-653. [PMID: 38565617 PMCID: PMC7616297 DOI: 10.1038/s41580-024-00718-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
The field of protein design has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application, but recent strategies that combine structure-based and sequence-based calculations, as well as machine learning tools, have dramatically improved protein engineering and design. In this Review, we discuss how these methods have enabled the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their increased reliability, computational design methods have been applied to improve therapeutics and enzymes for green chemistry and have generated vaccine antigens, antivirals and drug-delivery nano-vehicles. Moreover, the high success of design methods reflects an increased understanding of basic rules that govern the relationships among protein sequence, structure and function. However, de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging but necessary next step if we are to realize our objective of generating new-to-nature activities.
Affiliation(s)
- Dina Listov
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Casper A Goverde
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Bruno E Correia
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sarel Jacob Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
29
Vila JA. Analysis of proteins in the light of mutations. EUROPEAN BIOPHYSICS JOURNAL : EBJ 2024; 53:255-265. [PMID: 38955858 DOI: 10.1007/s00249-024-01714-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 05/23/2024] [Accepted: 06/18/2024] [Indexed: 07/04/2024]
Abstract
Proteins have evolved through mutations (amino acid substitutions) since life appeared on Earth, some 10⁹ years ago. The study of these phenomena has been of particular significance because of their impact on protein stability, function, and structure. This study offers a new viewpoint on how the most recent findings in these areas can be used to explore the impact of mutations on protein sequence, stability, and evolvability. Preliminary results indicate that: (1) mutations can be viewed as sensitive probes to identify 'typos' in the amino-acid sequence, and also to assess the resistance of naturally occurring proteins to unwanted sequence alterations; (2) the presence of 'typos' in the amino acid sequence, rather than being an evolutionary obstacle, could promote faster evolvability and, in turn, increase the likelihood of higher protein stability; (3) the mutation site is far more important than the substituted amino acid in terms of the marginal stability changes of the protein; and (4) the unpredictability of protein evolution at the molecular level (by mutations) exists even in the absence of epistasis effects. Finally, the Darwinian concept of evolution "descent with modification" and experimental evidence endorse one of the results of this study, which suggests that some regions of any protein sequence are susceptible to mutations while others are not. This work contributes to our general understanding of protein responses to mutations and may spur significant progress in our efforts to develop methods to accurately forecast changes in protein stability, their propensity for metamorphism, and their ability to evolve.
Affiliation(s)
- Jorge A Vila
- IMASL-CONICET, Universidad Nacional de San Luis, Ejército de los Andes 950, 5700, San Luis, Argentina.
30
King BR, Sumida KH, Caruso JL, Baker D, Zalatan JG. Computational stabilization of a non-heme iron enzyme enables efficient evolution of new function. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.18.590141. [PMID: 39091854 PMCID: PMC11290999 DOI: 10.1101/2024.04.18.590141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Directed evolution has emerged as a powerful tool for engineering new biocatalysts. However, introducing new catalytic residues can be destabilizing, and it is generally beneficial to start with a stable enzyme parent. Here we show that the deep learning-based tool ProteinMPNN can be used to redesign Fe(II)/αKG superfamily enzymes for greater stability, solubility, and expression while retaining both native activity and industrially relevant non-native functions. For the Fe(II)/αKG enzyme tP4H, we performed site-saturation mutagenesis with both the wild-type and stabilized design variant and screened for activity increases in a non-native C-H hydroxylation reaction. We observed substantially larger increases in non-native activity for variants obtained from the stabilized scaffold compared to those from the wild-type enzyme. ProteinMPNN is user-friendly and widely accessible, and straightforward structural criteria were sufficient to obtain stabilized, catalytically functional variants of the Fe(II)/αKG enzymes tP4H and GriE. Our work suggests that stabilization by computational sequence redesign could be routinely implemented as a first step in directed evolution campaigns for novel biocatalysts.
Affiliation(s)
- Brianne R King
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
- Kiera H Sumida
- Department of Chemistry and Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Jessica L Caruso
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
- David Baker
- Institute for Protein Design, Department of Biochemistry, and Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, United States
- Jesse G Zalatan
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
31
Jiang Z, van Vlimmeren AE, Karandur D, Semmelman A, Shah NH. Revealing the principles of inter- and intra-domain regulation in a signaling enzyme via scanning mutagenesis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.13.593907. [PMID: 39091798 PMCID: PMC11291063 DOI: 10.1101/2024.05.13.593907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Multi-domain enzymes can be regulated by both inter-domain interactions and structural features intrinsic to the catalytic domain. The tyrosine phosphatase SHP2 is a quintessential example of a multi-domain protein that is regulated by inter-domain interactions. This enzyme has a protein tyrosine phosphatase (PTP) domain and two phosphotyrosine-recognition domains (N-SH2 and C-SH2) that regulate phosphatase activity through autoinhibitory interactions. SHP2 is canonically activated by phosphoprotein binding to the SH2 domains, which causes large inter-domain rearrangements, but autoinhibition can also be disrupted by disease-associated mutations. Many details of the SHP2 activation mechanism are still unclear, the physiologically relevant active conformations remain elusive, and hundreds of human variants of SHP2 have not been functionally characterized. Here, we perform deep mutational scanning on both full-length SHP2 and its isolated PTP domain to examine mutational effects on inter-domain regulation and catalytic activity. Our experiments provide a comprehensive map of SHP2 mutational sensitivity, both in the presence and absence of inter-domain regulation. Coupled with molecular dynamics simulations, our investigation reveals novel structural features that govern the stability of the autoinhibited and active states of SHP2. Our analysis also identifies key residues beyond the SHP2 active site that control PTP domain dynamics and intrinsic catalytic activity. This work expands our understanding of SHP2 regulation and provides new insights into SHP2 pathogenicity.
Affiliation(s)
- Ziyuan Jiang
- Department of Chemistry, Columbia University, New York, NY 10027
- Anne E. van Vlimmeren
- Department of Chemistry, Columbia University, New York, NY 10027
- Department of Biological Sciences, Columbia University, New York, NY 10027
- Deepti Karandur
- Department of Biochemistry, Vanderbilt University, Nashville, TN 37232
- Alyssa Semmelman
- Department of Chemistry, Columbia University, New York, NY 10027
- Neel H. Shah
- Department of Chemistry, Columbia University, New York, NY 10027
32
Arutyunyan A, Seuma M, Faure AJ, Bolognesi B, Lehner B. Energetic portrait of the amyloid beta nucleation transition state. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.24.604935. [PMID: 39091732 PMCID: PMC11291115 DOI: 10.1101/2024.07.24.604935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Amyloid protein aggregates are pathological hallmarks of more than fifty human diseases including the most common neurodegenerative disorders. The atomic structures of amyloid fibrils have now been determined, but the process by which soluble proteins nucleate to form amyloids remains poorly characterised and difficult to study, even though this is the key step to understand to prevent the formation and spread of aggregates. Here we use massively parallel combinatorial mutagenesis, a kinetic selection assay, and machine learning to reveal the transition state of the nucleation reaction of amyloid beta, the protein that aggregates in Alzheimer's disease. By quantifying the nucleation of >140,000 proteins we infer the changes in activation energy for all 798 amino acid substitutions in amyloid beta and the energetic couplings between >600 pairs of mutations. This unprecedented dataset provides the first comprehensive view of the energy landscape and the first large-scale measurement of energetic couplings for a protein transition state. The energy landscape reveals that the amyloid beta nucleation transition state contains a short structured C-terminal hydrophobic core with a subset of interactions similar to mature fibrils. This study demonstrates the feasibility of using mutation-selection-sequencing experiments to study transition states and identifies the key molecular species that initiates amyloid beta aggregation and, potentially, Alzheimer's disease.
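The figure of 798 substitutions is consistent with scanning every position of the 42-residue form of amyloid beta (Aβ42) with the 19 alternative amino acids (42 × 19 = 798). The abstract does not name the isoform, so treat Aβ42, and the sequence used below, as assumptions. The sketch enumerates those single substitutions. It also counts all possible distinct-position double substitutions, which puts the >600 measured pairs in context. The function names are hypothetical.

```python
from math import comb

# Assumed Aβ42 sequence (42 residues); the abstract only says "amyloid beta".
ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(seq: str, alphabet: str = AMINO_ACIDS) -> list[str]:
    """Label every single amino acid substitution, e.g. 'D1A' (1-based position)."""
    return [
        f"{wt}{pos}{aa}"
        for pos, wt in enumerate(seq, start=1)
        for aa in alphabet
        if aa != wt
    ]


def n_double_substitutions(seq_len: int, n_alternatives: int = 19) -> int:
    """Distinct-position double substitutions: choose 2 sites, then one alternative at each."""
    return comb(seq_len, 2) * n_alternatives ** 2


if __name__ == "__main__":
    singles = single_substitutions(ABETA42)
    print(len(ABETA42), len(singles))            # 42 798
    print(n_double_substitutions(len(ABETA42)))  # 861 * 361 = 310821
```

Under these assumptions, the >600 coupled pairs are a small, selected subset of roughly 3.1 × 10⁵ possible double substitutions.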
Affiliation(s)
- Mireia Seuma
- Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10-12, 08028, Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- Current address: Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Andre J. Faure
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Current address: ALLOX, C/Dr. Aiguader, 88, PRBB Building, 08003 Barcelona, Spain
- Benedetta Bolognesi
- Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10-12, 08028, Barcelona, Spain
- Ben Lehner
- Wellcome Sanger Institute, Cambridge, UK
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
33
Diaz DJ, Gong C, Ouyang-Zhang J, Loy JM, Wells J, Yang D, Ellington AD, Dimakis AG, Klivans AR. Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations. Nat Commun 2024; 15:6170. [PMID: 39043654 PMCID: PMC11266546 DOI: 10.1038/s41467-024-49780-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 06/14/2024] [Indexed: 07/25/2024]
Abstract
Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance in accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time. These include Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000× fewer proteins and has 548× fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
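Thermodynamic Permutations rest on Gibbs free energy being a state function. If stability has been measured for several amino acids at the same position of the same protein, the ΔΔG between any two of them can be obtained by subtraction. This expands n measured residues into n(n − 1) ordered mutation pairs. The sketch below illustrates that reading of the abstract. It is not the authors' implementation, and the function name and values are hypothetical.

```python
from itertools import permutations


def thermodynamic_permutations(ddg_from_wt: dict[str, float]) -> dict[tuple[str, str], float]:
    """Expand per-residue ddG values measured at one site into all ordered pairs.

    `ddg_from_wt` maps each amino acid observed at the site to its ddG relative
    to the wild-type residue (the wild type itself maps to 0.0). Because free
    energy is a state function, ddG(a -> b) = ddG(wt -> b) - ddG(wt -> a).
    """
    return {
        (a, b): ddg_from_wt[b] - ddg_from_wt[a]
        for a, b in permutations(ddg_from_wt, 2)
    }


if __name__ == "__main__":
    # Hypothetical site: wild-type L plus three measured substitutions (kcal/mol).
    site = {"L": 0.0, "A": 1.5, "V": 0.5, "F": -0.25}
    pairs = thermodynamic_permutations(site)
    print(len(pairs))         # 4 residues -> 4 * 3 = 12 ordered pairs
    print(pairs[("A", "V")])  # 0.5 - 1.5 = -1.0
```

A side effect of this design is that the augmented set is exactly antisymmetric: ΔΔG(a → b) = −ΔΔG(b → a). This can help counter the bias toward destabilizing mutations in typical training data.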
Affiliation(s)
- Daniel J Diaz
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA
- Intelligent Proteins, LLC, Austin, TX, 78712, USA
- UT Austin, Department of Chemistry, Austin, TX, 78712, USA
- Chengyue Gong
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA
- James M Loy
- Intelligent Proteins, LLC, Austin, TX, 78712, USA
- UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA
- Jordan Wells
- UT Austin, McKetta Department of Chemical Engineering, Austin, TX, 78712, USA
- David Yang
- UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA
- Alexandros G Dimakis
- UT Austin, Chandra Family Department of Electrical and Computer Engineering, Austin, TX, 78712, USA
- Adam R Klivans
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA
34
Cuturello F, Celoria M, Ansuini A, Cazzaniga A. Enhancing predictions of protein stability changes induced by single mutations using MSA-based Language Models. Bioinformatics 2024; 40:btae447. [PMID: 39012369 PMCID: PMC11269464 DOI: 10.1093/bioinformatics/btae447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/19/2024] [Accepted: 07/10/2024] [Indexed: 07/17/2024]
Abstract
MOTIVATION Protein Language Models offer a new perspective for addressing challenges in structural biology while relying solely on sequence information. Recent studies have investigated their effectiveness in forecasting shifts in thermodynamic stability caused by single amino acid mutations. This task is known for its complexity because data are sparse, constrained by experimental limitations. To tackle this problem, we introduce two key novelties. First, we leverage a Protein Language Model that incorporates Multiple Sequence Alignments to capture evolutionary information. Second, we use a recently released mega-scale dataset with rigorous data pre-processing to mitigate overfitting. RESULTS We ensure comprehensive comparisons by fine-tuning various pre-trained models, taking advantage of analyses such as ablation studies and baseline evaluations. Our methodology introduces a stringent policy to reduce the widespread issue of data leakage, rigorously removing sequences from the training set when they exhibit significant similarity with the test set. The MSA Transformer emerges as the most accurate among the models under investigation, given its capability to leverage co-evolution signals encoded in aligned homologous sequences. Moreover, the optimized MSA Transformer outperforms existing methods and exhibits enhanced generalization power, leading to a notable improvement in predicting changes in protein stability resulting from point mutations. AVAILABILITY AND IMPLEMENTATION Code and data at https://github.com/RitAreaSciencePark/PLM4Muts. SUPPLEMENTARY INFORMATION Supplementary Information is available at Bioinformatics online.
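The leakage policy described above removes training sequences that are too similar to any test sequence. Below is a minimal sketch of such a filter. It uses the similarity ratio from Python's `difflib` as a crude stand-in for alignment-based sequence identity; the abstract does not say which similarity measure or cutoff the authors used. The 0.5 threshold and the example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment-based sequence identity."""
    # autojunk=False keeps difflib's popularity heuristic from discarding frequent residues.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_leakage(train: list[str], test: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only training sequences whose similarity to every test sequence is below threshold."""
    return [s for s in train if all(similarity(s, t) < threshold for t in test)]


if __name__ == "__main__":
    test_set = ["MKTAYIAKQRQISFVKSHFSRQ"]
    train_set = [
        "MKTAYIAKQRQISFVKSHFSRE",  # near-duplicate of the test sequence: removed
        "GSHMLEDPVDAFQLHKKSGW",    # unrelated sequence: kept
    ]
    print(filter_leakage(train_set, test_set))  # ['GSHMLEDPVDAFQLHKKSGW']
```

For real datasets, a dedicated alignment or clustering tool would replace `similarity`. The pairwise loop also scales as |train| × |test|, which is acceptable only for small sets.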
Affiliation(s)
- Francesca Cuturello
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- Marco Celoria
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- HPC Department, CINECA National Supercomputing Center, Bologna 40033, Italy
- Alessio Ansuini
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- Alberto Cazzaniga
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
35
Felbinger N, Ribeiro-Filho H, Pierce B. Proscan: a structure-based proline design web server. Nucleic Acids Res 2024; 52:W280-W286. [PMID: 38769060 PMCID: PMC11223860 DOI: 10.1093/nar/gkae408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 04/16/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024]
Abstract
The ability to control protein conformations and dynamics through structure-based design has been useful in various scenarios, including engineering of viral antigens for vaccines. One effective design strategy is the substitution of residues with proline, whose unique cyclic side chain can favor and rigidify key backbone conformations. To provide the community with a means to readily identify and explore proline designs for target proteins of interest, we developed the Proscan web server. Proscan provides assessment of backbone angles, energetic and deep learning-based favorability scores, and other parameters for proline substitutions at each position of an input structure, along with interactive visualization of backbone angles and candidate substitution sites on structures. It identifies known favorable proline substitutions for viral antigens and was benchmarked against datasets of proline substitution stability effects from deep mutational scanning and thermodynamic measurements. This tool can enable researchers to identify and prioritize designs for prospective vaccine antigen targets, or other designs to favor stability of key protein conformations. Proscan is available at: https://proscan.ibbr.umd.edu.
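One ingredient of a backbone-angle assessment is easy to show. Proline's ring restricts its φ dihedral to a narrow range near −65°, so positions whose φ lies far from that value are poor candidates. The sketch below applies only that single filter. The center and tolerance values are illustrative assumptions, and it is not Proscan's scoring scheme, which also weighs energetic and deep learning-based scores.

```python
from typing import Optional


def proline_compatible_sites(
    phi_angles: list[Optional[float]],
    center: float = -65.0,
    tolerance: float = 25.0,
) -> list[int]:
    """Return 1-based positions whose backbone phi lies within `tolerance` degrees of `center`.

    The default center and tolerance are illustrative assumptions, not Proscan's values.
    Positions with an undefined phi (e.g. the N-terminal residue) are passed as None.
    """
    sites = []
    for pos, phi in enumerate(phi_angles, start=1):
        if phi is None:
            continue
        # Wrap the angular difference into [-180, 180) so that e.g. 179 and -179 are close.
        diff = (phi - center + 180.0) % 360.0 - 180.0
        if abs(diff) <= tolerance:
            sites.append(pos)
    return sites


if __name__ == "__main__":
    # Hypothetical phi angles (degrees) for a five-residue segment.
    print(proline_compatible_sites([None, -60.0, -140.0, 60.0, -75.0]))  # [2, 5]
```

A practical screen would also consider the residue preceding the candidate site and any backbone hydrogen bonds that a proline would break. Those checks are omitted here.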
Affiliation(s)
- Nathaniel Felbinger
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
- Helder V Ribeiro-Filho
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas 13083-100, Brazil
- Brian G Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
36
Basu S, Subedi U, Tonelli M, Afshinpour M, Tiwari N, Fuentes EJ, Chakravarty S. Assessing the functional roles of coevolving PHD finger residues. Protein Sci 2024; 33:e5065. [PMID: 38923615 PMCID: PMC11201814 DOI: 10.1002/pro.5065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/21/2024] [Accepted: 05/16/2024] [Indexed: 06/28/2024]
Abstract
Although in silico folding based on coevolving residue constraints in the deep-learning era has transformed protein structure prediction, the contributions of coevolving residues to protein folding, stability, and other functions in physical contexts remain to be clarified and experimentally validated. Herein, the PHD finger module, a well-known histone reader with distinct subtypes containing subtype-specific coevolving residues, was used as a model to experimentally assess the contributions of coevolving residues and to clarify their specific roles. The results of the assessment, including proteolysis and thermal unfolding of wild-type and mutant proteins, suggested that coevolving residues have varying contributions, despite their large in silico constraints. Residue positions with large constraints were found to contribute to stability in one subtype but not others. Computational sequence design and generative model-based energy estimates of individual structures were also implemented to complement the experimental assessment. Sequence design and energy estimates distinguish coevolving residues that contribute to folding from those that do not. The results of proteolytic analysis of mutations at positions contributing to folding were consistent with those suggested by sequence design and energy estimation. Thus, we report a comprehensive assessment of the contributions of coevolving residues, as well as a strategy based on a combination of approaches that should enable detailed understanding of the residue contributions in other large protein families.
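Coevolving residues are typically detected from correlated substitutions between columns of a multiple sequence alignment. The sketch below uses raw column-pair mutual information as the coevolution score. That is an assumption: it is a classical but simple statistic, whereas deep-learning predictors and direct-coupling methods use more elaborate models, and the abstract does not specify the scoring used. The four-sequence alignment is hypothetical.

```python
from collections import Counter
from math import log


def column_mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        # Compare the observed pair frequency with the product of single-column frequencies.
        mi += p_ab * log(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical alignment: columns 0 and 1 covary perfectly (A<->K, G<->E);
    # column 2 varies independently of column 0.
    msa = ["AKL", "AKV", "GEL", "GEV"]
    print(round(column_mutual_information(msa, 0, 1), 3))  # ln 2 ≈ 0.693
    print(round(column_mutual_information(msa, 0, 2), 3))  # 0.0
```

Raw mutual information does not separate direct contacts from indirect, transitive correlations or phylogenetic bias. This is one reason a strong in silico constraint need not translate into a measurable contribution to stability, as the experiments above found for some positions.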
Affiliation(s)
- Shraddha Basu
- Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Ujwal Subedi
- Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Marco Tonelli
- National Magnetic Resonance Facility at Madison (NMRFAM), University of Wisconsin-Madison, Madison, Wisconsin, USA
- Maral Afshinpour
- Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Nitija Tiwari
- Department of Biochemistry & Molecular Biology, University of Iowa, Iowa City, Iowa, USA
- Ernesto J. Fuentes
- Department of Biochemistry & Molecular Biology, University of Iowa, Iowa City, Iowa, USA
- Suvobrata Chakravarty
- Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
| |
37
Chu SKS, Narang K, Siegel JB. Protein stability prediction by fine-tuning a protein language model on a mega-scale dataset. PLoS Comput Biol 2024; 20:e1012248. [PMID: 39038042 PMCID: PMC11293664 DOI: 10.1371/journal.pcbi.1012248] [Received: 12/11/2023] [Revised: 08/01/2024] [Accepted: 06/13/2024] [Indexed: 07/24/2024]
Abstract
Protein stability plays a crucial role in a variety of applications, such as food processing, therapeutics, and the identification of pathogenic mutations. Engineering campaigns commonly seek to improve protein stability, and there is a strong interest in streamlining these processes to enable rapid optimization of highly stabilized proteins with fewer iterations. In this work, we explore utilizing a mega-scale dataset to develop a protein language model optimized for stability prediction. ESMtherm is trained on the folding stability of 528k natural and de novo sequences derived from 461 protein domains and can accommodate deletions, insertions, and multiple-point mutations. We show that a protein language model can be fine-tuned to predict folding stability. ESMtherm performs reasonably on small protein domains and generalizes to sequences distal from the training set. Lastly, we discuss our model's limitations compared to other state-of-the-art methods in generalizing to larger protein scaffolds. Our results highlight the need for large-scale stability measurements on a diverse dataset that mirrors the distribution of sequence lengths commonly observed in nature.
Affiliation(s)
- Simon K. S. Chu
- Biophysics Graduate Program, University of California Davis, Davis, California, United States of America
- Kush Narang
- College of Biological Sciences, University of California Davis, Davis, California, United States of America
- Justin B. Siegel
- Genome Center, University of California Davis, Davis, California, United States of America
- Department of Chemistry, University of California Davis, Davis, California, United States of America
- Department of Biochemistry and Molecular Medicine, University of California Davis, Davis, California, United States of America
38
McShea H, Weibel C, Wehbi S, Goodman P, James JE, Wheeler AL, Masel J. The effectiveness of selection in a species affects the direction of amino acid frequency evolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.02.01.526552. [PMID: 38948853 PMCID: PMC11212923 DOI: 10.1101/2023.02.01.526552] [Indexed: 07/02/2024]
Abstract
Nearly neutral theory predicts that species with higher effective population size (Ne) are better able to purge slightly deleterious mutations. We compare evolution in high-Ne vs. low-Ne vertebrates to reveal which amino acid frequencies are subject to subtle selective preferences. We take three complementary approaches, two measuring flux and one measuring outcomes. First, we fit non-stationary substitution models of amino acid flux using maximum likelihood, comparing the high-Ne clade of rodents and lagomorphs to its low-Ne sister clade of primates and colugos. Second, we compare evolutionary outcomes across a wider range of vertebrates, via correlations between amino acid frequencies and Ne. Third, we dissect the details of flux in human, chimpanzee, mouse, and rat, as scored by parsimony; this also enables comparison to a historical paper. All three methods agree on which amino acids are preferred under more effective selection. Preferred amino acids tend to be smaller and less costly to synthesize, and to promote intrinsic structural disorder. Parsimony-induced bias in the historical study produces an apparent reduction in structural disorder, perhaps driven by slightly deleterious substitutions. Within highly exchangeable pairs of amino acids, arginine is strongly preferred over lysine, and valine over isoleucine, consistent with more effective selection preferring a marginally larger free energy of folding. These two preferences match differences between thermophiles and mesophilic relatives. These results reveal the biophysical consequences of mutation-selection-drift balance, and demonstrate the utility of nearly neutral theory for understanding protein evolution.
Affiliation(s)
- Hanon McShea
- Department of Earth System Science, Stanford University
- Catherine Weibel
- Department of Ecology & Evolutionary Biology, University of Arizona
- Department of Applied Physics, Stanford University
- Sawsan Wehbi
- Graduate Interdisciplinary Program in Genetics, University of Arizona
- Jennifer E James
- Department of Ecology & Evolutionary Biology, University of Arizona
- Department of Ecology and Genetics, Uppsala University
- Andrew L Wheeler
- Graduate Interdisciplinary Program in Genetics, University of Arizona
- Joanna Masel
- Department of Ecology & Evolutionary Biology, University of Arizona
39
Qiu Y, Huang T, Cai YD. Review of predicting protein stability changes upon variations. Proteomics 2024; 24:e2300371. [PMID: 38643379 DOI: 10.1002/pmic.202300371] [Received: 01/30/2024] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/22/2024]
Abstract
Predicting the changes in protein stability caused by sequence variants is of great importance. Improving the thermal stability of proteins is important for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases containing protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Various publicly available databases for protein stability prediction are introduced. Furthermore, state-of-the-art computational approaches for anticipating protein stability changes due to variants are reviewed. Each method's feature types, base algorithm, and prediction results are also detailed. Additionally, some experimental approaches for verifying the prediction results of computational methods are introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models for future research directions.
Affiliation(s)
- Yiling Qiu
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
40
Daffern N, Johansson KE, Baumer ZT, Robertson NR, Woojuh J, Bedewitz MA, Davis Z, Wheeldon I, Cutler SR, Lindorff-Larsen K, Whitehead TA. GMMA Can Stabilize Proteins Across Different Functional Constraints. J Mol Biol 2024; 436:168586. [PMID: 38663544 DOI: 10.1016/j.jmb.2024.168586] [Received: 01/30/2024] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 05/06/2024]
Abstract
Stabilizing proteins without otherwise hampering their function is a central task in protein engineering and design. PYR1 is a plant hormone receptor that has been engineered to bind diverse small molecule ligands. We sought a set of generalized mutations that would provide stability without affecting functionality for PYR1 variants with diverse ligand-binding capabilities. To do this we used a global multi-mutant analysis (GMMA) approach, which can identify substitutions that have stabilizing effects and do not lower function. GMMA has the added benefit of finding substitutions that are stabilizing in different sequence contexts, and we hypothesized that applying GMMA to PYR1 variants with different functionalities would identify this set of generalized mutations. Indeed, conducting FACS and deep sequencing of libraries for PYR1 variants with two different functionalities and applying a GMMA analysis identified 5 substitutions that, when inserted into four PYR1 variants that each bind a unique ligand, provided an increase of 2-6 °C in thermal inactivation temperature and no decrease in functionality.
Affiliation(s)
- Nicolas Daffern
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
- Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Zachary T Baumer
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
- Janty Woojuh
- Department of Botany and Plant Sciences, University of California, Riverside, USA
- Matthew A Bedewitz
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
- Zoë Davis
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
- Ian Wheeldon
- Department of Chemical and Environmental Engineering, University of California, Riverside, USA; Institute for Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA
- Sean R Cutler
- Department of Botany and Plant Sciences, University of California, Riverside, USA; Institute for Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA; Center for Plant Cell Biology, University of California, Riverside, Riverside, CA, USA
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Timothy A Whitehead
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80305, USA
41
Dudley JA, Park S, Cho O, Wells NGM, MacDonald ME, Blejec KM, Fetene E, Zanderigo E, Houliston S, Liddle JC, Dashnaw CM, Sabo TM, Shaw BF, Balsbaugh JL, Rocklin GJ, Smith CA. Heat-induced structural and chemical changes to a computationally designed miniprotein. Protein Sci 2024; 33:e4991. [PMID: 38757381 PMCID: PMC11099715 DOI: 10.1002/pro.4991] [Received: 12/11/2023] [Revised: 03/22/2024] [Accepted: 03/28/2024] [Indexed: 05/18/2024]
Abstract
The de novo design of miniprotein inhibitors has recently emerged as a new technology to create proteins that bind with high affinity to specific therapeutic targets. Their size, ease of expression, and apparent high stability make them excellent candidates for a new class of protein drugs. However, beyond circular dichroism melts and hydrogen/deuterium exchange experiments, little is known about their dynamics, especially at the elevated temperatures they seemingly tolerate quite well. To address that and gain insight for future designs, we have focused on identifying unintended and previously overlooked heat-induced structural and chemical changes in a particularly stable model miniprotein, EHEE_rd2_0005. Nuclear magnetic resonance (NMR) studies suggest the presence of dynamics on multiple time and temperature scales. Transiently elevating the temperature results in spontaneous chemical deamidation visible in the NMR spectra, which we validate using both capillary electrophoresis and mass spectrometry (MS) experiments. High temperatures also result in greatly accelerated intrinsic rates of hydrogen exchange and signal loss in NMR heteronuclear single quantum coherence spectra from local unfolding. These losses are in excellent agreement with both room temperature hydrogen exchange experiments and hydrogen bond disruption in replica exchange molecular dynamics simulations. Our analysis reveals important principles for future miniprotein designs and the potential for high stability to result in long-lived alternate conformational states.
Affiliation(s)
- Joshua A. Dudley
- Department of Chemistry, Wesleyan University, Middletown, Connecticut, USA
- Sojeong Park
- Department of Chemistry, Wesleyan University, Middletown, Connecticut, USA
- Oliver Cho
- Department of Chemistry, Wesleyan University, Middletown, Connecticut, USA
- Emmanuel Fetene
- Department of Chemistry, Wesleyan University, Middletown, Connecticut, USA
- Eric Zanderigo
- Department of Chemistry, Wesleyan University, Middletown, Connecticut, USA
- Scott Houliston
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada
- Jennifer C. Liddle
- Proteomics and Metabolomics Facility, University of Connecticut, Storrs, Connecticut, USA
- Chad M. Dashnaw
- Department of Chemistry and Biochemistry, Baylor University, Waco, Texas, USA
- T. Michael Sabo
- Department of Medicine and Brown Cancer Center, University of Louisville, Louisville, Kentucky, USA
- Bryan F. Shaw
- Department of Chemistry and Biochemistry, Baylor University, Waco, Texas, USA
- Jeremy L. Balsbaugh
- Proteomics and Metabolomics Facility, University of Connecticut, Storrs, Connecticut, USA
- Gabriel J. Rocklin
- Department of Pharmacology and Center for Synthetic Biology, Northwestern University, Evanston, Illinois, USA
- Colin A. Smith
- Department of Chemistry, Wesleyan University, Middletown, Connecticut, USA
42
Ito S, Matsunaga R, Nakakido M, Komura D, Katoh H, Ishikawa S, Tsumoto K. High-throughput system for the thermostability analysis of proteins. Protein Sci 2024; 33:e5029. [PMID: 38801228 PMCID: PMC11129621 DOI: 10.1002/pro.5029] [Received: 03/28/2024] [Revised: 04/30/2024] [Accepted: 05/06/2024] [Indexed: 05/29/2024]
Abstract
Thermal stability of proteins is a primary metric for evaluating their physical properties. Although researchers have attempted to predict it using machine learning frameworks, the performance of these predictors has depended on the quality and quantity of published data. This is due to the technical limitation that thermodynamic characterization of protein denaturation by fluorescence or calorimetry in a high-throughput manner has been challenging. Obtaining a melting curve that derives solely from the target protein requires laborious purification, making it far from practical to prepare a hundred or more samples in a single workflow. Here, we aimed to overcome this throughput limitation by leveraging the high protein secretion efficacy of Brevibacillus and consecutive treatment with plate-scale purification methodologies. By handling the entire process of expression, purification, and analysis on a per-plate basis, we enabled the direct observation of protein denaturation in 384 samples within 4 days. To demonstrate a practical application of the system, we conducted a comprehensive analysis of 186 single mutants of a single-chain variable fragment of nivolumab, obtaining melting temperature (Tm) shifts ranging from -9.3 to +10.8 °C relative to the wild-type sequence. Our findings will allow for data-driven stabilization in protein design and the streamlining of rational approaches.
Affiliation(s)
- Sae Ito
- Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan
- Ryo Matsunaga
- Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan
- Department of Chemistry and Biotechnology, School of Engineering, The University of Tokyo, Tokyo, Japan
- Makoto Nakakido
- Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan
- Department of Chemistry and Biotechnology, School of Engineering, The University of Tokyo, Tokyo, Japan
- Daisuke Komura
- Department of Preventive Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiroto Katoh
- Department of Preventive Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Shumpei Ishikawa
- Department of Preventive Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kouhei Tsumoto
- Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan
- Department of Chemistry and Biotechnology, School of Engineering, The University of Tokyo, Tokyo, Japan
- The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
43
Jänes J, Müller M, Selvaraj S, Manoel D, Stephenson J, Gonçalves C, Lafita A, Polacco B, Obernier K, Alasoo K, Lemos MC, Krogan N, Martin M, Saraiva LR, Burke D, Beltrao P. Predicted mechanistic impacts of human protein missense variants. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.29.596373. [PMID: 38854010 PMCID: PMC11160786 DOI: 10.1101/2024.05.29.596373] [Indexed: 06/11/2024]
Abstract
Genome sequencing efforts have led to the discovery of tens of millions of protein missense variants found in the human population, with the majority of these having no annotated role and some likely contributing to trait variation and disease. Sequence-based artificial intelligence approaches have become highly accurate at predicting variants that are detrimental to the function of proteins, but they do not inform on mechanisms of disruption. Here we combined sequence and structure-based methods to perform proteome-wide prediction of deleterious variants with information on their impact on protein stability, protein-protein interactions and small-molecule binding pockets. AlphaFold2 structures were used to predict approximately 100,000 small-molecule binding pockets and stability changes for over 200 million variants. To inform on protein-protein interfaces we used AlphaFold2 to predict structures for nearly 500,000 protein complexes. We illustrate the value of mechanism-aware variant effect predictions for studying the relation between protein stability and abundance, and the structural properties of interfaces underlying trans protein quantitative trait loci (pQTLs). We characterised the distribution of mechanistic impacts of protein variants found in patients and experimentally studied example disease linked variants in FGFR1.
Affiliation(s)
- Jürgen Jänes
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Müller
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Senthil Selvaraj
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- Diogo Manoel
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- James Stephenson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Catarina Gonçalves
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- Benjamin Polacco
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- Kirsten Obernier
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Manuel C. Lemos
- CICS-UBI, Health Sciences Research Centre, University of Beira Interior, 6200-506, Covilhã, Portugal
- Nevan Krogan
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- J. David Gladstone Institutes, San Francisco, CA, USA
- Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Luis R. Saraiva
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- David Burke
- Faculty of Life Sciences and Medicine, King’s College, London, UK
- Pedro Beltrao
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
44
Grønbæk-Thygesen M, Voutsinos V, Johansson KE, Schulze TK, Cagiada M, Pedersen L, Clausen L, Nariya S, Powell RL, Stein A, Fowler DM, Lindorff-Larsen K, Hartmann-Petersen R. Deep mutational scanning reveals a correlation between degradation and toxicity of thousands of aspartoacylase variants. Nat Commun 2024; 15:4026. [PMID: 38740822 DOI: 10.1038/s41467-024-48481-0] [Received: 10/18/2023] [Accepted: 05/02/2024] [Indexed: 05/16/2024]
Abstract
Unstable proteins are prone to form non-native interactions with other proteins and thereby may become toxic. To mitigate this, destabilized proteins are targeted by the protein quality control network. Here we present systematic studies of the cytosolic aspartoacylase, ASPA, in which variants are linked to Canavan disease, a lethal neurological disorder. We determine the abundance of 6152 of the 6260 (~98%) possible single amino acid substitutions and nonsense ASPA variants in human cells. Most low-abundance variants are degraded through the ubiquitin-proteasome pathway and become toxic upon prolonged expression. The data correlate with predicted changes in thermodynamic stability and evolutionary conservation, and separate disease-linked variants from benign variants. Mapping of degradation signals (degrons) shows that these are often buried and that the C-terminal region functions as a degron. The data can be used to interpret Canavan disease variants and provide insight into the relationship between protein stability, degradation and cell fitness.
Affiliation(s)
- Martin Grønbæk-Thygesen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Vasileios Voutsinos
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Thea K Schulze
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Line Pedersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Lene Clausen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Snehal Nariya
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Rachel L Powell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Amelie Stein
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
45
Simon JJ, Fowler DM, Maly DJ. Multiplexed, multimodal profiling of the intracellular activity, interactions, and druggability of protein variants using LABEL-seq. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.19.590094. [PMID: 38659825 PMCID: PMC11042325 DOI: 10.1101/2024.04.19.590094] [Indexed: 04/26/2024]
Abstract
Multiplexed assays of variant effect are powerful tools for assessing the impact of protein sequence variation, but are limited to measuring a single protein property and often rely on indirect readouts of intracellular protein function. Here, we developed LAbeling with Barcodes and Enrichment for biochemicaL analysis by sequencing (LABEL-seq), a platform for the multimodal profiling of thousands of protein variants in cultured human cells. Multimodal measurement of ~20,000 variant effects for ~1,600 BRaf variants using LABEL-seq revealed that variation at positions that are frequently mutated in cancer had minimal effects on folding and intracellular abundance but could dramatically alter activity, protein-protein interactions, and druggability. Integrative analysis of our multimodal measurements identified networks of positions with similar roles in regulating BRaf's signaling properties and enabled predictive modeling of variant effects on complex processes such as cell proliferation and small molecule-promoted degradation. LABEL-seq provides a scalable approach for the direct measurement of multiple biochemical effects of protein variants in their native cellular context, yielding insight into protein function, disease mechanisms, and druggability.
Affiliation(s)
- Jessica J Simon
- Department of Chemistry, University of Washington, Seattle, WA, United States
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Department of Bioengineering, University of Washington, Seattle, WA, United States
- Co-corresponding author
- Dustin J Maly
- Department of Chemistry, University of Washington, Seattle, WA, United States
- Department of Biochemistry, University of Washington, Seattle, WA, United States
- Co-corresponding author
46
Ohara N, Kawakami N, Arai R, Adachi N, Ikeda A, Senda T, Miyamoto K. Fusion then fission: splitting and reassembly of an artificial fusion-protein nanocage. Chem Commun (Camb) 2024; 60:4605-4608. [PMID: 38586927 DOI: 10.1039/d4cc00115j] [Indexed: 04/09/2024]
Abstract
A split-protein system is a simple approach to introduce new termini, which are useful as modification sites in protein engineering, but it has been adapted mainly for monomeric proteins. Here we demonstrate the design of split subunits of the 60-mer artificial fusion-protein nanocage TIP60. The subunit fragments successfully reformed the cage structure in the same manner as prior to splitting. One of the newly introduced termini, located on the interior surface, can be modified using a tag peptide and green fluorescent protein. Therefore, the termini could serve as versatile modification sites for incorporating a wide variety of functional peptides and proteins.
Affiliation(s)
- Naoya Ohara
- Department of Bioscience and Informatics, Faculty of Science and Technology, Keio University, Yokohama, Kanagawa 223-8522, Japan
- Norifumi Kawakami
- Department of Bioscience and Informatics, Faculty of Science and Technology, Keio University, Yokohama, Kanagawa 223-8522, Japan
- Ryoichi Arai
- Department of Biomolecular Innovation, Institute for Biomedical Sciences, Interdisciplinary Cluster for Cutting Edge Research, Shinshu University, Ueda, Nagano 386-8567, Japan
- Department of Applied Biology, Faculty of Textile Science and Technology, Shinshu University, Ueda, Nagano 386-8567, Japan
- Naruhiko Adachi
- Structural Biology Research Center, Institute of Materials Structure Science, High Energy Accelerator Research Organization (KEK), Oho, Tsukuba 305-0801, Japan
- Life Science Center for Survival Dynamics, Tsukuba Advanced Research Alliance (TARA), University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
- Akihito Ikeda
- Structural Biology Research Center, Institute of Materials Structure Science, High Energy Accelerator Research Organization (KEK), Oho, Tsukuba 305-0801, Japan
- Toshiya Senda
- Structural Biology Research Center, Institute of Materials Structure Science, High Energy Accelerator Research Organization (KEK), Oho, Tsukuba 305-0801, Japan
- Kenji Miyamoto
- Department of Bioscience and Informatics, Faculty of Science and Technology, Keio University, Yokohama, Kanagawa 223-8522, Japan
47
Howard MK, Hoppe N, Huang XP, Macdonald CB, Mehrota E, Grimes PR, Zahm A, Trinidad DD, English J, Coyote-Maestas W, Manglik A. Molecular basis of proton-sensing by G protein-coupled receptors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.17.590000. [PMID: 38659943 PMCID: PMC11042331 DOI: 10.1101/2024.04.17.590000] [Indexed: 04/26/2024]
Abstract
Three proton-sensing G protein-coupled receptors (GPCRs), GPR4, GPR65, and GPR68, respond to changes in extracellular pH to regulate diverse physiology and are implicated in a wide range of diseases. A central challenge in determining how protons activate these receptors is identifying the set of residues that bind protons. Here, we determine structures of each receptor to understand the spatial arrangement of putative proton sensing residues in the active state. With a newly developed deep mutational scanning approach, we determined the functional importance of every residue in proton activation for GPR68 by generating ~9,500 mutants and measuring effects on signaling and surface expression. This unbiased screen revealed that, unlike other proton-sensitive cell surface channels and receptors, no single site is critical for proton recognition in GPR68. Instead, a network of titratable residues extends from the extracellular surface to the transmembrane region and converges on canonical class A GPCR activation motifs to activate proton-sensing GPCRs. More broadly, our approach integrating structure and unbiased functional interrogation defines a new framework for understanding the rich complexity of GPCR signaling.
Collapse
Affiliation(s)
- Matthew K. Howard
- Tetrad graduate program, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Science, University of California, San Francisco, CA, USA
- Nicholas Hoppe
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Biophysics graduate program, University of California, San Francisco, CA, USA
- Xi-Ping Huang
- Department of Pharmacology and the National Institute of Mental Health Psychoactive Drug Screening Program (NIMH PDSP), The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Christian B. Macdonald
- Department of Bioengineering and Therapeutic Science, University of California, San Francisco, CA, USA
- Eshan Mehrota
- Tetrad graduate program, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Medical Scientist Training Program, University of California, San Francisco, CA, USA
- Adam Zahm
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
- Donovan D. Trinidad
- Department of Medicine, Division of Infectious Disease, University of California, San Francisco, United States
- Justin English
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
- Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Science, University of California, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Quantitative Biosciences Institute, University of California, San Francisco, USA
- Aashish Manglik
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Quantitative Biosciences Institute, University of California, San Francisco, USA
- Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA, USA
48
Grønbæk-Thygesen M, Hartmann-Petersen R. Cellular and molecular mechanisms of aspartoacylase and its role in Canavan disease. Cell Biosci 2024; 14:45. [PMID: 38582917 PMCID: PMC10998430 DOI: 10.1186/s13578-024-01224-6] [Received: 12/15/2023] [Accepted: 03/24/2024] [Indexed: 04/08/2024]
Abstract
Canavan disease is an autosomal recessive and lethal neurological disorder, characterized by the spongy degeneration of the white matter in the brain. The disease is caused by a deficiency of the cytosolic aspartoacylase (ASPA) enzyme, which catalyzes the hydrolysis of N-acetyl-aspartate (NAA), an abundant brain metabolite, into aspartate and acetate. On the physiological level, the mechanism of pathogenicity remains somewhat obscure, with multiple, not mutually exclusive, hypotheses proposed. At the molecular level, recent studies have shown that most disease-linked ASPA gene variants lead to a structural destabilization and subsequent proteasomal degradation of the ASPA protein variants, and accordingly Canavan disease should in general be considered a protein misfolding disorder. Here, we comprehensively summarize the molecular and cell biology of ASPA, with a particular focus on disease-linked gene variants and the pathophysiology of Canavan disease. We highlight the importance of high-throughput technologies and computational prediction tools for making genotype-phenotype predictions as we await the results of ongoing trials with gene therapy for Canavan disease.
Affiliation(s)
- Martin Grønbæk-Thygesen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200N, Copenhagen, Denmark.
- Rasmus Hartmann-Petersen
- The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200N, Copenhagen, Denmark.
49
He Y, Zhou X, Chang C, Chen G, Liu W, Li G, Fan X, Sun M, Miao C, Huang Q, Ma Y, Yuan F, Chang X. Protein language models-assisted optimization of a uracil-N-glycosylase variant enables programmable T-to-G and T-to-C base editing. Mol Cell 2024; 84:1257-1270.e6. [PMID: 38377993 DOI: 10.1016/j.molcel.2024.01.021] [Received: 10/02/2023] [Revised: 12/20/2023] [Accepted: 01/24/2024] [Indexed: 02/22/2024]
Abstract
Current base editors (BEs) use DNA deaminases, including cytidine deaminase in cytidine BE (CBE) or adenine deaminase in adenine BE (ABE), to facilitate transition nucleotide substitutions. Combining CBE or ABE with glycosylase enzymes can induce limited transversion mutations. Nonetheless, a critical demand remains for BEs capable of generating alternative mutation types, such as T>G corrections. In this study, we leveraged pre-trained protein language models to optimize a uracil-N-glycosylase (UNG) variant with altered specificity for thymines (eTDG). Notably, after two rounds of testing fewer than 50 top-ranking variants, more than 50% exhibited over 1.5-fold enhancement in enzymatic activities. When eTDG was fused with nCas9, it induced programmable T-to-S (G/C) substitutions and corrected the db/db diabetic mutation in mice (up to 55%). Our findings not only establish orthogonal strategies for developing novel BEs but also demonstrate the capacity of protein language models to optimize enzymes without extensive task-specific training data.
Affiliation(s)
- Yan He
- Fudan University, 220 Handan Road, Shanghai 200433, China; School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Xibin Zhou
- School of Engineering, Westlake University, Hangzhou, Zhejiang 310014, China
- Chong Chang
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Ge Chen
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Weikuan Liu
- Fudan University, 220 Handan Road, Shanghai 200433, China; School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Geng Li
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Xiaoqi Fan
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Mingsun Sun
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Chensi Miao
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Qianyue Huang
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Yunqing Ma
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Fajie Yuan
- School of Engineering, Westlake University, Hangzhou, Zhejiang 310014, China.
- Xing Chang
- School of Medicine, Westlake University, Hangzhou, Zhejiang 310014, China; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310014, China; Research Center for Industries of the Future (RCIF), Westlake University, Hangzhou, Zhejiang 310014, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang 310014, China; Westlake Center for Genome Editing, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China.
50
Okada Y, Suzuki H, Tanaka T, Kaneko MK, Kato Y. Epitope Mapping of an Anti-Mouse CD39 Monoclonal Antibody Using PA Scanning and RIEDL Scanning. Monoclon Antib Immunodiagn Immunother 2024; 43:44-52. [PMID: 38507671 DOI: 10.1089/mab.2023.0029] [Indexed: 03/22/2024]
Abstract
The cell-surface ectonucleotidase CD39, together with another ectonucleotidase, CD73, mediates the conversion of extracellular adenosine triphosphate into immunosuppressive adenosine. The elevated adenosine in the tumor microenvironment attenuates antitumor immunity, which promotes tumor cell immunologic escape and progression. Anti-CD39 monoclonal antibodies (mAbs), which suppress the enzymatic activity, can be applied to antitumor therapy. Therefore, an understanding of the relationship between the inhibitory activity and epitope of mAbs is important. We previously established an anti-mouse CD39 (anti-mCD39) mAb, C39Mab-1, using the Cell-Based Immunization and Screening method. In this study, we determined the critical epitope of C39Mab-1 by flow cytometry, using PA tag (12 amino acids [aa])-substituted analysis (named PA scanning) and RIEDL tag (5 aa)-substituted analysis (named RIEDL scanning). By combining PA scanning and RIEDL scanning, we identified a conformational epitope spanning three segments of mCD39: aa 275-279, 282-291, and 306-323. These analyses could contribute to the identification of conformational epitopes of membrane proteins.
Affiliation(s)
- Yuki Okada
- Department of Antibody Drug Development, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Hiroyuki Suzuki
- Department of Antibody Drug Development, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Tomohiro Tanaka
- Department of Antibody Drug Development, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Mika K Kaneko
- Department of Antibody Drug Development, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Yukinari Kato
- Department of Antibody Drug Development, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan