1
Knapp BD, Shi H, Huang KC. Complex state transitions of the bacterial cell division protein FtsZ. Mol Biol Cell 2024; 35:ar130. [PMID: 39083352 DOI: 10.1091/mbc.e23-11-0446] [Indexed: 08/02/2024]
Abstract
The key bacterial cell division protein FtsZ can adopt multiple conformations, and prevailing models suggest that transitions of FtsZ subunits from the closed to open state are necessary for filament formation and stability. Using all-atom molecular dynamics simulations, we analyzed state transitions of Staphylococcus aureus FtsZ as a monomer, dimer, and hexamer. We found that monomers can adopt intermediate states but preferentially adopt a closed state that is robust to forced reopening. Dimer subunits transitioned between open and closed states, and dimers with both subunits in the closed state remained highly stable, suggesting that open-state conformations are not necessary for filament formation. Mg²⁺ strongly stabilized the conformation of GTP-bound subunits and the dimer filament interface. Our hexamer simulations indicate that the plus end subunit preferentially closes and that other subunits can transition between states without affecting inter-subunit stability. We found that rather than being correlated with subunit opening, inter-subunit stability was strongly correlated with catalytic site interactions. By leveraging deep-learning models, we identified key intrasubunit interactions governing state transitions. Our findings suggest a greater range of possible monomer and filament states than previously considered and offer new insights into the nuanced interplay between subunit states and the critical role of nucleotide hydrolysis and Mg²⁺ in FtsZ filament dynamics.
Affiliation(s)
- Handuo Shi
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305
- Department of Bioengineering, Stanford University, Stanford, CA 94305
- Kerwyn Casey Huang
- Biophysics Program, Stanford University, Stanford, CA 94305
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305
- Department of Bioengineering, Stanford University, Stanford, CA 94305
- Chan Zuckerberg Biohub, San Francisco, CA 94158
2
Scrima S, Lambrughi M, Tiberti M, Fadda E, Papaleo E. ASM variants in the spotlight: A structure-based atlas for unraveling pathogenic mechanisms in lysosomal acid sphingomyelinase. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167260. [PMID: 38782304 DOI: 10.1016/j.bbadis.2024.167260] [Received: 12/14/2023] [Revised: 04/30/2024] [Accepted: 05/18/2024] [Indexed: 05/25/2024]
Abstract
Lysosomal acid sphingomyelinase (ASM), a critical enzyme in lipid metabolism encoded by the SMPD1 gene, plays a crucial role in sphingomyelin hydrolysis in lysosomes. ASM deficiency leads to acid sphingomyelinase deficiency, a rare genetic disorder with diverse clinical manifestations, and the protein can be found mutated in other diseases. We employed a structure-based framework to comprehensively understand the functional implications of ASM variants, integrating pathogenicity predictions with molecular insights derived from a molecular dynamics simulation in a lysosomal membrane environment. Our analysis, encompassing over 400 variants, establishes a structural atlas of missense variants of lysosomal ASM, associating mechanistic indicators with pathogenic potential. Our study highlights variants that influence structural stability or exert local and long-range effects at functional sites. To validate our predictions, we compared them to available experimental data on residual catalytic activity in 135 ASM variants. Notably, our findings also suggest applications of the resulting data for identifying cases suited for enzyme replacement therapy. This comprehensive approach enhances the understanding of ASM variants and provides valuable insights for potential therapeutic interventions.
Affiliation(s)
- Simone Scrima
- Cancer Structural Biology, Center for Autophagy, Recycling and Disease, Danish Cancer Institute, 2100 Copenhagen, Denmark; Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Lambrughi
- Cancer Structural Biology, Center for Autophagy, Recycling and Disease, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Matteo Tiberti
- Cancer Structural Biology, Center for Autophagy, Recycling and Disease, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elisa Fadda
- Department of Chemistry and Hamilton Institute, Maynooth University, Maynooth, Co. Kildare, Ireland
- Elena Papaleo
- Cancer Structural Biology, Center for Autophagy, Recycling and Disease, Danish Cancer Institute, 2100 Copenhagen, Denmark; Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark
3
Chen Y, Xu Y, Liu D, Xing Y, Gong H. An end-to-end framework for the prediction of protein structure and fitness from single sequence. Nat Commun 2024; 15:7400. [PMID: 39191788 DOI: 10.1038/s41467-024-51776-x] [Received: 02/27/2024] [Accepted: 08/19/2024] [Indexed: 08/29/2024]
Abstract
Significant research progress has been made in the field of protein structure and fitness prediction. Particularly, single-sequence-based structure prediction methods like ESMFold and OmegaFold achieve a balance between inference speed and prediction accuracy, showing promise for many downstream prediction tasks. Here, we propose SPIRED, a single-sequence-based structure prediction model that exhibits comparable performance to the state-of-the-art methods but with approximately 5-fold acceleration in inference and at least one order of magnitude reduction in training consumption. By integrating SPIRED with downstream neural networks, we compose an end-to-end framework named SPIRED-Fitness for the rapid prediction of both protein structure and fitness from single sequence with satisfactory accuracy. Moreover, SPIRED-Stab, the derivative of SPIRED-Fitness, achieves state-of-the-art performance in predicting the mutational effects on protein stability.
Affiliation(s)
- Yinghui Chen
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Yunxin Xu
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Di Liu
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Yaoguang Xing
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
4
Guclu TF, Atilgan AR, Atilgan C. Deciphering GB1's Single Mutational Landscape: Insights from MuMi Analysis. J Phys Chem B 2024; 128:7987-7996. [PMID: 39115184 DOI: 10.1021/acs.jpcb.4c04916] [Indexed: 08/23/2024]
Abstract
Mutational changes that affect the binding of the C2 fragment of Streptococcal protein G (GB1) to the Fc domain of human IgG (IgG-Fc) have been extensively studied using deep mutational scanning (DMS), and the binding affinity of all single mutations has been measured experimentally in the literature. To investigate the underlying molecular basis, we perform in silico mutational scanning for all possible single mutations, along with 2 μs-long molecular dynamics (WT-MD) of the wild-type (WT) GB1 in both unbound and IgG-Fc bound forms. We compute the hydrogen bonds between GB1 and IgG-Fc in WT-MD to identify the dominant hydrogen bonds for binding, which we then assess in conformations produced by Mutation and Minimization (MuMi) to explain the fitness landscape of GB1 and IgG-Fc binding. Furthermore, we analyze MuMi and WT-MD to investigate the dynamics of binding, focusing on the relative solvent accessibility of residues and the probability of residues being located at the binding interface. With these analyses, we explain the interactions between GB1 and IgG-Fc and display the structural features of binding. In sum, our findings highlight the potential of MuMi as a reliable and computationally efficient tool for predicting protein fitness landscapes, offering significant advantages over traditional methods. The methodologies and results presented in this study pave the way for improved predictive accuracy in protein stability and interaction studies, which are crucial for advancements in drug design and synthetic biology.
Affiliation(s)
- Tandac F Guclu
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Ali Rana Atilgan
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Canan Atilgan
- Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
5
Ganesan S, Mittal N, Bhat A, Adiga RS, Ganesan A, Nagarajan D, Varadarajan R. Improved Prediction of Stabilizing Mutations in Proteins by Incorporation of Mutational Effects on Ligand Binding. Proteins 2024. [PMID: 39166462 DOI: 10.1002/prot.26738] [Received: 04/11/2024] [Revised: 07/18/2024] [Accepted: 08/05/2024] [Indexed: 08/23/2024]
Abstract
While many computational methods accurately predict destabilizing mutations, identifying stabilizing mutations has remained a challenge, because of their relative rarity. We tested ΔΔG⁰ predictions from computational predictors such as Rosetta, ThermoMPNN, RaSP, and DeepDDG, using 82 mutants of the bacterial toxin CcdB as a test case. On this dataset, the best computational predictor is ThermoMPNN, which identifies stabilizing mutations with a precision of 68%. However, the average increase in Tm for these predicted mutations was only 1°C for CcdB, and predictions were poorer for a more challenging target, influenza neuraminidase. Using data from multiple previously described yeast surface display libraries and in vitro thermal stability measurements, we trained logistic regression models to identify stabilizing mutations with a precision of 90% and an average increase in Tm of 3°C for CcdB. When such libraries contain a population of mutants with significantly enhanced binding relative to the corresponding wild type, there is no benefit in using computational predictors. It is then possible to predict stabilizing mutations without any training, simply by examining the distribution of mutational binding scores. This avoids laborious steps of in vitro expression, purification, and stability characterization. When this is not the case, combining data from computational predictors with high-throughput experimental binding data enhances the prediction of stabilizing mutations. However, this requires training on stability data measured in vitro with known stabilized mutants. It is thus feasible to predict stabilizing mutations rapidly and accurately for any system of interest that can be subjected to a binding selection or screen.
Affiliation(s)
- Srivarshini Ganesan
- Molecular Biophysics Unit (MBU), Indian Institute of Science, Bengaluru, India
- Nidhi Mittal
- Molecular Biophysics Unit (MBU), Indian Institute of Science, Bengaluru, India
- Akash Bhat
- Department of Biotechnology, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
- Rachana S Adiga
- Department of Biotechnology, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
- Ananthakrishnan Ganesan
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California, USA
- Deepesh Nagarajan
- Department of Biotechnology, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
6
Correa Marrero M, Jänes J, Baptista D, Beltrao P. Integrating Large-Scale Protein Structure Prediction into Human Genetics Research. Annu Rev Genomics Hum Genet 2024; 25:123-140. [PMID: 38621234 DOI: 10.1146/annurev-genom-120622-020615] [Indexed: 04/17/2024]
Abstract
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Affiliation(s)
- Miguel Correa Marrero
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Jürgen Jänes
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Pedro Beltrao
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
7
Diaz DJ, Gong C, Ouyang-Zhang J, Loy JM, Wells J, Yang D, Ellington AD, Dimakis AG, Klivans AR. Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations. Nat Commun 2024; 15:6170. [PMID: 39043654 PMCID: PMC11266546 DOI: 10.1038/s41467-024-49780-2] [Received: 10/04/2023] [Accepted: 06/14/2024] [Indexed: 07/25/2024]
Abstract
Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves SOTA performance on accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time, such as: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000X less proteins and has 548X less parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
Affiliation(s)
- Daniel J Diaz
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA
- Intelligent Proteins, LLC, Austin, TX, 78712, USA
- UT Austin, Department of Chemistry, Austin, TX, 78712, USA
- Chengyue Gong
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA
- James M Loy
- Intelligent Proteins, LLC, Austin, TX, 78712, USA
- UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA
- Jordan Wells
- UT Austin, McKetta Department of Chemical Engineering, Austin, TX, 78712, USA
- David Yang
- UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA
- Alexandros G Dimakis
- UT Austin, Chandra Family Department of Electrical and Computer Engineering, Austin, TX, 78712, USA
- Adam R Klivans
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA
8
Cuturello F, Celoria M, Ansuini A, Cazzaniga A. Enhancing predictions of protein stability changes induced by single mutations using MSA-based Language Models. Bioinformatics 2024; 40:btae447. [PMID: 39012369 PMCID: PMC11269464 DOI: 10.1093/bioinformatics/btae447] [Received: 04/16/2024] [Revised: 06/19/2024] [Accepted: 07/10/2024] [Indexed: 07/17/2024]
Abstract
MOTIVATION: Protein Language Models offer a new perspective for addressing challenges in structural biology, while relying solely on sequence information. Recent studies have investigated their effectiveness in forecasting shifts in thermodynamic stability caused by single amino acid mutations, a task known for its complexity due to the sparse availability of data, constrained by experimental limitations. To tackle this problem, we introduce two key novelties: leveraging a Protein Language Model that incorporates Multiple Sequence Alignments to capture evolutionary information, and using a recently released mega-scale dataset with rigorous data pre-processing to mitigate overfitting. RESULTS: We ensure comprehensive comparisons by fine-tuning various pre-trained models, taking advantage of analyses such as ablation studies and baselines evaluation. Our methodology introduces a stringent policy to reduce the widespread issue of data leakage, rigorously removing sequences from the training set when they exhibit significant similarity with the test set. The MSA Transformer emerges as the most accurate among the models under investigation, given its capability to leverage co-evolution signals encoded in aligned homologous sequences. Moreover, the optimized MSA Transformer outperforms existing methods and exhibits enhanced generalization power, leading to a notable improvement in predicting changes in protein stability resulting from point mutations. AVAILABILITY AND IMPLEMENTATION: Code and data at https://github.com/RitAreaSciencePark/PLM4Muts. SUPPLEMENTARY INFORMATION: Supplementary Information is available at Bioinformatics online.
Affiliation(s)
- Francesca Cuturello
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- Marco Celoria
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- HPC Department, CINECA National Supercomputing Center, Bologna 40033, Italy
- Alessio Ansuini
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
- Alberto Cazzaniga
- Research and Technology Institute, AREA Science Park, Trieste 34149, Italy
9
Al-Mutairi DA, Alsabah BH, Pennekamp P, Omran H. Novel pathogenic variants of DNAH5 associated with clinical and genetic spectra of primary ciliary dyskinesia in an Arab population. Front Genet 2024; 15:1396797. [PMID: 39045318 PMCID: PMC11264286 DOI: 10.3389/fgene.2024.1396797] [Received: 03/06/2024] [Accepted: 05/20/2024] [Indexed: 07/25/2024]
Abstract
Introduction: Primary ciliary dyskinesia (PCD) is caused by the dysfunction of motile cilia resulting in insufficient mucociliary clearance of the lungs. This study aimed to map novel PCD variants and determine their pathogenicity in PCD patients in Kuwait. Methods: Herein, we present five PCD individuals belonging to a cohort of 105 PCD individuals recruited from different hospitals in Kuwait. Genomic DNAs from the family members were analysed to screen for pathogenic PCD variants. Transmission electron microscopy (TEM) and immunofluorescence (IF) analyses were performed on the nasal biopsies to detect specific structural abnormalities within the ciliated cells. Results: Genetic screening and functional analyses confirmed that the five PCD individuals carried novel pathogenic variants of DNAH5 causing PCD in three Arabic families. Of these, one multiplex family with two affected individuals showed two novel homozygous missense variants in DNAH5 causing PCD with situs inversus; another multiplex family with two affected individuals showed two newly identified compound heterozygous variants in DNAH5 causing PCD with situs solitus. In addition, novel heterozygous variants were identified in a child with PCD and situs solitus from a singleton family with unrelated parents. TEM analysis demonstrated the lack of outer dynein arms (ODAs) in all analysed samples, and IF analysis confirmed the absence of the dynein arm component of DNAH5 from the ciliary axoneme. Conclusion: The newly identified pathogenic variants of DNAH5 are associated with PCD as well as variable pulmonary clinical manifestations in Arabic families.
Affiliation(s)
- Dalal A. Al-Mutairi
- Department of Pathology, Faculty of Medicine, Kuwait University, Kuwait City, Kuwait
- Petra Pennekamp
- Department of Pediatrics, University Hospital Muenster, Muenster, Germany
- Heymut Omran
- Department of Pediatrics, University Hospital Muenster, Muenster, Germany
10
Visani GM, Pun MN, Galvin W, Daniel E, Borisiak K, Wagura U, Nourmohammad A. HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction. ARXIV 2024:arXiv:2407.06703v1. [PMID: 39040640 PMCID: PMC11261993] [Indexed: 07/24/2024]
Abstract
Predicting the stability and fitness effects of amino acid mutations in proteins is a cornerstone of biological discovery and engineering. Various experimental techniques have been developed to measure mutational effects, providing us with extensive datasets across a diverse range of proteins. By training on these data, traditional computational modeling and more recent machine learning approaches have advanced significantly in predicting mutational effects. Here, we introduce HERMES, a 3D rotationally equivariant structure-based neural network model for mutational effect and stability prediction. Pre-trained to predict amino acid propensity from its surrounding 3D structure, HERMES can be fine-tuned for mutational effects using our open-source code. We present a suite of HERMES models, pre-trained with different strategies, and fine-tuned to predict the stability effect of mutations. Benchmarking against other models shows that HERMES often outperforms or matches their performance in predicting mutational effect on stability, binding, and fitness. HERMES offers versatile tools for evaluating mutational effects and can be fine-tuned for specific predictive objectives.
Affiliation(s)
- Gian Marco Visani
- Department of Computer Science and Engineering, University of Washington, Seattle, USA
- Michael N. Pun
- Department of Physics, University of Washington, 3910 15th Avenue Northeast, Seattle, WA 98195, USA
- William Galvin
- Department of Computer Science and Engineering, University of Washington, Seattle, USA
- Eric Daniel
- Department of Computer Science and Engineering, University of Washington, Seattle, USA
- Kevin Borisiak
- Department of Physics, University of Washington, 3910 15th Avenue Northeast, Seattle, WA 98195, USA
- Utheri Wagura
- Department of Physics, University of Washington, 3910 15th Avenue Northeast, Seattle, WA 98195, USA
- Department of Physics, Massachusetts Institute of Technology, 182 Memorial Dr, Cambridge, MA 02139
- Armita Nourmohammad
- Department of Computer Science and Engineering, University of Washington, Seattle, USA
- Department of Physics, University of Washington, 3910 15th Avenue Northeast, Seattle, WA 98195, USA
- Department of Applied Mathematics, University of Washington, Seattle, USA
- Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
11
Philipp M, Moth C, Ristic N, Tiemann J, Seufert F, Panfilova A, Meiler J, Hildebrand P, Stein A, Wiegreffe D, Staritzbichler R. MutationExplorer: a webserver for mutation of proteins and 3D visualization of energetic impacts. Nucleic Acids Res 2024; 52:W132-W139. [PMID: 38647044 PMCID: PMC11223880 DOI: 10.1093/nar/gkae301] [Received: 01/30/2024] [Revised: 03/22/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024]
Abstract
The possible effects of mutations on stability and function of a protein can only be understood in the context of protein 3D structure. The MutationExplorer webserver maps sequence changes onto protein structures and allows users to study variation by inputting sequence changes. As the user enters variants, the 3D model evolves, and estimated changes in energy are highlighted. In addition to a basic per-residue input format, MutationExplorer can also upload an entire replacement sequence. Previously the purview of desktop applications, such an upload can back-mutate PDB structures to wildtype sequence in a single step. Another supported variation source is human single nucleotide polymorphisms (SNPs), genomic coordinates input in VCF format. Structures are flexibly colorable, not only by energetic differences, but also by hydrophobicity, sequence conservation, or other biochemical profiling. Coloring by interface score reveals mutation impacts on binding surfaces. MutationExplorer strives for efficiency in user experience. For example, we have prepared 45 000 PDB depositions for instant retrieval and initial display. All modeling steps are performed by Rosetta. Visualizations leverage MDsrv/Mol*. MutationExplorer is available at: http://proteinformatics.org/mutation_explorer/.
Affiliation(s)
- Michelle Philipp
- Image and Signal Processing Group, Department of Computer Science, Leipzig University, Augustusplatz 10, 04109 Leipzig, Germany
- Christopher W Moth
- Vanderbilt University, Center for Structural Biology, 465 21st Ave South, Nashville, TN 37232, USA
- Nikola Ristic
- Institute for Medical Physics and Biophysics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
- Johanna K S Tiemann
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N., Denmark
- Novozymes A/S, 2800 Kgs. Lyngby, Denmark
- Florian Seufert
- Institute for Medical Physics and Biophysics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
- Aleksandra Panfilova
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N., Denmark
- Jens Meiler
- Vanderbilt University, Center for Structural Biology, 465 21st Ave South, Nashville, TN 37232, USA
- Leipzig University Medical School, Institute for Drug Discovery, Brüderstraße 34, 04103 Leipzig, Germany
- Peter W Hildebrand
- Institute for Medical Physics and Biophysics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University, Germany
- Berlin Institute of Health, 10178 Berlin, Germany
- Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N., Denmark
- Daniel Wiegreffe
- Image and Signal Processing Group, Department of Computer Science, Leipzig University, Augustusplatz 10, 04109 Leipzig, Germany
- René Staritzbichler
- Institute for Medical Physics and Biophysics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
- University Institute for Laboratory Medicine, Microbiology and Clinical Pathobiochemistry, University Hospital of Bielefeld University, Germany
12
Chu SKS, Narang K, Siegel JB. Protein stability prediction by fine-tuning a protein language model on a mega-scale dataset. PLoS Comput Biol 2024; 20:e1012248. [PMID: 39038042 PMCID: PMC11293664 DOI: 10.1371/journal.pcbi.1012248] [Received: 12/11/2023] [Revised: 08/01/2024] [Accepted: 06/13/2024] [Indexed: 07/24/2024]
Abstract
Protein stability plays a crucial role in a variety of applications, such as food processing, therapeutics, and the identification of pathogenic mutations. Engineering campaigns commonly seek to improve protein stability, and there is a strong interest in streamlining these processes to enable rapid optimization of highly stabilized proteins with fewer iterations. In this work, we explore utilizing a mega-scale dataset to develop a protein language model optimized for stability prediction. ESMtherm is trained on the folding stability of 528k natural and de novo sequences derived from 461 protein domains and can accommodate deletions, insertions, and multiple-point mutations. We show that a protein language model can be fine-tuned to predict folding stability. ESMtherm performs reasonably on small protein domains and generalizes to sequences distal from the training set. Lastly, we discuss our model's limitations compared to other state-of-the-art methods in generalizing to larger protein scaffolds. Our results highlight the need for large-scale stability measurements on a diverse dataset that mirrors the distribution of sequence lengths commonly observed in nature.
Affiliation(s)
- Simon K. S. Chu
- Biophysics Graduate Program, University of California Davis, Davis, California, United States of America
| | - Kush Narang
- College of Biological Sciences, University of California Davis, Davis, California, United States of America
| | - Justin B. Siegel
- Genome Center, University of California Davis, Davis, California, United States of America
- Department of Chemistry, University of California Davis, Davis, California, United States of America
- Department of Biochemistry and Molecular Medicine, University of California Davis, Davis, California, United States of America
13
Gouliaev F, Jonsson N, Gersing S, Lisby M, Lindorff-Larsen K, Hartmann-Petersen R. Destabilization and Degradation of a Disease-Linked PGM1 Protein Variant. Biochemistry 2024; 63:1423-1433. [PMID: 38743592 DOI: 10.1021/acs.biochem.4c00042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
PGM1-linked congenital disorder of glycosylation (PGM1-CDG) is an autosomal recessive disease characterized by several phenotypes, some of which are life-threatening. Research focusing on the disease-related variants of the α-D-phosphoglucomutase 1 (PGM1) protein has shown that several are insoluble in vitro and expressed at low levels in patient fibroblasts. Due to these observations, we hypothesized that some disease-linked PGM1 protein variants are structurally destabilized and subject to protein quality control (PQC) and rapid intracellular degradation. Employing yeast-based assays, we show that a disease-associated human variant, PGM1 L516P, is insoluble, inactive, and highly susceptible to ubiquitylation and rapid degradation by the proteasome. In addition, we show that PGM1 L516P forms aggregates in S. cerevisiae and that both the aggregation pattern and the abundance of PGM1 L516P are chaperone-dependent. Finally, using computational methods, we perform saturation mutagenesis to assess the impact of all possible single residue substitutions in the PGM1 protein. These analyses identify numerous missense variants with predicted detrimental effects on protein function and stability. We suggest that many disease-linked PGM1 variants are subject to PQC-linked degradation and that our in silico site-saturated data set may assist in the mechanistic interpretation of PGM1 variants.
Affiliation(s)
- Frederik Gouliaev
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2200N Copenhagen, Denmark
| | - Nicolas Jonsson
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2200N Copenhagen, Denmark
| | - Sarah Gersing
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2200N Copenhagen, Denmark
| | - Michael Lisby
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2200N Copenhagen, Denmark
| | - Kresten Lindorff-Larsen
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2200N Copenhagen, Denmark
| | - Rasmus Hartmann-Petersen
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2200N Copenhagen, Denmark
14
Qiu Y, Huang T, Cai YD. Review of predicting protein stability changes upon variations. Proteomics 2024; 24:e2300371. [PMID: 38643379 DOI: 10.1002/pmic.202300371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/22/2024]
Abstract
Forecasting alterations in protein stability caused by variations holds immense importance. Improving the thermal stability of proteins is important for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases containing protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Various publicly available databases for protein stability prediction are introduced. Furthermore, state-of-the-art computational approaches for anticipating protein stability changes due to variants are reviewed. Each method's types of features, base algorithm, and prediction results are also detailed. Additionally, some experimental approaches for verifying the prediction results of computational methods are introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models for future research directions.
Affiliation(s)
- Yiling Qiu
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
15
Jänes J, Müller M, Selvaraj S, Manoel D, Stephenson J, Gonçalves C, Lafita A, Polacco B, Obernier K, Alasoo K, Lemos MC, Krogan N, Martin M, Saraiva LR, Burke D, Beltrao P. Predicted mechanistic impacts of human protein missense variants. bioRxiv 2024:2024.05.29.596373. [PMID: 38854010 PMCID: PMC11160786 DOI: 10.1101/2024.05.29.596373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Genome sequencing efforts have led to the discovery of tens of millions of protein missense variants found in the human population with the majority of these having no annotated role and some likely contributing to trait variation and disease. Sequence-based artificial intelligence approaches have become highly accurate at predicting variants that are detrimental to the function of proteins but they do not inform on mechanisms of disruption. Here we combined sequence and structure-based methods to perform proteome-wide prediction of deleterious variants with information on their impact on protein stability, protein-protein interactions and small-molecule binding pockets. AlphaFold2 structures were used to predict approximately 100,000 small-molecule binding pockets and stability changes for over 200 million variants. To inform on protein-protein interfaces we used AlphaFold2 to predict structures for nearly 500,000 protein complexes. We illustrate the value of mechanism-aware variant effect predictions to study the relation between protein stability and abundance and the structural properties of interfaces underlying trans protein quantitative trait loci (pQTLs). We characterised the distribution of mechanistic impacts of protein variants found in patients and experimentally studied example disease linked variants in FGFR1.
Affiliation(s)
- Jürgen Jänes
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marc Müller
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Senthil Selvaraj
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | - Diogo Manoel
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | - James Stephenson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - Catarina Gonçalves
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | | | - Benjamin Polacco
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
| | - Kirsten Obernier
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Manuel C. Lemos
- CICS-UBI, Health Sciences Research Centre, University of Beira Interior, 6200-506, Covilhã, Portugal
| | - Nevan Krogan
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- J. David Gladstone Institutes, San Francisco, CA, USA
| | - Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - Luis R. Saraiva
- Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
| | - David Burke
- Faculty of Life Sciences and Medicine, King’s College, London, UK
| | - Pedro Beltrao
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
16
Grønbæk-Thygesen M, Voutsinos V, Johansson KE, Schulze TK, Cagiada M, Pedersen L, Clausen L, Nariya S, Powell RL, Stein A, Fowler DM, Lindorff-Larsen K, Hartmann-Petersen R. Deep mutational scanning reveals a correlation between degradation and toxicity of thousands of aspartoacylase variants. Nat Commun 2024; 15:4026. [PMID: 38740822 DOI: 10.1038/s41467-024-48481-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 05/02/2024] [Indexed: 05/16/2024] Open
Abstract
Unstable proteins are prone to form non-native interactions with other proteins and thereby may become toxic. To mitigate this, destabilized proteins are targeted by the protein quality control network. Here we present systematic studies of the cytosolic aspartoacylase, ASPA, where variants are linked to Canavan disease, a lethal neurological disorder. We determine the abundance of 6152 of the 6260 (~98%) possible single amino acid substitutions and nonsense ASPA variants in human cells. Most low abundance variants are degraded through the ubiquitin-proteasome pathway and become toxic upon prolonged expression. The data correlate with predicted changes in thermodynamic stability and with evolutionary conservation, and they separate disease-linked variants from benign variants. Mapping of degradation signals (degrons) shows that these are often buried and that the C-terminal region functions as a degron. The data can be used to interpret Canavan disease variants and provide insight into the relationship between protein stability, degradation and cell fitness.
Affiliation(s)
- Martin Grønbæk-Thygesen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Vasileios Voutsinos
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Thea K Schulze
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Line Pedersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Lene Clausen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Snehal Nariya
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Rachel L Powell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Amelie Stein
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
17
Wagner A. Genotype sampling for deep-learning assisted experimental mapping of a combinatorially complete fitness landscape. Bioinformatics 2024; 40:btae317. [PMID: 38745436 PMCID: PMC11132821 DOI: 10.1093/bioinformatics/btae317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION: Experimental characterization of fitness landscapes, which map genotypes onto fitness, is important for both evolutionary biology and protein engineering. It faces a fundamental obstacle in the astronomical number of genotypes whose fitness needs to be measured for any one protein. Deep learning may help to predict the fitness of many genotypes from a smaller neural network training sample of genotypes with experimentally measured fitness. Here I use a recently published experimentally mapped fitness landscape of more than 260 000 protein genotypes to ask how such sampling is best performed. RESULTS: I show that multilayer perceptrons, recurrent neural networks, convolutional networks, and transformers can explain more than 90% of fitness variance in the data. In addition, 90% of this performance is reached with a training sample comprising merely ≈10³ sequences. Generalization to unseen test data is best when training data is sampled randomly and uniformly, or sampled to minimize the number of synonymous sequences. In contrast, sampling to maximize sequence diversity or codon usage bias reduces performance substantially. These observations hold for more than one network architecture. Simple sampling strategies may perform best when training deep learning neural networks to map fitness landscapes from experimental data. AVAILABILITY AND IMPLEMENTATION: The fitness landscape data analyzed here is publicly available as described previously (Papkou et al. 2023). All code used to analyze this landscape is publicly available at https://github.com/andreas-wagner-uzh/fitness_landscape_sampling.
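As an illustration of the uniform random sampling strategy this abstract refers to, the following is a minimal Python sketch and not the author's code (that is in the repository linked in the abstract). The function name `sample_uniform`, the toy genotype pool, and the sample size are all assumptions made for the example.

```python
import random


def sample_uniform(genotypes, n_train, seed=0):
    """Draw n_train genotypes uniformly at random, without replacement."""
    rng = random.Random(seed)  # local generator so the draw is reproducible
    return rng.sample(list(genotypes), n_train)


# Toy genotype pool (hypothetical): all 64 three-letter DNA sequences.
pool = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]

train = sample_uniform(pool, n_train=10, seed=1)
train_set = set(train)
# Everything not drawn for training is held out for testing.
held_out = [g for g in pool if g not in train_set]
```

Fixing the seed keeps the train/test split reproducible, which matters when comparing sampling strategies against each other.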
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, 87501 NM, United States
18
Carter JJ, Walker TM, Walker AS, Whitfield MG, Morlock GP, Lynch CI, Adlard D, Peto TEA, Posey JE, Crook DW, Fowler PW. Prediction of pyrazinamide resistance in Mycobacterium tuberculosis using structure-based machine-learning approaches. JAC Antimicrob Resist 2024; 6:dlae037. [PMID: 38500518 PMCID: PMC10946228 DOI: 10.1093/jacamr/dlae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 02/19/2024] [Indexed: 03/20/2024] Open
Abstract
Background: Pyrazinamide is one of four first-line antibiotics used to treat tuberculosis; however, antibiotic susceptibility testing for pyrazinamide is challenging. Resistance to pyrazinamide is primarily driven by genetic variation in pncA, encoding an enzyme that converts pyrazinamide into its active form. Methods: We curated a dataset of 664 non-redundant, missense amino acid mutations in PncA with associated high-confidence phenotypes from published studies and then trained three different machine-learning models to predict pyrazinamide resistance. All models had access to a range of protein structural-, chemical- and sequence-based features. Results: The best model, a gradient-boosted decision tree, achieved a sensitivity of 80.2% and a specificity of 76.9% on the hold-out test dataset. The clinical performance of the models was then estimated by predicting the binary pyrazinamide resistance phenotype of 4027 samples harbouring 367 unique missense mutations in pncA derived from 24 231 clinical isolates. Conclusions: This work demonstrates how machine learning can enhance the sensitivity/specificity of pyrazinamide resistance prediction in genetics-based clinical microbiology workflows, highlights novel mutations for future biochemical investigation, and is a proof of concept for using this approach in other drugs.
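For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows how sensitivity and specificity are computed from binary labels. It is a generic illustration, not the study's pipeline; the function name and the toy label vectors are hypothetical, with 1 standing for "resistant" and 0 for "susceptible".

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Guard against an empty class; NaN signals "undefined".
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (illustrative only): 4 resistant and 4 susceptible samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)  # (0.75, 0.75)
```

Sensitivity is the fraction of truly resistant samples that are called resistant, and specificity is the fraction of truly susceptible samples that are called susceptible.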
Affiliation(s)
- Joshua J Carter
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
| | - Timothy M Walker
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
| | - A Sarah Walker
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- NIHR Health Protection Research Unit in Healthcare Associated Infection and Antimicrobial Resistance, University of Oxford, Oxford, UK
| | - Michael G Whitfield
- Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, SAMRC Centre for Tuberculosis Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Stellenbosch University, Tygerberg, South Africa
| | - Glenn P Morlock
- Division of Tuberculosis Elimination, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Charlotte I Lynch
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
| | - Dylan Adlard
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
| | - Timothy E A Peto
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
| | - James E Posey
- Division of Tuberculosis Elimination, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Derrick W Crook
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- NIHR Health Protection Research Unit in Healthcare Associated Infection and Antimicrobial Resistance, University of Oxford, Oxford, UK
| | - Philip W Fowler
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
19
Gelman S, Johnson B, Freschlin C, D'Costa S, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. bioRxiv 2024:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
Affiliation(s)
- Sam Gelman
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
| | - Bryce Johnson
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
| | | | - Sameer D'Costa
- Department of Biochemistry, University of Wisconsin-Madison
| | - Anthony Gitter
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
20
Philipp M, Moth CW, Ristic N, Tiemann JK, Seufert F, Panfilova A, Meiler J, Hildebrand PW, Stein A, Wiegreffe D, Staritzbichler R. MutationExplorer: A webserver for mutation of proteins and 3D visualization of energetic impacts. bioRxiv 2024:2023.03.23.533926. [PMID: 38464310 PMCID: PMC10925206 DOI: 10.1101/2023.03.23.533926] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/12/2024]
Abstract
The possible effects of mutations on stability and function of a protein can only be understood in the context of protein 3D structure. The MutationExplorer webserver maps sequence changes onto protein structures and allows users to study variation by inputting sequence changes. As the user enters variants, the 3D model evolves, and estimated changes in energy are highlighted. In addition to a basic per-residue input format, MutationExplorer also accepts the upload of an entire replacement sequence. Previously the purview of desktop applications, such an upload can back-mutate PDB structures to wildtype sequence in a single step. Another supported variation source is human single nucleotide polymorphisms (SNPs), supplied as genomic coordinates in VCF format.
Affiliation(s)
- Michelle Philipp
- Leipzig University, Image and Signal Processing Group, Leipzig, Germany
| | - Christopher W. Moth
- Vanderbilt University, Center for Structural Biology, Nashville, Tennessee, USA
| | - Nikola Ristic
- Leipzig University, Institute for Medical Physics and Biophysics, Leipzig, Germany
| | - Johanna K.S. Tiemann
- University of Copenhagen, Linderstrøm-Lang Centre for Protein Science, Copenhagen N., Denmark, and Novozymes A/S, Lyngby, Denmark
| | - Florian Seufert
- Leipzig University, Institute for Medical Physics and Biophysics, Leipzig, Germany
| | - Aleksandra Panfilova
- University of Copenhagen, Linderstrøm-Lang Centre for Protein Science, Copenhagen N., Denmark
| | - Jens Meiler
- Vanderbilt University, Center for Structural Biology, Nashville, Tennessee, USA, and Leipzig University Medical School, Institute for Drug Discovery, Leipzig, Germany
| | - Peter W. Hildebrand
- Leipzig University, Institute for Medical Physics and Biophysics, Leipzig, Germany, and Charité Universitätsmedizin Berlin, Institute of Medical Physics and Biophysics, Berlin, Germany, and Berlin Institute of Health, Berlin, Germany
| | - Amelie Stein
- University of Copenhagen, Linderstrøm-Lang Centre for Protein Science, Copenhagen N., Denmark
| | - Daniel Wiegreffe
- Leipzig University, Image and Signal Processing Group, Leipzig, Germany
| | - René Staritzbichler
- Leipzig University, Institute for Medical Physics and Biophysics, Leipzig, Germany
21
Clausen L, Voutsinos V, Cagiada M, Johansson KE, Grønbæk-Thygesen M, Nariya S, Powell RL, Have MKN, Oestergaard VH, Stein A, Fowler DM, Lindorff-Larsen K, Hartmann-Petersen R. A mutational atlas for Parkin proteostasis. Nat Commun 2024; 15:1541. [PMID: 38378758 PMCID: PMC10879094 DOI: 10.1038/s41467-024-45829-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 02/01/2024] [Indexed: 02/22/2024] Open
Abstract
Proteostasis can be disturbed by mutations affecting folding and stability of the encoded protein. An example is the ubiquitin ligase Parkin, where gene variants result in autosomal recessive Parkinsonism. To uncover the pathological mechanism and provide comprehensive genotype-phenotype information, variant abundance by massively parallel sequencing (VAMP-seq) is leveraged to quantify the abundance of Parkin variants in cultured human cells. The resulting mutational map, covering 9219 out of the 9300 possible single-site amino acid substitutions and nonsense Parkin variants, shows that most low abundance variants are proteasome targets and are located within the structured domains of the protein. Half of the known disease-linked variants are found at low abundance. Systematic mapping of degradation signals (degrons) reveals an exposed degron region proximal to the so-called "activation element". This work provides examples of how missense variants may cause degradation either via destabilization of the native protein, or by introducing local signals for degradation.
Affiliation(s)
- Lene Clausen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Vasileios Voutsinos
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Martin Grønbæk-Thygesen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Snehal Nariya
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Rachel L Powell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Magnus K N Have
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | | | - Amelie Stein
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
22
Pun MN, Ivanov A, Bellamy Q, Montague Z, LaMont C, Bradley P, Otwinowski J, Nourmohammad A. Learning the shape of protein microenvironments with a holographic convolutional neural network. Proc Natl Acad Sci U S A 2024; 121:e2300838121. [PMID: 38300863 PMCID: PMC10861886 DOI: 10.1073/pnas.2300838121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 11/29/2023] [Indexed: 02/03/2024] Open
Abstract
Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce a holographic convolutional neural network (H-CNN) for proteins, a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.
Affiliation(s)
- Michael N. Pun
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
| | - Andrew Ivanov
- Department of Physics, University of Washington, Seattle, WA 98195
| | - Quinn Bellamy
- Department of Physics, University of Washington, Seattle, WA 98195
| | - Zachary Montague
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
| | - Colin LaMont
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
| | - Philip Bradley
- Fred Hutchinson Cancer Center, Seattle, WA 98102
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Jakub Otwinowski
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Dyno Therapeutics, Watertown, MA 02472
| | - Armita Nourmohammad
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Fred Hutchinson Cancer Center, Seattle, WA 98102
- Department of Applied Mathematics, University of Washington, Seattle, WA 98105
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195
| |
Collapse
|
23
Dieckhaus H, Brocidiacono M, Randolph NZ, Kuhlman B. Transfer learning to leverage larger datasets for improved prediction of protein stability changes. Proc Natl Acad Sci U S A 2024; 121:e2314853121. [PMID: 38285937 PMCID: PMC10861915 DOI: 10.1073/pnas.2314853121] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/27/2023] [Accepted: 12/26/2023] [Indexed: 01/31/2024] Open
Abstract
Amino acid mutations that lower a protein's thermodynamic stability are implicated in numerous diseases, and engineered proteins with enhanced stability can be important in research and medicine. Computational methods for predicting how mutations perturb protein stability are, therefore, of great interest. Despite recent advancements in protein design using deep learning, in silico prediction of stability changes has remained challenging, in part due to a lack of large, high-quality training datasets for model development. Here, we describe ThermoMPNN, a deep neural network trained to predict stability changes for protein point mutations given an initial structure. In doing so, we demonstrate the utility of a recently released megascale stability dataset for training a robust stability model. We also employ transfer learning to leverage a second, larger dataset by using learned features extracted from ProteinMPNN, a deep neural network trained to predict a protein's amino acid sequence given its three-dimensional structure. We show that our method achieves state-of-the-art performance on established benchmark datasets using a lightweight model architecture that allows for rapid, scalable predictions. Finally, we make ThermoMPNN readily available as a tool for stability prediction and design.
Affiliation(s)
- Henry Dieckhaus
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC 27599
- Michael Brocidiacono
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC 27599
- Nicholas Z. Randolph
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, NC 27599
- Brian Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, NC 27599
24
Cheon H, Kim JH, Kim JS, Park JB. Valorization of single-carbon chemicals by using carboligases as key enzymes. Curr Opin Biotechnol 2024; 85:103047. [PMID: 38128199 DOI: 10.1016/j.copbio.2023.103047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/23/2023]
Abstract
Single-carbon (C1) biorefinery plays a key role in the consumption of global greenhouse gases and a circular carbon economy. Here, we focus on the valorization of C1 compounds (e.g., methanol, formaldehyde, and formate) into multicarbon products, including bioplastic monomers, glycolate, and ethylene glycol. For instance, methanol, derived from the oxidation of CH4, can be converted into glycolate, ethylene glycol, or erythrulose via formaldehyde and glycolaldehyde, employing C1 and/or C2 carboligases as essential enzymes. Escherichia coli was engineered to convert formate, produced from CO via CO2 or from CO2 directly, into glycolate. Recent progress in the design of biotransformation pathways, enzyme discovery, and engineering, as well as whole-cell biocatalyst engineering for C1 biorefinery, is addressed in this review.
Affiliation(s)
- Huijin Cheon
  - Department of Food Science and Biotechnology, Ewha Womans University, Seoul 03760, Republic of Korea
- Jun-Hong Kim
  - Department of Chemistry, Chonnam National University, Gwangju 61186, Republic of Korea
- Jeong-Sun Kim
  - Department of Chemistry, Chonnam National University, Gwangju 61186, Republic of Korea
- Jin-Byung Park
  - Department of Food Science and Biotechnology, Ewha Womans University, Seoul 03760, Republic of Korea
25
Notin P, Rollins N, Gal Y, Sander C, Marks D. Machine learning for functional protein design. Nat Biotechnol 2024; 42:216-228. [PMID: 38361074 DOI: 10.1038/s41587-024-02127-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 07/01/2023] [Accepted: 01/05/2024] [Indexed: 02/17/2024]
Abstract
Recent breakthroughs in AI coupled with the rapid accumulation of protein sequence and structure data have radically transformed computational protein design. New methods promise to escape the constraints of natural and laboratory evolution, accelerating the generation of proteins for applications in biotechnology and medicine. To make sense of the exploding diversity of machine learning approaches, we introduce a unifying framework that classifies models on the basis of their use of three core data modalities: sequences, structures and functional labels. We discuss the new capabilities and outstanding challenges for the practical design of enzymes, antibodies, vaccines, nanomachines and more. We then highlight trends shaping the future of this field, from large-scale assays to more robust benchmarks, multimodal foundation models, enhanced sampling strategies and laboratory automation.
Affiliation(s)
- Pascal Notin
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Computer Science, University of Oxford, Oxford, UK
- Yarin Gal
  - Department of Computer Science, University of Oxford, Oxford, UK
- Chris Sander
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Debora Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
26
Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
Affiliation(s)
- Feifan Zheng
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
27
Musil M, Jezik A, Horackova J, Borko S, Kabourek P, Damborsky J, Bednar D. FireProt 2.0: web-based platform for the fully automated design of thermostable proteins. Brief Bioinform 2023; 25:bbad425. [PMID: 38018911 PMCID: PMC10685400 DOI: 10.1093/bib/bbad425] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2023] [Revised: 10/25/2023] [Accepted: 11/01/2023] [Indexed: 11/30/2023] Open
Abstract
Thermostable proteins find their use in numerous biomedical and biotechnological applications. However, the computational design of stable proteins often results in single-point mutations with a limited effect on protein stability. Moreover, the construction of stable multiple-point mutants can prove difficult due to the possibility of antagonistic effects between individual mutations. The FireProt protocol enables the automated computational design of highly stable multiple-point mutants. FireProt 2.0 builds on top of the previously published FireProt web, retaining the original functionality and expanding it with several new stabilization strategies. FireProt 2.0 integrates the AlphaFold database and homology modeling for structure prediction, enabling calculations starting from a sequence. Multiple-point designs are constructed using the Bron-Kerbosch algorithm, minimizing the antagonistic effect between the individual mutations. Users can now limit the FireProt calculation to a set of user-defined mutations, run a saturation mutagenesis of the whole protein, or select rigidifying mutations based on B-factors. The evolution-based back-to-consensus strategy is complemented by ancestral sequence reconstruction. FireProt 2.0 is significantly faster, and a reworked graphical user interface broadens the tool's availability even to users with older hardware. FireProt 2.0 is freely available at http://loschmidt.chemi.muni.cz/fireprotweb.
Affiliation(s)
- Milos Musil
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Andrej Jezik
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Jana Horackova
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- Simeon Borko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Petr Kabourek
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
28
Dieckhaus H, Brocidiacono M, Randolph N, Kuhlman B. Transfer learning to leverage larger datasets for improved prediction of protein stability changes. bioRxiv 2023:2023.07.27.550881. [PMID: 37547004 PMCID: PMC10402116 DOI: 10.1101/2023.07.27.550881] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/08/2023]
Abstract
Amino acid mutations that lower a protein's thermodynamic stability are implicated in numerous diseases, and engineered proteins with enhanced stability are important in research and medicine. Computational methods for predicting how mutations perturb protein stability are therefore of great interest. Despite recent advancements in protein design using deep learning, in silico prediction of stability changes has remained challenging, in part due to a lack of large, high-quality training datasets for model development. Here we introduce ThermoMPNN, a deep neural network trained to predict stability changes for protein point mutations given an initial structure. In doing so, we demonstrate the utility of a newly released mega-scale stability dataset for training a robust stability model. We also employ transfer learning to leverage a second, larger dataset by using learned features extracted from a deep neural network trained to predict a protein's amino acid sequence given its three-dimensional structure. We show that our method achieves competitive performance on established benchmark datasets using a lightweight model architecture that allows for rapid, scalable predictions. Finally, we make ThermoMPNN readily available as a tool for stability prediction and design.
Affiliation(s)
- Henry Dieckhaus
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, North Carolina, USA
- Michael Brocidiacono
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, North Carolina, USA
- Nicholas Randolph
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Brian Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
29
Cagiada M, Bottaro S, Lindemose S, Schenstrøm SM, Stein A, Hartmann-Petersen R, Lindorff-Larsen K. Discovering functionally important sites in proteins. Nat Commun 2023; 14:4175. [PMID: 37443362 PMCID: PMC10345196 DOI: 10.1038/s41467-023-39909-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 05/19/2023] [Accepted: 07/02/2023] [Indexed: 07/15/2023] Open
Abstract
Proteins play important roles in biology, biotechnology and pharmacology, and missense variants are a common cause of disease. Discovering functionally important sites in proteins is a central but difficult problem because of the lack of large, systematic data sets. Sequence conservation can highlight residues that are functionally important but is often convoluted with a signal for preserving structural stability. We here present a machine learning method to predict functional sites by combining statistical models for protein sequences with biophysical models of stability. We train the model using multiplexed experimental data on variant effects and validate it broadly. We show how the model can be used to discover active sites, as well as regulatory and binding sites. We illustrate the utility of the model by prospective prediction and subsequent experimental validation on the functional consequences of missense variants in HPRT1 which may cause Lesch-Nyhan syndrome, and pinpoint the molecular mechanisms by which they cause disease.
Affiliation(s)
- Matteo Cagiada
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sandro Bottaro
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Søren Lindemose
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Signe M Schenstrøm
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Hartmann-Petersen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
30
Gerasimavicius L, Livesey BJ, Marsh JA. Correspondence between functional scores from deep mutational scans and predicted effects on protein stability. Protein Sci 2023; 32:e4688. [PMID: 37243972 PMCID: PMC10273344 DOI: 10.1002/pro.4688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 04/19/2023] [Accepted: 05/24/2023] [Indexed: 05/29/2023]
Abstract
Many methodologically diverse computational methods have been applied to the growing challenge of predicting and interpreting the effects of protein variants. As many pathogenic mutations have a perturbing effect on protein stability or intermolecular interactions, one highly interpretable approach is to use protein structural information to model the physical impacts of variants and predict their likely effects on protein stability and interactions. Previous efforts have assessed the accuracy of stability predictors in reproducing thermodynamically accurate values and evaluated their ability to distinguish between known pathogenic and benign mutations. Here, we take an alternate approach, and explore how well stability predictor scores correlate with functional impacts derived from deep mutational scanning (DMS) experiments. In this work, we compare the predictions of 9 protein stability-based tools against mutant protein fitness values from 49 independent DMS datasets, covering 170,940 unique single amino acid variants. We find that FoldX and Rosetta show the strongest correlations with DMS-based functional scores, similar to their previous top performance in distinguishing between pathogenic and benign variants. For both methods, performance is considerably improved when considering intermolecular interactions from protein complex structures, when available. Furthermore, using these two predictors, we derive a "Foldetta" consensus score, which improves upon the performance of both, and manages to match dedicated variant effect predictors in reflecting variant functional impacts. Finally, we also highlight that predicted stability effects show consistently higher correlations with certain DMS experimental phenotypes, particularly those based upon protein abundance, and, in certain cases, can significantly outcompete sequence-based variant effect prediction methodologies for predicting functional scores from DMS experiments.
Affiliation(s)
- Lukas Gerasimavicius
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
- Benjamin J. Livesey
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
- Joseph A. Marsh
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
31
Blaabjerg LM, Kassem MM, Good LL, Jonsson N, Cagiada M, Johansson KE, Boomsma W, Stein A, Lindorff-Larsen K. Rapid protein stability prediction using deep learning representations. eLife 2023; 12:e82593. [PMID: 37184062 PMCID: PMC10266766 DOI: 10.7554/elife.82593] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 08/10/2022] [Accepted: 05/12/2023] [Indexed: 05/16/2023] Open
Abstract
Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering, and when elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ~230 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available, including via a Web interface, and enables large-scale analyses of stability in experimental and predicted protein structures.
Affiliation(s)
- Lasse M Blaabjerg
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Maher M Kassem
  - Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Lydia L Good
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Nicolas Jonsson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matteo Cagiada
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
  - Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark