1
Dieckhaus H, Brocidiacono M, Randolph NZ, Kuhlman B. Transfer learning to leverage larger datasets for improved prediction of protein stability changes. Proc Natl Acad Sci U S A 2024; 121:e2314853121. [PMID: 38285937 PMCID: PMC10861915 DOI: 10.1073/pnas.2314853121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Accepted: 12/26/2023] [Indexed: 01/31/2024]
Abstract
Amino acid mutations that lower a protein's thermodynamic stability are implicated in numerous diseases, and engineered proteins with enhanced stability can be important in research and medicine. Computational methods for predicting how mutations perturb protein stability are, therefore, of great interest. Despite recent advancements in protein design using deep learning, in silico prediction of stability changes has remained challenging, in part due to a lack of large, high-quality training datasets for model development. Here, we describe ThermoMPNN, a deep neural network trained to predict stability changes for protein point mutations given an initial structure. In doing so, we demonstrate the utility of a recently released megascale stability dataset for training a robust stability model. We also employ transfer learning to leverage a second, larger dataset by using learned features extracted from ProteinMPNN, a deep neural network trained to predict a protein's amino acid sequence given its three-dimensional structure. We show that our method achieves state-of-the-art performance on established benchmark datasets using a lightweight model architecture that allows for rapid, scalable predictions. Finally, we make ThermoMPNN readily available as a tool for stability prediction and design.
Affiliation(s)
- Henry Dieckhaus
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC 27599
- Michael Brocidiacono
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC 27599
- Nicholas Z. Randolph
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, NC 27599
- Brian Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, NC 27599
  - Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, NC 27599
2
Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
Affiliation(s)
- Feifan Zheng
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
3
Zafar R, Awais M. Molecular identification of missense variants in SLC3A1 gene; an approach leading to computer-aided drug design for cystinuria. Gene 2023; 888:147802. [PMID: 37716586 DOI: 10.1016/j.gene.2023.147802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 09/09/2023] [Accepted: 09/13/2023] [Indexed: 09/18/2023]
Abstract
Cystinuria is a rare congenital disorder characterized by the formation of cystine stones in the urinary system, mainly the kidneys and urinary tract. It follows an autosomal recessive inheritance pattern, in which both parents carry the mutant allele. Cystine is an oxidized dimeric form of the amino acid cysteine, which forms shiny greenish-yellow crystals larger than 5 mm. A minor genetic defect in the SLC3A1 gene downregulates the cystine transporter, the rBAT protein, which absorbs cystine and other dibasic amino acids in the proximal tubule of the nephron, causing cystinuria. Computational and molecular analysis of the SLC3A1 gene was performed to identify the deleterious missense single nucleotide variations (mSNVs) linked with cystinuria in the Pakistani population. In silico analysis of nsSNPs across the whole SLC3A1 gene revealed that exons 1, 6 and 10 are hotspot areas, which potentially alter the protein structure and function. Three SNVs were identified: one synonymous SNV, A186C in exon 1, and two mSNVs, G314T in exon 1 and G44972A in exon 10. Both mSNVs were confirmed by ARMS PCR in all 68 samples. The results showed that 10% of the patients have G314T, 16% have G44972A and 74% have both of these mSNVs. Both of these mSNVs were involved in the structural and functional deterioration of the rBAT protein. Additionally, computer-aided drug design tools were used to design a diaminobenzylpyrimidine drug around the mutant residues, which exhibited the lowest binding affinity with the target compared to the previously reported cystine-binding thiol drugs. In future, the present study could be extended to a large scale for mass screening of reported SNVs and mSNVs, which could ultimately lead to the development of knockouts for functional studies and treatments.
Affiliation(s)
- Rimsha Zafar
  - Department of Biotechnology, Faculty of Science and Technology, University of Central Punjab, Lahore 54782, Pakistan
- Muhammad Awais
  - Department of Biotechnology, Faculty of Science and Technology, University of Central Punjab, Lahore 54782, Pakistan
4
Thakur S, Planeta Kepp K, Mehra R. Predicting virus Fitness: Towards a structure-based computational model. J Struct Biol 2023; 215:108042. [PMID: 37931730 DOI: 10.1016/j.jsb.2023.108042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/12/2023] [Accepted: 11/03/2023] [Indexed: 11/08/2023]
Abstract
Predicting the impact of newly emerging virus mutations is of major interest in surveillance and for understanding the evolutionary forces acting on the pathogens. The SARS-CoV-2 surface spike-protein (S-protein) binds to human ACE2 receptors as a critical step in host cell infection. At the same time, S-protein binding to human antibodies neutralizes the virus and prevents interaction with ACE2. Here we combine these two binding properties in a simple virus fitness model, using structure-based computation of all possible mutation effects averaged over 10 ACE2 complexes and 10 antibody complexes of the S-protein (∼380,000 computed mutations), and validate the approach against diverse experimental binding/escape data of ACE2 and antibodies. The ACE2-antibody selectivity change caused by mutation (i.e., the differential change in binding to ACE2 vs. immunity-inducing antibodies) is proposed to be a key metric of the fitness model, enabling systematic error cancelation when evaluated. In this model, new mutations become fixated if they increase the selective binding to ACE2 relative to circulating antibodies, assuming that both are present in the host in a competitive binding situation. We use this model to categorize viral mutations that may best reach ACE2 before being captured by antibodies. Our model may aid the understanding of variant-specific vaccines and molecular mechanisms of viral evolution in the context of a human host.
Affiliation(s)
- Shivani Thakur
  - Department of Chemistry, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India
- Kasper Planeta Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India
  - Department of Bioscience and Biomedical Engineering, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India
5
Azmi MB, Jawed A, Ahmed SDH, Naeem U, Feroz N, Saleem A, Sardar K, Qureshi SA, Azim MK. Understanding the impact of structural modifications at the NNAT gene's post-translational acetylation site: in silico approach for predicting its drug-interaction role in anorexia nervosa. Eat Weight Disord 2023; 28:97. [PMID: 37987927 PMCID: PMC10663277 DOI: 10.1007/s40519-023-01618-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Accepted: 10/18/2023] [Indexed: 11/22/2023]
Abstract
PURPOSE Anorexia nervosa (AN) is a neuropsychological public health concern with a socially disabling routine and affects a person's healthy relationship with food. The role of the NNAT (Neuronatin) gene in AN is well established. The impact of mutation at the protein's post-translational modification (PTM) site has been exclusively associated with the worsening of the protein's biochemical dynamics. METHODS To understand the relationship between genotype and phenotype, it is essential to investigate the appropriate molecular stability of the protein required for proper biological functioning. In this regard, we investigated the PTM-acetylation site of the NNAT gene in terms of 19 other specific amino acid probabilities in place of wild type (WT) through various in silico algorithms. Based on the highest pathogenic impact computed through the consensus classifier tool, we generated 3 residue-specific (K59D, P, W) structurally modified 3D models of NNAT. These models were further tested through the AutoDock Vina tool to compute the molecular drug binding affinities and inhibition constant (Ki) of structural variants and WT 3D models. RESULTS With trained in silico machine learning algorithms and a consensus classifier, the three structural modifications (K59D, P, W), which were also the most deleterious substitutions at the acetylation site of the NNAT gene, showed the highest structural destabilization and decreased molecular flexibility. The validation and quality assessment of the 3D models of these structural modifications and WT were performed. They were further docked with drugs used to manage AN; it was found that the ΔGbind (kcal/mol) values and the inhibition constants (Ki) were relatively lower in structurally modified models as compared to WT.
CONCLUSION We concluded that any future structural variation(s) at the PTM-acetylation site of the NNAT gene due to possible mutational consequences will serve as a basis to explore its relationship with the propensity of developing AN. LEVEL OF EVIDENCE No level of evidence-open access bioinformatics research.
Affiliation(s)
- Muhammad Bilal Azmi
  - Department of Biochemistry, Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Areesha Jawed
  - Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Syed Danish Haseen Ahmed
  - Department of Biochemistry, Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Unaiza Naeem
  - Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Nazia Feroz
  - Department of Biochemistry, Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Arisha Saleem
  - Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Kainat Sardar
  - Department of Biochemistry, University of Karachi, Karachi, Pakistan
  - Department of Chemistry, Bahria College NORE-1, Karachi, Pakistan
- M Kamran Azim
  - Department of Biosciences, Mohammad Ali Jinnah University, Karachi, Pakistan
6
Dieckhaus H, Brocidiacono M, Randolph N, Kuhlman B. Transfer learning to leverage larger datasets for improved prediction of protein stability changes. bioRxiv 2023:2023.07.27.550881. [PMID: 37547004 PMCID: PMC10402116 DOI: 10.1101/2023.07.27.550881] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/08/2023]
Abstract
Amino acid mutations that lower a protein's thermodynamic stability are implicated in numerous diseases, and engineered proteins with enhanced stability are important in research and medicine. Computational methods for predicting how mutations perturb protein stability are therefore of great interest. Despite recent advancements in protein design using deep learning, in silico prediction of stability changes has remained challenging, in part due to a lack of large, high-quality training datasets for model development. Here we introduce ThermoMPNN, a deep neural network trained to predict stability changes for protein point mutations given an initial structure. In doing so, we demonstrate the utility of a newly released mega-scale stability dataset for training a robust stability model. We also employ transfer learning to leverage a second, larger dataset by using learned features extracted from a deep neural network trained to predict a protein's amino acid sequence given its three-dimensional structure. We show that our method achieves competitive performance on established benchmark datasets using a lightweight model architecture that allows for rapid, scalable predictions. Finally, we make ThermoMPNN readily available as a tool for stability prediction and design.
Affiliation(s)
- Henry Dieckhaus
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, North Carolina, USA
- Michael Brocidiacono
  - Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, North Carolina, USA
- Nicholas Randolph
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Brian Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
7
Thakur S, Verma RK, Kepp KP, Mehra R. Modelling SARS-CoV-2 spike-protein mutation effects on ACE2 binding. J Mol Graph Model 2023; 119:108379. [PMID: 36481587 PMCID: PMC9690204 DOI: 10.1016/j.jmgm.2022.108379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 11/04/2022] [Accepted: 11/21/2022] [Indexed: 11/26/2022]
Abstract
The binding affinity of the SARS-CoV-2 spike (S)-protein to the human membrane protein ACE2 is critical for virus function. Computational structure-based screening of new S-protein mutations for ACE2 binding lends promise to rationalize virus function directly from protein structure and ideally aid early detection of potentially concerning variants. We used a computational protocol based on cryo-electron microscopy structures of the S-protein to estimate the change in ACE2-affinity due to S-protein mutation (ΔΔGbind) in good trend agreement with experimental ACE2 affinities. We then expanded predictions to all possible S-protein mutations in 21 different S-protein-ACE2 complexes (400,000 ΔΔGbind data points in total), using mutation group comparisons to reduce systematic errors. The results suggest that mutations that have arisen in major variants as a group maintain ACE2 affinity significantly more than random mutations in the total protein, at the interface, and at evolvable sites. Omicron mutations as a group had a modest change in binding affinity compared to mutations in other major variants. The single-mutation effects seem consistent with ACE2 binding being optimized and maintained in omicron, despite increased importance of other selection pressures (antigenic drift), however, epistasis, glycosylation and in vivo conditions will modulate these effects. Computational prediction of SARS-CoV-2 evolution remains far from achieved, but the feasibility of large-scale computation is substantially aided by using many structures and mutation groups rather than single mutation effects, which are very uncertain. Our results demonstrate substantial challenges but indicate ways forward to improve the quality of computer models for assessing SARS-CoV-2 mutation effects.
Affiliation(s)
- Shivani Thakur
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India
- Rajaneesh Kumar Verma
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India
- Kasper Planeta Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800, Kongens Lyngby, Denmark
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India
8
Stability and expression of SARS-CoV-2 spike-protein mutations. Mol Cell Biochem 2022; 478:1269-1280. [PMID: 36302994 PMCID: PMC9612610 DOI: 10.1007/s11010-022-04588-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 10/12/2022] [Indexed: 12/02/2022]
Abstract
Protein fold stability likely plays a role in SARS-CoV-2 S-protein evolution, together with ACE2 binding and antibody evasion. While few thermodynamic stability data are available for S-protein mutants, many systematic experimental data exist for their expression. In this paper, we explore whether such expression levels relate to the thermodynamic stability of the mutants. We studied mutation-induced SARS-CoV-2 S-protein fold stability, as computed by three very distinct methods and eight different protein structures to account for method- and structure-dependencies. For all methods and structures used (24 comparisons), computed stability changes correlate significantly (99% confidence level) with experimental yeast expression from the literature, such that higher expression is associated with relatively higher fold stability. Also significant, albeit weaker, correlations were seen between stability and ACE2 binding effects. The effect of thermodynamic fold stability may be direct or a correlate of amino acid or site properties, notably the solvent exposure of the site. Correlation between computed stability and experimental expression and ACE2 binding suggests that functional properties of the SARS-CoV-2 S-protein mutant space are largely determined by a few simple features, due to underlying correlations. Our study lends promise to the development of computational tools that may ideally aid in understanding and predicting SARS-CoV-2 S-protein evolution.
9
Structural heterogeneity and precision of implications drawn from cryo-electron microscopy structures: SARS-CoV-2 spike-protein mutations as a test case. European Biophysics Journal 2022; 51:555-568. [PMID: 36167828 PMCID: PMC9514682 DOI: 10.1007/s00249-022-01619-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 09/19/2022] [Indexed: 11/18/2022]
Abstract
Protein structures may be used to draw functional implications at the residue level, but how sensitive are these implications to the exact structure used? Calculation of the effects of SARS-CoV-2 S-protein mutations based on experimental cryo-electron microscopy structures have been abundant during the pandemic. To understand the precision of such estimates, we studied three distinct methods to estimate stability changes for all possible mutations in 23 different S-protein structures (3.69 million ΔΔG values in total) and explored how random and systematic errors can be remedied by structure-averaged mutation group comparisons. We show that computational estimates have low precision, due to method and structure heterogeneity making results for single mutations uninformative. However, structure-averaged differences in mean effects for groups of substitutions can yield significant results. Illustrating this protocol, functionally important natural mutations, despite individual variations, average to a smaller stability impact compared to other possible mutations, independent of conformational state (open, closed). In summary, we document substantial issues with precision in structure-based protein modeling and recommend sensitivity tests to quantify these effects, but also suggest partial solutions to the problem in the form of structure-averaged “ensemble” estimates for groups of residues when multiple structures are available.
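The structure-averaged group comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' protocol: the function names, the input layout (one dictionary of ΔΔG values per structure), the mutation labels, and the toy numbers are all assumptions made for the example, and no significance test is included.

```python
from statistics import mean

def structure_average(ddg_by_structure):
    """Average each mutation's ddG over every structure that reports it.

    ddg_by_structure: {structure_id: {mutation_label: ddg}} (hypothetical layout).
    """
    collected = {}
    for per_structure in ddg_by_structure.values():
        for mutation, ddg in per_structure.items():
            collected.setdefault(mutation, []).append(ddg)
    return {mutation: mean(values) for mutation, values in collected.items()}

def group_mean_difference(averaged, group):
    """Mean structure-averaged ddG of a mutation group minus that of all other mutations."""
    in_group = [v for m, v in averaged.items() if m in group]
    others = [v for m, v in averaged.items() if m not in group]
    return mean(in_group) - mean(others)

# Toy, made-up values purely for illustration (two structures, three mutations).
toy = {
    "structure_1": {"mutA": 0.2, "mutB": 1.0, "mutC": 2.0},
    "structure_2": {"mutA": 0.4, "mutB": 1.4, "mutC": 1.0},
}
averaged = structure_average(toy)
difference = group_mean_difference(averaged, {"mutA"})
```

Averaging over structures first, and only then comparing group means, is what lets structure-specific noise partly cancel; a real analysis would add a statistical test on the group difference.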
10
Iqbal S, Ge F, Li F, Akutsu T, Zheng Y, Gasser RB, Yu DJ, Webb GI, Song J. PROST: AlphaFold2-aware Sequence-Based Predictor to Estimate Protein Stability Changes upon Missense Mutations. J Chem Inf Model 2022; 62:4270-4282. [PMID: 35973091 DOI: 10.1021/acs.jcim.2c00799] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 01/17/2023]
Abstract
An essential step in engineering proteins and understanding disease-causing missense mutations is to accurately model protein stability changes when such mutations occur. Here, we developed a new sequence-based predictor for the protein stability (PROST) change (Gibbs free energy change, ΔΔG) upon a single-point missense mutation. PROST extracts multiple descriptors from the most promising sequence-based predictors, such as BoostDDG, SAAFEC-SEQ, and DDGun. PROST also extracts descriptors from iFeature and AlphaFold2. The extracted descriptors include sequence-based features, physicochemical properties, evolutionary information, evolutionary-based physicochemical properties, and predicted structural features. The PROST predictor is a weighted average ensemble model based on extreme gradient boosting (XGBoost) decision trees and an extra-trees regressor; PROST is trained on both direct and hypothetical reverse mutations using the S5294 data set (S2647 direct mutations + S2647 inverse mutations). The parameters for the PROST model are optimized using grid searching with 5-fold cross-validation, and feature importance analysis unveils the most relevant features. The performance of PROST is evaluated in a blinded manner, employing nine distinct data sets and existing state-of-the-art sequence-based and structure-based predictors. This method consistently performs well on frataxin, S217, S349, Ssym, S669, Myoglobin, and CAGI5 data sets in blind tests and similarly to the state-of-the-art predictors for p53 and S276 data sets. When the performance of PROST is compared with the latest predictors such as BoostDDG, SAAFEC-SEQ, ACDC-NN-seq, and DDGun, PROST dominates these predictors. A case study of mutation scanning of the frataxin protein for nine wild-type residues demonstrates the utility of PROST. Taken together, these findings indicate that PROST is a well-suited predictor when no protein structural information is available.
The source code of PROST, data sets, examples, and pretrained models along with how to use PROST are available at https://github.com/ShahidIqb/PROST and https://prost.erc.monash.edu/seq.
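The idea of training on direct plus hypothetical reverse mutations, as in the abstract above, can be sketched as a simple data augmentation step. This is a minimal sketch under stated assumptions, not the PROST code: the `Mutation` record, its field names, and the example values are hypothetical, and the only rule applied is that the reverse of a mutation swaps the two residues and negates ΔΔG.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mutation:
    wild_type: str   # one-letter residue in the starting sequence
    position: int    # 1-based position in the sequence
    mutant: str      # one-letter residue after substitution
    ddg: float       # stability change in kcal/mol (sign convention as in the source data)

def reverse(m: Mutation) -> Mutation:
    """Hypothetical reverse mutation: swap residues and flip the sign of ddG."""
    return Mutation(m.mutant, m.position, m.wild_type, -m.ddg)

def augment_with_reverse(direct):
    """Return the direct mutations followed by their reverses (doubling the set)."""
    direct = list(direct)
    return direct + [reverse(m) for m in direct]

# Made-up example records, for illustration only.
direct = [Mutation("A", 42, "G", -1.3), Mutation("L", 7, "P", -2.8)]
augmented = augment_with_reverse(direct)
```

Applied to 2647 direct mutations, this doubling yields the 5294 entries mentioned in the abstract. In practice the reverse entry also needs the mutant sequence as its starting sequence when features are computed, which this sketch leaves out.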
Affiliation(s)
- Shahid Iqbal
  - Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
  - Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Fang Ge
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
  - Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Yuanting Zheng
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Geoffrey I Webb
  - Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia
  - Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Jiangning Song
  - Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
  - Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
11
Abstract
The spike protein (S-protein) of SARS-CoV-2, the protein that enables the virus to infect human cells, is the basis for many vaccines and a hotspot of concerning virus evolution. Here, we discuss the outstanding progress in structural characterization of the S-protein and how these structures facilitate analysis of virus function and evolution. We emphasize the differences in reported structures and that analysis of structure-function relationships is sensitive to the structure used. We show that the average residue solvent exposure in nearly complete structures is a good descriptor of open vs closed conformation states. Because of structural heterogeneity of functionally important surface-exposed residues, we recommend using averages of a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure-function relationships. To illustrate these points, we analyze some significant chemical tendencies of prominent S-protein mutations in the context of the available structures. In the discussion of new variants, we emphasize the selectivity of binding to ACE2 vs prominent antibodies rather than simply the antibody escape or ACE2 affinity separately. We note that larger chemical changes, in particular increased electrostatic charge or side-chain volume of exposed surface residues, are recurring in mutations of concern, plausibly related to adaptation to the negative surface potential of human ACE2. We also find indications that the fixated mutations of the S-protein in the main variants are less destabilizing than would be expected on average, possibly pointing toward a selection pressure on the S-protein. The richness of available structures for all of these situations provides an enormously valuable basis for future research into these structure-function relationships.
Affiliation(s)
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur 492015, Chhattisgarh, India
- Kasper P. Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
12
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Many studies have been devoted over the past decades to build new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases toward the training set, their generalizability, and interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.
|
13
|
Awais M, Wattoo JI, Zafar R, Khan N. Computational analysis of non-synonymous single nucleotide polymorphism in UROD gene linked with familial Porphyria Cutanea Tarda. Gene Reports 2021. [DOI: 10.1016/j.genrep.2021.101413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
14
|
A Deep-Learning Sequence-Based Method to Predict Protein Stability Changes Upon Genetic Variations. Genes (Basel) 2021; 12:genes12060911. [PMID: 34204764 PMCID: PMC8231498 DOI: 10.3390/genes12060911] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 05/14/2021] [Revised: 06/08/2021] [Accepted: 06/09/2021] [Indexed: 01/17/2023] Open
Abstract
Several studies have linked disruptions of protein stability and its normal functions to disease. Therefore, during the last few decades, many tools have been developed to predict the free energy changes upon protein residue variations. Most of these methods require both sequence and structure information to obtain reliable predictions. However, the lower number of protein structures available with respect to their sequences, due to experimental issues, drastically limits the application of these tools. In addition, current methodologies ignore the antisymmetric property characterizing the thermodynamics of the protein stability: a variation from wild-type to a mutated form of the protein structure (X_W → X_M) and its reverse process (X_M → X_W) must have opposite values of the free energy difference (ΔΔG_WM = −ΔΔG_MW). Here we propose ACDC-NN-Seq, a deep neural network system that exploits the sequence information and is able to incorporate into its architecture the antisymmetry property. To our knowledge, this is the first convolutional neural network to predict protein stability changes relying solely on the protein sequence. We show that ACDC-NN-Seq compares favorably with the existing sequence-based methods.
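The antisymmetry constraint described in this abstract can be illustrated with a minimal Python sketch. This is not the ACDC-NN-Seq architecture, which the abstract does not detail; it only shows one generic way to make any predictor satisfy ΔΔG_WM = −ΔΔG_MW by construction. The `toy_predictor` function, its coefficients, and the example sequences are hypothetical and carry no physical meaning.

```python
from typing import Callable

# Hypothetical predictor signature: (sequence_a, sequence_b) -> predicted ddG in kcal/mol.
Predictor = Callable[[str, str], float]


def antisymmetrize(predict: Predictor) -> Predictor:
    """Wrap a predictor so that swapping its two inputs flips the sign of the output."""
    def wrapped(seq_a: str, seq_b: str) -> float:
        # Half the difference of the direct and reverse calls is antisymmetric by construction.
        return 0.5 * (predict(seq_a, seq_b) - predict(seq_b, seq_a))
    return wrapped


def toy_predictor(seq_a: str, seq_b: str) -> float:
    """Deliberately biased toy model (illustrative only): it always returns a negative value."""
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return -0.8 * mismatches - 0.1 * seq_b.count("G")


symmetric_predictor = antisymmetrize(toy_predictor)

wild_type, mutant = "MKTAYIA", "MKTGYIA"  # made-up sequences differing at one position
forward = symmetric_predictor(wild_type, mutant)
reverse = symmetric_predictor(mutant, wild_type)
print(forward, reverse)
```

The raw toy model violates the constraint (its direct and reverse outputs are both negative), whereas the wrapped version returns values of equal magnitude and opposite sign.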
|
15
|
Caldararu O, Blundell TL, Kepp KP. A base measure of precision for protein stability predictors: structural sensitivity. BMC Bioinformatics 2021; 22:88. [PMID: 33632133 PMCID: PMC7908712 DOI: 10.1186/s12859-021-04030-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/30/2020] [Accepted: 02/15/2021] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Prediction of the change in fold stability (ΔΔG) of a protein upon mutation is of major importance to protein engineering and screening of disease-causing variants. Many prediction methods can use 3D structural information to predict ΔΔG. While the performance of these methods has been extensively studied, a new problem has arisen due to the abundance of crystal structures: How precise are these methods in terms of structure input used, which structure should be used, and how much does it matter? Thus, there is a need to quantify the structural sensitivity of protein stability prediction methods. RESULTS We computed the structural sensitivity of six widely-used prediction methods by use of saturated computational mutagenesis on a diverse set of 87 structures of 25 proteins. Our results show that structural sensitivity varies massively and surprisingly falls into two very distinct groups, with methods that take detailed account of the local environment showing a sensitivity of ~ 0.6 to 0.8 kcal/mol, whereas machine-learning methods display much lower sensitivity (~ 0.1 kcal/mol). We also observe that the precision correlates with the accuracy for mutation-type-balanced data sets but not generally reported accuracy of the methods, indicating the importance of mutation-type balance in both contexts. CONCLUSIONS The structural sensitivity of stability prediction methods varies greatly and is caused mainly by the models and less by the actual protein structural differences. As a new recommended standard, we therefore suggest that ΔΔG values are evaluated on three protein structures when available and the associated standard deviation reported, to emphasize not just the accuracy but also the precision of the method in a specific study. 
Our observation that machine-learning methods deemphasize structure may indicate that folded wild-type structures alone, without the folded mutant and unfolded structures, only add modest value for assessing protein stability effects, and that side-chain-sensitive methods overstate the significance of the folded wild-type structure.
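The recommendation to evaluate ΔΔG on three structures and report the standard deviation can be sketched as follows. This is a minimal illustration, assuming that the spread is taken as the per-mutation sample standard deviation across structures; the abstract does not give the exact formula, and the structure labels, mutation names, and ΔΔG values below are invented.

```python
from statistics import stdev


def structural_sensitivity(ddg_by_structure: dict[str, dict[str, float]]) -> dict[str, float]:
    """Per-mutation sample standard deviation of predicted ddG across input structures."""
    predictions = list(ddg_by_structure.values())
    if len(predictions) < 2:
        raise ValueError("at least two structures are needed to compute a spread")
    # Only mutations predicted on every structure can be compared.
    shared = set(predictions[0]).intersection(*predictions[1:])
    return {m: stdev([p[m] for p in predictions]) for m in sorted(shared)}


# Hypothetical predictions (kcal/mol) for the same protein from three structures.
example = {
    "structure_A": {"L29A": 1.0, "V68F": -0.5},
    "structure_B": {"L29A": 1.2, "V68F": -0.5},
    "structure_C": {"L29A": 1.4, "V68F": -0.5},
}
sensitivity = structural_sensitivity(example)
print(sensitivity)
```

A mutation whose prediction is identical on all structures gets a spread of zero, while one that shifts with the input structure gets a nonzero value that can be reported next to the mean.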
Affiliation(s)
- Octav Caldararu
- DTU Chemistry, Technical University of Denmark, Building 206, 2800, Kgs. Lyngby, Denmark
| | - Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, 2800, Kgs. Lyngby, Denmark.
| |
|
16
|
Chen Y, Lu H, Zhang N, Zhu Z, Wang S, Li M. PremPS: Predicting the impact of missense mutations on protein stability. PLoS Comput Biol 2020; 16:e1008543. [PMID: 33378330 PMCID: PMC7802934 DOI: 10.1371/journal.pcbi.1008543] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 07/23/2020] [Revised: 01/12/2021] [Accepted: 11/16/2020] [Indexed: 12/12/2022] Open
Abstract
Computational methods that predict protein stability changes induced by missense mutations have made a lot of progress over the past decades. Most of the available methods, however, have very limited accuracy in predicting stabilizing mutations because existing experimental sets are dominated by mutations reducing protein stability. Moreover, few approaches could consistently perform well across different test cases. To address these issues, we developed a new computational method, PremPS, to more accurately evaluate the effects of missense mutations on protein stability. The PremPS method is composed of only ten evolutionary- and structure-based features and parameterized on a balanced dataset with an equal number of stabilizing and destabilizing mutations. A comprehensive comparison of the predictive performance of PremPS with other available methods on nine benchmark datasets confirms that our approach consistently outperforms other methods and shows considerable improvement in estimating the impacts of stabilizing mutations. A protein could have multiple structures available, and if another structure of the same protein is used, the predicted change in stability for structure-based methods might be different. Thus, we further estimated the impact of using different structures on prediction accuracy, and demonstrate that our method performs well across different types of structures except for low-resolution structures and models built based on templates with low sequence identity. PremPS can be used for finding functionally important variants, revealing the molecular mechanisms of functional influences and protein design. PremPS is freely available at https://lilab.jysw.suda.edu.cn/research/PremPS/, which allows large-scale mutational scanning; it takes about four minutes to perform calculations for a single mutation per protein with ~300 residues and requires ~0.4 seconds for each additional mutation.
The development of computational methods to accurately predict the impacts of amino acid substitutions on protein stability is of paramount importance for the field of protein design and understanding the roles of missense mutations in disease. However, most of the available methods have very limited predictive accuracy for mutations increasing stability and few could consistently perform well across different test cases. Here we present a new computational approach PremPS, which is capable of predicting the effects of single point mutations on protein stability. PremPS employs only ten evolutionary- and structure-based features and is trained on a symmetrical dataset consisting of the same number of cases of stabilizing and destabilizing mutations. Our method was tested against numerous blind datasets and shows a considerable improvement especially in evaluating the effects of stabilizing mutations, outperforming previously developed methods. PremPS is freely available as a user-friendly web server at http://lilab.jysw.suda.edu.cn/research/PremPS/, which is fast enough to handle the large number of cases.
Affiliation(s)
- Yuting Chen
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Haoyu Lu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Ning Zhang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Zefeng Zhu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Shuqin Wang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Minghui Li
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| |
|
17
|
Li B, Yang YT, Capra JA, Gerstein MB. Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks. PLoS Comput Biol 2020; 16:e1008291. [PMID: 33253214 PMCID: PMC7728386 DOI: 10.1371/journal.pcbi.1008291] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 03/02/2020] [Revised: 12/10/2020] [Accepted: 08/26/2020] [Indexed: 12/22/2022] Open
Abstract
Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
Affiliation(s)
- Bian Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Biological Sciences and Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Yucheng T. Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - John A. Capra
- Department of Biological Sciences and Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Mark B. Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
| |
|
18
|
Verma H, Silakari O. Investigating the Role of Missense SNPs on ALDH 1A1 mediated pharmacokinetic resistance to cyclophosphamide. Comput Biol Med 2020; 125:103979. [PMID: 32877739 DOI: 10.1016/j.compbiomed.2020.103979] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/11/2020] [Revised: 08/15/2020] [Accepted: 08/17/2020] [Indexed: 12/18/2022]
Abstract
Cyclophosphamide (CP) is a well-known anti-cancer drug, which exerts its therapeutic effect by DNA cross-linking, both within and between DNA strands. Earlier, a single dose of CP was enough for an effective treatment; however, due to overexpression of ALDH 1A1 in cancer cells and consequent drug inactivation, the quality of treatment has suffered considerably. Drug inactivation via a Drug Metabolizing Enzyme (DME) such as aldehyde dehydrogenase 1A1 (ALDH 1A1) is one of the least considered and most overlooked resistance mechanisms. The current study focused on investigating the impact of missense SNPs on ALDH 1A1-mediated pharmacokinetic resistance to CP. To achieve this aim, we selected 14 missense SNPs from the large pool available in the SNP database. The stability of the mutants corresponding to the selected SNPs was then determined using web-based tools such as I-Mutant, CUPSAT, Maestro-web, STRUM, Eris, SDM, DUET, and I-Stable. The results obtained from these web tools were later validated by molecular dynamics simulations. Furthermore, to find the optimal range of geometrical parameters and binding affinity for a molecule to be a good substrate for ALDH 1A1, some well-reported substrates of ALDH 1A1 were pooled from the literature. Subsequently, the same parameters were calculated for each aldophosphamide (active metabolite of CP)-mutant complex to determine whether they lie within the optimal range. Based on this analysis, the populations most and least susceptible to resistance were suggested. Our results demonstrated that the population groups corresponding to the rs11554423 (Gly125Arg) and rs763363983 (Val460Leu) mutations may be least vulnerable to CP resistance, whereas the populations corresponding to the rs1049981 (Asn121Ser) and rs774967243 (Val295Leu) SNPs may be most vulnerable to CP resistance.
Affiliation(s)
- Himanshu Verma
- Molecular Modelling Lab (MML), Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala, Punjab, 147002, India
| | - Om Silakari
- Molecular Modelling Lab (MML), Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala, Punjab, 147002, India.
| |
|
19
|
Caldararu O, Mehra R, Blundell TL, Kepp KP. Systematic Investigation of the Data Set Dependency of Protein Stability Predictors. J Chem Inf Model 2020; 60:4772-4784. [DOI: 10.1021/acs.jcim.0c00591] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 02/06/2023]
Affiliation(s)
- Octav Caldararu
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| | - Rukmankesh Mehra
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| | - Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
| | - Kasper P. Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| |
|
20
|
Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J 2020; 18:1968-1979. [PMID: 32774791 PMCID: PMC7397395 DOI: 10.1016/j.csbj.2020.07.011] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 03/15/2020] [Revised: 07/10/2020] [Accepted: 07/14/2020] [Indexed: 12/13/2022] Open
Abstract
Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.
Affiliation(s)
- Tiziana Sanavia
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Ludovica Montanucci
- Department of Comparative Biomedicine and Food Science (BCA), University of Padova, Viale dell'Università 16, 35020 Legnaro, Italy
| | - Paola Turina
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
| | - Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| |
|
21
|
Jana K, Mehra R, Dehury B, Blundell TL, Kepp KP. Common mechanism of thermostability in small α- and β-proteins studied by molecular dynamics. Proteins 2020; 88:1233-1250. [PMID: 32368818 DOI: 10.1002/prot.25897] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/23/2019] [Revised: 04/01/2020] [Accepted: 04/29/2020] [Indexed: 12/13/2022]
Abstract
Protein thermostability is important to evolution, diseases, and industrial applications. Proteins use diverse molecular strategies to achieve stability at high temperature, yet reducing the entropy of unfolding seems required. We investigated five small α-proteins and five β-proteins with known, distinct structures and thermostability (Tm) using multi-seed molecular dynamics simulations at 300, 350, and 400 K. The proteins displayed diverse changes in hydrogen bonding, solvent exposure, and secondary structure with no simple relationship to Tm. Our dynamics were in good agreement with experimental B-factors at 300 K and insensitive to force-field choice. Despite the very distinct structures, the native-state (300 + 350 K) free-energy landscapes (FELs) were significantly broader for the two most thermostable proteins and smallest for the three least stable proteins in both the α- and β-group and with both force fields studied independently (tailed t-test, 95% confidence level). Our results suggest that entropic ensembles stabilize proteins at high temperature due to reduced entropy of unfolding, viz., ΔG = ΔH − TΔS. Supporting this mechanism, the most thermostable proteins were also the least kinetically stable, consistent with broader FELs, typified by villin headpiece and confirmed by specific comparison to a mesophilic ortholog of Thermus thermophilus apo-pyrophosphate phosphohydrolase. We propose that molecular strategies of protein thermostabilization, although diverse, tend to converge toward highest possible entropy in the native state consistent with the functional requirements. We speculate that this tendency may explain why many proteins are not optimally structured and why molten-globule states resemble native proteins so much.
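The role of the entropy term in ΔG = ΔH − TΔS can be made concrete with a short worked calculation in Python. The enthalpy and entropy values below are hypothetical round numbers chosen only to show the trend; they are not taken from the study, and the sketch treats ΔH and ΔS as temperature-independent (no heat-capacity term), which is a simplifying assumption.

```python
def unfolding_free_energy(delta_h: float, delta_s: float, temperature: float) -> float:
    """Free energy of unfolding, dG = dH - T*dS, in kcal/mol (dS in kcal/(mol*K), T in K)."""
    return delta_h - temperature * delta_s


DELTA_H = 50.0  # hypothetical enthalpy of unfolding, kcal/mol, same for both proteins
proteins = {
    "narrow native ensemble": 0.15,  # hypothetical larger entropy of unfolding
    "broad native ensemble": 0.13,   # hypothetical reduced entropy of unfolding
}

for name, delta_s in proteins.items():
    for temperature in (300.0, 350.0):
        dg = unfolding_free_energy(DELTA_H, delta_s, temperature)
        print(f"{name}: dG_unfold({temperature:.0f} K) = {dg:+.1f} kcal/mol")
```

With these numbers, the protein with the smaller unfolding entropy remains folded (positive ΔG of unfolding) at 350 K, while the other has already crossed to a negative value, which is the qualitative argument made in the abstract.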
Affiliation(s)
| | | | - Budheswar Dehury
- DTU Chemistry, Technical University of Denmark, Lyngby, Denmark
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Lyngby, Denmark
| |
|
22
|
Tang N, Sandahl TD, Ott P, Kepp KP. Computing the Pathogenicity of Wilson's Disease ATP7B Mutations: Implications for Disease Prevalence. J Chem Inf Model 2019; 59:5230-5243. [PMID: 31751128 DOI: 10.1021/acs.jcim.9b00852] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Abstract
Genetic variations in the gene encoding the copper-transport protein ATP7B are the primary cause of Wilson's disease. Controversially, clinical prevalence seems much smaller than the prevalence estimated by genetic screening tools, causing fear that many people are undiagnosed, although early diagnosis and treatment is essential. To address this issue, we benchmarked 16 state-of-the-art computational disease-prediction methods against established data of missense ATP7B mutations. Our results show that the quality of the methods varies widely. We show the importance of optimizing the threshold of the methods used to distinguish pathogenic from nonpathogenic mutations against data of clinically confirmed pathogenic and nonpathogenic mutations. We find that most methods use thresholds that predict too many ATP7B mutations to be pathogenic. Thus, our findings explain the current controversy on Wilson's disease prevalence because meta-analysis and text search methods include many computational estimates that lead to higher disease prevalence than clinically observed. As proteins and diseases differ widely, a one-size-fits-all threshold cannot distinguish pathogenic and nonpathogenic mutations efficiently, as shown here. We also show that amino acid changes with small evolutionary substitution probability, mainly due to amino acid volume, are more associated with the disease, implying a pathological effect on the conformational state of the protein, which could affect copper transport or adenosine triphosphate recognition and hydrolysis. These findings may be a first step toward a more quantitative genotype-phenotype relationship of Wilson's disease.
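The idea of tuning a method's pathogenicity threshold against clinically confirmed labels, rather than using a one-size-fits-all default, can be sketched as below. The abstract does not say which criterion was optimized, so balanced accuracy is used here purely as an assumption; the scores, labels, and the default cutoff of 0.5 are invented for illustration.

```python
def balanced_accuracy(scores: list[float], labels: list[bool], threshold: float) -> float:
    """Mean of sensitivity and specificity when score >= threshold is called pathogenic."""
    calls = [s >= threshold for s in scores]
    true_pos = sum(1 for c, y in zip(calls, labels) if c and y)
    true_neg = sum(1 for c, y in zip(calls, labels) if not c and not y)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


def best_threshold(scores: list[float], labels: list[bool]) -> float:
    """Scan every observed score as a candidate cutoff and keep the best-performing one."""
    return max(sorted(set(scores)), key=lambda t: balanced_accuracy(scores, labels, t))


# Hypothetical predictor scores with clinically confirmed labels (True = pathogenic).
scores = [0.20, 0.35, 0.50, 0.55, 0.60, 0.70, 0.80, 0.90]
labels = [False, False, False, False, True, False, True, True]

default_cutoff = 0.5  # assumed generic default, not from any specific tool
tuned_cutoff = best_threshold(scores, labels)
print(default_cutoff, balanced_accuracy(scores, labels, default_cutoff))
print(tuned_cutoff, balanced_accuracy(scores, labels, tuned_cutoff))
```

In this toy data the default cutoff labels six of eight variants pathogenic although only three are, while the tuned cutoff removes most of those false positives, mirroring the over-prediction problem the abstract describes.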
Affiliation(s)
- Ning Tang
- DTU Chemistry, Technical University of Denmark, Kemitorvet 206, 2800 Kongens Lyngby, Denmark
| | - Thomas D Sandahl
- Department of Hepatology and Gastroenterology, Aarhus University Hospital, 8200 Aarhus, Denmark
| | - Peter Ott
- Department of Hepatology and Gastroenterology, Aarhus University Hospital, 8200 Aarhus, Denmark
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Kemitorvet 206, 2800 Kongens Lyngby, Denmark
| |
|
23
|
Computational analysis of Alzheimer-causing mutations in amyloid precursor protein and presenilin 1. Arch Biochem Biophys 2019; 678:108168. [DOI: 10.1016/j.abb.2019.108168] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/05/2019] [Revised: 10/25/2019] [Accepted: 11/02/2019] [Indexed: 12/13/2022]
|
24
|
Dehury B, Tang N, Kepp KP. Insights into membrane-bound presenilin 2 from all-atom molecular dynamics simulations. J Biomol Struct Dyn 2019; 38:3196-3210. [PMID: 31405326 DOI: 10.1080/07391102.2019.1655481] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/16/2022]
Abstract
Presenilins 1 and 2 (PS1 or PS2) are main genetic risk factors of familial Alzheimer's disease (AD) that produce the β-amyloid (Aβ) peptides and also have important stand-alone functions related to, e.g., calcium signaling. Most work so far has focused on PS1, but humans carry both PS1 and PS2, and mutations in both cause AD. Here, we develop a computational model of PS2 in the membrane to address the question of how pathogenic PS2 mutations affect the membrane-embedded protein. The models are based on cryo-electron microscopy structures of PS1 translated to PS2, augmented with missing residues and a complete all-atom membrane-water system, and equilibrated using three independent 500-ns simulations of molecular dynamics with a structure-balanced force field. We show that the nine-transmembrane channel structure is substantially controlled by major dynamics in the hydrophilic loop bridging TM6 and TM7, which functions as a 'plug' in the PS2 membrane channel. TM2, TM6, TM7 and TM9 flexibility controls the size of this channel. We find that most pathogenic PS2 mutations significantly reduce stability relative to random mutations, using a statistical ANOVA test with all possible mutations in the affected sites as a control. The associated loss of compactness may also impair calcium affinity. Remarkably, similar properties of the open state are known to impair the binding of substrates to γ-secretase, and we thus argue that the two mechanisms could be functionally related. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Budheswar Dehury
- DTU Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Ning Tang
- DTU Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark
| |
|
25
|
Montanucci L, Capriotti E, Frank Y, Ben-Tal N, Fariselli P. DDGun: an untrained method for the prediction of protein stability changes upon single and multiple point variations. BMC Bioinformatics 2019; 20:335. [PMID: 31266447 PMCID: PMC6606456 DOI: 10.1186/s12859-019-2923-1] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Indexed: 12/20/2022] Open
Abstract
Background: Predicting the effect of single point variations on protein stability constitutes a crucial step toward understanding the relationship between protein structure and function. To this end, several methods have been developed to predict changes in the Gibbs free energy of unfolding (∆∆G) between wild type and variant proteins, using sequence and structure information. Most of the available methods, however, do not exhibit the anti-symmetric prediction property, which guarantees that the predicted ∆∆G value for a variation is the exact opposite of that predicted for the reverse variation, i.e., ∆∆G(A → B) = −∆∆G(B → A), where A and B are amino acids. Results: Here we introduce simple anti-symmetric features, based on evolutionary information, which are combined to define an untrained method, DDGun (DDG untrained). DDGun is a simple approach based on evolutionary information that predicts the ∆∆G for single and multiple variations from sequence and structure information (DDGun3D). Our method achieves remarkable performance without any training on the experimental datasets, reaching Pearson correlation coefficients between predicted and measured ∆∆G values of ~0.5 and ~0.4 for single and multiple site variations, respectively. Surprisingly, DDGun performances are comparable with those of state-of-the-art methods. DDGun also naturally predicts multiple site variations, thereby defining a benchmark method for both single site and multiple site predictors. DDGun is anti-symmetric by construction, predicting the value of the ∆∆G of a reciprocal variation as almost equal (depending on the sequence profile) to −∆∆G of the direct variation. This is a valuable property that is missing in the majority of the methods. Conclusions: Evolutionary information alone combined in an untrained method can achieve remarkably high performances in the prediction of ∆∆G upon protein mutation.
Non-trained approaches like DDGun represent a valid benchmark both for scoring the predictive power of the individual features and for assessing the learning capability of supervised methods.
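How closely a set of predictions obeys ∆∆G(A → B) = −∆∆G(B → A) can be checked numerically. The sketch below uses two summary quantities that are an assumption on our part, since the abstract names neither: the Pearson correlation between direct and reverse predictions (ideally −1) and the mean of half their sum (ideally 0). All ∆∆G values are invented.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def antisymmetry_bias(direct: list[float], reverse: list[float]) -> float:
    """Mean of (direct + reverse) / 2; zero for a perfectly anti-symmetric predictor."""
    return sum(d + r for d, r in zip(direct, reverse)) / (2 * len(direct))


# Hypothetical predictions (kcal/mol) for four variations and their reverse variations.
direct = [-1.2, -0.4, 0.3, -2.0]
reverse_antisymmetric = [1.2, 0.4, -0.3, 2.0]  # exact sign flip of the direct values
reverse_biased = [-0.9, -0.3, -0.2, -1.5]      # predictor that always leans negative

print(pearson(direct, reverse_antisymmetric), antisymmetry_bias(direct, reverse_antisymmetric))
print(pearson(direct, reverse_biased), antisymmetry_bias(direct, reverse_biased))
```

A predictor that is anti-symmetric by construction would land on the first line's ideal values, whereas a predictor biased toward one sign shows a nonzero bias term.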
Affiliation(s)
- Ludovica Montanucci
- Department of Comparative Biomedicine and Food Science (BCA), University of Padova, Viale dell'Università 16, 35020, Legnaro, Italy
| | - Emidio Capriotti
- BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via Selmi 3, 40126, Bologna, Italy.
| | - Yotam Frank
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, 69978, Tel Aviv, Israel
| | - Nir Ben-Tal
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, 69978, Tel Aviv, Israel
| | - Piero Fariselli
- Department of Comparative Biomedicine and Food Science (BCA), University of Padova, Viale dell'Università 16, 35020, Legnaro, Italy
- Now at the Department of Medical Sciences, University of Torino, via Santena 19, 10126, Torino, Italy
| |
|
26
|
Diamantis P, Hage KE, Meuwly M. Effect of Single-Point Mutations on Nitric Oxide Rebinding and the Thermodynamic Stability of Myoglobin. J Phys Chem B 2019; 123:1961-1972. [PMID: 30724565 DOI: 10.1021/acs.jpcb.8b11454] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
The effect of single amino acid mutations on the rebinding dynamics of nitrogen monoxide (NO) to myoglobin is investigated using reactive molecular dynamics simulations. In particular, mutations of residues surrounding the heme-active site (Leu29, His64, Val68) were considered. Consistent with experiments, all mutations studied here have a significant effect on the kinetics of the NO-rebinding process, which consists of a rapid (several 10 ps) and a slow (100s of ps) time scale. For all modifications considered, the time scales and rebinding fractions agree to within a few percent with results from experiments by adjusting one single, physically meaningful, conformationally averaged quantity: the asymptotic energy separation between the NO-bound (²A) and photodissociated (⁴A) states. It is furthermore shown that the thermodynamic stability of wild-type versus mutant Mb for the ligand-free and ligand-bound variants of the protein can be described by the same computational model. Therefore, ligand kinetics and thermodynamics are related in a direct fashion akin to Φ-value analysis, which establishes a relationship between protein folding rates and thermal stability of proteins.
Affiliation(s)
- Polydefkis Diamantis
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland
- Krystel El Hage
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland; Department of Chemistry, Brown University, Providence, Rhode Island 02912, United States
27
Musil M, Konegger H, Hon J, Bednar D, Damborsky J. Computational Design of Stable and Soluble Biocatalysts. ACS Catal 2018. [DOI: 10.1021/acscatal.8b03613] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 01/04/2023]
Affiliation(s)
- Milos Musil
- Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Hannes Konegger
- Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Hon
- Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- David Bednar
- Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
28
Computational Approaches to Prioritize Cancer Driver Missense Mutations. Int J Mol Sci 2018; 19:ijms19072113. [PMID: 30037003 PMCID: PMC6073793 DOI: 10.3390/ijms19072113] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/23/2018] [Revised: 07/02/2018] [Accepted: 07/05/2018] [Indexed: 12/31/2022] Open
Abstract
Cancer is a complex disease that is driven by genetic alterations. There has been a rapid development of genome-wide techniques during the last decade along with a significant lowering of the cost of gene sequencing, which has generated widely available cancer genomic data. However, the interpretation of genomic data and the prediction of the association of genetic variations with cancer and disease phenotypes still require significant improvement. Missense mutations, which can render proteins non-functional and provide a selective growth advantage to cancer cells, are frequently detected in cancer. Effects caused by missense mutations can be pinpointed by in silico modeling, which makes it more feasible to find a treatment and reverse the effect. Specific human phenotypes are largely determined by stability, activity, and interactions between proteins and other biomolecules that work together to execute specific cellular functions. Therefore, analysis of missense mutations’ effects on proteins and their complexes would provide important clues for identifying functionally important missense mutations, understanding the molecular mechanisms of cancer progression and facilitating treatment and prevention. Herein, we summarize the major computational approaches and tools that provide not only the classification of missense mutations as cancer drivers or passengers but also the molecular mechanisms induced by driver mutations. This review focuses on the discussion of annotation and prediction methods based on structural and biophysical data, analysis of somatic cancer missense mutations in 3D structures of proteins and their complexes, predictions of the effects of missense mutations on protein stability, protein-protein and protein-nucleic acid interactions, and assessment of conformational changes induced by mutations.
29
Buß O, Rudat J, Ochsenreither K. FoldX as Protein Engineering Tool: Better Than Random Based Approaches? Comput Struct Biotechnol J 2018; 16:25-33. [PMID: 30275935 PMCID: PMC6158775 DOI: 10.1016/j.csbj.2018.01.002] [Citation(s) in RCA: 133] [Impact Index Per Article: 22.2] [Received: 10/31/2017] [Revised: 12/21/2017] [Accepted: 01/20/2018] [Indexed: 02/04/2023] Open
Abstract
Improving protein stability is an important goal for basic research as well as for clinical and industrial applications, but no commonly accepted and widely used strategy for efficient engineering is known. Besides random approaches such as error-prone PCR and physical techniques to stabilize proteins, e.g. immobilization, in silico approaches to target-oriented mutagenesis are gaining attention. In this review, different algorithms for the prediction of beneficial mutation sites to enhance protein stability are summarized, and the advantages and disadvantages of FoldX are highlighted. The question of whether the prediction of mutation sites by the FoldX algorithm is more accurate than random-based approaches is addressed.
Affiliation(s)
- Oliver Buß
- Institute of Process Engineering in Life Sciences, Section II: Technical Biology, Karlsruhe Institute of Technology, Karlsruhe, Germany
30
El Hage K, Mondal P, Meuwly M. Free energy simulations for protein ligand binding and stability. Molecular Simulation 2018. [DOI: 10.1080/08927022.2017.1416115] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022]
Affiliation(s)
- Krystel El Hage
- Department of Chemistry, University of Basel, Basel, Switzerland
- Padmabati Mondal
- Department of Chemistry, University of Basel, Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Basel, Switzerland
31
Mehra R, Meyer AS, Kepp KP. Molecular dynamics derived life times of active substrate binding poses explain KM of laccase mutants. RSC Adv 2018; 8:36915-36926. [PMID: 35558910 PMCID: PMC9089231 DOI: 10.1039/c8ra07138a] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/27/2018] [Accepted: 10/23/2018] [Indexed: 11/21/2022] Open
Abstract
Molecular dynamics derived life times of reactive poses and MMGBSA substrate affinities explain trends in experimental KM for laccases.
Affiliation(s)
- Rukmankesh Mehra
- Technical University of Denmark, DTU Chemistry, Denmark
- Technical University of Denmark, DTU Bioengineering
- Anne S. Meyer
- Technical University of Denmark, DTU Bioengineering, Denmark
32
Li G, Chen Y, Fang X, Su F, Xu L, Yan Y. Identification of a hot-spot to enhance Candida rugosa lipase thermostability by rational design methods. RSC Adv 2018; 8:1948-1957. [PMID: 35542566 PMCID: PMC9077275 DOI: 10.1039/c7ra11679a] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/23/2017] [Accepted: 01/02/2018] [Indexed: 11/21/2022] Open
Abstract
Lipase is one of the most widely used classes of enzymes in biotechnological applications and organic chemistry. Candida rugosa lipases (CRL) can catalyze hydrolysis, esterification and transesterification with high regio-, stereo- and enantio-selectivity. However, thermal inactivation above 45 °C limits CRL's applications. Studies on improving the thermal stability of CRL are often limited by its slow-growing eukaryotic expression host, which is not suitable for large-scale screening. Identification of thermally stable mutants by rational design, regarded as an efficient substitute for experimental effort, would provide a method for site-directed improvement of CRL. In this study, mutation-induced stability changes in CRL Lip1 were predicted by three rational design methods. Following conservation analyses and exclusion of functional regions, five mutants of a hot-spot, Asp457Phe, Asp457Trp, Asp457Met, Asp457Leu, and Asp457Tyr, were identified and prepared for enzymatic characterization. These five mutants increased the apparent melting temperature of Lip1 by 7.4 °C to 9.3 °C, with the most thermostable mutant, Asp457Phe, exhibiting a 5.5-fold longer half-life at 50 °C and a 10 °C increase in optimum temperature. Furthermore, the pH stability of Lip1 was also enhanced by the introduction of the Asp457Phe mutation. The study demonstrates that thermally stable mutants of CRL could be identified with limited experimental effort using rational design methods. The thermostability of Candida rugosa lipase expressed in a eukaryotic host is enhanced with limited experimental effort based on rational design methods.
Affiliation(s)
- Guanlin Li
- Key Laboratory of Molecular Biophysics, The Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
- Yuan Chen
- Key Laboratory of Molecular Biophysics, The Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
- Xingrong Fang
- Key Laboratory of Molecular Biophysics, The Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
- Feng Su
- Key Laboratory of Molecular Biophysics, The Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
- Li Xu
- Key Laboratory of Molecular Biophysics, The Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
- Yunjun Yan
- Key Laboratory of Molecular Biophysics, The Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
33
Dasmeh P, Kepp KP. Superoxide dismutase 1 is positively selected to minimize protein aggregation in great apes. Cell Mol Life Sci 2017; 74:3023-3037. [PMID: 28389720 PMCID: PMC11107616 DOI: 10.1007/s00018-017-2519-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 12/01/2016] [Revised: 03/17/2017] [Accepted: 04/03/2017] [Indexed: 12/14/2022]
Abstract
Positive (adaptive) selection has recently been implied in human superoxide dismutase 1 (SOD1), a highly abundant antioxidant protein with energy signaling and antiaging functions, one of very few examples of direct selection on a human protein product (exon); the molecular drivers of this selection are unknown. We mapped 30 extant SOD1 sequences to the recently established mammalian species tree and inferred ancestors, key substitutions, and signatures of selection during the protein's evolution. We detected elevated substitution rates leading to great apes (Hominidae) at ~1 per 2 million years, significantly higher than in other primates and rodents, although these paradoxically generally evolve much faster. The high evolutionary rate was partly due to relaxation of some selection pressures and partly to distinct positive selection of SOD1 in great apes. We then show that higher stability and net charge and changes at the dimer interface were selectively introduced upon separation from old world monkeys and lesser apes (gibbons). Consequently, human, chimpanzee and gorilla SOD1s have a net charge of -6 at physiological pH, whereas the closely related gibbons and macaques have -3. These features consistently point towards selection against the malicious aggregation effects of elevated SOD1 levels in long-living great apes. The findings mirror the impact of human SOD1 mutations that reduce net charge and/or stability and cause ALS, a motor neuron disease characterized by oxidative stress and SOD1 aggregates and triggered by aging. Our study thus marks an example of direct selection for a particular chemical phenotype (high net charge and stability) in a single human protein with possible implications for the evolution of aging.
Affiliation(s)
- Pouria Dasmeh
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Department of Biochemistry and Cedergren Center for Bioinformatics and Genomics, Faculty of Medicine, University of Montreal, 2900 Edouard-Montpetit, Montreal, QC, H3T 1J4, Canada
- Kasper P Kepp
- Technical University of Denmark, DTU Chemistry, 2800, Kongens Lyngby, Denmark.
34
Kumar V, Rahman S, Choudhry H, Zamzami MA, Sarwar Jamal M, Islam A, Ahmad F, Hassan MI. Computing disease-linked SOD1 mutations: deciphering protein stability and patient-phenotype relations. Sci Rep 2017; 7:4678. [PMID: 28680046 PMCID: PMC5498623 DOI: 10.1038/s41598-017-04950-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 01/31/2017] [Accepted: 05/24/2017] [Indexed: 11/13/2022] Open
Abstract
Protein stability is a requisite in the field of biotechnology, cell biology and drug design. To understand the effects of amino acid substitutions, computational models are preferred to save time and expense. As a systemically important, highly abundant, stable protein, Cu/Zn superoxide dismutase 1 (SOD1) is a suitable test case for genotype-phenotype correlation in understanding ALS. Here, we report the performance of eight protein stability calculators (PoPMuSiC 3.1, I-Mutant 2.0, I-Mutant 3.0, CUPSAT, FoldX, mCSM, BeatMusic and ENCoM) against 54 experimental stability changes due to mutations of SOD1. Four different high-resolution structures were used to test the structure sensitivity that may affect protein calculations. A Bland-Altman plot was also used to assess agreement between stability analyses. Overall, PoPMuSiC and FoldX emerge as the best methods in this benchmark. The relative performance of all eight methods was largely independent of the structure used. We also analyzed patient data in relation to experimental and computed protein stabilities for mutations of human SOD1. The correlation between disease phenotypes and stability changes suggests that changes in SOD1 stability correlate with ALS patient survival times. Thus, the results demonstrate the importance of protein stability in SOD1 pathogenicity.
Affiliation(s)
- Vijay Kumar
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, 110025, India
- Safikur Rahman
- Department of Medical Biotechnology, Yeungnam University, Gyeongsan, 712-749, South Korea
- Hani Choudhry
- Department of Biochemistry, Cancer Metabolism and Epigenetic Unit, Faculty of Science, Center of Innovation in Personalized Medicine, Cancer and Mutagenesis Unit, King Fahd Center for Medical Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Mazin A Zamzami
- Department of Biochemistry, Cancer Metabolism and Epigenetic Unit, Faculty of Science, Cancer and Mutagenesis Unit, King Fahd Center for Medical Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Mohammad Sarwar Jamal
- King Fahd Medical Research Center, King Abdulaziz University, P.O. Box 80216, Jeddah, 21589, Saudi Arabia
- Asimul Islam
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, 110025, India
- Faizan Ahmad
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, 110025, India
- Md Imtaiyaz Hassan
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, 110025, India.
35
Mukherjee S, Mukherjee M, Bandyopadhyay S, Dey A. Three phases in pH dependent heme abstraction from myoglobin. J Inorg Biochem 2017; 172:80-87. [DOI: 10.1016/j.jinorgbio.2017.04.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/12/2016] [Revised: 04/04/2017] [Accepted: 04/08/2017] [Indexed: 10/19/2022]
36
Topham CM, Barbe S, André I. An Atomistic Statistically Effective Energy Function for Computational Protein Design. J Chem Theory Comput 2016; 12:4146-68. [PMID: 27341125 DOI: 10.1021/acs.jctc.6b00090] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
Abstract
Shortcomings in the definition of effective free-energy surfaces of proteins are recognized to be a major contributory factor responsible for the low success rates of existing automated methods for computational protein design (CPD). The formulation of an atomistic statistically effective energy function (SEEF) suitable for a wide range of CPD applications and its derivation from structural data extracted from protein domains and protein-ligand complexes are described here. The proposed energy function comprises nonlocal atom-based and local residue-based SEEFs, which are coupled using a novel atom connectivity number factor to scale short-range, pairwise, nonbonded atomic interaction energies and a surface-area-dependent cavity energy term. This energy function was used to derive additional SEEFs describing the unfolded-state ensemble of any given residue sequence based on computed average energies for partially or fully solvent-exposed fragments in regions of irregular structure in native proteins. Relative thermal stabilities of 97 T4 bacteriophage lysozyme mutants were predicted from calculated energy differences for folded and unfolded states with an average unsigned error (AUE) of 0.84 kcal mol⁻¹ when compared to experiment. To demonstrate the utility of the energy function for CPD, further validation was carried out in tests of its capacity to recover cognate protein sequences and to discriminate native and near-native protein folds, loop conformers, and small-molecule ligand binding poses from non-native benchmark decoys. Experimental ligand binding free energies for a diverse set of 80 protein complexes could be predicted with an AUE of 2.4 kcal mol⁻¹ using an additional energy term to account for the loss in ligand configurational entropy upon binding. The atomistic SEEF is expected to improve the accuracy of residue-based coarse-grained SEEFs currently used in CPD and to extend the range of applications of extant atom-based protein statistical potentials.
Affiliation(s)
- Christopher M Topham
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France; CNRS, UMR5504, F-31400 Toulouse, France; INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- Sophie Barbe
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France; CNRS, UMR5504, F-31400 Toulouse, France; INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- Isabelle André
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France; CNRS, UMR5504, F-31400 Toulouse, France; INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
37
Soloviov M, Das AK, Meuwly M. Strukturelle Interpretation metastabiler Zustände in Myoglobin-NO [Structural Interpretation of Metastable States in Myoglobin-NO]. Angew Chem Int Ed Engl 2016. [DOI: 10.1002/ange.201604552] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Affiliation(s)
- Maksym Soloviov
- Department of Chemistry; University of Basel; Klingelbergstraße 80 4056 Basel Switzerland
- Akshaya K. Das
- Department of Chemistry; University of Basel; Klingelbergstraße 80 4056 Basel Switzerland
- Markus Meuwly
- Department of Chemistry; University of Basel; Klingelbergstraße 80 4056 Basel Switzerland
38
Soloviov M, Das AK, Meuwly M. Structural Interpretation of Metastable States in Myoglobin-NO. Angew Chem Int Ed Engl 2016; 55:10126-30. [DOI: 10.1002/anie.201604552] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 05/11/2016] [Indexed: 01/08/2023]
Affiliation(s)
- Maksym Soloviov
- Department of Chemistry; University of Basel; Klingelbergstrasse 80 4056 Basel Switzerland
- Akshaya K. Das
- Department of Chemistry; University of Basel; Klingelbergstrasse 80 4056 Basel Switzerland
- Markus Meuwly
- Department of Chemistry; University of Basel; Klingelbergstrasse 80 4056 Basel Switzerland
39
Tracking evolution of myoglobin stability in cetaceans using experimentally calibrated computational methods that account for generic protein relaxation. Biochimica et Biophysica Acta - Proteins and Proteomics 2016; 1864:825-34. [DOI: 10.1016/j.bbapap.2016.04.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/30/2015] [Revised: 04/05/2016] [Accepted: 04/07/2016] [Indexed: 11/22/2022]