1
Xu J, Gong J, Bo X, Tong Y, Ren Z, Ni M. A benchmark for evaluation of structure-based online tools for antibody-antigen binding affinity. Biophys Chem 2024; 311:107253. [PMID: 38768531 DOI: 10.1016/j.bpc.2024.107253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 04/08/2024] [Accepted: 04/28/2024] [Indexed: 05/22/2024]
Abstract
The prediction of binding affinity changes caused by missense mutations can elucidate antigen-antibody interactions. A few accessible structure-based online computational tools have been proposed. However, selecting suitable software for a particular study is challenging, especially for research on the SARS-CoV-2 spike protein with antibodies. Therefore, benchmarking on mutation-diverse SARS-CoV-2 datasets is critical. Here, we collected datasets comprising 1216 variants with measured changes in antigen binding affinity, drawn from 22 complexes of SARS-CoV-2 S proteins with 22 monoclonal antibodies, and applied them to evaluate the performance of seven binding affinity prediction tools. The tested tools' Pearson correlations between predicted and measured changes in binding affinity ranged from -0.158 to 0.657, while accuracy in classifying affinity as increasing or decreasing ranged from 0.444 to 0.834. The tools performed relatively better on single mutations, especially at epitope sites, but performed poorly on mutations that strongly decrease affinity. The tested tools were relatively insensitive to the experimental techniques used to obtain the structures of the complexes. In summary, we constructed a list of datasets and evaluated a range of structure-based online prediction tools, which will help explicate relevant processes of antigen-antibody interactions and enhance the computational design of therapeutic monoclonal antibodies.
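The two evaluation metrics named in this abstract, Pearson correlation and accuracy in classifying the direction of the affinity change, can be illustrated with a minimal sketch. The function names and all numeric values below are hypothetical and are not taken from the benchmark; treating a positive value as one direction of change and anything else as the other is also an assumption made only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def sign_accuracy(predicted, measured):
    """Fraction of variants whose predicted direction matches the measured one."""
    hits = sum((p > 0) == (m > 0) for p, m in zip(predicted, measured))
    return hits / len(measured)

# Made-up illustrative values (kcal/mol), NOT data from the benchmark.
measured = [1.2, -0.4, 0.8, 2.1, -0.9]
predicted = [0.9, 0.1, 0.5, 1.0, -0.3]

r = pearson(predicted, measured)
acc = sign_accuracy(predicted, measured)
print(f"Pearson r = {r:.3f}, sign accuracy = {acc:.3f}")
```

Reporting both numbers is useful because a tool can rank mutations reasonably well (high correlation) while still misjudging the sign of small effects, or the reverse.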
Affiliation(s)
- Jiayi Xu
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Jianting Gong
  - Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
  - Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yigang Tong
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China; Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Zilin Ren
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun 130122, China
- Ming Ni
  - Institute of Health Service and Transfusion Medicine, Beijing 100850, China
2
Thakur S, Planeta Kepp K, Mehra R. Predicting virus Fitness: Towards a structure-based computational model. J Struct Biol 2023; 215:108042. [PMID: 37931730 DOI: 10.1016/j.jsb.2023.108042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/12/2023] [Accepted: 11/03/2023] [Indexed: 11/08/2023]
Abstract
Predicting the impact of newly emerging virus mutations is of major interest in surveillance and for understanding the evolutionary forces acting on the pathogens. The SARS-CoV-2 surface spike-protein (S-protein) binds to human ACE2 receptors as a critical step in host cell infection. At the same time, S-protein binding to human antibodies neutralizes the virus and prevents interaction with ACE2. Here we combine these two binding properties in a simple virus fitness model, using structure-based computation of all possible mutation effects averaged over 10 ACE2 complexes and 10 antibody complexes of the S-protein (∼380,000 computed mutations), and validate the approach against diverse experimental binding/escape data for ACE2 and antibodies. The ACE2-antibody selectivity change caused by mutation (i.e., the differential change in binding to ACE2 vs. immunity-inducing antibodies) is proposed to be a key metric of the fitness model, enabling systematic error cancelation when evaluated. In this model, new mutations become fixated if they increase the selective binding to ACE2 relative to circulating antibodies, assuming that both are present in the host in a competitive binding situation. We use this model to categorize viral mutations that may best reach ACE2 before being captured by antibodies. Our model may aid the understanding of variant-specific vaccines and molecular mechanisms of viral evolution in the context of a human host.
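The selectivity metric described in this abstract, and the reason a differential quantity cancels systematic error, can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: the function name, the sign convention (a negative ΔΔG means tighter binding) and all numbers are hypothetical.

```python
from statistics import mean

def selectivity_change(ddg_ace2, ddg_antibody):
    """Differential binding change for one mutation, averaged over structures.

    Assumed convention: ddG < 0 means tighter binding. A positive return value
    then means the mutation favours ACE2 binding relative to antibody binding.
    """
    return mean(ddg_antibody) - mean(ddg_ace2)

# Hypothetical per-structure ddG values (kcal/mol) for a single mutation.
ace2 = [-0.3, -0.1, -0.2]
antibody = [0.6, 0.4, 0.5]
selectivity = selectivity_change(ace2, antibody)

# A systematic offset shared by both sets of predictions drops out of the
# difference, which is the error-cancelation idea mentioned in the abstract.
bias = 0.8
shifted = selectivity_change([x + bias for x in ace2],
                             [x + bias for x in antibody])
print(selectivity, shifted)
```

Only errors common to both binding calculations cancel in this way; errors specific to one type of complex remain in the difference.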
Affiliation(s)
- Shivani Thakur
  - Department of Chemistry, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India
- Kasper Planeta Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India; Department of Bioscience and Biomedical Engineering, Indian Institute of Technology Bhilai, Kutelabhata, Durg - 491001, Chhattisgarh, India
3
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
4
Jia DX, Yu H, Wang F, Jin LQ, Liu ZQ, Zheng YG. Computer-aided design of novel cellobiose 2-epimerase for efficient synthesis of lactulose using lactose. Bioprocess Biosyst Eng 2023:10.1007/s00449-023-02896-z. [PMID: 37450268 DOI: 10.1007/s00449-023-02896-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/06/2023] [Accepted: 06/16/2023] [Indexed: 07/18/2023]
Abstract
Cellobiose 2-epimerase (CE) is ideally suited to synthesize lactulose from lactose, but the poor thermostability and catalytic efficiency restrict enzymatic application. Herein, a non-characterized CE originating from Caldicellulosiruptor morganii (CmCE) was discovered in the NCBI database. Then, a smart mutation library was constructed based on FoldX ΔΔG calculation and modeling structure analysis, from which a positive mutant D226G located within the α8/α9 loop exhibited longer half-lives at 65-75 °C as well as lower Km and higher kcat/Km values compared with CmCE. Molecular modeling demonstrated that the improvement of D226G was largely attributed to the rigidification of the flexible loop, the compactness of the catalysis pocket and the increment of substrate-binding capability. Finally, the yield of synthesizing lactulose catalyzed by D226G reached 45.5%, higher than the 35.9% achieved with CmCE. The disclosed effect of the flexible loop on enzymatic stability and catalysis provides insight to redesign efficient CEs to biosynthesize lactulose.
Affiliation(s)
- Dong-Xu Jia
  - The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
  - Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
- Hai Yu
  - The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
  - Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
- Fan Wang
  - The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
  - Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
- Li-Qun Jin
  - The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
  - Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
- Zhi-Qiang Liu
  - The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
  - Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
- Yu-Guo Zheng
  - The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
  - Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, 18 Chaowang Road, Hangzhou, 310014, People's Republic of China
5
Thakur S, Verma RK, Kepp KP, Mehra R. Modelling SARS-CoV-2 spike-protein mutation effects on ACE2 binding. J Mol Graph Model 2023; 119:108379. [PMID: 36481587 PMCID: PMC9690204 DOI: 10.1016/j.jmgm.2022.108379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 11/04/2022] [Accepted: 11/21/2022] [Indexed: 11/26/2022]
Abstract
The binding affinity of the SARS-CoV-2 spike (S)-protein to the human membrane protein ACE2 is critical for virus function. Computational structure-based screening of new S-protein mutations for ACE2 binding lends promise to rationalize virus function directly from protein structure and ideally aid early detection of potentially concerning variants. We used a computational protocol based on cryo-electron microscopy structures of the S-protein to estimate the change in ACE2-affinity due to S-protein mutation (ΔΔGbind) in good trend agreement with experimental ACE2 affinities. We then expanded predictions to all possible S-protein mutations in 21 different S-protein-ACE2 complexes (400,000 ΔΔGbind data points in total), using mutation group comparisons to reduce systematic errors. The results suggest that mutations that have arisen in major variants as a group maintain ACE2 affinity significantly more than random mutations in the total protein, at the interface, and at evolvable sites. Omicron mutations as a group had a modest change in binding affinity compared to mutations in other major variants. The single-mutation effects seem consistent with ACE2 binding being optimized and maintained in omicron, despite increased importance of other selection pressures (antigenic drift); however, epistasis, glycosylation and in vivo conditions will modulate these effects. Computational prediction of SARS-CoV-2 evolution remains far from achieved, but the feasibility of large-scale computation is substantially aided by using many structures and mutation groups rather than single mutation effects, which are very uncertain. Our results demonstrate substantial challenges but indicate ways forward to improve the quality of computer models for assessing SARS-CoV-2 mutation effects.
Affiliation(s)
- Shivani Thakur
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India
- Rajaneesh Kumar Verma
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India
- Kasper Planeta Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800, Kongens Lyngby, Denmark
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India
6
Stability and expression of SARS-CoV-2 spike-protein mutations. Mol Cell Biochem 2022; 478:1269-1280. [PMID: 36302994 PMCID: PMC9612610 DOI: 10.1007/s11010-022-04588-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 10/12/2022] [Indexed: 12/02/2022]
Abstract
Protein fold stability likely plays a role in SARS-CoV-2 S-protein evolution, together with ACE2 binding and antibody evasion. While few thermodynamic stability data are available for S-protein mutants, many systematic experimental data exist for their expression. In this paper, we explore whether such expression levels relate to the thermodynamic stability of the mutants. We studied mutation-induced SARS-CoV-2 S-protein fold stability, as computed by three very distinct methods and eight different protein structures to account for method- and structure-dependencies. For all methods and structures used (24 comparisons), computed stability changes correlate significantly (99% confidence level) with experimental yeast expression from the literature, such that higher expression is associated with relatively higher fold stability. Also significant, albeit weaker, correlations were seen between stability and ACE2 binding effects. The effect of thermodynamic fold stability may be direct or a correlate of amino acid or site properties, notably the solvent exposure of the site. Correlation between computed stability and experimental expression and ACE2 binding suggests that functional properties of the SARS-CoV-2 S-protein mutant space are largely determined by a few simple features, due to underlying correlations. Our study lends promise to the development of computational tools that may ideally aid in understanding and predicting SARS-CoV-2 S-protein evolution.
7
Structural heterogeneity and precision of implications drawn from cryo-electron microscopy structures: SARS-CoV-2 spike-protein mutations as a test case. European Biophysics Journal 2022; 51:555-568. [PMID: 36167828 PMCID: PMC9514682 DOI: 10.1007/s00249-022-01619-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 09/19/2022] [Indexed: 11/18/2022]
Abstract
Protein structures may be used to draw functional implications at the residue level, but how sensitive are these implications to the exact structure used? Calculations of the effects of SARS-CoV-2 S-protein mutations based on experimental cryo-electron microscopy structures have been abundant during the pandemic. To understand the precision of such estimates, we studied three distinct methods to estimate stability changes for all possible mutations in 23 different S-protein structures (3.69 million ΔΔG values in total) and explored how random and systematic errors can be remedied by structure-averaged mutation group comparisons. We show that computational estimates have low precision, because method and structure heterogeneity make results for single mutations uninformative. However, structure-averaged differences in mean effects for groups of substitutions can yield significant results. Illustrating this protocol, functionally important natural mutations, despite individual variations, average to a smaller stability impact compared to other possible mutations, independent of conformational state (open, closed). In summary, we document substantial issues with precision in structure-based protein modeling and recommend sensitivity tests to quantify these effects, but also suggest partial solutions to the problem in the form of structure-averaged “ensemble” estimates for groups of residues when multiple structures are available.
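The structure-averaged group comparison described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' protocol: the function names, structure and mutation labels, and all values are hypothetical, and the sketch stops at the difference in group means without the significance test a real analysis would need.

```python
from statistics import mean

def structure_average(ddg_by_structure):
    """Average each mutation's ddG over all structures that report it."""
    pooled = {}
    for per_structure in ddg_by_structure.values():
        for mutation, ddg in per_structure.items():
            pooled.setdefault(mutation, []).append(ddg)
    return {mutation: mean(values) for mutation, values in pooled.items()}

def group_mean_difference(avg_ddg, group, background):
    """Difference in mean structure-averaged ddG between two mutation groups."""
    return (mean(avg_ddg[m] for m in group)
            - mean(avg_ddg[m] for m in background))

# Hypothetical ddG values (kcal/mol) for four mutations in two structures.
ddg_by_structure = {
    "structure_A": {"m1": 0.2, "m2": 0.4, "m3": 1.5, "m4": 2.1},
    "structure_B": {"m1": 0.4, "m2": 0.2, "m3": 1.9, "m4": 1.7},
}

avg = structure_average(ddg_by_structure)
diff = group_mean_difference(avg, group=["m1", "m2"], background=["m3", "m4"])
print(avg, diff)
```

In this toy data the single-structure values for m1 and m2 disagree on which mutation is more destabilizing, while the group-level difference is the same whichever structure is used, which mirrors the abstract's point that group means are more robust than single-mutation estimates.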
8
Pak MA, Ivankov DN. Best templates outperform homology models in predicting the impact of mutations on protein stability. Bioinformatics 2022; 38:4312-4320. [PMID: 35894930 DOI: 10.1093/bioinformatics/btac515] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/14/2021] [Revised: 05/31/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Prediction of protein stability change upon mutation (ΔΔG) is crucial for facilitating protein engineering and understanding of protein folding principles. Robust prediction of protein folding free energy change requires knowledge of the protein three-dimensional (3D) structure. If the protein 3D structure is not available, one can predict the structure from the protein sequence; however, the prospects of ΔΔG predictions for predicted protein structures are unknown. The accuracy of using 3D structures of the best templates for the ΔΔG prediction is also unclear. RESULTS To investigate these questions, we used a representative set of seven diverse and accurate publicly available tools (FoldX, Eris, Rosetta, DDGun, ACDC-NN, ThermoNet and DynaMut) for stability change prediction combined with AlphaFold or I-Tasser for protein 3D structure prediction. We found that best templates perform consistently better than (or similar to) homology models for all ΔΔG predictors. Our findings suggest using the best template structure for the prediction of protein stability change upon mutation if the protein 3D structure is not available. AVAILABILITY AND IMPLEMENTATION The data are available at https://github.com/ivankovlab/template-vs-model. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marina A Pak
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- Dmitry N Ivankov
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
9
Abdullaev A, Abdurakhimov A, Mirakbarova Z, Ibragimova S, Tsoy V, Nuriddinov S, Dalimova D, Turdikulova S, Abdurakhmonov I. Genome sequence diversity of SARS-CoV-2 obtained from clinical samples in Uzbekistan. PLoS One 2022; 17:e0270314. [PMID: 35759503 PMCID: PMC9236271 DOI: 10.1371/journal.pone.0270314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 06/07/2022] [Indexed: 11/27/2022]
Abstract
Tracking temporal and spatial genomic changes and the evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is among the most urgent research topics worldwide, helping to elucidate coronavirus disease 2019 (COVID-19) pathogenesis and the effect of deleterious variants. Our current study concentrates on the genetic diversity of SARS-CoV-2 variants in Uzbekistan and their associations with COVID-19 severity. Thirty-nine whole genome sequences (WGS) of SARS-CoV-2 isolated from PCR-positive patients from Tashkent, Uzbekistan, for the period of July-August 2021 were generated and subjected to further genomic analysis. Genome-wide annotations of clinical isolates from our study revealed a total of 223 nucleotide-level variations, including SNPs and 34 deletions, at different positions throughout the entire genome of SARS-CoV-2. These changes included two novel, previously unreported mutations, Nonstructural protein (Nsp) 13: A85P and Nsp12: Y479N. There were two groups of co-occurring substitution patterns: the missense mutations in the Spike (S): D614G, Open Reading Frame (ORF) 1b: P314L, Nsp3: F924, 5'UTR:C241T; Nsp3:P2046L and Nsp3:P2287S, and the synonymous mutations in the Nsp4:D2907 (C8986T), Nsp6:T3646A and Nsp14:A1918V regions, respectively. “Nextstrain” clustered the largest number of SARS-CoV-2 strains into the Delta clade (n = 32; 82%), followed by two others, the Alpha-originated (n = 4; 10.3%) and 20A (n = 3; 7.7%) clades. Geographically, the Delta clade sample sequences were grouped into several clusters with the SARS-CoV genotypes from Russia, Denmark, USA, Egypt and Bangladesh. Phylogenetically, the Delta isolates in our study belong to the two main subclades 21A (56%) and 21J (44%). We found that females were more affected by the 21A variant, whereas males were more affected by the 21J variant (χ2 = 4.57; p ≤ 0.05, n = 32). The amino acid substitution ORF7a:P45L in the Delta isolates was found to be significantly associated with disease severity.
In conclusion, this study showed that the identified novel substitutions Nsp13: A85P and Nsp12: Y479N have a destabilizing effect, while the missense substitution ORF7a: P45L is significantly associated with disease severity.
Affiliation(s)
- Vladimir Tsoy
  - Center for Advanced Technologies, Tashkent, Uzbekistan
- Ibrokhim Abdurakhmonov
  - Center for Advanced Technologies, Tashkent, Uzbekistan
  - Center of Genomics and Bioinformatics, Academy of Sciences of Uzbekistan, Qibray Region, Tashkent, Republic of Uzbekistan
10
Weißenborn L, Richel E, Hüseman H, Welzer J, Beck S, Schäfer S, Sticht H, Überla K, Eichler J. Smaller, Stronger, More Stable: Peptide Variants of a SARS-CoV-2 Neutralizing Miniprotein. Int J Mol Sci 2022; 23:ijms23116309. [PMID: 35682988 PMCID: PMC9181698 DOI: 10.3390/ijms23116309] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/03/2022] [Revised: 05/26/2022] [Accepted: 06/01/2022] [Indexed: 02/01/2023]
Abstract
Based on the structure of a de novo designed miniprotein (LCB1) in complex with the receptor binding domain (RBD) of the SARS-CoV-2 spike protein, we have generated and characterized truncated peptide variants of LCB1, which present only two of the three LCB1 helices, and which fully retained the virus neutralizing potency against different SARS-CoV-2 variants of concern (VOC). This antiviral activity was even 10-fold stronger for a cyclic variant of the two-helix peptides, as compared to the full-length peptide. Furthermore, the proteolytic stability of the cyclic peptide was substantially improved, rendering it a better potential candidate for SARS-CoV-2 therapy. In a more mechanistic approach, the peptides also served as tools to dissect the role of individual mutations in the RBD for the susceptibility of the resulting virus variants to neutralization by the peptides. As the peptides reported here were generated through chemical synthesis, rather than recombinant protein expression, they are amenable to further chemical modification, including the incorporation of a wide range of non-proteinogenic amino acids, with the aim of further stabilizing the peptides against proteolytic degradation, as well as improving the strength, as well as the breadth, of their virus neutralizing capacity.
Affiliation(s)
- Lucas Weißenborn
  - Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Elie Richel
  - Institute for Clinical and Molecular Virology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Helena Hüseman
  - Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Julia Welzer
  - Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Silvan Beck
  - Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Simon Schäfer
  - Department of Biology, Genetics Division, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Heinrich Sticht
  - Institute of Biochemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Klaus Überla
  - Institute for Clinical and Molecular Virology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Jutta Eichler
  - Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
11
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods in Molecular Biology (Clifton, N.J.) 2022; 2449:169-185. [PMID: 35507262 DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research in the field of computational methods based on machine learning and knowledge-based potentials for ΔG and ΔΔG prediction upon variations, we now realize that all the approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein wrong? Both machine learning and knowledge-based computational methods are rigorous, and we know the solid theory behind them. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be considered with care. In the following, we will show how to cope with the problem of understanding which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Affiliation(s)
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
12
Vila JA. Proteins' Evolution upon Point Mutations. ACS Omega 2022; 7:14371-14376. [PMID: 35573218 PMCID: PMC9089682 DOI: 10.1021/acsomega.2c01407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 05/03/2023]
Abstract
As the reader must already be aware, state-of-the-art protein folding prediction methods have reached a smashing success in their goal of accurately determining the three-dimensional structures of proteins. Yet, a solution to simple problems such as the effects of protein point mutations on their (i) native conformation; (ii) marginal stability; (iii) ensemble of high-energy nativelike conformations; and (iv) metamorphism propensity and, hence, their evolvability, remains an unsolved problem. As a plausible solution to the latter, some properties of the amide hydrogen-deuterium exchange, a highly sensitive probe of the structure, stability, and folding of proteins, are assessed from a new perspective. The preliminary results indicate that the protein marginal stability change upon point mutations provides the necessary and sufficient information to estimate, through a Boltzmann factor, the evolution of the amide hydrogen exchange protection factors and, consequently, that of the ensemble of folded conformations coexisting with the native state. This work contributes to our general understanding of the effects of point mutations on proteins and may spur significant progress in our efforts to develop methods to determine the appearance of new folds and functions accurately.
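For orientation, the Boltzmann-factor link between a stability change and hydrogen-exchange protection factors can be written out using the standard EX2-limit relation from hydrogen-exchange thermodynamics. This is a sketch under that textbook assumption, with the further assumption that the mutation shifts the opening free energy of a given amide by the same ΔΔG as the global stability; the notation is introduced here for illustration and the paper's exact formulation may differ.

```latex
% Protection factor P of an amide and its opening free energy (EX2 limit)
\Delta G_{\mathrm{HX}} = RT \ln P, \qquad P = \frac{k_{\mathrm{int}}}{k_{\mathrm{obs}}}

% Change upon a point mutation, with
% \Delta\Delta G = \Delta G^{\mathrm{mut}} - \Delta G^{\mathrm{wt}}
\frac{P^{\mathrm{mut}}}{P^{\mathrm{wt}}} = \exp\!\left(\frac{\Delta\Delta G}{RT}\right)
```

As a worked number under these assumptions: at 298 K, RT ≈ 0.59 kcal/mol, so a destabilization of ΔΔG = −1 kcal/mol gives a ratio of about exp(−1.69) ≈ 0.19, that is, roughly a five-fold drop in the protection factor.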
|
13
|
Abstract
In-cell structural biology aims at extracting structural information about proteins or nucleic acids in their native, cellular environment. This emerging field holds great promise and is already providing new facts and outlooks of interest at both fundamental and applied levels. NMR spectroscopy makes important contributions here: it brings information on a broad variety of nuclei at the atomic scale, which ensures its great versatility and uniqueness. Here, we detail the methods, the fundamental knowledge, and the applications in biomedical engineering related to in-cell structural biology by NMR. We finally propose a brief overview of the other main techniques in the field (EPR, smFRET, cryo-ET, etc.) to draw some advisable developments for in-cell NMR. In the era of large-scale screenings and deep learning, both accurate and qualitative experimental evidence are as essential as ever to understand the interior life of cells. In-cell structural biology by NMR spectroscopy can generate such knowledge, and it does so at the atomic scale. This review is meant to deliver comprehensive but accessible information, with advanced technical details and reflections on the methods, the nature of the results, and the future of the field.
Affiliation(s)
- Francois-Xavier Theillet
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
|
14
|
Tiberti M, Terkelsen T, Degn K, Beltrame L, Cremers TC, da Piedade I, Di Marco M, Maiani E, Papaleo E. MutateX: an automated pipeline for in silico saturation mutagenesis of protein structures and structural ensembles. Brief Bioinform 2022; 23:6552273. [PMID: 35323860 DOI: 10.1093/bib/bbac074] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 11/30/2021] [Revised: 01/28/2022] [Accepted: 02/16/2022] [Indexed: 12/26/2022] Open
Abstract
Mutations, which result in amino acid substitutions, influence the stability of proteins and their binding to biomolecules. A molecular understanding of the effects of protein mutations is of both biotechnological and medical relevance. Empirical free energy functions that quickly estimate the free energy change upon mutation (ΔΔG) can be exploited for systematic screenings of proteins and protein complexes. In silico saturation mutagenesis can guide the design of new experiments or rationalize the consequences of known mutations. Often, software such as FoldX, while fast and reliable, lacks the automation features needed to apply it in a high-throughput manner. We introduce MutateX, software that automates the prediction of ΔΔGs associated with the systematic mutation of each residue within a protein or protein complex to all other possible residue types, using the FoldX energy function. MutateX also supports ΔΔG calculations over protein ensembles, upon post-translational modifications and in multimeric assemblies. At the heart of MutateX lies an automated pipeline engine that handles input preparation and parallelization and outputs publication-ready figures. We illustrate the MutateX protocol applied to different case studies. The results of the high-throughput scan provided by our tools can help in different applications, such as the analysis of disease-associated mutations, to complement experimental deep mutational scans, or assist the design of variants for industrial applications. MutateX is a collection of Python tools that relies on open-source libraries. It is available free of charge under the GNU General Public License from https://github.com/ELELAB/mutatex.
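The saturation scan described in this abstract amounts to enumerating, for every position, the 19 substitutions to the other standard residue types. A minimal Python sketch of that enumeration follows. It does not call MutateX or FoldX; the function name `saturation_mutations`, the example sequence, and the `M1A`-style labels are illustrative assumptions, not the tool's actual interface.

```python
# Illustrative only: enumerate a saturation-mutagenesis list for a sequence.
# This does NOT use MutateX or FoldX; names and label format are assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def saturation_mutations(sequence, start=1):
    """Return labels such as 'M1A' for every substitution at every position."""
    mutations = []
    for offset, wild_type in enumerate(sequence):
        position = start + offset
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # skip the self-substitution
                mutations.append(f"{wild_type}{position}{mutant}")
    return mutations


if __name__ == "__main__":
    scan = saturation_mutations("MKT")
    print(len(scan))   # 3 positions x 19 substitutions = 57
    print(scan[:3])    # ['M1A', 'M1C', 'M1D']
```

In a real scan each label would be handed to an energy function for a ΔΔG estimate; that step is deliberately left out here because it depends on the tool in use.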
Affiliation(s)
- Matteo Tiberti
  - Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
- Thilde Terkelsen
  - Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
- Kristine Degn
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800, Lyngby, Denmark
- Ludovica Beltrame
  - Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
- Tycho Canter Cremers
  - Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
- Isabelle da Piedade
  - Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
- Miriam Di Marco
  - Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
- Emiliano Maiani
  - Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
- Elena Papaleo
  - Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800, Lyngby, Denmark
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
|
15
|
Pan Q, Nguyen TB, Ascher DB, Pires DEV. Systematic evaluation of computational tools to predict the effects of mutations on protein stability in the absence of experimental structures. Brief Bioinform 2022; 23:bbac025. [PMID: 35189634 PMCID: PMC9155634 DOI: 10.1093/bib/bbac025] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/19/2021] [Revised: 01/13/2022] [Accepted: 01/30/2022] [Indexed: 12/26/2022] Open
Abstract
Changes in protein sequence can have dramatic effects on how proteins fold, their stability and dynamics. Over the last 20 years, pioneering methods have been developed to try to estimate the effects of missense mutations on protein stability, leveraging growing availability of protein 3D structures. These, however, have been developed and validated using experimentally derived structures and biophysical measurements. A large proportion of protein structures remain to be experimentally elucidated and, while many studies have based their conclusions on predictions made using homology models, there has been no systematic evaluation of the reliability of these tools in the absence of experimental structural data. We have, therefore, systematically investigated the performance and robustness of ten widely used structural methods when presented with homology models built using templates at a range of sequence identity levels (from 15% to 95%) and contrasted performance with sequence-based tools, as a baseline. We found there is indeed performance deterioration on homology models built using templates with sequence identity below 40%, where sequence-based tools might become preferable. This was most marked for mutations in solvent exposed residues and stabilizing mutations. As structure prediction tools improve, the reliability of these predictors is expected to follow, however we strongly suggest that these factors should be taken into consideration when interpreting results from structure-based predictors of mutation effects on protein stability.
Affiliation(s)
- Qisheng Pan
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- Thanh Binh Nguyen
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
  - Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
- Douglas E V Pires
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria 3053, Australia
|
16
|
Baek KT, Kepp KP. Data set and fitting dependencies when estimating protein mutant stability: Toward simple, balanced, and interpretable models. J Comput Chem 2022; 43:504-518. [PMID: 35040492 DOI: 10.1002/jcc.26810] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/06/2021] [Revised: 12/13/2021] [Accepted: 01/03/2022] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein stability changes upon mutation (ΔΔG) is increasingly important to evolution studies, protein engineering, and screening of disease-causing gene variants but is challenged by biases in training data. We investigated 45 linear regression models trained on data sets that account systematically for destabilization bias and mutation-type bias BM. The models were externally validated on three test data sets probing different pathologies and for internal consistency (symmetry and neutrality). Model structure and performance substantially depended on training data and even fitting method. We developed two final models: SimBa-IB for typical natural mutations and SimBa-SYM for situations where stabilizing and destabilizing mutations occur to a similar extent. SimBa-SYM, despite its simplicity, is essentially non-biased (vs. the Ssym data set) while still performing well for all data sets (R ~ 0.46-0.54, MAE = 1.16-1.24 kcal/mol). The simple models provide advantages in terms of interpretability, use and future improvement, and are freely available on GitHub.
Affiliation(s)
- Kasper P Kepp
  - DTU Chemistry, Technical University of Denmark, Lyngby, Denmark
|
17
|
Abstract
The spike protein (S-protein) of SARS-CoV-2, the protein that enables the virus to infect human cells, is the basis for many vaccines and a hotspot of concerning virus evolution. Here, we discuss the outstanding progress in structural characterization of the S-protein and how these structures facilitate analysis of virus function and evolution. We emphasize the differences in reported structures and that analysis of structure-function relationships is sensitive to the structure used. We show that the average residue solvent exposure in nearly complete structures is a good descriptor of open vs closed conformation states. Because of structural heterogeneity of functionally important surface-exposed residues, we recommend using averages of a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure-function relationships. To illustrate these points, we analyze some significant chemical tendencies of prominent S-protein mutations in the context of the available structures. In the discussion of new variants, we emphasize the selectivity of binding to ACE2 vs prominent antibodies rather than simply the antibody escape or ACE2 affinity separately. We note that larger chemical changes, in particular increased electrostatic charge or side-chain volume of exposed surface residues, are recurring in mutations of concern, plausibly related to adaptation to the negative surface potential of human ACE2. We also find indications that the fixated mutations of the S-protein in the main variants are less destabilizing than would be expected on average, possibly pointing toward a selection pressure on the S-protein. The richness of available structures for all of these situations provides an enormously valuable basis for future research into these structure-function relationships.
Affiliation(s)
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur 492015, Chhattisgarh, India
- Kasper P. Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
|
18
|
Pancotti C, Benevenuta S, Birolo G, Alberini V, Repetto V, Sanavia T, Capriotti E, Fariselli P. Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset. Brief Bioinform 2022; 23:6502552. [PMID: 35021190 PMCID: PMC8921618 DOI: 10.1093/bib/bbab555] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 07/28/2021] [Revised: 11/29/2021] [Accepted: 12/05/2021] [Indexed: 12/13/2022] Open
Abstract
Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and understanding the genotype-phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same and ‘all’ available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property by considering both direct and reverse variants were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21–0.5 and 0–0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving a Pearson correlation in the range of 0.51–0.62. The tested methods seem relatively insensitive to the physiological conditions, performing well also on the variants measured with more extreme pH and temperature values. A common issue with all the tested methods is the compression of the ΔΔG predictions toward zero. Furthermore, the thermodynamic stability of the most significantly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.
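The antisymmetry property mentioned in this abstract says that the predicted ΔΔG of a reverse variant should be the negative of the direct one. One common way to quantify it, sketched below in Python, is the Pearson correlation between direct and reverse predictions (ideally -1) together with a bias term (ideally 0). The function names and the numbers are made up for illustration, and the abstract does not state that these exact metrics were the ones used.

```python
# Sketch of an antisymmetry check: a predictor is antisymmetric when
# ddG(reverse) == -ddG(direct). The numbers below are made up for illustration.
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def antisymmetry_scores(direct, reverse):
    """Return (r_dir_rev, bias); ideal values are (-1.0, 0.0)."""
    r = pearson(direct, reverse)
    bias = sum(d + v for d, v in zip(direct, reverse)) / (2 * len(direct))
    return r, bias


if __name__ == "__main__":
    direct = [-1.2, 0.4, -2.0, 0.1]      # hypothetical predictions, kcal/mol
    reverse = [-d for d in direct]       # a perfectly antisymmetric predictor
    print(antisymmetry_scores(direct, reverse))
```

A predictor that always reports destabilization in both directions would show a correlation far from -1 and a clearly nonzero bias, which is the failure mode this kind of check is meant to expose.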
Affiliation(s)
- Corrado Pancotti
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Silvia Benevenuta
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Giovanni Birolo
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Virginia Alberini
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Valeria Repetto
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Tiziana Sanavia
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
|
19
|
Samaga YBL, Raghunathan S, Priyakumar UD. SCONES: Self-Consistent Neural Network for Protein Stability Prediction Upon Mutation. J Phys Chem B 2021; 125:10657-10671. [PMID: 34546056 DOI: 10.1021/acs.jpcb.1c04913] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/17/2023]
Abstract
Engineering proteins to have desired properties by mutating amino acids at specific sites is commonplace. Such engineered proteins must be stable to function. Experimental methods used to determine stability at throughputs required to scan the protein sequence space thoroughly are laborious. To this end, many machine learning based methods have been developed to predict thermodynamic stability changes upon mutation. These methods have been evaluated for symmetric consistency by testing with hypothetical reverse mutations. In this work, we propose transitive data augmentation, evaluating transitive consistency with our new Stransitive data set, and a new machine learning based method, the first of its kind, that incorporates both symmetric and transitive properties into the architecture. Our method, called SCONES, is an interpretable neural network that predicts small relative protein stability changes for missense mutations that do not significantly alter the structure. It estimates a residue's contributions toward protein stability (ΔG) in its local structural environment, and the difference between independently predicted contributions of the reference and mutant residues is reported as ΔΔG. We show that this self-consistent machine learning architecture is immune to many common biases in data sets, relies less on data than existing methods, is robust to overfitting, and can explain a substantial portion of the variance in experimental data.
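The abstract reports ΔΔG as the difference between independently predicted contributions of the reference and mutant residues. Any predictor built that way is symmetric and transitive by construction, which the toy Python sketch below makes concrete. The score table is hypothetical and stands in for the network; unlike SCONES, it ignores the local structural environment.

```python
# Toy illustration: if ddG is defined as a difference of per-residue scores,
# symmetric and transitive consistency hold by construction.
# The score table is hypothetical and is NOT the SCONES network.

TOY_SCORE = {"A": -0.3, "G": 0.5, "L": -1.1}  # made-up per-residue dG, kcal/mol


def ddg(reference, mutant, score=TOY_SCORE):
    """ddG for reference -> mutant as a difference of independent scores."""
    return score[mutant] - score[reference]


if __name__ == "__main__":
    # Symmetric consistency: A->G is the negative of G->A.
    print(abs(ddg("A", "G") + ddg("G", "A")) < 1e-12)
    # Transitive consistency: A->G plus G->L equals A->L.
    print(abs(ddg("A", "G") + ddg("G", "L") - ddg("A", "L")) < 1e-12)
```

The design point is that consistency comes from the form of the output (a difference of two independent terms) rather than from anything learned from data.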
Affiliation(s)
- Yashas B L Samaga
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- Shampa Raghunathan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
|
20
|
Caldararu O, Blundell TL, Kepp KP. Three Simple Properties Explain Protein Stability Change upon Mutation. J Chem Inf Model 2021; 61:1981-1988. [PMID: 33848149 DOI: 10.1021/acs.jcim.1c00201] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/29/2022]
Abstract
Accurate prediction of protein stability upon mutation enables rational engineering of new proteins and insights into protein evolution and monogenetic diseases caused by single-point amino acid substitutions. Many tools have been developed to this aim, ranging from energy-based models to machine-learning methods that use large amounts of experimental data. However, as the methods become more complex, the interpretation of the chemistry underlying the protein stability effects becomes obscure. It is thus of interest to identify the simplest prediction model that retains complete amino acid specific interpretation; for a given number of input descriptors, we expect such a model to be almost universal. In this study, we identify such a limiting model, SimBa, a simple multilinear regression model trained on a substitution-type-balanced experimental data set. The model accounts only for the solvent accessibility of the site, volume difference, and polarity difference caused by mutation. Our results show that this very simple and directly applicable model performs comparably to other much more complex, widely used protein stability prediction methods. This suggests that a hard limit of ∼1 kcal/mol numerical accuracy and an R ∼ 0.5 trend accuracy exists and that new features, such as account of unfolded states, water colocalization, and amino acid correlations, are required to improve accuracy to, e.g., 1/2 kcal/mol.
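The abstract describes a multilinear regression on three descriptors: solvent accessibility of the site, volume difference, and polarity difference. The Python sketch below shows what fitting such a model looks like with ordinary least squares. The data, the generating coefficients, and the helper names (`solve`, `fit_linear`, `predict`) are synthetic assumptions for illustration; they are not the published SimBa parameters or training set.

```python
# Sketch of a multilinear regression ddG ~ b0 + b1*RSA + b2*dVolume + b3*dPolarity.
# Data and coefficients are synthetic; they are NOT the published SimBa values.


def solve(a, b):
    """Solve a small linear system a.x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(features, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    x = [[1.0] + list(row) for row in features]  # prepend intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    """Evaluate the fitted linear model for one descriptor row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


if __name__ == "__main__":
    # Columns: relative solvent accessibility, volume difference, polarity difference.
    features = [[0.1, -30.0, 1.0], [0.8, 10.0, -2.0], [0.5, 45.0, 0.5],
                [0.2, -5.0, 3.0], [0.9, 20.0, 0.0], [0.4, -60.0, -1.5]]
    true_beta = [-1.0, 1.2, 0.01, -0.2]  # arbitrary values used to generate data
    targets = [predict(true_beta, row) for row in features]
    beta = fit_linear(features, targets)
    print([round(b, 3) for b in beta])
```

Because the synthetic targets are noise-free, the fit recovers the generating coefficients; with experimental ΔΔG values the residual error would reflect the accuracy limits discussed in the abstract.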
Affiliation(s)
- Octav Caldararu
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
- Tom L Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, United Kingdom
- Kasper P Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
|