1
Zalewski M, Iglesias V, Bárcenas O, Ventura S, Kmiecik S. Aggrescan4D: A comprehensive tool for pH-dependent analysis and engineering of protein aggregation propensity. Protein Sci 2024; 33:e5180. [PMID: 39324697 PMCID: PMC11425640 DOI: 10.1002/pro.5180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 09/03/2024] [Accepted: 09/06/2024] [Indexed: 09/27/2024]
Abstract
Aggrescan4D (A4D) is an advanced computational tool designed for predicting protein aggregation, leveraging structural information and the influence of pH. Building upon its predecessor, Aggrescan3D (A3D), A4D has undergone numerous enhancements aimed at assisting the improvement of protein solubility. This manuscript reviews A4D's updated functionalities and explains the fundamental principles behind its pH-dependent calculations. Additionally, it presents an antibody case study to evaluate its performance in comparison with other structure-based predictors. Notably, A4D integrates advanced protein engineering protocols with pH-dependent calculations, enhancing its utility in advising solubility-enhancing mutations. A4D considers the impact of structural flexibility on aggregation propensities, and includes a large set of precalculated predictions. These capabilities should help to open new avenues for both understanding and managing protein aggregation. A4D is accessible through a dedicated web server at https://biocomp.chem.uw.edu.pl/a4d/.
Affiliation(s)
- Mateusz Zalewski
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Warsaw, Poland
- Valentin Iglesias
  - Departament de Bioquímica i Biologia Molecular, Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Oriol Bárcenas
  - Departament de Bioquímica i Biologia Molecular, Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Institute of Advanced Chemistry of Catalonia (IQAC), CSIC, Barcelona, Spain
- Salvador Ventura
  - Departament de Bioquímica i Biologia Molecular, Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Hospital Universitari Parc Taulí, Institut d'Investigació i Innovació Parc Taulí (I3PT‐CERCA), Universitat Autònoma de Barcelona, Sabadell, Spain
- Sebastian Kmiecik
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Warsaw, Poland
2
Urquhart RJ, van Teijlingen A, Tuttle T. ANI neural network potentials for small molecule pKa prediction. Phys Chem Chem Phys 2024; 26:23934-23943. [PMID: 39235138 DOI: 10.1039/d4cp01982b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024]
Abstract
The pKa value of a molecule is of interest to chemists across a broad spectrum of fields including pharmacology, environmental chemistry and theoretical chemistry. Determination of pKa values can be accomplished through several experimental methods such as NMR techniques and titration together with computational techniques such as DFT calculations. However, all of these methods remain time consuming and computationally expensive. In this work we develop a method for the rapid calculation of pKa values of small molecules which utilises a combination of neural network potentials, low energy conformer searches and thermodynamic cycles. We show that neural network potentials trained on different phase and charge states can be employed in tandem to predict the full thermodynamic energy cycle of molecules. Focusing here on imidazolium derived carbene species, the method utilised can easily be extended to other functional groups of interest such as amines with further training.
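As a rough illustration of how a thermodynamic cycle turns free energies into a pKa, the sketch below uses the generic textbook cycle (gas-phase deprotonation plus solvation free energies). It is not the authors' implementation; the function name, argument names, and units (kcal/mol) are assumptions made for this example, and standard-state corrections are ignored.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_from_cycle(dg_gas, dg_solv_acid, dg_solv_base, dg_solv_proton,
                   temperature=298.15):
    """Estimate a pKa from a generic thermodynamic cycle (hypothetical helper).

    All free energies are in kcal/mol:
      dg_gas         gas-phase free energy of HA -> A + H+
      dg_solv_acid   solvation free energy of the protonated species HA
      dg_solv_base   solvation free energy of the deprotonated species A
      dg_solv_proton solvation free energy of the proton
    """
    # Close the cycle: aqueous deprotonation free energy.
    dg_aq = dg_gas + dg_solv_base + dg_solv_proton - dg_solv_acid
    # pKa = dG_aq / (R * T * ln 10)
    return dg_aq / (R_KCAL * temperature * math.log(10))
```

In a workflow like the one described, each term would presumably come from a model suited to the relevant phase and charge state; here they are plain numbers supplied by the caller.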
Affiliation(s)
- Ross James Urquhart
  - Department of Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow, G1 1XL, UK.
- Alexander van Teijlingen
  - Department of Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow, G1 1XL, UK.
- Tell Tuttle
  - Department of Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow, G1 1XL, UK.
3
Pervushin NV, Nilov DK, Pushkarev SV, Shipunova VO, Badlaeva AS, Yapryntseva MA, Kopytova DV, Zhivotovsky B, Kopeina GS. BH3-mimetics or DNA-damaging agents in combination with RG7388 overcome p53 mutation-induced resistance to MDM2 inhibition. Apoptosis 2024:10.1007/s10495-024-02014-8. [PMID: 39222276 DOI: 10.1007/s10495-024-02014-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/15/2024] [Indexed: 09/04/2024]
Abstract
The development of drug resistance reduces the efficacy of cancer therapy. Tumor cells can acquire resistance to MDM2 inhibitors, which are currently under clinical evaluation. We generated RG7388-resistant neuroblastoma cells, which became more proliferative and metabolically active and were less sensitive to DNA-damaging agents in vitro and in vivo, compared with wild-type cells. The resistance was associated with a mutation of the p53 protein (His193Arg). This mutation abated its transcriptional activity via destabilization of the tetrameric p53-DNA complex and was observed in many cancer types. Finally, we found that Cisplatin and various BH3-mimetics could enhance RG7388-mediated apoptosis in RG7388-resistant neuroblastoma cells, thereby partially overcoming resistance to MDM2 inhibition.
Affiliation(s)
- N V Pervushin
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
  - Faculty of Medicine, MV Lomonosov Moscow State University, Moscow, 119991, Russia
- D K Nilov
  - Belozersky Institute of Physicochemical Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
- S V Pushkarev
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119234, Russia
- V O Shipunova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
  - Moscow Center for Advanced Studies, Moscow, 123592, Russia
- A S Badlaeva
  - Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Russian Ministry of Health, Moscow, 117513, Russia
- M A Yapryntseva
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
  - Faculty of Medicine, MV Lomonosov Moscow State University, Moscow, 119991, Russia
- D V Kopytova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
- B Zhivotovsky
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia.
  - Faculty of Medicine, MV Lomonosov Moscow State University, Moscow, 119991, Russia.
  - Division of Toxicology, Institute of Environmental Medicine, Karolinska Institutet, Box 210, 17177, Stockholm, Sweden.
- G S Kopeina
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia.
  - Faculty of Medicine, MV Lomonosov Moscow State University, Moscow, 119991, Russia.
4
Bárcenas O, Kuriata A, Zalewski M, Iglesias V, Pintado-Grima C, Firlik G, Burdukiewicz M, Kmiecik S, Ventura S. Aggrescan4D: structure-informed analysis of pH-dependent protein aggregation. Nucleic Acids Res 2024; 52:W170-W175. [PMID: 38738618 PMCID: PMC11223845 DOI: 10.1093/nar/gkae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/11/2024] [Accepted: 04/29/2024] [Indexed: 05/14/2024] Open
Abstract
Protein aggregation is behind the genesis of incurable diseases and imposes constraints on drug discovery and the industrial production and formulation of proteins. Over the years, we have been advancing the Aggrescan3D (A3D) method, aiming to deepen our comprehension of protein aggregation and assist the engineering of protein solubility. Since its inception, A3D has become one of the most popular structure-based aggregation predictors because of its performance, modular functionalities, RESTful service for extensive screenings, and intuitive user interface. Building on this foundation, we introduce Aggrescan4D (A4D), significantly extending A3D's functionality. A4D is aimed at predicting the pH-dependent aggregation of protein structures, and features an evolutionary-informed automatic mutation protocol to engineer protein solubility without compromising structure and stability. It also integrates precalculated results for the nearly 500,000 jobs in the A3D Model Organisms Database and structure retrieval from the AlphaFold database. Globally, A4D constitutes a comprehensive tool for understanding, predicting, and designing solutions for specific protein aggregation challenges. The A4D web server and extensive documentation are available at https://biocomp.chem.uw.edu.pl/a4d/. This website is free and open to all users without a login requirement.
Affiliation(s)
- Oriol Bárcenas
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Aleksander Kuriata
  - Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Mateusz Zalewski
  - Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Valentín Iglesias
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
  - Clinical Research Centre, Medical University of Białystok, Kilińskiego 1, 15-369 Białystok, Poland
- Carlos Pintado-Grima
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Grzegorz Firlik
  - Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Michał Burdukiewicz
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
  - Clinical Research Centre, Medical University of Białystok, Kilińskiego 1, 15-369 Białystok, Poland
- Sebastian Kmiecik
  - Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Salvador Ventura
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
5
Liu S, Yang Q, Zhang L, Luo S. Accurate Protein pKa Prediction with Physical Organic Chemistry Guided 3D Protein Representation. J Chem Inf Model 2024; 64:4410-4418. [PMID: 38780156 DOI: 10.1021/acs.jcim.4c00354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Protein pKa is a fundamental physicochemical parameter that dictates protein structure and function. However, accurately determining protein site-pKa values remains a substantial challenge, both experimentally and theoretically. In this study, we introduce a physical organic approach, leveraging a protein structural and physical-organic-parameter-based representation (P-SPOC), to develop a rapid and intuitive model for protein pKa prediction. Our P-SPOC model achieves state-of-the-art predictive accuracy, with a mean absolute error (MAE) of 0.33 pKa units. Furthermore, we have incorporated advanced protein structure prediction models, like AlphaFold2, to approximate structures for proteins lacking three-dimensional representations, which enhances the applicability of our model in the context of structure-undetermined protein research. To promote broader accessibility within the research community, an online prediction interface was also established at isyn.luoszgroup.com.
Affiliation(s)
- Siyuan Liu
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Qi Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Long Zhang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Sanzhong Luo
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
6
Cai Z, Peng H, Sun S, He J, Luo F, Huang Y. DeepKa Web Server: High-Throughput Protein pKa Prediction. J Chem Inf Model 2024; 64:2933-2940. [PMID: 38530291 DOI: 10.1021/acs.jcim.3c02013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
DeepKa is a deep-learning-based protein pKa predictor proposed in our previous work. In this study, a web server was developed that enables online protein pKa prediction driven by DeepKa. The web server provides a user-friendly interface where a single step of entering a valid PDB code or uploading a PDB format file is required to submit a job. Two case studies have been attached in order to explain how pKa's calculated by the web server could be utilized by users. Finally, combining the web server with post processing as described in case studies, this work suggests a quick workflow of investigating the relationship between protein structure and function that are pH dependent. The web server of DeepKa is freely available at http://www.computbiophys.com/DeepKa/main.
Affiliation(s)
- Zhitao Cai
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Hao Peng
  - National Pilot School of Software, Yunnan University, Kunming 650504, China
- Shuo Sun
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Jiahao He
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Fangfang Luo
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Yandong Huang
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
7
Buslaev P, Groenhof G. gmXtal: Cooking Crystals with GROMACS. Protein J 2024; 43:200-206. [PMID: 37620609 PMCID: PMC11058868 DOI: 10.1007/s10930-023-10141-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/23/2023] [Indexed: 08/26/2023]
Abstract
Molecular dynamics (MD) simulations are routinely performed of biomolecules in solution, because this is their native environment. However, the structures used in such simulations are often obtained with X-ray crystallography, which provides the atomic coordinates of the biomolecule in a crystal environment. With the advent of free electron lasers and time-resolved techniques, X-ray crystallography can now also access metastable states that are intermediates in a biochemical process. Such experiments provide additional data, which can be used, for example, to optimize MD force fields. Doing so requires that the simulation of the biomolecule is also performed in the crystal environment. However, in contrast to simulations of biomolecules in solution, setting up a crystal is challenging. In particular, because not all solvent molecules are resolved in X-ray crystallography, adding a suitable number of solvent molecules, such that the properties of the crystallographic unit cell are preserved in the simulation, can be difficult and typically is a trial-and-error based procedure requiring manual interventions. Such interventions preclude high throughput applications. To overcome this bottleneck, we introduce gmXtal, a tool for setting up crystal simulations for MD simulations with GROMACS. With the information from the protein data bank (rcsb.org) gmXtal automatically (i) builds the crystallographic unit cell; (ii) sets the protonation of titratable residues; (iii) builds missing residues that were not resolved experimentally; and (iv) adds an appropriate number of solvent molecules to the system. gmXtal is available as a standalone tool https://gitlab.com/pbuslaev/gmxtal .
Affiliation(s)
- Pavel Buslaev
  - Department of Chemistry and Nanoscience Center, University of Jyväskylä, 40014, Jyväskylä, Finland.
- Gerrit Groenhof
  - Department of Chemistry and Nanoscience Center, University of Jyväskylä, 40014, Jyväskylä, Finland.
8
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
  - University of British Columbia, Vancouver, BC, Canada.
  - Photonic Inc., Coquitlam, BC, Canada.
9
Wilson C, Karttunen M, de Groot BL, Gapsys V. Accurately Predicting Protein pKa Values Using Nonequilibrium Alchemy. J Chem Theory Comput 2023; 19:7833-7845. [PMID: 37820376 PMCID: PMC10653114 DOI: 10.1021/acs.jctc.3c00721] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/29/2023] [Indexed: 10/13/2023]
Abstract
The stability, solubility, and function of a protein depend on both its net charge and the protonation states of its individual residues. pKa is a measure of the tendency for a given residue to (de)protonate at a specific pH. Although pKa values can be resolved experimentally, theory and computation provide a compelling alternative. To this end, we assess the applicability of a nonequilibrium (NEQ) alchemical free energy method to the problem of pKa prediction. On a data set of 144 residues that span 13 proteins, we report an average unsigned error of 0.77 ± 0.09, 0.69 ± 0.09, and 0.52 ± 0.04 pK for aspartate, glutamate, and lysine, respectively. This is comparable to current state-of-the-art predictors and the accuracy recently reached using free energy perturbation methods (e.g., FEP+). Moreover, we demonstrate that our open-source, pmx-based approach can accurately resolve the pKa values of coupled residues and observe a substantial performance disparity associated with the lysine partial charges in Amber14SB/Amber99SB*-ILDN, for which an underused fix already exists.
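The link between a residue's pKa and its protonation state at a given pH can be made concrete with the Henderson-Hasselbalch relation. The sketch below is a minimal illustration for a single, independently titrating site; the function name is hypothetical, and coupled residues (which the abstract mentions) would not follow this simple form.

```python
def fraction_deprotonated(ph, pka):
    """Fraction of an isolated titratable site that is deprotonated at a given pH.

    Uses the Henderson-Hasselbalch relation for one non-interacting site:
        f = 1 / (1 + 10**(pKa - pH))
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative values only: a site with pKa 4.0 at pH 7.0 is almost fully
# deprotonated, while a site with pKa 10.5 at the same pH is mostly protonated.
acidic_site = fraction_deprotonated(7.0, 4.0)
basic_site = fraction_deprotonated(7.0, 10.5)
```

An error of about one pK unit in the predicted pKa shifts the ratio of deprotonated to protonated populations by a factor of ten, which is why errors well below one unit matter in practice.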
Affiliation(s)
- Carter J. Wilson
  - Department of Mathematics, The University of Western Ontario, N6A 5B7 London, Canada
  - Centre for Advanced Materials and Biomaterials Research (CAMBR), The University of Western Ontario, N6A 5B7 London, Canada
- Mikko Karttunen
  - Centre for Advanced Materials and Biomaterials Research (CAMBR), The University of Western Ontario, N6A 5B7 London, Canada
  - Department of Physics & Astronomy, The University of Western Ontario, N6A 5B7 London, Canada
  - Department of Chemistry, The University of Western Ontario, N6A 5B7 London, Canada
- Bert L. de Groot
  - Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, 37077 Göttingen, Germany
- Vytautas Gapsys
  - Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, 37077 Göttingen, Germany
  - Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, B-2340 Beerse, Belgium
10
Liu Z, Moroz YS, Isayev O. The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions. Chem Sci 2023; 14:10835-10846. [PMID: 37829036 PMCID: PMC10566507 DOI: 10.1039/d3sc03902a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 09/12/2023] [Indexed: 10/14/2023] Open
Abstract
Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41 239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks about feature and model performance. Despite the excellent performance on a high-throughput experiment (HTE) dataset (R2 around 0.9), no method gave satisfactory results on the literature data. The best performance was an R2 of 0.395 ± 0.020 using the stack technique. Error analysis revealed that reactivity cliff and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R2 to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to the reactivity change due to the subtle structure variance, as well as be robust to the uncertainty associated with yield measurements.
Affiliation(s)
- Zhen Liu
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yurii S Moroz
  - Enamine Ltd, Kyïv 02660, Ukraine
  - Chemspace LLC, Kyïv 02094, Ukraine
  - Taras Shevchenko National University of Kyïv, Kyïv 01601, Ukraine
- Olexandr Isayev
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
11
Wei W, Hogues H, Sulea T. Comparative Performance of High-Throughput Methods for Protein pKa Predictions. J Chem Inf Model 2023; 63:5169-5181. [PMID: 37549424 PMCID: PMC10466379 DOI: 10.1021/acs.jcim.3c00165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Indexed: 08/09/2023]
Abstract
The medically relevant field of protein-based therapeutics has triggered a demand for protein engineering in different pH environments of biological relevance. In silico engineering workflows typically employ high-throughput screening campaigns that require evaluating large sets of protein residues and point mutations by fast yet accurate computational algorithms. While several high-throughput pKa prediction methods exist, their accuracies are unclear due to the lack of a current comprehensive benchmarking. Here, seven fast, efficient, and accessible approaches including PROPKA3, DeepKa, PKAI, PKAI+, DelPhiPKa, MCCE2, and H++ were systematically tested on a nonredundant subset of 408 measured protein residue pKa shifts from the pKa database (PKAD). While no method outperformed the null hypotheses with confidence, as illustrated by statistical bootstrapping, DeepKa, PKAI+, PROPKA3, and H++ had utility. More specifically, DeepKa consistently performed well in tests across multiple and individual amino acid residue types, as reflected by lower errors, higher correlations, and improved classifications. Arithmetic averaging of the best empirical predictors into simple consensuses improved overall transferability and accuracy up to a root-mean-square error of 0.76 pKa units and a correlation coefficient (R2) of 0.45 to experimental pKa shifts. This analysis should provide a basis for further methodological developments and guide future applications, which require embedding of computationally inexpensive pKa prediction methods, such as the optimization of antibodies for pH-dependent antigen binding.
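The consensus idea described above, arithmetic averaging of several predictors followed by a root-mean-square error against experiment, can be sketched as follows. This is only an illustration of the arithmetic; the function names and the input layout (one list of predicted pKa shifts per method, aligned by residue) are assumptions, not the benchmark's actual code.

```python
import math


def consensus(predictions):
    """Arithmetic mean across methods for each residue.

    `predictions` is a list with one inner list per method; all inner lists
    must be aligned so that position i refers to the same residue.
    """
    n_methods = len(predictions)
    return [sum(values) / n_methods for values in zip(*predictions)]


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `rmse(consensus([method_a, method_b]), experimental)` would score a two-method consensus, where `method_a`, `method_b`, and `experimental` are hypothetical aligned lists of pKa shifts.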
Affiliation(s)
- Wanlei Wei
  - Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
- Hervé Hogues
  - Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
- Traian Sulea
  - Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
12
Cai Z, Liu T, Lin Q, He J, Lei X, Luo F, Huang Y. Basis for Accurate Protein pKa Prediction with Machine Learning. J Chem Inf Model 2023; 63:2936-2947. [PMID: 37146199 DOI: 10.1021/acs.jcim.3c00254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2023]
Abstract
pH regulates protein structures and the associated functions in many biological processes via protonation and deprotonation of ionizable side chains where the titration equilibria are determined by pKa's. To accelerate pH-dependent molecular mechanism research in the life sciences or industrial protein and drug designs, fast and accurate pKa prediction is crucial. Here we present a theoretical pKa data set PHMD549, which was successfully applied to four distinct machine learning methods, including DeepKa, which was proposed in our previous work. To reach a valid comparison, EXP67S was selected as the test set. Encouragingly, DeepKa was improved significantly and outperforms other state-of-the-art methods, except for the constant-pH molecular dynamics, which was utilized to create PHMD549. More importantly, DeepKa reproduced experimental pKa orders of acidic dyads in five enzyme catalytic sites. Apart from structural proteins, DeepKa was found applicable to intrinsically disordered peptides. Further, in combination with solvent exposures, it is revealed that DeepKa offers the most accurate prediction under the challenging circumstance that hydrogen bonding or salt bridge interaction is partly compensated by desolvation for a buried side chain. Finally, our benchmark data qualify PHMD549 and EXP67S as the basis for future developments of protein pKa prediction tools driven by artificial intelligence. In addition, DeepKa built on PHMD549 has been proven an efficient protein pKa predictor and thus can be applied immediately to, for example, pKa database construction, protein design, drug discovery, and so on.
Affiliation(s)
- Zhitao Cai
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Tengzi Liu
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Qiaoling Lin
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Jiahao He
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Xiaowei Lei
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Fangfang Luo
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Yandong Huang
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
13
Awoonor-Williams E, Golosov AA, Hornak V. Benchmarking In Silico Tools for Cysteine pKa Prediction. J Chem Inf Model 2023; 63:2170-2180. [PMID: 36996330 DOI: 10.1021/acs.jcim.3c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/01/2023]
Abstract
Accurate estimation of the pKa's of cysteine residues in proteins could inform targeted approaches in hit discovery. The pKa of a targetable cysteine residue in a disease-related protein is an important physiochemical parameter in covalent drug discovery, as it influences the fraction of nucleophilic thiolate amenable to chemical protein modification. Traditional structure-based in silico tools are limited in their predictive accuracy of cysteine pKa's relative to other titratable residues. Additionally, there are limited comprehensive benchmark assessments for cysteine pKa predictive tools. This raises the need for extensive assessment and evaluation of methods for cysteine pKa prediction. Here, we report the performance of several computational pKa methods, including single-structure and ensemble-based approaches, on a diverse test set of experimental cysteine pKa's retrieved from the PKAD database. The dataset consisted of 16 wildtype and 10 mutant proteins with experimentally measured cysteine pKa values. Our results highlight that these methods are varied in their overall predictive accuracies. Among the test set of wildtype proteins evaluated, the best method (MOE) yielded a mean absolute error of 2.3 pK units, highlighting the need for improvement of existing pKa methods for accurate cysteine pKa estimation. Given the limited accuracy of these methods, further development is needed before these approaches can be routinely employed to drive design decisions in early drug discovery efforts.
Affiliation(s)
- Ernest Awoonor-Williams
  - Novartis Institutes for BioMedical Research, 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Andrei A Golosov
  - Novartis Institutes for BioMedical Research, 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Viktor Hornak
  - Novartis Institutes for BioMedical Research, 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
14
|
Corredor JD, Febres-Molina C, Jaña GA, Jiménez VA. Insight into the Role of Active Site Protonation States and Water Molecules in the Catalytic Inhibition of DPP4 by Vildagliptin. J Chem Inf Model 2023; 63:1338-1350. [PMID: 36757339 DOI: 10.1021/acs.jcim.2c01558] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/10/2023]
Abstract
Vildagliptin (VIL) is an antidiabetic drug that inhibits dipeptidyl peptidase-4 (DPP4) through a covalent mechanism. The molecular bases for this inhibitory process have been addressed experimentally and computationally. Nevertheless, relevant issues remain unresolved, such as the roles of active site protonation states and conserved water molecules near the catalytic center. In this work, molecular dynamics simulations were applied to examine the structures of 12 noncovalent VIL-DPP4 complexes encompassing all possible protonation states of three noncatalytic residues (His126, Asp663, Asp709) that were inconclusively predicted by different computational tools. A catalytically competent complex structure was only achieved in the system with His126 in its ε-form and nonconventional neutral states for Asp663/Asp709. This complex suggested the involvement of one water molecule in the catalytic process of His740/Ser630 activation, which was confirmed by QM/MM simulations. Our findings support the suitability of a novel water-mediated mechanism in which His740/Ser630 activation occurs concertedly with the nucleophilic attack on VIL and the imidate protonation by Tyr547. Then, the restoration of His740/Tyr547 protonation states occurs via a two-water hydrogen bonding network in a low-barrier process, thus describing the final step of the catalytic cycle for the first time. Additionally, two hydrolytic mechanisms were proposed based on the hydrogen bonding networks formed by water molecules and the catalytic residues along the inhibitory mechanism. These findings help unveil the molecular features of the covalent inhibition of DPP4 by VIL and support the future development of novel derivatives with improved structural or mechanistic profiles.
Affiliation(s)
- Jeisson D Corredor
- Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andres Bello, República 275, Santiago 8370146, Chile
- Camilo Febres-Molina
- Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andres Bello, República 275, Santiago 8370146, Chile
- Gonzalo A Jaña
- Departamento de Ciencias Químicas, Facultad de Ciencias Exactas, Universidad Andres Bello, Autopista Concepción-Talcahuano 7100, Talcahuano 4260000, Chile
- Verónica A Jiménez
- Departamento de Ciencias Químicas, Facultad de Ciencias Exactas, Universidad Andres Bello, Autopista Concepción-Talcahuano 7100, Talcahuano 4260000, Chile
|
15
|
Reis PBPS, Bertolini M, Montanari F, Rocchia W, Machuqueiro M, Clevert DA. A Fast and Interpretable Deep Learning Approach for Accurate Electrostatics-Driven pKa Predictions in Proteins. J Chem Theory Comput 2022; 18:5068-5078. [PMID: 35837736 DOI: 10.1021/acs.jctc.2c00308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Existing computational methods for estimating pKa values in proteins rely on theoretical approximations and lengthy computations. In this work, we use a data set of 6 million theoretically determined pKa shifts to train deep learning models, which are shown to rival the physics-based predictors. These neural networks managed to infer the electrostatic contributions of different chemical groups and learned the importance of solvent exposure and close interactions, including hydrogen bonds. Although trained only using theoretical data, our pKAI+ model displayed the best accuracy in a test set of ∼750 experimental values. Inference times allow speedups of more than 1000× compared to physics-based methods. By combining speed, accuracy, and a reasonable understanding of the underlying physics, our models provide a game-changing solution for fast estimations of macroscopic pKa values from ensembles of microscopic values as well as for many downstream applications such as molecular docking and constant-pH molecular dynamics simulations.
Affiliation(s)
- Marco Bertolini
- Machine Learning Research, Bayer A.G., Berlin 13353, Germany
- Walter Rocchia
- CONCEPT Lab, Istituto Italiano di Tecnologia (IIT), Via Melen 83, B Block, Genoa 16152, Italy
- Miguel Machuqueiro
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisboa, Campo Grande, Lisboa 1749-016, Portugal
|
16
|
Gokcan H, Bedoyan JK, Isayev O. Simulations of Pathogenic E1α Variants: Allostery and Impact on Pyruvate Dehydrogenase Complex-E1 Structure and Function. J Chem Inf Model 2022; 62:3463-3475. [PMID: 35797142 DOI: 10.1021/acs.jcim.2c00630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Pyruvate dehydrogenase complex (PDC) deficiency is a major cause of primary lactic acidemia resulting in high morbidity and mortality, with limited therapeutic options. The E1 component of the mitochondrial multienzyme PDC (PDC-E1) is a symmetric dimer of heterodimers (αβ/α'β') encoded by the PDHA1 and PDHB genes, with two symmetric active sites each consisting of highly conserved phosphorylation loops A and B. PDHA1 mutations are responsible for 82-88% of cases. Greater than 85% of E1α residues with disease-causing missense mutations (DMMs) are solvent-inaccessible, with ∼30% among those involved in subunit-subunit interface contact (SSIC). We performed molecular dynamics simulations of wild-type (WT) PDC-E1 and E1 variants with E1α DMMs at R349 and W185 (residues involved in SSIC), to investigate their impact on human PDC-E1 structure. We evaluated the change in E1 structure and dynamics and examined the implications of the specific DMMs for E1 function. We found that the dynamics of phosphorylation Loop A, which is crucial for E1 biological activity, changes with DMMs that are at least about 15 Å away. Because communication is essential for PDC-E1 activity (with alternating active sites), we also investigated the possible communication network within WT PDC-E1 via centrality analysis. We observed that DMMs altered/disrupted the communication network of PDC-E1. Collectively, these results indicate an allosteric effect in PDC-E1, with implications for the development of novel small-molecule therapeutics for specific recurrent E1α DMMs such as replacements of R349 responsible for ∼10% of PDC deficiency due to E1α DMMs.
Affiliation(s)
- Hatice Gokcan
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Jirair K Bedoyan
- Division of Genetic and Genomic Medicine, UPMC Children's Hospital of Pittsburgh, Pittsburgh, Pennsylvania 15224, United States; Department of Pediatrics, University of Pittsburgh, Pittsburgh, Pennsylvania 15219, United States
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|