1
|
Bondarchuk T, Zhuravel E, Shyshlyk O, Debelyy MO, Pokholenko O, Vaskiv D, Pogribna A, Kuznietsova M, Hrynyshyn Y, Nedialko O, Brovarets V, Zozulya SA. The molecular features of non-peptidic nucleophilic substrates and acceptor proteins determine the efficiency of sortagging. RSC Chem Biol 2025; 6:295-306. [PMID: 39802631 PMCID: PMC11721432 DOI: 10.1039/d4cb00246f] [Received: 10/09/2024] [Accepted: 12/19/2024] [Indexed: 01/16/2025] Open
Abstract
Sortase A-mediated ligation (SML) or "sortagging" has become a popular technology to selectively introduce structurally diverse protein modifications. Despite the great progress in the optimization of the reaction conditions and design of miscellaneous C- or N-terminal protein modification strategies, the reported yields of conjugates are highly variable. In this study, we have systematically investigated C-terminal protein sortagging efficiency using a combination of several rationally selected and modified acceptor proteins and a panel of incoming surrogate non-peptidic amine nucleophile substrates varying in the structural features of their amino linker parts and cargo molecules. Our data suggest that the sortagging efficiency is modulated by the combination of molecular features of the incoming nucleophilic substrate, including the ionization properties of the reactive amino group, structural recognition of the nucleophilic amino linker by the enzyme, as well as the molecular nature of the attached payload moiety. Previous reports have confirmed that the steric accessibility of the C-terminal SrtA recognition site in the acceptor protein is also the critical determinant of sortase reaction efficiency. We suggest a computational procedure for simplifying a priori predictions of sortagging outcomes through the structural assessment of the acceptor protein and introduction of a peptide linker, if deemed necessary.
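The abstract above ties sortagging efficiency partly to the ionization properties of the reactive amino group. As a rough illustration only, and not the authors' procedure, the fraction of an amine present as the unprotonated (nucleophilic) free base can be estimated with the Henderson-Hasselbalch relation. The function name and the example pKa and pH values below are hypothetical.

```python
def free_base_fraction(pka: float, ph: float) -> float:
    """Fraction of an amine that is unprotonated at a given pH.

    Henderson-Hasselbalch for a base B with conjugate acid BH+:
        [B] / ([B] + [BH+]) = 1 / (1 + 10**(pKa - pH))
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # Hypothetical amines compared at an assumed reaction pH of 7.5.
    for label, pka in [("low-pKa amine", 7.5), ("high-pKa amine", 10.5)]:
        print(label, round(free_base_fraction(pka, ph=7.5), 4))
```

Under this simple model, an amine whose conjugate-acid pKa equals the buffer pH is half deprotonated, while each additional pKa unit above the pH lowers the free-base fraction by roughly a factor of ten.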
Affiliation(s)
- Tetiana Bondarchuk
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- Department of Structural and Functional Proteomics, Institute of Molecular Biology and Genetics 150 Zabolotnogo Street Kyiv 03680 Ukraine
| | - Elena Zhuravel
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
| | - Oleh Shyshlyk
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry 1 Academician Kukhar Street Kyiv 02094 Ukraine
| | - Mykhaylo O Debelyy
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
| | - Oleksandr Pokholenko
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- Taras Shevchenko National University of Kyiv, Department of Chemistry 64 Volodymyrska Street Kyiv 01033 Ukraine
| | - Diana Vaskiv
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
| | - Alla Pogribna
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- Department of Cell Signal Systems, Institute of Molecular Biology and Genetics 150 Zabolotnogo Street Kyiv 03680 Ukraine
| | - Mariana Kuznietsova
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
| | - Yevhenii Hrynyshyn
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
| | - Oleksandr Nedialko
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- V. N. Karazin Kharkiv National University, 4 Svobody Square Kharkiv 61022 Ukraine
| | - Volodymyr Brovarets
- V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry 1 Academician Kukhar Street Kyiv 02094 Ukraine
| | - Sergey A Zozulya
- Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
| |
|
2
|
DeCorte J, Brown B, Jeffrey R, Meiler J. Interpretable Deep-Learning pKa Prediction for Small Molecule Drugs via Atomic Sensitivity Analysis. J Chem Inf Model 2025; 65:101-113. [PMID: 39801290 PMCID: PMC11733947 DOI: 10.1021/acs.jcim.4c01472] [Received: 08/17/2024] [Revised: 12/05/2024] [Accepted: 12/13/2024] [Indexed: 01/18/2025]
Abstract
Machine learning (ML) models now play a crucial role in predicting properties essential to drug development, such as a drug's log-scale acid-dissociation constant (pKa). Despite recent architectural advances, these models often generalize poorly to novel compounds due to a scarcity of ground-truth data. Further, these models lack interpretability. To this end, with deliberate molecular embeddings, atomic-resolution information is accessible in chemical structures by observing the model response to atomic perturbations of an input molecule. Here, we present BCL-XpKa, a deep neural network (DNN)-based multitask classifier for pKa prediction that encodes local atomic environments through Mol2D descriptors. BCL-XpKa outputs a discrete distribution for each molecule, which stores the pKa prediction and the model's uncertainty for that molecule. BCL-XpKa generalizes well to novel small molecules. BCL-XpKa performs competitively with modern ML pKa predictors, outperforms several models in generalization tasks, and accurately models the effects of common molecular modifications on a molecule's ionizability. We then leverage BCL-XpKa's granular descriptor set and distribution-centered output through atomic sensitivity analysis (ASA), which decomposes a molecule's predicted pKa value into its respective atomic contributions without model retraining. ASA reveals that BCL-XpKa has implicitly learned high-resolution information about molecular substructures. We further demonstrate ASA's utility in structure preparation for protein-ligand docking by identifying ionization sites in 93.2% and 87.8% of complex small molecule acids and bases, respectively. We then applied ASA with BCL-XpKa to identify and optimize the physicochemical liabilities of a recently published KRAS-degrading PROTAC.
Affiliation(s)
- Joseph DeCorte
- Department of Chemical and Physical Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
- Vanderbilt Medical Scientist Training Program, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee 37232-8725, United States
| | - Benjamin Brown
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37232-8275, United States
- Center for Applied AI in Protein Dynamics, Vanderbilt University, Nashville, Tennessee 37232-8725, United States
| | - Rathmell Jeffrey
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee 37232, United States
| | - Jens Meiler
- Department of Chemical and Physical Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37232-8275, United States
- Center for Applied AI in Protein Dynamics, Vanderbilt University, Nashville, Tennessee 37232-8725, United States
- Institute for Drug Discovery, Leipzig University Medical School, Leipzig, SAC 04103, Germany
| |
|
3
|
Stellnberger SL, Harvey R, Schwingenschlögl-Maisetschläger V, Langer T, Hacker M, Vraka C, Pichler V. Investigating experimental vs. predicted pKa values for PET radiotracer. Eur J Pharm Biopharm 2024; 203:114430. [PMID: 39103001 DOI: 10.1016/j.ejpb.2024.114430] [Received: 06/05/2024] [Revised: 07/22/2024] [Accepted: 07/28/2024] [Indexed: 08/07/2024]
Abstract
The prediction of central nervous system (CNS) active pharmaceuticals and radiopharmaceuticals has experienced a boost by the introduction of computational approaches, like blood-brain barrier (BBB) score or CNS multiparameter optimization values. These rely heavily on calculated pKa values and other physicochemical parameters. Despite the inclusion of various physicochemical parameters in online data banks, pKa values are often missing and published experimental pKa values are limited, especially for radiopharmaceuticals. This comparative study investigated the discrepancies between predicted and experimental pKa values and their impact on CNS activity prediction scores. The pKa values of 46 substances, including therapeutic drugs and PET imaging radiopharmaceuticals, were measured by means of potentiometry and spectrophotometry. Experimentally obtained pKa values were compared with in silico predictions (Chemicalize/Marvin). The results demonstrate a considerable discrepancy between experimental and in silico values, with linear regression analysis showing intermediate correlation (R²(Marvin) = 0.88, R²(Chemicalize) = 0.82). This indicates that if one requires an accurate pKa value, it is essential to experimentally assess it. This underscores the importance of experimentally determining pKa values for accurate drug design and optimization. The study's data provide a valuable library of reliable experimental pKa values for therapeutic drugs and radiopharmaceuticals, aiding researchers in the field.
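The comparison described in the abstract above rests on the R² of a linear regression between experimental and predicted pKa values. A minimal sketch of that statistic follows, assuming simple least-squares regression with an intercept (in which case R² equals the squared Pearson correlation). The function name and the sample values are hypothetical and are not data from the study.

```python
def regression_r_squared(experimental, predicted):
    """R^2 of a simple linear regression of `predicted` on `experimental`.

    For one predictor with an intercept, this equals the squared
    Pearson correlation coefficient between the two series.
    """
    if len(experimental) != len(predicted) or len(experimental) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(experimental)
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, predicted))
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up pKa values, for illustration only.
    measured = [4.2, 7.9, 9.1, 10.3]
    in_silico = [4.6, 7.1, 9.4, 9.8]
    print(round(regression_r_squared(measured, in_silico), 3))
```

Note that a high R² only indicates a strong linear relationship; a systematic offset between predicted and measured values would not lower it, so error metrics are usually reported alongside.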
Affiliation(s)
- Sarah Luise Stellnberger
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Austria; Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences, University of Vienna, Austria
| | - Richard Harvey
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Technology and Biopharmaceutics, Faculty of Life Sciences, University of Vienna, Austria
| | - Verena Schwingenschlögl-Maisetschläger
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Austria; Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences, University of Vienna, Austria
| | - Thierry Langer
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Austria
| | - Marcus Hacker
- Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria
| | - Chrysoula Vraka
- Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria; Cancer Research UK Scotland Institute, Glasgow, UK
| | - Verena Pichler
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Austria.
| |
|
4
|
Luo W, Zhou G, Zhu Z, Yuan Y, Ke G, Wei Z, Gao Z, Zheng H. Bridging Machine Learning and Thermodynamics for Accurate pKa Prediction. JACS Au 2024; 4:3451-3465. [PMID: 39328749 PMCID: PMC11423309 DOI: 10.1021/jacsau.4c00271] [Received: 03/26/2024] [Revised: 07/07/2024] [Accepted: 07/10/2024] [Indexed: 09/28/2024]
Abstract
Integrating scientific principles into machine learning models to enhance their predictive performance and generalizability is a central challenge in the development of AI for Science. Herein, we introduce Uni-pKa, a novel framework that successfully incorporates thermodynamic principles into machine learning modeling, achieving high-precision predictions of acid dissociation constants (pKa), a crucial task in the rational design of drugs and catalysts, as well as a modeling challenge in computational physical chemistry for small organic molecules. Uni-pKa utilizes a comprehensive free energy model to represent molecular protonation equilibria accurately. It features a structure enumerator that reconstructs molecular configurations from pKa data, coupled with a neural network that functions as a free energy predictor, ensuring high-throughput, data-driven prediction while preserving thermodynamic consistency. Employing a pretraining-finetuning strategy with both predicted and experimental pKa data, Uni-pKa not only achieves state-of-the-art accuracy in chemoinformatics but also shows comparable precision to quantum mechanics-based methods.
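The abstract above describes predicting free energies of protonation states and deriving pKa from them. The sketch below shows one standard thermodynamic way to combine microstate free energies into a macroscopic pKa; it is an assumption-laden illustration, not the Uni-pKa implementation. The function names, the kcal/mol units, and the convention that the deprotonated free energies already include the proton term at the reference state are all assumptions made here.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)


def ensemble_free_energy(microstate_energies, temperature=298.15):
    """Free energy of a macrostate from its microstate free energies.

    G = -RT * ln(sum_i exp(-G_i / RT)), evaluated relative to the
    lowest microstate for numerical stability.
    """
    rt = R_KCAL_PER_MOL_K * temperature
    g_min = min(microstate_energies)
    partition = sum(math.exp(-(g - g_min) / rt) for g in microstate_energies)
    return g_min - rt * math.log(partition)


def macro_pka(protonated, deprotonated, temperature=298.15):
    """Macroscopic pKa from two sets of microstate free energies (kcal/mol).

    Assumes the deprotonated energies already include the free energy of
    the released proton at the reference state (hypothetical convention).
    """
    rt = R_KCAL_PER_MOL_K * temperature
    delta_g = ensemble_free_energy(deprotonated, temperature) - ensemble_free_energy(
        protonated, temperature
    )
    return delta_g / (rt * math.log(10.0))


if __name__ == "__main__":
    # Illustrative energies: one protonated form, two deprotonated tautomers.
    print(round(macro_pka([0.0], [6.8, 7.4]), 2))
```

Because every pKa is derived from one shared set of state free energies, the micro- and macro-constants obtained this way cannot contradict one another, which is the sense of thermodynamic consistency sketched here.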
Affiliation(s)
- Weiliang Luo
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- DP Technology, Beijing 100089, China
| | - Gengmo Zhou
- DP Technology, Beijing 100089, China
- Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
| | | | | | - Guolin Ke
- DP Technology, Beijing 100089, China
| | - Zhewei Wei
- Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
| | | | | |
|
5
|
Pala D, Clark DE. Caught between a ROCK and a hard place: current challenges in structure-based drug design. Drug Discov Today 2024; 29:104106. [PMID: 39029868 DOI: 10.1016/j.drudis.2024.104106] [Received: 04/11/2024] [Revised: 06/27/2024] [Accepted: 07/13/2024] [Indexed: 07/21/2024]
Abstract
The discipline of structure-based drug design (SBDD) is several decades old and it is tempting to think that the proliferation of experimental structures for many drug targets might make computer-aided drug design (CADD) straightforward. However, this is far from true. In this review, we illustrate some of the challenges that CADD scientists face every day in their work, even now. We use Rho-associated protein kinase (ROCK), and public domain structures and data, as an example to illustrate some of the challenges we have experienced during our project targeting this protein. We hope that this will help to prevent unrealistic expectations of what CADD can accomplish and to educate non-CADD scientists regarding the challenges still facing their CADD colleagues.
Affiliation(s)
- Daniele Pala
- Medicinal Chemistry and Drug Design Technologies Department, Chiesi Farmaceutici S.p.A, Research Center, Largo Belloli 11/a, 43122 Parma, Italy
| | - David E Clark
- Charles River, 6-9 Spire Green Centre, Flex Meadow, Harlow CM19 5TR, UK.
| |
|
6
|
Abarbanel OD, Hutchison GR. QupKake: Integrating Machine Learning and Quantum Chemistry for Micro-pKa Predictions. J Chem Theory Comput 2024; 20:6946-6956. [PMID: 38832803 PMCID: PMC11325546 DOI: 10.1021/acs.jctc.4c00328] [Indexed: 06/05/2024]
Abstract
Accurate prediction of micro-pKa values is crucial for understanding and modulating the acidity and basicity of organic molecules, with applications in drug discovery, materials science, and environmental chemistry. This work introduces QupKake, a novel method that combines graph neural network models with semiempirical quantum mechanical (QM) features to achieve exceptional accuracy and generalization in micro-pKa prediction. QupKake outperforms state-of-the-art models on a variety of benchmark data sets, with root-mean-square errors between 0.5 and 0.8 pKa units on five external test sets. Feature importance analysis reveals the crucial role of QM features in both the reaction site enumeration and micro-pKa prediction models. QupKake represents a significant advancement in micro-pKa prediction, offering a powerful tool for various applications in chemistry and beyond.
Affiliation(s)
- Omri D Abarbanel
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
| | - Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
| |
|
7
|
Miao R, Liu D, Mao L, Chen X, Zhang L, Yuan Z, Shi S, Li H, Li S. GR-pKa: a message-passing neural network with retention mechanism for pKa prediction. Brief Bioinform 2024; 25:bbae408. [PMID: 39171986 PMCID: PMC11339865 DOI: 10.1093/bib/bbae408] [Received: 04/24/2024] [Revised: 07/26/2024] [Accepted: 08/01/2024] [Indexed: 08/23/2024] Open
Abstract
During the drug discovery and design process, the acid-base dissociation constant (pKa) of a molecule receives particular attention because of its crucial role in influencing the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties and biological activity. However, the experimental determination of pKa values is often laborious and complex. Moreover, existing prediction methods exhibit limitations in both the quantity and quality of the training data, as well as in their capacity to handle the complex structural and physicochemical properties of compounds, consequently impeding accuracy and generalization. Therefore, developing a method that can quickly and accurately predict molecular pKa values will to some extent help the structural modification of molecules, and thus assist the development process of new drugs. In this study, we developed a pKa prediction model named GR-pKa (Graph Retention pKa), leveraging a message-passing neural network and employing a multi-fidelity learning strategy to accurately predict molecular pKa values. The GR-pKa model incorporates five quantum mechanical properties related to molecular thermodynamics and dynamics as key features to characterize molecules. Notably, we introduced a novel retention mechanism into the message-passing phase, which significantly improves the model's ability to capture and update molecular information. Our GR-pKa model outperforms several state-of-the-art models in predicting macro-pKa values, achieving a low mean absolute error of 0.490 and root mean square error of 0.588, and a high R² of 0.937 on the SAMPL7 dataset.
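The abstract above reports mean absolute error (MAE) and root mean square error (RMSE) in pKa units. A minimal sketch of how these two metrics are conventionally computed is given below; the function names and sample numbers are hypothetical and unrelated to the SAMPL7 results.

```python
import math


def mean_absolute_error(predicted, reference):
    """Average of |prediction - reference| over all compounds."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def root_mean_square_error(predicted, reference):
    """Square root of the mean squared prediction error."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty series of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Made-up pKa predictions and reference values, for illustration only.
    model_output = [4.1, 8.3, 9.9]
    experiment = [4.5, 8.0, 10.4]
    print(round(mean_absolute_error(model_output, experiment), 3))
    print(round(root_mean_square_error(model_output, experiment), 3))
```

RMSE is never smaller than MAE on the same data, and the gap between them grows when a few compounds have large errors.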
Affiliation(s)
- Runyu Miao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
| | - Danlin Liu
- Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China
- School of Computer Science and Technology, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China
| | - Liyun Mao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
| | - Xingyu Chen
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
| | - Leihao Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
| | - Zhen Yuan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
| | - Shanshan Shi
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
| | - Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
- Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China
- Lingang Laboratory, No. 319, Yueyang Road, Xuhui District, Shanghai, 200031, China
| | - Shiliang Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China
- Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China
- Department of Pain management, HuaDong Hospital affiliated to Fudan University, No. 221, West Yan'an Road, Jing'an District, Shanghai, 200040, China
| |
|
8
|
An H, Liu X, Cai W, Shao X. AttenGpKa: A Universal Predictor of Solvation Acidity Using Graph Neural Network and Molecular Topology. J Chem Inf Model 2024; 64:5480-5491. [PMID: 38982757 DOI: 10.1021/acs.jcim.4c00449] [Indexed: 07/11/2024]
Abstract
Rapid and accurate calculation of acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging due to limited experimental data. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat both the solutes and solvents equally from a perspective of molecular topology and propose a highly universal framework called AttenGpKa for predicting solvation acidity. AttenGpKa is trained using 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, encodings of both solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. The interpretability studies further indicate that our model has effectively learnt electronic and solvent effects. A free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
Affiliation(s)
- Hongle An
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
|
9
|
An H, Liu X, Cai W, Shao X. Explainable Graph Neural Networks with Data Augmentation for Predicting pKa of C-H Acids. J Chem Inf Model 2024; 64:2383-2392. [PMID: 37706462 DOI: 10.1021/acs.jcim.3c00958] [Indexed: 09/15/2023]
Abstract
The pKa of C-H acids is an important parameter in the fields of organic synthesis, drug discovery, and materials science. However, the prediction of pKa is still a great challenge due to the limit of experimental data and the lack of chemical insight. Here, a new model for predicting the pKa values of C-H acids is proposed on the basis of graph neural networks (GNNs) and data augmentation. A message passing unit (MPU) was used to extract the topological and target-related information from the molecular graph data, and a readout layer was utilized to retrieve the information on the ionization site C atom. The retrieved information then was adopted to predict pKa by a fully connected network. Furthermore, to increase the diversity of the training data, a knowledge-infused data augmentation technique was established by replacing the H atoms in a molecule with substituents exhibiting different electronic effects. The MPU was pretrained with the augmented data. The efficacy of data augmentation was confirmed by visualizing the distribution of compounds with different substituents and by classifying compounds. The explainability of the model was studied by examining the change of pKa values when a specific atom was masked. This explainability was used to identify the key substituents for pKa. The model was evaluated on two data sets from the iBonD database. Dataset1 includes the experimental pKa values of C-H acids measured in DMSO, while dataset2 comprises the pKa values measured in water. The results show that the knowledge-infused data augmentation technique greatly improves the predictive accuracy of the model, especially when the number of samples is small.
Affiliation(s)
- Hongle An
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
|
10
|
Kołodziejczyk A, Wróblewska A, Pietrzak M, Pyrcz P, Błaziak K, Szmigielski R. Dissociation constants of relevant secondary organic aerosol components in the atmosphere. Chemosphere 2024; 351:141166. [PMID: 38224752 DOI: 10.1016/j.chemosphere.2024.141166] [Received: 09/07/2023] [Revised: 01/07/2024] [Accepted: 01/08/2024] [Indexed: 01/17/2024]
Abstract
The presented studies focus on determining the acidity constants (pKa) of relevant secondary organic aerosol components. For our research, we selected important oxidation products (mainly carboxylic acids) of the most abundant terpene compounds, such as α-pinene, β-pinene, β-caryophyllene, and δ-3-carene. The research covered the synthesis of the selected compounds and the determination of their acidity constants. We used three methods to measure the acidity constant: ¹H NMR titration, pH-metric titration, and the Bates-Schwarzenbach spectrophotometric method. Moreover, the pKa values were calculated with Marvin 21.17.0 software to compare the experimentally derived values with those calculated from the chemical structure. pKa values measured with ¹H NMR titration ranged from 3.51 ± 0.01 for terebic acid to 5.18 ± 0.06 for β-norcaryophyllonic acid. Moreover, the data determined by the ¹H NMR method revealed a good correlation with the data obtained with the commonly used potentiometric and UV-spectroscopic methods (R² = 0.92). In contrast, the comparison with in silico results exhibits a relatively low correlation (R²(Marvin) = 0.66). We found that most of the values calculated with the Marvin program are lower than the experimental values obtained with pH-metric titration, with an average difference of 0.44 pKa units. For di- and tricarboxylic acids, we obtained two and three pKa values, respectively. A good correlation with the literature values was observed; for example, Howell and Fisher (1958) used pH-metric titration and measured pKa1 and pKa2 to be 4.48 and 5.48, while our results are 4.24 ± 0.10 and 5.40 ± 0.02, respectively.
Affiliation(s)
- Agata Kołodziejczyk
- Institute of Physical Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland.
| | - Aleksandra Wróblewska
- Institute of Physical Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
| | - Mariusz Pietrzak
- Institute of Physical Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
| | - Patryk Pyrcz
- Institute of Physical Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
| | - Kacper Błaziak
- Faculty of Chemistry, University of Warsaw, ul. Pasteura 1, 01-224, Warsaw, Poland; Biological and Chemical Research Center, University of Warsaw, ul. Żwirki i Wigury 101, 01-224, Warsaw, Poland
| | - Rafał Szmigielski
- Institute of Physical Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
| |
|
11
|
Sanchez AJ, Maier S, Raghavachari K. Leveraging DFT and Molecular Fragmentation for Chemically Accurate pKa Prediction Using Machine Learning. J Chem Inf Model 2024; 64:712-723. [PMID: 38301279 DOI: 10.1021/acs.jcim.3c01923] [Indexed: 02/03/2024]
Abstract
We present a quantum mechanical/machine learning (ML) framework based on random forest to accurately predict the pKas of complex organic molecules using inexpensive density functional theory (DFT) calculations. By including physics-based features from low-level DFT calculations and structural features from our connectivity-based hierarchy (CBH) fragmentation protocol, we can correct the systematic error associated with DFT. The generalizability and performance of our model are evaluated on two benchmark sets (SAMPL6 and Novartis). We believe the carefully curated input of physics-based features lessens the model's data dependence and need for complex deep learning architectures, without compromising the accuracy of the test sets. As a point of novelty, our work extends the applicability of CBH, employing it for the generation of viable molecular descriptors for ML.
Affiliation(s)
- Alec J Sanchez
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| | - Sarah Maier
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| | - Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| |
|
12
|
An Accurate Approach for Computational pKa Determination of Phenolic Compounds. Molecules (Basel, Switzerland) 2022; 27:molecules27238590. [PMID: 36500683 PMCID: PMC9736058 DOI: 10.3390/molecules27238590] [Received: 11/15/2022] [Revised: 11/30/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022]
Abstract
Computational chemistry is a valuable tool, as it allows for in silico prediction of key parameters of novel compounds, such as pKa. In the framework of computational pKa determination, the literature offers several approaches based on different levels of theory, functionals and continuum solvation models. However, correction factors are often used to provide reliable models that adequately predict pKa. In this work, an accurate protocol based on a direct approach is proposed for computing the pKa of phenols. Importantly, this methodology does not require the use of correction factors or mathematical fitting, making it highly practical, easy to use and fast. Above all, DFT calculations performed in the presence of two explicit water molecules using the CAM-B3LYP functional with the 6-311G+dp basis set and a solvation model based on density (SMD) led to accurate pKa values. In particular, calculations performed on a series of 13 differently substituted phenols provided reliable results, with a mean absolute error of 0.3. Furthermore, the model achieves accurate results with -CN and -NO2 substituents, which are usually excluded from computational pKa studies, enabling easy and reliable pKa determination in a wide range of phenols.
|