1
An H, Liu X, Cai W, Shao X. AttenGpKa: A Universal Predictor of Solvation Acidity Using Graph Neural Network and Molecular Topology. J Chem Inf Model 2024; 64:5480-5491. [PMID: 38982757 DOI: 10.1021/acs.jcim.4c00449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]
Abstract
Rapid and accurate calculation of acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging due to limited experimental data. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat both the solutes and solvents equally from a perspective of molecular topology and propose a highly universal framework called AttenGpKa for predicting solvation acidity. AttenGpKa is trained using 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, encodings of both solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. The interpretability studies further indicate that our model has effectively learnt electronic and solvent effects. A free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
Affiliation(s)
- Hongle An
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
2
Abarbanel OD, Hutchison GR. QupKake: Integrating Machine Learning and Quantum Chemistry for Micro-pKa Predictions. J Chem Theory Comput 2024. [PMID: 38832803 DOI: 10.1021/acs.jctc.4c00328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2024]
Abstract
Accurate prediction of micro-pKa values is crucial for understanding and modulating the acidity and basicity of organic molecules, with applications in drug discovery, materials science, and environmental chemistry. This work introduces QupKake, a novel method that combines graph neural network models with semiempirical quantum mechanical (QM) features to achieve exceptional accuracy and generalization in micro-pKa prediction. QupKake outperforms state-of-the-art models on a variety of benchmark data sets, with root-mean-square errors between 0.5 and 0.8 pKa units on five external test sets. Feature importance analysis reveals the crucial role of QM features in both the reaction site enumeration and micro-pKa prediction models. QupKake represents a significant advancement in micro-pKa prediction, offering a powerful tool for various applications in chemistry and beyond.
Affiliation(s)
- Omri D Abarbanel
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
3
Csizi KS, Reiher M. Automated preparation of nanoscopic structures: Graph-based sequence analysis, mismatch detection, and pH-consistent protonation with uncertainty estimates. J Comput Chem 2024; 45:761-776. [PMID: 38124290 DOI: 10.1002/jcc.27276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 11/14/2023] [Indexed: 12/23/2023]
Abstract
Structure and function in nanoscale atomistic assemblies are tightly coupled, and every atom with its specific position and even every electron will have a decisive effect on the electronic structure, and hence, on the molecular properties. Molecular simulations of nanoscopic atomistic structures therefore require accurately resolved three-dimensional input structures. If extracted from experiment, these structures often suffer from severe uncertainties, of which the lack of information on hydrogen atoms is a prominent example. Hence, experimental structures require careful review and curation, which is a time-consuming and error-prone process. Here, we present a fast and robust protocol for the automated structure analysis and pH-consistent protonation, in short, ASAP. For biomolecules as a target, the ASAP protocol integrates sequence analysis and error assessment of a given input structure. ASAP allows for pKa prediction from reference data through Gaussian process regression including uncertainty estimation and connects to system-focused atomistic modeling described in Brunken and Reiher (J. Chem. Theory Comput. 16, 2020, 1646). Although focused on biomolecules, ASAP can be extended to other nanoscopic objects, because most of its design elements rely on a general graph-based foundation guaranteeing transferability. The modular character of the underlying pipeline supports different degrees of automation, which allows for (i) efficient feedback loops for human-machine interaction with a low entrance barrier and for (ii) integration into autonomous procedures such as automated force field parametrizations. This facilitates fast switching of the pH-state through on-the-fly system-focused reparametrization during a molecular simulation at virtually no extra computational cost.
Affiliation(s)
- Katja-Sophia Csizi
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Markus Reiher
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
4
Sanchez AJ, Maier S, Raghavachari K. Leveraging DFT and Molecular Fragmentation for Chemically Accurate pKa Prediction Using Machine Learning. J Chem Inf Model 2024; 64:712-723. [PMID: 38301279 DOI: 10.1021/acs.jcim.3c01923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
We present a quantum mechanical/machine learning (ML) framework based on random forest to accurately predict the pKas of complex organic molecules using inexpensive density functional theory (DFT) calculations. By including physics-based features from low-level DFT calculations and structural features from our connectivity-based hierarchy (CBH) fragmentation protocol, we can correct the systematic error associated with DFT. The generalizability and performance of our model are evaluated on two benchmark sets (SAMPL6 and Novartis). We believe the carefully curated input of physics-based features lessens the model's data dependence and need for complex deep learning architectures, without compromising the accuracy of the test sets. As a point of novelty, our work extends the applicability of CBH, employing it for the generation of viable molecular descriptors for ML.
Affiliation(s)
- Alec J Sanchez
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sarah Maier
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
5
Ota R, Yamashita F. Application of machine learning techniques to the analysis and prediction of drug pharmacokinetics. J Control Release 2022; 352:961-969. [PMID: 36370876 DOI: 10.1016/j.jconrel.2022.11.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/19/2022] [Revised: 10/23/2022] [Accepted: 11/07/2022] [Indexed: 11/17/2022]
Abstract
In this review, we describe the current status and challenges in applying machine-learning techniques to the analysis and prediction of pharmacokinetic data. The theory of pharmacokinetics has been developed over decades on the basis of physiology and reaction kinetics. Mathematical models allow the reduction of pharmacokinetic data to parameter values, giving insight and understanding into ADME processes and predicting the outcome of different dosing scenarios. However, much information hidden in the data is lost through conceptual simplification with models. It is difficult to use mechanistic models alone to predict diverse pharmacokinetic time profiles, including inter-drug and inter-individual differences, in a cross-sectional manner. Machine learning is a prediction platform that can handle complex phenomena through data-driven analysis. As a result, machine learning has been successfully adopted in various fields, including image recognition and language processing, and has been used for over two decades in pharmacokinetic research, primarily in the area of quantitative structure-activity relationships for pharmacokinetic parameters. Machine-learning models are generally known to provide better predictive performance than conventional linear models. Owing to the recent success in deep learning, models with new structures are being consistently proposed. These models include transfer learning and generative adversarial networks, which contribute to the effective use of a limited amount of data by diverting existing similar models or generating pseudo-data. How to make such newly emerging machine learning technologies applicable to meet challenges in the pharmacokinetics/pharmacodynamics field is now the key issue.
Affiliation(s)
- Ryosaku Ota
- Department of Drug Delivery Research, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Fumiyoshi Yamashita
- Department of Drug Delivery Research, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan; Department of Applied Pharmacy and Pharmacokinetics, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan.
6
Wu J, Wan Y, Wu Z, Zhang S, Cao D, Hsieh CY, Hou T. MF-SuP-pKa: Multi-fidelity modeling with subgraph pooling mechanism for pKa prediction. Acta Pharm Sin B 2022. [DOI: 10.1016/j.apsb.2022.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
7
Wu J, Kang Y, Pan P, Hou T. Machine learning methods for pKa prediction of small molecules: Advances and challenges. Drug Discov Today 2022; 27:103372. [PMID: 36167281 DOI: 10.1016/j.drudis.2022.103372] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/20/2022] [Revised: 08/15/2022] [Accepted: 09/21/2022] [Indexed: 11/27/2022]
Abstract
The acid-base dissociation constant (pKa) is a fundamental property influencing many ADMET properties of small molecules. However, rapid and accurate pKa prediction remains a great challenge. In this review, we outline the current advances in machine-learning-based QSAR models for pKa prediction, including descriptor-based and graph-based approaches, and summarize their pros and cons. Moreover, we highlight the current challenges and future directions regarding experimental data, crucial factors influencing pKa and in silico prediction tools. We hope that this review can provide a practical guidance for the follow-up studies.
Affiliation(s)
- Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.
8
Desantis J, Mammoli A, Eleuteri M, Coletti A, Croci F, Macchiarulo A, Goracci L. PROTACs bearing piperazine-containing linkers: what effect on their protonation state? RSC Adv 2022; 12:21968-21977. [PMID: 36043064 PMCID: PMC9361468 DOI: 10.1039/d2ra03761k] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/18/2022] [Accepted: 07/20/2022] [Indexed: 11/21/2022] Open
Abstract
Proteolysis targeting chimeras (PROTACs) represent an emerging class of compounds for innovative therapeutic application. Their bifunctional nature induces the formation of a ternary complex (target protein/PROTAC/E3 ligase) which allows target protein ubiquitination and subsequent proteasomal-dependent degradation. To date, despite great efforts being made to improve their biological efficacy, the rational design of PROTACs still represents a challenging task, above all for the modulation of their physicochemical and pharmacokinetic properties. Considering the pivotal role played by the linker moiety, recently the insertion of a piperazine moiety into the PROTAC linker has been widely used, as this ring can in principle improve rigidity and increase solubility upon protonation. Nevertheless, the pKa of the piperazine ring is significantly affected by the chemical groups located nearby, and slight modifications in the linker could eliminate the desired effect. In the present study, the pKa values of a dataset of synthesized small molecule compounds including PROTACs and their precursors have been evaluated in order to highlight how a fine modulation of piperazine-containing linkers can impact the protonation state of these molecules or similar heterobifunctional ones. Finally, the possibility of predicting the trend through in silico approaches was also evaluated.
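The link between a basic site's pKa and its protonation state, which the abstract above relies on, can be illustrated with the Henderson-Hasselbalch relation. This is a minimal sketch: the function name and the pKa values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a basic site present as its conjugate acid (BH+) at a given pH.

    Follows from Henderson-Hasselbalch: [B]/[BH+] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical pKa values for a basic nitrogen (illustration only).
for pka in (5.0, 7.4, 9.0):
    percent = 100.0 * fraction_protonated(pka, ph=7.4)
    print(f"pKa {pka:.1f}: {percent:.1f}% protonated at pH 7.4")
```

Under this relation, a drop of a few pKa units caused by a neighbouring group is enough to move a site from mostly protonated to mostly neutral at physiological pH.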
Affiliation(s)
- Jenny Desantis
- Department of Chemistry, Biology, and Biotechnology, University of Perugia Via Elce di Sotto 8 06123 Perugia Italy
- Andrea Mammoli
- Department of Pharmaceutical Sciences, University of Perugia Via del Liceo 1 06123 Perugia Italy
- Michela Eleuteri
- Department of Chemistry, Biology, and Biotechnology, University of Perugia Via Elce di Sotto 8 06123 Perugia Italy
- Alice Coletti
- Department of Pharmaceutical Sciences, University of Perugia Via del Liceo 1 06123 Perugia Italy
- Federico Croci
- Department of Chemistry, Biology, and Biotechnology, University of Perugia Via Elce di Sotto 8 06123 Perugia Italy
- Antonio Macchiarulo
- Department of Pharmaceutical Sciences, University of Perugia Via del Liceo 1 06123 Perugia Italy
- Laura Goracci
- Department of Chemistry, Biology, and Biotechnology, University of Perugia Via Elce di Sotto 8 06123 Perugia Italy
9
Zulueta B, Tulyani SV, Westmoreland PR, Frisch MJ, Petersson EJ, Petersson GA, Keith JA. A Bond-Energy/Bond-Order and Populations Relationship. J Chem Theory Comput 2022; 18:4774-4794. [PMID: 35849729 DOI: 10.1021/acs.jctc.2c00334] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
We report an analytical bond energy from bond orders and populations (BEBOP) model that provides intramolecular bond energy decompositions for chemical insight into the thermochemistry of molecules. The implementation reported here employs a minimum basis set Mulliken population analysis on well-conditioned Hartree-Fock orbitals to decompose total electronic energies into physically interpretable contributions. The model's parametrization scheme is based on atom-specific parameters for hybridization and atom pair-specific parameters for short-range repulsion and extended Hückel-type bond energy term fitted to reproduce CBS-QB3 thermochemistry data. The current implementation is suitable for molecules involving H, Li, Be, B, C, N, O, and F atoms, and it can be used to analyze intramolecular bond energies of molecular structures at optimized stationary points found from other computational methods. This first-generation model brings the computational cost of a Hartree-Fock calculation using a large triple-ζ basis set, and its atomization energies are comparable to those from widely used hybrid Kohn-Sham density functional theory (DFT, as benchmarked to 109 species from the G2/97 test set and an additional 83 reference species). This model should be useful for the community by interpreting overall ab initio molecular energies in terms of physically insightful bond energy contributions, e.g., bond dissociation energies, resonance energies, molecular strain energies, and qualitative energetic contributions to the activation barrier in chemical reaction mechanisms. This work reports a critical benchmarking of this method as well as discussions of its strengths and weaknesses compared to hybrid DFT (i.e., B3LYP, M062X, PBE0, and APF methods), and other cost-effective approximate Hamiltonian semiempirical quantum methods (i.e., AM1, PM6, PM7, and DFTB3).
Affiliation(s)
- Barbaro Zulueta
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Sonia V Tulyani
- Formerly Chemical Engineering Department, University of Massachusetts Amherst, 618 North Pleasant Street, Amherst, Massachusetts 01003, United States
- Phillip R Westmoreland
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States
- E James Petersson
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- George A Petersson
- Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122, United States; Formerly Hall-Atwater Laboratories of Chemistry, Wesleyan University, Middletown, Connecticut 06459, United States
- John A Keith
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
10
Holt RA, Seybold PG. Computational Estimation of the Acidities of Pyrimidines and Related Compounds. Molecules 2022; 27:385. [PMID: 35056699 PMCID: PMC8782049 DOI: 10.3390/molecules27020385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 12/20/2021] [Accepted: 01/01/2022] [Indexed: 12/02/2022] Open
Abstract
Pyrimidines are key components in the genetic code of living organisms and the pyrimidine scaffold is also found in many bioactive and medicinal compounds. The acidities of these compounds, as represented by their pKas, are of special interest since they determine the species that will prevail under different pH conditions. Here, a quantum chemical quantitative structure-activity relationship (QSAR) approach was employed to estimate these acidities. Density-functional theory calculations at the B3LYP/6-31+G(d,p) level and the SM8 aqueous solvent model were employed, and the energy difference ΔE_H2O between the parent compound and its dissociation product was used as a variation parameter. Excellent estimates for both the cation → neutral (pKa1, R² = 0.965) and neutral → anion (pKa2, R² = 0.962) dissociations were obtained. A commercial package from Advanced Chemical Design also yielded excellent results for these acidities.
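The kind of single-descriptor QSAR described in this abstract, in which pKa is regressed on a computed energy difference and judged by the coefficient of determination, can be sketched as an ordinary least-squares fit. The data below are invented for illustration (arbitrary energy units) and the function names are hypothetical; nothing here reproduces the paper's values or workflow.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line on (x, y)."""
    my = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical descriptor values (energy differences, arbitrary units)
# and hypothetical experimental pKa values.
delta_e = [280.0, 285.0, 290.0, 295.0, 300.0]
pka_exp = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(delta_e, pka_exp)
print(f"pKa ~ {slope:.3f} * dE + ({intercept:.2f})")
print(f"R^2 = {r_squared(delta_e, pka_exp, slope, intercept):.3f}")
```

A fitted line of this form can then estimate the pKa of a new compound from its computed energy difference, provided the compound lies within the range covered by the fitting set.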
Affiliation(s)
- Paul G. Seybold
- Department of Chemistry, Wright State University, Dayton, OH 45435, USA
11
Tse EG, Aithani L, Anderson M, Cardoso-Silva J, Cincilla G, Conduit GJ, Galushka M, Guan D, Hallyburton I, Irwin BWJ, Kirk K, Lehane AM, Lindblom JCR, Lui R, Matthews S, McCulloch J, Motion A, Ng HL, Öeren M, Robertson MN, Spadavecchio V, Tatsis VA, van Hoorn WP, Wade AD, Whitehead TM, Willis P, Todd MH. An Open Drug Discovery Competition: Experimental Validation of Predictive Models in a Series of Novel Antimalarials. J Med Chem 2021; 64:16450-16463. [PMID: 34748707 DOI: 10.1021/acs.jmedchem.1c00313] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/20/2023]
Abstract
The Open Source Malaria (OSM) consortium is developing compounds that kill the human malaria parasite, Plasmodium falciparum, by targeting PfATP4, an essential ion pump on the parasite surface. The structure of PfATP4 has not been determined. Here, we describe a public competition created to develop a predictive model for the identification of PfATP4 inhibitors, thereby reducing project costs associated with the synthesis of inactive compounds. Competition participants could see all entries as they were submitted. In the final round, featuring private sector entrants specializing in machine learning methods, the best-performing models were used to predict novel inhibitors, of which several were synthesized and evaluated against the parasite. Half possessed biological activity, with one featuring a motif that the human chemists familiar with this series would have dismissed as "ill-advised". Since all data and participant interactions remain in the public domain, this research project "lives" and may be improved by others.
Affiliation(s)
- Edwin G Tse
- School of Pharmacy, University College London, London WC1N 1AX, U.K
- Laksh Aithani
- Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Mark Anderson
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee DD1 5EH, U.K
- Jonathan Cardoso-Silva
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, London WC2B 4BG, U.K
- Gareth J Conduit
- Intellegens Ltd., Eagle Labs, Chesterton Road, Cambridge CB4 3AZ, U.K.; Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K
- Davy Guan
- School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Irene Hallyburton
- Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee DD1 5EH, U.K
- Benedict W J Irwin
- Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K.; Optibrium Ltd. Blenheim House, Denny End Road, Cambridge CB25 9QE, U.K
- Kiaran Kirk
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Adele M Lehane
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Julia C R Lindblom
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Raymond Lui
- School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Slade Matthews
- School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- James McCulloch
- Kellerberrin, 6 Wharf Rd, Balmain, Sydney, NSW 2041, Australia
- Alice Motion
- School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
- Ho Leung Ng
- Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas 66506, United States
- Mario Öeren
- Optibrium Ltd. Blenheim House, Denny End Road, Cambridge CB25 9QE, U.K
- Murray N Robertson
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, U.K
- Vasileios A Tatsis
- Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Willem P van Hoorn
- Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Alexander D Wade
- Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K
- Paul Willis
- Medicines for Malaria Venture, PO Box 1826, 20 rte de Pre-Bois, 1215 Geneva 15, Switzerland
- Matthew H Todd
- School of Pharmacy, University College London, London WC1N 1AX, U.K
12
Imputation of sensory properties using deep learning. J Comput Aided Mol Des 2021; 35:1125-1140. [PMID: 34716833 DOI: 10.1007/s10822-021-00424-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/11/2021] [Accepted: 10/15/2021] [Indexed: 10/19/2022]
Abstract
Predicting the sensory properties of compounds is challenging due to the subjective nature of the experimental measurements. This testing relies on a panel of human participants and is therefore also expensive and time-consuming. We describe the application of a state-of-the-art deep learning method, Alchemite™, to the imputation of sparse physicochemical and sensory data and compare the results with conventional quantitative structure-activity relationship methods and a multi-target graph convolutional neural network. The imputation model achieved a substantially higher accuracy of prediction, with improvements in R2 between 0.26 and 0.45 over the next best method for each sensory property. We also demonstrate that robust uncertainty estimates generated by the imputation model enable the most accurate predictions to be identified and that imputation also more accurately predicts activity cliffs, where small changes in compound structure result in large changes in sensory properties. In combination, these results demonstrate that the use of imputation, based on data from less expensive, early experiments, enables better selection of compounds for more costly studies, saving experimental time and resources.
13
Racioppi S, Rahm M. In-Situ Electronegativity and the Bridging of Chemical Bonding Concepts. Chemistry 2021; 27:18156-18167. [PMID: 34668618 PMCID: PMC9299076 DOI: 10.1002/chem.202103477] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/23/2021] [Indexed: 12/30/2022]
Abstract
One challenge in chemistry is the plethora of often disparate models for rationalizing the electronic structure of molecules. Chemical concepts abound, but their connections are often frail. This work describes a quantum‐mechanical framework that enables a combination of ideas from three approaches common for the analysis of chemical bonds: energy decomposition analysis (EDA), quantum chemical topology, and molecular orbital (MO) theory. The glue to our theory is the electron energy density, interpretable as one part electrons and one part electronegativity. We present a three‐dimensional analysis of the electron energy density and use it to redefine what constitutes an atom in a molecule. Definitions of atomic partial charge and electronegativity follow in a way that connects these concepts to the total energy of a molecule. The formation of polar bonds is predicted to cause inversion of electronegativity, and a new perspective of bonding in diborane and guanine−cytosine base‐pairing is presented. The electronegativity of atoms inside molecules is shown to be predictive of pKa.
Affiliation(s)
- Stefano Racioppi
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, Kemigården 4, 41258, Gothenburg, Sweden
- Martin Rahm
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, Kemigården 4, 41258, Gothenburg, Sweden
14
Xiong J, Li Z, Wang G, Fu Z, Zhong F, Xu T, Liu X, Huang Z, Liu X, Chen K, Jiang H, Zheng M. Multi-instance learning of graph neural networks for aqueous pKa prediction. Bioinformatics 2021; 38:792-798. [PMID: 34643666 PMCID: PMC8756178 DOI: 10.1093/bioinformatics/btab714] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/11/2021] [Revised: 09/26/2021] [Accepted: 10/15/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: The acid dissociation constant (pKa) is a critical parameter to reflect the ionization ability of chemical compounds and is widely applied in a variety of industries. However, the experimental determination of pKa is intricate and time-consuming, especially for the exact determination of micro-pKa information at the atomic level. Hence, a fast and accurate prediction of pKa values of chemical compounds is of broad interest. RESULTS: Here, we compiled a large-scale pKa dataset containing 16 595 compounds with 17 489 pKa values. Based on this dataset, a novel pKa prediction model, named Graph-pKa, was established using graph neural networks. Graph-pKa performed well on the prediction of macro-pKa values, with a mean absolute error around 0.55 and a coefficient of determination around 0.92 on the test dataset. Furthermore, combining multi-instance learning, Graph-pKa was also able to automatically deconvolute the predicted macro-pKa into discrete micro-pKa values. AVAILABILITY AND IMPLEMENTATION: The Graph-pKa model is now freely accessible via a web-based interface (https://pka.simm.ac.cn/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
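The macro/micro distinction in this abstract can be made concrete with the textbook relation for a species that can lose one proton from any of several sites: the macroscopic dissociation constant is the sum of the microscopic ones. The sketch below shows only that forward relation under this assumption; it is not the multi-instance learning procedure used by Graph-pKa, and the function name and numbers are hypothetical.

```python
import math


def macro_pka_from_micro(micro_pkas):
    """Macroscopic pKa for loss of one proton from a single protonation state.

    Assumes each micro-pKa describes deprotonation at a different site of the
    same starting species, so that K_macro = sum of the K_micro values.
    """
    k_macro = sum(10.0 ** (-pka) for pka in micro_pkas)
    return -math.log10(k_macro)


# Hypothetical micro-pKa values for two competing acidic sites.
micro = [4.2, 5.0]
print(f"macro-pKa = {macro_pka_from_micro(micro):.2f}")
```

Because the constants add, the macro-pKa can never exceed the lowest micro-pKa, and two equivalent sites lower it by log10(2), about 0.30 units.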
Affiliation(s)
- Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; College of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Zhaojun Li
- Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City 215000, China
- Guangchao Wang
- College of Computer and Information Engineering, Dezhou University, Dezhou City 253023, China
- Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; College of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Tingyang Xu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
- Xiaomeng Liu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; College of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Ziming Huang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; College of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaohong Liu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City 215000, China; Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China
- Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; College of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
15
Pan X, Wang H, Li C, Zhang JZH, Ji C. MolGpka: A Web Server for Small Molecule pKa Prediction Using a Graph-Convolutional Neural Network. J Chem Inf Model 2021; 61:3159-3165. [PMID: 34251213 DOI: 10.1021/acs.jcim.1c00075] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Indexed: 12/15/2022]
Abstract
pKa is an important property in the lead optimization process since the charge state of a molecule at physiological pH plays a critical role in its biological activity, solubility, membrane permeability, metabolism, and toxicity. Accurate and fast estimation of small molecule pKa is vital during the drug discovery process. We present MolGpKa, a web server for pKa prediction using a graph-convolutional neural network model. The model works by learning pKa-related chemical patterns automatically and building reliable predictors with learned features. ACD/pKa data for 1.6 million compounds from the ChEMBL database were used for model training. We found that the performance of the model is better than that of machine learning models built with human-engineered fingerprints. Detailed analysis shows that the substitution effect on pKa is well learned by the model. MolGpKa is a handy tool for the rapid estimation of pKa during the ligand design process. The MolGpKa server is freely available to researchers and can be accessed at https://xundrug.cn/molgpka.
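The dependence of charge state on pKa at physiological pH can be made concrete with the Henderson-Hasselbalch relation. The sketch below is a generic illustration and not part of MolGpKa. The function names and the default pH of 7.4 are assumptions chosen for the example, and the formula treats a single ionizable group in isolation.

```python
def fraction_deprotonated(pka, ph=7.4):
    """Fraction of an acid HA present as A- at the given pH.

    Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_protonated(pka, ph=7.4):
    """Fraction of a base B present as BH+, where pka is that of the conjugate acid."""
    return 1.0 - fraction_deprotonated(pka, ph)


# A carboxylic acid with pKa 4.4 is almost fully ionized at pH 7.4,
# while an amine whose conjugate acid has pKa 9.4 is almost fully protonated.
acid_ionized = fraction_deprotonated(4.4)
amine_charged = fraction_protonated(9.4)
```

Because the fraction changes by roughly a factor of ten per pKa unit away from the pH, an error of one unit in a predicted pKa near 7.4 can flip the dominant charge state.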
Affiliation(s)
- Xiaolin Pan
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Hao Wang
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Cuiyu Li
- Advanced Computing East China Sub-center, Suma Technology Co., Ltd., Kunshan 215300, China
- John Z H Zhang
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China; Department of Chemistry, New York University, New York, New York 10003, United States; Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
- Changge Ji
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
16
Pracht P, Grimme S. Efficient Quantum-Chemical Calculations of Acid Dissociation Constants from Free-Energy Relationships. J Phys Chem A 2021; 125:5681-5692. [PMID: 34142841 DOI: 10.1021/acs.jpca.1c03463] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
Abstract
The calculation of acid dissociation constants (pKa) is an important task in computational chemistry and chemoinformatics. Theoretically and with minimal empiricism, this is possible from computed acid dissociation free energies via so-called linear free-energy relationships. In this study some modifications are introduced to the latter, providing a straightforward, broadly applicable protocol with an adjustable degree of sophistication for quantum chemistry-based calculations of pKa in water. It targets a wide pKa range (∼70 units) and medium-sized, flexible molecules. Herein, a focus is set on the recently published r2SCAN-3c and related efficient composite density functionals and the semiempirical GFN2-xTB method, including a newly introduced energy correction for heterolytic dissociation, both in combination with implicit solvation models. The performance is evaluated in comparison with experimental data, showing mean errors often smaller than a targeted 1 pKa unit accuracy. Larger deviations are observed only upon inclusion of challenging highly negative (<-5) or positive (>15) pKa values. Among all those tested, it is found that B97-3c is the best performing functional, although rather independently of the density functional theory (DFT) method used; low root-mean-square errors of 0.8-1.0 pKa units for typical drugs are obtained. For optimal performance, it is recommended to employ DFT functional specific free-energy relationship parameters. Additionally, a significant conformational dependence of the pKa values is revealed and quantified for some nonrigid drug molecules.
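A linear free-energy relationship of the kind the abstract refers to can be sketched as a straight-line fit between computed dissociation free energies and experimental pKa values. The code below uses the plain linear form pKa = slope * ΔG / (RT ln 10) + intercept fitted by ordinary least squares. It is not the modified relationship or the parameters from the paper. The function names, the kcal/mol unit for ΔG, and the 298.15 K default are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def scaled_free_energy(delta_g_kcal, temperature=298.15):
    """Return delta_G / (RT ln 10), the dimensionless quantity that enters the fit."""
    return delta_g_kcal / (R_KCAL * temperature * math.log(10.0))


def fit_lfer(delta_g_kcal, exp_pka, temperature=298.15):
    """Least-squares fit of pKa = slope * dG/(RT ln 10) + intercept."""
    if len(delta_g_kcal) != len(exp_pka) or len(exp_pka) < 2:
        raise ValueError("need two or more matched (dG, pKa) pairs")
    x = [scaled_free_energy(dg, temperature) for dg in delta_g_kcal]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(exp_pka) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("all free energies are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, exp_pka))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_pka(delta_g_kcal, slope, intercept, temperature=298.15):
    """Apply fitted parameters to a new computed dissociation free energy."""
    return slope * scaled_free_energy(delta_g_kcal, temperature) + intercept
```

Fitting `slope` and `intercept` separately for each electronic-structure method mirrors the abstract's recommendation to use functional-specific free-energy relationship parameters.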
Affiliation(s)
- Philipp Pracht
- Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Stefan Grimme
- Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
17
Işık M, Rustenburg AS, Rizzi A, Gunner MR, Mobley DL, Chodera JD. Overview of the SAMPL6 pKa challenge: evaluating small molecule microscopic and macroscopic pKa predictions. J Comput Aided Mol Des 2021; 35:131-166. [PMID: 33394238 PMCID: PMC7904668 DOI: 10.1007/s10822-020-00362-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/16/2020] [Accepted: 11/17/2020] [Indexed: 01/01/2023]
Abstract
The prediction of acid dissociation constants (pKa) is a prerequisite for predicting many other properties of a small molecule, such as its protein-ligand binding affinity, distribution coefficient (log D), membrane permeability, and solubility. The prediction of each of these properties requires knowledge of the relevant protonation states and solution free energy penalties of each state. The SAMPL6 pKa Challenge was the first time that a separate challenge was conducted for evaluating pKa predictions as part of the Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) exercises. This challenge was motivated by significant inaccuracies observed in prior physical property prediction challenges, such as the SAMPL5 log D Challenge, caused by protonation state and pKa prediction issues. The goal of the pKa challenge was to assess the performance of contemporary pKa prediction methods for drug-like molecules. The challenge set was composed of 24 small molecules that resembled fragments of kinase inhibitors, a number of which were multiprotic. Eleven research groups contributed blind predictions for a total of 37 distinct pKa prediction methods. In addition to blinded submissions, four widely used pKa prediction methods were included in the analysis as reference methods. Collecting both microscopic and macroscopic pKa predictions allowed in-depth evaluation of pKa prediction performance. This article highlights deficiencies of typical pKa prediction evaluation approaches when the distinction between microscopic and macroscopic pKas is ignored; in particular, we suggest more stringent evaluation criteria for microscopic and macroscopic pKa predictions guided by the available experimental data.
Top-performing submissions for macroscopic pKa predictions achieved RMSE of 0.7-1.0 pKa units and included both quantum chemical and empirical approaches, where the total number of extra or missing macroscopic pKas predicted by these submissions was fewer than 8 for 24 molecules. A large number of submissions had RMSE spanning 1-3 pKa units. Molecules with sulfur-containing heterocycles or iodo and bromo groups were less accurately predicted on average considering all methods evaluated. For a subset of molecules, we utilized experimentally determined microstates based on NMR to evaluate the dominant tautomer predictions for each macroscopic state. Prediction of dominant tautomers was a major source of error for microscopic pKa predictions, especially errors in charged tautomers. The degree of inaccuracy in pKa predictions observed in this challenge is detrimental to protein-ligand binding affinity predictions due to errors in dominant protonation state predictions and the calculation of free energy corrections for multiple protonation states. Underestimation of ligand pKa by 1 unit can lead to binding free energy errors of up to 1.2 kcal/mol. The SAMPL6 pKa Challenge demonstrated the need for improving pKa prediction methods for drug-like molecules, especially for challenging moieties and multiprotic molecules.
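The RMSE figures quoted above are standard error statistics over matched pairs of predicted and experimental pKa values. The sketch below shows the generic definitions of RMSE and mean absolute error. It assumes the predictions have already been paired one-to-one with experimental values, which the abstract notes is itself a nontrivial step when extra or missing pKas are predicted. The function names are hypothetical and this is not the challenge's evaluation code.

```python
import math


def _check_pairs(predicted, experimental):
    """Validate that the two sequences are non-empty and matched in length."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need equally long, non-empty sequences of matched values")


def rmse(predicted, experimental):
    """Root-mean-square error in pKa units over matched value pairs."""
    _check_pairs(predicted, experimental)
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, experimental):
    """Mean absolute error in pKa units over matched value pairs."""
    _check_pairs(predicted, experimental)
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)
```

RMSE weights large deviations more heavily than MAE, so a few badly predicted molecules can dominate it.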
Affiliation(s)
- Mehtap Işık
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, 10065, USA
- Ariën S Rustenburg
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Graduate Program in Physiology, Biophysics, and Systems Biology, Weill Cornell Medical College, New York, NY, 10065, USA
- Andrea Rizzi
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, 10065, USA
- M R Gunner
- Department of Physics, City College of New York, New York, NY, 10031, USA
- David L Mobley
- Department of Pharmaceutical Sciences and Department of Chemistry, University of California, Irvine, Irvine, CA, 92697, USA
- John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA