1
Luo W, Zhou G, Zhu Z, Yuan Y, Ke G, Wei Z, Gao Z, Zheng H. Bridging Machine Learning and Thermodynamics for Accurate pKa Prediction. JACS Au 2024; 4:3451-3465. [PMID: 39328749 PMCID: PMC11423309 DOI: 10.1021/jacsau.4c00271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 07/07/2024] [Accepted: 07/10/2024] [Indexed: 09/28/2024]
Abstract
Integrating scientific principles into machine learning models to enhance their predictive performance and generalizability is a central challenge in the development of AI for Science. Herein, we introduce Uni-pKa, a novel framework that successfully incorporates thermodynamic principles into machine learning modeling, achieving high-precision predictions of acid dissociation constants (pKa), a crucial task in the rational design of drugs and catalysts, as well as a modeling challenge in computational physical chemistry for small organic molecules. Uni-pKa utilizes a comprehensive free energy model to represent molecular protonation equilibria accurately. It features a structure enumerator that reconstructs molecular configurations from pKa data, coupled with a neural network that functions as a free energy predictor, ensuring high-throughput, data-driven prediction while preserving thermodynamic consistency. Employing a pretraining-finetuning strategy with both predicted and experimental pKa data, Uni-pKa not only achieves state-of-the-art accuracy in chemoinformatics but also shows comparable precision to quantum mechanics-based methods.
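The thermodynamic link this abstract relies on, namely that a macroscopic pKa follows from the free energies of the protonation microstates, can be illustrated with a minimal Python sketch. The function name, the kcal/mol units, the 298.15 K default, and the convention that the proton's free energy is already folded into the deprotonated-state values are all assumptions made for illustration; this is not the Uni-pKa implementation.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_from_free_energies(g_protonated, g_deprotonated, temperature=298.15):
    """Macroscopic pKa from per-microstate free energies (kcal/mol).

    Hypothetical helper: each list holds the free energies of all
    microstates (e.g. tautomers) at one protonation level, on a common
    reference scale that already includes the solvated proton.
    """
    rt = R_KCAL * temperature
    # Boltzmann-weighted partition sums over the microstates of each level.
    z_prot = sum(math.exp(-g / rt) for g in g_protonated)
    z_deprot = sum(math.exp(-g / rt) for g in g_deprotonated)
    # Ka is the ratio of the two sums; pKa = -log10(Ka).
    return -math.log10(z_deprot / z_prot)
```

With a single microstate on each side this reduces to pKa = ΔG / (RT ln 10); adding a second, equally stable deprotonated microstate lowers the pKa by log10(2).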
Affiliation(s)
- Weiliang Luo
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - DP Technology, Beijing 100089, China
- Gengmo Zhou
  - DP Technology, Beijing 100089, China
  - Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
- Guolin Ke
  - DP Technology, Beijing 100089, China
- Zhewei Wei
  - Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
2
Cao Y, Balduf T, Beachy MD, Bennett MC, Bochevarov AD, Chien A, Dub PA, Dyall KG, Furness JW, Halls MD, Hughes TF, Jacobson LD, Kwak HS, Levine DS, Mainz DT, Moore KB, Svensson M, Videla PE, Watson MA, Friesner RA. Quantum chemical package Jaguar: A survey of recent developments and unique features. J Chem Phys 2024; 161:052502. [PMID: 39092934 DOI: 10.1063/5.0213317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 07/12/2024] [Indexed: 08/04/2024]
Abstract
This paper is dedicated to the quantum chemical package Jaguar, which is commercial software developed and distributed by Schrödinger, Inc. We discuss Jaguar's scientific features that are relevant to chemical research as well as describe those aspects of the program that are pertinent to the user interface, the organization of the computer code, and its maintenance and testing. Among the scientific topics that feature prominently in this paper are the quantum chemical methods grounded in the pseudospectral approach. A number of multistep workflows dependent on Jaguar are covered: prediction of protonation equilibria in aqueous solutions (particularly calculations of tautomeric stability and pKa), reactivity predictions based on automated transition state search, assembly of Boltzmann-averaged spectra such as vibrational and electronic circular dichroism, as well as nuclear magnetic resonance. Discussed also are quantum chemical calculations that are oriented toward materials science applications, in particular, prediction of properties of optoelectronic materials and organic semiconductors, and molecular catalyst design. The topic of treatment of conformations inevitably comes up in real world research projects and is considered as part of all the workflows mentioned above. In addition, we examine the role of machine learning methods in quantum chemical calculations performed by Jaguar, from auxiliary functions that return the approximate calculation runtime in a user interface, to prediction of actual molecular properties. The current work is second in a series of reviews of Jaguar, the first having been published more than ten years ago. Thus, this paper serves as a rare milestone on the path that is being traversed by Jaguar's development in more than thirty years of its existence.
Affiliation(s)
- Yixiang Cao
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Ty Balduf
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Michael D Beachy
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- M Chandler Bennett
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Art D Bochevarov
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Alan Chien
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Pavel A Dub
  - Schrödinger, Inc., 9868 Scranton Road, Suite 3200, San Diego, California 92121, USA
- Kenneth G Dyall
  - Schrödinger, Inc., 101 SW Main St., Suite 1300, Portland, Oregon 97204, USA
- James W Furness
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Mathew D Halls
  - Schrödinger, Inc., 9868 Scranton Road, Suite 3200, San Diego, California 92121, USA
- Thomas F Hughes
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Leif D Jacobson
  - Schrödinger, Inc., 101 SW Main St., Suite 1300, Portland, Oregon 97204, USA
- H Shaun Kwak
  - Schrödinger, Inc., 101 SW Main St., Suite 1300, Portland, Oregon 97204, USA
- Daniel S Levine
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Daniel T Mainz
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Kevin B Moore
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Mats Svensson
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Pablo E Videla
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Mark A Watson
  - Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, USA
- Richard A Friesner
  - Department of Chemistry, Columbia University, 3000 Broadway, New York, New York 10027, USA
3
Giese TJ, Zeng J, Lerew L, McCarthy E, Tao Y, Ekesan Ş, York DM. Software Infrastructure for Next-Generation QM/MM-ΔMLP Force Fields. J Phys Chem B 2024; 128:6257-6271. [PMID: 38905451 PMCID: PMC11414325 DOI: 10.1021/acs.jpcb.4c01466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
We present software infrastructure for the design and testing of new quantum mechanical/molecular mechanical and machine-learning potential (QM/MM-ΔMLP) force fields for a wide range of applications. The software integrates Amber's molecular dynamics simulation capabilities with fast, approximate quantum models in the xtb package and machine-learning potential corrections in DeePMD-kit. The xtb package implements the recently developed density-functional tight-binding QM models with multipolar electrostatics and density-dependent dispersion (GFN2-xTB), and the interface with Amber enables their use in periodic boundary QM/MM simulations with linear-scaling QM/MM particle-mesh Ewald electrostatics. The accuracy of the semiempirical models is enhanced by including machine-learning correction potentials (ΔMLPs) enabled through an interface with the DeePMD-kit software. The goal of this paper is to present and validate the implementation of this software infrastructure in molecular dynamics and free energy simulations. The utility of the new infrastructure is demonstrated in proof-of-concept example applications. The software elements presented here are open source and freely available. Their interface provides a powerful enabling technology for the design of new QM/MM-ΔMLP models for studying a wide range of problems, including biomolecular reactivity and protein-ligand binding.
Affiliation(s)
- Timothy J Giese
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Jinzhe Zeng
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Lauren Lerew
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Erika McCarthy
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Yujun Tao
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Şölen Ekesan
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Darrin M York
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
4
Schöller A, Woodcock HL, Boresch S. Exploring Routes to Enhance the Calculation of Free Energy Differences via Non-Equilibrium Work SQM/MM Switching Simulations Using Hybrid Charge Intermediates between MM and SQM Levels of Theory or Non-Linear Switching Schemes. Molecules 2023; 28:4006. [PMID: 37241747 PMCID: PMC10222338 DOI: 10.3390/molecules28104006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/28/2023]
Abstract
Non-equilibrium work switching simulations and Jarzynski's equation are a reliable method for computing free energy differences, ΔAlow→high, between two levels of theory, such as a pure molecular mechanical (MM) and a quantum mechanical/molecular mechanical (QM/MM) description of a system of interest. Despite the inherent parallelism, the computational cost of this approach can quickly become very high. This is particularly true for systems where the core region, the part of the system to be described at different levels of theory, is embedded in an environment such as explicit solvent water. We find that even for relatively simple solute-water systems, switching lengths of at least 5 ps are necessary to compute ΔAlow→high reliably. In this study, we investigate two approaches towards an affordable protocol, with an emphasis on keeping the switching length well below 5 ps. Inserting a hybrid charge intermediate state with modified partial charges, which resembles the charge distribution of the desired high level, makes it possible to obtain reliable calculations with 2 ps switches. Attempts using step-wise linear switching paths, on the other hand, did not lead to improvement, i.e., a faster convergence for all systems. To understand these findings, we analyzed the solutes' properties as a function of the partial charges used and the number of water molecules in direct contact with the solute, and studied the time needed for water molecules to reorient themselves upon a change in the solute's charge distribution.
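Jarzynski's equation, which this abstract names as the basis of the method, relates the free energy difference to an exponential average over non-equilibrium work values: ΔA = -kT ln⟨exp(-W/kT)⟩. A minimal Python sketch of that estimator follows. The function name, the kcal/mol units, and the 300 K default are assumptions for illustration, and the sketch says nothing about how the switching simulations in the paper generate the work values.

```python
import math

KB_KCAL = 1.987204e-3  # Boltzmann constant per mole, kcal/(mol*K)


def jarzynski_free_energy(work_values, temperature=300.0):
    """Estimate dA = -kT * ln(mean(exp(-W/kT))) from work values in kcal/mol.

    Hypothetical helper: work_values holds one accumulated work per
    independent switching run.
    """
    if not work_values:
        raise ValueError("at least one work value is required")
    kt = KB_KCAL * temperature
    # Shift by the smallest work so the exponentials cannot overflow.
    w_min = min(work_values)
    acc = sum(math.exp(-(w - w_min) / kt) for w in work_values)
    return w_min - kt * math.log(acc / len(work_values))
```

Because the exponential average is dominated by the smallest work values, the estimate never exceeds the mean work and converges slowly when the work distribution is broad, which is consistent with the abstract's concern about switching length.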
Affiliation(s)
- Andreas Schöller
  - Faculty of Chemistry, Department of Computational Biological Chemistry, University of Vienna, Währingerstr. 17, A-1090 Vienna, Austria
  - Vienna Doctoral School in Chemistry (DoSChem), University of Vienna, Währingerstr. 42, A-1090 Vienna, Austria
- H. Lee Woodcock
  - Department of Chemistry, University of South Florida, 4202 E. Fowler Ave., CHE205, Tampa, FL 33620-5250, USA
- Stefan Boresch
  - Faculty of Chemistry, Department of Computational Biological Chemistry, University of Vienna, Währingerstr. 17, A-1090 Vienna, Austria
5
Pan X, Zhao F, Zhang Y, Wang X, Xiao X, Zhang JZH, Ji C. MolTaut: A Tool for the Rapid Generation of Favorable Tautomer in Aqueous Solution. J Chem Inf Model 2023; 63:1833-1840. [PMID: 36939644 DOI: 10.1021/acs.jcim.2c01393] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/21/2023]
Abstract
Fast and proper treatment of the tautomeric states for drug-like molecules is critical in computer-aided drug discovery since the major tautomer of a molecule determines its pharmacophore features and physical properties. We present MolTaut, a tool for the rapid generation of favorable states of drug-like molecules in water. MolTaut works by enumerating possible tautomeric states with tautomeric transformation rules, ranking tautomers with their relative internal energies and solvation energies calculated by AI-based models, and generating preferred ionization states according to predicted microscopic pKa. Our test shows that the ranking ability of the AI-based tautomer scoring approach is comparable to the DFT method (wB97X/6-31G*//M062X/6-31G*/SMD) from which the AI models try to learn. We find that the substitution effect on tautomeric equilibrium is well predicted by MolTaut, which is helpful in computer-aided ligand design. The source code of MolTaut is freely available to researchers and can be accessed at https://github.com/xundrug/moltaut. To facilitate the usage of MolTaut by medicinal chemists, we made a free web server, which is available at http://moltaut.xundrug.cn. MolTaut is a handy tool for investigating the tautomerization issue in drug discovery.
Affiliation(s)
- Xiaolin Pan
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Fanyu Zhao
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York 10003, United States
- Yueqing Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Xingyu Wang
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Xudong Xiao
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- John Z H Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  - Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York 10003, United States
  - Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
- Changge Ji
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
6
Zeng J, Tao Y, Giese TJ, York DM. QDπ: A Quantum Deep Potential Interaction Model for Drug Discovery. J Chem Theory Comput 2023; 19:1261-1275. [PMID: 36696673 PMCID: PMC9992268 DOI: 10.1021/acs.jctc.2c01172] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 01/27/2023]
Abstract
We report QDπ-v1.0 for modeling the internal energy of drug molecules containing H, C, N, and O atoms. The QDπ model is in the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP) that uses a fast third-order self-consistent density-functional tight-binding (DFTB3/3OB) model that is corrected to a quantitatively high-level of accuracy through a deep-learning potential (DeepPot-SE). The model has the advantage that it is able to properly treat electrostatic interactions and handle changes in charge/protonation states. The model is trained against reference data computed at the ωB97X/6-31G* level (as in the ANI-1x data set) and compared to several other approximate semiempirical and machine learning potentials (ANI-1x, ANI-2x, DFTB3, MNDO/d, AM1, PM6, GFN1-xTB, and GFN2-xTB). The QDπ model is demonstrated to be accurate for a wide range of intra- and intermolecular interactions (despite its intended use as an internal energy model) and has shown to perform exceptionally well for relative protonation/deprotonation energies and tautomers. An example application to model reactions involved in RNA strand cleavage catalyzed by protein and nucleic acid enzymes illustrates QDπ has average errors less than 0.5 kcal/mol, whereas the other models compared have errors over an order of magnitude greater. Taken together, this makes QDπ highly attractive as a potential force field model for drug discovery.
Affiliation(s)
- Jinzhe Zeng
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
- Yujun Tao
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
- Timothy J. Giese
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
- Darrin M. York
  - Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
7
Bharatam PV, Valanju OR, Wani AA, Dhaked DK. Importance of tautomerism in drugs. Drug Discov Today 2023; 28:103494. [PMID: 36681235 DOI: 10.1016/j.drudis.2023.103494] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/13/2022] [Revised: 12/08/2022] [Accepted: 01/13/2023] [Indexed: 01/20/2023]
Abstract
Tautomerism is an important phenomenon exhibited by many drugs. As we discuss in this review, identifying the different tautomers of drugs and exploring their importance in the mechanisms of drug action are integral components of current drug discovery. Nuclear magnetic resonance (NMR), infrared (IR), ultraviolet (UV), Raman, and terahertz spectroscopic techniques, as well as X-ray diffraction, are useful for exploring drug tautomerism. Quantum chemical methods, in association with pharmacoinformatics tools, are being used to evaluate tautomeric preferences in terms of energy effects. Desmotropy (i.e., tautomeric polymorphism) of the drugs is particularly important in drug delivery studies.
Affiliation(s)
- Prasad V Bharatam
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER), Sector 67, S.A.S. Nagar, Punjab 160062, India
- Omkar R Valanju
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER), Sector 67, S.A.S. Nagar, Punjab 160062, India
- Aabid A Wani
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER), Sector 67, S.A.S. Nagar, Punjab 160062, India
- Devendra K Dhaked
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER)-Kolkata, Chunilal Bhawan, 168 Maniktala Main Road, Kolkata, West Bengal 700054, India
8
Deneva V, Slavova S, Kumanova A, Vassilev N, Nedeltcheva-Antonova D, Antonov L. Favipiravir-Tautomeric and Complexation Properties in Solution. Pharmaceuticals (Basel) 2022; 16:ph16010045. [PMID: 36678542 PMCID: PMC9864296 DOI: 10.3390/ph16010045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/16/2022] [Accepted: 12/23/2022] [Indexed: 12/31/2022]
Abstract
The tautomeric properties of favipiravir were investigated experimentally for the first time by using molecular spectroscopy (UV-Vis absorption, fluorescence and NMR), as well as DFT quantum-chemical calculations. According to the obtained results, the enol tautomer is substantially more stable in most of the organic solvents. In the presence of water, a keto form appears to be favored due to the specific solute-solvent interactions. Upon the addition of alkaline-earth-metal ions, deprotonation and complexation occurred simultaneously, giving the formation of 2 : 1 ligand : metal complexes. According to the theoretical simulations, the metal ion is captured between the carbonyl groups as a result of the size-fit effect.
Affiliation(s)
- Vera Deneva
  - Institute of Electronics, Bulgarian Academy of Sciences, 1784 Sofia, Bulgaria
  - Institute of Organic Chemistry with Centre of Phytochemistry, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
  - Correspondence: (V.D.); (L.A.)
- Sofia Slavova
  - Institute of Electronics, Bulgarian Academy of Sciences, 1784 Sofia, Bulgaria
  - Institute of General and Inorganic Chemistry, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
- Alina Kumanova
  - Institute of Electronics, Bulgarian Academy of Sciences, 1784 Sofia, Bulgaria
- Nikolay Vassilev
  - Institute of Organic Chemistry with Centre of Phytochemistry, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
- Daniela Nedeltcheva-Antonova
  - Institute of Electronics, Bulgarian Academy of Sciences, 1784 Sofia, Bulgaria
  - Institute of Organic Chemistry with Centre of Phytochemistry, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
- Luidmil Antonov
  - Institute of Electronics, Bulgarian Academy of Sciences, 1784 Sofia, Bulgaria
  - Correspondence: (V.D.); (L.A.)
9
Liu Z, Zubatiuk T, Roitberg A, Isayev O. Auto3D: Automatic Generation of the Low-Energy 3D Structures with ANI Neural Network Potentials. J Chem Inf Model 2022; 62:5373-5382. [PMID: 36112860 DOI: 10.1021/acs.jcim.2c00817] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 12/13/2022]
Abstract
Computational programs accelerate the chemical discovery processes but often need proper three-dimensional molecular information as part of the input. Getting optimal molecular structures is challenging because it requires enumerating and optimizing a huge space of stereoisomers and conformers. We developed the Python-based Auto3D package for generating the low-energy 3D structures using SMILES as the input. Auto3D is based on state-of-the-art algorithms and can automatize the isomer enumeration and duplicate filtering process, 3D building process, geometry optimization, and ranking process. Tested on 50 molecules with multiple unspecified stereocenters, Auto3D is guaranteed to find the stereoconfiguration that yields the lowest-energy conformer. With Auto3D, we provide an extension of the ANI model. The new model, dubbed ANI-2xt, is trained on a tautomer-rich data set. ANI-2xt is benchmarked with DFT methods on geometry optimization and electronic and Gibbs free energy calculations. Compared with ANI-2x, ANI-2xt provides a 42% error reduction for tautomeric reaction energy calculations when using the gold-standard coupled-cluster calculation as the reference. ANI-2xt can accurately predict the energies and is several orders of magnitude faster than DFT methods.
Affiliation(s)
- Zhen Liu
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Tetiana Zubatiuk
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Adrian Roitberg
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Olexandr Isayev
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
10
Vazquez-Salazar LI, Boittier ED, Meuwly M. Uncertainty quantification for predictions of atomistic neural networks. Chem Sci 2022; 13:13068-13084. [PMID: 36425481 PMCID: PMC9667919 DOI: 10.1039/d2sc04056e] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/20/2022] [Accepted: 10/16/2022] [Indexed: 12/31/2023]
Abstract
The value of uncertainty quantification on predictions for trained neural networks (NNs) on quantum chemical reference data is quantitatively explored. For this, the architecture of the PhysNet NN was suitably modified and the resulting model (PhysNet-DER) was evaluated with different metrics to quantify its calibration, the quality of its predictions, and whether prediction error and the predicted uncertainty can be correlated. Training on the QM9 database and evaluating data in the test set within and outside the distribution indicate that error and uncertainty are not linearly related. However, the observed variance provides insight into the quality of the data used for training. Additionally, the influence of the chemical space covered by the training data set was studied by using a biased database. The results clarify that noise and redundancy complicate property prediction for molecules even in cases for which changes - such as double bond migration in two otherwise identical molecules - are small. The model was also applied to a real database of tautomerization reactions. Analysis of the distance between members in feature space in combination with other parameters shows that redundant information in the training dataset can lead to large variances and small errors whereas the presence of similar but unspecific information returns large errors but small variances. This was, e.g., observed for nitro-containing aliphatic chains for which predictions were difficult although the training set contained several examples for nitro groups bound to aromatic molecules. The finding underlines the importance of the composition of the training data and provides chemical insight into how this affects the prediction capabilities of a ML model. Finally, the presented method can be used for information-based improvement of chemical databases for target applications through active learning optimization.
Affiliation(s)
- Eric D Boittier
  - Department of Chemistry, University of Basel, Basel, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Basel, Switzerland
  - Department of Chemistry, Brown University, USA
11
Göller AH. Reliable gas-phase tautomer equilibria of drug-like molecule scaffolds and the issue of continuum solvation. J Comput Aided Mol Des 2022; 36:805-824. [DOI: 10.1007/s10822-022-00480-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 09/29/2022] [Indexed: 11/06/2022]
12
Jacobson LD, Stevenson JM, Ramezanghorbani F, Ghoreishi D, Leswing K, Harder ED, Abel R. Transferable Neural Network Potential Energy Surfaces for Closed-Shell Organic Molecules: Extension to Ions. J Chem Theory Comput 2022; 18:2354-2366. [PMID: 35290063 DOI: 10.1021/acs.jctc.1c00821] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 12/30/2022]
Abstract
Transferable high dimensional neural network potentials (HDNNPs) have shown great promise as an avenue to increase the accuracy and domain of applicability of existing atomistic force fields for organic systems relevant to life science. We have previously reported such a potential (Schrödinger-ANI) that has broad coverage of druglike molecules. We extend that work here to cover ionic and zwitterionic druglike molecules expected to be relevant to drug discovery research activities. We report a novel HDNNP architecture, which we call QRNN, that predicts atomic charges and uses these charges as descriptors in an energy model that delivers conformational energies within chemical accuracy when measured against the reference theory it is trained to. Further, we find that delta learning based on a semiempirical level of theory approximately halves the errors. We test the models on torsion energy profiles, relative conformational energies, geometric parameters, and relative tautomer errors.
Affiliation(s)
- Leif D Jacobson
  - Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- James M Stevenson
  - Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- Delaram Ghoreishi
  - Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- Karl Leswing
  - Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- Edward D Harder
  - Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- Robert Abel
  - Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
13
Brovarets’ OO, Muradova A, Hovorun DM. Novel horizons of the conformationally-tautomeric transformations of the G·T base pairs: quantum-mechanical investigation. Mol Phys 2022. [DOI: 10.1080/00268976.2022.2026510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Ol’ha O. Brovarets’
  - Department of Molecular and Quantum Biophysics, Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Alona Muradova
  - Department of Molecular Biotechnology and Bioinformatics, Institute of High Technologies, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Dmytro M. Hovorun
  - Department of Molecular and Quantum Biophysics, Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
  - Department of Molecular Biotechnology and Bioinformatics, Institute of High Technologies, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
|
14
|
Sharma S, Arya A, Cruz R, Cleaves II HJ. Automated Exploration of Prebiotic Chemical Reaction Space: Progress and Perspectives. Life (Basel) 2021; 11:1140. [PMID: 34833016 PMCID: PMC8624352 DOI: 10.3390/life11111140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022] Open
Abstract
Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks with a large number of diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully recapitulable, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and identify the ones most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.
Affiliation(s)
- Siddhant Sharma
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
  - Department of Biochemistry, Deshbandhu College, University of Delhi, New Delhi 110019, India
  - Department of Chemistry and Chemical Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Aayush Arya
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
  - Department of Physics, Lovely Professional University, Jalandhar-Delhi GT Road, Phagwara 144001, India
- Romulo Cruz
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
  - Big Data Laboratory, Information and Communications Technology Center (CTIC), National University of Engineering, Amaru 210, Lima 15333, Peru
- Henderson James Cleaves II
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 152-8550, Japan
|
15
|
Wieder M, Fass J, Chodera JD. Fitting quantum machine learning potentials to experimental free energy data: predicting tautomer ratios in solution. Chem Sci 2021; 12:11364-11381. [PMID: 34567495 PMCID: PMC8409483 DOI: 10.1039/d1sc01185e] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/26/2021] [Accepted: 07/05/2021] [Indexed: 11/21/2022] Open
Abstract
The computation of tautomer ratios of druglike molecules is enormously important in computer-aided drug discovery, as over a quarter of all approved drugs can populate multiple tautomeric species in solution. Unfortunately, accurate calculation of aqueous tautomer ratios (the degree to which these species must be penalized in order to correctly account for tautomers in modeling binding for computer-aided drug discovery) is surprisingly difficult. While quantum chemical approaches to computing aqueous tautomer ratios using continuum solvent models and rigid-rotor harmonic-oscillator thermochemistry are currently state of the art, these methods are still surprisingly inaccurate despite their enormous computational expense. Here, we show that a major source of this inaccuracy lies in the breakdown of the standard approach to accounting for quantum chemical thermochemistry using rigid-rotor harmonic-oscillator (RRHO) approximations, which are frustrated by the complex conformational landscape introduced by the migration of double bonds, creation of stereocenters, and introduction of multiple conformations separated by low energetic barriers induced by migration of a single proton. Using quantum machine learning (QML) methods that allow us to compute potential energies with quantum chemical accuracy at a fraction of the cost, we show how rigorous relative alchemical free energy calculations can be used to compute tautomer ratios in vacuum free from the limitations introduced by RRHO approximations. Furthermore, since the parameters of QML methods are tunable, we show how we can train these models to correct limitations in the underlying learned quantum chemical potential energy surface using free energies, enabling these methods to learn to generalize tautomer free energies across a broader range of predictions.
We show how alchemical free energies can be calculated with QML potentials to identify deficiencies in RRHO approximations for computing tautomeric free energies, and how these potentials can be learned from experiment to improve prediction accuracy.
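As a worked illustration of how a tautomer free energy difference maps onto a tautomer ratio, the sketch below applies the standard thermodynamic relation K = exp(-ΔG/RT). It is not taken from the paper. The function names, the kcal/mol unit choice, and the default temperature of 298.15 K are assumptions made for illustration.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def tautomer_ratio(delta_g_kcal, temperature_k=298.15):
    """Equilibrium ratio [B]/[A] for delta_g = G(B) - G(A) in kcal/mol."""
    return math.exp(-delta_g_kcal / (R_KCAL_PER_MOL_K * temperature_k))


def population_of_b(delta_g_kcal, temperature_k=298.15):
    """Fraction of tautomer B in a two-state A <-> B equilibrium."""
    ratio = tautomer_ratio(delta_g_kcal, temperature_k)
    return ratio / (1.0 + ratio)
```

At room temperature a free energy difference of about 1.36 kcal/mol corresponds to one log10 unit in the ratio, so errors of a few kcal/mol in a computed free energy change the predicted ratio by orders of magnitude.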
Affiliation(s)
- Marcus Wieder
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Josh Fass
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Graduate School of Medical Sciences, New York, NY 10065, USA
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
|
16
|
Vazquez-Salazar LI, Boittier ED, Unke OT, Meuwly M. Impact of the Characteristics of Quantum Chemical Databases on Machine Learning Prediction of Tautomerization Energies. J Chem Theory Comput 2021; 17:4769-4785. [PMID: 34288675 DOI: 10.1021/acs.jctc.1c00363] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/27/2022]
Abstract
An essential aspect for adequate predictions of chemical properties by machine learning models is the database used for training them. However, studies that analyze how the content and structure of the databases used for training impact the prediction quality are scarce. In this work, we analyze and quantify the relationships learned by a machine learning model (Neural Network) trained on five different reference databases (QM9, PC9, ANI-1E, ANI-1, and ANI-1x) to predict tautomerization energies from molecules in Tautobase. For this, the effects of characteristics such as the number of heavy atoms in a molecule, number of atoms of a given element, bond composition, or initial geometry on the quality of the predictions are considered. The results indicate that training on a chemically diverse database is crucial for obtaining good results and also that conformational sampling can partly compensate for limited coverage of chemical diversity. The overall best-performing reference database (ANI-1x) performs on average 1 kcal/mol better than PC9, which, however, contains about 2 orders of magnitude fewer reference structures. On the other hand, PC9 is chemically more diverse by a factor of ∼5 as quantified by the number of atom-in-molecule-based fragments (amons) it contains compared with the ANI family of databases. A quantitative measure for deficiencies is the Kullback-Leibler divergence between reference and target distributions. It is explicitly demonstrated that when certain types of bonds need to be covered in the target database (Tautobase) but are undersampled in the reference databases, the resulting predictions are poor. Examples of this include the poor performance of all databases analyzed to predict C(sp2)-C(sp2) double bonds close to heteroatoms and azoles containing N-N and N-O bonds.
Analysis of the results with a Tree MAP algorithm provides deeper understanding of specific deficiencies in predicting tautomerization energies by the reference datasets due to inadequate coverage of chemical space. Capitalizing on this information can be used to either improve existing databases or generate new databases of sufficient diversity for a range of machine learning (ML) applications in chemistry.
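The Kullback-Leibler divergence named in this abstract can be sketched for discrete distributions such as bond-type frequencies. This is a minimal illustration, not the authors' analysis code. The direction D(target || reference), the small pseudocount used to avoid division by zero, and the toy bond counts are all assumptions.

```python
import math


def kl_divergence(target_counts, reference_counts, pseudocount=1e-9):
    """D_KL(target || reference) in nats, computed from category counts."""
    keys = sorted(set(target_counts) | set(reference_counts))
    # Smooth every category so that empty bins do not produce log(0).
    t = [target_counts.get(k, 0) + pseudocount for k in keys]
    r = [reference_counts.get(k, 0) + pseudocount for k in keys]
    t_total = sum(t)
    r_total = sum(r)
    divergence = 0.0
    for t_k, r_k in zip(t, r):
        p = t_k / t_total
        q = r_k / r_total
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical bond-type counts: the target set is rich in N-N and N-O bonds.
toy_target = {"C=C": 40, "N-N": 30, "N-O": 30}
toy_well_covered = {"C=C": 45, "N-N": 30, "N-O": 25}
toy_undersampled = {"C=C": 90, "N-N": 5, "N-O": 5}
```

With these toy counts, the reference set that undersamples N-N and N-O bonds gives the larger divergence, which mirrors the abstract's point that poorly covered bond types go along with poor predictions.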
Affiliation(s)
- Eric D Boittier
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Oliver T Unke
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
  - Department of Chemistry, Brown University, Providence, Rhode Island 02912, United States
|
17
|
Brovarets' OO, Muradova A, Hovorun DM. Novel mechanisms of the conformational transformations of the biologically important G·C nucleobase pairs in Watson–Crick, Hoogsteen and wobble configurations via the mutual rotations of the bases around the intermolecular H-bonds: a QM/QTAIM study. RSC Adv 2021; 11:25700-25730. [PMID: 35478902 PMCID: PMC9036977 DOI: 10.1039/d0ra08702e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/12/2020] [Accepted: 06/09/2021] [Indexed: 01/12/2023] Open
Abstract
Conformational transformations of the G·C nucleobase pairs, occurring via the mutual rotation of the G and C bases around the intermolecular H-bonds, were established.
Affiliation(s)
- Ol'ha O. Brovarets'
  - Department of Molecular and Quantum Biophysics, Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Alona Muradova
  - Department of Molecular Biotechnology and Bioinformatics, Institute of High Technologies, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Dmytro M. Hovorun
  - Department of Molecular and Quantum Biophysics, Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
|
18
|
Baker CM, Kidley NJ, Papachristos K, Hotson M, Carson R, Gravestock D, Pouliot M, Harrison J, Dowling A. Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry. J Chem Inf Model 2020; 60:3781-3791. [PMID: 32644790 DOI: 10.1021/acs.jcim.0c00232] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023]
Abstract
Databases of small, potentially bioactive molecules are ubiquitous across the industry and academia. Designed such that each unique compound should appear only once, the multiplicity of ways in which many compounds can be represented means that these databases require methods for standardizing the representation of chemistry. This is commonly achieved through the use of "Chemistry Business Rules", sets of predefined rules that describe the "house style" of the database in question. At Syngenta, the historical approach to the design of chemistry business rules has been to focus on consistency of representation, with chemical relevance given secondary consideration. In this work, we overturn that convention. Through the use of quantum chemistry calculations, we define a set of chemistry business rules for tautomer standardization that reproduces gas-phase energetic preferences. We go on to show that, compared to our historic approach, this method yields tautomers that are in better agreement with those observed experimentally in condensed phases and that are better suited for use in predictive models.
Affiliation(s)
- Christopher M Baker
  - Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- Nathan J Kidley
  - Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- Matthew Hotson
  - Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- Rob Carson
  - Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- David Gravestock
  - Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- Martin Pouliot
  - Syngenta Crop Protection, Schaffhauserstrasse, Stein CH-4332, Switzerland
- Jim Harrison
  - Datacraft Technologies, 110 Parkwood Place, Anstead, QLD 4070, Australia
- Alan Dowling
  - Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
|
19
|
Levine DS, Watson MA, Jacobson LD, Dickerson CE, Yu HS, Bochevarov AD. Pattern-free generation and quantum mechanical scoring of ring-chain tautomers. J Comput Aided Mol Des 2020; 35:417-431. [PMID: 32830300 DOI: 10.1007/s10822-020-00334-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2020] [Accepted: 07/21/2020] [Indexed: 11/24/2022]
Abstract
In contrast to the computational generation of conventional tautomers, the analogous operation that would produce ring-chain tautomers is rarely available in cheminformatics codes. This is partly due to the perceived unimportance of ring-chain tautomerism and partly because specialized algorithms are required to realize the non-local proton transfers that occur during ring-chain rearrangement. Nevertheless, for some types of organic compounds, including sugars, warfarin analogs, fluorescein dyes and some drug-like compounds, ring-chain tautomerism cannot be ignored. In this work, a novel ring-chain tautomer generation algorithm is presented. It differs from previously proposed solutions in that it does not rely on hard-coded patterns of proton migrations and bond rearrangements, and should therefore be more general and maintainable. We deploy this algorithm as part of a workflow which provides an automated solution for tautomer generation and scoring. The workflow identifies protonatable and deprotonatable sites in the molecule using a previously described approach based on rapid micro-pKa prediction. These data are used to distribute the active protons among the protonatable sites exhaustively, at which point alternate resonance structures are considered to obtain pairs of atoms with opposite formal charge. These pairs are connected with a single bond and a 3D undistorted geometry is generated. The scoring of the generated tautomers is performed with a subsequent density functional theory calculation employing an implicit solvent model. We demonstrate the performance of our workflow on several types of organic molecules known to exist in ring-chain tautomeric equilibria in solution. In particular, we show that some ring-chain tautomers not found using previously published algorithms are successfully located by ours.
Affiliation(s)
- Daniel S Levine
  - Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
- Mark A Watson
  - Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
- Leif D Jacobson
  - Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
  - Schrödinger, Inc., Suite 1300, 101 SW Main Street, Portland, OR, 97204, USA
- Claire E Dickerson
  - Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
  - College of Chemistry & Biochemistry, University of California, Los Angeles, CA, 90095, USA
- Haoyu S Yu
  - Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
|
20
|
Abstract
There is no experimental information about the tautomerism of Favipiravir (T-705). Therefore, its tautomeric state was predicted by using density functional theory in the gas phase and in solution (toluene, acetonitrile and water). The results show that, in the neutral state, the enol form strongly dominates in both the gas phase and solution. The carboxamide group is easily protonated in the presence of acid, which shifts the tautomeric equilibrium toward the keto tautomer. In order to validate the theoretical predictions, 2-hydroxy pyridine and 2-hydroxy pyrazine were also included in the set of studied compounds. The available experimental data about their tautomerism are in very good agreement with the theoretical predictions, which validates the conclusions made for T-705.
|
21
|
Dhaked DK, Guasch L, Nicklaus MC. Tautomer Database: A Comprehensive Resource for Tautomerism Analyses. J Chem Inf Model 2020; 60:1090-1100. [PMID: 32027495 DOI: 10.1021/acs.jcim.9b01156] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
We report a database of tautomeric structures that contains 2819 tautomeric tuples extracted from 171 publications. Each tautomeric entry has been annotated with experimental conditions reported in the respective publication, plus bibliographic details, structural identifiers (e.g., NCI/CADD identifiers FICTS, FICuS, uuuuu, and Standard InChI), and chemical information (e.g., SMILES, molecular weight). The majority of tautomeric tuples found were pairs; the remaining 10% were triples, quadruples, or quintuples, amounting to a total number of structures of 5977. The types of tautomerism were mainly prototropic tautomerism (79%), followed by ring-chain (13%) and valence tautomerism (8%). The experimental conditions reported in the publications included about 50 pure solvents and 9 solvent mixtures with 26 unique spectroscopic or nonspectroscopic methods. 1H and 13C NMR were the most frequently used methods. A total of 77 different tautomeric transform rules (SMIRKS) are covered by at least one example tuple in the database. This database is freely available as a spreadsheet at https://cactus.nci.nih.gov/download/tautomer/.
Affiliation(s)
- Devendra K Dhaked
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Laura Guasch
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Marc C Nicklaus
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
|
22
|
Dhaked DK, Ihlenfeldt WD, Patel H, Delannée V, Nicklaus MC. Toward a Comprehensive Treatment of Tautomerism in Chemoinformatics Including in InChI V2. J Chem Inf Model 2020; 60:1253-1275. [PMID: 32043883 DOI: 10.1021/acs.jcim.9b01080] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/21/2022]
Abstract
We have collected 86 different transforms of tautomeric interconversions. Out of those, 54 are for prototropic (non-ring-chain) tautomerism, 21 for ring-chain tautomerism, and 11 for valence tautomerism. The majority of these rules have been extracted from experimental literature. Twenty rules, covering the most well-known types of tautomerism such as keto-enol tautomerism, were taken from the default handling of tautomerism by the chemoinformatics toolkit CACTVS. The rules were analyzed against nine different databases totaling over 400 million (non-unique) structures as to their occurrence rates, mutual overlap in coverage, and recapitulation of the rules' enumerated tautomer sets by InChI V.1.05, both in InChI's Standard and a Nonstandard version with the increased tautomer-handling options 15T and KET turned on. These results and the background of this study are discussed in the context of the IUPAC InChI Project tasked with the redesign of handling of tautomerism for an InChI version 2. Applying the rules presented in this paper would approximately triple the number of compounds in typical small-molecule databases that would be affected by tautomeric interconversion by InChI V2. A web tool has been created to test these rules at https://cactus.nci.nih.gov/tautomerizer.
Affiliation(s)
- Devendra K Dhaked
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Hitesh Patel
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Victorien Delannée
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Marc C Nicklaus
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
|