1
|
Dhaked DK, Nicklaus MC. What impact does tautomerism have on drug discovery and development? Expert Opin Drug Discov 2024:1-6. [PMID: 39014878 DOI: 10.1080/17460441.2024.2379873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Accepted: 07/10/2024] [Indexed: 07/18/2024]
Affiliation(s)
- Devendra K Dhaked
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Kolkata, India
| | - Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, MD, USA
2
|
Abarbanel OD, Hutchison GR. QupKake: Integrating Machine Learning and Quantum Chemistry for Micro-pKa Predictions. J Chem Theory Comput 2024. [PMID: 38832803 DOI: 10.1021/acs.jctc.4c00328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2024]
Abstract
Accurate prediction of micro-pKa values is crucial for understanding and modulating the acidity and basicity of organic molecules, with applications in drug discovery, materials science, and environmental chemistry. This work introduces QupKake, a novel method that combines graph neural network models with semiempirical quantum mechanical (QM) features to achieve exceptional accuracy and generalization in micro-pKa prediction. QupKake outperforms state-of-the-art models on a variety of benchmark data sets, with root-mean-square errors between 0.5 and 0.8 pKa units on five external test sets. Feature importance analysis reveals the crucial role of QM features in both the reaction site enumeration and micro-pKa prediction models. QupKake represents a significant advancement in micro-pKa prediction, offering a powerful tool for various applications in chemistry and beyond.
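For readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of how a root-mean-square error in pKa units is computed. The function name `rmse` and the three predicted and experimental values are illustrative assumptions; none of them are taken from the QupKake paper.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between two equal-length sequences of pKa values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values, for illustration only (not data from the paper).
predicted_pka = [4.8, 9.1, 7.0]
experimental_pka = [4.2, 9.9, 7.0]
print(round(rmse(predicted_pka, experimental_pka), 2))
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones, which is worth keeping in mind when comparing values across test sets.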
Affiliation(s)
- Omri D Abarbanel
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
| | - Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
3
|
Eriksen CA, Andersen JL, Fagerberg R, Merkle D. Toward the Reconciliation of Inconsistent Molecular Structures from Biochemical Databases. J Comput Biol 2024; 31:498-512. [PMID: 38758924 DOI: 10.1089/cmb.2024.0520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2024] Open
Abstract
Information on the structure of molecules, retrieved via biochemical databases, plays a pivotal role in various disciplines, including metabolomics, systems biology, and drug discovery. No such database can be complete and it is often necessary to incorporate data from several sources. However, the molecular structure for a given compound is not necessarily consistent between databases. This article presents StructRecon, a novel tool for resolving unique molecular structures from database identifiers. Currently, identifiers from BiGG, ChEBI, Escherichia coli Metabolome Database (ECMDB), MetaNetX, and PubChem are supported. StructRecon traverses the cross-links between entries in different databases to construct what we call identifier graphs. The goal of these graphs is to offer a more complete view of the total information available on a given compound across all the supported databases. To reconcile discrepancies met during the traversal of the databases, we develop an extensible model for molecular structure supporting multiple independent levels of detail, which allows standardization of the structure to be applied iteratively. In some cases, our standardization approach results in multiple candidate structures for a given compound, in which case a random walk-based algorithm is used to select the most likely structure among incompatible alternatives. As a case study, we applied StructRecon to the EColiCore2 model. We found at least one structure for 98.66% of its compounds, which is more than twice as many as possible when using the databases in more standard ways not considering the complex network of cross-database references captured by our identifier graphs. StructRecon is open-source and modular, which enables support for more databases in the future.
Affiliation(s)
- Casper Asbjørn Eriksen
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Jakob Lykke Andersen
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Rolf Fagerberg
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Daniel Merkle
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
4
|
Pasquini M, Stenta M. LinChemIn: Route Arithmetic─Operations on Digital Synthetic Routes. J Chem Inf Model 2024; 64:1765-1771. [PMID: 38480486 DOI: 10.1021/acs.jcim.3c01819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Computational tools are revolutionizing our understanding and prediction of chemical reactivity by combining traditional data analysis techniques with new predictive models. These tools extract additional value from the reaction data corpus, but to effectively convert this value into actionable knowledge, domain specialists need to interact easily with the computer-generated output. In this application note, we demonstrate the capabilities of the open-source Python toolkit LinChemIn, which simplifies the manipulation of reaction networks and provides advanced functionality for working with synthetic routes. LinChemIn ensures chemical consistency when merging, editing, mining, and analyzing reaction networks. Its flexible input interface can process routes from various sources, including predictive models and expert input. The toolkit also efficiently extracts individual routes from the combined synthetic tree, identifying alternative paths and reaction combinations. By reducing the operational barrier to accessing and analyzing synthetic routes from multiple sources, LinChemIn facilitates a constructive interplay between artificial intelligence and human expertise.
Affiliation(s)
- Marta Pasquini
- Syngenta Crop Protection AG, Schaffhauserstrasse, 4332 Stein, AG, Switzerland
| | - Marco Stenta
- Syngenta Crop Protection AG, Schaffhauserstrasse, 4332 Stein, AG, Switzerland
5
|
Cornell AP, Kim S, Cuadros J, Bucholtz EC, Pence HE, Potenzone R, Belford RE. IUPAC International Chemical Identifier (InChI)-related education and training materials through InChI Open Education Resource (OER). CHEMISTRY TEACHER INTERNATIONAL: BEST PRACTICES IN CHEMISTRY EDUCATION 2024; 6:77-91. [PMID: 38601265 PMCID: PMC11003456 DOI: 10.1515/cti-2023-0009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 12/11/2023] [Indexed: 04/12/2024]
Abstract
The IUPAC International Chemical Identifier (InChI) is a structure-based chemical identifier that encodes various aspects of a chemical structure into a hierarchically layered line notation. Because InChI is non-proprietary, open-source, and freely available to everyone, it is adopted in popular chemical information resources and software programs. This paper describes the InChI Open Education Resource (OER) (https://www.inchi-trust.org/oer/), designed to provide educators and other interested parties with resources, training material, and information related to InChI. Currently, the OER contains over 100 materials collected from various sources and provides users with search, filtering, and sorting functionalities to locate specific records. New relevant materials can be suggested by anyone, allowing the scientific community to share and find InChI-related resources. This paper will show how to use the InChI OER tag taxonomy to filter content and demonstrate two resources within the InChI OER; the ChemNames2LCSS Google Sheet and the InChILayersExplorer, an Excel spreadsheet that breaks an InChI into its layers. While the InChI OER is of value to a broader chemistry community, this paper seeks to reach out to chemical educators and provide them with an understanding of InChI and its role in the practice of science.
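To illustrate what "breaking an InChI into its layers" means, here is a minimal sketch that splits an InChI string on its `/` separators and labels each layer by its one-letter prefix. It is not the InChILayersExplorer spreadsheet described in the abstract; the function name `split_inchi` and the human-readable labels in `LAYER_NAMES` are assumptions made for this example.

```python
# Descriptive labels for the one-letter InChI layer prefixes (labels are this
# sketch's own wording, not official InChI terminology).
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
    "f": "fixed hydrogens",
    "r": "reconnected",
}


def split_inchi(inchi):
    """Return a list of (layer_label, content) pairs for an InChI string."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = [("version", parts[0])]
    for part in parts[1:]:
        if part and part[0] in LAYER_NAMES:
            # Prefixed layer: the first character names the layer.
            layers.append((LAYER_NAMES[part[0]], part[1:]))
        else:
            # The formula layer carries no prefix letter.
            layers.append(("formula", part))
    return layers


# Ethanol as a small worked example.
for label, content in split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"):
    print(label, content)
```

For the ethanol example this yields the version (`1S`), the formula (`C2H6O`), the connectivity layer (`1-2-3`), and the hydrogen layer (`3H,2H2,1H3`). A list of pairs is returned rather than a dictionary because layer prefixes can repeat after the isotopic or fixed-hydrogen layers.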
Affiliation(s)
- Andrew P. Cornell
- Department of Chemistry, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
| | - Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Jordi Cuadros
- Department of Quantitative Methods, IQS Universitat Ramon Llull, Barcelona, 08017, Spain
| | - Ehren C. Bucholtz
- Department of Basic Sciences, University of Health Sciences and Pharmacy in St. Louis, St. Louis, MO 63110, USA
| | - Harry E. Pence
- Department of Chemistry and Biochemistry, State University of New York at Oneonta, Oneonta, NY, 13820, USA
| | | | - Robert E. Belford
- Department of Chemistry, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
6
|
Ertl P. Database of 4 Million Medicinal Chemistry-Relevant Ring Systems. J Chem Inf Model 2024; 64:1245-1250. [PMID: 38311838 DOI: 10.1021/acs.jcim.3c01812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
Central ring systems are the most important part of bioactive molecules. They determine molecule shape, keep substituents in their proper positions, and also influence global molecular properties. In the present study, a database of 4 million medicinal chemistry-relevant ring systems has been created, not by crude random enumeration but by applying a set of rules derived by analyzing rings present in bioactive molecules. The aromatic properties and tautomer stability of generated rings have also been considered to ensure that the rings in the database are stable and chemically reasonable. 99.2% of these rings are novel and not included in molecules in the ChEMBL or PubChem databases. This large database of ring systems has been created with the goal to provide support for bioisosteric design and scaffold hopping as well as to be used in generative chemistry applications. The complete set of created rings is available for download in the SMILES format from https://peter-ertl.com/molecular/data/.
Affiliation(s)
- Peter Ertl
- Global Discovery Chemistry, Biomedical Research, Novartis CH-4056 Basel, Switzerland
7
|
Mansouri K, Moreira-Filho JT, Lowe CN, Charest N, Martin T, Tkachenko V, Judson R, Conway M, Kleinstreuer NC, Williams AJ. Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling. J Cheminform 2024; 16:19. [PMID: 38378618 PMCID: PMC10880251 DOI: 10.1186/s13321-024-00814-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 02/10/2024] [Indexed: 02/22/2024] Open
Abstract
The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, the common concern is the quality of both the chemical structure information and associated experimental data. This is especially true when those data are collected from multiple sources as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read, and then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative modeling QSAR projects to ensure consistency of the results from the different participants. It was then updated and generalized for other modeling applications. This included modification of the "QSAR-ready" workflow to generate "MS-ready structures" to support the generation of substance mappings and searches for software applications related to non-targeted analysis mass spectrometry. 
Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community. Scientific contribution: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it optimizes molecular descriptors' accuracy and reliability. The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.
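The standardization steps listed in this abstract (desalting, stereochemistry stripping, duplicate removal) can be sketched at the level of raw SMILES strings. This is a deliberately crude toy using only the Python standard library, not the KNIME workflow itself: the function names are hypothetical, the "largest fragment" is approximated by counting letters, and duplicates are detected by plain string identity because no canonicalization, tautomer handling, or neutralization is performed here. A real pipeline would rely on a cheminformatics toolkit for those steps.

```python
import re


def desalt(smiles):
    """Keep the largest dot-separated fragment (crude proxy: most letters)."""
    fragments = smiles.split(".")
    return max(fragments, key=lambda frag: sum(ch.isalpha() for ch in frag))


def strip_stereo(smiles):
    """Drop SMILES stereo markers: '@' (tetrahedral) and '/' or '\\' (double bond)."""
    return re.sub(r"[@/\\]", "", smiles)


def qsar_ready(records):
    """Desalt, strip stereo, and drop exact-string duplicates, keeping input order."""
    seen = set()
    result = []
    for smiles in records:
        standardized = strip_stereo(desalt(smiles))
        if standardized not in seen:
            seen.add(standardized)
            result.append(standardized)
    return result


# Hypothetical input: a hydrochloride salt of one alanine enantiomer,
# the other enantiomer, and ethanol.
print(qsar_ready(["C[C@H](N)C(=O)O.Cl", "C[C@@H](N)C(=O)O", "CCO"]))
```

In this example the two alanine records collapse into one entry once the counter-ion and the stereo markers are removed, which mirrors why the order of operations (standardize first, deduplicate last) matters in such a workflow.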
Affiliation(s)
- Kamel Mansouri
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA.
| | - José T Moreira-Filho
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
| | - Charles N Lowe
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Nathaniel Charest
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Todd Martin
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | | | - Richard Judson
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Mike Conway
- National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
| | - Nicole C Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
| | - Antony J Williams
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
8
|
Witting M, Malik A, Leach A, Bridge A, Aimo L, Conroy MJ, O'Donnell VB, Hoffmann N, Kopczynski D, Giacomoni F, Paulhe N, Gassiot AC, Poupin N, Jourdan F, Bertrand-Michel J. Challenges and perspectives for naming lipids in the context of lipidomics. Metabolomics 2024; 20:15. [PMID: 38267595 PMCID: PMC10808356 DOI: 10.1007/s11306-023-02075-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/01/2023] [Indexed: 01/26/2024]
Abstract
INTRODUCTION: Lipids are key compounds in the study of metabolism and are increasingly studied in biology projects. They form a very broad family that encompasses many compounds, and the name of the same compound may vary depending on the community in which it is studied. OBJECTIVES: In addition, their structures are varied and complex, which complicates their analysis. Indeed, the structural resolution does not always allow a complete level of annotation, so the actual compound analysed will vary from study to study and should be clearly stated. For all these reasons, the identification and naming of lipids are complicated and vary considerably from one study to another, and they need to be harmonized. METHODS & RESULTS: In this position paper we present and discuss the different ways to name lipids (with chemoinformatic and semantic identifiers) and their importance for sharing lipidomic results. CONCLUSION: Homogenising this identification and adopting the same rules is essential to be able to share data within the community and to map data on functional networks.
Affiliation(s)
- Michael Witting
- Metabolomics and Proteomics Core, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354, Freising-Weihenstephan, Germany
| | - Adnan Malik
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Andrew Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Alan Bridge
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1211, Geneva 4, Switzerland
| | - Lucila Aimo
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1211, Geneva 4, Switzerland
| | - Matthew J Conroy
- Division of Infection and Immunity, Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - Valerie B O'Donnell
- Division of Infection and Immunity, Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany
| | - Dominik Kopczynski
- Institute for Analytical Chemistry, Universität Wien, Währingerstrasse 38, 1090, Vienna, Austria
| | - Franck Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- MetaboHUB, National Infrastructure of Metabolomics and Fluxomics ANR-11-INBS-0010, 31077, Toulouse, France
| | - Nils Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- MetaboHUB, National Infrastructure of Metabolomics and Fluxomics ANR-11-INBS-0010, 31077, Toulouse, France
| | - Amaury Cazenave Gassiot
- Singapore Lipidomics Incubator, Life Sciences Institute, and Precision Medicine TRP, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Nathalie Poupin
- UMR1331 Toxalim, Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
| | - Fabien Jourdan
- MetaboHUB, National Infrastructure of Metabolomics and Fluxomics ANR-11-INBS-0010, 31077, Toulouse, France
- UMR1331 Toxalim, Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
| | - Justine Bertrand-Michel
- MetaboHUB, National Infrastructure of Metabolomics and Fluxomics ANR-11-INBS-0010, 31077, Toulouse, France.
- I2MC, Inserm U1297, Université de Toulouse, Toulouse, France.
9
|
Pan X, Zhao F, Zhang Y, Wang X, Xiao X, Zhang JZH, Ji C. MolTaut: A Tool for the Rapid Generation of Favorable Tautomer in Aqueous Solution. J Chem Inf Model 2023; 63:1833-1840. [PMID: 36939644 DOI: 10.1021/acs.jcim.2c01393] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/21/2023]
Abstract
Fast and proper treatment of the tautomeric states for drug-like molecules is critical in computer-aided drug discovery since the major tautomer of a molecule determines its pharmacophore features and physical properties. We present MolTaut, a tool for the rapid generation of favorable states of drug-like molecules in water. MolTaut works by enumerating possible tautomeric states with tautomeric transformation rules, ranking tautomers with their relative internal energies and solvation energies calculated by AI-based models, and generating preferred ionization states according to predicted microscopic pKa. Our test shows that the ranking ability of the AI-based tautomer scoring approach is comparable to the DFT method (wB97X/6-31G*//M062X/6-31G*/SMD) from which the AI models try to learn. We find that the substitution effect on tautomeric equilibrium is well predicted by MolTaut, which is helpful in computer-aided ligand design. The source code of MolTaut is freely available to researchers and can be accessed at https://github.com/xundrug/moltaut. To facilitate the usage of MolTaut by medicinal chemists, we made a free web server, which is available at http://moltaut.xundrug.cn. MolTaut is a handy tool for investigating the tautomerization issue in drug discovery.
Affiliation(s)
- Xiaolin Pan
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Fanyu Zhao
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China; Department of Chemistry, New York University, New York 10003, United States
| | - Yueqing Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Xingyu Wang
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Xudong Xiao
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
| | - John Z H Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China; Department of Chemistry, New York University, New York 10003, United States; Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
| | - Changge Ji
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
10
|
Cachau R, Shahsavari S, Cho SK. The in-silico evaluation of important GLUT9 residue for uric acid transport based on renal hypouricemia type 2. Chem Biol Interact 2023; 373:110378. [PMID: 36736875 PMCID: PMC10596759 DOI: 10.1016/j.cbi.2023.110378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Revised: 01/17/2023] [Accepted: 01/31/2023] [Indexed: 02/04/2023]
Abstract
Uric acid is the end product of purine metabolism. Uric acid transporters in the renal proximal tubule play a key role in uric acid transport. Functional abnormalities in these transporters can lead to high or low levels of uric acid in the blood plasma, known as hyperuricemia and hypouricemia, respectively. GLUT9 has been reported as a key transporter for uric acid reuptake in the renal proximal tubule. Mutations in GLUT9 are a known cause of renal hypouricemia due to defective uric acid uptake, with more severe cases resulting in urolithiasis and exercise-induced acute kidney injury (EIAKI). However, the effects of these mutations have not been fully investigated, and the resulting changes in binding affinity are hard to predict. We comprehensively described the effect of GLUT9 mutations on uric acid transport using molecular dynamics and investigated the specific sites responsible for differences in uric acid binding. R171C and R380W showed significant disruption of the structure without affecting transport dynamics, whereas L75R, G216R, N333S, and P412R showed reduced affinity of the extracellular vestibular area towards urate. Interestingly, T125M showed a significant increase in intracellular binding energy, associated with distorted geometries. This classification can be used to assess the effect of mutations by comparing the transport profiles of mutants against those of chemical candidates for transport, providing new perspectives on urate-lowering drug discovery using GLUT9.
Affiliation(s)
- Raul Cachau
- Integrated Data Science Section, Research Technologies Branch, National Institute of Allergies and Infectious Diseases, Bethesda, MD, USA
| | | | - Sung Kweon Cho
- Center for Cancer Research, National Cancer Institute, Frederick, MD, USA; Department of Pharmacology Ajou University, School of Medicine, Suwon, South Korea.
11
|
Bharatam PV, Valanju OR, Wani AA, Dhaked DK. Importance of tautomerism in drugs. Drug Discov Today 2023; 28:103494. [PMID: 36681235 DOI: 10.1016/j.drudis.2023.103494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Revised: 12/08/2022] [Accepted: 01/13/2023] [Indexed: 01/20/2023]
Abstract
Tautomerism is an important phenomenon exhibited by many drugs. As we discuss in this review, identifying the different tautomers of drugs and exploring their importance in the mechanisms of drug action are integral components of current drug discovery. Nuclear magnetic resonance (NMR), infrared (IR), ultraviolet (UV), Raman, and terahertz spectroscopic techniques, as well as X-ray diffraction, are useful for exploring drug tautomerism. Quantum chemical methods, in association with pharmacoinformatics tools, are being used to evaluate tautomeric preferences in terms of energy effects. Desmotropy (i.e., tautomeric polymorphism) of the drugs is particularly important in drug delivery studies.
Affiliation(s)
- Prasad V Bharatam
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER), Sector 67, S.A.S. Nagar, Punjab 160062, India.
| | - Omkar R Valanju
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER), Sector 67, S.A.S. Nagar, Punjab 160062, India
| | - Aabid A Wani
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER), Sector 67, S.A.S. Nagar, Punjab 160062, India
| | - Devendra K Dhaked
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER)-Kolkata, Chunilal Bhawan, 168 Maniktala Main Road, Kolkata, West Bengal 700054, India
12
|
Lavigne C, Gomes G, Pollice R, Aspuru-Guzik A. Guided discovery of chemical reaction pathways with imposed activation. Chem Sci 2022; 13:13857-13871. [PMID: 36544742 PMCID: PMC9710306 DOI: 10.1039/d2sc05135d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/09/2022] [Indexed: 11/12/2022] Open
Abstract
Computational power and quantum chemical methods have improved immensely since computers were first applied to the study of reactivity, but the de novo prediction of chemical reactions has remained challenging. We show that complex reaction pathways can be efficiently predicted in a guided manner using chemical activation imposed by geometrical constraints of specific reactive modes, which we term imposed activation (IACTA). Our approach is demonstrated on realistic and challenging chemistry, such as a triple cyclization cascade involved in the total synthesis of a natural product, a water-mediated Michael addition, and several oxidative addition reactions of complex drug-like molecules. Notably and in contrast with traditional hand-guided computational chemistry calculations, our method requires minimal human involvement and no prior knowledge of the products or the associated mechanisms. We believe that IACTA will be a transformational tool to screen for chemical reactivity and to study both by-product formation and decomposition pathways in a guided way.
Affiliation(s)
- Cyrille Lavigne
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
| | - Gabe Gomes
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada; Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St George St, Toronto, Ontario M5S 3H6, Canada
| | - Robert Pollice
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada; Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St George St, Toronto, Ontario M5S 3H6, Canada
| | - Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada; Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St George St, Toronto, Ontario M5S 3H6, Canada; Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Ontario M5S 3E5, Canada; Department of Materials Science & Engineering, University of Toronto, 184 College St., Ontario M5S 3E4, Canada; Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada; Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario M5G, Canada
13
|
Kappler MA, Lowden CT, Culberson J. BioChemUDM: a unified data model for compounds and assays. PURE APPL CHEM 2022. [DOI: 10.1515/pac-2021-1004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
We present a simple, biochemistry data model (BioChemUDM) to represent compounds and assays for the purpose of capturing, reporting, and sharing data, both biological and chemical. We describe an approach to register a compound based solely on a stereo-enhanced sketch, thereby replacing the need for additional user-specified “flags” at the time of compound registration. We describe a convention for string-based labels that enables inter-organizational compound and assay data sharing. By co-adopting the BioChemUDM, we have successfully enabled same-day exchange and utilization of chemical and biological information with various stakeholders.
Affiliation(s)
- Michael A. Kappler
- IDEAYA Biosciences Inc, 7000 Shoreline Blvd Ste 350, South San Francisco, CA 94080, USA
| | | | - J. Chris Culberson
- Workflow Informatics Corp, 9316 Bramden Ct, Wake Forest, NC 27587, USA
| |
|
14
|
Dolciami D, Villasclaras-Fernandez E, Kannas C, Meniconi M, Al-Lazikani B, Antolin AA. canSAR chemistry registration and standardization pipeline. J Cheminform 2022; 14:28. [PMID: 35643512 PMCID: PMC9148294 DOI: 10.1186/s13321-022-00606-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/04/2022] [Accepted: 04/04/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Integration of medicinal chemistry data from numerous public resources is an increasingly important part of academic drug discovery and translational research because it can bring a wealth of important knowledge related to compounds into one place. However, different data sources can report the same or related compounds in various forms (e.g., tautomers, racemates, etc.), thus highlighting the need to organise related compounds in hierarchies that alert the user to important bioactivity data that may be relevant. To generate these compound hierarchies, we have developed and implemented canSARchem, a new compound registration and standardization pipeline as part of the canSAR public knowledgebase. canSARchem builds on previously developed ChEMBL and PubChem pipelines and is developed using KNIME. We describe the pipeline, which we make publicly available, and we provide examples of the strengths and limitations of the use of hierarchies for bioactivity data exploration. Finally, we identify canonicalization enrichment in FDA-approved drugs, illustrating the benefits of our approach.
Results
We created a chemical registration and standardization pipeline in KNIME and made it freely available to the research community. The pipeline consists of five steps to register the compounds and create the compounds’ hierarchy: 1. Structure checker, 2. Standardization, 3. Generation of canonical tautomers and representative structures, 4. Salt strip, and 5. Generation of abstract structure to generate the compound hierarchy. Unlike ChEMBL’s RDKit pipeline, we carry out compound canonicalization ahead of getting the parent structure, similar to PubChem’s OpenEye pipeline. canSARchem has a lower rejection rate compared to both PubChem and ChEMBL. We use our pipeline to assess the impact of grouping the compounds in hierarchies for bioactivity data exploration. We find that FDA-approved drugs show statistically significant sensitivity to canonicalization compared to the majority of bioactive compounds which demonstrates the importance of this step.
Conclusions
We use canSARchem to standardize all the compounds uploaded in canSAR (> 3 million), enabling efficient data integration and the rapid identification of alternative compound forms with useful bioactivity data. Comparison with the PubChem and ChEMBL pipelines showed comparable performance in compound standardization, but only PubChem and canSAR canonicalize tautomers, and canSAR has a slightly lower rejection rate. Our results highlight the importance of compound hierarchies for bioactivity data exploration. We make canSARchem available under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) at https://gitlab.icr.ac.uk/cansar-public/compound-registration-pipeline.
|
15
|
Brovarets’ OO, Muradova A, Hovorun DM. Novel horizons of the conformationally-tautomeric transformations of the G·T base pairs: quantum-mechanical investigation. Mol Phys 2022. [DOI: 10.1080/00268976.2022.2026510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Ol’ha O. Brovarets’
- Department of Molecular and Quantum Biophysics, Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
| | - Alona Muradova
- Department of Molecular Biotechnology and Bioinformatics, Institute of High Technologies, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Dmytro M. Hovorun
- Department of Molecular and Quantum Biophysics, Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Department of Molecular Biotechnology and Bioinformatics, Institute of High Technologies, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| |
|
16
|
Atomistic mechanisms of the tautomerization of the G·C base pairs through the proton transfer: quantum-chemical survey. J Mol Model 2021; 27:367. [PMID: 34855024 DOI: 10.1007/s00894-021-04988-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2021] [Accepted: 11/23/2021] [Indexed: 10/19/2022]
Abstract
This study is devoted to the investigation of the G·C*tO2(WC)↔G*NH3·C*t(WC), G·C*O2(WC)↔G*NH3·C*(WC) and G*·C*O2(WC)↔G*NH3·C(wWC)↓ tautomerization reactions occurring through proton transfer, characterized at the MP2/6-311++G(2df,pd)//B3LYP/6-311++G(d,p) level of theory in the gas phase under normal conditions ('WC' denotes a base pair in the Watson-Crick configuration; T = 298.15 K). These reactions lead to the formation of the G*NH3·C*t(WC), G*NH3·C*(WC) and G*NH3·C(wWC)↓ base pairs with the participation of the G*NH3 base, which bears an NH3 group. The Gibbs free energies of activation for these reactions are 6.43, 11.00 and 1.63 kcal·mol⁻¹, respectively. All of these tautomerization reactions are dipole active. Finally, we believe that these non-dissociative processes, which are tightly connected with the tautomeric transformations of the G·C base pairs, play an outstanding role in supporting the spatial structure of DNA and RNA molecules with various functional purposes.
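As a rough illustration of what these barriers imply kinetically, the sketch below converts the three Gibbs free energies of activation quoted in the abstract into rate constants with the Eyring equation, k = (k_B·T/h)·exp(−ΔG‡/RT). The transmission coefficient of 1, the function name `eyring_rate` and the reaction labels are assumptions of this example, not details taken from the study.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K (exact SI value)
H = 6.62607015e-34        # Planck constant, J*s (exact SI value)
R_KCAL = 1.987204259e-3   # molar gas constant, kcal/(mol*K)


def eyring_rate(delta_g_kcal: float, temperature: float = 298.15) -> float:
    """Rate constant in s^-1 from a free energy of activation in kcal/mol.

    Assumes a transmission coefficient of 1 (an assumption of this sketch).
    """
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-delta_g_kcal / (R_KCAL * temperature))


# Barriers in the order they are quoted in the abstract (kcal/mol).
barriers = {"reaction 1": 6.43, "reaction 2": 11.00, "reaction 3": 1.63}

for label, barrier in barriers.items():
    print(f"{label}: dG = {barrier:5.2f} kcal/mol -> k = {eyring_rate(barrier):.2e} s^-1")
```

Under these assumptions the lowest barrier (1.63 kcal/mol) gives the fastest process and the highest (11.00 kcal/mol) the slowest; the absolute values are only as reliable as the simple transition-state picture used here.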
|
17
|
Sharma S, Arya A, Cruz R, Cleaves II HJ. Automated Exploration of Prebiotic Chemical Reaction Space: Progress and Perspectives. Life (Basel) 2021; 11:1140. [PMID: 34833016 PMCID: PMC8624352 DOI: 10.3390/life11111140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022] Open
Abstract
Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks with a large number of diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully recapitulable, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and identify the ones most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.
Affiliation(s)
- Siddhant Sharma
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Department of Biochemistry, Deshbandhu College, University of Delhi, New Delhi 110019, India
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
| | - Aayush Arya
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Department of Physics, Lovely Professional University, Jalandhar-Delhi GT Road, Phagwara 144001, India
| | - Romulo Cruz
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Big Data Laboratory, Information and Communications Technology Center (CTIC), National University of Engineering, Amaru 210, Lima 15333, Peru
| | - Henderson James Cleaves II
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| |
|
18
|
Exploring the octanol-water partition coefficient dataset using deep learning techniques and data augmentation. Commun Chem 2021; 4:90. [PMID: 36697535 PMCID: PMC9814212 DOI: 10.1038/s42004-021-00528-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/27/2020] [Accepted: 05/21/2021] [Indexed: 01/28/2023] Open
Abstract
Today more and more data are freely available. Based on these big datasets, deep neural networks (DNNs) are rapidly gaining relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We have selected the octanol-water partition coefficient (log P) as an example, which plays an essential role in environmental chemistry and toxicology but also in chemical analysis. The predictive performance of the developed DNN is good, with an RMSE of 0.47 log units in the test dataset and an RMSE of 0.33 for an external dataset from the SAMPL6 challenge. To this end, we trained the DNN using data augmentation considering all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and address limitations of the dataset itself.
|
19
|
Goodman JM, Pletnev I, Thiessen P, Bolton E, Heller SR. InChI version 1.06: now more than 99.99% reliable. J Cheminform 2021; 13:40. [PMID: 34030732 PMCID: PMC8147039 DOI: 10.1186/s13321-021-00517-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/24/2021] [Accepted: 05/04/2021] [Indexed: 12/19/2022] Open
Abstract
The software for the IUPAC Chemical Identifier, InChI, is extraordinarily reliable. It has been tested on large databases around the world, and has proved itself to be an essential tool in the handling and integration of large chemical databases. InChI version 1.05 was released in January 2017 and version 1.06 in December 2020. In this paper, we report on the current state of the InChI Software, the details of the improvements in the v1.06 release, and the results of a test of the InChI run on PubChem, a database of more than a hundred million molecules. The upgrade introduces significant new features, including support for pseudo-element atoms and an improved description of polymers. We expect that few, if any, applications using the standard InChI will need to change as a result of the changes in version 1.06. Numerical instability was discovered for 0.002% of this database, and a small number of other molecules were discovered for which the algorithm did not run smoothly. On the basis of PubChem data, we can demonstrate that InChI version 1.05 was 99.996% accurate, and InChI version 1.06 represents a step closer to perfection. Finally, we look forward to future releases and extensions for the InChI Chemical identifier.
Affiliation(s)
- Jonathan M Goodman
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Igor Pletnev
- InChI Trust, Cambridge, UK
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Paul Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Stephen R Heller
- InChI Trust, Cambridge, UK
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| |
|
20
|
Brovarets' OO, Muradova A, Hovorun DM. Novel mechanisms of the conformational transformations of the biologically important G·C nucleobase pairs in Watson–Crick, Hoogsteen and wobble configurations via the mutual rotations of the bases around the intermolecular H-bonds: a QM/QTAIM study. RSC Adv 2021; 11:25700-25730. [PMID: 35478902 PMCID: PMC9036977 DOI: 10.1039/d0ra08702e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/12/2020] [Accepted: 06/09/2021] [Indexed: 01/12/2023] Open
Abstract
Conformational transformations of the G·C nucleobase pairs were established, occurring via the mutual rotation of the G and C bases around the intermolecular H-bonds.
Affiliation(s)
- Ol'ha O. Brovarets'
- Department of Molecular and Quantum Biophysics, Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
| | - Alona Muradova
- Department of Molecular Biotechnology and Bioinformatics, Institute of High Technologies, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Dmytro M. Hovorun
- Department of Molecular and Quantum Biophysics, Institute of Molecular Biology and Genetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
| |
|
21
|
Baker CM, Kidley NJ, Papachristos K, Hotson M, Carson R, Gravestock D, Pouliot M, Harrison J, Dowling A. Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry. J Chem Inf Model 2020; 60:3781-3791. [PMID: 32644790 DOI: 10.1021/acs.jcim.0c00232] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/18/2023]
Abstract
Databases of small, potentially bioactive molecules are ubiquitous across the industry and academia. Designed such that each unique compound should appear only once, the multiplicity of ways in which many compounds can be represented means that these databases require methods for standardizing the representation of chemistry. This is commonly achieved through the use of "Chemistry Business Rules", sets of predefined rules that describe the "house style" of the database in question. At Syngenta, the historical approach to the design of chemistry business rules has been to focus on consistency of representation, with chemical relevance given secondary consideration. In this work, we overturn that convention. Through the use of quantum chemistry calculations, we define a set of chemistry business rules for tautomer standardization that reproduces gas-phase energetic preferences. We go on to show that, compared to our historic approach, this method yields tautomers that are in better agreement with those observed experimentally in condensed phases and that are better suited for use in predictive models.
Affiliation(s)
- Christopher M Baker
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
| | - Nathan J Kidley
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
| | | | - Matthew Hotson
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
| | - Rob Carson
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
| | - David Gravestock
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
| | - Martin Pouliot
- Syngenta Crop Protection, Schaffhauserstrasse, Stein CH-4332, Switzerland
| | - Jim Harrison
- Datacraft Technologies, 110 Parkwood Place, Anstead, QLD 4070, Australia
| | - Alan Dowling
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
| |
|
22
|
Levine DS, Watson MA, Jacobson LD, Dickerson CE, Yu HS, Bochevarov AD. Pattern-free generation and quantum mechanical scoring of ring-chain tautomers. J Comput Aided Mol Des 2020; 35:417-431. [PMID: 32830300 DOI: 10.1007/s10822-020-00334-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2020] [Accepted: 07/21/2020] [Indexed: 11/24/2022]
Abstract
In contrast to the computational generation of conventional tautomers, the analogous operation that would produce ring-chain tautomers is rarely available in cheminformatics codes. This is partly due to the perceived unimportance of ring-chain tautomerism and partly because specialized algorithms are required to realize the non-local proton transfers that occur during ring-chain rearrangement. Nevertheless, for some types of organic compounds, including sugars, warfarin analogs, fluorescein dyes and some drug-like compounds, ring-chain tautomerism cannot be ignored. In this work, a novel ring-chain tautomer generation algorithm is presented. It differs from previously proposed solutions in that it does not rely on hard-coded patterns of proton migrations and bond rearrangements, and should therefore be more general and maintainable. We deploy this algorithm as part of a workflow which provides an automated solution for tautomer generation and scoring. The workflow identifies protonatable and deprotonatable sites in the molecule using a previously described approach based on rapid micro-pKa prediction. These data are used to distribute the active protons among the protonatable sites exhaustively, at which point alternate resonance structures are considered to obtain pairs of atoms with opposite formal charge. These pairs are connected with a single bond and a 3D undistorted geometry is generated. The scoring of the generated tautomers is performed with a subsequent density functional theory calculation employing an implicit solvent model. We demonstrate the performance of our workflow on several types of organic molecules known to exist in ring-chain tautomeric equilibria in solution. In particular, we show that some ring-chain tautomers not found using previously published algorithms are successfully located by ours.
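The exhaustive proton-distribution step described in the abstract can be pictured with a minimal combinatorial sketch. It assumes, as a simplification, that each protonatable site accepts at most one active proton; the site labels and the function `distribute_protons` are hypothetical and are not the authors' implementation, which also relies on micro-pKa prediction, resonance-structure analysis and DFT scoring that are not reproduced here.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, Sequence


def distribute_protons(sites: Sequence[str], n_protons: int) -> Iterator[FrozenSet[str]]:
    """Yield every way to place n_protons on distinct sites.

    Simplifying assumption of this sketch: a site holds at most one
    active proton, so each placement is a subset of the sites.
    """
    for occupied in combinations(sites, n_protons):
        yield frozenset(occupied)


# Hypothetical site labels for a molecule with three protonatable atoms.
sites = ["O1", "N3", "O7"]
placements = list(distribute_protons(sites, n_protons=2))

for placement in placements:
    print(sorted(placement))
```

Each placement would then be a candidate protonation pattern to pass on to later stages; the number of candidates grows as the binomial coefficient C(sites, protons), which is why a cheap pre-filter on which sites count as protonatable matters.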
Affiliation(s)
- Daniel S Levine
- Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
| | - Mark A Watson
- Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
| | - Leif D Jacobson
- Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
- Schrödinger, Inc., Suite 1300, 101 SW Main Street, Portland, OR, 97204, USA
| | - Claire E Dickerson
- Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
- College of Chemistry & Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Haoyu S Yu
- Schrödinger, Inc., 120 West 45th St, New York, NY, 10036, USA
| | | |
|
23
|
Dhaked DK, Guasch L, Nicklaus MC. Tautomer Database: A Comprehensive Resource for Tautomerism Analyses. J Chem Inf Model 2020; 60:1090-1100. [PMID: 32027495 DOI: 10.1021/acs.jcim.9b01156] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
We report a database of tautomeric structures that contains 2819 tautomeric tuples extracted from 171 publications. Each tautomeric entry has been annotated with experimental conditions reported in the respective publication, plus bibliographic details, structural identifiers (e.g., NCI/CADD identifiers FICTS, FICuS, uuuuu, and Standard InChI), and chemical information (e.g., SMILES, molecular weight). The majority of tautomeric tuples found were pairs; the remaining 10% were triples, quadruples, or quintuples, amounting to a total number of structures of 5977. The types of tautomerism were mainly prototropic tautomerism (79%), followed by ring-chain (13%) and valence tautomerism (8%). The experimental conditions reported in the publications included about 50 pure solvents and 9 solvent mixtures with 26 unique spectroscopic or nonspectroscopic methods. ¹H and ¹³C NMR were the most frequently used methods. A total of 77 different tautomeric transform rules (SMIRKS) are covered by at least one example tuple in the database. This database is freely available as a spreadsheet at https://cactus.nci.nih.gov/download/tautomer/.
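The counts in this abstract can be cross-checked with a short calculation. The sketch below takes the reported totals (2819 tuples, 5977 structures, about 10% of tuples larger than pairs) and asks what average size the larger tuples must have. The 10% figure is rounded in the abstract, so this is only a plausibility check, and the function name is hypothetical.

```python
TOTAL_TUPLES = 2819        # tautomeric tuples reported in the abstract
TOTAL_STRUCTURES = 5977    # total structures reported in the abstract
NON_PAIR_FRACTION = 0.10   # "the remaining 10%" (rounded in the abstract)


def mean_size_of_larger_tuples(total_tuples: int,
                               total_structures: int,
                               non_pair_fraction: float) -> float:
    """Average number of structures per tuple that is not a pair."""
    non_pairs = round(total_tuples * non_pair_fraction)
    pairs = total_tuples - non_pairs
    structures_in_larger_tuples = total_structures - 2 * pairs
    return structures_in_larger_tuples / non_pairs


mean_size = mean_size_of_larger_tuples(TOTAL_TUPLES, TOTAL_STRUCTURES, NON_PAIR_FRACTION)
print(f"mean size of non-pair tuples: {mean_size:.2f}")
```

A result between 3 and 5 is what one would expect if the larger tuples are triples, quadruples, or quintuples, so the reported totals are mutually consistent under the rounded 10% share.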
Affiliation(s)
- Devendra K Dhaked
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
| | - Laura Guasch
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
| | - Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
| |
|