1
Schadow G, Borodina YV, Delannée V, Ihlenfeldt WD, Godfrey AG, Nicklaus MC. Reaction SPL – extension of a public document markup standard to chemical reactions. PURE APPL CHEM 2022. [PMCID: PMC9189732 DOI: 10.1515/pac-2021-2011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
There are numerous formats and data models for describing reaction-related data. However, each offers only a limited coverage of the multitude of information that can be of interest to a broad user base in the context of chemical reactions. Structured Product Labeling (SPL) is a robust yet fairly light public XML document standard. It uses a highly generic but usefully refinable data schema, which is, like a language, highly expressive. We are therefore presenting an extension of SPL to chemical reactions (“Reaction SPL”). This extension is designed to support chemical manufacturing processes, which include as a minimum the chemical reaction and the procedures and conditions to run it. We provide an overview of the SPL reaction specification structures followed by some examples of documents with reaction data: predicted single-step reactions, a two-step synthesis, an enzymatic reaction, an example of how to represent a reaction center, a patent, and a fully annotated reaction with by-products. Special attention is given to a mechanism for atom-atom mapping of reactions as well as to the possibility of integrating Reaction SPL with laboratory automation equipment, in particular automated synthesis devices.
Affiliation(s)
- Alexander G. Godfrey
- National Center for Advancing Translational Sciences, NIH, Rockville, MD, USA
2
Delannée V, Nicklaus MC. ReactionCode: format for reaction searching, analysis, classification, transform, and encoding/decoding. J Cheminform 2020; 12:72. [PMID: 33292568 PMCID: PMC7713369 DOI: 10.1186/s13321-020-00476-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/21/2020] [Accepted: 11/18/2020] [Indexed: 12/19/2022] Open
Abstract
In the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed for the purposes of identifiers, representation, classification, analysis and data exchange. Considerable effort has gone into molecule formats, but far less into reaction formats, where the work has been done mostly by companies, leading to proprietary formats. Here, we present ReactionCode: a new open-source format that allows one to encode and decode a reaction into multi-layer machine readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used in a context of reaction similarity searching and classification. It is also designed for database organization, machine learning applications and as a new transform reaction language.
Affiliation(s)
- Victorien Delannée
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, 376 Boyles Street, Frederick, MD, 21702, USA
- Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, 376 Boyles Street, Frederick, MD, 21702, USA
3
Patel H, Ihlenfeldt WD, Judson PN, Moroz YS, Pevzner Y, Peach ML, Delannée V, Tarasova NI, Nicklaus MC. SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules. Sci Data 2020; 7:384. [PMID: 33177514 PMCID: PMC7658252 DOI: 10.1038/s41597-020-00727-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/16/2020] [Accepted: 10/16/2020] [Indexed: 01/08/2023] Open
Abstract
We have made available a database of over 1 billion compounds predicted to be easily synthesizable, called Synthetically Accessible Virtual Inventory (SAVI). They have been created by a set of transforms based on an adaptation and extension of the CHMTRN/PATRAN programming languages describing chemical synthesis expert knowledge, which originally stem from the LHASA project. The chemoinformatics toolkit CACTVS was used to apply a total of 53 transforms to about 150,000 readily available building blocks (enamine.net). Only single-step, two-reactant syntheses were calculated for this database even though the technology can execute multi-step reactions. The possibility to incorporate scoring systems in CHMTRN allowed us to subdivide the database of 1.75 billion compounds in sets according to their predicted synthesizability, with the most-synthesizable class comprising 1.09 billion synthetic products. Properties calculated for all SAVI products show that the database should be well-suited for drug discovery. It is being made publicly available for free download from https://doi.org/10.35115/37n9-5738.
Affiliation(s)
- Hitesh Patel
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Philip N Judson
- Heather Lea, Bland Hill, Norwood, Harrogate, HG3 1TE, England
- Yurii S Moroz
- Enamine Ltd, 78 Chervonotkatska Street, Suite 1, Kyiv, 02094, Ukraine and Chemspace LLC, 85 Chervonotkatska Street, Suite 1, Kyiv, 02094, Ukraine
- Yuri Pevzner
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- AbbVie, Inc., North Chicago, IL, 60064, USA
- Megan L Peach
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
- Victorien Delannée
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Nadya I Tarasova
- Synthetic Biologics and Drug Discovery Group, Laboratory of Cancer Immunometabolism, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
4
Abstract
We have adopted and extended the CHMTRN language and used it for the knowledge base of a computer program to generate a large database of synthetically accessible, drug-like chemical structures, the Synthetically Accessible Virtual Inventory (SAVI) Database. CHMTRN is a powerful language originally developed in the LHASA (Logic and Heuristics Applied to Synthetic Analysis) project at Harvard University and used together with the chemical pattern description language, PATRAN, to describe chemical retro-reactions. The languages have proven to be useful beyond the design of retrosynthetic routes and have the potential for much wider use in chemistry; this paper describes CHMTRN and PATRAN as now reimplemented for the forward-synthetic SAVI project but able to describe both forward and retro-reactions.
Affiliation(s)
- Philip N Judson
- Heather Lea, Bland Hill, Norwood, Harrogate HG3 1TE, England
- Hitesh Patel
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, Maryland 21702, United States
- Victorien Delannée
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, Maryland 21702, United States
- Nadya Tarasova
- Synthetic Biologics and Drug Discovery Group, Laboratory of Cancer Immunometabolism, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, Maryland 21702, United States
- Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, Maryland 21702, United States
5
Dhaked DK, Ihlenfeldt WD, Patel H, Delannée V, Nicklaus MC. Toward a Comprehensive Treatment of Tautomerism in Chemoinformatics Including in InChI V2. J Chem Inf Model 2020; 60:1253-1275. [PMID: 32043883 DOI: 10.1021/acs.jcim.9b01080] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/21/2022]
Abstract
We have collected 86 different transforms of tautomeric interconversions. Out of those, 54 are for prototropic (non-ring-chain) tautomerism, 21 for ring-chain tautomerism, and 11 for valence tautomerism. The majority of these rules have been extracted from experimental literature. Twenty rules, covering the most well-known types of tautomerism such as keto-enol tautomerism, were taken from the default handling of tautomerism by the chemoinformatics toolkit CACTVS. The rules were analyzed against nine different databases totaling over 400 million (non-unique) structures as to their occurrence rates, mutual overlap in coverage, and recapitulation of the rules' enumerated tautomer sets by InChI V.1.05, both in InChI's Standard and a Nonstandard version with the increased tautomer-handling options 15T and KET turned on. These results and the background of this study are discussed in the context of the IUPAC InChI Project tasked with the redesign of handling of tautomerism for an InChI version 2. Applying the rules presented in this paper would approximately triple the number of compounds in typical small-molecule databases that would be affected by tautomeric interconversion by InChI V2. A web tool has been created to test these rules at https://cactus.nci.nih.gov/tautomerizer.
Affiliation(s)
- Devendra K Dhaked
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Hitesh Patel
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Victorien Delannée
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
6
Delannée V, Langouët S, Théret N, Siegel A. A modeling approach to evaluate the balance between bioactivation and detoxification of MeIQx in human hepatocytes. PeerJ 2017; 5:e3703. [PMID: 28879062 PMCID: PMC5582613 DOI: 10.7717/peerj.3703] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/08/2017] [Accepted: 07/27/2017] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND: Heterocyclic aromatic amines (HAA) are environmental and food contaminants that are potentially carcinogenic for humans. 2-Amino-3,8-dimethylimidazo[4,5-f]quinoxaline (MeIQx) is one of the most abundant HAA formed in cooked meat. MeIQx is metabolized by cytochrome P450 1A2 in the human liver into detoxified and bioactivated products. Once bioactivated, MeIQx metabolites can lead to DNA adduct formation responsible for further genome instability. METHODS: Using a computational approach, we developed a numerical model for MeIQx metabolism in the liver that predicts the MeIQx biotransformation into detoxification or bioactivation pathways according to the concentration of MeIQx. RESULTS: Our results demonstrate that (1) the detoxification pathway predominates, (2) the ratio between detoxification and bioactivation pathways is not linear and shows a maximum at 10 µM of MeIQx in hepatocyte cell models, and (3) CYP1A2 is a key enzyme in the system that regulates the balance between bioactivation and detoxification. Our analysis suggests that such a ratio could be considered as an indicator of MeIQx genotoxicity at a low concentration of MeIQx. CONCLUSIONS: Our model permits the investigation of the balance between bioactivation (i.e., DNA adduct formation pathway through the prediction of potential genotoxic compounds) and detoxification of MeIQx in order to predict the behaviour of this environmental contaminant in the human liver. It highlights the importance of complex regulations of enzyme competitions that should be taken into account in any further multi-organ models.
Affiliation(s)
- Victorien Delannée
- UMR 6074 IRISA, CNRS, INRIA, University of Rennes 1, Rennes, France; UMR Inserm U1085 IRSET, University of Rennes 1, Rennes, France
- Sophie Langouët
- UMR Inserm U1085 IRSET, University of Rennes 1, Rennes, France
- Nathalie Théret
- UMR 6074 IRISA, CNRS, INRIA, University of Rennes 1, Rennes, France; UMR Inserm U1085 IRSET, University of Rennes 1, Rennes, France
- Anne Siegel
- UMR 6074 IRISA, CNRS, INRIA, University of Rennes 1, Rennes, France