1. Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications, and on the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
  - University of British Columbia, Vancouver, BC, Canada.
  - Photonic Inc., Coquitlam, BC, Canada.
2. Casciuc I, Osypenko A, Kozibroda B, Horvath D, Marcou G, Bonachera F, Varnek A, Lehn JM. Toward in Silico Modeling of Dynamic Combinatorial Libraries. ACS Central Science 2022; 8:804-813. [PMID: 35756377 PMCID: PMC9228562 DOI: 10.1021/acscentsci.2c00048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Indexed: 06/15/2023]
Abstract
Dynamic combinatorial libraries (DCLs) display adaptive behavior, enabled by the reversible generation of their molecular constituents from building blocks, in response to external effectors, e.g., protein receptors. So far, chemoinformatics has not been used for the design of DCLs, which pose a radically different set of challenges compared to classical library design. Here, we propose a chemoinformatic model for theoretically assessing the composition of DCLs in the presence and the absence of an effector. An imine-based DCL in interaction with the effector human carbonic anhydrase II (CA II) served as a case study. Support vector regression models for the imine formation constants and imine-CA II binding were derived from, respectively, a set of 276 imines synthesized and experimentally studied in this work and 4350 inhibitors of CA II from ChEMBL. These models predict constants for all DCL constituents, to feed software assessing equilibrium concentrations. They are publicly available on the dedicated website. Models rationally selected two amines and two aldehydes predicted to yield stable imines with high affinity for CA II and provided a virtual illustration of how effector affinity regulates DCL members.
Affiliation(s)
- Iuri Casciuc
  - Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel 4, rue B. Pascal 67081 Strasbourg, France
  - Laboratoire de Chimie Supramoléculaire, Institut de Science et d’Ingénierie Supramoléculaires (ISIS), Université de Strasbourg, 8 allée Gaspard Monge, 67000 Strasbourg, France
- Artem Osypenko
  - Laboratoire de Chimie Supramoléculaire, Institut de Science et d’Ingénierie Supramoléculaires (ISIS), Université de Strasbourg, 8 allée Gaspard Monge, 67000 Strasbourg, France
- Bohdan Kozibroda
  - Laboratoire de Chimie Supramoléculaire, Institut de Science et d’Ingénierie Supramoléculaires (ISIS), Université de Strasbourg, 8 allée Gaspard Monge, 67000 Strasbourg, France
  - Institute of High Technologies, Taras Shevchenko National University of Kyiv, 4g Hlushkova Avenue, 03022 Kyiv, Ukraine
- Dragos Horvath
  - Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel 4, rue B. Pascal 67081 Strasbourg, France
- Gilles Marcou
  - Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel 4, rue B. Pascal 67081 Strasbourg, France
- Fanny Bonachera
  - Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel 4, rue B. Pascal 67081 Strasbourg, France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel 4, rue B. Pascal 67081 Strasbourg, France
- Jean-Marie Lehn
  - Laboratoire de Chimie Supramoléculaire, Institut de Science et d’Ingénierie Supramoléculaires (ISIS), Université de Strasbourg, 8 allée Gaspard Monge, 67000 Strasbourg, France
3. Heid E, Green WH. Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction. J Chem Inf Model 2021; 62:2101-2110. [PMID: 34734699 PMCID: PMC9092344 DOI: 10.1021/acs.jcim.1c00975] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 11/28/2022]
Abstract
The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.
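The "superposition of the reactant and product graphs" described in this abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' model or the API of any CGR toolkit. It assumes an atom-mapped reaction whose bonds are given as a dictionary from a pair of atom-map numbers to a bond order, and it labels each CGR edge with the pair (order in reactants, order in products), using 0 for "no bond". The function names and the toy substitution reaction are hypothetical.

```python
def condensed_graph(reactant_bonds, product_bonds):
    """Superimpose two bond tables keyed by frozenset({atom_i, atom_j}).

    Each edge of the result carries (order_before, order_after);
    0 means the bond is absent on that side of the reaction.
    """
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        before = reactant_bonds.get(pair, 0)
        after = product_bonds.get(pair, 0)
        cgr[pair] = (before, after)
    return cgr


def dynamic_bonds(cgr):
    """Return only the edges whose bond order changes in the reaction."""
    return {pair: orders for pair, orders in cgr.items() if orders[0] != orders[1]}


# Toy substitution: atom 1 = C, 2 = Br (leaving group), 3 = O (nucleophile), 4 = H.
reactants = {frozenset({1, 2}): 1, frozenset({1, 4}): 1}
products = {frozenset({1, 3}): 1, frozenset({1, 4}): 1}

cgr = condensed_graph(reactants, products)
changed = dynamic_bonds(cgr)
```

In this toy case the C-Br edge is labelled (1, 0), the C-O edge (0, 1), and the unchanged C-H edge (1, 1), so a single graph records both sides of the reaction.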
Affiliation(s)
- Esther Heid
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
4. Machine learning modelling of chemical reaction characteristics: yesterday, today, tomorrow. Mendeleev Communications 2021. [DOI: 10.1016/j.mencom.2021.11.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
5. Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] Open
Abstract
In recent years, synthesizing drugs powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development process of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in the industry. We also discuss the disadvantages of the existing models and provide potential future trends, so that more abecedarians will quickly understand and participate in the family of retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Mingyi Zhao
  - Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
- Yuansheng Liu
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Yansen Su
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
6. Gimadiev TR, Lin A, Afonina VA, Batyrshin D, Nugmanov RI, Akhmetshin T, Sidorov P, Duybankova N, Verhoeven J, Wegner J, Ceulemans H, Gedich A, Madzhidov TI, Varnek A. Reaction Data Curation I: Chemical Structures and Transformations Standardization. Mol Inform 2021; 40:e2100119. [PMID: 34427989 DOI: 10.1002/minf.202100119] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/04/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]
Abstract
The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its implementation in Python3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).
Affiliation(s)
- Timur R Gimadiev
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
- Arkadii Lin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
- Valentina A Afonina
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Dinar Batyrshin
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Ramil I Nugmanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Tagir Akhmetshin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
- Jonas Verhoeven
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Joerg Wegner
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Hugo Ceulemans
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Andrey Gedich
  - Arcadia Inc., Bol'shoy Sampsoniyevskiy Prospekt, 28 korpus 2, 194044, St Petersburg, Russia
- Timur I Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Alexandre Varnek
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
7. Nikonenko A, Zankov D, Baskin I, Madzhidov T, Polishchuk P. Multiple Conformer Descriptors for QSAR Modeling. Mol Inform 2021; 40:e2060030. [PMID: 34342944 DOI: 10.1002/minf.202060030] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/14/2021] [Accepted: 07/19/2021] [Indexed: 12/11/2022]
Abstract
The most widely used QSAR approaches are mainly based on 2D molecular representation, which ignores the stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound, which is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods. However, many existing 4D QSAR models suffer from the necessity to pre-align conformers, while alignment-independent approaches often ignore the stereoconfiguration of compounds. In this study we propose a QSAR modeling approach based on transforming chirality-aware 3D pharmacophore descriptors of individual conformers into a set of latent variables representing the whole conformer set of a molecule. This is achieved by clustering together all conformers of all training set compounds. The final representation of a compound is a bit string encoding cluster membership of its conformers. In our study we used Random Forest, but this representation can be used in combination with any machine learning method. We compared this approach with conventional 2D and 3D approaches using multiple data sets and investigated the sensitivity of the proposed approach to its tuning parameters, namely the numbers of conformers and clusters.
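The final step described in this abstract, turning the cluster membership of a molecule's conformers into a bit string, can be sketched as follows. This is an illustration only, not the authors' code: it assumes the clustering of conformer descriptors has already been done by some algorithm, and the cluster labels, molecule names and function name are all made up for the example.

```python
def cluster_bitstring(conformer_labels, n_clusters):
    """Set bit k to 1 if at least one conformer of the molecule falls in cluster k."""
    bits = [0] * n_clusters
    for label in conformer_labels:
        bits[label] = 1
    return bits


# Hypothetical cluster labels for the conformers of three molecules,
# as they might come out of clustering all training-set conformers together.
labels_by_molecule = {
    "mol_A": [0, 2, 2],
    "mol_B": [1],
    "mol_C": [3, 0],
}
n_clusters = 4

fingerprints = {
    name: cluster_bitstring(labels, n_clusters)
    for name, labels in labels_by_molecule.items()
}
```

Each molecule ends up with a fixed-length vector regardless of how many conformers it has, which is what allows the representation to be passed to Random Forest or any other learner.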
Affiliation(s)
- Aleksandra Nikonenko
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
- Dmitry Zankov
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia
- Igor Baskin
  - Department of Materials Science and Engineering, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Timur Madzhidov
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia
- Pavel Polishchuk
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
8. Rakhimbekova A, Akhmetshin TN, Minibaeva GI, Nugmanov RI, Gimadiev TR, Madzhidov TI, Baskin II, Varnek A. Cross-validation strategies in QSPR modelling of chemical reactions. SAR and QSAR in Environmental Research 2021; 32:207-219. [PMID: 33601989 DOI: 10.1080/1062936x.2021.1883107] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/18/2020] [Accepted: 01/26/2021] [Indexed: 06/12/2023]
Abstract
In this article, we consider cross-validation of the quantitative structure-property relationship models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two strategies of model cross-validation, 'transformation-out' CV, and 'solvent-out' CV. Unlike the conventional k-fold cross-validation approach that does not consider the nature of objects, the proposed procedures provide an unbiased estimation of the predictive performance of the models for novel types of structural transformations in chemical reactions and reactions going under new conditions. Both the suggested strategies have been applied to predict the rate constants of bimolecular elimination and nucleophilic substitution reactions, and Diels-Alder cycloaddition. All suggested cross-validation methodologies and tutorial are implemented in the open-source software package CIMtools (https://github.com/cimm-kzn/CIMtools).
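The idea behind 'transformation-out' and 'solvent-out' CV, as described in this abstract, is that every reaction sharing a given transformation (or solvent) lands on the same side of each split. A minimal sketch of such grouped splitting in plain Python is given below. It is a stand-in for illustration and does not reproduce the CIMtools implementation or its API; the function name, the greedy fold-balancing rule and the toy transformation labels are assumptions.

```python
from collections import defaultdict


def group_out_folds(groups, n_folds):
    """Return (train_indices, test_indices) pairs in which no group
    label appears on both the training and the test side."""
    members = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest reactions.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(members, key=lambda g: (-len(members[g]), str(g)))
    for group in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[group])

    splits = []
    for i, fold in enumerate(folds):
        test = sorted(fold)
        train = sorted(idx for j, other in enumerate(folds) if j != i for idx in other)
        splits.append((train, test))
    return splits


# Hypothetical transformation label for each of six reactions.
transformations = ["SN2_a", "SN2_a", "E2_b", "E2_b", "DA_c", "SN2_a"]
splits = group_out_folds(transformations, n_folds=3)
```

Passing solvent labels instead of transformation labels gives the 'solvent-out' variant under the same assumptions.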
Affiliation(s)
- A Rakhimbekova
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- T N Akhmetshin
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, Strasbourg, France
- G I Minibaeva
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- R I Nugmanov
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- T R Gimadiev
  - Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo, Japan
- T I Madzhidov
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- I I Baskin
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
  - Department of Materials Science and Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- A Varnek
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, Strasbourg, France
  - Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo, Japan
9. Bort W, Baskin II, Gimadiev T, Mukanov A, Nugmanov R, Sidorov P, Marcou G, Horvath D, Klimchuk O, Madzhidov T, Varnek A. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci Rep 2021; 11:3178. [PMID: 33542271 PMCID: PMC7862614 DOI: 10.1038/s41598-021-81889-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/08/2020] [Accepted: 01/06/2021] [Indexed: 12/18/2022] Open
Abstract
The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on on-purpose developed "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, herewith enlarging the synthetic purpose of popular synthetic pathways.
Affiliation(s)
- William Bort
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Igor I Baskin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
  - Department of Materials Science and Engineering, Technion - Israel Institute of Technology, 3200003, Haifa, Israel
- Timur Gimadiev
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
- Artem Mukanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Ramil Nugmanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
- Gilles Marcou
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Dragos Horvath
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Olga Klimchuk
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Timur Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France.
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan.
10. Gimadiev T, Nugmanov R, Batyrshin D, Madzhidov T, Maeda S, Sidorov P, Varnek A. Combined Graph/Relational Database Management System for Calculated Chemical Reaction Pathway Data. J Chem Inf Model 2021; 61:554-559. [PMID: 33502186 DOI: 10.1021/acs.jcim.0c01280] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/05/2023]
Abstract
Presently, quantum chemical calculations are widely used to generate extensive data sets for machine learning applications; however, generally, these sets only include information on equilibrium structures and some close conformers. Exploration of potential energy surfaces provides important information on ground and transition states, but analysis of such data is complicated due to the number of possible reaction pathways. Here, we present RePathDB, a database system for managing 3D structural data for both ground and transition states resulting from quantum chemical calculations. Our tool allows one to store, assemble, and analyze reaction pathway data. It combines relational database CGR DB for handling compounds and reactions as molecular graphs with a graph database architecture for pathway analysis by graph algorithms. Original condensed graph of reaction technology is used to store any chemical reaction as a single graph.
Affiliation(s)
- Timur Gimadiev
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
- Ramil Nugmanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Dinar Batyrshin
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Timur Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Satoshi Maeda
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
- Alexandre Varnek
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081 Strasbourg, France
11. Varnek A, Baskin II. Modern Trends in Chemical Reactions Modeling. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11543-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
12. Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions. Int J Mol Sci 2020; 21:ijms21155542. [PMID: 32756326 PMCID: PMC7432167 DOI: 10.3390/ijms21155542] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 07/13/2020] [Revised: 07/27/2020] [Accepted: 07/30/2020] [Indexed: 01/28/2023] Open
Abstract
Defining a model’s applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions for models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models’ performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. Their abilities to exclude wrong reaction types, increase coverage, improve the model performance and detect Y-outliers were tested. As a result, several “best” AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.
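For readers unfamiliar with the notion of an applicability domain, one of the simplest definitions commonly described for QSAR models is a descriptor bounding box: a new object is inside the domain only if every descriptor value lies within the range seen in training. The sketch below illustrates that general idea in plain Python. The abstract does not say which methods were benchmarked, so this should not be read as one of the paper's "best" definitions; the function names and descriptor values are hypothetical.

```python
def fit_bounding_box(training_descriptors):
    """Record the (min, max) range of each descriptor over the training set."""
    columns = list(zip(*training_descriptors))
    return [(min(column), max(column)) for column in columns]


def in_domain(box, descriptors):
    """True if every descriptor value lies within its training range."""
    return all(low <= value <= high for (low, high), value in zip(box, descriptors))


# Hypothetical two-descriptor training set.
training_set = [[0.0, 1.0], [2.0, 3.0], [1.0, 2.0]]
box = fit_bounding_box(training_set)

inside = in_domain(box, [1.5, 1.5])
outside = in_domain(box, [2.5, 1.5])
```

For reactions, the descriptor vector would also need to encode the transformation and the experimental conditions, which is where the abstract locates the added difficulty.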
13. Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 319] [Impact Index Per Article: 79.8] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
14. Dhaked DK, Guasch L, Nicklaus MC. Tautomer Database: A Comprehensive Resource for Tautomerism Analyses. J Chem Inf Model 2020; 60:1090-1100. [PMID: 32027495 DOI: 10.1021/acs.jcim.9b01156] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
We report a database of tautomeric structures that contains 2819 tautomeric tuples extracted from 171 publications. Each tautomeric entry has been annotated with experimental conditions reported in the respective publication, plus bibliographic details, structural identifiers (e.g., NCI/CADD identifiers FICTS, FICuS, uuuuu, and Standard InChI), and chemical information (e.g., SMILES, molecular weight). The majority of tautomeric tuples found were pairs; the remaining 10% were triples, quadruples, or quintuples, amounting to a total number of structures of 5977. The types of tautomerism were mainly prototropic tautomerism (79%), followed by ring-chain (13%) and valence tautomerism (8%). The experimental conditions reported in the publications included about 50 pure solvents and 9 solvent mixtures with 26 unique spectroscopic or nonspectroscopic methods. 1H and 13C NMR were the most frequently used methods. A total of 77 different tautomeric transform rules (SMIRKS) are covered by at least one example tuple in the database. This database is freely available as a spreadsheet at https://cactus.nci.nih.gov/download/tautomer/.
Affiliation(s)
- Devendra K Dhaked
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Laura Guasch
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
- Marc C Nicklaus
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, Maryland 21702, United States
15.
Affiliation(s)
- Oya Wahl
  - Drug Discovery Chemistry − Scientific Computing, Idorsia Pharmaceuticals, 4123 − Allschwil, Switzerland
- Thomas Sander
  - Drug Discovery Chemistry − Scientific Computing, Idorsia Pharmaceuticals, 4123 − Allschwil, Switzerland
16. Zankov DV, Madzhidov TI, Rakhimbekova A, Gimadiev TR, Nugmanov RI, Kazymova MA, Baskin II, Varnek A. Conjugated Quantitative Structure–Property Relationship Models: Application to Simultaneous Prediction of Tautomeric Equilibrium Constants and Acidity of Molecules. J Chem Inf Model 2019; 59:4569-4576. [DOI: 10.1021/acs.jcim.9b00722] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Affiliation(s)
- Dmitry V. Zankov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008 Kazan, Russia
| | - Timur I. Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008 Kazan, Russia
| | - Assima Rakhimbekova
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008 Kazan, Russia
| | - Timur R. Gimadiev
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008 Kazan, Russia
| | - Ramil I. Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008 Kazan, Russia
| | - Marina A. Kazymova
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008 Kazan, Russia
| | - Igor I. Baskin
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008 Kazan, Russia
- Faculty of Physics, Moscow State University, Vorob’evy gory 1, 119234 Moscow, Russia
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
| |
|
17
|
Abstract
Solutions of organic molecules containing one or more heterocycles with conjugated bonds may exist as a mixture of tautomers, but typically only a few of them are significantly populated even though the potential number grows combinatorially with the number of protonation and deprotonation sites. Generating the most stable tautomers from a given input structure is an important and challenging task, and numerous algorithms to tackle it have been proposed in the literature. This work describes a novel approach for tautomer prediction that involves the combined use of molecular mechanics, semiempirical quantum chemistry, and density functional theory. The key idea in our method is to identify the protonation and deprotonation sites using estimated micro-pKa's for every atom in the molecule as well as in its nearest protonated and deprotonated forms. To generate tautomers in a systematic way with minimal bias, we then consider the full set of tautomers that arise from the combinatorial distribution of all such mobile protons among all protonatable sites, with efficient postprocessing to screen away high-energy species. To estimate the micro-pKa's, we present a new method designed for the current task, but we emphasize that any alternative method can be used in conjunction with our basic algorithm. Our approach is therefore grounded in the computational prediction of physical properties in aqueous solution, in contrast to other approaches that may rely on the use of hard-coded rules of proton distribution, previously observed tautomerization patterns from a known chemical space, or human input. We present examples of the application of our algorithm to organic and drug-like molecules, with a focus on novel structures where traditional methods are expected to perform worse.
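The enumeration step described in this abstract (distributing mobile protons over protonatable sites, then discarding high-energy species) can be illustrated with a small toy sketch. This is not the authors' implementation: the site labels, the additive per-site penalties, the one-proton-per-site restriction, and the energy window are all placeholder assumptions, standing in for the micro-pKa estimates and the molecular mechanics, semiempirical, and DFT scoring that the paper actually uses.

```python
from itertools import combinations

# Hypothetical per-site penalties in arbitrary energy units (made-up numbers).
TOY_SITE_PENALTY = {"N1": 0.0, "N3": 1.2, "O2": 6.5, "O4": 9.0}


def toy_energy(occupied_sites):
    """Placeholder energy: sum of penalties of the protonated sites."""
    return sum(TOY_SITE_PENALTY[site] for site in occupied_sites)


def enumerate_tautomers(sites, n_protons):
    """Yield every placement of n_protons on distinct sites (one proton per site)."""
    for occupied in combinations(sites, n_protons):
        yield frozenset(occupied)


def screen_by_energy(candidates, energy_fn, window):
    """Keep candidates within `window` of the lowest energy, sorted by relative energy."""
    scored = [(energy_fn(c), c) for c in candidates]
    e_min = min(energy for energy, _ in scored)
    kept = [(energy - e_min, c) for energy, c in scored if energy - e_min <= window]
    return sorted(kept, key=lambda pair: pair[0])


if __name__ == "__main__":
    sites = list(TOY_SITE_PENALTY)
    candidates = list(enumerate_tautomers(sites, n_protons=2))
    for rel_energy, tautomer in screen_by_energy(candidates, toy_energy, window=7.0):
        print(sorted(tautomer), round(rel_energy, 2))
```

The number of candidates grows as a binomial coefficient in the number of sites, which is why the abstract stresses efficient postprocessing; in this sketch the screening is a single pass, whereas a real workflow would presumably filter in stages with increasingly expensive energy models.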
Affiliation(s)
- Mark A Watson
- Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States
| | - Haoyu S Yu
- Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States
| | - Art D Bochevarov
- Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States
| |
|
18
|
Nugmanov RI, Mukhametgaleev RN, Akhmetshin T, Gimadiev TR, Afonina VA, Madzhidov TI, Varnek A. CGRtools: Python Library for Molecule, Reaction, and Condensed Graph of Reaction Processing. J Chem Inf Model 2019; 59:2516-2521. [DOI: 10.1021/acs.jcim.9b00102] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/16/2022]
Affiliation(s)
- Ramil I. Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya Str., 420008 Kazan, Russia
| | - Ravil N. Mukhametgaleev
- Laboratory of Chemoinformatics and Molecular Modeling, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya Str., 420008 Kazan, Russia
| | - Tagir Akhmetshin
- Laboratory of Chemoinformatics and Molecular Modeling, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya Str., 420008 Kazan, Russia
| | - Timur R. Gimadiev
- Laboratory of Chemoinformatics and Molecular Modeling, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya Str., 420008 Kazan, Russia
| | - Valentina A. Afonina
- Laboratory of Chemoinformatics and Molecular Modeling, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya Str., 420008 Kazan, Russia
| | - Timur I. Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya Str., 420008 Kazan, Russia
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, Université de Strasbourg, 4 rue Blaise Pascal, 67000 Strasbourg, France
| |
|
19
|
Gimadiev T, Madzhidov T, Tetko I, Nugmanov R, Casciuc I, Klimchuk O, Bodrov A, Polishchuk P, Antipin I, Varnek A. Bimolecular Nucleophilic Substitution Reactions: Predictive Models for Rate Constants and Molecular Reaction Pairs Analysis. Mol Inform 2018; 38:e1800104. [DOI: 10.1002/minf.201800104] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/16/2018] [Accepted: 10/16/2018] [Indexed: 11/07/2022]
Affiliation(s)
- Timur Gimadiev
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia
- Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
| | - Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia
| | - Igor Tetko
- Helmholtz Zentrum München – German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
| | - Ramil Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia
| | - Iury Casciuc
- Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
| | - Olga Klimchuk
- Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
| | - Andrey Bodrov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia
- Department of General and Organic Chemistry, Kazan State Medical University, Kazan, Russia
| | - Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Hněvotínská 1333/5, 77900 Olomouc, Czech Republic
| | - Igor Antipin
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France
| |
|