51
Sacha M, Błaż M, Byrski P, Dąbrowski-Tumański P, Chromiński M, Loska R, Włodarczyk-Pruszyński P, Jastrzębski S. Molecule Edit Graph Attention Network: Modeling Chemical Reactions as Sequences of Graph Edits. J Chem Inf Model 2021; 61:3273-3284. [PMID: 34251814 DOI: 10.1021/acs.jcim.1c00537] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/13/2022]
Abstract
The central challenge in automated synthesis planning is to be able to generate and predict outcomes of a diverse set of chemical reactions. In particular, in many cases, the most likely synthesis pathway cannot be applied due to additional constraints, which requires proposing alternative chemical reactions. With this in mind, we present Molecule Edit Graph Attention Network (MEGAN), an end-to-end encoder-decoder neural model. MEGAN is inspired by models that express a chemical reaction as a sequence of graph edits, akin to the arrow pushing formalism. We extend this model to retrosynthesis prediction (predicting substrates given the product of a chemical reaction) and scale it up to large data sets. We argue that representing the reaction as a sequence of edits enables MEGAN to efficiently explore the space of plausible chemical reactions, maintaining the flexibility of modeling the reaction in an end-to-end fashion and achieving state-of-the-art accuracy in standard benchmarks. Code and trained models are made available online at https://github.com/molecule-one/megan.
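The edit-sequence idea in this abstract can be illustrated with a minimal Python sketch. It assumes a molecule stored as a dictionary that maps atom-index pairs to bond orders, plus a single hypothetical `BondEdit` action; MEGAN's actual action vocabulary, atom features and graph attention decoder are not reproduced here, so this only shows how a reaction can be replayed as an ordered list of graph edits.

```python
from dataclasses import dataclass


# Hypothetical edit type for illustration only; MEGAN's real action set
# is defined in the paper and its repository, not here.
@dataclass(frozen=True)
class BondEdit:
    a: int          # index of the first atom
    b: int          # index of the second atom
    new_order: int  # new bond order; 0 means "remove the bond"


def apply_edits(bonds, edits):
    """Apply bond edits in order to a graph stored as {(i, j): order}, i < j."""
    graph = dict(bonds)  # copy so the caller's graph is left unchanged
    for edit in edits:
        key = (min(edit.a, edit.b), max(edit.a, edit.b))
        if edit.new_order == 0:
            graph.pop(key, None)
        else:
            graph[key] = edit.new_order
    return graph


# Toy amide-like fragment: 0 = CH3 carbon, 1 = carbonyl C, 2 = O, 3 = N.
product = {(0, 1): 1, (1, 2): 2, (1, 3): 1}
substrates = apply_edits(product, [BondEdit(1, 3, 0)])
print(substrates)  # {(0, 1): 1, (1, 2): 2}
```

Removing the bond between atoms 1 and 3 mimics a retrosynthetic disconnection; a complete model would also need edits that add leaving-group atoms and change atom properties such as charge or hydrogen count.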
Affiliation(s)
- Paweł Dąbrowski-Tumański
- Molecule One, Warsaw 00-815, Poland
- Faculty of Mathematics and Natural Sciences, School of Exact Sciences, Cardinal Stefan Wyszyński University, Warsaw 01-815, Poland
- Rafał Loska
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw 01-224, Poland
52
Moskal M, Beker W, Szymkuć S, Grzybowski BA. Scaffold-Directed Face Selectivity Machine-Learned from Vectors of Non-covalent Interactions. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202101986] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/05/2023]
Affiliation(s)
- Martyna Moskal
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- Wiktor Beker
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
53
Moskal M, Beker W, Szymkuć S, Grzybowski BA. Scaffold-Directed Face Selectivity Machine-Learned from Vectors of Non-covalent Interactions. Angew Chem Int Ed Engl 2021; 60:15230-15235. [PMID: 33876554 DOI: 10.1002/anie.202101986] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/08/2021] [Revised: 03/29/2021] [Indexed: 11/06/2022]
Abstract
This work describes a method to vectorize and machine-learn (ML) the non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry. Models trained on this representation predict the correct face of approach in ca. 90% of Michael additions or Diels-Alder cycloadditions. These accuracies are significantly higher than those obtained with traditional ML descriptors, energetic calculations, or the intuition of experienced synthetic chemists. Our results also emphasize the importance of providing ML models with relevant mechanistic knowledge; without such knowledge, these models cannot easily "transfer-learn" and extrapolate to previously unseen reaction mechanisms.
Affiliation(s)
- Martyna Moskal
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- Wiktor Beker
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Allchemy, Inc., Highland, IN, USA
- IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
54
Affiliation(s)
- Agustí Lledós
- Departament de Química, Universitat Autònoma de Barcelona, Campus UAB, 08193 Cerdanyola del Vallès, Catalonia, Spain
55
Stuyver T, Shaik S. Promotion Energy Analysis Predicts Reaction Modes: Nucleophilic and Electrophilic Aromatic Substitution Reactions. J Am Chem Soc 2021; 143:4367-4378. [PMID: 33689334 DOI: 10.1021/jacs.1c00307] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
To develop an approach that pre-emptively predicts the existence of the major reaction modes associated with a chemical system, based exclusively on reactant properties, we build herein on the valence bond (VB) perspective of chemical reactivity. In this perspective, elementary chemical reactions are conceptualized as crossovers between individual diabatic/semilocalized states. As demonstrated, the spacings between the main diabatic states at the reactant geometries (the so-called promotion energies) contain predictive information about which types of crossings are likely to occur on a potential energy surface, facilitating the identification of potential transition states and products. As an added bonus, promotion energy analysis provides direct insight into the impact of environmental effects, e.g., the presence of (polar) solvents and/or (local) electric fields, on a mechanistic landscape. We illustrate the usefulness of our approach with model nucleophilic and electrophilic aromatic substitution reactions. Overall, we envision our analysis to be useful not only as a tool for conceptualizing individual mechanistic landscapes but also as a facilitator of systematic reaction-network exploration. Because the emerging VB descriptors are computationally inexpensive (and can alternatively be inferred through machine learning), they could be evaluated on the fly as part of an exploration algorithm. The reaction modes predicted in this way could subsequently be examined in detail with more computationally demanding methods.
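As a rough illustration of the ranking idea, the Python sketch below computes promotion energies as the gaps between each diabatic state and the lowest one at a fixed reactant geometry, optionally after adding an environment term. The state labels, the example energies, the additive `shifts` model of solvent or field effects, and the working assumption that a smaller gap marks a more accessible crossing are all assumptions of this sketch, not details taken from the paper.

```python
def promotion_energies(state_energies, shifts=None):
    """Gap of each diabatic state above the lowest one at the reactant geometry.

    state_energies: {label: energy}, any consistent unit.
    shifts: optional {label: delta} added first; a crude, hypothetical
            stand-in for environmental effects such as a polar solvent.
    """
    shifts = shifts or {}
    adjusted = {s: e + shifts.get(s, 0.0) for s, e in state_energies.items()}
    ground = min(adjusted.values())
    return {s: e - ground for s, e in adjusted.items()}


def rank_crossings(state_energies, shifts=None):
    """Non-ground states sorted by promotion energy, smallest gap first."""
    gaps = promotion_energies(state_energies, shifts)
    ground_label = min(gaps, key=gaps.get)
    return sorted((s for s in gaps if s != ground_label), key=gaps.get)


# Hypothetical labels and energies, chosen only to show the mechanics.
states = {"reactant": 0.0, "charge_transfer": 60.0, "ionic": 90.0}
print(rank_crossings(states))                    # ['charge_transfer', 'ionic']
print(rank_crossings(states, {"ionic": -40.0}))  # ['ionic', 'charge_transfer']
```

The second call shows how stabilizing one state (for example, a charge-separated state in a polar medium) can reorder the candidate crossings without any new geometry search, which is the kind of cheap on-the-fly evaluation the abstract has in mind.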
Affiliation(s)
- Thijs Stuyver
- Institute of Chemistry, The Hebrew University, Jerusalem 91904, Israel
- Sason Shaik
- Institute of Chemistry, The Hebrew University, Jerusalem 91904, Israel
56
Abstract
As more data are used to build models of chemical reactivity, the mechanistic component can be reduced until 'big data' applications are reached. These methods no longer depend on underlying mechanistic hypotheses, potentially learning them implicitly through extensive training on data. Reactivity models often focus on reaction barriers, but can also be trained to directly predict lab-relevant properties, such as yields or conditions. Calculations with a quantum-mechanical component are still preferred for quantitative predictions of reactivity. Although big data applications tend to be more qualitative, they have the advantage of being broadly applicable to different kinds of reactions. There is a continuum of methods between these extremes, such as methods that use quantum-derived data or descriptors in machine learning models. Here, we present an overview of recent machine learning applications in the field of chemical reactivity from a mechanistic perspective. Starting with a summary of how reactivity questions are addressed by quantum-mechanical methods, we discuss methods that augment or replace quantum-based modelling with faster alternatives relying on machine learning.
57
Kovács DP, McCorkindale W, Lee AA. Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias. Nat Commun 2021; 12:1695. [PMID: 33727552 PMCID: PMC7966799 DOI: 10.1038/s41467-021-21895-w] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 10/08/2020] [Accepted: 02/10/2021] [Indexed: 12/30/2022]
Abstract
Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black boxes. It is not clear whether the models make correct predictions because they have inferred the salient chemistry, nor is it clear which training data they rely on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and how to understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions, where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.
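Attributing a prediction to training reactions can be sketched as a nearest-neighbour search over vector representations of reactions. The Python sketch below assumes precomputed embedding vectors (the hypothetical `train_vecs` dictionary and reaction IDs) and ranks them by cosine similarity; the abstract does not state which representation or similarity measure the authors use, so this is a generic stand-in rather than their framework.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_training_reactions(query_vec, train_vecs, k=3):
    """(reaction_id, similarity) pairs for the k most similar training reactions."""
    scored = [(rid, cosine(query_vec, vec)) for rid, vec in train_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2-D embeddings; real reaction embeddings are much larger.
train_vecs = {"rxn_a": [1.0, 0.0], "rxn_b": [0.0, 1.0], "rxn_c": [0.7, 0.7]}
for rid, sim in top_k_training_reactions([1.0, 0.1], train_vecs, k=2):
    print(rid, round(sim, 3))
```

The retrieved neighbours serve as "evidence" for a prediction: if the closest training reactions share a misleading feature (for example a common reagent rather than the reacting group), that hints at the kind of dataset bias the abstract describes.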
Affiliation(s)
- Alpha A Lee
- Cavendish Laboratory, University of Cambridge, Cambridge, UK.
58
Ree N, Göller AH, Jensen JH. RegioSQM20: improved prediction of the regioselectivity of electrophilic aromatic substitutions. J Cheminform 2021; 13:10. [PMID: 33579374 PMCID: PMC7881568 DOI: 10.1186/s13321-021-00490-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/15/2020] [Accepted: 01/27/2021] [Indexed: 01/01/2023]
Abstract
We present RegioSQM20, a new version of RegioSQM (Chem Sci 9:660, 2018), which predicts the regioselectivities of electrophilic aromatic substitution (EAS) reactions from the calculation of proton affinities. The following improvements have been made. The open-source semiempirical tight-binding program xtb is used instead of the closed-source MOPAC program. Any low-energy tautomeric forms of the input molecule are identified, and regioselectivity predictions are made for each form. Finally, RegioSQM20 offers a qualitative prediction of the reactivity of each tautomer (low, medium, or high) based on the reaction center with the highest proton affinity. The inclusion of tautomers increases the success rate from 90.7% to 92.7%. RegioSQM20 is compared to two machine learning based models: one developed by Struble et al. (React Chem Eng 5:896, 2020) specifically for regioselectivity predictions of EAS reactions (WLN) and a more generally applicable reactivity predictor (IBM RXN) developed by Schwaller et al. (ACS Cent Sci 5:1572, 2019). RegioSQM20 and WLN offer roughly the same success rates on the entire data sets (without considering tautomers), while WLN is many orders of magnitude faster. The accuracy of the more general IBM RXN approach is somewhat lower: 76.3-85.0%, depending on the data set. The code is freely available under the MIT open source license and will be made available as a webservice (regiosqm.org) in the near future.
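The selection logic described in this abstract (predictions per tautomer, the site with the highest proton affinity, and a coarse low/medium/high reactivity label) can be sketched in Python as follows. The proton affinities are assumed to have been computed elsewhere (for example with xtb); the `cutoff`, `low` and `high` thresholds are hypothetical parameters rather than RegioSQM20's published values, and keeping every site within `cutoff` of the best one is an assumption of this sketch.

```python
def predict_sites(proton_affinities, cutoff):
    """Sites whose proton affinity lies within `cutoff` of the maximum.

    proton_affinities: {site_label: proton affinity} for one tautomer
    (higher = more favourable protonation in this sketch's convention).
    """
    best = max(proton_affinities.values())
    return sorted(site for site, pa in proton_affinities.items()
                  if best - pa <= cutoff)


def reactivity_label(proton_affinities, low, high):
    """Coarse label from the highest proton affinity (hypothetical thresholds)."""
    best = max(proton_affinities.values())
    if best < low:
        return "low"
    if best < high:
        return "medium"
    return "high"


def predict_per_tautomer(tautomers, cutoff, low, high):
    """Apply both steps to every tautomer: {name: (sites, label)}."""
    return {
        name: (predict_sites(pas, cutoff), reactivity_label(pas, low, high))
        for name, pas in tautomers.items()
    }


# Illustrative numbers only, not data from the paper.
example = {"T1": {"C2": 10.0, "C4": 9.5, "C5": 5.0},
           "T2": {"C3": 4.0, "C6": 7.0}}
print(predict_per_tautomer(example, cutoff=1.0, low=6.0, high=9.0))
# {'T1': (['C2', 'C4'], 'high'), 'T2': (['C6'], 'medium')}
```

Treating each tautomer independently mirrors the abstract's description; deciding which tautomers are "low energy" enough to include is a separate step that this sketch leaves out.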
Affiliation(s)
- Nicolai Ree
- Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100, Copenhagen, Denmark
- Andreas H Göller
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Jan H Jensen
- Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100, Copenhagen, Denmark
59
Yang LC, Li X, Zhang SQ, Hong X. Machine learning prediction of hydrogen atom transfer reactivity in photoredox-mediated C–H functionalization. Org Chem Front 2021. [DOI: 10.1039/d1qo01325d] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/16/2023]
Abstract
DFT-computed structure–activity relationship data and physical organic descriptors yield an accurate machine learning model for predicting hydrogen atom transfer (HAT) barriers in photoredox-mediated HAT catalysis.
Affiliation(s)
- Li-Cheng Yang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China
- Xin Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China
- Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China
- Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China