1
Sharma A, López Y, Jia S, Lysenko A, Boroevich KA, Tsunoda T. Enhanced analysis of tabular data through Multi-representation DeepInsight. Sci Rep 2024; 14:12851. [PMID: 38834670 DOI: 10.1038/s41598-024-63630-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 05/30/2024] [Indexed: 06/06/2024]
Abstract
Tabular data analysis is a critical task in various domains, enabling us to uncover valuable insights from structured datasets. While traditional machine learning methods can be used for feature engineering and dimensionality reduction, they often struggle to capture the intricate relationships and dependencies within real-world datasets. In this paper, we present Multi-representation DeepInsight (MRep-DeepInsight), a novel extension of the DeepInsight method designed to enhance the analysis of tabular data. By generating multiple representations of samples using diverse feature extraction techniques, our approach is able to capture a broader range of features and reveal deeper insights. We demonstrate the effectiveness of MRep-DeepInsight on single-cell datasets, Alzheimer's data, and artificial data, showcasing an improved accuracy over the original DeepInsight approach and machine learning methods like random forest, XGBoost, LightGBM, FT-Transformer and L2-regularized logistic regression. Our results highlight the value of incorporating multiple representations for robust and accurate tabular data analysis. By leveraging the power of diverse representations, MRep-DeepInsight offers a promising new avenue for advancing decision-making and scientific discovery across a wide range of fields.
Affiliation(s)
- Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
- Shangru Jia
- Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Artem Lysenko
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
- Keith A Boroevich
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
- Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan.
2
King-Smith E, Faber FA, Reilly U, Sinitskiy AV, Yang Q, Liu B, Hyek D, Lee AA. Predictive Minisci late stage functionalization with transfer learning. Nat Commun 2024; 15:426. [PMID: 38225239 PMCID: PMC10789750 DOI: 10.1038/s41467-023-42145-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/01/2023] [Indexed: 01/17/2024]
Abstract
Structural diversification of lead molecules is a key component of drug discovery to explore chemical space. Late-stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF is still an open challenge in the field. Numerous efforts from chemoinformatics and machine learning (ML) groups have made strides in this area. However, it is arduous to isolate and characterize the multitude of LSF products generated, limiting available data and hindering pure ML approaches. We report the development of an approach that combines a message passing neural network and ¹³C NMR-based transfer learning to predict the atom-wise probabilities of functionalization for Minisci and P450-based functionalizations. We validated our model both retrospectively and with a series of prospective experiments, showing that it accurately predicts the outcomes of Minisci-type and P450 transformations and outperforms the well-established Fukui-based reactivity indices and other machine learning reactivity-based algorithms.
Affiliation(s)
- Emma King-Smith
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Felix A Faber
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Usa Reilly
- Development & Medical, Pfizer Worldwide Research, Groton, CT, USA
- Anton V Sinitskiy
- Machine Learning Computational Sciences, Pfizer Worldwide Research, Cambridge, MA, USA
- Qingyi Yang
- Development & Medical, Pfizer Worldwide Research, Cambridge, MA, USA
- Bo Liu
- Spectrix Analytic Services, LLC., North Haven, CT, USA
- Dennis Hyek
- Spectrix Analytic Services, LLC., North Haven, CT, USA
- Alpha A Lee
- Cavendish Laboratory, University of Cambridge, Cambridge, UK.
3
Pattanaik L, Menon A, Settels V, Spiekermann KA, Tan Z, Vermeire FH, Sandfort F, Eiden P, Green WH. ConfSolv: Prediction of Solute Conformer-Free Energies across a Range of Solvents. J Phys Chem B 2023; 127:10151-10170. [PMID: 37966798 DOI: 10.1021/acs.jpcb.3c05904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
Predicting Gibbs free energy of solution is key to understanding the solvent effects on thermodynamics and reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after generation of relevant conformers, determining their free energy of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS and quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings with predictions usually substantially within 1 kcal/mol of the free energy of the solution calculated by using computational chemical methods.
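Narrowing an ensemble to a "small subset of thermodynamically relevant conformers" rests on Boltzmann statistics: given relative solution free energies ΔG_i, the equilibrium population of conformer i is exp(-ΔG_i/RT) / Σ_j exp(-ΔG_j/RT). The sketch below is a generic illustration, not the ConfSolv code. It assumes energies in kcal/mol, T = 298.15 K, and a hypothetical cumulative-population cutoff (`coverage`) as the selection rule; the function names are illustrative only.

```python
import math
from typing import List, Sequence

# Gas constant in kcal/(mol*K); free energies below are assumed to be in kcal/mol.
R_KCAL_PER_MOL_K = 1.987204e-3


def boltzmann_populations(rel_g: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Equilibrium populations p_i = exp(-G_i/RT) / sum_j exp(-G_j/RT)."""
    rt = R_KCAL_PER_MOL_K * temperature
    g_min = min(rel_g)
    # Shifting by the minimum leaves the ratios unchanged and avoids overflow/underflow.
    weights = [math.exp(-(g - g_min) / rt) for g in rel_g]
    total = sum(weights)
    return [w / total for w in weights]


def relevant_conformers(rel_g: Sequence[float], coverage: float = 0.99,
                        temperature: float = 298.15) -> List[int]:
    """Hypothetical selection rule: most populated conformers until `coverage` is reached."""
    pops = boltzmann_populations(rel_g, temperature)
    order = sorted(range(len(pops)), key=lambda i: pops[i], reverse=True)
    selected: List[int] = []
    cumulative = 0.0
    for i in order:
        selected.append(i)
        cumulative += pops[i]
        if cumulative >= coverage:
            break
    return selected
```

For four conformers at 0.0, 0.5, 3.0 and 6.0 kcal/mol, the first two already hold more than 99% of the population at room temperature, so `relevant_conformers` returns `[0, 1]`. An energy-window cutoff would be an equally plausible selection rule.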
Affiliation(s)
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Volker Settels
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Zipei Tan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Florence H Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, Leuven 3001, Belgium
- Frederik Sandfort
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Philipp Eiden
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
4
Caldeweyher E, Elkin M, Gheibi G, Johansson M, Sköld C, Norrby PO, Hartwig JF. Hybrid Machine Learning Approach to Predict the Site Selectivity of Iridium-Catalyzed Arene Borylation. J Am Chem Soc 2023; 145:17367-17376. [PMID: 37523755 DOI: 10.1021/jacs.3c04986] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 08/02/2023]
Abstract
The borylation of aryl and heteroaryl C-H bonds is valuable for the site-selective functionalization of C-H bonds in complex molecules. Iridium catalysts ligated by bipyridine ligands catalyze the borylation of the C-H bond that is most acidic and least sterically hindered in an arene, but predicting the site of borylation in molecules containing multiple arenes is difficult. To address this challenge, we report a hybrid computational model that predicts the Site of Borylation (SoBo) in complex molecules. The SoBo model combines density functional theory, semiempirical quantum mechanics, cheminformatics, linear regression, and machine learning to predict site selectivity and to extrapolate these predictions to new chemical space. Experimental validation of SoBo showed that the model predicts the major site of borylation of pharmaceutical intermediates with higher accuracy than prior machine-learning models or human experts, demonstrating that SoBo will be useful to guide experiments for the borylation of specific C(sp2)-H bonds during pharmaceutical development.
Affiliation(s)
- Eike Caldeweyher
- Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca Gothenburg, SE-431 83 Mölndal, Sweden
- Masha Elkin
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- Golsa Gheibi
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- Magnus Johansson
- Cardiovascular, Renal and Metabolism, Biopharmaceuticals R&D, AstraZeneca Gothenburg, SE-431 83 Mölndal, Sweden
- Department of Organic Chemistry, Stockholm University, SE-106 91 Stockholm, Sweden
- Christian Sköld
- Drug Design and Discovery, Department of Medicinal Chemistry, Uppsala University, SE-751 23 Uppsala, Sweden
- Per-Ola Norrby
- Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca Gothenburg, SE-431 83 Mölndal, Sweden
- John F Hartwig
- Department of Chemistry, University of California, Berkeley, California 94720, United States
5
Shim E, Tewari A, Cernak T, Zimmerman PM. Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit. J Chem Inf Model 2023; 63:3659-3668. [PMID: 37312524 PMCID: PMC11163943 DOI: 10.1021/acs.jcim.3c00577] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Machine learning models are increasingly being utilized to predict outcomes of organic chemical reactions. A large amount of reaction data is used to train these models, which is in stark contrast to how expert chemists discover and develop new reactions by leveraging information from a small number of relevant transformations. Transfer learning and active learning are two strategies that can operate in low-data situations, which may help fill this gap and promote the use of machine learning for tackling real-world challenges in organic synthesis. This Perspective introduces active and transfer learning and connects these to potential opportunities and directions for further research, especially in the area of prospective development of chemical transformations.
Affiliation(s)
- Eunjae Shim
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Ambuj Tewari
- Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Tim Cernak
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Paul M Zimmerman
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
6
Castellino NJ, Montgomery AP, Danon JJ, Kassiou M. Late-stage Functionalization for Improving Drug-like Molecular Properties. Chem Rev 2023. [PMID: 37285604 DOI: 10.1021/acs.chemrev.2c00797] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 06/09/2023]
Abstract
The development of late-stage functionalization (LSF) methodologies, particularly C-H functionalization, has revolutionized the field of organic synthesis. Over the past decade, medicinal chemists have begun to implement LSF strategies into their drug discovery programs, allowing for the drug discovery process to become more efficient. Most reported applications of late-stage C-H functionalization of drugs and drug-like molecules have been to rapidly diversify screening libraries to explore structure-activity relationships. However, there has been a growing trend toward the use of LSF methodologies as an efficient tool for improving drug-like molecular properties of promising drug candidates. In this review, we have comprehensively reviewed recent progress in this emerging area. Particular emphasis is placed on case studies where multiple LSF techniques were implemented to generate a library of novel analogues with improved drug-like properties. We have critically analyzed the current scope of LSF strategies to improve drug-like properties and commented on how we believe LSF can transform drug discovery in the future. Overall, we aim to provide a comprehensive survey of LSF techniques as tools for efficiently improving drug-like molecular properties, anticipating its continued uptake in drug discovery programs.
Affiliation(s)
- Jonathan J Danon
- School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
- Michael Kassiou
- School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
7
Montgomery AP, Joyce JM, Danon JJ, Kassiou M. An update on late-stage functionalization in today's drug discovery. Expert Opin Drug Discov 2023; 18:597-613. [PMID: 37114995 DOI: 10.1080/17460441.2023.2205635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
INTRODUCTION: Late-stage functionalization (LSF) allows for the introduction of new chemical groups toward the end of a synthetic sequence, which means new molecules can be rapidly accessed without laborious de novo chemical synthesis. Over the last decade, medicinal chemists have begun to implement LSF strategies into their drug discovery programs, affording benefits such as efficient access to diverse libraries to explore structure-activity relationships and the improvement of physicochemical and pharmacokinetic properties.
AREAS COVERED: An overview of the key advancements in LSF methodology development from 2019 to 2022 and their applicability to drug discovery is provided. In addition, several examples from both academia and industry where LSF methodologies have been applied by medicinal chemists to their drug discovery programs are presented.
EXPERT OPINION: Utilization of LSF by medicinal chemists is on the rise, both in academia and in industry. The maturation of the LSF field to produce methodologies bearing increased regioselectivity, scope, and functional group tolerance is envisaged to narrow the gap between methodology development and medicinal chemistry research. The authors predict that the sheer versatility of these techniques in facilitating challenging chemical transformations of bioactive molecules will continue to increase the efficiency of the drug discovery process.
Affiliation(s)
- Jack M Joyce
- School of Chemistry, The University of Sydney, Sydney, Australia
- Jonathan J Danon
- School of Chemistry, The University of Sydney, Sydney, Australia
- Michael Kassiou
- School of Chemistry, The University of Sydney, Sydney, Australia
8
Liu Q, Tang K, Zhang L, Du J, Meng Q. Computer-assisted synthetic planning considering reaction kinetics based on transition state automated generation method. AIChE J 2023. [DOI: 10.1002/aic.18092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
Affiliation(s)
- Qilei Liu
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
- Kun Tang
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
- Lei Zhang
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
- Jian Du
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
- Qingwei Meng
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
- Ningbo Research Institute Dalian University of Technology Ningbo 315016 China
9
Ektefaie Y, Dasoulas G, Noori A, Farhat M, Zitnik M. Multimodal learning with graphs. Nat Mach Intell 2023; 5:340-350. [PMID: 38076673 PMCID: PMC10704992 DOI: 10.1038/s42256-023-00624-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Accepted: 02/01/2023] [Indexed: 04/05/2023]
Abstract
Artificial intelligence for graphs has achieved remarkable success in modeling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, the increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases (the set of assumptions that algorithms use to make predictions for inputs they have not encountered during training). Learning on multimodal datasets presents fundamental challenges because the inductive biases can vary by data modality and graphs might not be explicitly given in the input. To address these challenges, multimodal graph AI methods combine different modalities while leveraging cross-modal dependencies using graphs. Diverse datasets are combined using graphs and fed into sophisticated multimodal architectures, specified as image-intensive, knowledge-grounded and language-intensive models. Using this categorization, we introduce a blueprint for multimodal graph learning, use it to study existing methods and provide guidelines to design new models.
Affiliation(s)
- Yasha Ektefaie
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA 02115, USA
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- George Dasoulas
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Harvard Data Science Initiative, Cambridge, MA 02138, USA
- Ayush Noori
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Harvard College, Cambridge, MA 02138, USA
- Maha Farhat
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Division of Pulmonary and Critical Care, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard Data Science Initiative, Cambridge, MA 02138, USA
10
Singh S, Sunoj RB. Molecular Machine Learning for Chemical Catalysis: Prospects and Challenges. Acc Chem Res 2023; 56:402-412. [PMID: 36715248 DOI: 10.1021/acs.accounts.2c00801] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 01/31/2023]
Abstract
Conspectus: In the domain of reaction development, one aims to obtain higher efficacies as measured in terms of yield and/or selectivities. During the empirical cycles, an admixture of outcomes from low to high yields/selectivities is expected. While it is not easy to identify all of the factors that might impact the reaction efficiency, complex and nonlinear dependence on the nature of reactants, catalysts, solvents, etc. is quite likely. Developmental stages of newer reactions would typically offer a few hundreds of samples with variations in participating molecules and/or reaction conditions. These "observations" and their "output" can be harnessed as valuable labeled data for developing molecular machine learning (ML) models. Once a robust ML model is built for a specific reaction under development, it can predict the reaction outcome for any new choice of substrates/catalyst in a few seconds/minutes and thus can expedite the identification of promising candidates for experimental validation. Recent years have witnessed impressive applications of ML in the molecular world, most of them aimed at predicting important chemical or biological properties. We believe that an integration of effective ML workflows can be made richly beneficial to reaction discovery.
As with any new technology, direct adaptation of ML as used in well-developed domains, such as natural language processing (NLP) and image recognition, is unlikely to succeed in reaction discovery. Some of the challenges stem from ineffective featurization of the molecular space, unavailability of quality data and its distribution, in making the right choice of ML model and its technically robust deployment. It shall be noted that there is no universal ML model suitable for an inherently high-dimensional problem such as chemical reactions. Given these backgrounds, rendering ML tools conducive for reactions is an exciting as well as challenging endeavor at the same time.
With the increased availability of efficient ML algorithms, we focused on tapping their potential for small-data reaction discovery (a few hundreds to thousands of samples).
In this Account, we describe both feature engineering and feature learning approaches for molecular ML as applied to diverse reactions of high contemporary interest. Among these, catalytic asymmetric hydrogenation of imines/alkenes, β-C(sp3)-H bond functionalization, and relay Heck reaction employed a feature engineering approach using the quantum-chemically derived physical organic descriptors as the molecular features, all designed to predict the enantioselectivity. The selection of molecular features to customize it for a reaction of interest is described, along with emphasizing the chemical insights that could be gathered through the use of such features. Feature learning methods for predicting the yield of Buchwald-Hartwig cross-coupling, deoxyfluorination of alcohols, and enantioselectivity of N,S-acetal formation are found to offer excellent predictions. We propose a transfer learning protocol, wherein an ML model such as a language model is trained on a large number of molecules (10⁵–10⁶) and fine-tuned on a focused library of target task reactions, as an effective alternative for small-data reaction discovery (10²–10³ reactions). The exploitation of deep neural network latent space as a method for generative tasks to identify useful substrates for a reaction is demonstrated as a promising strategy.
Affiliation(s)
- Sukriti Singh
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
- Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai 400076, India
11
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
12
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. Commun Mater 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
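The core operation that lets GNNs work directly on a molecular graph is message passing: each node aggregates its neighbours' feature vectors and updates its own state, and a permutation-invariant readout pools node states into one graph-level vector. The minimal sketch below uses sum aggregation, two scalar weights in place of learned weight matrices, a ReLU update and a sum readout. These simplifications and all names are assumptions for illustration, not an architecture taken from the review.

```python
from typing import Dict, List

# A molecule as an undirected graph: atom index -> indices of bonded neighbours.
Graph = Dict[int, List[int]]
# Per-atom feature vectors (e.g. one-hot atom types).
Features = Dict[int, List[float]]


def message_passing_step(graph: Graph, h: Features, self_weight: float = 1.0,
                         neighbour_weight: float = 0.5) -> Features:
    """One round of sum-aggregation message passing.

    New state: ReLU(self_weight * h_v + neighbour_weight * sum of h_u over neighbours u).
    Real GNN layers use learned weight matrices instead of these two scalars.
    """
    new_h: Features = {}
    for v, neighbours in graph.items():
        dim = len(h[v])
        # Aggregate: element-wise sum of the neighbours' feature vectors.
        agg = [0.0] * dim
        for u in neighbours:
            for k in range(dim):
                agg[k] += h[u][k]
        # Update: combine the atom's own state with the aggregated message, then ReLU.
        new_h[v] = [max(0.0, self_weight * h[v][k] + neighbour_weight * agg[k])
                    for k in range(dim)]
    return new_h


def readout(h: Features) -> List[float]:
    """Graph-level embedding: sum over atoms, so atom ordering does not matter."""
    dim = len(next(iter(h.values())))
    return [sum(vec[k] for vec in h.values()) for k in range(dim)]
```

For the heavy-atom graph of ethanol (C-C-O) with one-hot [carbon, oxygen] features, one step lets the central carbon pick up a contribution from the oxygen, and relabelling the atoms leaves the readout vector unchanged.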
Affiliation(s)
- Patrick Reiser
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Marlen Neubert
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- André Eberhard
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Luca Torresi
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Zhou
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Shao
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
- Houssam Metni
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
- Clint van Hoesel
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
- Henrik Schopmans
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Timo Sommer
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
13
Boni YT, Cammarota RC, Liao K, Sigman MS, Davies HML. Leveraging Regio- and Stereoselective C(sp3)-H Functionalization of Silyl Ethers to Train a Logistic Regression Classification Model for Predicting Site-Selectivity Bias. J Am Chem Soc 2022; 144:15549-15561. [PMID: 35977100 DOI: 10.1021/jacs.2c04383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The C-H functionalization of silyl ethers via carbene-induced C-H insertion represents an efficient synthetic disconnection strategy. In this work, site- and stereoselective C(sp3)-H functionalization at α, γ, δ, and even more distal positions to the siloxy group has been achieved using donor/acceptor carbene intermediates. By exploiting the predilections of Rh2(R-TCPTAD)4 and Rh2(S-2-Cl-5-BrTPCP)4 catalysts to target either more electronically activated or more spatially accessible C-H sites, respectively, divergent desired products can be formed with good diastereocontrol and enantiocontrol. Notably, the reaction can also be extended to enable desymmetrization of meso silyl ethers. Leveraging the broad substrate scope examined in this study, we have trained a machine learning classification model using logistic regression to predict the major C-H functionalization site based on intrinsic substrate reactivity and catalyst propensity for overriding it. This model enables prediction of the major product when applying these C-H functionalization methods to a new substrate of interest. Applying this model broadly, we have demonstrated its utility for guiding late-stage functionalization in complex settings and developed an intuitive visualization tool to assist synthetic chemists in such endeavors.
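Boni et al. frame site-selectivity prediction as binary classification with logistic regression. To show the mechanics only, here is a from-scratch logistic regression fitted by batch gradient descent on the mean log-loss. The two per-site descriptors (a hypothetical "electronic activation" and "steric accessibility" score), the toy labels, and the hyperparameters are placeholders, not the features, data or settings used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that a C-H site with descriptors x is the major functionalization site."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.5, epochs: int = 2000,
                              l2: float = 0.0) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on mean log-loss (optional L2 on weights)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (predicted probability - label).
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b
```

On six hypothetical sites where electronically activated, hindered positions are labelled as major sites, the fitted model reproduces all six training labels; with real substrates one would also hold out molecules to check that the classifier generalizes.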
Affiliation(s)
- Yannick T Boni
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- Ryan C Cammarota
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Kuangbiao Liao
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- Matthew S Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Huw M L Davies
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
14
Shim E, Kammeraad JA, Xu Z, Tewari A, Cernak T, Zimmerman PM. Predicting reaction conditions from limited data through active transfer learning. Chem Sci 2022; 13:6655-6668. [PMID: 35756521 PMCID: PMC9172577 DOI: 10.1039/d1sc06932b] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/10/2021] [Accepted: 05/10/2022] [Indexed: 12/30/2022] Open
Abstract
Transfer and active learning have the potential to accelerate the development of new chemical reactions, using prior data and new experiments to inform models that adapt to the target area of interest. This article shows how specifically tuned machine learning models, based on random forest classifiers, can expand the applicability of Pd-catalyzed cross-coupling reactions to types of nucleophiles unknown to the model. First, model transfer is shown to be effective when reaction mechanisms and substrates are closely related, even when models are trained on relatively small numbers of data points. Then, a model simplification scheme is tested and found to provide comparable predictivity on reactions of new nucleophiles that include unseen reagent combinations. Lastly, for a challenging target where model transfer only provides a modest benefit over random selection, an active transfer learning strategy is introduced to improve model predictions. Simple models, composed of a small number of decision trees with limited depths, are crucial for securing generalizability, interpretability, and performance of active transfer learning.
Affiliation(s)
- Eunjae Shim
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Joshua A. Kammeraad
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA; Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ziping Xu
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ambuj Tewari
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Tim Cernak
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA; Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
15
Cammarota RC, Liu W, Bacsa J, Davies HML, Sigman MS. Mechanistically Guided Workflow for Relating Complex Reactive Site Topologies to Catalyst Performance in C–H Functionalization Reactions. J Am Chem Soc 2022; 144:1881-1898. [DOI: 10.1021/jacs.1c12198] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Affiliation(s)
- Ryan C. Cammarota
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Wenbin Liu
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- John Bacsa
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- Huw M. L. Davies
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- Matthew S. Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
16
Towards Data‐Driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning. Angew Chem 2021. [DOI: 10.1002/ange.202106880] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
17
Juliá F, Shao Q, Duan M, Plutschack MB, Berger F, Mateos J, Lu C, Xue XS, Houk KN, Ritter T. High Site Selectivity in Electrophilic Aromatic Substitutions: Mechanism of C-H Thianthrenation. J Am Chem Soc 2021; 143:16041-16054. [PMID: 34546749 PMCID: PMC8499029 DOI: 10.1021/jacs.1c06281] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Indexed: 12/24/2022]
Abstract
The introduction of thianthrene as a linchpin has proven to be a versatile strategy for the C–H functionalization of aromatic compounds, featuring a broad scope and fast diversification. The synthesis of aryl thianthrenium salts has displayed an unusually high para regioselectivity, notably superior to that observed in halogenation or borylation reactions for various substrates. We report an experimental and computational study on the mechanism of aromatic C–H thianthrenation reactions, with an emphasis on the elucidation of the reactive species and the nature of the exquisite site selectivity. Mechanisms involving a direct attack of the arene on the isolated O-trifluoroacetylthianthrene S-oxide (TT+-TFA) or on the thianthrene dication (TT2+) via electron transfer under acidic conditions are identified. A reversible interconversion of the different Wheland-type intermediates before a subsequent, irreversible deprotonation is proposed to be responsible for the exceptional para selectivity of the reaction.
Affiliation(s)
- Fabio Juliá
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm Platz 1, D-45470 Mülheim an der Ruhr, Germany
- Qianzhen Shao
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095-1569 United States
- Meng Duan
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095-1569 United States
- Matthew B Plutschack
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm Platz 1, D-45470 Mülheim an der Ruhr, Germany
- Florian Berger
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm Platz 1, D-45470 Mülheim an der Ruhr, Germany
- Javier Mateos
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm Platz 1, D-45470 Mülheim an der Ruhr, Germany
- Chenxi Lu
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095-1569 United States
- Xiao-Song Xue
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095-1569 United States
- K N Houk
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095-1569 United States
- Tobias Ritter
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm Platz 1, D-45470 Mülheim an der Ruhr, Germany
18
Lasso JD, Castillo-Pazos DJ, Li CJ. Green chemistry meets medicinal chemistry: a perspective on modern metal-free late-stage functionalization reactions. Chem Soc Rev 2021; 50:10955-10982. [PMID: 34382989 DOI: 10.1039/d1cs00380a] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Indexed: 12/26/2022]
Abstract
The progress of drug discovery and development is paced by milestones reached in organic synthesis. In the last decade, the advent of late-stage functionalization (LSF) reactions has represented a valuable breakthrough. Recent literature has defined these reactions as the chemoselective modification of complex molecules by means of C-H functionalization or the manipulation of endogenous functional groups. Traditionally, these diversifications have been accomplished by organometallic means. However, the presence of metals carries disadvantages related to their cost, environmental hazard and health risks. Fundamentally, green chemistry directives can help minimize such hazards through the development of metal-free LSF methodologies. In this review, we expand the current discussion on metal-free LSF reactions by providing an overview of C(sp2)-H and C(sp3)-H functionalizations, as well as the utilization of heteroatom-containing functional groups as chemical handles. Selected topics such as metal-free cross-dehydrogenative coupling (CDC) reactions, organocatalysis, electrochemistry and photochemistry are also discussed. By writing the first review on metal-free LSF methodologies, we aim to highlight current advances in the field with examples that reveal specific challenges and solutions, as well as future research opportunities.
Affiliation(s)
- Juan D Lasso
- Department of Chemistry, FRQNT Centre for Green Chemistry and Catalysis, McGill University, 801 Sherbrooke St. W., Montreal, Quebec H3A 0B8, Canada.
- Durbis J Castillo-Pazos
- Department of Chemistry, FRQNT Centre for Green Chemistry and Catalysis, McGill University, 801 Sherbrooke St. W., Montreal, Quebec H3A 0B8, Canada.
- Chao-Jun Li
- Department of Chemistry, FRQNT Centre for Green Chemistry and Catalysis, McGill University, 801 Sherbrooke St. W., Montreal, Quebec H3A 0B8, Canada.
19
Zubatyuk R, Smith JS, Nebgen BT, Tretiak S, Isayev O. Teaching a neural network to attach and detach electrons from molecules. Nat Commun 2021; 12:4870. [PMID: 34381051 PMCID: PMC8357920 DOI: 10.1038/s41467-021-24904-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 08/25/2020] [Accepted: 07/01/2021] [Indexed: 02/07/2023] Open
Abstract
Interatomic potentials derived with machine learning algorithms such as deep neural networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and allow massive simulations to be performed. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2-3 kcal/mol and spin-charges with errors of ~0.01e for small and medium-sized organic molecules, compared to the reference QM simulations. The AIMNet-NSE model makes it possible to fully bypass QM calculations and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions. We show that these descriptors, along with learned atomic representations, could be used to model chemical reactivity through an example of regioselectivity in electrophilic aromatic substitution reactions.
Affiliation(s)
- Roman Zubatyuk
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Justin S. Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Benjamin T. Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA; Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
20
Xu LC, Zhang SQ, Li X, Tang MJ, Xie PP, Hong X. Towards Data-driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning. Angew Chem Int Ed Engl 2021; 60:22804-22811. [PMID: 34370892 DOI: 10.1002/anie.202106880] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/23/2021] [Revised: 07/14/2021] [Indexed: 11/09/2022]
Abstract
Asymmetric hydrogenation of olefins is one of the most powerful asymmetric transformations in molecular synthesis. Although several privileged catalyst scaffolds are available, catalyst development for asymmetric hydrogenation is still a time- and resource-consuming process due to the lack of a predictive catalyst design strategy. Targeting the data-driven design of asymmetric catalysis, we herein report the development of a standardized database that contains detailed information on over 12000 asymmetric hydrogenations of olefins from the literature. This database provides a valuable platform for machine learning applications in asymmetric catalysis. Based on this database, we developed a hierarchical learning approach that achieves a predictive machine learning model using only dozens of enantioselectivity data points for the target olefin, which offers a useful solution for the few-shot learning problem and will facilitate reaction optimization with new olefin substrates in catalysis screening.
Affiliation(s)
- Li-Cheng Xu
- Zhejiang University, Department of Chemistry, China
- Xin Li
- Zhejiang University, Department of Chemistry, China
- Pei-Pei Xie
- Zhejiang University, Department of Chemistry, China
- Xin Hong
- Zhejiang University, Department of Chemistry, 38 Zheda Road, 310028, Hangzhou, China
21
Late-stage C–H functionalization offers new opportunities in drug discovery. Nat Rev Chem 2021; 5:522-545. [PMID: 37117588 DOI: 10.1038/s41570-021-00300-6] [Citation(s) in RCA: 248] [Impact Index Per Article: 82.7] [Accepted: 06/02/2021] [Indexed: 12/24/2022]
Abstract
Over the past decade, the landscape of molecular synthesis has gained major impetus from the introduction of late-stage functionalization (LSF) methodologies. C-H functionalization approaches, in particular, set the stage for new retrosynthetic disconnections, while leading to improvements in resource economy. A variety of innovative techniques have been successfully applied to the C-H diversification of pharmaceuticals, and these key developments have enabled medicinal chemists to integrate LSF strategies in their drug discovery programmes. This Review highlights the significant advances achieved in the late-stage C-H functionalization of drugs and drug-like compounds, and showcases how the implementation of these modern strategies allows increased efficiency in the drug discovery process. Representative examples are examined and classified by mechanistic patterns involving directed or innate C-H functionalization, as well as emerging reaction manifolds, such as electrosynthesis and biocatalysis, among others. Structurally complex bioactive entities beyond small molecules are also covered, including diversification in the new modalities sphere. The challenges and limitations of current LSF methods are critically assessed, and avenues for future improvements of this rapidly expanding field are discussed. We hereby aim to provide a toolbox for chemists in academia as well as industrial practitioners, and introduce guiding principles for the application of LSF strategies to access new molecules of interest.
22
Besson T, Fruit C. Recent Advances in Transition-Metal-Free Late-Stage C-H and N-H Arylation of Heteroarenes Using Diaryliodonium Salts. Pharmaceuticals (Basel) 2021; 14:661. [PMID: 34358087 PMCID: PMC8308686 DOI: 10.3390/ph14070661] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/22/2021] [Revised: 07/07/2021] [Accepted: 07/09/2021] [Indexed: 12/21/2022] Open
Abstract
Transition-metal-free direct arylation of C-H or N-H bonds is one of the key emerging methodologies that is currently attracting tremendous attention. Diaryliodonium salts serve as a stepping stone on the way to alternative environmentally friendly and straightforward pathways for the construction of C-C and C-heteroatom bonds. In this review, we emphasize the recent synthetic advances of late-stage C(sp2)-N and C(sp2)-C(sp2) bond-forming reactions under metal-free conditions using diaryliodonium salts as arylating reagents and their applications to the synthesis of new arylated bioactive heterocyclic compounds.
Affiliation(s)
- Corinne Fruit
- Normandie University, UNIROUEN, INSA Rouen, CNRS, COBRA UMR 6014, F-76000 Rouen, France
23
Rogge T, Kaplaneris N, Chatani N, Kim J, Chang S, Punji B, Schafer LL, Musaev DG, Wencel-Delord J, Roberts CA, Sarpong R, Wilson ZE, Brimble MA, Johansson MJ, Ackermann L. C–H activation. Nat Rev Methods Primers 2021. [DOI: 10.1038/s43586-021-00041-2] [Citation(s) in RCA: 101] [Impact Index Per Article: 33.7] [Indexed: 12/15/2022]
24
Ree N, Göller AH, Jensen JH. RegioSQM20: improved prediction of the regioselectivity of electrophilic aromatic substitutions. J Cheminform 2021; 13:10. [PMID: 33579374 PMCID: PMC7881568 DOI: 10.1186/s13321-021-00490-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/15/2020] [Accepted: 01/27/2021] [Indexed: 01/01/2023] Open
Abstract
We present RegioSQM20, a new version of RegioSQM (Chem Sci 9:660, 2018), which predicts the regioselectivities of electrophilic aromatic substitution (EAS) reactions from the calculation of proton affinities. The following improvements have been made: the open-source semiempirical tight-binding program xtb is used instead of the closed-source MOPAC program. Any low-energy tautomeric forms of the input molecule are identified and regioselectivity predictions are made for each form. Finally, RegioSQM20 offers a qualitative prediction of the reactivity of each tautomer (low, medium, or high) based on the reaction center with the highest proton affinity. The inclusion of tautomers increases the success rate from 90.7 to 92.7%. RegioSQM20 is compared to two machine learning based models: one developed by Struble et al. (React Chem Eng 5:896, 2020) specifically for regioselectivity predictions of EAS reactions (WLN) and a more generally applicable reactivity predictor (IBM RXN) developed by Schwaller et al. (ACS Cent Sci 5:1572, 2019). RegioSQM20 and WLN offer roughly the same success rates for the entire data sets (without considering tautomers), while WLN is many orders of magnitude faster. The accuracy of the more general IBM RXN approach is somewhat lower: 76.3-85.0%, depending on the data set. The code is freely available under the MIT open source license and will be made available as a webservice (regiosqm.org) in the near future.
Affiliation(s)
- Nicolai Ree
- Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100, Copenhagen, Denmark
- Andreas H Göller
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany.
- Jan H Jensen
- Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100, Copenhagen, Denmark.
25
26
Jorner K, Brinck T, Norrby PO, Buttar D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem Sci 2021; 12:1163-1175. [PMID: 36299676 PMCID: PMC9528810 DOI: 10.1039/d0sc04896h] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Received: 09/04/2020] [Accepted: 11/02/2020] [Indexed: 12/19/2022] Open
Abstract
Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modelling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality experimental kinetic data for the nucleophilic aromatic substitution reaction and use it to predict barriers with a mean absolute error of 0.77 kcal mol−1 for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100–150 rate constants are available for a reaction class. With recent advances in deep learning for quickly predicting barriers and transition state geometries from density functional theory, we envision that hybrid models will soon become a standard alternative to complement current machine learning approaches based on ground-state physical organic descriptors or structural information such as molecular graphs or fingerprints. Hybrid reactivity models, combining mechanistic calculations and machine learning with descriptors, are used to predict barriers for nucleophilic aromatic substitution.
Affiliation(s)
- Kjell Jorner
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield
- Tore Brinck
- Applied Physical Chemistry, Department of Chemistry, CBH, KTH Royal Institute of Technology, Stockholm
- Per-Ola Norrby
- Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg
- David Buttar
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield
27
Copper-Catalyzed C–H Arylation of Fused-Pyrimidinone Derivatives Using Diaryliodonium Salts. Catalysts 2020. [DOI: 10.3390/catal11010028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022] Open
Abstract
Copper-catalyzed Csp2–Csp2 bond-forming reactions through C–H activation remain among the most useful strategies for the diversification of heterocyclic moieties using various coupling partners. A catalytic protocol for the C–H (hetero)arylation of thiazolo[5,4-f]quinazolin-9(8H)-ones, and more generally of fused-pyrimidinones, using a catalytic loading of CuI with diaryliodonium triflates as the aryl source under microwave irradiation has been disclosed. The selectivity of aryl-group transfer was also reported in the case of unsymmetrical diaryliodonium salts. Examples of specific phenylation of valuable fused-pyrimidinones, including quinazolinone, are provided. This strategy enables rapid access to an array of (hetero)arylated N-containing polyheteroaromatics as new potential bioactive compounds.
28
Guan Y, Coley CW, Wu H, Ranasinghe D, Heid E, Struble TJ, Pattanaik L, Green WH, Jensen KF. Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors. Chem Sci 2020; 12:2198-2208. [PMID: 34163985 PMCID: PMC8179287 DOI: 10.1039/d0sc04823b] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Indexed: 12/20/2022] Open
Abstract
Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as selectivity, popular feature engineering and learning methods are either time-consuming or data-hungry. We introduce a new method that combines machine-learned reaction representation with selected quantum mechanical descriptors to predict regio-selectivity in general substitution reactions. We construct a reactivity descriptor database based on ab initio calculations of 130k organic molecules, and train a multi-task constrained model to calculate the required descriptors on-the-fly. The proposed platform improves both interpolation and extrapolation performance for regio-selectivity predictions and enables learning from small datasets with just hundreds of examples. Furthermore, the proposed protocol is demonstrated to be generally applicable to a diverse range of chemical spaces. For three general types of substitution reactions (aromatic C–H functionalization, aromatic C–X substitution, and other substitution reactions) curated from a commercial database, the fusion model achieves 89.7%, 96.7%, and 97.2% top-1 accuracy in predicting the major outcome, respectively, each using 5000 training reactions. Using predicted descriptors, the fusion model is end-to-end, and requires only approximately 70 ms per reaction to predict the selectivity from reaction SMILES strings. Integrating feature learning and on-the-fly feature engineering enables fast and accurate reactivity predictions using large or small datasets.
Affiliation(s)
- Yanfei Guan
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Duminda Ranasinghe
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Thomas J Struble
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA