1
Li SC, Wu H, Menon A, Spiekermann KA, Li YP, Green WH. When Do Quantum Mechanical Descriptors Help Graph Neural Networks to Predict Chemical Properties? J Am Chem Soc 2024. [PMID: 39106041 DOI: 10.1021/jacs.4c04670] [Indexed: 08/07/2024]
Abstract
Deep graph neural networks are extensively utilized to predict chemical reactivity and molecular properties. However, because of the complexity of chemical space, such models often have difficulty extrapolating beyond the chemistry contained in the training set. Augmenting the model with quantum mechanical (QM) descriptors is anticipated to improve its generalizability. However, obtaining QM descriptors often requires CPU-intensive computational chemistry calculations. To identify when QM descriptors help graph neural networks predict chemical properties, we conduct a systematic investigation of the impact of atom, bond, and molecular QM descriptors on the performance of directed message passing neural networks (D-MPNNs) for predicting 16 molecular properties. The analysis surveys computational and experimental targets, as well as classification and regression tasks, and varied data set sizes from several hundred to hundreds of thousands of data points. Our results indicate that QM descriptors are mostly beneficial for D-MPNN performance on small data sets, provided that the descriptors correlate well with the targets and can be readily computed with high accuracy. Otherwise, using QM descriptors can add cost without benefit or even introduce unwanted noise that can degrade model performance. Strategic integration of QM descriptors with D-MPNN unlocks potential for physics-informed, data-efficient modeling with some interpretability that can streamline de novo drug and material designs. To facilitate the use of QM descriptors in machine learning workflows for chemistry, we provide a set of guidelines regarding when and how to best leverage QM descriptors, a high-throughput workflow to compute them, and an enhancement to Chemprop, a widely adopted open-source D-MPNN implementation for chemical property prediction.
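The next sketch illustrates directed message passing in a toy, scalar Python version of a D-MPNN update. It is not Chemprop's implementation, and it makes several simplifying assumptions:
- Hidden states are single numbers rather than vectors.
- One hypothetical weight `w_in` stands in for both the input and output weight matrices.
- Bond features are omitted.
- The function name `dmpnn_embed` is illustrative.

Atom-level QM descriptors (for example, a computed partial charge) are modeled as extra entries appended to each atom's feature vector. That is one way such descriptors can be injected.

```python
from typing import Dict, List, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def dot(a: List[float], b: List[float]) -> float:
    assert len(a) == len(b), "weight and feature lengths must match"
    return sum(x * y for x, y in zip(a, b))


def dmpnn_embed(atom_feats: List[List[float]], bonds: List[Tuple[int, int]],
                w_in: List[float], w_msg: float = 0.8, depth: int = 3) -> float:
    """Toy directed message passing with scalar hidden states (hypothetical helper).

    atom_feats[v] may be [graph features...] + [QM descriptors...]; appending
    descriptors simply widens the input vector (and w_in).
    """
    n = len(atom_feats)
    neighbors: Dict[int, List[int]] = {v: [] for v in range(n)}
    for v, w in bonds:
        neighbors[v].append(w)
        neighbors[w].append(v)
    # Hidden states live on directed edges v -> w, one per bond direction.
    edges = [(v, w) for v in range(n) for w in neighbors[v]]
    h0 = {(v, w): relu(dot(w_in, atom_feats[v])) for v, w in edges}
    h = dict(h0)
    for _ in range(depth - 1):
        # Message into v->w sums states k->v over neighbors k of v EXCEPT w,
        # so information does not bounce straight back along the same bond.
        h = {
            (v, w): relu(h0[(v, w)] + w_msg * sum(h[(k, v)] for k in neighbors[v] if k != w))
            for v, w in edges
        }
    # Atom embedding: own input projection plus all incoming edge states.
    atom_emb = [relu(dot(w_in, atom_feats[v]) + sum(h[(k, v)] for k in neighbors[v]))
                for v in range(n)]
    return sum(atom_emb)  # sum pooling gives a permutation-invariant readout
```

For example, `dmpnn_embed([[1.0], [2.0]], [(0, 1)], [0.5])` returns 3.0. In a diatomic molecule the reverse-edge exclusion blocks every message, so each atom adds only its neighbor's initial edge state to its own input.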
Affiliation(s)
- Shih-Cheng Li
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

2
Atz K, Nippa DF, Müller AT, Jost V, Anelli A, Reutlinger M, Kramer C, Martin RE, Grether U, Schneider G, Wuitschik G. Geometric deep learning-guided Suzuki reaction conditions assessment for applications in medicinal chemistry. RSC Med Chem 2024; 15:2310-2321. [PMID: 39026644 PMCID: PMC11253849 DOI: 10.1039/d4md00196f] [Received: 03/21/2024] [Accepted: 05/25/2024] [Indexed: 07/20/2024]
Abstract
Suzuki cross-coupling reactions are considered a valuable tool for constructing carbon-carbon bonds in small molecule drug discovery. However, the synthesis of chemical matter often represents a time-consuming and labour-intensive bottleneck. We demonstrate how machine learning methods trained on high-throughput experimentation (HTE) data can be leveraged to enable fast reaction condition selection for novel coupling partners. We show that the trained models support chemists in determining suitable catalyst-solvent-base combinations for individual transformations, including an evaluation of the need for HTE screening. We introduce an algorithm for designing 96-well plates optimized towards reaction yields and discuss the model performance of zero- and few-shot machine learning. The best-performing machine learning model achieved a three-category classification accuracy of 76.3% (±0.2%) and an F1-score for a binary classification of 79.1% (±0.9%). Validation on eight reactions revealed an area under the receiver operating characteristic curve (ROC-AUC) of 0.82 (±0.07) for few-shot machine learning. On the other hand, zero-shot machine learning models achieved a mean ROC-AUC value of 0.63 (±0.16). This study positively advocates the application of few-shot machine learning-guided reaction condition selection for HTE campaigns in medicinal chemistry and highlights practical applications as well as challenges associated with zero-shot machine learning.
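The abstract reports F1 and ROC-AUC values. A minimal Python sketch of the standard binary definitions follows. It is not the authors' evaluation code, and the labels and scores in the example call are made up.

ROC-AUC is computed here through its pairwise (Mann-Whitney) interpretation: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half.

```python
from typing import List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. The pairwise loop costs O(n_pos · n_neg). That is fine for small validation sets, but a sort-based version is preferable for large data.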
Affiliation(s)
- Kenneth Atz
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Vera Jost
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Andrea Anelli
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Michael Reutlinger
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Christian Kramer
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich Vladimir-Prelog-Weg 4 8093 Zurich Switzerland
- Georg Wuitschik
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland

3
Keto A, Guo T, Underdue M, Stuyver T, Coley CW, Zhang X, Krenske EH, Wiest O. Data-Efficient, Chemistry-Aware Machine Learning Predictions of Diels-Alder Reaction Outcomes. J Am Chem Soc 2024; 146:16052-16061. [PMID: 38822795 DOI: 10.1021/jacs.4c03131] [Indexed: 06/03/2024]
Abstract
The application of machine learning models to the prediction of reaction outcomes currently needs large and/or highly featurized data sets. We show that a chemistry-aware model, NERF, which mimics the bonding changes that occur during reactions, allows for highly accurate predictions of the outcomes of Diels-Alder reactions using a relatively small training set, with no pretraining and no additional features. We establish a diverse data set of 9537 intramolecular, hetero-, aromatic, and inverse electron demand Diels-Alder reactions. This data set is used to train a NERF model, and the performance is compared against state-of-the-art classification and generative machine learning models across low- and high-data regimes, with and without pretraining. The predictive accuracy (regio- and site selectivity in the major product) achieved by NERF exceeds 90% when as little as 40% of the data set is used for training. Another high-performing model, Chemformer, requires a larger training data set (>45%) and pretraining to reach 90% Top-1 accuracy. Accurate predictions of less-represented reaction subclasses, such as those involving heteroatomic or aromatic substrates, require higher percentages of training data. We also show how NERF can use small amounts of additional training data to quickly learn new systems and improve its overall understanding of reactivity. Synthetic chemists stand to benefit as this model can be rapidly expanded and tailored to areas of chemistry corresponding to the low-data regime.
Affiliation(s)
- Angus Keto
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Taicheng Guo
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Morgan Underdue
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xiangliang Zhang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Elizabeth H Krenske
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Olaf Wiest
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States

4
Luchini G, Paton RS. Bottom-Up Atomistic Descriptions of Top-Down Macroscopic Measurements: Computational Benchmarks for Hammett Electronic Parameters. ACS Phys Chem Au 2024; 4:259-267. [PMID: 38800724 PMCID: PMC11117679 DOI: 10.1021/acsphyschemau.3c00045] [Received: 08/19/2023] [Revised: 01/14/2024] [Accepted: 01/16/2024] [Indexed: 05/29/2024]
Abstract
The ability to relate substituent electronic effects to chemical reactivity is a cornerstone of physical organic chemistry and Linear Free Energy Relationships. The computation of electronic parameters is increasingly attractive since they can be obtained rapidly for structures and substituents without available experimental data and can be applied beyond aromatic substituents, for example, in studies of transition metal complexes and aliphatic and radical systems. Nevertheless, the description of "top-down" macroscopic observables, such as Hammett parameters, using a "bottom-up" computational approach poses several challenges for the practitioner. We have examined and benchmarked the performance of various computational charge schemes encompassing quantum mechanical methods that partition charge density, methods that fit charge to physical observables, and methods enhanced by semiempirical adjustments alongside NMR values. We study the locations of the atoms used to obtain these descriptors and their correlation with empirical Hammett parameters and rate differences resulting from electronic effects. These seemingly small choices have a much more significant impact than previously imagined, outweighing the level of theory or basis set used. We observe a wide range of performance across the different computational protocols and observe stark and surprising differences in the ability of computational parameters to capture para- vs meta-electronic effects. In general, σm predictions fare much worse than σp. As a result, the choice of where to compute these descriptors (on the ring carbons or on the attached H or other substituent atoms) affects their ability to capture experimental electronic differences.
Density-based schemes, such as Hirshfeld charges, are more stable toward unphysical charge perturbations that result from nearby functional groups and outperform all other computational descriptors, including several commonly used basis set based schemes such as Natural Population Analysis. Using attached atoms also improves the statistical correlations. We obtained general linear relationships for the global prediction of experimental Hammett parameters from computed descriptors for use in statistical modeling studies.
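The Hammett relation and the descriptor-to-σ regression described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's fitted model:
- `fit_line` is a plain ordinary-least-squares fit of experimental σ values against one computed descriptor, such as a Hirshfeld charge.
- `hammett_rate_ratio` applies the Hammett equation log10(kX/kH) = ρσ.

Both function names are hypothetical. Any slope, intercept or ρ value would have to come from real data.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y ≈ a*x + b; returns (a, b, r_squared).

    Here x would be a computed descriptor and y the experimental sigma values.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def hammett_rate_ratio(rho: float, sigma: float) -> float:
    """Hammett equation: log10(k_X / k_H) = rho * sigma, so k_X / k_H = 10**(rho*sigma)."""
    return 10 ** (rho * sigma)
```

With exactly linear toy data, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns `(2.0, 1.0, 1.0)`. Separate fits for σm and σp may be worth keeping apart, given the different performance reported above.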
Affiliation(s)
- Guilian Luchini
- Department of Chemistry, Colorado State University, 1301 Center Ave., Ft. Collins, Colorado 80523-1872, United States
- Robert S. Paton
- Department of Chemistry, Colorado State University, 1301 Center Ave., Ft. Collins, Colorado 80523-1872, United States

5
van Gerwen P, Briling KR, Calvino Alonso Y, Franke M, Corminboeuf C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. Digital Discovery 2024; 3:932-943. [PMID: 38756222 PMCID: PMC11094696 DOI: 10.1039/d3dd00175j] [Received: 09/06/2023] [Accepted: 02/28/2024] [Indexed: 05/18/2024]
Abstract
In recent years, there has been a surge of interest in predicting computed activation barriers, to enable the acceleration of the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, which may hold promise for barrier prediction as well. Here, we bring together all of these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATMd, B2Rl2, EquiReact and language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets.
Affiliation(s)
- Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Ksenia R Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Yannick Calvino Alonso
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Malte Franke
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland

6
Zhao XG, Yang Q, Xu Y, Liu QY, Li ZY, Liu XX, Zhao YX, He SG. Machine Learning for Experimental Reactivity of a Set of Metal Clusters toward C-H Activation. J Am Chem Soc 2024; 146:12485-12495. [PMID: 38651836 DOI: 10.1021/jacs.4c00501] [Indexed: 04/25/2024]
Abstract
Understanding the mechanisms of C-H activation of alkanes is a very important research topic. The reactions of metal clusters with alkanes have been extensively studied to reveal the electronic features governing C-H activation, while the experimental cluster reactivity was qualitatively interpreted case by case in the literature. Herein, we prepared and mass-selected over 100 rhodium-based clusters (RhxVyOz- and RhxCoyOz-) to react with light alkanes, enabling the determination of reaction rate constants spanning six orders of magnitude. A satisfactory model being able to quantitatively describe the rate data in terms of multiple cluster electronic features (average electron occupancy of valence s orbitals, the minimum natural charge on the metal atom, cluster polarizability, and energy gap involved in the agostic interaction) has been constructed through a machine learning approach. This study demonstrates that the general mechanisms governing the very important process of C-H activation by diverse metal centers can be discovered by interpreting experimental data with artificial intelligence.
Affiliation(s)
- Xi-Guan Zhao
- State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Qi Yang
- State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Ying Xu
- State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Qing-Yu Liu
- State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Zi-Yu Li
- State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Xiao-Xiao Liu
- State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Yan-Xia Zhao
- State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Sheng-Gui He
- State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China

7
Shields JD, Howells R, Lamont G, Leilei Y, Madin A, Reimann CE, Rezaei H, Reuillon T, Smith B, Thomson C, Zheng Y, Ziegler RE. AiZynth impact on medicinal chemistry practice at AstraZeneca. RSC Med Chem 2024; 15:1085-1095. [PMID: 38665822 PMCID: PMC11042116 DOI: 10.1039/d3md00651d] [Received: 11/20/2023] [Accepted: 02/15/2024] [Indexed: 04/28/2024]
Abstract
AstraZeneca chemists have been using the AI retrosynthesis tool AiZynth for three years. In this article, we present seven examples of how medicinal chemists using AiZynth positively impacted their drug discovery programmes. These programmes run the gamut from early-stage hit confirmation to late-stage route optimisation efforts. We also discuss the different use cases for which AI retrosynthesis tools are best suited.
Affiliation(s)
- Jason D Shields
- Early Oncology R&D, AstraZeneca 35 Gatehouse Drive Waltham MA 02451 USA
- Rachel Howells
- Early Oncology R&D, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
- Gillian Lamont
- Early Oncology R&D, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
- Yin Leilei
- Pharmaron Beijing Co., Ltd. 6 Taihe Road BDA Beijing 100176 P.R. China
- Andrew Madin
- Discovery Sciences, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
- Hadi Rezaei
- Early Oncology R&D, AstraZeneca 35 Gatehouse Drive Waltham MA 02451 USA
- Tristan Reuillon
- Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca Pepparedsleden 1 43183 Mölndal Sweden
- Bryony Smith
- Early Oncology R&D, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
- Clare Thomson
- Early Oncology R&D, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
- Yuting Zheng
- Pharmaron Beijing Co., Ltd. 6 Taihe Road BDA Beijing 100176 P.R. China
- Robert E Ziegler
- Early Oncology R&D, AstraZeneca 35 Gatehouse Drive Waltham MA 02451 USA

8
Strieth-Kalthoff F, Szymkuć S, Molga K, Aspuru-Guzik A, Glorius F, Grzybowski BA. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge. J Am Chem Soc 2024. [PMID: 38598363 DOI: 10.1021/jacs.4c00338] [Indexed: 04/12/2024]
Abstract
Rapid advancements in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep chemical syntheses should conceptually be well-suited for AI. Yet, the development of AI synthesis planners trained solely on reaction-example data has stagnated and is not on par with the performance of "hybrid" algorithms combining AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established reasoning of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases that are specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and the reasoning strategies of domain experts. By actively involving synthetic chemists, who are the end users of any synthesis planning software, in the development process, we envision bridging the gap between computer algorithms and the intricate nature of chemical synthesis.
Affiliation(s)
- Felix Strieth-Kalthoff
- University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
- Sara Szymkuć
- Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Karol Molga
- Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Alán Aspuru-Guzik
- University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave., Toronto, Ontario M5G 1M1, Canada
- University of Toronto, Department of Chemical Engineering and Applied Chemistry, 200 College St., Toronto, Ontario M5S 3E5, Canada
- University of Toronto, Department of Materials Science and Engineering, 184 College St., Toronto, Ontario M5S 3E4, Canada
- Frank Glorius
- Universität Münster, Organisch-Chemisches Institut, Corrensstr. 36, 48149 Münster, Germany
- Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- IBS Center for Algorithmic and Robotized Synthesis, CARS, UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
- Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea

9
Chung Y, Green WH. Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates. Chem Sci 2024; 15:2410-2424. [PMID: 38362410 PMCID: PMC10866337 DOI: 10.1039/d3sc05353a] [Received: 10/10/2023] [Accepted: 01/04/2024] [Indexed: 02/17/2024]
Abstract
Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that generalize across diverse reactions and solvents. In this work, we generate a large set of data with the COSMO-RS method for over 28 000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) for a solution phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal mol⁻¹ for ΔΔG‡solv and ΔΔH‡solv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants within a factor of 4 when tested on experimental data. The presented model can provide nearly instantaneous predictions of kinetic solvent effects or relative rate constants for a broad range of neutral closed-shell or free radical reactions and solvents based only on atom-mapped reaction SMILES and solvent SMILES strings.
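Under conventional transition state theory, and assuming the same standard state in both phases, the solvation free energy of activation maps to a rate-constant ratio through k_solv/k_gas = exp(−ΔΔG‡solv/RT). The Python sketch below applies that textbook relation. It is not the authors' model, and the function names are hypothetical.

The sketch also converts an error in ΔΔG‡solv into a multiplicative error in k. At 298.15 K, an error of 0.71 kcal mol⁻¹ corresponds to a factor of roughly 3.3.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def relative_rate(ddg_solv_a: float, ddg_solv_b: float, temp_k: float = 298.15) -> float:
    """k_A / k_B for one reaction in solvents A and B (TST sketch).

    Each argument is that solvent's ddG_solv of activation in kcal/mol, i.e.
    solvation free energy of the TS minus that of the reactants.
    """
    return math.exp(-(ddg_solv_a - ddg_solv_b) / (R_KCAL * temp_k))


def error_factor(ddg_error: float, temp_k: float = 298.15) -> float:
    """Multiplicative rate-constant error implied by an error in ddG_solv (kcal/mol)."""
    return math.exp(abs(ddg_error) / (R_KCAL * temp_k))
```

A more negative ΔΔG‡solv (a transition state stabilized more than the reactants) gives a faster reaction in that solvent. For example, `relative_rate(-2.0, 0.0)` is greater than 1.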
Affiliation(s)
- Yunsie Chung
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA

10
Nippa DF, Atz K, Hohler R, Müller AT, Marx A, Bartelmus C, Wuitschik G, Marzuoli I, Jost V, Wolfard J, Binder M, Stepan AF, Konrad DB, Grether U, Martin RE, Schneider G. Enabling late-stage drug diversification by high-throughput experimentation with geometric deep learning. Nat Chem 2024; 16:239-248. [PMID: 37996732 PMCID: PMC10849962 DOI: 10.1038/s41557-023-01360-5] [Received: 10/21/2022] [Accepted: 10/03/2023] [Indexed: 11/25/2023]
Abstract
Late-stage functionalization is an economical approach to optimize the properties of drug candidates. However, the chemical complexity of drug molecules often makes late-stage diversification challenging. To address this problem, a late-stage functionalization platform based on geometric deep learning and high-throughput reaction screening was developed. Considering borylation as a critical step in late-stage functionalization, the computational model predicted reaction yields for diverse reaction conditions with a mean absolute error margin of 4-5%, while the reactivity of novel reactions with known and unknown substrates was classified with a balanced accuracy of 92% and 67%, respectively. The regioselectivity of the major products was accurately captured with a classifier F-score of 67%. When applied to 23 diverse commercial drug molecules, the platform successfully identified numerous opportunities for structural diversification. The influence of steric and electronic information on model performance was quantified, and a comprehensive simple user-friendly reaction format was introduced that proved to be a key enabler for seamlessly integrating deep learning and high-throughput experimentation for late-stage functionalization.
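Balanced accuracy, used above for the reactivity classifier, averages recall over classes. As a result, a model that always predicts the majority class cannot score well on an imbalanced set. A minimal Python sketch of the standard definition follows. The function name and the toy labels in the example are illustrative assumptions, not the authors' code or data.

```python
from typing import List


def balanced_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Mean of per-class recall; for binary labels this is (TPR + TNR) / 2."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))  # recall for class c
    return sum(recalls) / len(recalls)
```

For instance, predicting "reactive" (1) for all six cases in `balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1])` gives a plain accuracy of 4/6 but a balanced accuracy of 0.5.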
Affiliation(s)
- David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany
- Kenneth Atz
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Remo Hohler
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Andreas Marx
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Christian Bartelmus
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Georg Wuitschik
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Irene Marzuoli
- Process Chemistry and Catalysis (PCC), F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Vera Jost
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Jens Wolfard
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Martin Binder
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Antonia F Stepan
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- David B Konrad
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany
- Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- ETH Singapore SEC Ltd, Singapore, Singapore

11
King-Smith E, Faber FA, Reilly U, Sinitskiy AV, Yang Q, Liu B, Hyek D, Lee AA. Predictive Minisci late stage functionalization with transfer learning. Nat Commun 2024; 15:426. [PMID: 38225239 PMCID: PMC10789750 DOI: 10.1038/s41467-023-42145-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/01/2023] [Indexed: 01/17/2024]
Abstract
Structural diversification of lead molecules is a key component of drug discovery to explore chemical space. Late-stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF is still an open challenge in the field. Numerous efforts from chemoinformatics and machine learning (ML) groups have made strides in this area. However, it is arduous to isolate and characterize the multitude of LSF products generated, limiting available data and hindering pure ML approaches. We report the development of an approach that combines a message passing neural network and ¹³C NMR-based transfer learning to predict the atom-wise probabilities of functionalization for Minisci and P450-based functionalizations. We validated our model both retrospectively and with a series of prospective experiments, showing that it accurately predicts the outcomes of Minisci-type and P450 transformations and outperforms the well-established Fukui-based reactivity indices and other machine learning reactivity-based algorithms.
Affiliation(s)
- Emma King-Smith
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Felix A Faber
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Usa Reilly
- Development & Medical, Pfizer Worldwide Research, Groton, CT, USA
- Anton V Sinitskiy
- Machine Learning Computational Sciences, Pfizer Worldwide Research, Cambridge, MA, USA
- Qingyi Yang
- Development & Medical, Pfizer Worldwide Research, Cambridge, MA, USA
- Bo Liu
- Spectrix Analytic Services, LLC, North Haven, CT, USA
- Dennis Hyek
- Spectrix Analytic Services, LLC, North Haven, CT, USA
- Alpha A Lee
- Cavendish Laboratory, University of Cambridge, Cambridge, UK

12
Heid E, Greenman KP, Chung Y, Li SC, Graff DE, Vermeire FH, Wu H, Green WH, McGill CJ. Chemprop: A Machine Learning Package for Chemical Property Prediction. J Chem Inf Model 2024; 64:9-17. [PMID: 38147829 PMCID: PMC10777403 DOI: 10.1021/acs.jcim.3c01250] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 08/08/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 12/28/2023]
Abstract
Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.
Affiliation(s)
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Institute of Materials Chemistry, TU Wien, 1060 Vienna, Austria
- Kevin P. Greenman
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Shih-Cheng Li
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- David E. Graff
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
- Florence H. Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, B-3001 Leuven, Belgium
- Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Charles J. McGill
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, Virginia 23284, United States

13
Raghavan P, Haas BC, Ruos ME, Schleinitz J, Doyle AG, Reisman SE, Sigman MS, Coley CW. Dataset Design for Building Models of Chemical Reactivity. ACS Cent Sci 2023; 9:2196-2204. [PMID: 38161380 PMCID: PMC10755851 DOI: 10.1021/acscentsci.3c01163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 11/06/2023] [Accepted: 11/15/2023] [Indexed: 01/03/2024]
Abstract
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most decisive factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Affiliation(s)
- Priyanka Raghavan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Brittany C. Haas
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Madeline E. Ruos
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jules Schleinitz
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Abigail G. Doyle
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Sarah E. Reisman
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Matthew S. Sigman
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

14
Pattanaik L, Menon A, Settels V, Spiekermann KA, Tan Z, Vermeire FH, Sandfort F, Eiden P, Green WH. ConfSolv: Prediction of Solute Conformer-Free Energies across a Range of Solvents. J Phys Chem B 2023; 127:10151-10170. [PMID: 37966798 DOI: 10.1021/acs.jpcb.3c05904] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/16/2023]
Abstract
Predicting the Gibbs free energy of solution is key to understanding solvent effects on thermodynamics and reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after relevant conformers have been generated, determining their free energies of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS and quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings, with predictions usually well within 1 kcal/mol of the solution free energy calculated with computational chemistry methods.
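The abstract above describes predicting relative solution free energies of conformers in order to keep only a small, thermodynamically relevant subset. One common way to turn relative free energies into such a subset is Boltzmann weighting, p_i = exp(-ΔG_i/RT) / Σ_j exp(-ΔG_j/RT), followed by keeping the lowest-energy conformers until their cumulative population reaches a chosen coverage. The sketch below illustrates that idea under stated assumptions: T = 298.15 K, energies in kcal/mol, and a hypothetical `coverage` threshold. It is not the selection criterion used by ConfSolv, and the free energies are made-up inputs.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(rel_g_kcal, temperature=298.15):
    """Boltzmann populations from relative free energies (kcal/mol)."""
    rt = R_KCAL * temperature
    g_min = min(rel_g_kcal)  # shift by the minimum to avoid overflow/underflow
    weights = [math.exp(-(g - g_min) / rt) for g in rel_g_kcal]
    total = sum(weights)
    return [w / total for w in weights]


def relevant_conformers(rel_g_kcal, coverage=0.99, temperature=298.15):
    """Indices of the most populated conformers whose cumulative population reaches `coverage`."""
    pops = boltzmann_populations(rel_g_kcal, temperature)
    order = sorted(range(len(pops)), key=lambda i: pops[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        selected.append(i)
        cumulative += pops[i]
        if cumulative >= coverage:
            break
    return selected


# Hypothetical relative solution free energies (kcal/mol) for five conformers of one solute.
g = [0.0, 0.4, 1.5, 3.0, 4.2]
print(relevant_conformers(g, coverage=0.95))  # [0, 1, 2]
```

Because RT is only about 0.59 kcal/mol at room temperature, conformers more than roughly 3 kcal/mol above the minimum contribute well under 1% of the population, so a 1 kcal/mol prediction error mainly matters for the conformers near the bottom of the ranking.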
Affiliation(s)
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Volker Settels
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Zipei Tan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Florence H Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, Leuven 3001, Belgium
- Frederik Sandfort
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Philipp Eiden
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

15
Nippa DF, Atz K, Müller AT, Wolfard J, Isert C, Binder M, Scheidegger O, Konrad DB, Grether U, Martin RE, Schneider G. Identifying opportunities for late-stage C-H alkylation with high-throughput experimentation and in silico reaction screening. Commun Chem 2023; 6:256. [PMID: 37985850 PMCID: PMC10661846 DOI: 10.1038/s42004-023-01047-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/08/2023] [Accepted: 10/30/2023] [Indexed: 11/22/2023]
Abstract
Enhancing the properties of advanced drug candidates is aided by the direct incorporation of specific chemical groups, avoiding the need to construct the entire compound from the ground up. Nevertheless, the chemical intricacy of these candidates often makes it difficult to predict their reactivity in C-H activation reactions and to plan their synthesis. We adopted a reaction screening approach that combines high-throughput experimentation (HTE) at a nanomolar scale with graph neural networks (GNNs). This approach aims to identify suitable substrates for late-stage C-H alkylation using Minisci-type chemistry. GNNs were trained on experimentally generated reactions derived from in-house HTE and literature data. The trained models were then used to prospectively predict the coupling of 3180 advanced heterocyclic building blocks with a diverse set of sp3-rich carboxylic acids, exploring the substrate landscape for Minisci-type alkylations. Promising candidates were selected, their synthesis was scaled up, and the products were isolated and characterized. This process yielded 30 novel, functionally modified molecules that hold potential for further refinement. These results support the application of HTE-based machine learning to virtual reaction screening.
Affiliation(s)
- David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
- Kenneth Atz
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Jens Wolfard
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Clemens Isert
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Martin Binder
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Oliver Scheidegger
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- David B Konrad
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
- Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland

16
Li J, Wu N, Zhang J, Wu HH, Pan K, Wang Y, Liu G, Liu X, Yao Z, Zhang Q. Machine Learning-Assisted Low-Dimensional Electrocatalysts Design for Hydrogen Evolution Reaction. Nanomicro Lett 2023; 15:227. [PMID: 37831203 PMCID: PMC10575847 DOI: 10.1007/s40820-023-01192-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/26/2023] [Accepted: 08/10/2023] [Indexed: 10/14/2023]
Abstract
Efficient electrocatalysts are crucial for hydrogen generation from electrolyzing water. Nevertheless, the conventional "trial and error" method for producing advanced electrocatalysts is not only cost-ineffective but also time-consuming and labor-intensive. Fortunately, the advancement of machine learning brings new opportunities for electrocatalyst discovery and design. By analyzing experimental and theoretical data, machine learning can effectively predict the hydrogen evolution reaction (HER) performance of electrocatalysts. This review summarizes recent developments in machine learning for low-dimensional electrocatalysts, including zero-dimensional nanoparticles and nanoclusters, one-dimensional nanotubes and nanowires, two-dimensional nanosheets, as well as other electrocatalysts. In particular, the effects of descriptors and algorithms on screening low-dimensional electrocatalysts and investigating their HER performance are highlighted. Finally, the future directions and perspectives for machine learning in electrocatalysis are discussed, emphasizing the potential for machine learning to accelerate electrocatalyst discovery, optimize their performance, and provide new insights into electrocatalytic mechanisms. Overall, this work offers an in-depth understanding of the current state of machine learning in electrocatalysis and its potential for future research.
Affiliation(s)
- Jin Li
- College of Chemistry and Chemical Engineering, and Henan Key Laboratory of Function-Oriented Porous Materials, Luoyang Normal University, Luoyang, 471934, People's Republic of China
- Naiteng Wu
- College of Chemistry and Chemical Engineering, and Henan Key Laboratory of Function-Oriented Porous Materials, Luoyang Normal University, Luoyang, 471934, People's Republic of China
- Jian Zhang
- New Energy Technology Engineering Lab of Jiangsu Province, College of Science, Nanjing University of Posts and Telecommunications (NUPT), Nanjing, 210023, People's Republic of China
- Hong-Hui Wu
- School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing, 100083, People's Republic of China
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE, 8588, USA
- Kunming Pan
- Henan Key Laboratory of High-Temperature Structural and Functional Materials, National Joint Engineering Research Center for Abrasion Control and Molding of Metal Materials, Henan University of Science and Technology, Luoyang, 471003, People's Republic of China
- Yingxue Wang
- National Engineering Laboratory for Risk Perception and Prevention, Beijing, 100041, People's Republic of China
- Guilong Liu
- College of Chemistry and Chemical Engineering, and Henan Key Laboratory of Function-Oriented Porous Materials, Luoyang Normal University, Luoyang, 471934, People's Republic of China
- Xianming Liu
- College of Chemistry and Chemical Engineering, and Henan Key Laboratory of Function-Oriented Porous Materials, Luoyang Normal University, Luoyang, 471934, People's Republic of China
- Zhenpeng Yao
- Center of Hydrogen Science, Shanghai Jiao Tong University, Shanghai, 200000, People's Republic of China
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200000, People's Republic of China
- Qiaobao Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Materials, Xiamen University, Xiamen, 361005, People's Republic of China

17
Shilpa S, Kashyap G, Sunoj RB. Recent Applications of Machine Learning in Molecular Property and Chemical Reaction Outcome Predictions. J Phys Chem A 2023; 127:8253-8271. [PMID: 37769193 DOI: 10.1021/acs.jpca.3c04779] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 09/30/2023]
Abstract
Burgeoning developments in machine learning (ML) and its rapidly growing adaptations in chemistry are noteworthy. Motivated by the successful deployments of ML in the realm of molecular property prediction (MPP) and chemical reaction prediction (CRP), herein we highlight some of its most recent applications in predictive chemistry. We present a nonmathematical and concise overview of the progression of ML implementations, ranging from an ensemble-based random forest model to advanced graph neural network algorithms. Similarly, the prospects of various feature engineering and feature learning approaches that work in conjunction with ML models are described. Highly accurate predictions reported in MPP tasks (e.g., lipophilicity, solubility, distribution coefficient), using methods such as D-MPNN, MolCLR, SMILES-BERT, and MolBERT, offer promising avenues in molecular design and drug discovery. Whereas MPP pertains to a given molecule, ML applications in chemical reactions present a different level of challenge, primarily arising from the simultaneous involvement of multiple molecules and their diverse roles in a reaction setting. The reported RMSEs in MPP tasks range from 0.287 to 2.20, while those for yield predictions are well over 4.9 at the lower end and exceed 10.0 in several examples. Our Review concludes with a set of persisting challenges in dealing with reaction data sets and an overall optimistic outlook on the benefits of ML-driven workflows for various MPP as well as CRP tasks.
Affiliation(s)
- Shilpa Shilpa
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Gargee Kashyap
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

18
Biswas S, Chung Y, Ramirez J, Wu H, Green WH. Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning. J Chem Inf Model 2023; 63:4574-4588. [PMID: 37487557 DOI: 10.1021/acs.jcim.3c00546] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/26/2023]
Abstract
Knowledge of critical properties, such as critical temperature, pressure, density, as well as acentric factor, is essential to calculate thermo-physical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining using estimated data to optimize model performance. Our final model utilizes a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits where it shows state-of-the-art accuracies. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.
Affiliation(s)
- Sayandeep Biswas
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Josephine Ramirez
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

19
Lin Z, Dhawa U, Hou X, Surke M, Yuan B, Li SW, Liou YC, Johansson MJ, Xu LC, Chao CH, Hong X, Ackermann L. Electrocatalyzed direct arene alkenylations without directing groups for selective late-stage drug diversification. Nat Commun 2023; 14:4224. [PMID: 37454167 DOI: 10.1038/s41467-023-39747-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/24/2023] [Accepted: 06/22/2023] [Indexed: 07/18/2023]
Abstract
Electrooxidation has emerged as an increasingly viable platform in molecular syntheses that can avoid stoichiometric chemical redox agents. Despite major progress in electrochemical C-H activations, these arene functionalizations generally require directing groups to enable the C-H activation. The installation and removal of these directing groups call for additional synthesis steps, which jeopardizes the inherent efficacy of the electrochemical C-H activation approach, leading to undesired waste with reduced step and atom economy. In sharp contrast, herein we present palladium-electrochemical C-H olefinations of simple arenes devoid of exogenous directing groups. The robust electrocatalysis protocol proved amenable to a wide range of both electron-rich and electron-deficient arenes under exceedingly mild reaction conditions, avoiding chemical oxidants. This study points to an interesting approach involving two electrochemical transformations that enables outstanding levels of position-selectivity in direct olefinations of electron-rich anisoles. A physical organic parameter-based machine learning model was developed to predict position-selectivity in electrochemical C-H olefinations. Furthermore, late-stage functionalizations set the stage for the direct C-H olefinations of structurally complex pharmaceutically relevant compounds, thereby avoiding protection and directing group manipulations.
Affiliation(s)
- Zhipeng Lin
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Göttingen, Germany
- Uttam Dhawa
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Göttingen, Germany
- Xiaoyan Hou
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Göttingen, Germany
- Max Surke
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Göttingen, Germany
- Binbin Yuan
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Göttingen, Germany
- Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, China
- Yan-Cheng Liou
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Göttingen, Germany
- Magnus J Johansson
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Department of Organic Chemistry, Stockholm University, Stockholm, Sweden
- Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, China
- Chen-Hang Chao
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, China
- Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, China
- Beijing National Laboratory for Molecular Sciences, Beijing, PR China
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, Hangzhou, Zhejiang Province, China
- Lutz Ackermann
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Göttingen, Germany
- German Centre for Cardiovascular Research (DZHK), Berlin, Germany

20
Li SW, Xu LC, Zhang C, Zhang SQ, Hong X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat Commun 2023; 14:3569. [PMID: 37322041 DOI: 10.1038/s41467-023-39283-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/14/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Accurate prediction of reactivity and selectivity provides the desired guideline for synthetic development. Due to the high-dimensional relationship between molecular structure and synthetic function, it is challenging to achieve the predictive modelling of synthetic transformation with the required extrapolative ability and chemical interpretability. To bridge the gap between the rich domain knowledge of chemistry and the advanced molecular graph model, herein we report a knowledge-based graph model that embeds the digitalized steric and electronic information. In addition, a molecular interaction module is developed to enable the learning of the synergistic influence of reaction components. In this study, we demonstrate that this knowledge-based graph model achieves excellent predictions of reaction yield and stereoselectivity, whose extrapolative ability is corroborated by additional scaffold-based data splittings and experimental verifications with new catalysts. Because the local environment is embedded, the model allows atomic-level interpretation of the steric and electronic influence on the overall synthetic performance, which serves as a useful guide for the molecular engineering towards the target synthetic function. This model offers an extrapolative and interpretable approach for reaction performance prediction, pointing out the importance of chemical knowledge-constrained reaction modelling for synthetic purpose.
Affiliation(s)
- Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Cheng Zhang
- Department of Chemistry, University of Science and Technology of China, Hefei, China
- Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, PR China
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China

21
Ektefaie Y, Dasoulas G, Noori A, Farhat M, Zitnik M. Multimodal learning with graphs. Nat Mach Intell 2023; 5:340-350. [PMID: 38076673 PMCID: PMC10704992 DOI: 10.1038/s42256-023-00624-6] [Received: 09/25/2022] [Accepted: 02/01/2023] [Indexed: 04/05/2023]
Abstract
Artificial intelligence for graphs has achieved remarkable success in modeling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, the increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases-the set of assumptions that algorithms use to make predictions for inputs they have not encountered during training. Learning on multimodal datasets presents fundamental challenges because the inductive biases can vary by data modality and graphs might not be explicitly given in the input. To address these challenges, multimodal graph AI methods combine different modalities while leveraging cross-modal dependencies using graphs. Diverse datasets are combined using graphs and fed into sophisticated multimodal architectures, specified as image-intensive, knowledge-grounded and language-intensive models. Using this categorization, we introduce a blueprint for multimodal graph learning, use it to study existing methods and provide guidelines to design new models.
Affiliation(s)
- Yasha Ektefaie
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA 02115, USA
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- George Dasoulas
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Harvard Data Science Initiative, Cambridge, MA 02138, USA
- Ayush Noori
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Harvard College, Cambridge, MA 02138, USA
- Maha Farhat
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Division of Pulmonary and Critical Care, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard Data Science Initiative, Cambridge, MA 02138, USA

22
Noto N, Yada A, Yanai T, Saito S. Machine-Learning Classification for the Prediction of Catalytic Activity of Organic Photosensitizers in the Nickel(II)-Salt-Induced Synthesis of Phenols. Angew Chem Int Ed Engl 2023; 62:e202219107. [PMID: 36645619 DOI: 10.1002/anie.202219107] [Received: 12/25/2022] [Revised: 01/15/2023] [Accepted: 01/16/2023] [Indexed: 01/17/2023]
Abstract
Catalytic systems using a small amount of organic photosensitizer for the activation of an inorganic (on-demand ligand-free) nickel(II) salt represent a cost-effective method for cross-coupling reactions, while C(sp2)-O bond formation remains less developed. Herein, we report a strategy for the synthesis of phenols with a nickel(II) salt and an organic photosensitizer, which was identified via an investigation into the catalytic activity of 60 organic photosensitizers consisting of various electron donor and acceptor moieties. To examine the effect of multiple intractable parameters on the catalytic activity of photosensitizers, machine-learning (ML) models were developed, wherein we embedded descriptors representing their physical and structural properties, which were obtained from DFT calculations and RDKit, respectively. The study clarified that integrating both DFT- and RDKit-derived descriptors in ML models balances higher "precision" and "recall" across a wide range of search space relative to using only one of the two descriptor sets.
Affiliation(s)
- Naoki Noto
- Integrated Research Consortium on Chemical Sciences (IRCCS), Nagoya University, Nagoya, Aichi, 464-8602, Japan
- Akira Yada
- Interdisciplinary Research Center for Catalytic Chemistry, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8565, Japan
- Takeshi Yanai
- Institute of Transformative Bio-Molecules (WPI-ITbM) and Graduate School of Science, Nagoya University, Nagoya, Aichi, 464-8602, Japan
- Susumu Saito
- Integrated Research Consortium on Chemical Sciences (IRCCS) and Graduate School of Science, Nagoya University, Nagoya, Aichi, 464-8602, Japan

23
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Indexed: 02/01/2023]
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China

24
Neves P, McClure K, Verhoeven J, Dyubankova N, Nugmanov R, Gedich A, Menon S, Shi Z, Wegner JK. Global reactivity models are impactful in industrial synthesis applications. J Cheminform 2023; 15:20. [PMID: 36774523 PMCID: PMC9921076 DOI: 10.1186/s13321-023-00685-0] [Received: 11/16/2022] [Accepted: 01/22/2023] [Indexed: 02/13/2023]
Abstract
Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is the reaction yield prediction. Every year more than one fifth of all synthesis attempts result in product yields which are either zero or too low. This equates to chemical and human resources being spent on activities which ultimately do not progress the programs, leading to a triple loss when accounting for the cost of opportunity in time wasted. In this work we pre-train a BERT model on more than 16 million reactions from 4 different data sources, and fine tune it to achieve an uncertainty calibrated global yield prediction model. This model is an improvement upon state of the art not just from the increase in pre-train data but also by introducing a new embedding layer which solves a few limitations of SMILES and enables integration of additional information such as equivalents and molecule role into the reaction encoding, the model is called BERT Enriched Embedding (BEE). The model is benchmarked on an open-source dataset against a state-of-the-art synthesis focused BERT showing a near 20-point improvement in r² score. The model is fine-tuned and tested on an internal company data benchmark, and a prospective study shows that the application of the model can reduce the total number of negative reactions (yield under 5%) ran in Janssen by at least 34%. Lastly, we corroborate the previous results through experimental validation, by directly deploying the model in an on-going drug discovery project and showing that it can also be used successfully as a reagent recommender due to its fast inference speed and reliable confidence estimation, a critical feature for industry application.
Affiliation(s)
- Paulo Neves
- In-Silico Discovery and External Innovation (ISDEI), Janssen Research & Development, Janssen Pharmaceutica N.V, Beerse, Belgium.
- Kelly McClure
- Discovery Chemistry LJ, Janssen Research & Development, Janssen Pharmaceutica N.V, Philadelphia, United States of America
- Jonas Verhoeven
- In-Silico Discovery and External Innovation (ISDEI), Janssen Research & Development, Janssen Pharmaceutica N.V, Beerse, Belgium
- Natalia Dyubankova
- In-Silico Discovery and External Innovation (ISDEI), Janssen Research & Development, Janssen Pharmaceutica N.V, Beerse, Belgium
- Ramil Nugmanov
- In-Silico Discovery and External Innovation (ISDEI), Janssen Research & Development, Janssen Pharmaceutica N.V, Beerse, Belgium
- Sairam Menon
- Pharma R&D Information Tech, Janssen Research & Development, Janssen Pharmaceutica N.V, Beerse, Belgium
- Zhicai Shi
- Discovery Chemistry LJ, Janssen Research & Development, Janssen Pharmaceutica N.V, Philadelphia, United States of America
- Jörg K. Wegner
- In-Silico Discovery and External Innovation (ISDEI), Janssen Research & Development, Janssen Pharmaceutica N.V, Beerse, Belgium

25
Singh S, Sunoj RB. Molecular Machine Learning for Chemical Catalysis: Prospects and Challenges. Acc Chem Res 2023; 56:402-412. [PMID: 36715248 DOI: 10.1021/acs.accounts.2c00801] [Indexed: 01/31/2023]
Abstract
Conspectus
In the domain of reaction development, one aims to obtain higher efficacies as measured in terms of yield and/or selectivities. During the empirical cycles, an admixture of outcomes from low to high yields/selectivities is expected. While it is not easy to identify all of the factors that might impact the reaction efficiency, complex and nonlinear dependence on the nature of reactants, catalysts, solvents, etc. is quite likely. Developmental stages of newer reactions would typically offer a few hundreds of samples with variations in participating molecules and/or reaction conditions. These "observations" and their "output" can be harnessed as valuable labeled data for developing molecular machine learning (ML) models. Once a robust ML model is built for a specific reaction under development, it can predict the reaction outcome for any new choice of substrates/catalyst in a few seconds/minutes and thus can expedite the identification of promising candidates for experimental validation. Recent years have witnessed impressive applications of ML in the molecular world, most of them aimed at predicting important chemical or biological properties. We believe that an integration of effective ML workflows can be made richly beneficial to reaction discovery.
As with any new technology, direct adaptation of ML as used in well-developed domains, such as natural language processing (NLP) and image recognition, is unlikely to succeed in reaction discovery. Some of the challenges stem from ineffective featurization of the molecular space, unavailability of quality data and its distribution, in making the right choice of ML model and its technically robust deployment. It shall be noted that there is no universal ML model suitable for an inherently high-dimensional problem such as chemical reactions. Given these backgrounds, rendering ML tools conducive for reactions is an exciting as well as challenging endeavor at the same time. With the increased availability of efficient ML algorithms, we focused on tapping their potential for small-data reaction discovery (a few hundreds to thousands of samples).
In this Account, we describe both feature engineering and feature learning approaches for molecular ML as applied to diverse reactions of high contemporary interest. Among these, catalytic asymmetric hydrogenation of imines/alkenes, β-C(sp3)-H bond functionalization, and relay Heck reaction employed a feature engineering approach using the quantum-chemically derived physical organic descriptors as the molecular features, all designed to predict the enantioselectivity. The selection of molecular features to customize it for a reaction of interest is described, along with emphasizing the chemical insights that could be gathered through the use of such features. Feature learning methods for predicting the yield of Buchwald-Hartwig cross-coupling, deoxyfluorination of alcohols, and enantioselectivity of N,S-acetal formation are found to offer excellent predictions. We propose a transfer learning protocol, wherein an ML model such as a language model is trained on a large number of molecules (10^5-10^6) and fine-tuned on a focused library of target task reactions, as an effective alternative for small-data reaction discovery (10^2-10^3 reactions). The exploitation of deep neural network latent space as a method for generative tasks to identify useful substrates for a reaction is demonstrated as a promising strategy.
Affiliation(s)
- Sukriti Singh
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
- Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India.
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai 400076, India

26
Zhang SQ, Xu LC, Li SW, Oliveira JCA, Li X, Ackermann L, Hong X. Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis. Chemistry 2023; 29:e202202834. [PMID: 36206170 PMCID: PMC10099903 DOI: 10.1002/chem.202202834] [Received: 09/12/2022] [Indexed: 11/29/2022]
Abstract
Recent years have witnessed a boom of machine learning (ML) applications in chemistry, which reveals the potential of data-driven prediction of synthesis performance. Digitalization and ML modelling are the key strategies to fully exploit the unique potential within the synergistic interplay between experimental data and the robust prediction of performance and selectivity. A series of exciting studies have demonstrated the importance of chemical knowledge implementation in ML, which improves the model's capability for making predictions that are challenging and often go beyond the abilities of human beings. This Minireview summarizes the cutting-edge embedding techniques and model designs in synthetic performance prediction, elaborating how chemical knowledge can be incorporated into machine learning until June 2022. By merging organic synthesis tactics and chemical informatics, we hope this Review can provide a guide map and intrigue chemists to revisit the digitalization and computerization of organic chemistry principles.
Affiliation(s)
- Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
- Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
- Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
- João C A Oliveira
- Institut für Organische und Biomolekulare Chemie, Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität, Tammannstraße 2, 37077, Göttingen, Germany
- Xin Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China
- Lutz Ackermann
- Institut für Organische und Biomolekulare Chemie, Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität, Tammannstraße 2, 37077, Göttingen, Germany
- Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou, 310027, P. R. China.
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, P. R. China.
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, P. R. China

27
Bobko MA, Elward JM, Naidu BN, Nieves-Quinones YE, Reiher CA, Su Q, Sun L, Woodard J, Xie S, Yang W, Yin Y. Expeditious Synthesis of a Potent Allosteric HIV-1 Integrase Inhibitor GSK3839919A. Org Process Res Dev 2023. [DOI: 10.1021/acs.oprd.2c00343] [Indexed: 01/11/2023]
Affiliation(s)
- Mark A. Bobko
- Drug Substance Development, GSK, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
- Jennifer M. Elward
- Molecular Design, GSK, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
- Yexenia E. Nieves-Quinones
- Drug Substance Development, GSK, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
- Christopher A. Reiher
- Drug Substance Development, GSK, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
- Qiaogong Su
- Drug Substance Development, GSK, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
- Liang Sun
- Chemistry Service Unit, WuXi AppTec Co., Ltd., 168 Nanhai Road, Tianjin 300457, People’s Republic of China
- John Woodard
- Drug Substance Development, GSK, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
- Shiping Xie
- Drug Substance Development, GSK, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
- Wuxing Yang
- Chemistry Service Unit, WuXi AppTec Co., Ltd., 168 Nanhai Road, Tianjin 300457, People’s Republic of China
- Yunxing Yin
- Chemistry Service Unit, WuXi AppTec Co., Ltd., 168 Nanhai Road, Tianjin 300457, People’s Republic of China

28
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA

29
Davies JC, Pattison D, Hirst JD. Machine learning for yield prediction for chemical reactions using in situ sensors. J Mol Graph Model 2023; 118:108356. [PMID: 36272195 DOI: 10.1016/j.jmgm.2022.108356] [Received: 08/11/2022] [Revised: 09/30/2022] [Accepted: 09/30/2022] [Indexed: 11/28/2022]
Abstract
Machine learning models were developed to predict product formation from time-series reaction data for ten Buchwald-Hartwig coupling reactions. The data was provided by DeepMatter and was collected in their DigitalGlassware cloud platform. The reaction probe has 12 sensors to measure properties of interest, including temperature, pressure, and colour. Colour was a good predictor of product formation for this reaction and machine learning models were able to learn which of the properties were important. Predictions for the current product formation (in terms of % yield) had a mean absolute error of 1.2%. For predicting 30, 60 and 120 min ahead the error rose to 3.4, 4.1 and 4.6%, respectively. The work here presents an example into the insight that can be obtained from applying machine learning methods to sensor data in synthetic chemistry.
Affiliation(s)
- Joseph C Davies
- School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Jonathan D Hirst
- School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK.

30
Zahrt AF, Mo Y, Nandiwale KY, Shprints R, Heid E, Jensen KF. Machine-Learning-Guided Discovery of Electrochemical Reactions. J Am Chem Soc 2022; 144:22599-22610. [PMID: 36459170 DOI: 10.1021/jacs.2c08997] [Indexed: 12/03/2022]
Abstract
The molecular structures synthesizable by organic chemists dictate the molecular functions they can create. The invention and development of chemical reactions are thus critical for chemists to access new and desirable functional molecules in all disciplines of organic chemistry. This work seeks to expedite the exploration of emerging areas of organic chemistry by devising a machine-learning-guided workflow for reaction discovery. Specifically, this study uses machine learning to predict competent electrochemical reactions. To this end, we first develop a molecular representation that enables the production of general models with limited training data. Next, we employ automated experimentation to test a large number of electrochemical reactions. These reactions are categorized as competent or incompetent mixtures, and a classification model was trained to predict reaction competency. This model is used to screen 38,865 potential reactions in silico, and the predictions are used to identify a number of reactions of synthetic or mechanistic interest, 80% of which are found to be competent. Additionally, we provide the predictions for the 38,865-member set in the hope of accelerating the development of this field. We envision that adopting a workflow such as this could enable the rapid development of many fields of chemistry.
Affiliation(s)
- Andrew F Zahrt
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States
- Yiming Mo
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States.
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Kakasaheb Y Nandiwale
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States
- Ron Shprints
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States.
- Institute of Materials Chemistry, TU Wien, Vienna 1060, Austria
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, United States

31
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. Commun Mater 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
Affiliation(s)
- Patrick Reiser
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Marlen Neubert
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - André Eberhard
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Luca Torresi
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Chen Zhou
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Chen Shao
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
| | - Houssam Metni
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
| | - Clint van Hoesel
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
| | - Henrik Schopmans
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Timo Sommer
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
32
Boni YT, Cammarota RC, Liao K, Sigman MS, Davies HML. Leveraging Regio- and Stereoselective C(sp3)-H Functionalization of Silyl Ethers to Train a Logistic Regression Classification Model for Predicting Site-Selectivity Bias. J Am Chem Soc 2022; 144:15549-15561. [PMID: 35977100 DOI: 10.1021/jacs.2c04383]
Abstract
The C-H functionalization of silyl ethers via carbene-induced C-H insertion represents an efficient synthetic disconnection strategy. In this work, site- and stereoselective C(sp3)-H functionalization at α, γ, δ, and even more distal positions to the siloxy group has been achieved using donor/acceptor carbene intermediates. By exploiting the predilections of Rh2(R-TCPTAD)4 and Rh2(S-2-Cl-5-BrTPCP)4 catalysts to target either more electronically activated or more spatially accessible C-H sites, respectively, divergent desired products can be formed with good diastereocontrol and enantiocontrol. Notably, the reaction can also be extended to enable desymmetrization of meso silyl ethers. Leveraging the broad substrate scope examined in this study, we have trained a machine learning classification model using logistic regression to predict the major C-H functionalization site based on intrinsic substrate reactivity and catalyst propensity for overriding it. This model enables prediction of the major product when applying these C-H functionalization methods to a new substrate of interest. Applying this model broadly, we have demonstrated its utility for guiding late-stage functionalization in complex settings and developed an intuitive visualization tool to assist synthetic chemists in such endeavors.
Affiliation(s)
- Yannick T Boni
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- Ryan C Cammarota
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Kuangbiao Liao
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
- Matthew S Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Huw M L Davies
- Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta, Georgia 30322, United States
33
When machine learning meets molecular synthesis. Trends Chem 2022. [DOI: 10.1016/j.trechm.2022.07.005]
34
Spiekermann KA, Pattanaik L, Green WH. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J Phys Chem A 2022; 126:3976-3986. [PMID: 35727075 DOI: 10.1021/acs.jpca.2c02614]
Abstract
Quantitative estimates of reaction barriers are essential for developing kinetic mechanisms and predicting reaction outcomes. However, the lack of experimental data and the steep scaling of accurate quantum calculations often hinder the ability to obtain reliable kinetic values. Here, we train a directed message passing neural network on nearly 24,000 diverse gas-phase reactions calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP. Our model uses 75% fewer parameters than previous studies, an improved reaction representation, and proper data splits to accurately estimate performance on unseen reactions. Using information from only the reactant and product, our model quickly predicts barrier heights with a testing MAE of 2.6 kcal mol⁻¹ relative to the coupled-cluster data, making it more accurate than a good density functional theory calculation. Furthermore, our results show that future modeling efforts to estimate reaction properties would significantly benefit from fine-tuning calibration using a transfer learning technique. We anticipate this model will accelerate and improve kinetic predictions for small molecule chemistry.
Affiliation(s)
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
35
Shim E, Kammeraad JA, Xu Z, Tewari A, Cernak T, Zimmerman PM. Predicting reaction conditions from limited data through active transfer learning. Chem Sci 2022; 13:6655-6668. [PMID: 35756521 PMCID: PMC9172577 DOI: 10.1039/d1sc06932b] [Received: 12/10/2021] [Accepted: 05/10/2022]
Abstract
Transfer and active learning have the potential to accelerate the development of new chemical reactions, using prior data and new experiments to inform models that adapt to the target area of interest. This article shows how specifically tuned machine learning models, based on random forest classifiers, can expand the applicability of Pd-catalyzed cross-coupling reactions to types of nucleophiles unknown to the model. First, model transfer is shown to be effective when reaction mechanisms and substrates are closely related, even when models are trained on relatively small numbers of data points. Then, a model simplification scheme is tested and found to provide comparable predictivity on reactions of new nucleophiles that include unseen reagent combinations. Lastly, for a challenging target where model transfer only provides a modest benefit over random selection, an active transfer learning strategy is introduced to improve model predictions. Simple models, composed of a small number of decision trees with limited depths, are crucial for securing generalizability, interpretability, and performance of active transfer learning.
Affiliation(s)
- Eunjae Shim
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Joshua A. Kammeraad
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ziping Xu
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ambuj Tewari
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Tim Cernak
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
36
Yang L, Zhu L, Zhang S, Hong X. Machine Learning Prediction of Structure‐Performance Relationship in Organic Synthesis. Chin J Chem 2022. [DOI: 10.1002/cjoc.202200039]
Affiliation(s)
- Li‐Cheng Yang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Lu‐Jing Zhu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Shuo‐Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing 100190, China
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
37
Wu H, Grinberg Dana A, Ranasinghe DS, Pickard FC, Wood GPF, Zelesky T, Sluggett GW, Mustakis J, Green WH. Kinetic Modeling of API Oxidation: (2) Imipramine Stress Testing. Mol Pharm 2022; 19:1526-1539. [DOI: 10.1021/acs.molpharmaceut.2c00043]
Affiliation(s)
- Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Alon Grinberg Dana
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Wolfson Department of Chemical Engineering, Technion - Israel Institute of Technology, Haifa 3200003, Israel
- Duminda S. Ranasinghe
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Frank C. Pickard
- Pfizer Global Research and Development, Groton Laboratories, Eastern Point Road, Groton, Connecticut 06340, United States
- Geoffrey P. F. Wood
- Pfizer Global Research and Development, Groton Laboratories, Eastern Point Road, Groton, Connecticut 06340, United States
- Todd Zelesky
- Pfizer Global Research and Development, Groton Laboratories, Eastern Point Road, Groton, Connecticut 06340, United States
- Gregory W. Sluggett
- Pfizer Global Research and Development, Groton Laboratories, Eastern Point Road, Groton, Connecticut 06340, United States
- Jason Mustakis
- Pfizer Global Research and Development, Groton Laboratories, Eastern Point Road, Groton, Connecticut 06340, United States
- William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
38
Bolcato G, Heid E, Boström J. On the Value of Using 3D Shape and Electrostatic Similarities in Deep Generative Methods. J Chem Inf Model 2022; 62:1388-1398. [PMID: 35271260 PMCID: PMC8965872 DOI: 10.1021/acs.jcim.1c01535]
Abstract
Multiparameter optimization, the heart of drug design, is still an open challenge. Thus, improved methods for automated compound design with multiple controlled properties are desired. Here, we present a significant extension to our previously described fragment-based reinforcement learning method (DeepFMPO) for the generation of novel molecules with optimal properties. As before, the generative process outputs optimized molecules similar to the input structures, now with the improved feature of replacing parts of these molecules with fragments of similar three-dimensional (3D) shape and electrostatics. We developed and benchmarked a new Python package, ESP-Sim, for the comparison of the electrostatic potential and the molecular shape, allowing the calculation of high-quality partial charges (e.g., RESP with B3LYP/6-31G**) obtained using the quantum chemistry program Psi4. By performing comparisons of 3D fragments, we can simulate 3D properties while overcoming the notoriously difficult step of accurately describing bioactive conformations. The new improved generative (DeepFMPO v3D) method is demonstrated with a scaffold-hopping exercise identifying CDK2 bioisosteres. The code is open-source and freely available.
Affiliation(s)
- Giovanni Bolcato
- Molecular Modeling Section, University of Padova, 35131 Padova, Italy
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Jonas Boström
- Medicinal Chemistry, Early CVRM, BioPharmaceuticals R&D, AstraZeneca, 431 50 Mölndal, Sweden
39
Stuyver T, Coley CW. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J Chem Phys 2022; 156:084104. [DOI: 10.1063/5.0079574]
Abstract
There is a perceived dichotomy between structure-based and descriptor-based molecular representations used for predictive chemistry tasks. Here, we study the performance, generalizability, and explainability of the quantum mechanics-augmented graph neural network (ml-QM-GNN) architecture as applied to the prediction of regioselectivity (classification) and of activation energies (regression). In our hybrid QM-augmented model architecture, structure-based representations are first used to predict a set of atom- and bond-level reactivity descriptors derived from density functional theory calculations. These estimated reactivity descriptors are combined with the original structure-based representation to make the final reactivity prediction. We demonstrate that our model architecture leads to significant improvements over structure-based GNNs not only in overall accuracy but also in generalization to unseen compounds. Even when provided training sets of only a couple hundred labeled data points, the ml-QM-GNN outperforms other state-of-the-art structure-based architectures that have been applied to these tasks as well as descriptor-based (linear) regressions. As a primary contribution of this work, we demonstrate a bridge between data-driven predictions and conceptual frameworks commonly used to gain qualitative insights into reactivity phenomena, taking advantage of the fact that our models are grounded in (but not restricted to) QM descriptors. This effort results in a productive synergy between theory and data science, wherein QM-augmented models provide a data-driven confirmation of previous qualitative analyses, and these analyses in turn facilitate insights into the decision-making process occurring within ml-QM-GNNs.
Affiliation(s)
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
40
Caldeweyher E, Bauer C, Tehrani AS. An open-source framework for fast-yet-accurate calculation of quantum mechanical features. Phys Chem Chem Phys 2022; 24:10599-10610. [DOI: 10.1039/d2cp01165d]
Abstract
We present the open-source framework kallisto that enables the efficient and robust calculation of quantum mechanical features for atoms and molecules. For a benchmark set of 49 experimental molecular polarizabilities,...
41
Saini V, Kumar R. A machine learning approach for predicting the empirical polarity of organic solvents. New J Chem 2022. [DOI: 10.1039/d2nj02513b]
Abstract
A neural network architecture was found to efficiently predict the empirical polarity parameter ET(30) using six simple-to-compute, interpretable quantum mechanical, topological, and categorical descriptors.
Affiliation(s)
- Vaneet Saini
- Department of Chemistry & Centre for Advanced Studies in Chemistry, Panjab University, Chandigarh 160014, India
- Ranjeet Kumar
- Department of Chemistry & Centre for Advanced Studies in Chemistry, Panjab University, Chandigarh 160014, India
42
Wan Z, Wang QD. Machine Learning Prediction of the Exfoliation Energies of Two-Dimension Materials via Data-Driven Approach. J Phys Chem Lett 2021; 12:11470-11475. [PMID: 34793172 DOI: 10.1021/acs.jpclett.1c03335]
Abstract
Exfoliation energy is one of the fundamental parameters in the science and engineering of two-dimensional (2D) materials. Traditionally, it was obtained via indirect experimental measurement or first-principles calculations, which are very time- and resource-consuming. Herein, we provide an efficient machine learning (ML) method to accurately predict the exfoliation energies for 2D materials. Toward this end, a series of simple descriptors with explicit physical meanings are defined. Regression trees (RT), support vector machines (SVM), multiple linear regression (MLR), and ensemble trees (ET) are compared to develop the most suitable model for the prediction of exfoliation energies. It is shown that the ET model can efficiently predict the exfoliation energies through extensive validations and stability analysis. The influence of the defined features on the exfoliation energies is analyzed by sensitivity analysis to provide novel physical insight into the affecting factors of the exfoliation energies.
Affiliation(s)
- Zhongyu Wan
- Jiangsu Key Laboratory of Coal-Based Greenhouse Gas Control and Utilization, Low Carbon Energy Institute, School of Chemical Engineering, China University of Mining and Technology, Xuzhou, 221008, People's Republic of China
- Department of Physics, City University of Hong Kong, Hong Kong SAR 999077, People's Republic of China
- Quan-De Wang
- Jiangsu Key Laboratory of Coal-Based Greenhouse Gas Control and Utilization, Low Carbon Energy Institute, School of Chemical Engineering, China University of Mining and Technology, Xuzhou, 221008, People's Republic of China
43
Sunoj RB. Coming of Age of Computational Chemistry from a Resilient Past to a Promising Future. Isr J Chem 2021. [DOI: 10.1002/ijch.202100106]
Affiliation(s)
- Raghavan B. Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
44
Gong Y, Xue D, Chuai G, Yu J, Liu Q. DeepReac+: deep active learning for quantitative modeling of organic chemical reactions. Chem Sci 2021; 12:14459-14472. [PMID: 34880997 PMCID: PMC8580052 DOI: 10.1039/d1sc02087k] [Received: 04/14/2021] [Accepted: 10/08/2021]
Abstract
Various computational methods have been developed for quantitative modeling of organic chemical reactions; however, the lack of universality as well as the requirement of large amounts of experimental data limit their broad applications. Here, we present DeepReac+, an efficient and universal computational framework for prediction of chemical reaction outcomes and identification of optimal reaction conditions based on deep active learning. Under this framework, DeepReac is designed as a graph-neural-network-based model, which directly takes 2D molecular structures as inputs and automatically adapts to different prediction tasks. In addition, carefully designed active learning strategies are incorporated to substantially reduce the number of necessary experiments for model training. We demonstrate the universality and high efficiency of DeepReac+ by achieving state-of-the-art results with a minimum of labeled data on three diverse chemical reaction datasets in several scenarios. Collectively, DeepReac+ has great potential and utility in the development of AI-aided chemical synthesis. DeepReac+ is freely accessible at https://github.com/bm2-lab/DeepReac.
Affiliation(s)
- Yukang Gong
- Department of Ophthalmology, Shanghai Tenth People's Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200072, China
- Dongyu Xue
- Department of Ophthalmology, Shanghai Tenth People's Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200072, China
- Guohui Chuai
- Department of Ophthalmology, Shanghai Tenth People's Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200072, China
- Jing Yu
- Department of Ophthalmology, Shanghai Tenth People's Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200072, China
- Qi Liu
- Department of Ophthalmology, Shanghai Tenth People's Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200072, China
45
Heid E, Green WH. Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction. J Chem Inf Model 2021; 62:2101-2110. [PMID: 34734699 PMCID: PMC9092344 DOI: 10.1021/acs.jcim.1c00975]
Abstract
The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.
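The abstract describes the CGR as a superposition of the reactant and product graphs. As a rough illustration only, the sketch below superimposes bond orders for an atom-mapped reaction supplied as plain Python dictionaries keyed by pairs of atom-map numbers. This input format and the helper names `condensed_graph_of_reaction` and `changed_bonds` are hypothetical; atom features are omitted, and this is not the authors' Chemprop implementation.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical input format: each bond is keyed by the unordered pair of
# atom-map numbers it connects; the value is its bond order (1, 2, 3, ...).
BondMap = Dict[FrozenSet[int], int]
CGR = Dict[FrozenSet[int], Tuple[int, int]]


def condensed_graph_of_reaction(reactant_bonds: BondMap, product_bonds: BondMap) -> CGR:
    """Superimpose atom-mapped reactant and product bond graphs.

    Every bond present on either side becomes one CGR edge labeled
    (order in reactants, order in products). An order of 0 means the bond
    is absent on that side, so a broken bond is (n, 0) and a formed bond is (0, n).
    """
    all_pairs = reactant_bonds.keys() | product_bonds.keys()
    return {
        pair: (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
        for pair in all_pairs
    }


def changed_bonds(cgr: CGR) -> CGR:
    """Keep only the edges whose bond order differs between reactants and products."""
    return {pair: orders for pair, orders in cgr.items() if orders[0] != orders[1]}


if __name__ == "__main__":
    # Toy hydrogen transfer, keeping only the atoms involved:
    # atom 1 = C, atom 2 = transferred H, atom 3 = O, atom 4 = H already on O.
    reactants = {frozenset({1, 2}): 1, frozenset({3, 4}): 1}
    products = {frozenset({2, 3}): 1, frozenset({3, 4}): 1}
    cgr = condensed_graph_of_reaction(reactants, products)
    for pair, orders in sorted(changed_bonds(cgr).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(pair), orders)
```

Because both sides share atom-map numbers, the broken C-H bond appears as (1, 0), the formed O-H bond as (0, 1), and the spectator O-H bond as (1, 1), so the reaction center is visible directly on the single superimposed graph.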
Affiliation(s)
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
46
Towards Data‐Driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202106880]
47
Guan Y, Shree Sowndarya SV, Gallegos LC, St John PC, Paton RS. Real-time prediction of ¹H and ¹³C chemical shifts with DFT accuracy using a 3D graph neural network. Chem Sci 2021; 12:12012-12026. [PMID: 34667567 PMCID: PMC8457395 DOI: 10.1039/d1sc03343c] [Received: 06/20/2021] [Accepted: 07/19/2021]
Abstract
Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, calculating multiple structural isomers and stereoisomers, each of which may typically exist as an ensemble of rapidly interconverting conformations, is computationally expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict ¹H and ¹³C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed ¹³C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/.
We further demonstrate the model in three applications: first, we use the model to determine the correct organic structure from candidates through experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution. From quantum chemical and experimental NMR data, a 3D graph neural network, CASCADE, has been developed to predict carbon and proton chemical shifts. Stereoisomers and conformers of organic molecules can be correctly distinguished.
Affiliation(s)
- Yanfei Guan
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- S V Shree Sowndarya
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- Liliana C Gallegos
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- Peter C St John
- Biosciences Center, National Renewable Energy Laboratory, Golden, CO 80401, USA
- Robert S Paton
- Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
48
Xu LC, Zhang SQ, Li X, Tang MJ, Xie PP, Hong X. Towards Data-driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning. Angew Chem Int Ed Engl 2021; 60:22804-22811. [PMID: 34370892 DOI: 10.1002/anie.202106880] [Received: 05/23/2021] [Revised: 07/14/2021]
Abstract
Asymmetric hydrogenation of olefins is one of the most powerful asymmetric transformations in molecular synthesis. Although several privileged catalyst scaffolds are available, catalyst development for asymmetric hydrogenation is still a time- and resource-consuming process due to the lack of a predictive catalyst design strategy. Targeting the data-driven design of asymmetric catalysis, we herein report the development of a standardized database that contains detailed information on over 12,000 asymmetric hydrogenations of olefins reported in the literature. This database provides a valuable platform for machine learning applications in asymmetric catalysis. Based on this database, we developed a hierarchical learning approach that yields a predictive machine learning model using only dozens of enantioselectivity data points for the target olefin, which offers a useful solution to the few-shot learning problem and will facilitate reaction optimization with new olefin substrates in catalyst screening.
Affiliation(s)
- Li-Cheng Xu
- Zhejiang University, Department of Chemistry, China
- Xin Li
- Zhejiang University, Department of Chemistry, China
- Pei-Pei Xie
- Zhejiang University, Department of Chemistry, China
- Xin Hong
- Zhejiang University, Department of Chemistry, 38 Zheda Road, Hangzhou 310028, China
49
Abstract
Computational methods have emerged as a powerful tool to augment traditional experimental molecular catalyst design by providing useful predictions of catalyst performance and decreasing the time needed for catalyst screening. In this perspective, we discuss three approaches for computational molecular catalyst design: (i) the reaction mechanism-based approach that calculates all relevant elementary steps, finds the rate and selectivity determining steps, and ultimately makes predictions on catalyst performance based on kinetic analysis, (ii) the descriptor-based approach where physical/chemical considerations are used to find molecular properties as predictors of catalyst performance, and (iii) the data-driven approach where statistical analysis as well as machine learning (ML) methods are used to obtain relationships between available data/features and catalyst performance. Following an introduction to these approaches, we cover their strengths and weaknesses and highlight some recent key applications. Furthermore, we present an outlook on how the currently applied approaches may evolve in the near future by addressing how recent developments in building automated computational workflows and implementing advanced ML models hold promise for reducing human workload, eliminating human bias, and speeding up computational catalyst design at the same time. Finally, we provide our viewpoint on how some of the challenges associated with the up-and-coming approaches driven by automation and ML may be resolved.
Affiliation(s)
- Ademola Soyemi
- Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL 35487, USA.
- Tibor Szilvási
- Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL 35487, USA.
50
Wan Z, Wang QD, Liu D, Liang J. Accelerating the optimization of enzyme-catalyzed synthesis conditions via machine learning and reactivity descriptors. Org Biomol Chem 2021; 19:6267-6273. [PMID: 34195743 DOI: 10.1039/d1ob01066b]
Abstract
Enzyme-catalyzed synthesis reactions are important for a wide range of applications. Selecting optimal synthesis conditions accurately and rapidly is crucial but challenging, both for human experts and for computational prediction. In this work, a new approach that combines a data-driven machine learning (ML) model with reactivity descriptors is developed to predict the optimal conditions and the yield of enzyme-catalyzed syntheses. In total, fourteen reactivity descriptors are constructed to describe 125 reactions (classified into five categories) spanning different reaction mechanisms. Nineteen ML models are trained on the dataset, and the quadratic support vector machine (SVM) model shows the best performance. The quadratic SVM model is then used to predict the optimal reaction conditions for each reaction, identifying the conditions with the highest yield among 109 200 combinations of substrate molar ratio, solvent, water content, enzyme concentration, and temperature. The proposed protocol should be generally applicable to a diverse range of chemical reactions and provides a black-box evaluation for optimizing the conditions of organic synthesis reactions.
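The condition-screening step described above amounts to an exhaustive grid search: enumerate every combination of condition levels, score each with the trained model, and keep the combination with the highest predicted yield. The following is a minimal Python sketch under stated assumptions: the condition levels, `predict_yield`, and `screen_conditions` are all hypothetical, and `predict_yield` is a made-up quadratic function standing in for the authors' trained quadratic SVM. It is neither their model nor their data.

```python
import itertools

# Hypothetical condition levels, for illustration only; the actual levels
# behind the study's 109 200 combinations per reaction are not given here.
MOLAR_RATIOS = [1.0, 2.0, 3.0]                 # substrate molar ratio
SOLVENTS = ["tert-butanol", "acetonitrile"]
WATER_CONTENTS = [0.0, 0.05, 0.10]             # assumed volume fraction
ENZYME_LOADINGS = [5.0, 10.0]                  # assumed mg/mL
TEMPERATURES = [30.0, 40.0, 50.0]              # degrees C


def predict_yield(ratio, solvent, water, enzyme, temp):
    """Hypothetical stand-in for a trained regression model (e.g. a quadratic SVM).

    A made-up quadratic surface so the sketch runs; not fitted to any data.
    """
    solvent_bonus = {"tert-butanol": 5.0, "acetonitrile": 0.0}[solvent]
    return (60.0
            - 4.0 * (ratio - 2.0) ** 2
            - 800.0 * (water - 0.05) ** 2
            + 0.5 * enzyme
            - 0.02 * (temp - 40.0) ** 2
            + solvent_bonus)


def screen_conditions(model):
    """Score every combination of condition levels and return the best one.

    Returns (best_conditions, best_predicted_yield, number_of_combinations).
    """
    grid = itertools.product(MOLAR_RATIOS, SOLVENTS, WATER_CONTENTS,
                             ENZYME_LOADINGS, TEMPERATURES)
    scored = [(model(*cond), cond) for cond in grid]
    # Compare on predicted yield only, so ties never fall back to the tuples.
    best_yield, best_cond = max(scored, key=lambda pair: pair[0])
    return best_cond, best_yield, len(scored)


if __name__ == "__main__":
    best, predicted, n = screen_conditions(predict_yield)
    print(f"screened {n} combinations; best {best} -> {predicted:.1f}% yield")
```

Brute-force enumeration is reasonable here because the grid is finite and each model evaluation is cheap; even a grid of roughly 10^5 combinations per reaction is small enough to score exhaustively with a fitted SVM.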
Collapse
Affiliation(s)
- Zhongyu Wan
- Jiangsu Key Laboratory of Coal-based Greenhouse Gas Control and Utilization, Low Carbon Energy Institute and School of Chemical Engineering, China University of Mining and Technology, Xuzhou, 221008, People's Republic of China; and School of Science, City University of Hong Kong, Hong Kong SAR 999077, People's Republic of China.
- Quan-De Wang
- Jiangsu Key Laboratory of Coal-based Greenhouse Gas Control and Utilization, Low Carbon Energy Institute and School of Chemical Engineering, China University of Mining and Technology, Xuzhou, 221008, People's Republic of China.
- Dongchang Liu
- School of Science, Xi'an Polytechnic University, Xi'an 710048, People's Republic of China; and Department of Physics, Sungkyunkwan University, Suwon 16419, Korea.
- Jinhu Liang
- School of Environment and Safety Engineering, North University of China, Taiyuan 030051, People's Republic of China