1
Schoepfer AA, Weinreich J, Laplaza R, Waser J, Corminboeuf C. Cost-informed Bayesian reaction optimization. Digital Discovery 2024; 3:2289-2297. [PMID: 39398973 PMCID: PMC11465108 DOI: 10.1039/d4dd00225c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 09/30/2024] [Indexed: 10/15/2024]
Abstract
Bayesian optimization (BO) is an efficient method for solving complex optimization problems, including those in chemical research, where it is gaining significant popularity. Although effective in guiding experimental design, BO does not account for experimentation costs: testing readily available reagents under different conditions could be more cost and time-effective than synthesizing or buying additional ones. To address this issue, we present cost-informed BO (CIBO), an approach tailored for the rational planning of chemical experimentation that prioritizes the most cost-effective experiments. Reagents are used only when their anticipated improvement in reaction performance sufficiently outweighs their costs. Our algorithm tracks available reagents, including those recently acquired, and dynamically updates their cost during the optimization. Using literature data of Pd-catalyzed reactions, we show that CIBO reduces the cost of reaction optimization by up to 90% compared to standard BO. Our approach is compatible with any type of cost, e.g., of buying equipment or compounds, waiting time, as well as environmental or security concerns. We believe CIBO extends the possibilities of BO in chemistry and envision applications for both traditional and self-driving laboratories for experiment planning.
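The selection rule described in the abstract (spend on a reagent only when its anticipated improvement outweighs its cost, and treat reagents already acquired as free) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Candidate`, `Inventory`, `cost_informed_score`, and `select_next`, the subtractive cost penalty, the `cost_weight` parameter, and all numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A possible experiment (hypothetical structure, not from the paper)."""
    name: str
    reagents: tuple
    expected_improvement: float  # assumed to come from a surrogate model


@dataclass
class Inventory:
    """Tracks reagent prices; a reagent costs nothing once it is owned."""
    prices: dict
    owned: set = field(default_factory=set)

    def cost(self, candidate):
        # Only reagents that still have to be bought contribute to the cost.
        return sum(self.prices[r] for r in candidate.reagents if r not in self.owned)

    def acquire(self, candidate):
        # After running an experiment, its reagents are available for free.
        self.owned.update(candidate.reagents)


def cost_informed_score(candidate, inventory, cost_weight):
    # Assumed scoring rule: expected improvement minus a scaled cost penalty.
    return candidate.expected_improvement - cost_weight * inventory.cost(candidate)


def select_next(candidates, inventory, cost_weight):
    # Pick the experiment with the best improvement-versus-cost trade-off.
    return max(candidates, key=lambda c: cost_informed_score(c, inventory, cost_weight))


# Illustrative numbers only.
inventory = Inventory(prices={"ligand_A": 50.0, "ligand_B": 10.0})
candidates = [
    Candidate("costly_ligand", ("ligand_A",), expected_improvement=8.0),
    Candidate("cheap_ligand", ("ligand_B",), expected_improvement=5.0),
]
first = select_next(candidates, inventory, cost_weight=0.1)
print(first.name)
```

With these made-up values the cheaper experiment is selected first, because the larger expected improvement of the other candidate does not cover its price. Once `ligand_A` is in the inventory its cost drops to zero, so the ranking changes, which mirrors the dynamic cost update mentioned in the abstract.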
Affiliation(s)
- Alexandre A Schoepfer
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Jan Weinreich
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Jerome Waser
  - Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
2
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] Open
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Affiliation(s)
- Lung-Yi Chen
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Yi-Pei Li
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
  - Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan
3
Han Y, Deng M, Liu K, Chen J, Wang Y, Xu YN, Dian L. Computer-Aided Synthesis Planning (CASP) and Machine Learning: Optimizing Chemical Reaction Conditions. Chemistry 2024; 30:e202401626. [PMID: 39083362 DOI: 10.1002/chem.202401626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 07/27/2024] [Accepted: 07/28/2024] [Indexed: 08/02/2024]
Abstract
Computer-aided synthesis planning (CASP) has garnered increasing attention in light of recent advancements in machine learning models. While the focus is on reverse synthesis or forward outcome prediction, optimizing reaction conditions remains a significant challenge. For datasets with multiple variables, the choice of descriptors and models is pivotal. This selection dictates the effective extraction of conditional features and the achievement of higher prediction accuracy. This review delineates the origins of data in conditional optimization, the criteria for descriptor selection, the response models, and the metrics for outcome evaluation, aiming to acquaint readers with the latest research trends and facilitate more informed research in this domain.
Affiliation(s)
- Yu Han
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Mingjing Deng
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Ke Liu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Jia Chen
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Yuting Wang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Yu-Ning Xu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
- Longyang Dian
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, No. 72 Binhai Avenue, Qingdao, 266237, P. R. China
  - Suzhou Institute of Shandong University, No. 388 Ruoshui Road, Suzhou Industrial Park, Suzhou, 215123, P. R. China
4
Singh S, Hernández-Lobato JM. Data-Driven Insights into the Transition-Metal-Catalyzed Asymmetric Hydrogenation of Olefins. J Org Chem 2024; 89:12467-12478. [PMID: 39149801 PMCID: PMC11382158 DOI: 10.1021/acs.joc.4c01396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
The transition-metal-catalyzed asymmetric hydrogenation of olefins is one of the key transformations with great utility in various industrial applications. The field has been dominated by the use of noble metal catalysts, such as iridium and rhodium. The reactions with the earth-abundant cobalt metal have increased only in recent years. In this work, we analyze the large amount of literature data available on iridium- and rhodium-catalyzed asymmetric hydrogenation. The limited data on reactions using Co catalysts are then examined in the context of Ir and Rh to obtain a better understanding of the reactivity pattern. A detailed data-driven study of the types of olefins, ligands, and reaction conditions such as solvent, temperature, and pressure is carried out. Our analysis provides an understanding of the literature trends and demonstrates that only a few olefin-ligand combinations or reaction conditions are frequently used. The knowledge of this bias in the literature data toward a certain group of substrates or reaction conditions can be useful for practitioners to design new reaction data sets that are suitable to obtain meaningful predictions from machine-learning models.
Affiliation(s)
- Sukriti Singh
  - Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, U.K.
5
Schoepfer A, Laplaza R, Wodrich MD, Waser J, Corminboeuf C. Reaction-Agnostic Featurization of Bidentate Ligands for Bayesian Ridge Regression of Enantioselectivity. ACS Catal 2024; 14:9302-9312. [PMID: 38933467 PMCID: PMC11197013 DOI: 10.1021/acscatal.4c02452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 05/22/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
Chiral ligands are important components in asymmetric homogeneous catalysis, but their synthesis and screening can be both time-consuming and resource-intensive. Data-driven approaches, in contrast to screening procedures based on intuition, have the potential to reduce the time and resources needed for reaction optimization by more rapidly identifying an ideal catalyst. These approaches, however, are often nontransferable and cannot be applied across different reactions. To overcome this drawback, we introduce a general featurization strategy for bidentate ligands that is coupled with an automated feature selection pipeline and Bayesian ridge regression to perform multivariate linear regression modeling. This approach, which is applicable to any reaction, incorporates electronic, steric, and topological features (rigidity/flexibility, branching, geometry, and constitution) and is well-suited for early stage ligand optimization. Using only small data sets, our workflow capably predicts the enantioselectivity of four metal-catalyzed asymmetric reactions. Uncertainty estimates provided by Bayesian ridge regression permit the use of Bayesian optimization to efficiently explore pools of prospective ligands. Finally, we constructed the BDL-Cu-2023 data set, composed of 312 bidentate ligands extracted from the Cambridge Structural Database, and screened it with this procedure to identify ligand candidates for a challenging asymmetric oxy-alkynylation reaction.
Affiliation(s)
- Alexandre A. Schoepfer
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Matthew D. Wodrich
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jerome Waser
  - Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
6
Raghavan P, Rago AJ, Verma P, Hassan MM, Goshu GM, Dombrowski AW, Pandey A, Coley CW, Wang Y. Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie's 15-Year Parallel Library Data Set. J Am Chem Soc 2024; 146:15070-15084. [PMID: 38768950 PMCID: PMC11157529 DOI: 10.1021/jacs.4c00098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 04/24/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced, and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective library data set. Additionally, we compare predictions made by the model to those made by expert medicinal chemists, finding that the model can often predict both reaction success and reaction yields with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The yield prediction model was used to select similar monomers predicted to have higher yields, resulting in greater synthesis efficiency of relevant drug-like molecules.
Affiliation(s)
- Priyanka Raghavan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, Massachusetts 02139, United States
- Alexander J. Rago
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Pritha Verma
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Majdi M. Hassan
  - RAIDERS Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Gashaw M. Goshu
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Amanda W. Dombrowski
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Abhishek Pandey
  - RAIDERS Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, Massachusetts 02139, United States
- Ying Wang
  - Advanced Chemistry Technologies Group, AbbVie, Inc., 1 N Waukegan Rd, North Chicago, Illinois 60064, United States
7
Ding Y, Qiang B, Chen Q, Liu Y, Zhang L, Liu Z. Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective. J Chem Inf Model 2024; 64:2955-2970. [PMID: 38489239 DOI: 10.1021/acs.jcim.4c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
Affiliation(s)
- Yuheng Ding
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Bo Qiang
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Qixuan Chen
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Yiqiao Liu
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Liangren Zhang
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Zhenming Liu
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
8
Schrader ML, Schäfer FR, Schäfers F, Glorius F. Bridging the information gap in organic chemical reactions. Nat Chem 2024; 16:491-498. [PMID: 38548884 DOI: 10.1038/s41557-024-01470-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 02/02/2024] [Indexed: 04/07/2024]
Abstract
The varying quality of scientific reports is a well-recognized problem and often results from a lack of standardization and transparency in scientific publications. This situation ultimately leads to prominent complications such as reproducibility issues and the slow uptake of newly developed synthetic methods for pharmaceutical and agrochemical applications. In recent years, various impactful approaches have been advocated to bridge information gaps and to improve the quality of experimental protocols in synthetic organic publications. Here we provide a critical overview of these strategies and present the reader with a versatile set of tools to augment their standard procedures. We formulate eight principles to improve data management in scientific publications relating to data standardization, reproducibility and evaluation, and encourage scientists to go beyond current publication standards. We are aware that this is a substantial effort, but we are convinced that the resulting improved data situation will greatly benefit the progress of chemistry.
Affiliation(s)
- Malte L Schrader
  - Organisch-Chemisches Institut, Universität Münster, Münster, Germany
- Felix R Schäfer
  - Organisch-Chemisches Institut, Universität Münster, Münster, Germany
- Felix Schäfers
  - Organisch-Chemisches Institut, Universität Münster, Münster, Germany
- Frank Glorius
  - Organisch-Chemisches Institut, Universität Münster, Münster, Germany
9
Voinarovska V, Kabeshov M, Dudenko D, Genheden S, Tetko IV. When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges. J Chem Inf Model 2024; 64:42-56. [PMID: 38116926 PMCID: PMC10778086 DOI: 10.1021/acs.jcim.3c01524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023]
Abstract
Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic predictive approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.
Affiliation(s)
- Varvara Voinarovska
  - Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
  - TUM Graduate School, Faculty of Chemistry, Technical University of Munich, 85748 Garching, Germany
- Mikhail Kabeshov
  - Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Dmytro Dudenko
  - Enamine Ltd., 78 Chervonotkatska str., 02094 Kyiv, Ukraine
- Samuel Genheden
  - Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Igor V. Tetko
  - Molecular Targets and Therapeutics Center, Helmholtz Munich − Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Institute of Structural Biology, 85764 Neuherberg, Germany
10
Raghavan P, Haas BC, Ruos ME, Schleinitz J, Doyle AG, Reisman SE, Sigman MS, Coley CW. Dataset Design for Building Models of Chemical Reactivity. ACS Central Science 2023; 9:2196-2204. [PMID: 38161380 PMCID: PMC10755851 DOI: 10.1021/acscentsci.3c01163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 11/06/2023] [Accepted: 11/15/2023] [Indexed: 01/03/2024]
Abstract
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Affiliation(s)
- Priyanka Raghavan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Brittany C. Haas
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Madeline E. Ruos
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jules Schleinitz
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Abigail G. Doyle
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Sarah E. Reisman
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Matthew S. Sigman
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
11
Xu Y, Gao Y, Su L, Wu H, Tian H, Zeng M, Xu C, Zhu X, Liao K. High-Throughput Experimentation and Machine Learning-Assisted Optimization of Iridium-Catalyzed Cross-Dimerization of Sulfoxonium Ylides. Angew Chem Int Ed Engl 2023; 62:e202313638. [PMID: 37814819 DOI: 10.1002/anie.202313638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 10/09/2023] [Accepted: 10/09/2023] [Indexed: 10/11/2023]
Abstract
A novel and convenient approach that combines high-throughput experimentation (HTE) with machine learning (ML) technologies to achieve the first selective cross-dimerization of sulfoxonium ylides via iridium catalysis is presented. A variety of valuable amide-, ketone-, ester-, and N-heterocycle-substituted unsymmetrical E-alkenes are synthesized in good yields with high stereoselectivities. This mild method avoids the use of diazo compounds and is characterized by simple operation, high step-economy, and excellent chemoselectivity and functional group compatibility. The combined experimental and computational studies identify an amide-sulfoxonium ylide as a carbene precursor. Furthermore, a comprehensive exploration of the reaction space is also performed (600 reactions) and a machine learning model for reaction yield prediction has been constructed.
Affiliation(s)
- Yougen Xu
  - Guangzhou National Laboratory, Guangzhou, 510005, PR China
  - Bioland Laboratory, Guangzhou, 510005, PR China
- Yadong Gao
  - Guangzhou National Laboratory, Guangzhou, 510005, PR China
- Lebin Su
  - Guangzhou National Laboratory, Guangzhou, 510005, PR China
  - Bioland Laboratory, Guangzhou, 510005, PR China
- Haiting Wu
  - Guangzhou National Laboratory, Guangzhou, 510005, PR China
- Hao Tian
  - Guangzhou National Laboratory, Guangzhou, 510005, PR China
- Majian Zeng
  - Guangzhou National Laboratory, Guangzhou, 510005, PR China
- Chunqiu Xu
  - AIChemEco Inc., Guangzhou, 510005, PR China
- Xinwei Zhu
  - AIChemEco Inc., Guangzhou, 510005, PR China
- Kuangbiao Liao
  - Guangzhou National Laboratory, Guangzhou, 510005, PR China
  - Bioland Laboratory, Guangzhou, 510005, PR China
12
Götz J, Jackl MK, Jindakun C, Marziale AN, André J, Gosling DJ, Springer C, Palmieri M, Reck M, Luneau A, Brocklehurst CE, Bode JW. High-throughput synthesis provides data for predicting molecular properties and reaction success. Science Advances 2023; 9:eadj2314. [PMID: 37889964 PMCID: PMC10610918 DOI: 10.1126/sciadv.adj2314] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/14/2023] [Accepted: 09/26/2023] [Indexed: 10/29/2023]
Abstract
The generation of attractive scaffolds for drug discovery efforts requires the expeditious synthesis of diverse analogues from readily available building blocks. This endeavor necessitates a trade-off between diversity and ease of access and is further complicated by uncertainty about the synthesizability and pharmacokinetic properties of the resulting compounds. Here, we document a platform that leverages photocatalytic N-heterocycle synthesis, high-throughput experimentation, automated purification, and physicochemical assays on 1152 discrete reactions. Together, the data generated allow rational predictions of the synthesizability of stereochemically diverse C-substituted N-saturated heterocycles with deep learning and reveal unexpected trends on the relationship between structure and properties. This study exemplifies how organic chemists can exploit state-of-the-art technologies to markedly increase throughput and confidence in the preparation of drug-like molecules.
Affiliation(s)
- Julian Götz
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
| | - Moritz K. Jackl
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
| | - Chalupat Jindakun
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
| | - Alexander N. Marziale
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
| | - Jérôme André
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
| | - Daniel J. Gosling
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
| | - Clayton Springer
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, Cambridge, MA 02139, USA
| | - Marco Palmieri
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
| | - Marcel Reck
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
| | - Alexandre Luneau
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
| | - Cara E. Brocklehurst
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
| | - Jeffrey W. Bode
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
|
13
|
Liu Z, Moroz YS, Isayev O. The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions. Chem Sci 2023; 14:10835-10846. [PMID: 37829036 PMCID: PMC10566507 DOI: 10.1039/d3sc03902a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 09/12/2023] [Indexed: 10/14/2023]
Abstract
Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41 239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks about feature and model performance. Despite the excellent performance on a high-throughput experiment (HTE) dataset (R² around 0.9), no method gave satisfactory results on the literature data. The best performance was an R² of 0.395 ± 0.020 using the stack technique. Error analysis revealed that reactivity cliff and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R² to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to the reactivity change due to the subtle structure variance, as well as be robust to the uncertainty associated with yield measurements.
Affiliation(s)
- Zhen Liu
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh PA 15213 USA
- Yurii S Moroz
- Enamine Ltd Kyïv 02660 Ukraine
- Chemspace LLC Kyïv 02094 Ukraine
- Taras Shevchenko National University of Kyïv Kyïv 01601 Ukraine
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh PA 15213 USA
|
14
|
Reid M, Teskey CJ. Highlights from the 56th Bürgenstock Conference on Stereochemistry 2023. Chem Sci 2023; 14:9244-9247. [PMID: 37712042 PMCID: PMC10498497 DOI: 10.1039/d3sc90151c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2023]
Abstract
Herein, we share an overview of the scientific highlights from speakers at the latest edition of the longstanding Bürgenstock Conference.
Affiliation(s)
- Marc Reid
- WestCHEM Department of Pure & Applied Chemistry, University of Strathclyde Glasgow UK
- Christopher J Teskey
- Institute of Organic Chemistry, RWTH Aachen University Landoltweg 1 52074 Aachen Germany
|
15
|
Mahjour B, Zhang R, Shen Y, McGrath A, Zhao R, Mohamed OG, Lin Y, Zhang Z, Douthwaite JL, Tripathi A, Cernak T. Rapid planning and analysis of high-throughput experiment arrays for reaction discovery. Nat Commun 2023; 14:3924. [PMID: 37400469 DOI: 10.1038/s41467-023-39531-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/22/2023] [Accepted: 06/13/2023] [Indexed: 07/05/2023]
Abstract
High-throughput experimentation (HTE) is an increasingly important tool in reaction discovery. While the hardware for running HTE in the chemical laboratory has evolved significantly in recent years, there remains a need for software solutions to navigate data-rich experiments. Here we have developed phactor™, a software that facilitates the performance and analysis of HTE in a chemical laboratory. phactor™ allows experimentalists to rapidly design arrays of chemical reactions or direct-to-biology experiments in 24, 96, 384, or 1,536 wellplates. Users can access online reagent data, such as a chemical inventory, to virtually populate wells with experiments and produce instructions to perform the reaction array manually, or with the assistance of a liquid handling robot. After completion of the reaction array, analytical results can be uploaded for facile evaluation, and to guide the next series of experiments. All chemical data, metadata, and results are stored in machine-readable formats that are readily translatable to various software. We also demonstrate the use of phactor™ in the discovery of several chemistries, including the identification of a low micromolar inhibitor of the SARS-CoV-2 main protease. Furthermore, phactor™ has been made available for free academic use in 24- and 96-well formats via an online interface.
Affiliation(s)
- Babak Mahjour
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Rui Zhang
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yuning Shen
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Andrew McGrath
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Ruheng Zhao
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Osama G Mohamed
- Natural Products Discovery Core, Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
- Yingfu Lin
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Zirong Zhang
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- James L Douthwaite
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Ashootosh Tripathi
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Natural Products Discovery Core, Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
- Tim Cernak
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
|
16
|
Shim E, Tewari A, Cernak T, Zimmerman PM. Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit. J Chem Inf Model 2023; 63:3659-3668. [PMID: 37312524 PMCID: PMC11163943 DOI: 10.1021/acs.jcim.3c00577] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Machine learning models are increasingly being utilized to predict outcomes of organic chemical reactions. A large amount of reaction data is used to train these models, which is in stark contrast to how expert chemists discover and develop new reactions by leveraging information from a small number of relevant transformations. Transfer learning and active learning are two strategies that can operate in low-data situations, which may help fill this gap and promote the use of machine learning for tackling real-world challenges in organic synthesis. This Perspective introduces active and transfer learning and connects these to potential opportunities and directions for further research, especially in the area of prospective development of chemical transformations.
Affiliation(s)
- Eunjae Shim
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Ambuj Tewari
- Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Tim Cernak
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Paul M Zimmerman
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
|
17
|
Chen K, Chen G, Li J, Huang Y, Wang E, Hou T, Heng PA. MetaRF: attention-based random forest for reaction yield prediction with a few trails. J Cheminform 2023; 15:43. [PMID: 37038222 PMCID: PMC10084704 DOI: 10.1186/s13321-023-00715-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/03/2023] [Accepted: 03/21/2023] [Indexed: 04/12/2023]
Abstract
Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space only with a few experimental trials. To attack this challenge, we first put forth MetaRF, an attention-based random forest model specially designed for the few-shot yield prediction, where the attention weight of a random forest is automatically optimized by the meta-learning framework and can be quickly adapted to predict the performance of new reagents while given a few additional samples. To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method to determine valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and acquires satisfactory performance on few-shot prediction. In high-throughput experimentation (HTE) datasets, the average yield of our methodology's top 10 high-yield reactions is relatively close to the results of ideal yield selection.
Affiliation(s)
- Kexin Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, New Territories, Hong Kong SAR
- Yuansheng Huang
- College of Pharmaceutical Sciences, Zhejiang University, Zhejiang, China
- Ercheng Wang
- Zhejiang Lab, Zhejiang, China
- College of Pharmaceutical Sciences, Zhejiang University, Zhejiang, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Zhejiang, China
- Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, New Territories, Hong Kong SAR
- Zhejiang Lab, Zhejiang, China
|
18
|
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
|
19
|
Hu Y, Zhao W, Wang L, Lin J, Du L. Machine-Learning-Assisted Design of Highly Tough Thermosetting Polymers. ACS APPLIED MATERIALS & INTERFACES 2022; 14:55004-55016. [PMID: 36456181 DOI: 10.1021/acsami.2c14290] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 06/17/2023]
Abstract
Despite advances in machine learning for accurately predicting material properties, forecasting the performance of thermosetting polymers remains a challenge due to the sparsity of historical experimental data and their complicated crosslinked structures. We proposed a machine-learning-assisted materials genome approach (MGA) for rapidly designing novel epoxy thermosets with excellent mechanical properties (high tensile moduli, high tensile strength, and high toughness) through high-throughput screening in a vast chemical space. Machine-learning models were established by combining attention- and gate-augmented graph convolutional networks, multilayer perceptrons, classical gel theory, and transfer learning from small molecules to polymers. Proof-of-concept experiments were carried out, and the structures designed by the MGA were verified. Gene substructures affecting the modulus, strength, and toughness were also extracted, revealing the mechanisms of polymers with high mechanical properties. The developed strategy can be employed to design other thermosetting polymers efficiently.
Affiliation(s)
- Yaxi Hu
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Wenlin Zhao
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liquan Wang
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Lei Du
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, Frontiers Science Center for Materiobiology and Dynamic Chemistry, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
|