Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Schwaller P, Petraglia R, Zullo V, Nair VH, Haeuselmann RA, Pisoni R, Bekas C, Iuliano A, Laino T. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem Sci 2020;11:3316-3325. [PMID: 34122839 PMCID: PMC8152799 DOI: 10.1039/c9sc05704h] [Citation(s) in RCA: 134] [Impact Index Per Article: 33.5] [Received: 11/11/2019] [Accepted: 03/02/2020] [Indexed: 12/20/2022] [Open Access]

Cited by Other Article(s)

Nana Teukam YG, Kwate Dassi L, Manica M, Probst D, Schwaller P, Laino T. Language models can identify enzymatic binding sites in protein sequences. Comput Struct Biotechnol J 2024;23:1929-1937. [PMID: 38736695 PMCID: PMC11087710 DOI: 10.1016/j.csbj.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/05/2024] [Accepted: 04/05/2024] [Indexed: 05/14/2024] [Open Access]

Li J, Lin K, Pei J, Lai L. Challenging Complexity with Simplicity: Rethinking the Role of Single-Step Models in Computer-Aided Synthesis Planning. J Chem Inf Model 2024. [PMID: 38940765 DOI: 10.1021/acs.jcim.4c00432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]

Guo J, Yu C, Li K, Zhang Y, Wang G, Li S, Dong H. Retrosynthesis Zero: Self-Improving Global Synthesis Planning Using Reinforcement Learning. J Chem Theory Comput 2024;20:4921-4938. [PMID: 38747149 DOI: 10.1021/acs.jctc.4c00071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]

Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024;64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]

Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024;45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]

Abstract

Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized, a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to creative and empirical nature of the domain of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules are synthesized preferably through a fewer empirical attempts instead of a larger library, to realize an active candidate. In this front, predictive endeavors using machine learning (ML) models built on available data acquire high timely significance. Prediction of molecular property and reaction outcome remain one of the burgeoning applications of ML in chemical science. Among several methods of encoding molecular samples for ML models, the ones that employ language like representations are gaining steady popularity. Such representations would additionally help adopt well-developed natural language processing (NLP) models for chemical applications. Given this advantageous background, herein we describe several successful chemical applications of NLP focusing on molecular property and reaction outcome predictions. From relatively simpler recurrent neural networks (RNNs) to complex models like transformers, different network architecture have been leveraged for tasks such as de novo drug design, catalyst generation, forward and retro-synthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time and cost-effective manner. While we showcase an optimistic outlook of CLMs, attention is also placed on the persisting challenges in reaction domain, which would optimistically be addressed by advanced algorithms tailored to chemical language and with increased availability of high-quality datasets.

Wigh D, Arrowsmith J, Pomberger A, Felton KC, Lapkin AA. ORDerly: Data Sets and Benchmarks for Chemical Reaction Data. J Chem Inf Model 2024;64:3790-3798. [PMID: 38648077 PMCID: PMC11094788 DOI: 10.1021/acs.jcim.4c00292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/03/2024] [Accepted: 04/04/2024] [Indexed: 04/25/2024]

M. Bran A, Cox S, Schilter O, Baldassari C, White AD, Schwaller P. Augmenting large language models with chemistry tools. Nat Mach Intell 2024;6:525-535. [PMID: 38799228 PMCID: PMC11116106 DOI: 10.1038/s42256-024-00832-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 03/27/2024] [Indexed: 05/29/2024]

Westerlund AM, Manohar Koki S, Kancharla S, Tibo A, Saigiridharan L, Kabeshov M, Mercado R, Genheden S. Do Chemformers Dream of Organic Matter? Evaluating a Transformer Model for Multistep Retrosynthesis. J Chem Inf Model 2024;64:3021-3033. [PMID: 38602390 DOI: 10.1021/acs.jcim.3c01685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]

Yao S, Song J, Jia L, Cheng L, Zhong Z, Song M, Feng Z. Fast and effective molecular property prediction with transferability map. Commun Chem 2024;7:85. [PMID: 38632308 PMCID: PMC11024153 DOI: 10.1038/s42004-024-01169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 04/05/2024] [Indexed: 04/19/2024] [Open Access]

Chen J, Schwaller P. Molecular hypergraph neural networks. J Chem Phys 2024;160:144307. [PMID: 38597317 DOI: 10.1063/5.0193557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 03/14/2024] [Indexed: 04/11/2024] [Open Access]

Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices. J Cheminform 2024;16:37. [PMID: 38553720 PMCID: PMC10980627 DOI: 10.1186/s13321-024-00834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/23/2024] [Indexed: 04/02/2024] [Open Access]

Abstract

The challenge of devising pathways for organic synthesis remains a central issue in the field of medicinal chemistry. Over the span of six decades, computer-aided synthesis planning has given rise to a plethora of potent tools for formulating synthetic routes. Nevertheless, a significant expert task still looms: determining the appropriate solvent, catalyst, and reagents when provided with a set of reactants to achieve and optimize the desired product for a specific step in the synthesis process. Typically, chemists identify key functional groups and rings that exert crucial influences at the reaction center, classify reactions into categories, and may assign them names. This research introduces Rxn-INSIGHT, an open-source algorithm based on the bond-electron matrix approach, with the purpose of automating this endeavor. Rxn-INSIGHT not only streamlines the process but also facilitates extensive querying of reaction databases, effectively replicating the thought processes of an organic chemist. The core functions of the algorithm encompass the classification and naming of reactions, extraction of functional groups, rings, and scaffolds from the involved chemical entities. The provision of reaction condition recommendations based on the similarity and prevalence of reactions eventually arises as a side application. The performance of our rule-based model has been rigorously assessed against a carefully curated benchmark dataset, exhibiting an accuracy rate exceeding 90% in reaction classification and surpassing 95% in reaction naming. Notably, it has been discerned that a pivotal factor in selecting analogous reactions lies in the analysis of ring structures participating in the reactions. An examination of ring structures within the USPTO chemical reaction database reveals that with just 35 unique rings, a remarkable 75% of all rings found in nearly 1 million products can be encompassed. Furthermore, Rxn-INSIGHT is proficient in suggesting appropriate choices for solvents, catalysts, and reagents in entirely novel reactions, all within the span of a second, utilizing nothing more than an everyday laptop.

Pasquini M, Stenta M. LinChemIn: Route Arithmetic─Operations on Digital Synthetic Routes. J Chem Inf Model 2024;64:1765-1771. [PMID: 38480486 DOI: 10.1021/acs.jcim.3c01819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]

Zhao D, Tu S, Xu L. Efficient retrosynthetic planning with MCTS exploration enhanced A* search. Commun Chem 2024;7:52. [PMID: 38454002 PMCID: PMC10920677 DOI: 10.1038/s42004-024-01133-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 02/20/2024] [Indexed: 03/09/2024] [Open Access]

Qi X, Zhao Y, Qi Z, Hou S, Chen J. Machine Learning Empowering Drug Discovery: Applications, Opportunities and Challenges. Molecules 2024;29:903. [PMID: 38398653 PMCID: PMC10892089 DOI: 10.3390/molecules29040903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 02/08/2024] [Accepted: 02/14/2024] [Indexed: 02/25/2024] [Open Access]

Pham TT, Guo Z, Li B, Lapkin AA, Yan N. Synthesis of Pyrrole-2-Carboxylic Acid from Cellulose- and Chitin-Based Feedstocks Discovered by the Automated Route Search. ChemSusChem 2024;17:e202300538. [PMID: 37792551 DOI: 10.1002/cssc.202300538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 10/02/2023] [Accepted: 10/04/2023] [Indexed: 10/06/2023]

Chen LY, Li YP. Enhancing chemical synthesis: a two-stage deep neural network for predicting feasible reaction conditions. J Cheminform 2024;16:11. [PMID: 38268009 DOI: 10.1186/s13321-024-00805-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Accepted: 01/14/2024] [Indexed: 01/26/2024] [Open Access]

Abstract

In the field of chemical synthesis planning, the accurate recommendation of reaction conditions is essential for achieving successful outcomes. This work introduces an innovative deep learning approach designed to address the complex task of predicting appropriate reagents, solvents, and reaction temperatures for chemical reactions. Our proposed methodology combines a multi-label classification model with a ranking model to offer tailored reaction condition recommendations based on relevance scores derived from anticipated product yields. To tackle the challenge of limited data for unfavorable reaction contexts, we employed the technique of hard negative sampling to generate reaction conditions that might be mistakenly classified as suitable, forcing the model to refine its decision boundaries, especially in challenging cases. Our developed model excels in proposing conditions where an exact match to the recorded solvents and reagents is found within the top-10 predictions 73% of the time. It also predicts temperatures within ±20 °C of the recorded temperature in 89% of test cases. Notably, the model demonstrates its capacity to recommend multiple viable reaction conditions, with accuracy varying based on the availability of condition records associated with each reaction. What sets this model apart is its ability to suggest alternative reaction conditions beyond the constraints of the dataset. This underscores its potential to inspire innovative approaches in chemical research, presenting a compelling opportunity for advancing chemical synthesis planning and elevating the field of reaction engineering. Scientific contribution: The combination of multi-label classification and ranking models provides tailored recommendations for reaction conditions based on the reaction yields. A novel approach is presented to address the issue of data scarcity in negative reaction conditions through data augmentation.

Back S, Aspuru-Guzik A, Ceriotti M, Gryn'ova G, Grzybowski B, Gu GH, Hein J, Hippalgaonkar K, Hormázabal R, Jung Y, Kim S, Kim WY, Moosavi SM, Noh J, Park C, Schrier J, Schwaller P, Tsuda K, Vegge T, von Lilienfeld OA, Walsh A. Accelerated chemical science with AI. Digit Discov 2024;3:23-33. [PMID: 38239898 PMCID: PMC10793638 DOI: 10.1039/d3dd00213f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 12/06/2023] [Indexed: 01/22/2024]

Affiliation(s)

Seoin Back: Department of Chemical and Biomolecular Engineering, Institute of Emergent Materials, Sogang University Seoul Republic of Korea
Alán Aspuru-Guzik: Departments of Chemistry, Computer Science, University of Toronto St. George Campus Toronto ON Canada; Acceleration Consortium and Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
Michele Ceriotti: Laboratory of Computational Science and Modeling (COSMO), École Polytechnique Fédérale de Lausanne Lausanne Switzerland
Ganna Gryn'ova: Heidelberg Institute for Theoretical Studies (HITS gGmbH) 69118 Heidelberg Germany; Interdisciplinary Center for Scientific Computing, Heidelberg University 69120 Heidelberg Germany
Bartosz Grzybowski: Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS) Ulsan Republic of Korea; Institute of Organic Chemistry, Polish Academy of Sciences Warsaw Poland; Department of Chemistry, Ulsan National Institute of Science and Technology Ulsan Republic of Korea
Geun Ho Gu: Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH) Naju 58330 Republic of Korea
Jason Hein: Department of Chemistry, University of British Columbia Vancouver BC V6T 1Z1 Canada
Kedar Hippalgaonkar: School of Materials Science and Engineering, Nanyang Technological University 50 Nanyang Avenue Singapore 639798 Singapore; Institute of Materials Research and Engineering, Agency for Science Technology and Research 2 Fusionopolis Way, 08-03 Singapore 138634 Singapore
Rodrigo Hormázabal: LG AI Research Seoul Republic of Korea
Yousung Jung: Department of Chemical and Biomolecular Engineering, KAIST Daejeon Republic of Korea; School of Chemical and Biological Engineering, Interdisciplinary Program in Artificial Intelligence, Seoul National University 1 Gwanak-ro, Gwanak-gu Seoul 08826 Republic of Korea
Seonah Kim: Department of Chemistry, Colorado State University 1301 Center Avenue Fort Collins CO 80523 USA
Woo Youn Kim: Department of Chemistry, KAIST Daejeon Republic of Korea
Seyed Mohamad Moosavi: Chemical Engineering & Applied Chemistry, University of Toronto Toronto Ontario M5S 3E5 Canada
Juhwan Noh: Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology Daejeon 34114 Republic of Korea
Changyoung Park: LG AI Research Seoul Republic of Korea
Joshua Schrier: Department of Chemistry, Fordham University The Bronx NY 10458 USA
Philippe Schwaller: Laboratory of Artificial Chemical Intelligence (LIAC) & National Centre of Competence in Research (NCCR) Catalysis, École Polytechnique Fédérale de Lausanne Lausanne Switzerland
Koji Tsuda: Graduate School of Frontier Sciences, The University of Tokyo Kashiwa Chiba 277-8561 Japan; Center for Basic Research on Materials, National Institute for Materials Science Tsukuba Ibaraki 305-0044 Japan; RIKEN Center for Advanced Intelligence Project Tokyo 103-0027 Japan
Tejs Vegge: Department of Energy Conversion and Storage, Technical University of Denmark 301 Anker Engelunds vej, Kongens Lyngby Copenhagen 2800 Denmark
O Anatole von Lilienfeld: Acceleration Consortium and Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada; Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St George Campus Toronto ON Canada; Machine Learning Group, Technische Universität Berlin and Berlin Institute for the Foundations of Learning and Data 10587 Berlin Germany
Aron Walsh: Department of Materials, Imperial College London London SW7 2AZ UK; Department of Physics, Ewha Women's University Seoul Republic of Korea

Yin X, Hsieh CY, Wang X, Wu Z, Ye Q, Bao H, Deng Y, Chen H, Luo P, Liu H, Hou T, Yao X. Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning. Research (Wash D C) 2024;7:0292. [PMID: 38213662 PMCID: PMC10777739 DOI: 10.34133/research.0292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 12/06/2023] [Indexed: 01/13/2024]

Abstract

Deep learning (DL)-driven efficient synthesis planning may profoundly transform the paradigm for designing novel pharmaceuticals and materials. However, the progress of many DL-assisted synthesis planning (DASP) algorithms has suffered from the lack of reliable automated pathway evaluation tools. As a critical metric for evaluating chemical reactions, accurate prediction of reaction yields helps improve the practicality of DASP algorithms in the real-world scenarios. Currently, accurately predicting yields of interesting reactions still faces numerous challenges, mainly including the absence of high-quality generic reaction yield datasets and robust generic yield predictors. To compensate for the limitations of high-throughput yield datasets, we curated a generic reaction yield dataset containing 12 reaction categories and rich reaction condition information. Subsequently, by utilizing 2 pretraining tasks based on chemical reaction masked language modeling and contrastive learning, we proposed a powerful bidirectional encoder representations from transformers (BERT)-based reaction yield predictor named Egret. It achieved comparable or even superior performance to the best previous models on 4 benchmark datasets and established state-of-the-art performance on the newly curated dataset. We found that reaction-condition-based contrastive learning enhances the model's sensitivity to reaction conditions, and Egret is capable of capturing subtle differences between reactions involving identical reactants and products but different reaction conditions. Furthermore, we proposed a new scoring function that incorporated Egret into the evaluation of multistep synthesis routes. Test results showed that yield-incorporated scoring facilitated the prioritization of literature-supported high-yield reaction pathways for target molecules. In addition, through meta-learning strategy, we further improved the reliability of the model's prediction for reaction types with limited data and lower data quality. Our results suggest that Egret holds the potential to become an essential component of the next-generation DASP tools.

Affiliation(s)

Xiaodan Yin: Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China; CarbonSilicon AI Technology Co. Ltd, Hangzhou, Zhejiang 310018, China
Chang-Yu Hsieh: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
Xiaorui Wang: Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China; CarbonSilicon AI Technology Co. Ltd, Hangzhou, Zhejiang 310018, China
Zhenxing Wu: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; CarbonSilicon AI Technology Co. Ltd, Hangzhou, Zhejiang 310018, China
Qing Ye: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; CarbonSilicon AI Technology Co. Ltd, Hangzhou, Zhejiang 310018, China
Honglei Bao: Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China
Yafeng Deng: CarbonSilicon AI Technology Co. Ltd, Hangzhou, Zhejiang 310018, China
Hongming Chen: Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China
Pei Luo: Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China
Huanxiang Liu: Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
Tingjun Hou: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
Xiaojun Yao: Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China

Jhangiani A, Panda V, Sukheja A, Thomas S, Dusseja P, Pandya S, Chintakrindi A. Toxicological Profiling of Potential Shikimate Kinase Inhibitors Against Mycobacterium tuberculosis. Altern Lab Anim 2024;52:10-27. [PMID: 38095084 DOI: 10.1177/02611929231217062] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/03/2024]

Abstract

Over the last decade, Mycobacterium tuberculosis has mutated into a putative 'superbug', as treatments against it have failed due to increasing antimicrobial resistance. As a result, the rising incidence of multidrug-resistant tuberculosis (MDR-TB) is posing a significant public health threat, thus, the need to develop effective drugs for MDR-TB has become an urgent priority. To identify new drug candidates for the treatment of MDR-TB, the present study was based on mycobacterial shikimate kinase (MtSK) as the pharmacological target. One hundred potential MtSK inhibitors were identified from literature and database searches to identify compounds that were designed to specifically function as MtSK antagonists. The ADME properties of these compounds were evaluated by using the SwissADME web tool. ProTox-II software was also used to investigate any potential endocrine disrupting effects, mediated through their interaction with oestrogenic and/or androgenic receptors. This study also aimed to predict LD50 values of potential drug candidates that would be active against the standard H37Rv strain of M. tuberculosis, by using the ProTox-II in silico tool. The molecules for which no structural hazard alerts were identified with these software tools were further subjected to molecular docking analyses and molecular dynamic simulations to estimate their ability to interact with the MtSK enzyme. Preliminary results from SwissADME indicated that 30 molecules were drug-like, due to their physicochemical and pharmacokinetic properties. However, subsequent analysis with ToxTree and ProTox-II indicated that only three of these 30 drug-like molecules were suitable for taking forward into further in vitro experiments. This study, which is based on the use of commonly used open-source in silico tools, identified new MtSK ligands for potential use in the development of new drugs for the therapeutic management of tuberculosis. An initial prediction of their safety profile was also generated.

Liu T, Cao Z, Huang Y, Wan Y, Wu J, Hsieh CY, Hou T, Kang Y. SynCluster: Reaction Type Clustering and Recommendation Framework for Synthesis Planning. JACS Au 2023;3:3446-3461. [PMID: 38155655 PMCID: PMC10751778 DOI: 10.1021/jacsau.3c00607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/07/2023] [Accepted: 11/08/2023] [Indexed: 12/30/2023]

Abstract

AI-assisted synthesis planning has emerged as a valuable tool in accelerating synthetic chemistry for the discovery of new drugs and materials. The template-free approach, which showcases superior generalization capabilities, is seen as the mainstream direction in this field. However, it remains unclear whether such an end-to-end approach can achieve problem-solving performance on par with experienced chemists without fully revealing insights into the chemical mechanisms involved. Moreover, there is a lack of unified and chemically inspired frameworks for improving multitask reaction predictions in this area. In this study, we have addressed these challenges by investigating the impact of fine-grained reaction-type labels on multiple downstream tasks and propose a novel framework named SynCluster. This framework incorporates unsupervised clustering cues into the baseline models and identifies plausible chemical subspaces which is compatible with multitask extensions and can serve as model-independent indicators to effectively enhance the performance of multiple downstream tasks. In retrosynthesis prediction, SynCluster achieves significant improvements of 4.1 and 11.0% in top-1 and top-10 prediction accuracy, respectively, compared to the baseline Molecular Transformer, and achieves a notable enhancement of 13.9% in top-10 accuracy when combined with Retroformer. By incorporating simplified molecular-input line-entry system augmentation, our framework achieves higher top-10 accuracy compared to state-of-the-art sequence-based retrosynthesis models and improves over the baseline on the diversity and validity of reactants. SynCluster also achieves 94.9% top-10 accuracy in forward synthesis prediction and 51.5% top-10 Maxfrag accuracy in reagent prediction. Overall, SynCluster provides a fresh perspective with chemical interpretability and reinforcement of domain knowledge in the synthesis design. It offers a promising solution for improving the accuracy and efficiency of AI-assisted synthesis planning and bridges the gap between template-free approaches and the problem-solving abilities of experienced chemists.

Koscher BA, Canty RB, McDonald MA, Greenman KP, McGill CJ, Bilodeau CL, Jin W, Wu H, Vermeire FH, Jin B, Hart T, Kulesza T, Li SC, Jaakkola TS, Barzilay R, Gómez-Bombarelli R, Green WH, Jensen KF. Autonomous, multiproperty-driven molecular discovery: From predictions to measurements and back. Science 2023;382:eadi1407. [PMID: 38127734 DOI: 10.1126/science.adi1407] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/05/2023] [Accepted: 11/09/2023] [Indexed: 12/23/2023]

Affiliation(s)

Brent A Koscher: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Richard B Canty: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Matthew A McDonald: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Kevin P Greenman: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Charles J McGill: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Camille L Bilodeau: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Wengong Jin: Broad Institute of MIT and Harvard, Cambridge, MA, USA
Haoyang Wu: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Florence H Vermeire: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Brooke Jin: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Travis Hart: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Timothy Kulesza: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Shih-Cheng Li: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Tommi S Jaakkola: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Regina Barzilay: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Rafael Gómez-Bombarelli: Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
William H Green: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Klavs F Jensen: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

Heid E, Probst D, Green WH, Madsen GKH. EnzymeMap: curation, validation and data-driven prediction of enzymatic reactions. Chem Sci 2023;14:14229-14242. [PMID: 38098707 PMCID: PMC10718068 DOI: 10.1039/d3sc02048g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023] [Open Access]

Toniato A, Vaucher AC, Lehmann MM, Luksch T, Schwaller P, Stenta M, Laino T. Fast Customization of Chemical Language Models to Out-of-Distribution Data Sets. CHEMISTRY OF MATERIALS : A PUBLICATION OF THE AMERICAN CHEMICAL SOCIETY 2023;35:8806-8815. [PMID: 38027545 PMCID: PMC10653079 DOI: 10.1021/acs.chemmater.3c01406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Revised: 10/09/2023] [Accepted: 10/09/2023] [Indexed: 12/01/2023]

Dolfus U, Briem H, Gutermuth T, Rarey M. Full Modification Control over Retrosynthetic Routes for Guided Optimization of Lead Structures. J Chem Inf Model 2023;63:6587-6597. [PMID: 37910814 DOI: 10.1021/acs.jcim.3c01155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2023]

Wang X, Hsieh CY, Yin X, Wang J, Li Y, Deng Y, Jiang D, Wu Z, Du H, Chen H, Li Y, Liu H, Wang Y, Luo P, Hou T, Yao X. Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center. RESEARCH (WASHINGTON, D.C.) 2023;6:0231. [PMID: 37849643 PMCID: PMC10578430 DOI: 10.34133/research.0231] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 08/29/2023] [Indexed: 10/19/2023]

Affiliation(s)

Xiaorui Wang Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
Chang-Yu Hsieh Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
Xiaodan Yin Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
Jike Wang Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
Yuquan Li College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China
Yafeng Deng CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
Dejun Jiang Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
Zhenxing Wu Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
Hongyan Du Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
Hongming Chen Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China
Yun Li College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China
Huanxiang Liu Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
Yuwei Wang College of Pharmacy, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi, 712044, China
Pei Luo Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
Tingjun Hou Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
Xiaojun Yao Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China

Stanley M, Segler M. Fake it until you make it? Generative de novo design and virtual screening of synthesizable molecules. Curr Opin Struct Biol 2023;82:102658. [PMID: 37473637 DOI: 10.1016/j.sbi.2023.102658] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023]

Kreutter D, Reymond JL. Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search. Chem Sci 2023;14:9959-9969. [PMID: 37736648 PMCID: PMC10510629 DOI: 10.1039/d3sc01604h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 08/30/2023] [Indexed: 09/23/2023] Open

Kim S, Schroeder CM, Jackson NE. Open Macromolecular Genome: Generative Design of Synthetically Accessible Polymers. ACS POLYMERS AU 2023;3:318-330. [PMID: 37576712 PMCID: PMC10416319 DOI: 10.1021/acspolymersau.3c00003] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 03/13/2023] [Accepted: 03/14/2023] [Indexed: 03/31/2023]

Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z, Chandak P, Liu S, Van Katwyk P, Deac A, Anandkumar A, Bergen K, Gomes CP, Ho S, Kohli P, Lasenby J, Leskovec J, Liu TY, Manrai A, Marks D, Ramsundar B, Song L, Sun J, Tang J, Veličković P, Welling M, Zhang L, Coley CW, Bengio Y, Zitnik M. Scientific discovery in the age of artificial intelligence. Nature 2023;620:47-60. [PMID: 37532811 DOI: 10.1038/s41586-023-06221-2] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Accepted: 05/16/2023] [Indexed: 08/04/2023]

Affiliation(s)

Hanchen Wang Department of Engineering, University of Cambridge, Cambridge, UK Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA Department of Research and Early Development, Genentech Inc, South San Francisco, CA, USA Department of Computer Science, Stanford University, Stanford, CA, USA
Tianfan Fu Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Yuanqi Du Department of Computer Science, Cornell University, Ithaca, NY, USA
Wenhao Gao Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Kexin Huang Department of Computer Science, Stanford University, Stanford, CA, USA
Ziming Liu Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
Payal Chandak Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, USA
Shengchao Liu Mila - Quebec AI Institute, Montreal, Quebec, Canada Université de Montréal, Montreal, Quebec, Canada
Peter Van Katwyk Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA Data Science Institute, Brown University, Providence, RI, USA
Andreea Deac Mila - Quebec AI Institute, Montreal, Quebec, Canada Université de Montréal, Montreal, Quebec, Canada
Anima Anandkumar Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA NVIDIA, Santa Clara, CA, USA
Karianne Bergen Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA Data Science Institute, Brown University, Providence, RI, USA
Carla P Gomes Department of Computer Science, Cornell University, Ithaca, NY, USA
Shirley Ho Center for Computational Astrophysics, Flatiron Institute, New York, NY, USA Department of Astrophysical Sciences, Princeton University, Princeton, NJ, USA Department of Physics, Carnegie Mellon University, Pittsburgh, PA, USA Department of Physics and Center for Data Science, New York University, New York, NY, USA
Pushmeet Kohli Google DeepMind, London, UK
Joan Lasenby Department of Engineering, University of Cambridge, Cambridge, UK
Jure Leskovec Department of Computer Science, Stanford University, Stanford, CA, USA
Tie-Yan Liu Microsoft Research, Beijing, China
Arjun Manrai Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Debora Marks Department of Systems Biology, Harvard Medical School, Boston, MA, USA Broad Institute of MIT and Harvard, Cambridge, MA, USA
Bharath Ramsundar Deep Forest Sciences, Palo Alto, CA, USA
Le Song BioMap, Beijing, China Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Jimeng Sun University of Illinois at Urbana-Champaign, Champaign, IL, USA
Jian Tang Mila - Quebec AI Institute, Montreal, Quebec, Canada HEC Montréal, Montreal, Quebec, Canada CIFAR AI Chair, Toronto, Ontario, Canada
Petar Veličković Google DeepMind, London, UK Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
Max Welling University of Amsterdam, Amsterdam, Netherlands Microsoft Research Amsterdam, Amsterdam, Netherlands
Linfeng Zhang DP Technology, Beijing, China AI for Science Institute, Beijing, China
Connor W Coley Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Yoshua Bengio Mila - Quebec AI Institute, Montreal, Quebec, Canada Université de Montréal, Montreal, Quebec, Canada
Marinka Zitnik Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA Broad Institute of MIT and Harvard, Cambridge, MA, USA Harvard Data Science Initiative, Cambridge, MA, USA Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA

Thakkar A, Vaucher AC, Byekwaso A, Schwaller P, Toniato A, Laino T. Unbiasing Retrosynthesis Language Models with Disconnection Prompts. ACS CENTRAL SCIENCE 2023;9:1488-1498. [PMID: 37529205 PMCID: PMC10390024 DOI: 10.1021/acscentsci.3c00372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/29/2023] [Indexed: 08/03/2023]

Kuenneth C, Ramprasad R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat Commun 2023;14:4099. [PMID: 37433807 DOI: 10.1038/s41467-023-39868-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2022] [Accepted: 06/28/2023] [Indexed: 07/13/2023] Open

Chenthamarakshan V, Hoffman SC, Owen CD, Lukacik P, Strain-Damerell C, Fearon D, Malla TR, Tumber A, Schofield CJ, Duyvesteyn HME, Dejnirattisai W, Carrique L, Walter TS, Screaton GR, Matviiuk T, Mojsilovic A, Crain J, Walsh MA, Stuart DI, Das P. Accelerating drug target inhibitor discovery with a deep generative foundation model. SCIENCE ADVANCES 2023;9:eadg7865. [PMID: 37343087 DOI: 10.1126/sciadv.adg7865] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/20/2023] [Accepted: 05/17/2023] [Indexed: 06/23/2023]

Affiliation(s)

Vijil Chenthamarakshan IBM Research, Thomas J. Watson Research Center, Yorktown Heights, New York, NY, USA
Samuel C Hoffman IBM Research, Thomas J. Watson Research Center, Yorktown Heights, New York, NY, USA
C David Owen Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
Petra Lukacik Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
Claire Strain-Damerell Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
Daren Fearon Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
Tika R Malla Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
Anthony Tumber Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
Christopher J Schofield Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
Helen M E Duyvesteyn Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
Wanwisa Dejnirattisai Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
Loic Carrique Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
Thomas S Walter Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
Gavin R Screaton Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
Tetiana Matviiuk Enamine Ltd., Chervonotkatska St, 67, Kyiv 02094, Ukraine
Aleksandra Mojsilovic IBM Research, Thomas J. Watson Research Center, Yorktown Heights, New York, NY, USA
Jason Crain IBM Research Europe, Hartree Centre, Daresbury WA4 4AD, UK Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK
Martin A Walsh Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
David I Stuart Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
Payel Das IBM Research, Thomas J. Watson Research Center, Yorktown Heights, New York, NY, USA

Schilter O, Vaucher A, Schwaller P, Laino T. Designing catalysts with deep generative models and computational data. A case study for Suzuki cross coupling reactions. DIGITAL DISCOVERY 2023;2:728-735. [PMID: 37312682 PMCID: PMC10259369 DOI: 10.1039/d2dd00125j] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/11/2022] [Accepted: 02/22/2023] [Indexed: 06/15/2023]

Toniato A, Unsleber JP, Vaucher AC, Weymuth T, Probst D, Laino T, Reiher M. Quantum chemical data generation as fill-in for reliability enhancement of machine-learning reaction and retrosynthesis planning. DIGITAL DISCOVERY 2023;2:663-673. [PMID: 37312681 PMCID: PMC10259370 DOI: 10.1039/d3dd00006k] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/21/2023] [Accepted: 03/09/2023] [Indexed: 06/15/2023]

Ucak UV, Ashyrmamatov I, Lee J. Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization. J Cheminform 2023;15:55. [PMID: 37248531 PMCID: PMC10228139 DOI: 10.1186/s13321-023-00725-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2023] [Accepted: 05/14/2023] [Indexed: 05/31/2023] Open

Zhong W, Yang Z, Chen CYC. Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nat Commun 2023;14:3009. [PMID: 37230985 DOI: 10.1038/s41467-023-38851-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Accepted: 05/17/2023] [Indexed: 05/27/2023] Open

Hatakeyama-Sato K, Uchima Y, Kashikawa T, Kimura K, Oyaizu K. Extracting higher-conductivity designs for solid polymer electrolytes by quantum-inspired annealing. RSC Adv 2023;13:14651-14659. [PMID: 37197684 PMCID: PMC10183718 DOI: 10.1039/d3ra01982a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2023] [Accepted: 05/04/2023] [Indexed: 05/19/2023] Open

Fang L, Li J, Zhao M, Tan L, Lou JG. Single-step retrosynthesis prediction by leveraging commonly preserved substructures. Nat Commun 2023;14:2446. [PMID: 37117216 PMCID: PMC10147675 DOI: 10.1038/s41467-023-37969-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2022] [Accepted: 03/31/2023] [Indexed: 04/30/2023] Open

Liu Q, Tang K, Zhang L, Du J, Meng Q. Computer‐assisted synthetic planning considering reaction kinetics based on transition state automated generation method. AIChE J 2023. [DOI: 10.1002/aic.18092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/05/2023]

Pasquini M, Stenta M. LinChemIn: SynGraph-a data model and a toolkit to analyze and compare synthetic routes. J Cheminform 2023;15:41. [PMID: 37005691 PMCID: PMC10067316 DOI: 10.1186/s13321-023-00714-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2022] [Accepted: 03/20/2023] [Indexed: 04/04/2023] Open

Abstract

BACKGROUND

The increasing amount of chemical reaction data makes traditional ways to navigate its corpus less effective, while the demand for novel approaches and instruments is rising. Recent data science and machine learning techniques support the development of new ways to extract value from the available reaction data. On the one side, Computer-Aided Synthesis Planning tools can predict synthetic routes in a model-driven approach; on the other side, experimental routes can be extracted from the Network of Organic Chemistry, in which reaction data are linked in a network. In this context, the need to combine, compare and analyze synthetic routes generated by different sources arises naturally.

RESULTS

Here we present LinChemIn, a python toolkit that allows chemoinformatics operations on synthetic routes and reaction networks. Wrapping some third-party packages for handling graph arithmetic and chemoinformatics and implementing new data models and functionalities, LinChemIn allows the interconversion between data formats and data models and enables route-level analysis and operations, including route comparison and descriptors calculation. Object-Oriented Design principles inspire the software architecture, and the modules are structured to maximize code reusability and support code testing and refactoring. The code structure should facilitate external contributions, thus encouraging open and collaborative software development.

CONCLUSIONS

The current version of LinChemIn allows users to combine synthetic routes generated from various tools and analyze them, and constitutes an open and extensible framework capable of incorporating contributions from the community and fostering scientific discussion. Our roadmap envisages the development of sophisticated metrics for routes evaluation, a multi-parameter scoring system, and the implementation of an entire "ecosystem" of functionalities operating on synthetic routes. LinChemIn is freely available at https://github.com/syngenta/linchemin.

Brinkhaus HO, Rajan K, Schaub J, Zielesny A, Steinbeck C. Open data and algorithms for open science in AI-driven molecular informatics. Curr Opin Struct Biol 2023;79:102542. [PMID: 36805192 DOI: 10.1016/j.sbi.2023.102542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2022] [Revised: 01/10/2023] [Accepted: 01/13/2023] [Indexed: 02/19/2023]

Jaume-Santero F, Bornet A, Valery A, Naderi N, Vicente Alvarez D, Proios D, Yazdani A, Bournez C, Fessard T, Teodoro D. Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios. J Chem Inf Model 2023;63:1914-1924. [PMID: 36952584 PMCID: PMC10091402 DOI: 10.1021/acs.jcim.2c01407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/25/2023]

Vogel G, Schulze Balhorn L, Schweidtmann AM. Learning from flowsheets: A generative transformer model for autocompletion of flowsheets. Comput Chem Eng 2023. [DOI: 10.1016/j.compchemeng.2023.108162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Yu T, Boob AG, Volk MJ, Liu X, Cui H, Zhao H. Machine learning-enabled retrobiosynthesis of molecules. Nat Catal 2023. [DOI: 10.1038/s41929-022-00909-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/18/2023]

Singh S, Sunoj RB. Molecular Machine Learning for Chemical Catalysis: Prospects and Challenges. Acc Chem Res 2023;56:402-412. [PMID: 36715248 DOI: 10.1021/acs.accounts.2c00801] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]

Abstract

Conspectus

In the domain of reaction development, one aims to obtain higher efficacies as measured in terms of yield and/or selectivities. During the empirical cycles, an admixture of outcomes from low to high yields/selectivities is expected. While it is not easy to identify all of the factors that might impact the reaction efficiency, complex and nonlinear dependence on the nature of reactants, catalysts, solvents, etc. is quite likely. Developmental stages of newer reactions would typically offer a few hundreds of samples with variations in participating molecules and/or reaction conditions. These "observations" and their "output" can be harnessed as valuable labeled data for developing molecular machine learning (ML) models. Once a robust ML model is built for a specific reaction under development, it can predict the reaction outcome for any new choice of substrates/catalyst in a few seconds/minutes and thus can expedite the identification of promising candidates for experimental validation. Recent years have witnessed impressive applications of ML in the molecular world, most of them aimed at predicting important chemical or biological properties. We believe that an integration of effective ML workflows can be made richly beneficial to reaction discovery.

As with any new technology, direct adaptation of ML as used in well-developed domains, such as natural language processing (NLP) and image recognition, is unlikely to succeed in reaction discovery. Some of the challenges stem from ineffective featurization of the molecular space, unavailability of quality data and its distribution, in making the right choice of ML model and its technically robust deployment. It shall be noted that there is no universal ML model suitable for an inherently high-dimensional problem such as chemical reactions. Given these backgrounds, rendering ML tools conducive for reactions is an exciting as well as challenging endeavor at the same time. With the increased availability of efficient ML algorithms, we focused on tapping their potential for small-data reaction discovery (a few hundreds to thousands of samples).

In this Account, we describe both feature engineering and feature learning approaches for molecular ML as applied to diverse reactions of high contemporary interest. Among these, catalytic asymmetric hydrogenation of imines/alkenes, β-C(sp³)-H bond functionalization, and relay Heck reaction employed a feature engineering approach using the quantum-chemically derived physical organic descriptors as the molecular features, all designed to predict the enantioselectivity. The selection of molecular features to customize it for a reaction of interest is described, along with emphasizing the chemical insights that could be gathered through the use of such features. Feature learning methods for predicting the yield of Buchwald-Hartwig cross-coupling, deoxyfluorination of alcohols, and enantioselectivity of N,S-acetal formation are found to offer excellent predictions. We propose a transfer learning protocol, wherein an ML model such as a language model is trained on a large number of molecules (10⁵-10⁶) and fine-tuned on a focused library of target task reactions, as an effective alternative for small-data reaction discovery (10²-10³ reactions). The exploitation of deep neural network latent space as a method for generative tasks to identify useful substrates for a reaction is demonstrated as a promising strategy.

A Review on Artificial Intelligence Enabled Design, Synthesis, and Process Optimization of Chemical Products for Industry 4.0. Processes (Basel) 2023. [DOI: 10.3390/pr11020330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open

Skoraczyński G, Kitlas M, Miasojedow B, Gambin A. Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning. J Cheminform 2023;15:6. [PMID: 36641473 PMCID: PMC9840255 DOI: 10.1186/s13321-023-00678-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2022] [Accepted: 01/04/2023] [Indexed: 01/15/2023] Open

Abstract

Modern computer-assisted synthesis planning tools provide strong support for this problem. However, they are still limited by computational complexity. This limitation may be overcome by scoring the synthetic accessibility as a pre-retrosynthesis heuristic. A wide range of machine learning scoring approaches is available, however, their applicability and correctness were studied to a limited extent. Moreover, there is a lack of critical assessment of synthetic accessibility scores with common test conditions.

In the present work, we assess if synthetic accessibility scores can reliably predict the outcomes of retrosynthesis planning. Using a specially prepared compounds database, we examine the outcomes of the retrosynthetic tool AiZynthFinder. We test whether synthetic accessibility scores: SAscore, SYBA, SCScore, and RAscore accurately predict the results of retrosynthesis planning. Furthermore, we investigate if synthetic accessibility scores can speed up retrosynthesis planning by better prioritizing explored partial synthetic routes and thus reducing the size of the search space. For that purpose, we analyze the AiZynthFinder partial solutions search trees, their structure, and complexity parameters, such as the number of nodes, or treewidth.

We confirm that synthetic accessibility scores in most cases well discriminate feasible molecules from infeasible ones and can be potential boosters of retrosynthesis planning tools. Moreover, we show the current challenges of designing computer-assisted synthesis planning tools. We conclude that hybrid machine learning and human intuition-based synthetic accessibility scores can efficiently boost the effectiveness of computer-assisted retrosynthesis planning, however, they need to be carefully crafted for retrosynthesis planning algorithms.

The source code of this work is publicly available at https://github.com/grzsko/ASAP.

Moret M, Pachon Angona I, Cotos L, Yan S, Atz K, Brunner C, Baumgartner M, Grisoni F, Schneider G. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat Commun 2023;14:114. [PMID: 36611029 PMCID: PMC9825622 DOI: 10.1038/s41467-022-35692-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2021] [Accepted: 12/19/2022] [Indexed: 01/09/2023] Open

Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023;14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open

Lim PK, Julca I, Mutwil M. Redesigning plant specialized metabolism with supervised machine learning using publicly available reactome data. Comput Struct Biotechnol J 2023;21:1639-1650. [PMID: 36874159 PMCID: PMC9976193 DOI: 10.1016/j.csbj.2023.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 01/12/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023] Open