1
Kreutter D, Reymond JL. Chemoenzymatic multistep retrosynthesis with transformer loops. Chem Sci 2024:d4sc02408g. [PMID: 39416295 PMCID: PMC11474389 DOI: 10.1039/d4sc02408g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Accepted: 10/07/2024] [Indexed: 10/19/2024] Open
Abstract
Integrating enzymatic reactions into computer-aided synthesis planning (CASP) should help devise more selective, economical, and greener synthetic routes. Herein we report the triple-transformer loop algorithm with biocatalysis (TTLAB) as a new CASP tool for chemo-enzymatic multistep retrosynthesis. Single-step retrosyntheses are performed using two triple transformer loops (TTL), one trained with chemical reactions from the US Patent Office (USPTO-TTL), the second one obtained by multitask transfer learning combining the USPTO dataset with preparative biotransformations from the literature (ENZR-TTL). Each TTL performs single-step retrosynthesis independently by tagging potential reactive sites in the product, predicting for each site possible starting materials (T1) and reagents or enzymes (T2), and validating the predictions via a forward transformer (T3). TTLAB combines predictions from both TTLs to explore multistep sequences using a heuristic best-first tree search and propose short routes from commercial building blocks including enantioselective biocatalytic steps. TTLAB can be used to assist chemoenzymatic route design.
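The heuristic best-first tree search mentioned in this abstract can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration only: the toy molecule names, the `TOY_STEPS` table standing in for predictions from the two TTLs, and the cost function (one unit per step plus a penalty for low confidence) are hypothetical and are not taken from TTLAB.

```python
import heapq
from itertools import count

# Hypothetical single-step proposals: product -> [(precursors, confidence, source)].
# In TTLAB these would come from the USPTO-TTL and ENZR-TTL models; this is toy data.
TOY_STEPS = {
    "target": [(("int_A", "bb1"), 0.9, "USPTO-TTL")],
    "int_A": [
        (("bb2", "bb3"), 0.5, "USPTO-TTL"),
        (("bb4",), 0.95, "ENZR-TTL"),
    ],
}
# Hypothetical stock of commercial building blocks.
BUILDING_BLOCKS = {"bb1", "bb2", "bb3", "bb4"}


def best_first_route(target, max_expansions=100):
    """Return (cost, steps) for the cheapest-first route found, or None.

    A search state is the tuple of molecules still to be made; a state is
    solved once every molecule in it is a building block.
    """
    tie = count()  # tie-breaker so the heap never has to compare tuples of steps
    frontier = [(0.0, next(tie), (target,), ())]
    for _ in range(max_expansions):
        if not frontier:
            break
        cost, _, open_mols, steps = heapq.heappop(frontier)
        unsolved = [m for m in open_mols if m not in BUILDING_BLOCKS]
        if not unsolved:
            return cost, list(steps)
        mol = unsolved[0]
        rest = tuple(m for m in open_mols if m != mol)
        for precursors, conf, source in TOY_STEPS.get(mol, []):
            # Assumed heuristic: each step costs 1, plus a penalty for low confidence.
            new_cost = cost + 1.0 + (1.0 - conf)
            new_steps = steps + ((mol, precursors, source),)
            heapq.heappush(frontier, (new_cost, next(tie), rest + precursors, new_steps))
    return None
```

Calling `best_first_route("target")` on the toy data returns a two-step route whose second step comes from the hypothetical enzymatic model, because its higher confidence gives it a lower cost than the chemical alternative.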
Affiliation(s)
- David Kreutter
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland

2
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] Open
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Affiliation(s)
- Lung-Yi Chen
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Yi-Pei Li
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
  - Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan

3
Saifi I, Bhat BA, Hamdani SS, Bhat UY, Lobato-Tapia CA, Mir MA, Dar TUH, Ganie SA. Artificial intelligence and cheminformatics tools: a contribution to the drug development and chemical science. J Biomol Struct Dyn 2024; 42:6523-6541. [PMID: 37434311 DOI: 10.1080/07391102.2023.2234039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 07/03/2023] [Indexed: 07/13/2023]
Abstract
In the ever-evolving field of drug discovery, the integration of Artificial Intelligence (AI) and Machine Learning (ML) with cheminformatics has proven to be a powerful combination. Cheminformatics, which combines the principles of computer science and chemistry, is used to extract chemical information and search compound databases, while the application of AI and ML allows for the identification of potential hit compounds, optimization of synthesis routes, and prediction of drug efficacy and toxicity. This collaborative approach has led to the discovery, preclinical evaluations and approval of over 70 drugs in recent years. To aid researchers in the pursuit of new drugs, this article presents a comprehensive list of databases, datasets, predictive and generative models, scoring functions and web platforms that have been launched between 2021 and 2022. These resources provide a wealth of information and tools for computer-assisted drug development, and are a valuable asset for those working in the field of cheminformatics. Overall, the integration of AI, ML and cheminformatics has greatly advanced the drug discovery process and continues to hold great potential for the future. As new resources and technologies become available, we can expect to see even more groundbreaking discoveries and advancements in these fields. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ifra Saifi
  - Chaudhary Charan Singh University, Meerut, Uttar Pradesh, India
- Basharat Ahmad Bhat
  - Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Syed Suhail Hamdani
  - Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Umar Yousuf Bhat
  - Department of Zoology, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Mushtaq Ahmad Mir
  - Department of Clinical Laboratory Sciences, College of Applied Medical Science, King Khalid University, KSA, Saudi Arabia
- Tanvir Ul Hasan Dar
  - Department of Biotechnology, School of Biosciences and Biotechnology, BGSB University, Rajouri, India
- Showkat Ahmad Ganie
  - Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India

4
Yang W. Beyond algorithms: The human touch machine-generated titles for enhancing click-through rates on social media. PLoS One 2024; 19:e0306639. [PMID: 38995930 PMCID: PMC11244827 DOI: 10.1371/journal.pone.0306639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 06/20/2024] [Indexed: 07/14/2024] Open
Abstract
Artificial intelligence (AI) has the potential to revolutionize various domains by automating language-driven tasks. This study evaluates the effectiveness of an AI-assisted methodology, called the "POP Title AI Five-Step Optimization Method," in optimizing content titles on the RED social media platform. By leveraging advancements in natural language generation, this methodology aims to enhance the impact of titles by incorporating emotional sophistication and cultural proficiency, addressing existing gaps in AI capabilities. The methodology entails training generative models using human-authored examples that align with the aspirations of the target audience. By incorporating popular keywords derived from user searches, the relevance and discoverability of titles are enhanced. Audience-centric filtering is subsequently employed to further refine the generated outputs. Furthermore, human oversight is introduced to provide essential intuition that AI systems alone may lack. A total of one thousand titles, generated by AI, underwent linguistic and engagement analyses. Qualitatively, 65% of the titles exhibited intrigue and conveyed meaning comparable to those generated by humans. However, attaining full emotional sophistication remained a challenge. Quantitatively, titles emphasizing curiosity and contrast demonstrated positive correlations with user interactions, thus validating the efficacy of these techniques. Consequently, the machine-generated titles achieved coherence on par with 65% of human-generated titles, signifying significant progress and potential for further refinement. Nevertheless, achieving socio-cultural awareness is vital to match human understanding across diverse contexts, thus presenting a critical avenue for future improvement in the methodology. Continuous advancements in AI can enhance adaptability and reduce subjectivity by promoting flexibility instead of relying solely on manual reviews. 
As AI gains a deeper understanding of humanity, opportunities for its application across various industries through experiential reasoning abilities emerge. This case study exemplifies the nurturing of AI's potential by refining its skills through an evolutionary process.
Affiliation(s)
- Wenyu Yang
  - Foki Media Co., Ltd. Hangzhou, Hangzhou, Zhejiang Province, China

5
Saigiridharan L, Hassen AK, Lai H, Torren-Peraire P, Engkvist O, Genheden S. AiZynthFinder 4.0: developments based on learnings from 3 years of industrial application. J Cheminform 2024; 16:57. [PMID: 38778382 PMCID: PMC11112899 DOI: 10.1186/s13321-024-00860-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 05/15/2024] [Indexed: 05/25/2024] Open
Abstract
We present an updated overview of the AiZynthFinder package for retrosynthesis planning. Since the first version was released in 2020, we have added a substantial number of new features based on user feedback. Feature enhancements include policies for filter reactions, support for any one-step retrosynthesis model, a scoring framework and several additional search algorithms. To exemplify the typical use-cases of the software and highlight some learnings, we perform a large-scale analysis on several hundred thousand target molecules from diverse sources. This analysis looks at for instance route shape, stock usage and exploitation of reaction space, and points out strengths and weaknesses of our retrosynthesis approach. The software is released as open-source for educational purposes as well as to provide a reference implementation of the core algorithms for synthesis prediction. We hope that releasing the software as open-source will further facilitate innovation in developing novel methods for synthetic route prediction. AiZynthFinder is a fast, robust and extensible open-source software and can be downloaded from https://github.com/MolecularAI/aizynthfinder .
Affiliation(s)
- Alan Kai Hassen
  - Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Helen Lai
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Paula Torren-Peraire
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Zentrum München, Neuherberg, Germany
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Samuel Genheden
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden

6
M. Bran A, Cox S, Schilter O, Baldassari C, White AD, Schwaller P. Augmenting large language models with chemistry tools. Nat Mach Intell 2024; 6:525-535. [PMID: 38799228 PMCID: PMC11116106 DOI: 10.1038/s42256-024-00832-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 03/27/2024] [Indexed: 05/29/2024]
Abstract
Large language models (LLMs) have shown strong performance in tasks across domains but struggle with chemistry-related problems. These models also lack access to external knowledge sources, limiting their usefulness in scientific applications. We introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery and materials design. By integrating 18 expert-designed tools and using GPT-4 as the LLM, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Our work not only aids expert chemists and lowers barriers for non-experts but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.
Affiliation(s)
- Andres M. Bran
  - Laboratory of Artificial Chemical Intelligence (LIAC), ISIC, EPFL, Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, EPFL, Lausanne, Switzerland
- Sam Cox
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
  - FutureHouse, San Francisco, CA, USA
- Oliver Schilter
  - Laboratory of Artificial Chemical Intelligence (LIAC), ISIC, EPFL, Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, EPFL, Lausanne, Switzerland
  - Accelerated Discovery, IBM Research – Europe, Rüschlikon, Switzerland
- Carlo Baldassari
  - Accelerated Discovery, IBM Research – Europe, Rüschlikon, Switzerland
- Andrew D. White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
  - FutureHouse, San Francisco, CA, USA
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), ISIC, EPFL, Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, EPFL, Lausanne, Switzerland

7
Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices. J Cheminform 2024; 16:37. [PMID: 38553720 PMCID: PMC10980627 DOI: 10.1186/s13321-024-00834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/23/2024] [Indexed: 04/02/2024] Open
Abstract
The challenge of devising pathways for organic synthesis remains a central issue in the field of medicinal chemistry. Over the span of six decades, computer-aided synthesis planning has given rise to a plethora of potent tools for formulating synthetic routes. Nevertheless, a significant expert task still looms: determining the appropriate solvent, catalyst, and reagents when provided with a set of reactants to achieve and optimize the desired product for a specific step in the synthesis process. Typically, chemists identify key functional groups and rings that exert crucial influences at the reaction center, classify reactions into categories, and may assign them names. This research introduces Rxn-INSIGHT, an open-source algorithm based on the bond-electron matrix approach, with the purpose of automating this endeavor. Rxn-INSIGHT not only streamlines the process but also facilitates extensive querying of reaction databases, effectively replicating the thought processes of an organic chemist. The core functions of the algorithm encompass the classification and naming of reactions, extraction of functional groups, rings, and scaffolds from the involved chemical entities. The provision of reaction condition recommendations based on the similarity and prevalence of reactions eventually arises as a side application. The performance of our rule-based model has been rigorously assessed against a carefully curated benchmark dataset, exhibiting an accuracy rate exceeding 90% in reaction classification and surpassing 95% in reaction naming. Notably, it has been discerned that a pivotal factor in selecting analogous reactions lies in the analysis of ring structures participating in the reactions. An examination of ring structures within the USPTO chemical reaction database reveals that with just 35 unique rings, a remarkable 75% of all rings found in nearly 1 million products can be encompassed. 
Furthermore, Rxn-INSIGHT is proficient in suggesting appropriate choices for solvents, catalysts, and reagents in entirely novel reactions, all within the span of a second, utilizing nothing more than an everyday laptop.
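The ring statistic quoted in this abstract (a small number of unique rings covering most ring occurrences) is a cumulative-coverage calculation. The sketch below shows one way to compute it in Python; the function name and the toy ring list are hypothetical, and extracting rings from real reaction products (which Rxn-INSIGHT does) is outside its scope.

```python
from collections import Counter


def rings_needed_for_coverage(ring_occurrences, target_fraction):
    """Return (k, coverage): the number of most frequent unique rings needed
    to reach target_fraction of all ring occurrences, and the coverage reached."""
    counts = Counter(ring_occurrences)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    covered = 0
    # most_common() yields (ring, count) pairs sorted by decreasing count.
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target_fraction:
            return k, covered / total
    return len(counts), 1.0


# Toy data: one entry per ring occurrence found in a set of products.
toy_rings = ["benzene"] * 6 + ["pyridine"] * 2 + ["piperidine", "indole"]
k, coverage = rings_needed_for_coverage(toy_rings, 0.75)
```

On the toy list, the two most frequent rings account for 8 of the 10 occurrences, so `k` is 2 and `coverage` is 0.8.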
Affiliation(s)
- Maarten R Dobbelaere
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- István Lengyel
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
  - ChemInsights LLC, Dover, DE, 19901, USA
- Christian V Stevens
  - SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Kevin M Van Geem
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium

8
Bi X, Lin L, Chen Z, Ye J. Artificial Intelligence for Surface-Enhanced Raman Spectroscopy. Small Methods 2024; 8:e2301243. [PMID: 37888799 DOI: 10.1002/smtd.202301243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Surface-enhanced Raman spectroscopy (SERS), well acknowledged as a fingerprinting and sensitive analytical technique, has exerted high applicational value in a broad range of fields including biomedicine, environmental protection, food safety among the others. In the endless pursuit of ever-sensitive, robust, and comprehensive sensing and imaging, advancements keep emerging in the whole pipeline of SERS, from the design of SERS substrates and reporter molecules, synthetic route planning, instrument refinement, to data preprocessing and analysis methods. Artificial intelligence (AI), which is created to imitate and eventually exceed human behaviors, has exhibited its power in learning high-level representations and recognizing complicated patterns with exceptional automaticity. Therefore, facing up with the intertwining influential factors and explosive data size, AI has been increasingly leveraged in all the above-mentioned aspects in SERS, presenting elite efficiency in accelerating systematic optimization and deepening understanding about the fundamental physics and spectral data, which far transcends human labors and conventional computations. In this review, the recent progresses in SERS are summarized through the integration of AI, and new insights of the challenges and perspectives are provided in aim to better gear SERS toward the fast track.
Affiliation(s)
- Xinyuan Bi
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Li Lin
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Zhou Chen
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Jian Ye
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
  - Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
  - Shanghai Key Laboratory of Gynecologic Oncology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China

9
Dolfus U, Briem H, Gutermuth T, Rarey M. Full Modification Control over Retrosynthetic Routes for Guided Optimization of Lead Structures. J Chem Inf Model 2023; 63:6587-6597. [PMID: 37910814 DOI: 10.1021/acs.jcim.3c01155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2023]
Abstract
Synthesizability is essential for compounds designed in silico. Regardless, synthetic accessibility is often considered only as an afterthought in the design and optimization process. In addition, the trend with modern computer-aided drug design methods is going toward full automation and away from the possibility of incorporating user knowledge. With this work, we present the second major release of our software tool, Synthesia, for synthesis-aware lead structure modification, where the user's expertise is now fully utilized. A provided retrosynthetic route is used as a pathway to guide structural modifications that introduce desired structural changes in the target compound. Moreover, the approach allows the user to define the exact position or component in the retrosynthetic route, which should be modified, further integrating the user's expert knowledge. This paper describes the functionality of Synthesia, its basic concepts, and several application scenarios ranging from simple examples to a comparison of the effects of the different exchange functions to an analysis of a set of bioisosteric linker structures, highlighting potential synthetically feasible replacements.
Affiliation(s)
- Uschi Dolfus
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Hans Briem
  - Bayer AG, Research & Development, Pharmaceuticals, Computational Molecular Design Berlin, Building S110, 711, 13342 Berlin, Germany
- Torben Gutermuth
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Matthias Rarey
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany

10
Schrier J, Norquist AJ, Buonassisi T, Brgoch J. In Pursuit of the Exceptional: Research Directions for Machine Learning in Chemical and Materials Science. J Am Chem Soc 2023; 145:21699-21716. [PMID: 37754929 DOI: 10.1021/jacs.3c04783] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 09/28/2023]
Abstract
Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.
Affiliation(s)
- Joshua Schrier
  - Department of Chemistry, Fordham University, The Bronx, New York 10458, United States
- Alexander J Norquist
  - Department of Chemistry, Haverford College, Haverford, Pennsylvania 19041, United States
- Tonio Buonassisi
  - Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Jakoah Brgoch
  - Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204, United States

11
Kreutter D, Reymond JL. Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search. Chem Sci 2023; 14:9959-9969. [PMID: 37736648 PMCID: PMC10510629 DOI: 10.1039/d3sc01604h] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/27/2023] [Accepted: 08/30/2023] [Indexed: 09/23/2023] Open
Abstract
Computer-aided synthesis planning (CASP) aims to automatically learn organic reactivity from literature and perform retrosynthesis of unseen molecules. CASP systems must learn reactions sufficiently precisely to propose realistic disconnections, while avoiding overfitting to leave room for diverse options, and explore possible routes such as to allow short synthetic sequences to emerge. Herein we report an open-source CASP tool proposing original solutions to both challenges. First, we use a triple transformer loop (TTL) predicting starting materials (T1), reagents (T2), and products (T3) to explore various disconnection sites defined by combining systematic, template-based, and transformer-based tagging procedures. Second, we integrate TTL into a multistep tree search algorithm (TTLA) prioritizing sequences using a route penalty score (RPScore) considering the number of steps, their confidence score, and the simplicity of all intermediates along the route. Our approach favours short synthetic routes to commercial starting materials, as exemplified by retrosynthetic analyses of recently approved drugs.
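The T1, T2, T3 loop described in this abstract amounts to a round-trip check: propose precursors, propose reagents, then keep the proposal only if a forward prediction recovers the product. The following Python sketch illustrates that control flow under stated assumptions. The three models are replaced by hypothetical lookup-table stubs, the function signatures and the `min_confidence` threshold are invented for illustration, and none of this reproduces the authors' implementation.

```python
from typing import Callable, List, Tuple

Proposal = Tuple[str, str, float]  # (precursors, reagents, forward confidence)


def ttl_single_step(
    product: str,
    tagged_sites: List[str],
    t1: Callable[[str, str], List[str]],
    t2: Callable[[str, str], str],
    t3: Callable[[str, str], Tuple[str, float]],
    min_confidence: float = 0.5,  # assumed cutoff, not from the paper
) -> List[Proposal]:
    """Keep only proposals whose forward prediction reproduces the product."""
    accepted: List[Proposal] = []
    for site in tagged_sites:
        for precursors in t1(product, site):          # T1: starting materials
            reagents = t2(precursors, product)        # T2: reagents
            predicted, conf = t3(precursors, reagents)  # T3: forward validation
            if predicted == product and conf >= min_confidence:
                accepted.append((precursors, reagents, conf))
    return sorted(accepted, key=lambda p: -p[2])


# Hypothetical stand-ins for the three transformer models.
T1_TABLE = {("P", "site1"): ["A.B", "C.D"], ("P", "site2"): ["E"]}
T3_TABLE = {"A.B": ("P", 0.9), "C.D": ("Q", 0.8), "E": ("P", 0.3)}


def toy_t1(product: str, site: str) -> List[str]:
    return T1_TABLE.get((product, site), [])


def toy_t2(precursors: str, product: str) -> str:
    return "reagent_for_" + precursors


def toy_t3(precursors: str, reagents: str) -> Tuple[str, float]:
    return T3_TABLE.get(precursors, ("", 0.0))


proposals = ttl_single_step("P", ["site1", "site2"], toy_t1, toy_t2, toy_t3)
```

With the toy tables, only the `A.B` proposal survives: `C.D` is rejected because the forward stub predicts a different product, and `E` is rejected because its confidence falls below the assumed cutoff.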
Affiliation(s)
- David Kreutter
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland

12
Thakkar A, Vaucher AC, Byekwaso A, Schwaller P, Toniato A, Laino T. Unbiasing Retrosynthesis Language Models with Disconnection Prompts. ACS Cent Sci 2023; 9:1488-1498. [PMID: 37529205 PMCID: PMC10390024 DOI: 10.1021/acscentsci.3c00372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Indexed: 08/03/2023]
Abstract
Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical.
Affiliation(s)
- Amol Thakkar
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
- Alain C. Vaucher
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
- Andrea Byekwaso
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Philippe Schwaller
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
- Alessandra Toniato
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
- Teodoro Laino
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland

13
Janet JP, Mervin L, Engkvist O. Artificial intelligence in molecular de novo design: Integration with experiment. Curr Opin Struct Biol 2023; 80:102575. [PMID: 36966692 DOI: 10.1016/j.sbi.2023.102575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 02/09/2023] [Accepted: 02/18/2023] [Indexed: 06/04/2023]
Abstract
In this mini review, we capture the latest progress of applying artificial intelligence (AI) techniques based on deep learning architectures to molecular de novo design with a focus on integration with experimental validation. We will cover the progress and experimental validation of novel generative algorithms, the validation of QSAR models and how AI-based molecular de novo design is starting to become connected with chemistry automation. While progress has been made in the last few years, it is still early days. The experimental validations conducted thus far should be considered proof-of-principle, providing confidence that the field is moving in the right direction.
Collapse
Affiliation(s)
- Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Lewis Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK.
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| |
Collapse
|
14
|
Capaldo L, Wen Z, Noël T. A field guide to flow chemistry for synthetic organic chemists. Chem Sci 2023; 14:4230-4247. [PMID: 37123197 PMCID: PMC10132167 DOI: 10.1039/d3sc00992k] [Citation(s) in RCA: 56] [Impact Index Per Article: 56.0] [Received: 02/22/2023] [Accepted: 03/15/2023] [Indexed: 03/17/2023] Open
Abstract
Flow chemistry has unlocked a world of possibilities for the synthetic community, but the idea that it is a mysterious "black box" needs to go. In this review, we show that several of the benefits of microreactor technology can be exploited to push the boundaries in organic synthesis and to unleash unique reactivity and selectivity. By "lifting the veil" on some of the governing principles behind the observed trends, we hope that this review will serve as a useful field guide for those interested in diving into flow chemistry.
Affiliation(s)
- Luca Capaldo
- Flow Chemistry Group, Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam 1098 XH Amsterdam The Netherlands
| | - Zhenghui Wen
- Flow Chemistry Group, Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam 1098 XH Amsterdam The Netherlands
| | - Timothy Noël
- Flow Chemistry Group, Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam 1098 XH Amsterdam The Netherlands
| |
|
15
|
McNair D. Artificial Intelligence and Machine Learning for Lead-to-Candidate Decision-Making and Beyond. Annu Rev Pharmacol Toxicol 2023; 63:77-97. [PMID: 35679624 DOI: 10.1146/annurev-pharmtox-051921-023255] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/25/2023]
Abstract
The use of artificial intelligence (AI) and machine learning (ML) in pharmaceutical research and development has to date focused on research: target identification; docking-, fragment-, and motif-based generation of compound libraries; modeling of synthesis feasibility; rank-ordering likely hits according to structural and chemometric similarity to compounds having known activity and affinity to the target(s); optimizing a smaller library for synthesis and high-throughput screening; and combining evidence from screening to support hit-to-lead decisions. Applying AI/ML methods to lead optimization and lead-to-candidate (L2C) decision-making has shown slower progress, especially regarding predicting absorption, distribution, metabolism, excretion, and toxicology properties. The present review surveys reasons why this is so, reports progress that has occurred in recent years, and summarizes some of the issues that remain. Effective AI/ML tools to derisk L2C and later phases of development are important to accelerate the pharmaceutical development process, ameliorate escalating development costs, and achieve greater success rates.
Affiliation(s)
- Douglas McNair
- Global Health, Integrated Development, Bill & Melinda Gates Foundation, Seattle, Washington, USA
| |
|
16
|
Seidenberg JR, Khan AA, Lapkin AA. Boosting autonomous process design and intensification with formalized domain knowledge. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2022.108097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
17
|
Viet Johansson S, Gummesson Svensson H, Bjerrum E, Schliep A, Haghir Chehreghani M, Tyrchan C, Engkvist O. Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction. Mol Inform 2022; 41:e2200043. [PMID: 35732584 DOI: 10.1002/minf.202200043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/21/2022] [Accepted: 06/22/2022] [Indexed: 01/05/2023]
Abstract
Computer aided synthesis planning, suggesting synthetic routes for molecules of interest, is a rapidly growing field. The machine learning methods used are often dependent on access to large datasets for training, but finite experimental budgets limit how much data can be obtained from experiments. This suggests the use of schemes for data collection such as active learning, which identifies the data points of highest impact for model accuracy, and which has been used in recent studies with success. However, little has been done to explore the robustness of the methods predicting reaction yield when used together with active learning to reduce the amount of experimental data needed for training. This study aims to investigate the influence of machine learning algorithms and the number of initial data points on reaction yield prediction for two public high-throughput experimentation datasets. Our results show that active learning based on output margin reached a pre-defined AUROC faster than random sampling on both datasets. Analysis of feature importance of the trained machine learning models suggests active learning had a larger influence on the model accuracy when only a few features were important for the model prediction.
Affiliation(s)
- Simon Viet Johansson
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, SE-431 83, Mölndal, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, SE-412 96, Göteborg, Sweden
| | - Hampus Gummesson Svensson
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, SE-431 83, Mölndal, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, SE-412 96, Göteborg, Sweden
| | - Esben Bjerrum
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, SE-431 83, Mölndal, Sweden
| | - Alexander Schliep
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, SE-412 96, Göteborg, Sweden
| | - Morteza Haghir Chehreghani
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, SE-412 96, Göteborg, Sweden
| | - Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, SE-431 83, Mölndal, Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, SE-431 83, Mölndal, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, SE-412 96, Göteborg, Sweden
| |
|
18
|
Dolfus U, Briem H, Rarey M. Synthesis-Aware Generation of Structural Analogues. J Chem Inf Model 2022; 62:3565-3576. [PMID: 35867908 DOI: 10.1021/acs.jcim.2c00246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
In modern drug design, one of the main issues is the optimization of an initial lead structure toward a drug candidate by modifying specific properties in the desired direction. The synthetic feasibility of the target structure is often neglected during this process, resulting in structures with low or suboptimal synthetic accessibility. In this work, we present a novel approach for synthesis-aware lead optimization called Synthesia. In contrast to the traditional approaches, Synthesia integrates the preservation of the synthesizability of the target structure into the lead structure modification process. Synthesia is able to create structural diversity for a lead structure that matches user-defined molecular properties without losing the applicability of a particular synthetic pathway. The methodology is validated by demonstrating that Synthesia is capable of providing structural analogues of DrugBank compounds that meet generic modification goals and maintain their synthetic pathways. In addition, Synthesia is used to cluster compounds from two different patent structure series (CDK7, Daurismo) according to their compatibility with the same synthetic pathways, maximizing the synthetic efficiency and providing an initial estimation of the effort of synthesizing the entire series. Altogether, we demonstrate Synthesia's ability to modify compound properties while maintaining in silico synthesizability.
Affiliation(s)
- Uschi Dolfus
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, Hamburg, 20146, Germany
| | - Hans Briem
- Bayer AG, Research & Development, Pharmaceuticals, Computational Molecular Design Berlin, Building S110, 711, Berlin, 13342, Germany
| | - Matthias Rarey
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, Hamburg, 20146, Germany
| |
|
19
|
Fey N, Lynam JM. Computational mechanistic study in organometallic catalysis: Why prediction is still a challenge. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1590] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Affiliation(s)
- Natalie Fey
- School of Chemistry University of Bristol, Cantock's Close Bristol UK
| | | |
|
20
|
Nambiar AK, Breen CP, Hart T, Kulesza T, Jamison TF, Jensen KF. Bayesian Optimization of Computer-Proposed Multistep Synthetic Routes on an Automated Robotic Flow Platform. ACS Central Science 2022; 8:825-836. [PMID: 35756374 PMCID: PMC9228554 DOI: 10.1021/acscentsci.2c00207] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 02/24/2022] [Indexed: 06/15/2023]
Abstract
Computer-aided synthesis planning (CASP) tools can propose retrosynthetic pathways and forward reaction conditions for the synthesis of organic compounds, but the limited availability of context-specific data currently necessitates experimental development to fully specify process details. We plan and optimize a CASP-proposed and human-refined multistep synthesis route toward an exemplary small molecule, sonidegib, on a modular, robotic flow synthesis platform with integrated process analytical technology (PAT) for data-rich experimentation. Human insights address catalyst deactivation and improve yield by strategic choices of order of addition. Multi-objective Bayesian optimization identifies optimal values for categorical and continuous process variables in the multistep route involving 3 reactions (including heterogeneous hydrogenation) and 1 separation. The platform's modularity, robotic reconfigurability, and flexibility for convergent synthesis are shown to be essential for allowing variation of downstream residence time in multistep flow processes and controlling the order of addition to minimize undesired reactivity. Overall, the work demonstrates how automation, machine learning, and robotics enhance manual experimentation through assistance with idea generation, experimental design, execution, and optimization.
Affiliation(s)
- Anirudh M. K. Nambiar
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Christopher P. Breen
- Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Travis Hart
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Timothy Kulesza
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Timothy F. Jamison
- Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
|
21
|
Bai J, Cao L, Mosbach S, Akroyd J, Lapkin AA, Kraft M. From Platform to Knowledge Graph: Evolution of Laboratory Automation. JACS Au 2022; 2:292-309. [PMID: 35252980 PMCID: PMC8889618 DOI: 10.1021/jacsau.1c00438] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/05/2021] [Indexed: 05/19/2023]
Abstract
High-fidelity computer-aided experimentation is becoming more accessible with the development of computing power and artificial intelligence tools. The advancement of experimental hardware also empowers researchers to reach a level of accuracy that was not possible in the past. Marching toward the next generation of self-driving laboratories, the orchestration of both resources lies at the focal point of autonomous discovery in chemical science. To achieve such a goal, algorithmically accessible data representations and standardized communication protocols are indispensable. In this perspective, we recategorize the recently introduced approach based on Materials Acceleration Platforms into five functional components and discuss recent case studies that focus on the data representation and exchange scheme between different components. Emerging technologies for interoperable data representation and multi-agent systems are also discussed with their recent applications in chemical automation. We hypothesize that knowledge graph technology, orchestrating semantic web technologies and multi-agent systems, will be the driving force to bring data to knowledge, evolving our way of automating the laboratory.
Affiliation(s)
- Jiaru Bai
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
| | - Liwei Cao
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
| | - Sebastian Mosbach
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Jethro Akroyd
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Alexei A. Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Markus Kraft
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
- School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459 Singapore
- The Alan Turing Institute, London NW1 2DB, United Kingdom
| |
|
22
|
Hosny NM, Gadallah MI, Gomila RM, Qayed WS. Innovative computationally designed-spectrofluorimetric method for determination of modafinil in tablets and human plasma. Talanta 2022; 236:122890. [PMID: 34635269 DOI: 10.1016/j.talanta.2021.122890] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/31/2021] [Revised: 09/10/2021] [Accepted: 09/14/2021] [Indexed: 12/23/2022]
Abstract
A novel computationally designed spectrofluorimetric method for the determination of a unique antinarcoleptic drug, modafinil (MDF), in tablets and human plasma was theoretically and experimentally established. First, density functional theory (DFT) computations were performed to investigate MDF-Tb3+ complex formation and to study the affinity of Tb3+ to MDF in aqueous solution. The computed formation energy of [Tb(MDF)4]3+ (ΔG = -246.0 kcal/mol) assured the ability of Tb3+ to recognize MDF in water and proved the strong nature of the Tb3+-O coordination bonds in addition to some contribution from inter-ligand hydrophobic interactions. Hence, a spectrofluorimetric method was optimized and validated based on the quenching effect of MDF on Tb3+ fluorescence via fluorescence resonance energy transfer from Tb3+ to MDF. The formed [Tb(MDF)4]3+ complex was measured at λex 222 nm/λem 497 nm against a reagent blank. The Tb3+ fluorescence was significantly reduced upon addition of MDF (linearity range = 0.5-20.0 μg/mL). Detection and quantification limits were 0.129 and 0.391 μg/mL, respectively. Good recoveries (97.47-101.92%) were obtained upon application of the proposed method for the assessment of the target drug in bulk powder, tablets and plasma. The results of the established method were statistically analyzed and validated according to ICH guidelines.
Affiliation(s)
- Noha M Hosny
- Department of Pharmaceutical Analytical Chemistry, Faculty of Pharmacy, Assiut University, Assiut, 71526, Egypt.
| | - Mohammed I Gadallah
- Department of Pharmaceutical Analytical Chemistry, Faculty of Pharmacy, Assiut University, Assiut, 71526, Egypt; Department of Nutritional Sciences, School of Human Ecology, University of Texas at Austin, Austin, TX 78712, USA.
| | - Rosa M Gomila
- Department of Chemistry, Universitat de les Illes Balears, Crta de Valldemossa km 7.5, 07122, Palma de Mallorca (Baleares), Spain.
| | - Wesam S Qayed
- Department of Medicinal Chemistry, Faculty of Pharmacy, Assiut University, Assiut, 71526, Egypt.
| |
|
23
|
Chakkingal A, Janssens P, Poissonnier J, Barrios AJ, Virginie M, Khodakov AY, Thybaut JW. Machine learning based interpretation of microkinetic data: a Fischer–Tropsch synthesis case study. React Chem Eng 2022. [DOI: 10.1039/d1re00351h] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
A systematic approach for analysing kinetic data and identifying hidden trends using interpretation techniques in data science with the ANN.
Affiliation(s)
- Anoop Chakkingal
- Laboratory for Chemical Technology (LCT), Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Ghent, Belgium
- CNRS, Centrale Lille, Univ. Lille, ENSCL, Univ. Artois, UMR 8181 – UCCS – Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
| | - Pieter Janssens
- Laboratory for Chemical Technology (LCT), Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Ghent, Belgium
| | - Jeroen Poissonnier
- Laboratory for Chemical Technology (LCT), Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Ghent, Belgium
| | - Alan J. Barrios
- Laboratory for Chemical Technology (LCT), Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Ghent, Belgium
- CNRS, Centrale Lille, Univ. Lille, ENSCL, Univ. Artois, UMR 8181 – UCCS – Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
| | - Mirella Virginie
- CNRS, Centrale Lille, Univ. Lille, ENSCL, Univ. Artois, UMR 8181 – UCCS – Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
| | - Andrei Y. Khodakov
- CNRS, Centrale Lille, Univ. Lille, ENSCL, Univ. Artois, UMR 8181 – UCCS – Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
| | - Joris W. Thybaut
- Laboratory for Chemical Technology (LCT), Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Ghent, Belgium
| |
|
24
|
Galvanin F, Hartman RL, Kulkarni AA, Nieves-Remacha MJ. Introduction to the themed collection on digitalization in reaction engineering. React Chem Eng 2022. [DOI: 10.1039/d2re90011d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Federico Galvanin, Ryan Hartman, Amol Kulkarni and María José Nieves-Remacha introduce the Reaction Chemistry & Engineering themed collection on digitalization in reaction engineering.
Affiliation(s)
- Federico Galvanin
- Department of Chemical Engineering, University College London, London, UK
| | - Ryan L. Hartman
- Department of Chemical and Biomolecular Engineering, New York University, 6 MetroTech Center, Brooklyn, NY, USA
| | - Amol A. Kulkarni
- Academy of Scientific and Innovative Research (AcSIR), CSIR-National Chemical Laboratory (NCL) Campus, Pune-411008, India
| | | |
|
25
|
Szymkuć S, Badowski T, Grzybowski BA. Is Organic Chemistry Really Growing Exponentially? Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202111540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Affiliation(s)
- Sara Szymkuć
- Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Allchemy, Inc. Highland IN USA
| | - Tomasz Badowski
- Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Allchemy, Inc. Highland IN USA
| | - Bartosz A. Grzybowski
- Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Allchemy, Inc. Highland IN USA
- IBS Center for Soft and Living Matter and Department of Chemistry UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun Ulsan South Korea
| |
|
26
|
Szymkuć S, Badowski T, Grzybowski BA. Is Organic Chemistry Really Growing Exponentially? Angew Chem Int Ed Engl 2021; 60:26226-26232. [PMID: 34558168 DOI: 10.1002/anie.202111540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/25/2021] [Indexed: 11/05/2022]
Abstract
In terms of molecules and specific reaction examples, organic chemistry features an impressive, exponential growth. However, new reaction classes/types that fuel this growth are being discovered at a much slower and only linear (or even sublinear) rate. The proportion of newly discovered reaction types to all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. The newly discovered chemistries are more complex than those of decades ago and allow for the rapid construction of complex scaffolds in fewer steps. We study these and other trends as a function of time, reaction-type popularity and complexity, based on an algorithm that extracts generalized reaction class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and identifying erroneous entries in reaction databases.
Affiliation(s)
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland; Allchemy, Inc., Highland, IN, USA
| | - Tomasz Badowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland; Allchemy, Inc., Highland, IN, USA
| | - Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland; Allchemy, Inc., Highland, IN, USA; IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
| |
|
27
|
Liu L, Bi M, Wang Y, Liu J, Jiang X, Xu Z, Zhang X. Artificial intelligence-powered microfluidics for nanomedicine and materials synthesis. Nanoscale 2021; 13:19352-19366. [PMID: 34812823 DOI: 10.1039/d1nr06195j] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 05/28/2023]
Abstract
Artificial intelligence (AI) is an emerging technology with great potential, and its robust calculation and analysis capabilities are unmatched by traditional calculation tools. With the promotion of deep learning and open-source platforms, the threshold of AI has also become lower. Combining artificial intelligence with traditional fields to create new fields of high research and application value has become a trend. AI has been involved in many disciplines, such as medicine, materials, energy, and economics. The development of AI requires the support of many kinds of data, and microfluidic systems can often mine object data on a large scale to support AI. Due to the excellent synergy between the two technologies, excellent research results have emerged in many fields. In this review, we briefly review AI and microfluidics and introduce some applications of their combination, mainly in nanomedicine and material synthesis. Finally, we discuss the development trend of the combination of the two technologies.
Affiliation(s)
- Linbo Liu
- John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA
| | - Mingcheng Bi
- Institute of Process Equipment, College of Energy Engineering, Zhejiang University, Hangzhou 310027, P.R. China
| | - Yunhua Wang
- John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA
| | - Junfeng Liu
- Institute of Process Equipment, College of Energy Engineering, Zhejiang University, Hangzhou 310027, P.R. China
| | - Xiwen Jiang
- College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R. China
| | - Zhongbin Xu
- Institute of Process Equipment, College of Energy Engineering, Zhejiang University, Hangzhou 310027, P.R. China
| | - Xingcai Zhang
- John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA
- School of Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
| |
|
28
|
Vijayan RSK, Kihlberg J, Cross JB, Poongavanam V. Enhancing preclinical drug discovery with artificial intelligence. Drug Discov Today 2021; 27:967-984. [PMID: 34838731 DOI: 10.1016/j.drudis.2021.11.023] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/14/2021] [Revised: 10/15/2021] [Accepted: 11/19/2021] [Indexed: 12/14/2022]
Abstract
Artificial intelligence (AI) is becoming an integral part of drug discovery. It has the potential to deliver across the drug discovery and development value chain, starting from target identification and reaching through clinical development. In this review, we provide an overview of current AI technologies and a glimpse of how AI is reimagining preclinical drug discovery by highlighting examples where AI has made a real impact. Considering the excitement and hyperbole surrounding AI in drug discovery, we aim to present a realistic view by discussing both opportunities and challenges in adopting AI in drug discovery.
Affiliation(s)
- R S K Vijayan
- Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX, USA
| | - Jan Kihlberg
- Department of Chemistry-BMC, Uppsala University, Uppsala, Sweden
| | - Jason B Cross
- Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX, USA.
| | | |
|
29
|
Jia P, Pei J, Wang G, Pan X, Zhu Y, Wu Y, Ouyang L. The roles of computer-aided drug synthesis in drug development. Green Synthesis and Catalysis 2021. [DOI: 10.1016/j.gresc.2021.11.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/08/2023] Open
|
30
|
Vaucher AC, Schwaller P, Geluykens J, Nair VH, Iuliano A, Laino T. Inferring experimental procedures from text-based representations of chemical reactions. Nat Commun 2021; 12:2573. [PMID: 33958589 PMCID: PMC8102565 DOI: 10.1038/s41467-021-22951-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/15/2020] [Accepted: 04/07/2021] [Indexed: 11/19/2022] Open
Abstract
The experimental execution of chemical reactions is a context-dependent and time-consuming process, often solved using the experience collected over multiple decades of laboratory work or by searching for similar, already executed, experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes to experimental procedures remains a burden on the shoulders of domain experts. In this work, we present data-driven models for predicting the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used the resulting data set to train three different models: a nearest-neighbor model based on recently introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.
Affiliation(s)
| | | | | | | | - Anna Iuliano
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, Pisa, Italy
| | | |
|