1
Mir BA, Tayara H, Chong KT. SB-Net: Synergizing CNN and LSTM networks for uncovering retrosynthetic pathways in organic synthesis. Comput Biol Chem 2024; 112:108130. [PMID: 38954849 DOI: 10.1016/j.compbiolchem.2024.108130] [Received: 04/02/2024] [Revised: 05/17/2024] [Accepted: 06/12/2024] [Indexed: 07/04/2024]
Abstract
Retrosynthesis is vital in synthesizing target products, guiding the design of reaction pathways that is crucial for drug and material discovery. Current models often neglect multi-scale feature extraction, which limits how effectively they exploit molecular descriptors. Our proposed SB-Net model, a deep-learning architecture tailored for retrosynthesis prediction, addresses this gap. SB-Net combines CNN and Bi-LSTM architectures and excels at capturing multi-scale molecular features. It processes one-hot encoded descriptors and extended-connectivity fingerprints (ECFP) in parallel branches whose outputs are merged through dense layers. Experimental results demonstrate SB-Net's superiority, achieving 73.6% top-1 and 94.6% top-10 accuracy on the USPTO-50k dataset. Its versatility is validated on MetaNetX, with top-1, top-3, top-5, and top-10 accuracies of 52.8%, 74.3%, 79.8%, and 83.5%, respectively. SB-Net's success in bioretrosynthesis prediction tasks indicates its efficacy. This research advances computational chemistry by offering a robust deep-learning model for retrosynthesis prediction. With implications for drug discovery and synthesis planning, SB-Net promises innovative and efficient synthetic pathways.
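The top-k figures above follow the convention common in retrosynthesis benchmarks: a target counts as correct if its recorded reactant set appears among the model's k highest-ranked suggestions. A minimal sketch of that metric follows. It assumes each prediction has already been reduced to a canonical string (for example, canonical SMILES of the reactants joined by '.'). It is an illustration, not SB-Net's evaluation code, and the molecule strings in the usage lines are placeholders.

```python
from typing import Hashable, Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[Hashable]],
                   ground_truth: Sequence[Hashable],
                   k: int) -> float:
    """Fraction of targets whose recorded answer is among the first k ranked predictions.

    Assumes predictions are already canonicalized so that equality means
    "same reactant set" (e.g. canonical SMILES joined by '.').
    """
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("expected one ranked prediction list per target")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ground_truth:
        return 0.0
    hits = sum(1 for preds, truth in zip(ranked_predictions, ground_truth)
               if truth in preds[:k])
    return hits / len(ground_truth)


# Placeholder reactant sets for three targets, best-ranked first.
ranked = [["A.B", "C"], ["D", "E.F"], ["G", "H"]]
truth = ["A.B", "E.F", "X"]
print(top_k_accuracy(ranked, truth, k=1))  # one hit out of three targets
print(top_k_accuracy(ranked, truth, k=2))  # two hits out of three targets
```

Because a larger k can only add hits, top-k accuracy never decreases as k grows, which is consistent with the figures rising from top-1 to top-10.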
Affiliation(s)
- Bilal Ahmad Mir
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea.
2
Bharadwaj S, Deepika K, Kumar A, Jaiswal S, Miglani S, Singh D, Fartyal P, Kumar R, Singh S, Singh MP, Gaidhane AM, Kumar B, Jha V. Exploring the Artificial Intelligence and Its Impact in Pharmaceutical Sciences: Insights Toward the Horizons Where Technology Meets Tradition. Chem Biol Drug Des 2024; 104:e14639. [PMID: 39396920 DOI: 10.1111/cbdd.14639] [Received: 07/27/2024] [Revised: 09/03/2024] [Accepted: 09/24/2024] [Indexed: 10/15/2024]
Abstract
Advances in computing and high-throughput screening technologies have driven the application of artificial intelligence (AI) to discover drug molecules faster and more efficiently, and to find hit or lead molecules at lower cost. The ability of software and network frameworks to interpret representations of molecular structures and establish relationships/correlations has enabled various research teams to develop numerous AI platforms for identifying new lead molecules or discovering new targets for established drug molecules. Predicting biological activity, ADME properties, and toxicity parameters at early stages has reduced the chance of failure, and its associated costs, in later clinical stages, where failure rates have been high in the tedious, expensive, and laborious drug discovery process. This review covers different AI and machine learning (ML) techniques, focusing mainly on their applications in the pharmaceutical industry. The applications of AI frameworks in molecular target identification, hit identification/hit-to-lead optimization, analysis of drug-receptor interactions, drug repurposing, polypharmacology, synthetic accessibility, clinical trial design, and pharmaceutical development are discussed in detail. We also compile details of various AI startups in this field. This review provides a comprehensive analysis and outlines state-of-the-art AI/ML techniques and their framework applications. It also highlights the challenges in this field that need to be addressed for further success in pharmaceutical applications.
Affiliation(s)
- Shruti Bharadwaj
- Center for SeNSE, Indian Institute of Technology Delhi (IIT), New Delhi, India
- Kumari Deepika
- Department of Computer Engineering, Pune Institute of Computer Technology, Pune, India
- Asim Kumar
- Amity Institute of Pharmacy (AIP), Amity University Haryana, Manesar, India
- Shivani Jaiswal
- Institute of Pharmaceutical Research, GLA University, Mathura, India
- Shaweta Miglani
- Department of Education, Central University of Punjab, Bathinda, India
- Damini Singh
- IES Institute of Pharmacy, IES University, Bhopal, Madhya Pradesh, India
- Prachi Fartyal
- Department of Mathematics, Govt PG College Bajpur (US Nagar), Bazpur, Uttarakhand, India
- Roshan Kumar
- Department of Microbiology, Graphic Era (Deemed to be University), Dehradun, India
- Department of Microbiology, Central University of Punjab, VPO-Ghudda, Punjab, India
- Shareen Singh
- Centre for Research Impact & Outcome, Chitkara College of Pharmacy, Chitkara University, Rajpura, Punjab, India
- Mahendra Pratap Singh
- Center for Global Health Research, Saveetha Medical College and Hospital, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, India
- Abhay M Gaidhane
- Jawaharlal Nehru Medical College, and Global Health Academy, School of Epidemiology and Public Health, Datta Meghe Institute of Higher Education, Wardha, India
- Bhupinder Kumar
- Department of Pharmaceutical Science, Hemvati Nandan Bahuguna Garhwal (A Central) University, Srinagar, Uttarakhand, India
- Vibhu Jha
- Institute of Cancer Therapeutics, School of Pharmacy and Medical Sciences, Faculty of Life Sciences, University of Bradford, Bradford, UK
3
Shee Y, Li H, Zhang P, Nikolic AM, Lu W, Kelly HR, Manee V, Sreekumar S, Buono FG, Song JJ, Newhouse TR, Batista VS. Site-specific template generative approach for retrosynthetic planning. Nat Commun 2024; 15:7818. [PMID: 39251606 PMCID: PMC11385523 DOI: 10.1038/s41467-024-52048-4] [Received: 03/20/2024] [Accepted: 08/26/2024] [Indexed: 09/11/2024]
Abstract
Retrosynthesis, the strategy of devising laboratory pathways by working backwards from the target compound, is crucial yet challenging. Enhancing retrosynthetic efficiency requires overcoming the vast complexity of chemical space, the limited known interconversions between molecules, and the challenges posed by limited experimental datasets. This study introduces generative machine learning methods for retrosynthetic planning. The approach features three innovations: generating reaction templates instead of reactants or synthons to create novel chemical transformations, allowing user selection of specific bonds to change for human-influenced synthesis, and employing a conditional kernel-elastic autoencoder (CKAE) to measure the similarity between generated and known reactions for chemical viability insights. These features form a coherent retrosynthetic framework, validated experimentally by designing a 3-step synthetic pathway for a challenging small molecule, demonstrating a significant improvement over previous 5-9 step approaches. This work highlights the utility and robustness of generative machine learning in addressing complex challenges in chemical synthesis.
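The abstract does not specify how the CKAE computes its similarity, so the sketch below swaps in a simpler, generic technique: nearest-neighbour cosine similarity between a generated reaction's embedding and the embeddings of known reactions. It assumes the embeddings (here short placeholder vectors) have already been produced by some encoder. The helper names are hypothetical, and the CKAE's kernel-elastic formulation is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_known(generated: Sequence[float],
                  known: List[Sequence[float]]) -> Tuple[int, float]:
    """Hypothetical helper: index of the most similar known reaction and its similarity."""
    if not known:
        raise ValueError("need at least one known reaction embedding")
    best = max(range(len(known)), key=lambda i: cosine(generated, known[i]))
    return best, cosine(generated, known[best])


# Placeholder 2-D embeddings standing in for encoder outputs.
known_reactions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, sim = nearest_known([2.0, 1.9], known_reactions)
print(idx, round(sim, 3))
```

In a setup like this, a high nearest-neighbour similarity would suggest that a generated transformation resembles precedented chemistry, while a low score would flag it for closer expert review.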
Affiliation(s)
- Yu Shee
- Department of Chemistry, Yale University, New Haven, CT, USA
- Haote Li
- Department of Chemistry, Yale University, New Haven, CT, USA
- Pengpeng Zhang
- Department of Chemistry, Yale University, New Haven, CT, USA
- Wenxin Lu
- Department of Chemistry, Yale University, New Haven, CT, USA
- H Ray Kelly
- Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
- Vidhyadhar Manee
- Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
- Sanil Sreekumar
- Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
- Frederic G Buono
- Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
- Jinhua J Song
- Chemical Development, Boehringer Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
4
Liu M, Yang J, He Y, Cao F, Li W, Han W. VmmScore: An umami peptide prediction and receptor matching program based on a deep learning approach. Comput Biol Med 2024; 179:108814. [PMID: 38944902 DOI: 10.1016/j.compbiomed.2024.108814] [Received: 03/05/2024] [Revised: 05/17/2024] [Accepted: 06/24/2024] [Indexed: 07/02/2024]
Abstract
Peptides, with recognized physiological and medical implications, such as the ability to lower blood pressure and lipid levels, are central to our research on umami taste perception. This study introduces a computational strategy to tackle the challenge of identifying optimal umami receptors for these peptides. Our VmmScore algorithm includes two integral components: Mlp4Umami, a predictive module that evaluates the umami taste potential of peptides, and mm-Score, which enhances the receptor matching process through a machine learning-optimized molecular docking and scoring system. This system encompasses the optimization of docking structures, clustering of umami peptides, and a comparative analysis of docking energies across peptide clusters, streamlining the receptor identification process. Employing machine learning, our method offers a strategic approach to the intricate task of umami receptor determination. We undertook virtual screening of peptides derived from Lateolabrax japonicus, experimentally verifying the umami taste of three identified peptides and determining their corresponding receptors. This work not only advances our understanding of the mechanisms behind umami taste perception but also provides a rapid and cost-effective method for peptide screening. The source code is publicly accessible at https://github.com/heyigacu/mlp4umami/, encouraging further scientific exploration and collaborative efforts within the research community.
Affiliation(s)
- Minghao Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, 130012, China.
- Jiuliang Yang
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, 130012, China.
- Yi He
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, 130012, China.
- Fuyan Cao
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, 130012, China.
- Wannan Li
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, 130012, China.
- Weiwei Han
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, 130012, China.
5
Wang X, Yin X, Jiang D, Zhao H, Wu Z, Zhang O, Wang J, Li Y, Deng Y, Liu H, Luo P, Han Y, Hou T, Yao X, Hsieh CY. Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites. Nat Commun 2024; 15:7348. [PMID: 39187482 PMCID: PMC11347633 DOI: 10.1038/s41467-024-51511-6] [Received: 03/31/2024] [Accepted: 08/09/2024] [Indexed: 08/28/2024]
Abstract
Annotating active sites in enzymes is crucial for advancing multiple fields, including drug discovery, disease research, enzyme engineering, and synthetic biology. Despite the development of numerous automated annotation algorithms, a significant trade-off between speed and accuracy limits their large-scale practical application. We introduce EasIFA, an enzyme active site annotation algorithm that fuses latent enzyme representations from a protein language model and a 3D structural encoder, and then aligns protein-level information with knowledge of enzymatic reactions using a multi-modal cross-attention framework. Compared with BLASTp, EasIFA is 10 times faster and improves recall, precision, F1 score, and Matthews correlation coefficient (MCC) by 7.57%, 13.08%, 9.68%, and 0.1012, respectively. It also surpasses an empirical-rule-based algorithm and another state-of-the-art deep learning annotation method based on position-specific scoring matrix (PSSM) features, running 650 to 1400 times faster while improving annotation quality. This makes EasIFA a suitable replacement for conventional tools in both industrial and academic settings. EasIFA can also effectively transfer knowledge gained from coarsely annotated enzyme databases to smaller, high-precision datasets, highlighting its ability to model sparse but high-quality databases. Additionally, EasIFA shows potential as a catalytic site monitoring tool for designing enzymes with desired functions beyond their natural distribution.
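Recall, precision, F1 score, and MCC are standard binary-classification metrics. A minimal sketch of how they are computed from per-residue labels follows, assuming 1 marks an active-site residue and 0 any other residue. The function name is hypothetical and this is not EasIFA's evaluation code. MCC ranges from -1 to 1, so its change is naturally expressed as an absolute difference, as in the abstract.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and Matthews correlation coefficient for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # active site, predicted active
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # other residue, predicted other
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed active site
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 by convention when undefined
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Toy labels for six residues of one enzyme.
truth = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(truth, pred))
```

MCC uses all four confusion counts, so unlike F1 it also rewards correctly rejecting the many non-catalytic residues in a typical enzyme.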
Affiliation(s)
- Xiaorui Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Xiaodan Yin
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Huifeng Zhao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, Gansu, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, China
- Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Pei Luo
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Yuqiang Han
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, 999077, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China.
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
6
Gricourt G, Meyer P, Duigou T, Faulon JL. Artificial Intelligence Methods and Models for Retro-Biosynthesis: A Scoping Review. ACS Synth Biol 2024; 13:2276-2294. [PMID: 39047143 PMCID: PMC11334239 DOI: 10.1021/acssynbio.4c00091] [Received: 02/09/2024] [Revised: 06/14/2024] [Accepted: 06/14/2024] [Indexed: 07/27/2024]
Abstract
Retrosynthesis aims to efficiently plan the synthesis of desirable chemicals by strategically breaking down molecules into readily available building block compounds. Retrosynthesis has a long history in chemistry, and retro-biosynthesis has also been used in biocatalysis and synthetic biology. Artificial intelligence (AI) is driving us toward new frontiers in synthesis planning and the exploration of chemical spaces, arriving at an opportune moment for promoting bioproduction that better aligns with green chemistry and improves environmental practices. In this review, we summarize recent advancements in the application of AI methods and models to retrosynthetic and retro-biosynthetic pathway design. These techniques can be based either on reaction templates or on generative models, and they require scoring functions and planning strategies to navigate the retrosynthetic graph of possibilities. Finally, we discuss limitations and promising research directions in this field.
Affiliation(s)
- Guillaume Gricourt
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Philippe Meyer
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Thomas Duigou
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- Jean-Loup Faulon
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- The University of Manchester, Manchester Institute of Biotechnology, Manchester M1 7DN, U.K.
7
Wiest O, Bauer C, Helquist P, Norrby PO, Genheden S. Finding Relevant Retrosynthetic Disconnections for Stereocontrolled Reactions. J Chem Inf Model 2024; 64:5796-5805. [PMID: 38995078 DOI: 10.1021/acs.jcim.4c00370] [Indexed: 07/13/2024]
Abstract
Machine learning-driven computer-aided synthesis planning (CASP) tools have become important for idea generation in the design of complex molecule syntheses, but they do not adequately address the stereochemical features of target compounds. A novel approach to automatically extracting CASP templates that retain stereochemical information is implemented in the freely available AiZynthFinder software; templates were extracted from US Patent and Trademark Office (USPTO) data and from an internal AstraZeneca database containing reactions from Reaxys, Pistachio, and AstraZeneca electronic lab notebooks. Three hundred sixty-seven templates covering reagent- and substrate-controlled as well as stereospecific reactions were extracted from the USPTO, while 20,724 templates were extracted from the AstraZeneca database. The performance of these templates in multistep CASP is evaluated for 936 targets from the ChEMBL database and an in-house selection of 791 AZ designs. Their potential and limitations are discussed for four case studies from ChEMBL and for examples of FDA-approved drugs.
Affiliation(s)
- Olaf Wiest
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Christoph Bauer
- Data Science and Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg, Pepparedsleden 1, SE-431 83 Mölndal, Sweden
- Paul Helquist
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Per-Ola Norrby
- Data Science and Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg, Pepparedsleden 1, SE-431 83 Mölndal, Sweden
- Samuel Genheden
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Pepparedsleden 1, SE-431 83 Mölndal, Sweden
8
Chen S, Babazade R, Kim T, Han S, Jung Y. A large-scale reaction dataset of mechanistic pathways of organic reactions. Sci Data 2024; 11:863. [PMID: 39127730 DOI: 10.1038/s41597-024-03709-y] [Received: 12/28/2023] [Accepted: 07/29/2024] [Indexed: 08/12/2024]
Abstract
Understanding organic reaction mechanisms is crucial for interpreting the formation of products at the atomic and electronic level, but it remains the domain of expert chemists. The lack of a large-scale dataset with chemically reasonable mechanistic sequences also hinders the development of reliable machine learning models that predict organic reactions on the basis of mechanisms, as human chemists do. Here, we present mech-USPTO-31K, the first large-scale, high-quality reaction dataset with chemically reasonable arrow-pushing diagrams validated by synthetic chemists, encompassing a wide spectrum of polar organic reaction mechanisms. The dataset was curated with a simple and flexible method that automatically generates reaction mechanisms from autonomously extracted reaction templates and expert-coded mechanistic templates, and we envision it becoming an invaluable tool for developing future reaction outcome prediction models and discovering new reactions.
Affiliation(s)
- Shuan Chen
- School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Seoul, 08826, South Korea
- Ramil Babazade
- Graduate School of Artificial Intelligence, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Daejeon, 34141, South Korea
- Taewan Kim
- Department of Chemistry, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Daejeon, 34141, South Korea
- Sunkyu Han
- Department of Chemistry, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Daejeon, 34141, South Korea.
- Yousung Jung
- School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Seoul, 08826, South Korea.
- Institute of Engineering Research, Seoul National University, 1 Gwanak-ro, Seoul, 08826, South Korea.
9
Zhang X, Liu J, Yang F, Zhang Q, Yang Z, Shah HA. Planning biosynthetic pathways of target molecules based on metabolic reaction prediction and AND-OR tree search. Comput Biol Chem 2024; 111:108106. [PMID: 38833912 DOI: 10.1016/j.compbiolchem.2024.108106] [Received: 08/17/2023] [Revised: 05/06/2024] [Accepted: 05/13/2024] [Indexed: 06/06/2024]
Abstract
The bioretrosynthesis problem is to predict synthetic routes from substrates for given natural products (NPs). However, the huge number of metabolic reactions leads to a combinatorial explosion of the search space, making the search highly time-consuming and costly. Here, we propose a framework called BioRetro that predicts bioretrosynthesis pathways by combining a one-step bioretrosynthesis network, termed HybridMLP, with an AND-OR tree heuristic search. HybridMLP predicts the precursors that will produce the target NPs, while the AND-OR tree generates iterative multi-step biosynthetic pathways. In one-step bioretrosynthesis prediction experiments on the MetaNetX dataset, HybridMLP achieves top-1, top-5, and top-10 accuracies of 46.5%, 74.6%, and 81.6%, respectively. This performance demonstrates the effectiveness of HybridMLP in one-step bioretrosynthesis. Moreover, evaluation on two benchmark datasets shows that BioRetro significantly improves the speed and success rate of biosynthetic pathway prediction. BioRetro is further shown to find synthetic pathways for compounds such as ginsenoside F1 that use the same substrates as reported but different enzymes, which may be novel candidate enzymes with better catalytic performance.
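In an AND-OR search tree, each molecule is an OR node (solved if it is an available substrate or if any one of its candidate reactions is solved) and each reaction is an AND node (solved only if all of its precursors are solved). The sketch below shows that logic with a plain depth-limited depth-first search. It is not BioRetro's heuristic search, and a toy lookup table stands in for HybridMLP's one-step predictions; the molecule names, the `table` contents, and the `solve` function are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Tuple[str, Tuple[str, ...]]          # (product, precursors)
OneStep = Dict[str, List[Tuple[str, ...]]]  # hypothetical one-step predictions, best first


def solve(mol: str, one_step: OneStep, substrates: Set[str],
          max_depth: int, path: Tuple[str, ...] = ()) -> Optional[List[Step]]:
    """Return one route to `mol` as a list of steps, or None if none is found.

    OR node: a molecule is solved if it is an available substrate or if ANY
    candidate reaction is solved. AND node: a reaction is solved only if ALL
    of its precursors are solved.
    """
    if mol in substrates:
        return []                          # leaf: already available, no steps needed
    if max_depth == 0 or mol in path:      # depth budget spent, or a cycle
        return None
    for precursors in one_step.get(mol, []):            # OR over candidate reactions
        route: List[Step] = [(mol, precursors)]
        for pre in precursors:                          # AND over precursors
            sub = solve(pre, one_step, substrates, max_depth - 1, path + (mol,))
            if sub is None:
                break                                   # one failed precursor sinks this reaction
            route.extend(sub)
        else:
            return route                                # every precursor was solved
    return None


# Toy example: T can come from X (a dead end) or from A + B, and B from C.
table: OneStep = {"T": [("X",), ("A", "B")], "B": [("C",)], "X": [("Y",)]}
print(solve("T", table, substrates={"A", "C"}, max_depth=3))
```

Practical planners typically replace this fixed depth-first order with a best-first heuristic guided by the one-step model's scores, because exhaustive expansion runs into the combinatorial explosion described in the abstract.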
Affiliation(s)
- Xiaolei Zhang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China.
- Feng Yang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Qiang Zhang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Zhihui Yang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Hayat Ali Shah
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
10
Rago AJ, Zoi I, Gartman JA, McDaniel KA, Jana N, Liu D, Bai WJ. Mining Medicinally Relevant Bioreduction Substrates Inspired by Ligand-Based Drug Design. J Med Chem 2024. [PMID: 39051635 DOI: 10.1021/acs.jmedchem.4c01129] [Indexed: 07/27/2024]
Abstract
Exploring the scope of biocatalytic transformations without enzyme structures and without extensive experimentation is a challenging task. To expand the limited substrate scope of carrot-mediated bioreduction and to find new medicinally relevant ketones at minimal cost in labor and time, we deployed a practical method inspired by ligand-based drug design. By analyzing collected literature data and building pharmacophore and reactivity prediction models, we screened a self-built virtual library of >8000 ketones bearing the N,O,S-heterocycles and functional groups most frequently used in drug discovery. Representative examples were validated, expanding the bioreduction substrate scope. The public availability of our models, together with the straightforward screening workflow, saves time, labor, and cost when evaluating unknown bioreduction substrates for medicinal chemistry applications, especially large sets of structurally diverse ketones. Our studies also showcase the novelty of using medicinal chemistry principles to solve a general biocatalysis problem.
Affiliation(s)
- Ioanna Zoi
- AbbVie, Inc., North Chicago, Illinois 60064, United States
- Navendu Jana
- AbbVie, Inc., North Chicago, Illinois 60064, United States
- Dachun Liu
- AbbVie, Inc., North Chicago, Illinois 60064, United States
- Wen-Ju Bai
- AbbVie, Inc., North Chicago, Illinois 60064, United States
11
Chen S, Jung Y. Estimating the synthetic accessibility of molecules with building block and reaction-aware SAScore. J Cheminform 2024; 16:83. [PMID: 39044299 PMCID: PMC11267797 DOI: 10.1186/s13321-024-00879-0] [Received: 04/20/2024] [Accepted: 07/09/2024] [Indexed: 07/25/2024]
Abstract
Synthetic accessibility prediction estimates how easily a given molecule might be synthesized in the laboratory and plays a crucial role in computer-aided molecular design. Although synthesis planning programs can determine synthesis routes, their slow processing makes them impractical for large-scale molecule screening. On the other hand, existing rapid synthetic accessibility estimation methods offer speed but typically lack integration with actual synthesis routes and building block information. In this work, we introduce BR-SAScore, an enhanced version of SAScore that integrates available building block information (B) and reaction knowledge (R) from synthesis planning programs into the scoring process. In particular, we differentiate fragments inherent in building blocks from fragments to be derived through synthesis (reactions) when scoring synthetic accessibility. Compared with existing methods, our experimental findings demonstrate that BR-SAScore identifies more accurately and precisely whether a molecule is synthetically accessible to the synthesis planning program, with a fast calculation time. Moreover, we illustrate how BR-SAScore provides chemically interpretable results, aligning with the capability of a synthesis planning program embedded with the same reaction knowledge and available building blocks.
Scientific contribution
We introduce BR-SAScore, an extension of SAScore, to estimate the synthetic accessibility of molecules by leveraging known building-block and reactivity information. In our experiments, BR-SAScore predicts molecule synthetic accessibility better than previous methods, including SAScore and deep-learning models, while requiring significantly less computation time. In addition, we show that BR-SAScore precisely identifies the chemical fragments contributing to synthetic infeasibility, holding great potential for future optimization of molecule synthesizability.
Affiliation(s)
- Shuan Chen
- Department of Chemical and Biological Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea
- Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea
- Yousung Jung
- Department of Chemical and Biological Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea.
- Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea.
- Institute of Engineering Research, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea.
12
Zeng T, Jin Z, Zheng S, Yu T, Wu R. Developing BioNavi for Hybrid Retrosynthesis Planning. JACS Au 2024; 4:2492-2502. [PMID: 39055138 PMCID: PMC11267531 DOI: 10.1021/jacsau.4c00228] [Received: 03/11/2024] [Revised: 06/18/2024] [Accepted: 06/20/2024] [Indexed: 07/27/2024]
Abstract
Illuminating synthetic pathways is essential for producing valuable chemicals, such as bioactive molecules. Chemical and biological syntheses are both crucial, and their integration often leads to more efficient and sustainable pathways. Despite the rapid development of retrosynthesis models, few of them consider both chemical and biological syntheses, hindering pathway design for high-value chemicals. Here, we propose BioNavi, which incorporates multitask learning and reaction templates into a deep learning-driven model to design hybrid synthesis pathways in a more interpretable manner. BioNavi outperforms existing approaches on different data sets, achieving a 75% hit rate in replicating reported biosynthetic pathways and displaying superior ability in designing hybrid synthesis pathways. Additional case studies further illustrate the potential application of BioNavi in de novo pathway design. The enhanced web server (http://biopathnavi.qmclab.com/bionavi/) simplifies input operations and implements step-by-step exploration according to user experience. We show that BioNavi is a handy navigator for designing synthetic pathways for various chemicals.
Affiliation(s)
- Tao Zeng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P. R. China
- Zhehao Jin
- Center for Synthetic Biochemistry, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen 518055, P. R. China
- Shuangjia Zheng
- Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Tao Yu
- Center for Synthetic Biochemistry, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen 518055, P. R. China
- Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P. R. China
13
Li J, Lin K, Pei J, Lai L. Challenging Complexity with Simplicity: Rethinking the Role of Single-Step Models in Computer-Aided Synthesis Planning. J Chem Inf Model 2024; 64:5470-5479. [PMID: 38940765 DOI: 10.1021/acs.jcim.4c00432] [Indexed: 06/29/2024]
Abstract
Computer-assisted synthesis planning has become increasingly important in drug discovery. While deep-learning models have achieved remarkably high accuracies in single-step retrosynthetic prediction, their performance in retrosynthetic route planning has yet to be verified. This study compares sophisticated single-step models with a straightforward template enumeration approach for retrosynthetic route planning on a real-world drug molecule data set. Despite the superior single-step accuracy of the advanced models, the template enumeration method combined with a heuristic-based retrosynthesis knowledge score searched the reaction space more efficiently, achieving a higher or comparable solve rate within the same time frame. This counterintuitive result underscores the importance of efficiency and retrosynthesis knowledge in retrosynthetic route planning and suggests that future research should include a simple template enumeration as a benchmark. It also suggests that this simple yet effective strategy should be considered alongside more complex models to better meet the practical needs of computer-assisted synthesis planning in drug discovery.
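The comparison above turns on search efficiency under a fixed budget. That setting can be sketched as a best-first search whose solve rate is measured at a fixed number of expansions, where expansions stand in for wall-clock time. This is a minimal sketch, not the authors' implementation. The `Templates` table of precomputed template applications with heuristic scores, the cost `1 - score`, and the helpers `solve_within_budget` and `solve_rate` are all hypothetical.

```python
import heapq
import itertools
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical precomputed template-enumeration output:
# molecule -> list of (precursor molecules, heuristic score in [0, 1]).
Templates = Dict[str, List[Tuple[Tuple[str, ...], float]]]


def solve_within_budget(target: str, templates: Templates,
                        stock: Set[str], max_expansions: int) -> bool:
    """Best-first search over sets of still-unsolved molecules.

    Each expansion applies every enumerated template to one open molecule.
    States are ranked by accumulated cost, with cost = 1 - heuristic score,
    so higher-scoring disconnections are explored first.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares frozensets
    start: FrozenSet[str] = frozenset({target}) - stock
    if not start:
        return True  # target is already purchasable
    frontier = [(0.0, next(tie), start)]
    seen = {start}
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, open_mols = heapq.heappop(frontier)
        mol = min(open_mols)  # deterministic choice of which molecule to expand
        expansions += 1
        for precursors, score in templates.get(mol, []):
            new_open = (open_mols - {mol}) | (frozenset(precursors) - stock)
            if not new_open:
                return True  # every leaf is in stock: route found
            if new_open not in seen:
                seen.add(new_open)
                heapq.heappush(frontier, (cost + 1.0 - score, next(tie), new_open))
    return False


def solve_rate(targets: Iterable[str], templates: Templates,
               stock: Set[str], max_expansions: int) -> float:
    """Fraction of targets solved within the same expansion budget."""
    results = [solve_within_budget(t, templates, stock, max_expansions) for t in targets]
    return sum(results) / len(results)


templates = {
    "T": [(("A", "B"), 0.9), (("X",), 0.2)],
    "A": [(("a1", "a2"), 0.8)],
}
stock = {"a1", "a2", "B"}
print(solve_rate(["T", "X"], templates, stock, max_expansions=2))  # 0.5
```

Comparing planners at equal `max_expansions` (or equal time) rather than by single-step accuracy is the point the abstract makes. A cheap expansion step lets the search try more disconnections within the same budget.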
Affiliation(s)
- Junren Li
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Kangjie Lin
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Luhua Lai
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
14
Chen S, Noh J, Jang J, Kim S, Gu GH, Jung Y. Reaction Templates: Bridging Synthesis Knowledge and Artificial Intelligence. Acc Chem Res 2024; 57:1964-1972. [PMID: 38924502 DOI: 10.1021/acs.accounts.4c00261] [Indexed: 06/28/2024]
Abstract
Conspectus
The field of chemical research boasts a long history of developing software to automate synthesis planning and reaction prediction. Early software relied heavily on expert systems, requiring significant effort to encode vast amounts of synthesis knowledge into a computer-readable format. However, recent advancements in deep learning have shifted the focus toward AI models, offering improved prediction capabilities. Despite these advancements, current AI models often lack the integration of known synthesis rules and intuitions, creating a gap that hinders the interpretability and future development of the models. To bridge this gap, our research group has been actively working on incorporating reaction templates into deep learning models, achieving promising results across various applications.
In this Account, we present our latest work on incorporating known synthesis knowledge into deep learning models through reaction templates. We begin by highlighting the limitations of early computer programs that relied heavily on hand-coded rules. These programs, while providing a foundation for the field, were limited in scalability and adaptability. We then introduce SMARTS (SMILES arbitrary target specification), a popular Python-readable format for representing chemical reactions. This reaction encoding format facilitates the quick integration of synthesis knowledge into AI models built using the Python language. Building on SMARTS-based reaction templates, we introduce our recent efforts to develop an AI model for reaction-based molecule optimization. Subsequently, we discuss recent efforts to automate the extraction of reaction templates from vast chemical reaction databases. This approach eliminates the manual effort previously required to encode knowledge, a process that could be time-consuming and error-prone when dealing with large data sets. By customizing the automated extraction algorithm, we have developed powerful AI models for specific tasks such as retrosynthesis (LocalRetro), reaction outcome prediction (LocalTransform), and atom-to-atom mapping (LocalMapper). These models, aligned with the intuition of chemists, demonstrate the effectiveness of incorporating reaction templates into deep learning frameworks.
Looking toward the future, we believe that utilizing reaction templates to connect known chemical knowledge and AI models holds immense potential for various applications. Not only can this approach significantly benefit future AI models focused on challenging tasks like reaction mechanism labeling and prediction, but we anticipate that it can also extend to the realm of inorganic synthesis. By integrating synthesis knowledge, we can not only achieve improved performance but also enhance the interpretability of AI models, paving the way for further advancements in AI-powered chemical synthesis.
Affiliation(s)
- Shuan Chen
- Department of Chemical and Biological Engineering, and Institute of Chemical Process, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Juhwan Noh
- Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology (KRICT), 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, South Korea
- Jidon Jang
- Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology (KRICT), 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, South Korea
- Seongmin Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291, Daehak-ro, Yuseong-gu, Daejeon 34141, South Korea
- Geun Ho Gu
- Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH), 21 Kentech-gil, Naju, Jeonnam 58330, South Korea
- Yousung Jung
- Department of Chemical and Biological Engineering, and Institute of Chemical Process, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Institute of Engineering Research, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
15
Sankaranarayanan K, Jensen KF. Similarity based functionalization for enumeration of synthetically plausible chemical libraries surrounding a target. Chem Sci 2024; 15:10221-10231. [PMID: 38966353 PMCID: PMC11220589 DOI: 10.1039/d4sc00523f] [Received: 01/22/2024] [Accepted: 05/22/2024] [Indexed: 07/06/2024]
Abstract
Functionalization of lead compounds to create analogs is a challenging step in discovering new molecules with desired properties and it is conducted throughout the chemical industry, including pharmaceuticals and agrochemicals. The process can be time-consuming and expensive, requiring expert intuition and experience. To help address synthesis planning challenges in late-stage functionalization, we have developed a molecular similarity approach that proposes single-step functionalization reactions based on analogy to precedent reactions. The developed approach mimics reaction strategies and suggests co-reactants defined implicitly by a corpus of known reactions. Using ca. 348 k reactions from the patent literature as a knowledge base, the recorded products or close analogs are among the top 20 proposed products in 74% of ∼44 k test reactions. The combinatorial growth inherent in recursive applications of the tool allows the enumeration of chemical libraries surrounding a target compound of interest. Moreover, each step of the resulting library synthesis leverages common chemical transformations reported in the literature accessible to most chemists.
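An analogy-based approach like the one summarized above rests on ranking precedent reactions by how similar their substrates are to the query. The sketch below illustrates that step under stated assumptions and is not the authors' code. Fingerprints are modeled as sets of "on" bit indices and similarity is Tanimoto. `rank_precedents` and the reaction IDs are hypothetical, and its default `top_k=20` only mirrors the top-20 evaluation mentioned in the abstract.

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits in a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_precedents(query: Fingerprint,
                    precedents: List[Tuple[str, Fingerprint]],
                    top_k: int = 20) -> List[Tuple[str, float]]:
    """Rank precedent reactions by substrate similarity to the query; keep top_k."""
    scored = [(rxn_id, tanimoto(query, fp)) for rxn_id, fp in precedents]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


query = frozenset({1, 2, 3})
precedents = [
    ("rxn_a", frozenset({2, 3, 4})),
    ("rxn_b", frozenset({1, 2, 3})),
    ("rxn_c", frozenset({7, 8})),
]
print(rank_precedents(query, precedents, top_k=2))  # [('rxn_b', 1.0), ('rxn_a', 0.5)]
```

In a full pipeline, the transformation recorded in each top-ranked precedent would then be applied to the query to propose products.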
Affiliation(s)
- Karthik Sankaranarayanan
- Department of Agriculture and Biological Engineering, Purdue University West Lafayette Indiana 47907 USA
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge Massachusetts 02139 USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge Massachusetts 02139 USA
16
Chen LY, Li YP. AutoTemplate: enhancing chemical reaction datasets for machine learning applications in organic chemistry. J Cheminform 2024; 16:74. [PMID: 38937840 PMCID: PMC11212196 DOI: 10.1186/s13321-024-00869-2] [Received: 02/02/2024] [Accepted: 06/09/2024] [Indexed: 06/29/2024]
Abstract
This paper presents AutoTemplate, an innovative data preprocessing protocol, addressing the crucial need for high-quality chemical reaction datasets in the realm of machine learning applications in organic chemistry. Recent advances in artificial intelligence have expanded the application of machine learning in chemistry, particularly in yield prediction, retrosynthesis, and reaction condition prediction. However, the effectiveness of these models hinges on the integrity of chemical reaction datasets, which are often plagued by inconsistencies like missing reactants, incorrect atom mappings, and outright erroneous reactions. AutoTemplate introduces a two-stage approach to refine these datasets. The first stage involves extracting meaningful reaction transformation rules and formulating generic reaction templates using a simplified SMARTS representation. This simplification broadens the applicability of templates across various chemical reactions. The second stage is template-guided reaction curation, where these templates are systematically applied to validate and correct the reaction data. This process effectively amends missing reactant information, rectifies atom-mapping errors, and eliminates incorrect data entries. A standout feature of AutoTemplate is its capability to concurrently identify and correct false chemical reactions. It operates on the premise that most reactions in datasets are accurate, using these as templates to guide the correction of flawed entries. The protocol demonstrates its efficacy across a range of chemical reactions, significantly enhancing dataset quality. This advancement provides a more robust foundation for developing reliable machine learning models in chemistry, thereby improving the accuracy of forward and retrosynthetic predictions. 
AutoTemplate marks a significant progression in the preprocessing of chemical reaction datasets, bridging a vital gap and facilitating more precise and efficient machine learning applications in organic synthesis.
Scientific contribution: The proposed automated preprocessing tool for chemical reaction data aims to identify errors within chemical databases. Specifically, if the errors involve atom mapping or missing reactants, corrections can be systematically applied using reaction templates, ultimately elevating the overall quality of the database.
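AutoTemplate's premise is that most reactions in a dataset are correct and can therefore vouch for the transformations they share. A frequency filter over extracted templates illustrates this. The sketch is a deliberate simplification and not AutoTemplate's procedure. Templates are opaque labels here, whereas the tool uses simplified SMARTS. `flag_suspect_reactions` and `min_support` are hypothetical. The sketch only flags entries, while AutoTemplate also re-applies templates to repair missing reactants and atom mappings.

```python
from collections import Counter
from typing import Dict, List


def flag_suspect_reactions(reaction_templates: Dict[str, str],
                           min_support: int = 2) -> List[str]:
    """Flag reactions whose extracted template is rare in the dataset.

    Toy version of the "most reactions are accurate" premise: templates seen in
    at least `min_support` reactions are trusted; reactions whose template falls
    below that threshold are returned (sorted) for correction or removal.
    """
    counts = Counter(reaction_templates.values())
    return sorted(rxn_id for rxn_id, tpl in reaction_templates.items()
                  if counts[tpl] < min_support)


extracted = {
    "r1": "amide_coupling",
    "r2": "amide_coupling",
    "r3": "suzuki_coupling",
    "r4": "suzuki_coupling",
    "r5": "unmatched_pattern",
}
print(flag_suspect_reactions(extracted))  # ['r5']
```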
Affiliation(s)
- Lung-Yi Chen
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan
- Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan.
- Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei, 11529, Taiwan.
17
Keto A, Guo T, Underdue M, Stuyver T, Coley CW, Zhang X, Krenske EH, Wiest O. Data-Efficient, Chemistry-Aware Machine Learning Predictions of Diels-Alder Reaction Outcomes. J Am Chem Soc 2024; 146:16052-16061. [PMID: 38822795 DOI: 10.1021/jacs.4c03131] [Indexed: 06/03/2024]
Abstract
The application of machine learning models to the prediction of reaction outcomes currently needs large and/or highly featurized data sets. We show that a chemistry-aware model, NERF, which mimics the bonding changes that occur during reactions, allows for highly accurate predictions of the outcomes of Diels-Alder reactions using a relatively small training set, with no pretraining and no additional features. We establish a diverse data set of 9537 intramolecular, hetero-, aromatic, and inverse electron demand Diels-Alder reactions. This data set is used to train a NERF model, and the performance is compared against state-of-the-art classification and generative machine learning models across low- and high-data regimes, with and without pretraining. The predictive accuracy (regio- and site selectivity in the major product) achieved by NERF exceeds 90% when as little as 40% of the data set is used for training. Another high-performing model, Chemformer, requires a larger training data set (>45%) and pretraining to reach 90% Top-1 accuracy. Accurate predictions of less-represented reaction subclasses, such as those involving heteroatomic or aromatic substrates, require higher percentages of training data. We also show how NERF can use small amounts of additional training data to quickly learn new systems and improve its overall understanding of reactivity. Synthetic chemists stand to benefit as this model can be rapidly expanded and tailored to areas of chemistry corresponding to the low-data regime.
Affiliation(s)
- Angus Keto
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Taicheng Guo
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Morgan Underdue
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xiangliang Zhang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Elizabeth H Krenske
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Olaf Wiest
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
18
Guo J, Yu C, Li K, Zhang Y, Wang G, Li S, Dong H. Retrosynthesis Zero: Self-Improving Global Synthesis Planning Using Reinforcement Learning. J Chem Theory Comput 2024; 20:4921-4938. [PMID: 38747149 DOI: 10.1021/acs.jctc.4c00071] [Indexed: 06/12/2024]
Abstract
The field of computer-aided synthesis planning (CASP) has witnessed significant growth in recent years. Still, many CASP programs rely on large data sets to train neural networks and are therefore limited by data quality and by the prior knowledge of chemists. In response, we propose Retrosynthesis Zero (ReSynZ), a reaction template-based method that combines Monte Carlo Tree Search with reinforcement learning inspired by AlphaGo Zero. Unlike other single-step reaction template-based CASP methods, ReSynZ takes complete synthesis paths for complex molecules, determined by reaction rules, as input for training the neural network. ReSynZ enables neural networks trained with relatively small reaction data sets (tens of thousands of data points) to generate multiple synthesis pathways for a target molecule and suggest possible reaction conditions. On multiple data sets of molecular retrosynthesis, ReSynZ demonstrates excellent predictive performance compared to existing algorithms. Its advantages, including self-improving model features, flexible reward settings, and the potential to surpass human limitations in chemical synthesis route planning, make ReSynZ a valuable tool in chemical synthesis design.
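For orientation, the child-selection rule published for AlphaGo Zero's Monte Carlo Tree Search (PUCT) is sketched below. Whether ReSynZ uses exactly this rule is an assumption. The value `c_puct = 1.5` is arbitrary, and `puct_select` is a hypothetical helper. In a retrosynthesis tree, the "children" would be candidate template applications. Before any visits, the exploration term is zero for every child, so this version simply returns index 0.

```python
import math
from typing import List


def puct_select(priors: List[float], visits: List[int],
                value_sums: List[float], c_puct: float = 1.5) -> int:
    """Pick the child maximizing Q(s, a) + U(s, a), AlphaGo Zero-style.

    priors:     policy-network probabilities P(s, a) for each child
    visits:     visit counts N(s, a)
    value_sums: total backed-up value W(s, a); Q = W / N (0 if unvisited)
    U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
    """
    sqrt_total = math.sqrt(sum(visits))
    best_idx, best_score = 0, -math.inf
    for i, (p, n, w) in enumerate(zip(priors, visits, value_sums)):
        q = w / n if n > 0 else 0.0
        u = c_puct * p * sqrt_total / (1 + n)
        if q + u > best_score:
            best_idx, best_score = i, q + u
    return best_idx


# Child 0 is well explored; child 1 has a higher mean value and few visits.
print(puct_select(priors=[0.7, 0.3], visits=[10, 1], value_sums=[5.0, 0.8]))  # 1
```

The prior term steers early search toward disconnections the policy network favors. The `1 + N` denominator shifts effort toward the value estimate `Q` as visits accumulate.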
Affiliation(s)
- Jiasheng Guo
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Chenning Yu
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Kenan Li
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Yijian Zhang
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Guoqiang Wang
- School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing 210023, China
- Shuhua Li
- School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing 210023, China
- Hao Dong
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- State Key Laboratory of Analytical Chemistry for Life Science, Chemistry and Biomedicine Innovation Center (ChemBIC), Institute for Brain Sciences, Nanjing University, Nanjing 210023, China
19
Wigh D, Arrowsmith J, Pomberger A, Felton KC, Lapkin AA. ORDerly: Data Sets and Benchmarks for Chemical Reaction Data. J Chem Inf Model 2024; 64:3790-3798. [PMID: 38648077 PMCID: PMC11094788 DOI: 10.1021/acs.jcim.4c00292] [Received: 02/20/2024] [Revised: 04/03/2024] [Accepted: 04/04/2024] [Indexed: 04/25/2024]
Abstract
Machine learning has the potential to provide tremendous value to life sciences by providing models that aid in the discovery of new molecules and reduce the time for new products to come to market. Chemical reactions play a significant role in these fields, but there is a lack of high-quality open-source chemical reaction data sets for training machine learning models. Herein, we present ORDerly, an open-source Python package for the customizable and reproducible preparation of reaction data stored in accordance with the increasingly popular Open Reaction Database (ORD) schema. We use ORDerly to clean United States patent data stored in ORD and generate data sets for forward prediction, retrosynthesis, as well as the first benchmark for reaction condition prediction. We train neural networks on data sets generated with ORDerly for condition prediction and show that data sets missing key cleaning steps can lead to silently overinflated performance metrics. Additionally, we train transformers for forward and retrosynthesis prediction and demonstrate how non-patent data can be used to evaluate model generalization. By providing a customizable open-source solution for cleaning and preparing large chemical reaction data, ORDerly is poised to push forward the boundaries of machine learning applications in chemistry.
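The ORDerly abstract warns that a missing cleaning step can inflate metrics. One example of such a step is removing duplicate reactions shared between the training and test splits. The sketch below is not part of the ORDerly package, and `reaction_key` and `remove_test_leakage` are hypothetical helpers. The key only makes a reaction invariant to the ordering of its dot-separated components. It does not handle different SMILES spellings of the same molecule, which would require a cheminformatics toolkit.

```python
from typing import List


def reaction_key(rxn_smiles: str) -> str:
    """Order-invariant key for a 'reactants>agents>products' reaction SMILES.

    Sorts the dot-separated components on each side only; it does not
    canonicalize individual SMILES, so two spellings of the same molecule
    still produce different keys.
    """
    return ">".join(".".join(sorted(side.split("."))) for side in rxn_smiles.split(">"))


def remove_test_leakage(train: List[str], test: List[str]) -> List[str]:
    """Drop test reactions that also appear (up to component order) in train."""
    train_keys = {reaction_key(r) for r in train}
    return [r for r in test if reaction_key(r) not in train_keys]


train = ["CCO.CC(=O)O>>CC(=O)OCC"]
test = ["CC(=O)O.CCO>>CC(=O)OCC", "Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1"]
print(remove_test_leakage(train, test))
# ['Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1']
```

If this step is skipped, the first test reaction, which differs from a training reaction only in reactant order, would be scored as a "correct" prediction the model had effectively memorized.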
Affiliation(s)
- Daniel S. Wigh
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, U.K.
- Joe Arrowsmith
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, U.K.
- Alexander Pomberger
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, U.K.
- Kobi C. Felton
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, U.K.
- Alexei A. Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, U.K.
20
Copan AV, Moore KB, Elliott SN, Mulvihill CR, Pratali Maffei L, Klippenstein SJ. Radical Stereochemistry: Accounting for Diastereomers in Kinetic Mechanism Development. J Phys Chem A 2024. [PMID: 38683599 DOI: 10.1021/acs.jpca.4c01060] [Indexed: 05/01/2024]
Abstract
Recent work in combustion and atmospheric chemistry has revealed cases in which diastereomers must be distinguished to accurately model a reacting flow. This paper presents an open-source framework for introducing such stereoisomer resolution into a kinetic mechanism. We detail our definitions and algorithms for labeling and enumerating the stereoisomers of a molecule and then generalize our system to describe the transition state (TS) of a reaction. This allows for the stereospecific enumeration of reactants and products while accounting for "fleeting" stereochemistry that is unique to the TS. We also present the AutoMech Chemical Identifier (AMChI), an InChI-like string identifier that accounts for stereocenters omitted by InChI. This identifier is extended to describe the TSs of reactions, providing a universal lookup key for specific reaction channels. The final piece of our methodology is an analytic formula to remove redundancy from a stereoresolved mechanism when its enantiomers exist as a racemic mixture, making it as compact as possible while fully accounting for the differences between diastereomers. In applying our methodology to two subsets of the NUIGMech1.1 mechanism, we find that our approach reduces the extra species added for large-fuel oxidation from 2231 (133%, full expansion) to 694 (41%, nonredundant expansion). We also find that for pyrolysis more than a quarter of the species in the expanded mechanism cannot be properly described by an InChI string, requiring an AMChI string to communicate their identity. Finally, we find that roughly one-quarter of the large-fuel oxidation reactions and one-third of the pyrolysis reactions include fleeting TS stereochemistry, which may have relevant effects on their kinetics.
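The counting behind a "full" versus "nonredundant" stereo expansion can be illustrated for the simplest case of independent stereocenters. This is a hedged toy, not the AutoMech algorithm or the paper's analytic formula. It labels each center R or S and ignores meso forms, molecular symmetry, and other stereo elements such as double bonds. It merges each enantiomer pair into one representative, which is valid only under the racemic-mixture assumption stated in the abstract. All function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Assignment = Tuple[str, ...]  # one "R"/"S" label per stereocenter


def enumerate_stereoisomers(n_centers: int) -> List[Assignment]:
    """Full expansion: all 2**n R/S combinations (independent centers, no meso forms)."""
    return list(product("RS", repeat=n_centers))


def mirror(a: Assignment) -> Assignment:
    """Enantiomer of an assignment: every center inverted."""
    return tuple("S" if label == "R" else "R" for label in a)


def nonredundant(assignments: List[Assignment]) -> List[Assignment]:
    """Keep one representative per enantiomer pair.

    Only appropriate when each enantiomer pair is present as a racemic mixture,
    so the two mirror images need not be tracked separately.
    """
    kept: List[Assignment] = []
    seen = set()
    for a in assignments:
        rep = min(a, mirror(a))  # lexicographically smaller member of the pair
        if rep not in seen:
            seen.add(rep)
            kept.append(rep)
    return kept


full = enumerate_stereoisomers(3)
print(len(full), len(nonredundant(full)))  # 8 4
```

Diastereomers such as RR and RS remain distinct after the reduction, which is exactly the distinction the abstract says a kinetic mechanism must keep.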
Affiliation(s)
- Andreas V Copan
- College of Engineering, University of Georgia, Athens, Georgia 30602, United States
- Department of Chemistry, University of Georgia, Athens, Georgia 30602, United States
- Kevin B Moore
- Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Sarah N Elliott
- Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Clayton R Mulvihill
- Department of Mechanical Engineering, Baylor University, Waco, Texas 76798, United States
- Luna Pratali Maffei
- Dipartimento di Chimica, Materiali e Ingegneria Chimica "Giulio Natta", Politecnico di Milano, Milano 20133, Italy
- Stephen J Klippenstein
- Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
21
Mahjour BA, Coley CW. RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries. J Chem Inf Model 2024; 64:2948-2954. [PMID: 38488634 DOI: 10.1021/acs.jcim.4c00138] [Indexed: 04/23/2024]
Abstract
SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism in which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.
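RDCanon canonicalizes, among other things, the sequence of primitives within each atom query. A much narrower toy version of just that step is sketched below. It is not RDCanon's algorithm or API, and `sort_atom_query_clauses` is a hypothetical helper. It relies on the fact that clauses joined by SMARTS's low-precedence AND (`;`) commute, so sorting them yields one spelling for queries that differ only in clause order. A trailing atom-map number is kept at the end of the bracket.

```python
import re

# An atom-map number, if present, must stay at the end of the bracket (e.g. ":1").
_ATOM_MAP = re.compile(r":\d+$")


def sort_atom_query_clauses(query: str) -> str:
    """Canonicalize the order of ';'-joined clauses in one bracketed SMARTS atom.

    Toy illustration, not RDCanon: clauses joined by low-precedence AND (';')
    commute, so sorting them gives a single spelling. Clauses containing ',' or
    '&' are treated as opaque strings, and recursive SMARTS '$(...)' is not handled.
    """
    if not (query.startswith("[") and query.endswith("]")):
        raise ValueError("expected a bracketed atom query such as '[#6;X3]'")
    body = query[1:-1]
    match = _ATOM_MAP.search(body)
    suffix = match.group(0) if match else ""
    if match:
        body = body[: match.start()]
    return "[" + ";".join(sorted(body.split(";"))) + suffix + "]"


print(sort_atom_query_clauses("[X3;#6;H0]"))  # [#6;H0;X3]
print(sort_atom_query_clauses("[H0;#6:1]"))   # [#6;H0:1]
```

A canonical spelling like this lets equivalent templates be deduplicated by string comparison. RDCanon goes further by also canonicalizing atom order across the whole query graph and optimizing primitive order for matching speed.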
Affiliation(s)
- Babak A Mahjour
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
22
Strieth-Kalthoff F, Szymkuć S, Molga K, Aspuru-Guzik A, Glorius F, Grzybowski BA. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge. J Am Chem Soc 2024. [PMID: 38598363 DOI: 10.1021/jacs.4c00338] [Indexed: 04/12/2024]
Abstract
Rapid advancements in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep chemical syntheses should conceptually be well-suited for AI. Yet, the development of AI synthesis planners trained solely on reaction-example-data has stagnated and is not on par with the performance of "hybrid" algorithms combining AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established reasoning of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases that are specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and the reasoning strategies of domain experts. By actively involving synthetic chemists, who are the end users of any synthesis planning software, in the development process, we envision bridging the gap between computer algorithms and the intricate nature of chemical synthesis.
Affiliation(s)
- Felix Strieth-Kalthoff
- University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
- Sara Szymkuć
- Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Karol Molga
- Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Alán Aspuru-Guzik
- University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave., Toronto, Ontario M5G 1M1, Canada
- University of Toronto, Department of Chemical Engineering and Applied Chemistry, 200 College St., Toronto, Ontario M5S 3E5, Canada
- University of Toronto, Department of Materials Science and Engineering, 184 College St., Toronto, Ontario M5S 3E4, Canada
- Frank Glorius
- Universität Münster, Organisch-Chemisches Institut, Corrensstr. 36, 48149 Münster, Germany
- Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- IBS Center for Algorithmic and Robotized Synthesis, CARS, UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
- Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
23
Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices. J Cheminform 2024; 16:37. [PMID: 38553720 PMCID: PMC10980627 DOI: 10.1186/s13321-024-00834-z] [Received: 11/21/2023] [Accepted: 03/23/2024] [Indexed: 04/02/2024]
Abstract
The challenge of devising pathways for organic synthesis remains a central issue in the field of medicinal chemistry. Over the span of six decades, computer-aided synthesis planning has given rise to a plethora of potent tools for formulating synthetic routes. Nevertheless, a significant expert task still looms: determining the appropriate solvent, catalyst, and reagents when provided with a set of reactants to achieve and optimize the desired product for a specific step in the synthesis process. Typically, chemists identify key functional groups and rings that exert crucial influences at the reaction center, classify reactions into categories, and may assign them names. This research introduces Rxn-INSIGHT, an open-source algorithm based on the bond-electron matrix approach, with the purpose of automating this endeavor. Rxn-INSIGHT not only streamlines the process but also facilitates extensive querying of reaction databases, effectively replicating the thought processes of an organic chemist. The core functions of the algorithm encompass the classification and naming of reactions, extraction of functional groups, rings, and scaffolds from the involved chemical entities. The provision of reaction condition recommendations based on the similarity and prevalence of reactions eventually arises as a side application. The performance of our rule-based model has been rigorously assessed against a carefully curated benchmark dataset, exhibiting an accuracy rate exceeding 90% in reaction classification and surpassing 95% in reaction naming. Notably, it has been discerned that a pivotal factor in selecting analogous reactions lies in the analysis of ring structures participating in the reactions. An examination of ring structures within the USPTO chemical reaction database reveals that with just 35 unique rings, a remarkable 75% of all rings found in nearly 1 million products can be encompassed. 
Furthermore, Rxn-INSIGHT is proficient in suggesting appropriate choices for solvents, catalysts, and reagents in entirely novel reactions, all within the span of a second, utilizing nothing more than an everyday laptop.
Affiliation(s)
- Maarten R Dobbelaere
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- István Lengyel
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- ChemInsights LLC, Dover, DE, 19901, USA
- Christian V Stevens
- SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Kevin M Van Geem
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
24
Xie J, Wang Y, Rao J, Zheng S, Yang Y. Self-Supervised Contrastive Molecular Representation Learning with a Chemical Synthesis Knowledge Graph. J Chem Inf Model 2024; 64:1945-1954. [PMID: 38484468 DOI: 10.1021/acs.jcim.4c00157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Self-supervised molecular representation learning has demonstrated great promise in bridging machine learning and chemical science to accelerate the development of new drugs. Owing to limited reaction data, existing methods are mostly pretrained by augmenting the intrinsic topology of molecules. They do not effectively incorporate prior information on chemical reactions, which makes it difficult for them to generalize to reaction-related tasks. To address this issue, we propose ReaKE, a reaction knowledge embedding framework that formulates chemical reactions as a knowledge graph. Specifically, we constructed a chemical synthesis knowledge graph with reactants and products as nodes and reaction rules as edges. Based on this knowledge graph, we further propose a novel contrastive learning scheme at both the molecule and reaction levels. It captures reaction-related functional group information within and between molecules. Extensive experiments demonstrate the effectiveness of ReaKE compared with state-of-the-art methods on several downstream tasks, including reaction classification, product prediction, and yield prediction.
Affiliation(s)
- Jiancong Xie
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Yi Wang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Jiahua Rao
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Shuangjia Zheng
- Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai 200030, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University, Guangzhou 510006, China
25
Yao L, Guo W, Wang Z, Xiang S, Liu W, Ke G. Node-Aligned Graph-to-Graph: Elevating Template-free Deep Learning Approaches in Single-Step Retrosynthesis. JACS AU 2024; 4:992-1003. [PMID: 38559728 PMCID: PMC10976575 DOI: 10.1021/jacsau.3c00737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 01/19/2024] [Accepted: 01/29/2024] [Indexed: 04/04/2024]
Abstract
Single-step retrosynthesis in organic chemistry increasingly benefits from deep learning (DL) techniques in computer-aided synthesis design. Template-free DL models are flexible and promising for retrosynthesis prediction. However, they often ignore vital 2D molecular information and struggle with atom alignment for node generation, so they perform worse than template-based and semi-template-based methods. To address these issues, we introduce node-aligned graph-to-graph (NAG2G), a transformer-based template-free DL model. NAG2G combines 2D molecular graphs and 3D conformations to retain comprehensive molecular details. It also incorporates product-reactant atom mapping through node alignment, which determines the order in which the output graph is generated node by node in an autoregressive manner. Through rigorous benchmarking and detailed case studies, we demonstrate that NAG2G stands out with its remarkable predictive accuracy on the expansive USPTO-50k and USPTO-FULL data sets. Moreover, the model's practical utility is underscored by its successful prediction of synthesis pathways for multiple drug candidate molecules. This demonstrates not only NAG2G's robustness but also its potential to revolutionize the prediction of complex chemical synthesis processes in future synthetic route design tasks.
Affiliation(s)
- Lin Yao
- DP Technology, Beijing 100080, China
- Wentao Guo
- DP Technology, Beijing 100080, China
- Department of Chemistry, University of California, Davis, California 95616, United States
- Department of Statistics, University of California, Davis, California 95616, United States
- Zhen Wang
- DP Technology, Beijing 100080, China
- Guolin Ke
- DP Technology, Beijing 100080, China
26
Chen S, An S, Babazade R, Jung Y. Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning. Nat Commun 2024; 15:2250. [PMID: 38480709 PMCID: PMC10937625 DOI: 10.1038/s41467-024-46364-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 02/20/2024] [Indexed: 03/17/2024]
Abstract
Atom-to-atom mapping (AAM) is the task of identifying the position of each atom in the molecules before and after a chemical reaction, which is important for understanding the reaction mechanism. Machine learning (ML) models have recently been developed for retrosynthesis and reaction outcome prediction, and their quality depends heavily on the quality of the AAM in reaction datasets. Algorithms based on graph theory or unsupervised learning exist to label the AAM of reaction datasets. However, these methods map atoms based on substructure alignments rather than chemical knowledge. Here, we present LocalMapper, an ML model that learns correct AAM from chemist-labeled reactions via human-in-the-loop machine learning. LocalMapper predicts the AAM for 50 K reactions with 98.5% calibrated accuracy. It does so while learning from human labels for only 2% of the reactions in the entire dataset. More importantly, LocalMapper's confident predictions cover 97% of the 50 K reactions and show 100% accuracy for 3,000 randomly sampled reactions. In an out-of-distribution experiment, LocalMapper performs favorably compared with other existing methods. We expect that LocalMapper can be used to generate more precise reaction AAM and improve the quality of future ML-based reaction prediction models.
Affiliation(s)
- Shuan Chen
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
- Sunggi An
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
- Yousung Jung
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
- Institute of Chemical Processes, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea
27
Liu J, Yan C, Yu Y, Lu C, Huang J, Ou-Yang L, Zhao P. MARS: a motif-based autoregressive model for retrosynthesis prediction. Bioinformatics 2024; 40:btae115. [PMID: 38426338 PMCID: PMC10948277 DOI: 10.1093/bioinformatics/btae115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/30/2024] [Accepted: 02/27/2024] [Indexed: 03/02/2024]
Abstract
MOTIVATION Retrosynthesis is a critical task in drug discovery, aimed at finding a viable pathway for synthesizing a given target molecule. Many existing approaches frame this task as a graph generation problem. Specifically, these methods first identify the reaction center and break the target molecule accordingly to generate synthons. Reactants are then generated either by adding atoms sequentially to the synthon graphs or by directly adding appropriate leaving groups. However, both strategies have limitations. Adding atoms results in a long prediction sequence that increases the complexity of generation. Adding leaving groups only considers those present in the training set, which leads to poor generalization. RESULTS In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction. It sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants. Chemically meaningful motifs are intermediate in size between atoms and leaving groups. As a result, our model achieves lower prediction complexity than adding atoms and better performance than adding leaving groups. We evaluate our proposed model on a benchmark dataset and show that it significantly outperforms previous state-of-the-art models. Furthermore, we conduct ablation studies to investigate the contribution of each component of our model to the overall performance on benchmark datasets. The experimental results demonstrate the effectiveness of our model in predicting retrosynthesis pathways and suggest its potential as a valuable tool in drug discovery. AVAILABILITY AND IMPLEMENTATION All code and data are available at https://github.com/szu-ljh2020/MARS.
Affiliation(s)
- Jiahan Liu
- College of Electronic and Information Engineering, Shenzhen University, Shenzhen 518060, Guangdong, China
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, Guangdong, China
- Shenzhen Key Laboratory of Media Security and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen 518060, Guangdong, China
- Chaochao Yan
- Computer Science and Engineering Department, University of Texas at Arlington, Arlington 76019, TX, United States
- Yang Yu
- Tencent AI Lab, Shenzhen 518057, Guangdong, China
- Chan Lu
- Tencent AI Lab, Shenzhen 518057, Guangdong, China
- Junzhou Huang
- Computer Science and Engineering Department, University of Texas at Arlington, Arlington 76019, TX, United States
- Le Ou-Yang
- College of Electronic and Information Engineering, Shenzhen University, Shenzhen 518060, Guangdong, China
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, Guangdong, China
- Shenzhen Key Laboratory of Media Security and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen 518060, Guangdong, China
- Peilin Zhao
- Tencent AI Lab, Shenzhen 518057, Guangdong, China
28
Pham TT, Guo Z, Li B, Lapkin AA, Yan N. Synthesis of Pyrrole-2-Carboxylic Acid from Cellulose- and Chitin-Based Feedstocks Discovered by the Automated Route Search. CHEMSUSCHEM 2024; 17:e202300538. [PMID: 37792551 DOI: 10.1002/cssc.202300538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 10/02/2023] [Accepted: 10/04/2023] [Indexed: 10/06/2023]
Abstract
The shift towards sustainable feedstocks for platform chemicals requires new routes to functional molecules that contain heteroatoms. However, few bio-derived feedstocks introduce heteroatoms into platform chemicals. Combining renewable molecules of different origins could be a way to optimize the use of atoms from renewable sources. However, the lack of retrosynthetic tools makes it challenging to examine the extensive reaction networks of various platform molecules when multiple bio-based feedstocks are considered. In this study, a protocol was developed to identify potential transformation pathways that allow the use of feedstocks of different origins. Several promising synthetic routes were shortlisted by analyzing existing knowledge of chemical reactions in large databases. The most interesting was the reaction of D-glucosamine and pyruvic acid to make pyrrole-2-carboxylic acid (PCA). The optimized synthetic conditions gave PCA in 50% yield, with insights gained from variable-temperature NMR studies. The use of substrates obtained from two different bio-feedstock bases, namely cellulose and chitin, allowed a PCA-based chemical space to be established.
Affiliation(s)
- Thuy Trang Pham
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
- Zhen Guo
- Cambridge Centre for Advanced Research and Education in Singapore (CARES Ltd), 1 CREATE Way, #05-05 Create Tower, 138602, Singapore City, Singapore
- Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road #02-00, 068898, Singapore City, Singapore
- Bing Li
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
- Alexei A Lapkin
- Cambridge Centre for Advanced Research and Education in Singapore (CARES Ltd), 1 CREATE Way, #05-05 Create Tower, 138602, Singapore City, Singapore
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, CB3 0AS, UK
- Ning Yan
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
29
Chen X, Wu W, Sun H, Chen L, Wang Y, Xia B, Zhou Y. Development and Application of a Comprehensive Nontargeted Screening Strategy for Aristolochic Acid Analogues. Anal Chem 2024; 96:1922-1931. [PMID: 38264982 DOI: 10.1021/acs.analchem.3c04064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
Aristolochic acid analogs (AAAs) are naturally occurring carcinogenic and toxic compounds that pose a safety threat to pharmaceuticals and the environment. Screening for AAAs is challenging because they lack characteristic mass spectral fragmentation and are structurally diverse. A comprehensive nontargeted screening strategy was proposed that takes diverse factors into account and incorporates various self-developed techniques. A Python 3-based toolkit called AAAs_finder was developed to implement it. The main procedures are as follows:
- creating a database of virtual structures and ultraviolet-visible (UV) spectra;
- extracting suspect data based on exact mass and UV spectra;
- anthropomorphic interpretation of tandem mass spectra (MS/MS);
- ranking candidate structures based on multicondition retention time (RT) prediction.

To initially assess screening feasibility, eight hypothetical unknown samples were subjected to nontargeted screening using the AAAs_finder toolkit and two other advanced tools. AAAs_finder successfully identified all of them, whereas the other two tools identified only two and three, respectively, indicating that our strategy is more feasible. The strategy was then carefully evaluated for false positives and false negatives, instrument dependence, reproducibility, and sensitivity. Finally, it was successfully applied to the screening of AAAs in real samples such as herbal medicine, spiked soil, and water. Overall, this study proposes a nontargeted screening strategy and toolkit for AAAs. They do not depend on characteristic mass spectral fragmentation and can overcome the challenges posed by structural diversity. The approach is also valuable for other classes of compounds.
Affiliation(s)
- Xiaoqi Chen
- Chengdu Institute of Organic Chemistry, Chinese Academy of Sciences, Chengdu 610041, China
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wenlin Wu
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Chengdu Institute of Food Inspection, Chengdu 611130, China
- Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing 100029, China
- Hongbing Sun
- Chengdu Institute of Organic Chemistry, Chinese Academy of Sciences, Chengdu 610041, China
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Sichuan Academy of Chinese Medicine Sciences, Chengdu 610041, China
- Lu Chen
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yu Wang
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Bing Xia
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- Yan Zhou
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
30
Wu Z, Wu Y, Zhu C, Wu X, Zhai S, Wang X, Su Z, Duan H. Efficient Computational Framework for Target-Specific Active Peptide Discovery: A Case Study on IL-17C Targeting Cyclic Peptides. J Chem Inf Model 2023; 63:7655-7668. [PMID: 38049371 DOI: 10.1021/acs.jcim.3c01385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/06/2023]
Abstract
The development of potentially active peptides for specific targets is critical to the growth of the modern pharmaceutical industry. In this study, we present an efficient computational framework for discovering active peptides against a specific pharmacological target. It combines a conditional variational autoencoder (CVAE) with a classifier named TCPP, which is based on a Transformer and a convolutional neural network. In our example scenario, we constructed an active cyclic peptide library targeting interleukin-17C (IL-17C) through a library-based in vitro selection strategy. The CVAE model is trained on the preprocessed peptide data sets to generate potentially active peptides, and TCPP further screens the generated peptides. Ultimately, six candidate peptides predicted by the model were synthesized and assayed for activity, and four of them exhibited promising binding affinity to IL-17C. Our study provides a one-stop shop for target-specific active peptide discovery, which is expected to accelerate peptide drug development.
Affiliation(s)
- Zhipeng Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Yejian Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Cheng Zhu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Xinyi Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Silong Zhai
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Xinqiao Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Zhihao Su
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
31
Liu T, Cao Z, Huang Y, Wan Y, Wu J, Hsieh CY, Hou T, Kang Y. SynCluster: Reaction Type Clustering and Recommendation Framework for Synthesis Planning. JACS AU 2023; 3:3446-3461. [PMID: 38155655 PMCID: PMC10751778 DOI: 10.1021/jacsau.3c00607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/07/2023] [Accepted: 11/08/2023] [Indexed: 12/30/2023]
Abstract
AI-assisted synthesis planning has emerged as a valuable tool for accelerating synthetic chemistry in the discovery of new drugs and materials. The template-free approach, which shows superior generalization capabilities, is seen as the mainstream direction in this field. Such end-to-end approaches do not fully reveal the underlying chemical mechanisms. It therefore remains unclear whether they can achieve problem-solving performance on par with experienced chemists. Moreover, this area lacks unified and chemically inspired frameworks for improving multitask reaction predictions. In this study, we address these challenges by investigating the impact of fine-grained reaction-type labels on multiple downstream tasks. We also propose a novel framework named SynCluster. This framework incorporates unsupervised clustering cues into the baseline models and identifies plausible chemical subspaces. It is compatible with multitask extensions, and its cues can serve as model-independent indicators that effectively enhance the performance of multiple downstream tasks. In retrosynthesis prediction, SynCluster improves top-1 and top-10 accuracy by 4.1% and 11.0%, respectively, over the baseline Molecular Transformer. It also achieves a notable enhancement of 13.9% in top-10 accuracy when combined with Retroformer. With simplified molecular-input line-entry system (SMILES) augmentation, our framework achieves higher top-10 accuracy than state-of-the-art sequence-based retrosynthesis models. It also improves on the baseline in the diversity and validity of reactants. SynCluster also achieves 94.9% top-10 accuracy in forward synthesis prediction and 51.5% top-10 Maxfrag accuracy in reagent prediction. Overall, SynCluster provides a fresh perspective on synthesis design, with chemical interpretability and reinforcement of domain knowledge.
It offers a promising solution for improving the accuracy and efficiency of AI-assisted synthesis planning and bridges the gap between template-free approaches and the problem-solving abilities of experienced chemists.
Affiliation(s)
- Tiantao Liu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zheng Cao
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
- Yuansheng Huang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yue Wan
- Tencent Quantum Laboratory, Shenzhen 518057, Guangdong, China
- Jian Wu
- Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
32
Heid E, Probst D, Green WH, Madsen GKH. EnzymeMap: curation, validation and data-driven prediction of enzymatic reactions. Chem Sci 2023; 14:14229-14242. [PMID: 38098707 PMCID: PMC10718068 DOI: 10.1039/d3sc02048g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023]
Abstract
Enzymatic reactions are an eco-friendly, selective, and versatile addition to organic reactions, and sometimes even an alternative to them, for synthesizing chemical compounds such as pharmaceuticals or fine chemicals. Computational models are becoming increasingly important for identifying suitable reactions. Such models predict the activity of enzymes on non-native substrates, perform retrosynthetic pathway searches, or predict reaction outcomes, including regio- and stereoselectivity. However, current approaches are substantially hindered by the limited amount of available data. This is especially so if balanced and atom-mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for reactions recorded in the literature. We show its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, which outperform previous approaches by a large margin. Our dataset enables deep learning models of enzymatic reactions with unprecedented accuracy and is freely available online.
Affiliation(s)
- Esther Heid
- Institute of Materials Chemistry, TU Wien, 1060 Vienna, Austria
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
33
Toniato A, Vaucher AC, Lehmann MM, Luksch T, Schwaller P, Stenta M, Laino T. Fast Customization of Chemical Language Models to Out-of-Distribution Data Sets. CHEMISTRY OF MATERIALS : A PUBLICATION OF THE AMERICAN CHEMICAL SOCIETY 2023; 35:8806-8815. [PMID: 38027545 PMCID: PMC10653079 DOI: 10.1021/acs.chemmater.3c01406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/09/2023] [Accepted: 10/09/2023] [Indexed: 12/01/2023]
Abstract
The world is on the verge of a new industrial revolution, and language models are poised to play a pivotal role in this transformative era. Their ability to offer intelligent insights and forecasts has made them a valuable asset for businesses seeking a competitive advantage. The chemical industry, in particular, can benefit significantly from harnessing their power. Since as early as 2016, language models have been applied to tasks such as predicting reaction outcomes or retrosynthetic routes. These models have demonstrated impressive abilities. However, the lack of publicly available data sets with universal coverage is often what prevents even higher accuracies. This makes it imperative for organizations to incorporate proprietary data sets into their model training to improve performance. So far, however, these data sets frequently remain untapped because there are no established criteria for model customization. In this work, we report a successful methodology for retraining language models on reaction outcome prediction and single-step retrosynthesis tasks using proprietary, nonpublic data sets. We report a considerable boost in accuracy from combining patent and proprietary data in a multidomain learning formulation. This exercise, inspired by a real-world use case, enables us to formulate guidelines that can be adopted in different corporate settings to customize chemical language models easily.
Affiliation(s)
- Alessandra Toniato
- IBM Research Europe, Rüschlikon 8803, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
- Alain C. Vaucher
- IBM Research Europe, Rüschlikon 8803, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
- Philippe Schwaller
- IBM Research Europe, Rüschlikon 8803, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
- Marco Stenta
- Syngenta Crop Protection AG, Stein 4332, Switzerland
- Teodoro Laino
- IBM Research Europe, Rüschlikon 8803, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
34
Ha T, Lee D, Kwon Y, Park MS, Lee S, Jang J, Choi B, Jeon H, Kim J, Choi H, Seo HT, Choi W, Hong W, Park YJ, Jang J, Cho J, Kim B, Kwon H, Kim G, Oh WS, Kim JW, Choi J, Min M, Jeon A, Jung Y, Kim E, Lee H, Choi YS. AI-driven robotic chemist for autonomous synthesis of organic molecules. SCIENCE ADVANCES 2023; 9:eadj0461. [PMID: 37910607 PMCID: PMC10619927 DOI: 10.1126/sciadv.adj0461] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/03/2023] [Accepted: 09/27/2023] [Indexed: 11/03/2023]
Abstract
The automation of organic compound synthesis is pivotal for expediting the development of such compounds. Development efficiency can be further enhanced by incorporating autonomous functions alongside automation. To this end, we developed an autonomous synthesis robot that harnesses artificial intelligence (AI) and robotic technology to establish optimal synthetic recipes. Given a target molecule, our AI first plans synthetic pathways and defines reaction conditions. It then iteratively refines these plans using feedback from the experimental robot, gradually optimizing the recipe. The system's performance was validated by successfully determining synthetic recipes for three organic compounds, yielding conversion rates that outperform those in existing references. Notably, this autonomous system is designed around batch reactors, making it accessible and valuable to chemists in standard laboratory settings and thereby streamlining research endeavors.
Affiliation(s)
- Taesin Ha
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Dongseon Lee
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Youngchun Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Min Sik Park
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Sangyoon Lee
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Jaejun Jang
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Byungkwon Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Hyunjeong Jeon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Jeonghun Kim
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Hyundo Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Hyung-Tae Seo
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Department of Mechanical Engineering, Kyonggi University, 154-42, Gwanggyosan-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16227, Republic of Korea
- Wonje Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Wooram Hong
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Young Jin Park
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- School of Mechanical Engineering, Gyeongsang National University, 501, Jinju-daero, Jinju-si, Gyeongsangnam-do, Republic of Korea
- Junwon Jang
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Joonkee Cho
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Bosung Kim
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Hyukju Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Gahee Kim
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Won Seok Oh
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Jin Woo Kim
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Joonhyuk Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Minsik Min
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Aram Jeon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Yongsik Jung
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- Eunji Kim
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- School of Business Administration, Chung-Ang University, 135, Seodal-ro, Dongjak-gu, Seoul 06973, Republic of Korea
- Hyosug Lee
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
- College of Information and Communication Engineering, Sungkyunkwan University (SKKU), 2066, Seobu-ro, Jangan-gu, Suwon-si, Gyeonggi-do 16419, Republic of Korea
- Youn-Suk Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16678, Republic of Korea
35
Wang X, Hsieh CY, Yin X, Wang J, Li Y, Deng Y, Jiang D, Wu Z, Du H, Chen H, Li Y, Liu H, Wang Y, Luo P, Hou T, Yao X. Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center. Research (Wash D C) 2023; 6:0231. [PMID: 37849643 PMCID: PMC10578430 DOI: 10.34133/research.0231] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/01/2023] [Accepted: 08/29/2023] [Indexed: 10/19/2023]
Abstract
Effective synthesis planning powered by deep learning (DL) can significantly accelerate the discovery of new drugs and materials. However, most DL-assisted synthesis planning methods offer no or very limited capability to recommend suitable reaction conditions (RCs) for their reaction predictions. Currently, the prediction of RCs with a DL framework is hindered by several factors, including (a) the lack of a standardized dataset for benchmarking, (b) the lack of a general prediction model with powerful representation, and (c) the lack of interpretability. To address these issues, we first created 2 standardized RC datasets covering a broad range of reaction classes and then proposed a powerful and interpretable Transformer-based RC predictor named Parrot. Through careful design of the model architecture, pretraining method, and training strategy, Parrot improved the overall top-3 prediction accuracy on catalysts, solvents, and other reagents by as much as 13.44% compared to the best previous model on a newly curated dataset. Additionally, the mean absolute error of the predicted temperatures was reduced by about 4 °C. Furthermore, Parrot shows strong generalization capacity, with superior cross-chemical-space prediction accuracy. Attention analysis indicates that Parrot effectively captures crucial chemical information and is highly interpretable in its RC predictions. Parrot exemplifies how a modern neural network architecture, when appropriately pretrained, can make reliable, generalizable, and interpretable RC recommendations even when the underlying training dataset is still limited in diversity.
Affiliation(s)
- Xiaorui Wang
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Xiaodan Yin
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Hongming Chen
- Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China
- Yun Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China
- Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Yuwei Wang
- College of Pharmacy, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi, 712044, China
- Pei Luo
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
36
Latendresse M, Malerich JP, Herson J, Krummenacker M, Szeto J, Vu VA, Collins N, Madrid PB. SynRoute: A Retrosynthetic Planning Software. J Chem Inf Model 2023; 63:5484-5495. [PMID: 37635298 PMCID: PMC10498441 DOI: 10.1021/acs.jcim.3c00491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Indexed: 08/29/2023]
Abstract
Computer-assisted synthetic planning has seen major advancements that stem from the availability of large reaction databases and artificial intelligence methodologies. SynRoute is a new retrosynthetic planning software tool that uses a relatively small number of general reaction templates, currently 263, along with a literature-based reaction database to find short, practical synthetic routes for target compounds. For each reaction template, a machine learning classifier is trained using data from the Pistachio reaction database to predict whether new computer-generated reactions based on the template are likely to work experimentally in the laboratory. This reaction generation methodology is used together with a vectorized Dijkstra-like search for top-scoring routes, which are organized by synthetic strategy for easy browsing by a synthetic chemist. SynRoute found routes for an average of 83% of compounds in random subsets of drug-like compounds selected from the ChEMBL database. Laboratory evaluation of 12 routes produced by SynRoute, for compounds not drawn from the previous random subsets, showed that it produced feasible overall synthetic strategies for all compounds evaluated.
Affiliation(s)
- James Herson
- SRI International, 333 Ravenswood Ave, Menlo Park, California 94025, United States
- Markus Krummenacker
- SRI International, 333 Ravenswood Ave, Menlo Park, California 94025, United States
37
Sankaranarayanan K, Jensen KF. Computer-assisted multistep chemoenzymatic retrosynthesis using a chemical synthesis planner. Chem Sci 2023; 14:6467-6475. [PMID: 37325140 PMCID: PMC10266459 DOI: 10.1039/d3sc01355c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 05/17/2023] [Indexed: 06/17/2023]
Abstract
Chemoenzymatic synthesis methods use organic and enzyme chemistry to synthesize a desired small molecule. Complementing organic synthesis with enzyme-catalyzed selective transformations under mild conditions enables more sustainable and synthetically efficient chemical manufacturing. Here, we present a multistep retrosynthesis search algorithm to facilitate chemoenzymatic synthesis of pharmaceutical compounds, specialty chemicals, commodity chemicals, and monomers. First, we employ the synthesis planner ASKCOS to plan multistep syntheses starting from commercially available materials. Then, we identify transformations that can be catalyzed by enzymes using a small database of biocatalytic reaction rules previously curated for RetroBioCat, a computer-aided synthesis planning tool for biocatalytic cascades. Enzymatic suggestions captured by the approach include ones capable of reducing the number of synthetic steps. We successfully plan chemoenzymatic routes, in a retrospective fashion, for active pharmaceutical ingredients or their intermediates (e.g., sitagliptin, rivastigmine, and ephedrine), commodity chemicals (e.g., acrylamide and glycolic acid), and specialty chemicals (e.g., S-metolachlor and vanillin). In addition to recovering published routes, the algorithm proposes many sensible alternative pathways. Our approach provides a chemoenzymatic synthesis planning strategy by identifying synthetic transformations that could be candidates for enzyme catalysis.
Affiliation(s)
- Karthik Sankaranarayanan
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
38
Hong S, Zhuo HH, Jin K, Shao G, Zhou Z. Retrosynthetic planning with experience-guided Monte Carlo tree search. Commun Chem 2023; 6:120. [PMID: 37301940 DOI: 10.1038/s42004-023-00911-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Accepted: 05/24/2023] [Indexed: 06/12/2023]
Abstract
In retrosynthetic planning, the huge number of possible routes to synthesize a complex molecule from simple building blocks leads to a combinatorial explosion. Even experienced chemists often have difficulty selecting the most promising transformations. Current approaches guide the search with human-defined or machine-trained scoring functions that have limited chemical knowledge, or with expensive estimation methods. Here we propose an experience-guided Monte Carlo tree search (EG-MCTS) to deal with this problem. Instead of rollouts, we build an experience guidance network that learns from synthetic experiences during the search. Experiments on benchmark USPTO datasets show that EG-MCTS achieves significant improvements over state-of-the-art approaches in both efficiency and effectiveness. In a comparison with the literature, our computer-generated routes mostly matched the reported routes. Routes designed for real drug compounds demonstrate the effectiveness of EG-MCTS in assisting chemists with retrosynthetic analysis.
Affiliation(s)
- Siqi Hong
- School of Computer Science and Engineering, Sun Yat-Sen University, East Outer Ring Road, 510006, Guangzhou, Guangdong, China
- Hankz Hankui Zhuo
- School of Computer Science and Engineering, Sun Yat-Sen University, East Outer Ring Road, 510006, Guangzhou, Guangdong, China
- Kebing Jin
- School of Computer Science and Engineering, Sun Yat-Sen University, East Outer Ring Road, 510006, Guangzhou, Guangdong, China
- Guang Shao
- School of Chemistry, Sun Yat-Sen University, East Outer Ring Road, 510006, Guangzhou, Guangdong, China
- Zhanwen Zhou
- School of Computer Science and Engineering, Sun Yat-Sen University, East Outer Ring Road, 510006, Guangzhou, Guangdong, China
39
Zhong W, Yang Z, Chen CYC. Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nat Commun 2023; 14:3009. [PMID: 37230985 PMCID: PMC10209957 DOI: 10.1038/s41467-023-38851-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/16/2022] [Accepted: 05/17/2023] [Indexed: 05/27/2023]
Abstract
Retrosynthesis planning, the process of identifying a set of available reactions to synthesize target molecules, remains a major challenge in organic synthesis. Recently, computer-aided synthesis planning has gained renewed interest, and various retrosynthesis prediction algorithms based on deep learning have been proposed. However, most existing methods are limited in the applicability and interpretability of their predictions, and predictive accuracy still needs to improve to a more practical level. In this work, inspired by the arrow-pushing formalism in chemical reaction mechanisms, we present an end-to-end architecture for retrosynthesis prediction called Graph2Edits. Specifically, Graph2Edits uses a graph neural network to predict edits of the product graph in an auto-regressive manner, and sequentially generates transformation intermediates and final reactants according to the predicted edit sequence. This strategy combines the two-stage process of semi-template-based methods into one-pot learning, improving applicability to some complicated reactions and making the predictions more interpretable. Evaluated on the standard benchmark dataset USPTO-50k, our model achieves state-of-the-art performance for semi-template-based retrosynthesis, with a promising 55.1% top-1 accuracy.
Affiliation(s)
- Weihe Zhong
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
- School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
- Ziduo Yang
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
- Calvin Yu-Chian Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
- Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung, 41354, Taiwan
40
Liu Q, Tang K, Zhang L, Du J, Meng Q. Computer-assisted synthetic planning considering reaction kinetics based on transition state automated generation method. AIChE J 2023. [DOI: 10.1002/aic.18092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
Affiliation(s)
- Qilei Liu
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- Kun Tang
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- Lei Zhang
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- Jian Du
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- Qingwei Meng
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- Ningbo Research Institute, Dalian University of Technology, Ningbo 315016, China
41
Stuyver T, Jorner K, Coley CW. Reaction profiles for quantum chemistry-computed [3 + 2] cycloaddition reactions. Sci Data 2023; 10:66. [PMID: 36725850 PMCID: PMC9892576 DOI: 10.1038/s41597-023-01977-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/21/2022] [Accepted: 01/18/2023] [Indexed: 02/03/2023]
Abstract
Bio-orthogonal click chemistry based on [3 + 2] dipolar cycloadditions has had a profound impact on the field of biochemistry, and significant effort has been devoted to identifying promising new candidate reactions for this purpose. To gauge whether a prospective reaction could be a suitable bio-orthogonal click reaction, information about both on- and off-target activation and reaction energies is highly valuable. Here, we use an automated workflow, based on the autodE program, to compute over 5000 reaction profiles for [3 + 2] cycloadditions involving both synthetic dipolarophiles and a set of biologically inspired structural motifs. Based on a succinct benchmarking study, the B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP level of theory was selected for the DFT calculations, and standard conditions and an (aqueous) SMD model were imposed to mimic physiological conditions. We believe that these data, as well as the presented workflow for high-throughput reaction profile computation, will be useful for screening new bio-orthogonal reactions and, more broadly, for developing novel machine learning models that predict chemical reactivity.
Affiliation(s)
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Kjell Jorner
- Department of Computer Science, University of Toronto, 40 St George St, Toronto, Ontario M5S 2E4, Canada
- Department of Chemistry, Chemical Physics Theory Group, 80 St. George St., University of Toronto, Ontario, M5S 3H6, Canada
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, Kemigården 4, SE-41258 Gothenburg, Sweden
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
42
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
The field of predictive chemistry concerns the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis but is far broader and more ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and to advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
43
Merging enzymatic and synthetic chemistry with computational synthesis planning. Nat Commun 2022; 13:7747. [PMID: 36517480 PMCID: PMC9750992 DOI: 10.1038/s41467-022-35422-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 11/30/2022] [Indexed: 12/15/2022]
Abstract
Synthesis planning programs trained on chemical reaction data can design efficient routes to new molecules of interest, but are limited in their ability to leverage rare chemical transformations. This challenge is acute for enzymatic reactions, which are valuable due to their selectivity and sustainability but are few in number. We report a retrosynthetic search algorithm that uses two neural network models for retrosynthesis, one covering 7984 enzymatic transformations and the other 163,723 synthetic transformations, and balances the exploration of enzymatic and synthetic reactions to identify hybrid synthesis plans. This approach extends the space of retrosynthetic moves by thousands of uniquely enzymatic one-step transformations, discovers routes to molecules for which synthetic or enzymatic searches find none, and designs shorter routes for others. Application to (-)-Δ9-tetrahydrocannabinol (THC; dronabinol) and R,R-formoterol (arformoterol) illustrates how our strategy facilitates the replacement of metal catalysis, high step counts, or costly enantiomeric resolution with more elegant hybrid proposals.
44
Zhu JF, Hao ZK, Liu Q, Yin Y, Lu CQ, Huang ZY, Chen EH. Towards Exploring Large Molecular Space: An Efficient Chemical Genetic Algorithm. J Comput Sci Technol 2022; 37:1464-1477. [PMID: 36594005 PMCID: PMC9797891 DOI: 10.1007/s11390-021-0970-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2020] [Accepted: 04/20/2021] [Indexed: 06/17/2023]
Abstract
Generating molecules with desired properties is an important task in chemistry and pharmacy. An efficient method could help find drugs to treat diseases such as COVID-19, and data mining and artificial intelligence are promising ways to develop one. Recently, both generative models based on deep learning and approaches based on genetic algorithms have made progress in generating molecules and optimizing their properties. However, the efficiency and performance of existing methods still need improvement. To address these problems, we propose a method named the Chemical Genetic Algorithm for Large Molecular Space (CALM). Specifically, CALM employs a scalable and efficient molecular representation called the molecular matrix. We then design corresponding crossover, mutation, and mask operators inspired by domain knowledge and previous studies. We apply our genetic algorithm to several tasks related to molecular property optimization and constrained molecular optimization. The results show that our approach outperforms other state-of-the-art deep learning and genetic algorithm methods, with z-tests on the results of several experiments indicating significance at a confidence level above 99%. Based on the experimental results, we also point out shortcomings in the experimental evaluation standard that hinder the fair evaluation of previous work. Supplementary information: The online version contains supplementary material available at 10.1007/s11390-021-0970-3.
Affiliation(s)
- Jian-Fu Zhu
- Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 China
- Zhong-Kai Hao
- Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 China
- Qi Liu
- Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 China
- Yu Yin
- Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 China
- Cheng-Qiang Lu
- Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 China
- Zhen-Ya Huang
- Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 China
- En-Hong Chen
- Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 China
45
RetroComposer: Composing Templates for Template-Based Retrosynthesis Prediction. Biomolecules 2022; 12:biom12091325. [PMID: 36139164 PMCID: PMC9496376 DOI: 10.3390/biom12091325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Revised: 09/14/2022] [Accepted: 09/15/2022] [Indexed: 11/22/2022]
Abstract
The main goal of retrosynthesis is to recursively decompose desired molecules into available building blocks. Existing template-based retrosynthesis methods follow a template-selection paradigm and are constrained by a limited set of training templates, which prevents them from discovering novel reactions. To overcome this limitation, we propose an innovative retrosynthesis prediction framework that can compose novel templates beyond the training templates. To our knowledge, this is the first method that uses machine learning to compose reaction templates for retrosynthesis prediction. In addition, we propose an effective reactant-candidate scoring model that captures atom-level transformations, which helps our method outperform previous methods on the USPTO-50K dataset. Experimental results show that our method can produce novel templates for 15 USPTO-50K test reactions that are not covered by the training templates. We have released our source implementation.
46
A generalized-template-based graph neural network for accurate organic reactivity prediction. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00526-z] [Indexed: 11/08/2022]
47
He Y, Liu K, Han L, Han W. Clustering Analysis, Structure Fingerprint Analysis, and Quantum Chemical Calculations of Compounds from Essential Oils of Sunflower (Helianthus annuus L.) Receptacles. Int J Mol Sci 2022; 23:10169. [PMID: 36077567 PMCID: PMC9456235 DOI: 10.3390/ijms231710169] [Received: 07/28/2022] [Revised: 08/25/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]
Abstract
Sunflower (Helianthus annuus L.) is a suitable crop for current green-agriculture practices, so it is important to turn sunflower receptacles from waste into a useful resource. However, little is known about the functions of compounds from the essential oils of sunflower receptacles. In this study, a new method for chemical space network analysis and classification of small samples was developed and applied to 104 compounds. Here, t-SNE (t-Distributed Stochastic Neighbor Embedding) dimensions were used to reduce coordinates as node locations and edge connections of chemical space networks, respectively, and molecules were grouped according to whether their nodes were connected by edges and how close the node coordinates were. Detailed analysis of the structural characteristics and fingerprints of each classified group showed that the classification method attained good accuracy. Targets were then identified using reverse docking, and the active centers of compounds of the same type were determined by quantum chemical calculations. The results indicated that these compounds can be divided into nine groups according to their mean within-group similarity (MWGS) values. The three families with the most members, i.e., the d-limonene group (18 members), α-pinene group (10 members), and γ-maaliene group (9 members), were used to determine the protein targets with PharmMapper. Structure fingerprint analysis was employed to predict the binding modes of the ligands of four families of protein targets. Quantum chemical calculations were then applied to the active groups of the representative compounds of the four families. This study provides further scientific support for the use of sunflower receptacles.
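The abstract groups compounds by mean within-group similarity (MWGS) but does not define it. As a minimal sketch, assuming MWGS is the average pairwise Tanimoto similarity over all unordered pairs of fingerprints in a group, with each fingerprint represented as a set of "on" bit indices, the calculation looks like this. The function names and toy fingerprints are hypothetical and are not the authors' code or data.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints (assumption)
    return len(a & b) / len(a | b)


def mean_within_group_similarity(group: list[set[int]]) -> float:
    """Assumed MWGS: mean Tanimoto similarity over all unordered pairs in the group."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 1.0  # a single-member group is trivially self-similar (assumption)
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy fingerprints (on-bit indices), not data from the study.
group = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6}]
print(round(mean_within_group_similarity(group), 3))  # → 0.467
```

The three pairwise similarities here are 3/5, 2/5 and 2/5, so their mean is 1.4/3. A group whose members share most of their substructure bits scores close to 1.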
Affiliation(s)
- Lu Han
- Correspondence: L.H.; W.H.
48
Wang X, Yao C, Zhang Y, Yu J, Qiao H, Zhang C, Wu Y, Bai R, Duan H. From theory to experiment: transformer-based generation enables rapid discovery of novel reactions. J Cheminform 2022; 14:60. [PMID: 36056425 PMCID: PMC9438336 DOI: 10.1186/s13321-022-00638-z] [Received: 12/11/2021] [Accepted: 08/11/2022] [Indexed: 11/10/2022]
Abstract
Deep learning applications such as reaction prediction and retrosynthesis analysis have demonstrated their significance in the chemical field. However, the de novo generation of novel reactions with artificial intelligence requires further exploration. Inspired by molecular generation, we proposed a new task: reaction generation. Heck reactions were used to train a transformer model, a state-of-the-art natural language processing model, which generated 4717 reactions after sampling and processing. Chemists then evaluated the generated reactions and confirmed 2253 of them as novel Heck reactions. More importantly, organic synthesis experiments were performed to verify the accuracy and feasibility of representative reactions. The whole process, from Heck reaction generation to experimental verification, took only 15 days, demonstrating that our model has learned reaction rules in depth and can contribute to novel reaction discovery and chemical space exploration.
Affiliation(s)
- Xinqiao Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
- Chuansheng Yao
- College of Pharmacy, School of Medicine, Hangzhou Normal University, Hangzhou, People's Republic of China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou, People's Republic of China
- Yun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
- Jiahui Yu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
- Haoran Qiao
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201203, People's Republic of China
- Chengyun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
- Yejian Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
- Renren Bai
- College of Pharmacy, School of Medicine, Hangzhou Normal University, Hangzhou, People's Republic of China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou, People's Republic of China
- Hongliang Duan
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, 201203, China
49
Wang Z, Sun Z, Yin H, Liu X, Wang J, Zhao H, Pang CH, Wu T, Li S, Yin Z, Yu XF. Data-Driven Materials Innovation and Applications. Adv Mater 2022; 34:e2104113. [PMID: 35451528 DOI: 10.1002/adma.202104113] [Received: 05/30/2021] [Revised: 03/19/2022] [Indexed: 05/07/2023]
Abstract
Owing to rapid improvements in the accuracy and efficiency of both experimental and computational methodologies, the massive amounts of data generated have led materials science into the fourth paradigm of data-driven scientific research. This transition requires authoritative and up-to-date frameworks for data-driven material innovation. A critical discussion of current advances in the data-driven discovery of materials is presented, focusing on frameworks, machine-learning algorithms, material-specific databases, descriptors, and targeted applications in the field of inorganic materials. Frameworks for rationalizing data-driven material innovation are described, and a critical review of essential subdisciplines is presented, including: i) advanced data-intensive strategies and machine-learning algorithms; ii) material databases and related tools and platforms for data generation and management; iii) molecular descriptors commonly used in data-driven processes. Furthermore, an in-depth discussion of the broad applications of material innovation, such as energy conversion and storage, environmental decontamination, flexible electronics, optoelectronics, superconductors, metallic glasses, and magnetic materials, is provided. Finally, the ways in which these subdisciplines (with insights into the synergy of materials science, computational tools, and mathematics) support data-driven paradigms are outlined, and the opportunities and challenges in data-driven material innovation are highlighted.
Affiliation(s)
- Zhuo Wang
- Materials Interfaces Center, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P. R. China
- Department of Chemical and Environmental Engineering, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
- Zhehao Sun
- Research School of Chemistry, The Australian National University, ACT, 2601, Australia
- Hang Yin
- Research School of Chemistry, The Australian National University, ACT, 2601, Australia
- Xinghui Liu
- Department of Chemistry, Sungkyunkwan University (SKKU), 2066 Seoburo, Jangan-Gu, Suwon, 16419, Republic of Korea
- Jinlan Wang
- School of Physics, Southeast University, Nanjing, 211189, P. R. China
- Haitao Zhao
- Materials Interfaces Center, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P. R. China
- Cheng Heng Pang
- Department of Chemical and Environmental Engineering, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
- Municipal Key Laboratory of Clean Energy Conversion Technologies, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
- Tao Wu
- Key Laboratory for Carbonaceous Wastes Processing and Process Intensification Research of Zhejiang Province, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
- New Materials Institute, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
- Shuzhou Li
- School of Materials Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Zongyou Yin
- Research School of Chemistry, The Australian National University, ACT, 2601, Australia
- Xue-Feng Yu
- Materials Interfaces Center, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P. R. China
50
Cao Y, Yang ZQ, Zhang XL, Fan W, Wang Y, Shen J, Wei DQ, Li Q, Wei XY. Identifying the kind behind SMILES-anatomical therapeutic chemical classification using structure-only representations. Brief Bioinform 2022; 23:bbac346. [PMID: 36027578 DOI: 10.1093/bib/bbac346] [Received: 05/06/2022] [Revised: 07/11/2022] [Accepted: 07/26/2022] [Indexed: 01/25/2023]
Abstract
Anatomical Therapeutic Chemical (ATC) classification of compounds/drugs plays an important role in drug development and basic research. However, previous methods depend on interactions extracted from the STITCH dataset, which may make them dependent on lab experiments. We present a pilot study exploring whether ATC prediction can be conducted solely on the basis of molecular structures. The motivation is to eliminate reliance on costly lab experiments so that the characteristics of a drug can be pre-assessed for better decision-making and effort-saving before actual development. To this end, we construct a new benchmark of 4545 compounds, larger in scale than the one used in the previous study. A lightweight prediction model is proposed. The model offers better explainability in that it consists of a straightforward tokenization that extracts and embeds statistically and physicochemically meaningful tokens, and a deep network backed by a set of pyramid kernels to capture multi-resolution chemical structural characteristics. Its efficacy has been validated in experiments in which it outperforms state-of-the-art methods by 15.53% in accuracy and by 69.66% in efficiency. We make the benchmark dataset, source code and web server openly available to ease reproduction of this study.
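The abstract credits "a set of pyramid kernels" with capturing multi-resolution structure but gives no details. The following is a minimal sketch under one plausible reading: parallel 1D convolutions with increasing kernel widths (assumed here to be 1, 3 and 5) slide over a sequence of token embeddings, each filter is global-max-pooled, and the pooled values are concatenated. The kernels are random stand-ins for learned weights, and the function names, widths and toy embeddings are all hypothetical. This is not the authors' architecture.

```python
import random

Vector = list[float]


def conv1d_max(seq: list[Vector], kernel: list[Vector]) -> float:
    """Slide one kernel of width k along the token sequence and return its maximum response."""
    k = len(kernel)
    if len(seq) < k:
        raise ValueError("sequence is shorter than the kernel width")
    responses = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Dot product of the window with the kernel, summed over positions and embedding dims.
        responses.append(sum(x * w for vec, wvec in zip(window, kernel) for x, w in zip(vec, wvec)))
    return max(responses)  # global max-pooling over positions


def pyramid_features(seq: list[Vector], widths=(1, 3, 5), filters_per_width=2, seed=0) -> list[float]:
    """Concatenate max-pooled responses from filters of several widths (hypothetical random kernels)."""
    rng = random.Random(seed)
    dim = len(seq[0])
    features = []
    for k in widths:
        for _ in range(filters_per_width):
            # In a trained model these weights would be learned; random values stand in here.
            kernel = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(k)]
            features.append(conv1d_max(seq, kernel))
    return features


# Hypothetical 6-token sequence with 4-dimensional embeddings.
rng = random.Random(42)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
feats = pyramid_features(tokens)
print(len(feats))  # 3 widths x 2 filters = 6 features
```

Narrow kernels respond to single tokens, while wider ones respond to short runs of adjacent tokens. Concatenating all of them yields a fixed-length vector regardless of sequence length, which a downstream classifier could consume.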
Affiliation(s)
- Yi Cao
- Department of Computer Science, Sichuan University, 610065, Chengdu, China
- Zhen-Qun Yang
- Department of Biomedical Engineering, Chinese University of Hong Kong, Shatin, Hong Kong
- Xu-Lu Zhang
- Department of Computer Science, Sichuan University, 610065, Chengdu, China
- Wenqi Fan
- Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
- Yaowei Wang
- Peng Cheng Laboratory, 518000, Shenzhen, China
- Dong-Qing Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Qing Li
- Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
- Xiao-Yong Wei
- Department of Computer Science, Sichuan University, 610065, Chengdu, China
- Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong