1
Chen LY, Li YP. AutoTemplate: enhancing chemical reaction datasets for machine learning applications in organic chemistry. J Cheminform 2024; 16:74. [PMID: 38937840 PMCID: PMC11212196 DOI: 10.1186/s13321-024-00869-2] [Received: 02/02/2024] [Accepted: 06/09/2024] [Indexed: 06/29/2024]
Abstract
This paper presents AutoTemplate, an innovative data preprocessing protocol, addressing the crucial need for high-quality chemical reaction datasets in the realm of machine learning applications in organic chemistry. Recent advances in artificial intelligence have expanded the application of machine learning in chemistry, particularly in yield prediction, retrosynthesis, and reaction condition prediction. However, the effectiveness of these models hinges on the integrity of chemical reaction datasets, which are often plagued by inconsistencies like missing reactants, incorrect atom mappings, and outright erroneous reactions. AutoTemplate introduces a two-stage approach to refine these datasets. The first stage involves extracting meaningful reaction transformation rules and formulating generic reaction templates using a simplified SMARTS representation. This simplification broadens the applicability of templates across various chemical reactions. The second stage is template-guided reaction curation, where these templates are systematically applied to validate and correct the reaction data. This process effectively amends missing reactant information, rectifies atom-mapping errors, and eliminates incorrect data entries. A standout feature of AutoTemplate is its capability to concurrently identify and correct false chemical reactions. It operates on the premise that most reactions in datasets are accurate, using these as templates to guide the correction of flawed entries. The protocol demonstrates its efficacy across a range of chemical reactions, significantly enhancing dataset quality. This advancement provides a more robust foundation for developing reliable machine learning models in chemistry, thereby improving the accuracy of forward and retrosynthetic predictions. 
AutoTemplate marks a significant progression in the preprocessing of chemical reaction datasets, bridging a vital gap and facilitating more precise and efficient machine learning applications in organic synthesis.
SCIENTIFIC CONTRIBUTION: The proposed automated preprocessing tool for chemical reaction data aims to identify errors within chemical databases. Specifically, if the errors involve atom mapping or the absence of reactant types, corrections can be systematically applied using reaction templates, ultimately elevating the overall quality of the database.
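AutoTemplate's curation stage rests on the premise that most reactions in a dataset are correct. The sketch below is only a toy illustration of that premise, not the authors' implementation: it counts how many reactions share each extracted template and flags reactions whose template has too little support. The template strings, the `flag_low_support` helper and the `min_support` threshold are hypothetical; AutoTemplate itself applies the templates to correct entries, which is not shown here.

```python
from collections import Counter


def flag_low_support(reactions, min_support=2):
    """Split reactions into trusted and flagged by how often their template occurs.

    `reactions` is a list of (reaction_id, template) pairs. The template is assumed
    to have been extracted beforehand; here it is only used as a dictionary key.
    A template seen at least `min_support` times is treated as trusted, on the
    premise that most entries in the dataset are correct.
    """
    support = Counter(template for _, template in reactions)
    trusted, flagged = [], []
    for rxn_id, template in reactions:
        if support[template] >= min_support:
            trusted.append(rxn_id)
        else:
            flagged.append(rxn_id)
    return trusted, flagged


# Toy template strings: opaque keys here, never parsed as SMARTS.
reactions = [
    ("r1", "[C:1](=[O:2])[OH]>>[C:1](=[O:2])N"),
    ("r2", "[C:1](=[O:2])[OH]>>[C:1](=[O:2])N"),
    ("r3", "[C:1](=[O:2])[OH]>>[C:1](=[O:2])N"),
    ("r4", "[c:1][Br]>>[c:1]B(O)O"),
]
trusted, flagged = flag_low_support(reactions, min_support=2)
print(trusted, flagged)  # ['r1', 'r2', 'r3'] ['r4']
```

A frequency threshold is the simplest way to express "trust the majority"; a real curation pipeline would also have to decide what to do with flagged entries (correct, re-map, or discard).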
Affiliation(s)
- Lung-Yi Chen
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan
- Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan.
- Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei, 11529, Taiwan.

2
Srinivasan K, Puliyanda A, Prasad V. Identification of Reaction Network Hypotheses for Complex Feedstocks from Spectroscopic Measurements with Minimal Human Intervention. J Phys Chem A 2024; 128:4714-4729. [PMID: 38836378 DOI: 10.1021/acs.jpca.4c01592] [Indexed: 06/06/2024]
Abstract
In this work, we detail an automated reaction network hypothesis generation protocol for processes involving complex feedstocks where information about the species and reactions involved is unknown. Our methodology is process agnostic and can be utilized in any reactive process with spectroscopic measurements that provide information on the evolution of the components in the mixture. We decompose the mixture spectra to obtain spectroscopic signatures of the individual components and use a 1-D convolutional neural network to automatically identify functional groups indicated by them. We employ atom-atom mapping to automatically recover reaction rules that are applied to candidate molecules identified from chemistry databases through fingerprint similarity. The method is tested on synthetic data and on spectroscopic measurements of lab-scale batch hydrothermal liquefaction (HTL) of biomass to determine the accuracy of prediction across datasets of varying complexities. Our methodology is able to identify reaction network hypotheses containing reaction networks close to the ground truth in the case of synthetic data, and we are also able to recover candidate molecules and reaction networks close to those reported in previous literature on biomass pyrolysis.
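The abstract mentions retrieving candidate molecules from chemistry databases through fingerprint similarity. A minimal sketch of such a retrieval step, assuming Tanimoto similarity (a common choice; the abstract does not name the metric) and fingerprints already computed and stored as sets of on-bit indices. The bit sets, candidate names and helper functions are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_candidates(query_fp, database, top_k=3):
    """Return the top_k (name, score) pairs from `database`, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical fingerprints (on-bit indices) for a query and three database entries.
query = {1, 4, 7, 9}
database = {
    "cand_A": {1, 4, 7, 9},
    "cand_B": {1, 4, 8},
    "cand_C": {2, 3, 5},
}
print(rank_candidates(query, database, top_k=2))  # [('cand_A', 1.0), ('cand_B', 0.4)]
```

In practice the bit sets would come from a cheminformatics toolkit (for example Morgan/ECFP fingerprints), and the top-ranked candidates would then be passed to the reaction rules recovered by atom-atom mapping.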
Affiliation(s)
- Karthik Srinivasan
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211 116 St NW, Edmonton T6G 1H9, AB, Canada
- Anjana Puliyanda
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211 116 St NW, Edmonton T6G 1H9, AB, Canada
- Vinay Prasad
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211 116 St NW, Edmonton T6G 1H9, AB, Canada

3
Gosavi AA, Nandgude TD, Mishra RK, Puri DB. Exploring the Potential of Artificial Intelligence as a Facilitating Tool for Formulation Development in Fluidized Bed Processor: a Comprehensive Review. AAPS PharmSciTech 2024; 25:111. [PMID: 38740666 DOI: 10.1208/s12249-024-02816-8] [Received: 01/23/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
This in-depth study examines how artificial intelligence (AI) could be used to facilitate formulation development in fluidized bed processes (FBP). FBP is complex and involves numerous variables, making optimization challenging. Various AI techniques, including machine learning, neural networks, genetic algorithms, and fuzzy logic, have been applied to this challenge. By integrating AI with experimental design, process modeling, and optimization strategies, intelligent systems for FBP can be developed. The advantages of AI in this context include improved process understanding, reduced time and cost, enhanced product quality, and robust formulation optimization. However, challenges in data availability, model interpretability, and regulatory compliance must be addressed. Case studies demonstrate successful applications of AI in decision-making, process outcome prediction, and scale-up. AI can significantly improve efficiency, quality, and cost-effectiveness, but data quality, model interpretability, and regulatory compliance require careful consideration. Future research should focus on fully harnessing the potential of AI to advance formulation development in FBP.
Affiliation(s)
- Aachal A Gosavi
- Department of Pharmaceutics, Dr. D. Y. Patil Institute of Pharmaceutical Sciences and Research, Pimpri, Pune, India
- Tanaji D Nandgude
- Department of Pharmaceutics, JSPM University's School of Pharmaceutical Sciences, Wagholi, Pune, India
- Rakesh K Mishra
- Department of Pharmaceutics, Dr. D. Y. Patil Institute of Pharmaceutical Sciences and Research, Pimpri, Pune, India.
- Dhiraj B Puri
- Department of Mechanical Engineering, Birla Institute of Technology and Science-Pilani, K K Birla Goa Campus, Zuarinagar, Sancoale, Goa, India

4
Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices. J Cheminform 2024; 16:37. [PMID: 38553720 PMCID: PMC10980627 DOI: 10.1186/s13321-024-00834-z] [Received: 11/21/2023] [Accepted: 03/23/2024] [Indexed: 04/02/2024]
Abstract
The challenge of devising pathways for organic synthesis remains a central issue in the field of medicinal chemistry. Over the span of six decades, computer-aided synthesis planning has given rise to a plethora of potent tools for formulating synthetic routes. Nevertheless, a significant expert task still looms: determining the appropriate solvent, catalyst, and reagents when provided with a set of reactants to achieve and optimize the desired product for a specific step in the synthesis process. Typically, chemists identify key functional groups and rings that exert crucial influences at the reaction center, classify reactions into categories, and may assign them names. This research introduces Rxn-INSIGHT, an open-source algorithm based on the bond-electron matrix approach, with the purpose of automating this endeavor. Rxn-INSIGHT not only streamlines the process but also facilitates extensive querying of reaction databases, effectively replicating the thought processes of an organic chemist. The core functions of the algorithm encompass the classification and naming of reactions, extraction of functional groups, rings, and scaffolds from the involved chemical entities. The provision of reaction condition recommendations based on the similarity and prevalence of reactions eventually arises as a side application. The performance of our rule-based model has been rigorously assessed against a carefully curated benchmark dataset, exhibiting an accuracy rate exceeding 90% in reaction classification and surpassing 95% in reaction naming. Notably, it has been discerned that a pivotal factor in selecting analogous reactions lies in the analysis of ring structures participating in the reactions. An examination of ring structures within the USPTO chemical reaction database reveals that with just 35 unique rings, a remarkable 75% of all rings found in nearly 1 million products can be encompassed. 
Furthermore, Rxn-INSIGHT is proficient in suggesting appropriate choices for solvents, catalysts, and reagents in entirely novel reactions, all within the span of a second, utilizing nothing more than an everyday laptop.
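Rxn-INSIGHT is built on the bond-electron (BE) matrix approach. In the classic Dugundji–Ugi formulation, a reaction is described by a reaction matrix R = E − B, where B and E are the BE matrices of the reactants and products written over the same atom ordering (which is why an atom mapping is needed). The sketch below shows that bookkeeping for H2 + Cl2 → 2 HCl in plain Python; it is not Rxn-INSIGHT's code, the helper names are hypothetical, and real tools derive the matrices from atom-mapped reaction SMILES rather than by hand.

```python
# Atom order shared by both matrices: H1, H2, Cl1, Cl2
ATOMS = ["H1", "H2", "Cl1", "Cl2"]


def be_matrix(n_atoms, bonds, lone_electrons):
    """Build a symmetric bond-electron (BE) matrix.

    Off-diagonal entry [i][j] is the formal bond order between atoms i and j;
    diagonal entry [i][i] is the number of non-bonding valence electrons on atom i.
    """
    m = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        m[i][j] = m[j][i] = order
    for i, electrons in lone_electrons.items():
        m[i][i] = electrons
    return m


def reaction_matrix(b, e):
    """R = E - B: positive entries are bonds/electrons gained, negative ones are lost."""
    return [[e_ij - b_ij for e_ij, b_ij in zip(e_row, b_row)] for e_row, b_row in zip(e, b)]


def valence_electrons(m):
    """Sum of all entries: each bond appears twice, so it contributes 2 electrons per bond order."""
    return sum(sum(row) for row in m)


# Reactants: H1-H2 and Cl1-Cl2 single bonds; each Cl keeps 6 non-bonding electrons.
B = be_matrix(4, [(0, 1, 1), (2, 3, 1)], {2: 6, 3: 6})
# Products: H1-Cl1 and H2-Cl2 single bonds.
E = be_matrix(4, [(0, 2, 1), (1, 3, 1)], {2: 6, 3: 6})
R = reaction_matrix(B, E)

for name, row in zip(ATOMS, R):
    print(f"{name:>3}", row)
print(valence_electrons(B), valence_electrons(E), valence_electrons(R))  # 16 16 0
```

The entries of R summing to zero expresses conservation of valence electrons; the non-zero off-diagonal entries of R mark the reaction center (bonds broken and formed), which is the information a rule-based classifier can inspect.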
Affiliation(s)
- Maarten R Dobbelaere
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- István Lengyel
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- ChemInsights LLC, Dover, DE, 19901, USA
- Christian V Stevens
- SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Kevin M Van Geem
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium.

5
Chen Z, Zhou R, Ren P. Spectraformer: deep learning model for grain spectral qualitative analysis based on transformer structure. RSC Adv 2024; 14:8053-8066. [PMID: 38454940 PMCID: PMC10918770 DOI: 10.1039/d3ra07708j] [Received: 11/19/2023] [Accepted: 02/08/2024] [Indexed: 03/09/2024]
Abstract
This study delves into the use of compact near-infrared spectroscopy instruments for distinguishing between different varieties of barley, chickpeas, and sorghum, addressing a vital need in agriculture for precise crop variety identification. This identification is crucial for optimizing crop performance in diverse environmental conditions and enhancing food security and agricultural productivity. We also explore the potential application of transformer models in near-infrared spectroscopy and conduct an in-depth evaluation of the impact of data preprocessing and machine learning algorithms on variety classification. In our proposed spectraformer multi-classification model, we successfully differentiated 24 barley varieties, 19 chickpea varieties, and ten sorghum varieties, with the highest accuracy rates reaching 85%, 95%, and 86%, respectively. These results not only demonstrate that small near-infrared spectroscopy instruments are cost-effective and efficient tools with the potential to advance research in various identification methods, but also underscore the value of transformer models in near-infrared spectroscopy classification. Furthermore, we discuss the influence of data preprocessing on the performance of deep learning models compared to traditional machine learning models, providing valuable insights for future research in this field.
Affiliation(s)
- Zhuo Chen
- School of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Research Center of Intelligent Information Processing and Quantum Intelligent Computing, Shanghai 201306, China
- Rigui Zhou
- School of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Research Center of Intelligent Information Processing and Quantum Intelligent Computing, Shanghai 201306, China
- Pengju Ren
- School of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Research Center of Intelligent Information Processing and Quantum Intelligent Computing, Shanghai 201306, China

6
Chen Z, Ayinde OR, Fuchs JR, Sun H, Ning X. G2Retro as a two-step graph generative models for retrosynthesis prediction. Commun Chem 2023; 6:102. [PMID: 37253928 DOI: 10.1038/s42004-023-00897-3] [Received: 07/06/2022] [Accepted: 05/04/2023] [Indexed: 06/01/2023]
Abstract
Retrosynthesis is a procedure in which a target molecule is transformed into potential reactants so that synthesis routes can be identified. Recently, computational approaches have been developed to accelerate the design of synthesis routes. In this paper, we develop a generative framework G2Retro for one-step retrosynthesis prediction. G2Retro imitates the reversed logic of synthetic reactions. It first predicts the reaction centers in the target molecules (products), identifies the synthons needed to assemble the products, and transforms these synthons into reactants. G2Retro defines a comprehensive set of reaction center types, and learns from the molecular graphs of the products to predict potential reaction centers. To complete synthons into reactants, G2Retro considers all the involved synthon structures and the product structures to identify the optimal completion paths, and accordingly attaches small substructures sequentially to the synthons. Here we show that G2Retro predicts the reactants for given products in the benchmark dataset better than state-of-the-art methods.
Affiliation(s)
- Ziqi Chen
- Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
- Oluwatosin R Ayinde
- Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, OH, 43210, USA
- James R Fuchs
- Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, OH, 43210, USA
- Huan Sun
- Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH, 43210, USA
- Xia Ning
- Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA.
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH, 43210, USA.
- Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.

7
A Review on Artificial Intelligence Enabled Design, Synthesis, and Process Optimization of Chemical Products for Industry 4.0. Processes (Basel) 2023. [DOI: 10.3390/pr11020330] [Indexed: 01/20/2023]
Abstract
With the development of Industry 4.0, artificial intelligence (AI) is gaining increasing attention for its performance in solving particularly complex problems in industrial chemistry and chemical engineering. Therefore, this review provides an overview of the application of AI techniques, in particular machine learning, in chemical design, synthesis, and process optimization over the past years. In this review, the focus is on the application of AI for structure-function relationship analysis, synthetic route planning, and automated synthesis. Finally, we discuss the challenges and future of AI in making chemical products.

8
Wu Z, Cai X, Zhang C, Qiao H, Wu Y, Zhang Y, Wang X, Xie H, Luo F, Duan H. Self-Supervised Molecular Pretraining Strategy for Low-Resource Reaction Prediction Scenarios. J Chem Inf Model 2022; 62:4579-4590. [PMID: 36129104 DOI: 10.1021/acs.jcim.2c00588] [Indexed: 11/30/2022]
Abstract
In the face of low-resource reaction training samples, we construct a chemical platform for addressing small-scale reaction prediction problems. Using a self-supervised pretraining strategy called MAsked Sequence to Sequence (MASS), the Transformer model can absorb the chemical information of about 1 billion molecules and then be fine-tuned on small-scale reaction prediction tasks. To further strengthen the predictive performance of our model, we combine MASS with the reaction transfer learning strategy. Here, we show that the average improved accuracies of the Transformer model can reach 14.07, 24.26, 40.31, and 57.69% in predicting the Baeyer-Villiger, Heck, C-C bond formation, and functional group interconversion reaction data sets, respectively, marking an important step toward low-resource reaction prediction.
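MASS pretraining masks one contiguous fragment of an input sequence and trains the decoder to regenerate that fragment from the encoder's view of the rest. A minimal sketch of the masking step applied to a tokenized SMILES string follows. The regex tokenizer (a widely used one for reaction transformers), the 50% span ratio and the helper names are assumptions for illustration, not details taken from the paper; the Transformer itself is not shown.

```python
import random
import re

# Regex SMILES tokenizer (assumed here; the paper's own tokenization may differ).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
MASK = "[MASK]"


def tokenize(smiles):
    """Split a SMILES string into tokens, refusing inputs the regex cannot fully cover."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles}")
    return tokens


def mass_mask(tokens, ratio=0.5, rng=random):
    """MASS-style masking: hide one contiguous span of about ratio * len(tokens) tokens.

    Returns (encoder_input, decoder_target, start): the encoder sees the sequence with
    the span replaced by MASK tokens, and the decoder is trained to reconstruct the span.
    """
    k = max(1, round(len(tokens) * ratio))
    start = rng.randint(0, len(tokens) - k)  # inclusive bounds, so the span always fits
    span = tokens[start:start + k]
    encoder_input = tokens[:start] + [MASK] * k + tokens[start + k:]
    return encoder_input, span, start


tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, 21 tokens
enc, target, start = mass_mask(tokens, ratio=0.5, rng=random.Random(0))
print(" ".join(enc))
print(" ".join(target))
```

Masking a contiguous span (rather than scattered single tokens) forces the decoder to generate a coherent molecular fragment, which is the property that makes sequence-to-sequence pretraining a natural fit for reaction prediction.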
Affiliation(s)
- Zhipeng Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Xiang Cai
- PyWise Biotech, Suzhou 215000, P. R. China
- Chengyun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Haoran Qiao
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai 201203, P. R. China
- Yejian Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Yun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Xinqiao Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China
- Haiying Xie
- PUROTON Gene Medical Institute Co., Ltd., Chongqing 400700, P. R. China
- Feng Luo
- PUROTON Gene Medical Institute Co., Ltd., Chongqing 400700, P. R. China
- Hongliang Duan
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China

9
Zhu LT, Chen XZ, Ouyang B, Yan WC, Lei H, Chen Z, Luo ZH. Review of Machine Learning for Hydrodynamics, Transport, and Reactions in Multiphase Flows and Reactors. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c01036] [Indexed: 11/30/2022]
Affiliation(s)
- Li-Tao Zhu
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Xi-Zhong Chen
- Department of Chemical and Biological Engineering, University of Sheffield, Sheffield, S1 3JD, U.K.
- Bo Ouyang
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Wei-Cheng Yan
- School of Chemistry and Chemical Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- He Lei
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Zhe Chen
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Zheng-Hong Luo
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China

10
Challenges and Opportunities in Carbon Capture, Utilization and Storage: A Process Systems Engineering Perspective. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2022.107925] [Indexed: 02/01/2023]

11
Su A, Cheng Y, Xue H, She Y, Rajan K. Artificial intelligence informed toxicity screening of amine chemistries used in the synthesis of hybrid organic–inorganic perovskites. AIChE J 2022. [DOI: 10.1002/aic.17699] [Indexed: 11/12/2022]
Affiliation(s)
- An Su
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York, USA
- Yingying Cheng
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- Haotian Xue
- Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, China
- Yuanbin She
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- Krishna Rajan
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York, USA

12