1
Orlov AA, Akhmetshin TN, Horvath D, Marcou G, Varnek A. From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization. Mol Inform 2024:e202400265. [PMID: 39633514 DOI: 10.1002/minf.202400265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Revised: 11/08/2024] [Accepted: 11/09/2024] [Indexed: 12/07/2024]
Abstract
Dimensionality reduction is an important exploratory data analysis method that allows high-dimensional data to be represented in a human-interpretable lower-dimensional space. It is extensively applied in the analysis of chemical libraries, where chemical structure data, represented as high-dimensional feature vectors, are transformed into 2D or 3D chemical space maps. In this paper, commonly used dimensionality reduction techniques, namely Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Generative Topographic Mapping (GTM), are evaluated in terms of neighborhood preservation and visualization capability of sets of small molecules from the ChEMBL database.
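As an illustration of what "neighborhood preservation" can mean, the sketch below computes one common variant: the average fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors after projection. The abstract does not state which metric the authors used, so the function names and the exact definition here are assumptions for illustration only.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by distance; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def neighborhood_preservation(high_dim, low_dim, k):
    """Average fraction of each point's k nearest neighbors retained in the projection."""
    high = knn_indices(high_dim, k)
    low = knn_indices(low_dim, k)
    return sum(len(h & l) / k for h, l in zip(high, low)) / len(high_dim)
```

A score of 1.0 means every point keeps all of its k nearest neighbors in the map; lower values indicate that the projection has rearranged local neighborhoods.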
Affiliation(s)
- Alexey A Orlov
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal Str., 67000, Strasbourg, France
- Tagir N Akhmetshin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal Str., 67000, Strasbourg, France
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal Str., 67000, Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal Str., 67000, Strasbourg, France
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal Str., 67000, Strasbourg, France
2
Plyer L, Marcou G, Perves C, Bonachera F, Varnek A. Implementation of a soft grading system for chemistry in a Moodle plugin: reaction handling. J Cheminform 2024; 16:90. [PMID: 39090756 PMCID: PMC11295431 DOI: 10.1186/s13321-024-00889-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 07/21/2024] [Indexed: 08/04/2024] Open
Abstract
Here, we present a new method for evaluating questions on chemical reactions in the context of remote education. This method can be used when binary grading is not sufficient as some tolerance may be acceptable. In order to determine a grade, the developed workflow uses the pairwise similarity assessment of two considered reactions, each encoded by a single molecular graph with the help of the Condensed Graph of Reaction (CGR) approach. This workflow is part of the ChemMoodle project and is implemented as a Moodle Plugin. It uses the Chemdoodle engine for reaction drawing and visualization and communicates with a REST server calculating the similarity score using ISIDA fragment descriptors. The plugin is open-source, accessible on GitHub (https://github.com/Laboratoire-de-Chemoinformatique/moodle-qtype_reacsimilarity) and on the Moodle plugin store (https://moodle.org/plugins/qtype_reacsimilarity?lang=en). Both similarity measures and fragmentation can be configured. Scientific contribution: This work introduces an open-source method for evaluating chemical reaction questions within Moodle using the CGR approach. Our contribution provides a nuanced grading mechanism that accommodates acceptable tolerances in reaction assessments, enhancing the accuracy and flexibility of the grading process.
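The idea of turning a reaction similarity score into a tolerant grade can be sketched as follows. This is a minimal illustration only: the Tanimoto formula on fragment counts is one standard similarity measure, while the fragment labels, the `soft_grade` function, its threshold, and the linear mapping are hypothetical and are not taken from the plugin, whose similarity measures and fragmentation are configurable.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fragment-count dictionaries."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    # Two empty descriptor sets are treated as identical.
    return dot / norm if norm else 1.0


def soft_grade(similarity, threshold=0.5, max_grade=1.0):
    """Hypothetical mapping: zero below the threshold, linear up to max_grade above it."""
    if similarity < threshold:
        return 0.0
    return max_grade * (similarity - threshold) / (1.0 - threshold)
```

With such a mapping, an answer that shares most fragments with the expected reaction earns partial credit instead of failing a binary check.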
Affiliation(s)
- Louis Plyer
- Faculté de Chimie, University of Strasbourg, Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics-UMR7140, University of Strasbourg, Strasbourg, France.
- Céline Perves
- Direction du Numérique (DNUM), University of Strasbourg, Strasbourg, France
- Fanny Bonachera
- Laboratory of Chemoinformatics-UMR7140, University of Strasbourg, Strasbourg, France
- Alexander Varnek
- Laboratory of Chemoinformatics-UMR7140, University of Strasbourg, Strasbourg, France
3
Shen L, Fang J, Liu L, Yang F, Jenkins JL, Kutchukian PS, Wang H. Pocket Crafter: a 3D generative modeling based workflow for the rapid generation of hit molecules in drug discovery. J Cheminform 2024; 16:33. [PMID: 38515171 PMCID: PMC10958880 DOI: 10.1186/s13321-024-00829-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 03/16/2024] [Indexed: 03/23/2024] Open
Abstract
We present a user-friendly molecular generative pipeline called Pocket Crafter, specifically designed to facilitate hit finding in the drug discovery process. This workflow uses a three-dimensional (3D) generative modeling method, Pocket2Mol, for the de novo design of molecules in spatial perspective for the targeted protein structures, followed by filters for chemical-physical properties and drug-likeness, structure-activity relationship analysis, and clustering to generate top virtual hit scaffolds. In our WDR5 case study, we acquired a focused set of 2029 compounds after a targeted search within the Novartis archived library based on the virtual scaffolds. Subsequently, we experimentally profiled these compounds, resulting in a novel chemical scaffold series that demonstrated activity in biochemical and biophysical assays. Pocket Crafter successfully prototyped an effective end-to-end 3D generative chemistry-based workflow for the exploration of new chemical scaffolds, which represents a promising approach in early drug discovery for hit identification.
Affiliation(s)
- Lingling Shen
- Novartis Biomedical Research, Cambridge, MA, 02139, USA.
- Jian Fang
- Novartis Biomedical Research, Cambridge, MA, 02139, USA
- Lulu Liu
- Novartis Biomedical Research, Cambridge, MA, 02139, USA
- Fei Yang
- Novartis Biomedical Research, Cambridge, MA, 02139, USA
- He Wang
- Novartis Biomedical Research, Cambridge, MA, 02139, USA.
4
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
5
Bajorath J. Chemical language models for molecular design. Mol Inform 2024; 43:e202300288. [PMID: 38010610 DOI: 10.1002/minf.202300288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role. Among others, emerging application areas for CLMs include constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be predicted from protein sequence motifs. Novel off-the-beaten-path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.
Affiliation(s)
- Jürgen Bajorath
- Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
- Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
6
Makarov DM, Lukanov MM, Rusanov AI, Mamardashvili NZ, Ksenofontov AA. Machine learning approach for predicting the yield of pyrroles and dipyrromethanes condensation reactions with aldehydes. Journal of Computational Science 2023; 74:102173. [DOI: 10.1016/j.jocs.2023.102173] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/16/2024]
7
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, because big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high-dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
8
Kraka E, Antonio JJ, Freindorf M. Reaction mechanism - explored with the unified reaction valley approach. Chem Commun (Camb) 2023; 59:7151-7165. [PMID: 37233449 DOI: 10.1039/d3cc01576a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
One of the ultimate goals of chemistry is to understand and manipulate chemical reactions, which implies the ability to monitor the reaction and its underlying mechanism at an atomic scale. In this article, we introduce the Unified Reaction Valley Approach (URVA) as a tool for elucidating reaction mechanisms, complementing existing computational procedures. URVA combines the concept of the potential energy surface with vibrational spectroscopy and describes a chemical reaction via the reaction path and the surrounding reaction valley traced out by the reacting species on the potential energy surface on their way from the entrance to the exit channel, where the products are located. The key feature of URVA is the focus on the curving of the reaction path. Moving along the reaction path, any electronic structure change of the reacting species is registered by a change in the normal vibrational modes spanning the reaction valley and their coupling with the path, which recovers the curvature of the reaction path. This leads to a unique curvature profile for each chemical reaction, with curvature minima reflecting minimal change and curvature maxima indicating the location of important chemical events such as bond breaking/formation, charge polarization and transfer, rehybridization, etc. A decomposition of the path curvature into internal coordinate components or other coordinates of relevance for the reaction under consideration, provides comprehensive insight into the origin of the chemical changes taking place. After giving an overview of current experimental and computational efforts to gain insight into the mechanism of a chemical reaction and presenting the theoretical background of URVA, we illustrate how URVA works for three diverse processes, (i) [1,3] hydrogen transfer reactions; (ii) α-keto-amino inhibitor for SARS-CoV-2 Mpro; (iii) Rh-catalyzed cyanation. 
We hope that this article will inspire our computational colleagues to add URVA to their repertoire and will serve as an incubator for new reaction mechanisms to be studied in collaboration with our experimental experts in the field.
Affiliation(s)
- Elfi Kraka
- Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave, Dallas, TX 75275-0314, USA.
- Juliana J Antonio
- Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave, Dallas, TX 75275-0314, USA.
- Marek Freindorf
- Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave, Dallas, TX 75275-0314, USA.
9
Pasquini M, Stenta M. LinChemIn: SynGraph-a data model and a toolkit to analyze and compare synthetic routes. J Cheminform 2023; 15:41. [PMID: 37005691 PMCID: PMC10067316 DOI: 10.1186/s13321-023-00714-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 03/20/2023] [Indexed: 04/04/2023] Open
Abstract
BACKGROUND The increasing amount of chemical reaction data makes traditional ways to navigate its corpus less effective, while the demand for novel approaches and instruments is rising. Recent data science and machine learning techniques support the development of new ways to extract value from the available reaction data. On the one side, Computer-Aided Synthesis Planning tools can predict synthetic routes in a model-driven approach; on the other side, experimental routes can be extracted from the Network of Organic Chemistry, in which reaction data are linked in a network. In this context, the need to combine, compare and analyze synthetic routes generated by different sources arises naturally. RESULTS Here we present LinChemIn, a python toolkit that allows chemoinformatics operations on synthetic routes and reaction networks. Wrapping some third-party packages for handling graph arithmetic and chemoinformatics and implementing new data models and functionalities, LinChemIn allows the interconversion between data formats and data models and enables route-level analysis and operations, including route comparison and descriptors calculation. Object-Oriented Design principles inspire the software architecture, and the modules are structured to maximize code reusability and support code testing and refactoring. The code structure should facilitate external contributions, thus encouraging open and collaborative software development. CONCLUSIONS The current version of LinChemIn allows users to combine synthetic routes generated from various tools and analyze them, and constitutes an open and extensible framework capable of incorporating contributions from the community and fostering scientific discussion. Our roadmap envisages the development of sophisticated metrics for routes evaluation, a multi-parameter scoring system, and the implementation of an entire "ecosystem" of functionalities operating on synthetic routes. 
LinChemIn is freely available at https://github.com/syngenta/linchemin.
Affiliation(s)
- Marta Pasquini
- Syngenta Crop Protection AG, Schaffhauserstrasse, 4332, Stein, AG, Switzerland.
- Marco Stenta
- Syngenta Crop Protection AG, Schaffhauserstrasse, 4332, Stein, AG, Switzerland
10
Jaume-Santero F, Bornet A, Valery A, Naderi N, Vicente Alvarez D, Proios D, Yazdani A, Bournez C, Fessard T, Teodoro D. Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios. J Chem Inf Model 2023; 63:1914-1924. [PMID: 36952584 PMCID: PMC10091402 DOI: 10.1021/acs.jcim.2c01407] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 03/25/2023]
Abstract
The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate the model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts and show a strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include a limited number of chemical reaction pathways.
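The two evaluation metrics contrasted in this abstract can be sketched as follows. This is a simplified illustration, not the authors' evaluation code: `retro_model` and `forward_model` are hypothetical callables standing in for trained models, and the round-trip check assumes the common definition in which predicted reactants are passed to a forward model and counted as correct when the original product is recovered.

```python
def top_k_accuracy(predictions, references, k):
    """Fraction of cases where the reference appears among the first k ranked predictions."""
    hits = sum(ref in preds[:k] for preds, ref in zip(predictions, references))
    return hits / len(references)


def round_trip_accuracy(products, retro_model, forward_model):
    """Fraction of products recovered after retrosynthesis followed by forward prediction."""
    hits = 0
    for product in products:
        reactants = retro_model(product)          # hypothetical retrosynthesis model
        if forward_model(reactants) == product:   # hypothetical forward-synthesis model
            hits += 1
    return hits / len(products)
```

Top-k accuracy penalizes any prediction absent from the reference database, whereas the round-trip check can accept a chemically plausible alternative route that the database happens not to contain.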
Affiliation(s)
- Fernando Jaume-Santero
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Alban Bornet
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Nona Naderi
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- David Vicente Alvarez
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Dimitrios Proios
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Anthony Yazdani
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Douglas Teodoro
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
11
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/01/2023] Open
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
12
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
13
Davies JC, Pattison D, Hirst JD. Machine learning for yield prediction for chemical reactions using in situ sensors. J Mol Graph Model 2023; 118:108356. [PMID: 36272195 DOI: 10.1016/j.jmgm.2022.108356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/30/2022] [Accepted: 09/30/2022] [Indexed: 11/28/2022]
Abstract
Machine learning models were developed to predict product formation from time-series reaction data for ten Buchwald-Hartwig coupling reactions. The data was provided by DeepMatter and was collected in their DigitalGlassware cloud platform. The reaction probe has 12 sensors to measure properties of interest, including temperature, pressure, and colour. Colour was a good predictor of product formation for this reaction and machine learning models were able to learn which of the properties were important. Predictions for the current product formation (in terms of % yield) had a mean absolute error of 1.2%. For predicting 30, 60 and 120 min ahead the error rose to 3.4, 4.1 and 4.6%, respectively. The work here presents an example of the insight that can be obtained from applying machine learning methods to sensor data in synthetic chemistry.
Affiliation(s)
- Joseph C Davies
- School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Jonathan D Hirst
- School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK.
14
Li S, Wang X, Wu Y, Duan H, Tang L. Generation of novel Diels-Alder reactions using a generative adversarial network. RSC Adv 2022; 12:33801-33807. [PMID: 36505715 PMCID: PMC9693912 DOI: 10.1039/d2ra06022a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022] Open
Abstract
Deep learning has enormous potential in the chemical and pharmaceutical fields, and generative adversarial networks (GANs) in particular have exhibited remarkable performance in the field of molecular generation as generative models. However, their application in the field of organic chemistry has been limited; thus, in this study, we attempt to utilize a GAN as a generative model for the generation of Diels-Alder reactions. A MaskGAN model was trained with 14 092 Diels-Alder reactions, and 1441 novel Diels-Alder reactions were generated. Analysis of the generated reactions indicated that the model learned several reaction rules in-depth. Thus, the MaskGAN model can be used to generate organic reactions and aid chemists in the exploration of novel reactions.
Affiliation(s)
- Sheng Li
- College of Pharmaceutical Sciences, Zhejiang University of Technology Hangzhou 310014 P. R. China
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology Hangzhou 310014 P. R. China
| | - Xinqiao Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology Hangzhou 310014 P. R. China
| | - Yejian Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology Hangzhou 310014 P. R. China
| | - Hongliang Duan
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology Hangzhou 310014 P. R. China
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences Shanghai 201203 China
| | - Lan Tang
- College of Pharmaceutical Sciences, Zhejiang University of Technology Hangzhou 310014 P. R. China
| |
|
15
|
Stan A, Esch BVD, Ochsenfeld C. Fully Automated Generation of Prebiotically Relevant Reaction Networks from Optimized Nanoreactor Simulations. J Chem Theory Comput 2022; 18:6700-6712. [PMID: 36270030 DOI: 10.1021/acs.jctc.2c00754] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]
Abstract
The nanoreactor approach first introduced by the group of Martínez [Wang et al. Nat. Chem. 2014, 6, 1044-1048] has recently attracted much attention because of its ability to accelerate the discovery of reaction pathways. Here, we provide a comprehensive study of various simulation parameters and present an alternative implementation for the reactivity-enhancing spherical constraint function, as well as for the detection of reaction events. In this context, a fully automated postsimulation evaluation procedure based on RDKit and NetworkX analysis is introduced. The chemical and physical robustness of the procedure is examined by investigating the reactivity of selected homogeneous systems. The optimized procedure is applied at the GFN2-xTB level of theory to a system composed of HCN molecules and argon atoms, acting as a buffer, yielding prebiotically plausible primary and secondary precursors for the synthesis of RNA. Furthermore, the formose reaction network is explored leading to numerous sugar precursors. The discovered compounds reflect experimental findings; however, new synthetic routes and a large collection of exotic, highly reactive molecules are observed, highlighting the predictive power of the nanoreactor approach for unraveling the reactive manifold.
Affiliation(s)
- Alexandra Stan
- Chair of Theoretical Chemistry, Department of Chemistry, University of Munich (LMU), Butenandtstr. 7, D-81377 München, Germany
| | - Beatriz von der Esch
- Chair of Theoretical Chemistry, Department of Chemistry, University of Munich (LMU), Butenandtstr. 7, D-81377 München, Germany
| | - Christian Ochsenfeld
- Chair of Theoretical Chemistry, Department of Chemistry, University of Munich (LMU), Butenandtstr. 7, D-81377 München, Germany; Max Planck Institute for Solid State Research, Heisenbergstr. 1, D-70569 Stuttgart, Germany
| |
|
16
|
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM, Lameiro RF, Lemm D, Lo A, Moosavi SM, Nápoles-Duarte JM, Nigam A, Pollice R, Rajan K, Schatzschneider U, Schwaller P, Skreta M, Smit B, Strieth-Kalthoff F, Sun C, Tom G, Falk von Rudorff G, Wang A, White AD, Young A, Yu R, Aspuru-Guzik A. SELFIES and the future of molecular string representations. Patterns (New York, N.Y.) 2022; 3:100588. [PMID: 36277819 PMCID: PMC9583042 DOI: 10.1016/j.patter.2022.100588] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 11/06/2022]
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded string (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
Affiliation(s)
- Mario Krenn
- Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
| | - Qianxiang Ai
- Department of Chemistry, Fordham University, The Bronx, NY, USA
| | - Senja Barthel
- Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Nessa Carson
- Syngenta Jealott’s Hill International Research Centre, Bracknell, Berkshire, UK
| | - Angelo Frei
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, UK
| | - Nathan C. Frey
- Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
| | - Théophile Gaudin
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- IBM Research Europe, Zürich, Switzerland
| | | | - Kevin Maik Jablonka
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Rafael F. Lameiro
- Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, São Paulo, Brazil
| | - Dominik Lemm
- Faculty of Physics, University of Vienna, Vienna, Austria
| | - Alston Lo
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Seyed Mohamad Moosavi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | | | - AkshatKumar Nigam
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Robert Pollice
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller Universität Jena, Jena, Germany
| | - Ulrich Schatzschneider
- Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | - Philippe Schwaller
- IBM Research Europe, Zürich, Switzerland
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Marta Skreta
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Berend Smit
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Felix Strieth-Kalthoff
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Chong Sun
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Gary Tom
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | | | - Andrew Wang
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Solar Fuels Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
| | - Adamo Young
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Rose Yu
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Materials Science, University of Toronto, Toronto, ON, Canada
- Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, Canada
| |
|
17
|
Xie Y, Zhang Y, Wong KC, Shi M, Peng C. Improving Chemical Reaction Prediction with Unlabeled Data. Molecules 2022; 27:5967. [PMID: 36144703 PMCID: PMC9506495 DOI: 10.3390/molecules27185967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Revised: 09/04/2022] [Accepted: 09/08/2022] [Indexed: 11/18/2022]
Abstract
Predicting products of organic chemical reactions is useful in chemical sciences, especially when one or more reactants are new organics. However, the performance of traditional learning models heavily relies on high-quality labeled data. In this work, to utilize unlabeled data for better prediction performance, we propose a method that combines semi-supervised learning with graph convolutional neural networks for chemical reaction prediction. First, we propose a Mean Teacher Weisfeiler–Lehman Network to find the reaction centers. Then, we construct the candidate product set. Finally, we use an Improved Weisfeiler–Lehman Difference Network to rank candidate products. Experimental results demonstrate that, with 400k labeled data, our framework can improve the top-5 accuracy by 0.7% using 35k unlabeled data. When the proportion of unlabeled data increases, the performance gain can be larger. For example, with 80k labeled data and 35k unlabeled data, the performance gain with our framework can be 1.8%.
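The abstract above names a Mean Teacher scheme without giving details. As a rough illustration of the general mean-teacher idea only (not the authors' network), the teacher's weights are usually kept as an exponential moving average of the student's weights, and unlabeled examples contribute a consistency penalty between the two models' outputs. The function names, the flat weight lists, and the `alpha` value below are assumptions made for this sketch.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the matching student weight.

    teacher, student: flat lists of floats (hypothetical stand-ins for model weights).
    alpha: smoothing factor; values near 1 make the teacher change slowly.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_out, teacher_out):
    """Mean squared difference between student and teacher predictions.

    This term needs no labels, which is how unlabeled data can be used.
    """
    n = len(student_out)
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / n


# Toy usage with made-up numbers.
new_teacher = ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.5)
penalty = consistency_loss([1.0, 0.0], [0.0, 0.0])
```

In a full training loop the student would be updated by gradient descent on the supervised loss plus this consistency term, and `ema_update` would be called after every step.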
Affiliation(s)
- Yu Xie
- College of Information Science and Engineering, Ningbo University, Ningbo 315211, China
| | - Yuyang Zhang
- College of Information Science and Engineering, Ningbo University, Ningbo 315211, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
| | - Meixia Shi
- College of Chemical Engineering, Ningbo Polytechnic, Ningbo 315000, China
| | - Chengbin Peng
- College of Information Science and Engineering, Ningbo University, Ningbo 315211, China
| |
|
18
|
Wang X, Yao C, Zhang Y, Yu J, Qiao H, Zhang C, Wu Y, Bai R, Duan H. From theory to experiment: transformer-based generation enables rapid discovery of novel reactions. J Cheminform 2022; 14:60. [PMID: 36056425 PMCID: PMC9438336 DOI: 10.1186/s13321-022-00638-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Accepted: 08/11/2022] [Indexed: 11/10/2022]
Abstract
Deep learning methods, such as reaction prediction and retrosynthesis analysis, have demonstrated their significance in the chemical field. However, the de novo generation of novel reactions using artificial intelligence technology requires further exploration. Inspired by molecular generation, we proposed a novel task of reaction generation. Herein, Heck reactions were used to train the transformer model, a state-of-the-art natural language processing model, to generate 4717 reactions after sampling and processing. Chemists then judged the generated reactions and confirmed 2253 of them as novel Heck reactions. More importantly, further organic synthesis experiments were performed to verify the accuracy and feasibility of representative reactions. The total process, from Heck reaction generation to experimental verification, required only 15 days, demonstrating that our model has learned reaction rules in depth and can contribute to novel reaction discovery and chemical space exploration.
Affiliation(s)
- Xinqiao Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Chuansheng Yao
- College of Pharmacy, School of Medicine, Hangzhou Normal University, Hangzhou, People's Republic of China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou, People's Republic of China
| | - Yun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Jiahui Yu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Haoran Qiao
- College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 201203, People's Republic of China
| | - Chengyun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Yejian Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
| | - Renren Bai
- College of Pharmacy, School of Medicine, Hangzhou Normal University, Hangzhou, People's Republic of China.
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou, People's Republic of China.
| | - Hongliang Duan
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China.
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, 201203, China.
| |
|
19
|
Park S, Han H, Kim H, Choi S. Machine Learning Applications for Chemical Reactions. Chem Asian J 2022; 17:e202200203. [PMID: 35471772 PMCID: PMC9401034 DOI: 10.1002/asia.202200203] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/28/2022] [Revised: 04/26/2022] [Indexed: 11/30/2022]
Abstract
Machine learning (ML) approaches have enabled rapid and efficient molecular property predictions as well as the design of novel materials. In addition to great success for molecular problems, ML techniques are applied to various chemical reaction problems that require huge costs to solve with the existing experimental and simulation methods. In this review, starting with basic representations of chemical reactions, we summarized recent achievements of ML studies on two different problems: predicting reaction properties and synthetic routes. Various ML models are used to predict physical properties related to chemical reactions (e.g. thermodynamic changes, activation barriers, and reaction rates). Furthermore, the prediction of reactivity, the self-optimization of reactions, and the design of retrosynthetic reaction paths are also tackled by ML approaches. Herein we illustrate various ML strategies utilized in the various contexts of chemical reaction studies.
Affiliation(s)
- Sanggil Park
- Department of Chemistry, Incheon National University and Research Institute of Basic Sciences, Incheon 22012, Republic of Korea
| | - Herim Han
- Digital Bio R&D Center, Mediazen, Seoul 07789, Republic of Korea
- Department of Polymer Science and Engineering, Dankook University, Yongin, Gyeonggi 16890, Republic of Korea
| | - Hyungjun Kim
- Department of Chemistry, Incheon National University and Research Institute of Basic Sciences, Incheon 22012, Republic of Korea
| | - Sunghwan Choi
- Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea
| |
|
20
|
Nugmanov R, Dyubankova N, Gedich A, Wegner JK. Bidirectional Graphormer for Reactivity Understanding: Neural Network Trained to Reaction Atom-to-Atom Mapping Task. J Chem Inf Model 2022; 62:3307-3315. [DOI: 10.1021/acs.jcim.2c00344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Ramil Nugmanov
- Janssen Research & Development, Janssen Pharmaceutica N.V., Turnhoutseweg 30, Beerse B-2340, Belgium
| | - Natalia Dyubankova
- Janssen Research & Development, Janssen Pharmaceutica N.V., Turnhoutseweg 30, Beerse B-2340, Belgium
| | - Andrey Gedich
- Arcadia Inc., 28 k2, Bolshoy Sampsonievskiy pr., St. Petersburg 194044, Russia
| | - Joerg Kurt Wegner
- Janssen Research & Development, LLC, 255 Main St, Cambridge, Massachusetts 02142, United States
| |
|
21
|
Lustosa DM, Milo A. Mechanistic Inference from Statistical Models at Different Data-Size Regimes. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Danilo M. Lustosa
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| | - Anat Milo
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| |
|
22
|
Venkatasubramanian V, Mann V. Artificial intelligence in reaction prediction and chemical synthesis. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100749] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
|
23
|
Su A, Wang X, Wang L, Zhang C, Wu Y, Wu X, Zhao Q, Duan H. Reproducing the invention of a named reaction: zero-shot prediction of unseen chemical reactions. Phys Chem Chem Phys 2022; 24:10280-10291. [PMID: 35437562 DOI: 10.1039/d1cp05878a] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
Abstract
While state-of-the-art models can predict reactions through the transfer learning of thousands of samples with the same reaction types as those of the reactions to predict, how to prepare such models to predict "unseen" reactions remains an unanswered question. We aimed to study the Transformer model's ability to predict "unseen" reactions through "zero-shot reaction prediction (ZSRP)", a concept derived from zero-shot learning and zero-shot translation. We reproduced the human invention of the Chan-Lam coupling reaction where the inventor was inspired by the Suzuki reaction when improving Barton's bismuth arylation reaction. After being fine-tuned with samples from these two "existing" reactions, the USPTO-trained Transformer could predict "unseen" Chan-Lam coupling reactions with 55.7% top-1 accuracy. Our model could also mimic the later stage of the history of this reaction, where the initial case of this reaction was generalized to more reactants and reagents via "one-shot/few-shot reaction prediction (OSRP/FSRP)" approaches.
Affiliation(s)
- An Su
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, P. R. China
| | - Xinqiao Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China.
| | - Ling Wang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China.
| | - Chengyun Zhang
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China.
| | - Yejian Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China.
| | - Xinyi Wu
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China.
| | - Qingjie Zhao
- Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, P. R. China
| | - Hongliang Duan
- Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, P. R. China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai 201203, China
| |
|
24
|
Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. Methods in Molecular Biology (Clifton, N.J.) 2021; 2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 10/19/2022]
Abstract
Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of the application of AI therein and opportunities for improvement. Furthermore, we discuss the broader challenges imposed by AI in translating theoretical practice to real-world drug design, including quantifying prediction uncertainty and explaining model behavior.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Andrew Boardman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Miguel Garcia-Ortegon
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK; Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
| | - Hongbin Yang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | | | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.
| |
|
25
|
Andronov M, Fedorov MV, Sosnin S. Exploring Chemical Reaction Space with Reaction Difference Fingerprints and Parametric t-SNE. ACS Omega 2021; 6:30743-30751. [PMID: 34805702 PMCID: PMC8600617 DOI: 10.1021/acsomega.1c04778] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/31/2021] [Accepted: 10/18/2021] [Indexed: 06/13/2023]
Abstract
Humans prefer visual representations for the analysis of large databases. In this work, we suggest a method for the visualization of the chemical reaction space. Our technique uses the t-SNE approach that is parameterized using a deep neural network (parametric t-SNE). We demonstrated that the parametric t-SNE combined with reaction difference fingerprints could provide a tool for the projection of chemical reactions on a low-dimensional manifold for easy exploration of reaction space. We showed that the global reaction landscape projected on a 2D plane corresponds well with the already known reaction types. The application of a pretrained parametric t-SNE model to new reactions allows chemists to study these reactions in a global reaction space. We validated the feasibility of this approach for two commercial drugs, darunavir and montelukast. We believe that our method can help to explore reaction space and will inspire chemists to find new reactions and synthetic ways.
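The abstract above does not spell out how a reaction difference fingerprint is built. In general, such a fingerprint is obtained by subtracting the summed reactant fingerprints from the summed product fingerprints, so that unchanged parts of the molecules cancel and only the transformation remains. The sketch below shows that arithmetic only. Its `toy_fingerprint` function (character bigrams of a SMILES string) is a hypothetical stand-in chosen to keep the example self-contained; a real workflow would use structural fingerprints from a cheminformatics toolkit.

```python
from collections import Counter


def toy_fingerprint(smiles):
    """Count character bigrams in a SMILES string.

    Hypothetical stand-in for a real count-based structural fingerprint.
    """
    return Counter(smiles[i:i + 2] for i in range(len(smiles) - 1))


def reaction_difference_fingerprint(reactants, products):
    """Return product feature counts minus reactant feature counts.

    Features with a net count of zero are dropped, so the result
    describes only what the reaction changed.
    """
    total = Counter()
    for smi in products:
        total.update(toy_fingerprint(smi))
    for smi in reactants:
        total.subtract(toy_fingerprint(smi))
    return {feature: count for feature, count in total.items() if count != 0}


# Acetic acid + methanol -> methyl acetate + water (illustrative SMILES).
diff = reaction_difference_fingerprint(["CC(=O)O", "CO"], ["CC(=O)OC", "O"])
print(diff)
```

Vectors of this kind, one per reaction, would then be the high-dimensional input that a method such as parametric t-SNE projects onto a 2D map.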
Affiliation(s)
- Mikhail Andronov
- Faculty of Fundamental Physical and Chemical Engineering, Lomonosov Moscow State University, Leninskie gory, 1, Moscow 119991, Russian Federation
| | - Maxim V. Fedorov
- Sirius University of Science and Technology, Olimpiysky Ave. b.1, Sochi 354000, Russian Federation
- Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russian Federation
- Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russian Federation
| | - Sergey Sosnin
- Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russian Federation
- Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russian Federation
| |
|
26
|
Machine learning modelling of chemical reaction characteristics: yesterday, today, tomorrow. Mendeleev Communications 2021. [DOI: 10.1016/j.mencom.2021.11.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
|
27
|
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 242] [Impact Index Per Article: 60.5] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
- Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Valentin Vassilev-Galindo
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Bingqing Cheng
- Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
| | - Stefan Chmiela
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
| | - Michael Gastegger
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
- Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
- Google Research, Brain Team, 10117 Berlin, Germany
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| |
|
28
|
Gimadiev TR, Lin A, Afonina VA, Batyrshin D, Nugmanov RI, Akhmetshin T, Sidorov P, Duybankova N, Verhoeven J, Wegner J, Ceulemans H, Gedich A, Madzhidov TI, Varnek A. Reaction Data Curation I: Chemical Structures and Transformations Standardization. Mol Inform 2021; 40:e2100119. [PMID: 34427989 DOI: 10.1002/minf.202100119] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/04/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]
Abstract
The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its implementation in Python3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).
Affiliation(s)
- Timur R Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
- Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
- Valentina A Afonina
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Dinar Batyrshin
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Ramil I Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Tagir Akhmetshin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France; Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
- Jonas Verhoeven
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Joerg Wegner
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Hugo Ceulemans
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Andrey Gedich
- Arcadia Inc., Bol'shoy Sampsoniyevskiy Prospekt, 28 korpus 2, 194044, St Petersburg, Russia
- Timur I Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan; Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
29
Mahendran D, Gurdin G, Lewinski N, Tang C, McInnes BT. Identifying Chemical Reactions and Their Associated Attributes in Patents. Front Res Metr Anal 2021; 6:688353. [PMID: 34322654 PMCID: PMC8312343 DOI: 10.3389/frma.2021.688353] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/30/2021] [Accepted: 05/31/2021] [Indexed: 11/13/2022] Open
Abstract
Chemical patents are an essential source of information about novel chemicals and chemical reactions. However, with the increasing volume of such patents, mining information about these chemicals and chemical reactions has become a time-intensive and laborious endeavor. In this study, we present a system to extract chemical reaction events from patents automatically. Our approach consists of two steps: 1) named entity recognition (NER), the automatic identification of chemical reaction parameters from the corresponding text, and 2) event extraction (EE), the automatic classification and linking of entities based on their relationships to each other. For our NER system, we evaluate bidirectional long short-term memory (BiLSTM)-based and bidirectional encoder representations from transformers (BERT)-based methods. For our EE system, we evaluate BERT-based, convolutional neural network (CNN)-based, and rule-based methods. We evaluate our NER and EE components independently and as an end-to-end system, reporting the precision, recall, and F1 score. Our results show that the BiLSTM-based method performed best at identifying the entities, and the CNN-based method performed best at extracting events.
Affiliation(s)
- Darshini Mahendran
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Gabrielle Gurdin
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Nastassja Lewinski
- Department of Life Science and Chemical Engineering, Virginia Commonwealth University, Richmond, VA, United States
- Christina Tang
- Department of Life Science and Chemical Engineering, Virginia Commonwealth University, Richmond, VA, United States
- Bridget T McInnes
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
30
Bajorath J. State-of-the-art of artificial intelligence in medicinal chemistry. Future Sci OA 2021; 7:FSO702. [PMID: 34046204 PMCID: PMC8147736 DOI: 10.2144/fsoa-2021-0030] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/27/2021] [Accepted: 03/10/2021] [Indexed: 12/22/2022] Open
Affiliation(s)
- Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology & Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, Bonn D, 53115, Germany