1
Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024; 45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]
Abstract
Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized; a small fraction found immediate applications, while a larger proportion served as a testimony to the creative and empirical nature of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules be synthesized through fewer empirical attempts, rather than through a larger library, to realize an active candidate. On this front, predictive endeavors using machine learning (ML) models built on available data are timely and significant. Prediction of molecular properties and reaction outcomes remains one of the burgeoning applications of ML in chemical science. Among several methods of encoding molecular samples for ML models, those that employ language-like representations are gaining steady popularity. Such representations additionally help adopt well-developed natural language processing (NLP) models for chemical applications. Against this background, we describe several successful chemical applications of NLP, focusing on molecular property and reaction outcome predictions. From relatively simple recurrent neural networks (RNNs) to complex models such as transformers, different network architectures have been leveraged for tasks such as de novo drug design, catalyst generation, and forward and retrosynthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time- and cost-effective manner. While we present an optimistic outlook for CLMs, attention is also given to the persisting challenges in the reaction domain, which may be addressed by advanced algorithms tailored to chemical language and by increased availability of high-quality datasets.
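To make the idea of a language-like molecular representation concrete, the following is a minimal sketch of how a molecule written as a SMILES string could be split into tokens and mapped to integer ids before being fed to an NLP-style model. SMILES is assumed here as the representation; the abstract does not name one. The simplified regular expression and the names `tokenize_smiles` and `build_vocab` are illustrative assumptions, not the method of the cited review.

```python
import re

# Simplified, illustrative SMILES token pattern (an assumption, not a full grammar):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter atoms,
# single-digit ring closures, and bond/branch symbols.
TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-\+\(\)/\\\.@:~\*\$]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Assign an integer id to each distinct token, in sorted order."""
    distinct = sorted({tok for toks in token_lists for tok in toks})
    return {tok: idx for idx, tok in enumerate(distinct)}


if __name__ == "__main__":
    molecules = ["CC(=O)Cl", "c1ccccc1", "[NH4+]"]  # hypothetical inputs
    tokenized = [tokenize_smiles(s) for s in molecules]
    vocab = build_vocab(tokenized)
    for smiles, toks in zip(molecules, tokenized):
        print(smiles, toks, [vocab[t] for t in toks])
```

Treating `Cl` and bracket atoms as single tokens keeps chemically meaningful units intact, which is the main reason a character-by-character split is usually avoided in this kind of sketch.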
Affiliation(s)
- Manajit Das
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Ankit Ghosh
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Raghavan B Sunoj
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
  - Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai, India
2
Xu J, Ye X, Lv Z, Chen YH, Wang XS. The Role of Base in Reaction Performance of Photochemical Synthesis of Thiazoles: An Integrated Theoretical and Experimental Study. Chemistry 2024; 30:e202304279. [PMID: 38409580 DOI: 10.1002/chem.202304279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/25/2024] [Accepted: 02/26/2024] [Indexed: 02/28/2024]
Abstract
Artificial intelligence (AI)/machine learning (ML) is emerging as pivotal in synthetic chemistry, offering revolutionary potential in retrosynthetic analysis, reaction conditions, and reaction prediction. We have combined chemical descriptors, primarily based on Density Functional Theory (DFT) calculations, with various AI/ML tools such as Multi-Layer Perceptron (MLP) and Random Forest (RF) to predict the synthesis of 2-arylbenzothiazole in photoredox reactions. Significantly, our models underscore the critical role of the molecular structure and physicochemical characteristics of the base, especially the total atomic polarizabilities, in the rate-determining steps involving cyclohexyl and phenethyl moieties of the substrate. Moreover, we validated our findings through experimental studies. This work showcases the power of AI/ML and quantum chemistry in shaping the future of organic chemistry.
Affiliation(s)
- Jiaxin Xu
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Xiaoyu Ye
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Zongchao Lv
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
  - CMC Pharmaceutical Research Center, Wuhan RS Pharmaceutical Co., Ltd., Wuhan, 430073, China
- Yi-Hung Chen
  - The Institute for Advanced Studies (IAS), Wuhan University, Wuhan, 430072, China
- Xiang Simon Wang
  - Howard University College of Pharmacy, 2300 Fourth Street NW, Washington, DC 20059, United States
3
Vik D, Pii D, Mudaliar C, Nørregaard-Madsen M, Kontijevskis A. Performance and robustness of small molecule retention time prediction with molecular graph neural networks in industrial drug discovery campaigns. Sci Rep 2024; 14:8733. [PMID: 38627535 PMCID: PMC11021461 DOI: 10.1038/s41598-024-59620-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 04/12/2024] [Indexed: 04/19/2024] Open
Abstract
This study explores how machine learning can be used to predict chromatographic retention times (RT) for the analysis of small molecules, with the objective of identifying a machine-learning framework with the robustness required to support a chemical synthesis production platform. We used internally generated data from high-throughput parallel synthesis in the context of pharmaceutical drug discovery projects. We tested machine-learning models from the following frameworks: XGBoost, ChemProp, and DeepChem, using a dataset of 7552 small molecules. Our findings show that two specific models, AttentiveFP and ChemProp, performed better than XGBoost and a regular neural network in predicting RT accurately. We also assessed how well these models performed over time and found that molecular graph neural networks consistently gave accurate predictions for new chemical series. In addition, when we applied ChemProp to the publicly available METLIN SMRT dataset, it achieved an average error of 38.70 s. These results highlight the efficacy of molecular graph neural networks, especially ChemProp, in diverse RT prediction scenarios, thereby enhancing the efficiency of chromatographic analysis.
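As a hedged illustration of how an average retention-time error of this kind could be computed, the sketch below evaluates the mean absolute error between observed and predicted RT values in seconds. It assumes that "average error" refers to the mean absolute error, which the abstract does not state explicitly; the function name and the numbers are hypothetical and are not taken from the study.

```python
def mean_absolute_error(observed, predicted):
    """Mean absolute difference between paired observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one pair of values is required")
    total = sum(abs(o - p) for o, p in zip(observed, predicted))
    return total / len(observed)


if __name__ == "__main__":
    # Hypothetical retention times in seconds, for illustration only.
    observed_rt = [100.0, 200.0, 300.0]
    predicted_rt = [110.0, 190.0, 330.0]
    print(f"MAE: {mean_absolute_error(observed_rt, predicted_rt):.2f} s")
```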
Affiliation(s)
- Daniel Vik
  - Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
- David Pii
  - Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
- Chirag Mudaliar
  - Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
4
Liu S. Harvesting Chemical Understanding with Machine Learning and Quantum Computers. ACS Physical Chemistry Au 2024; 4:135-142. [PMID: 38560751 PMCID: PMC10979482 DOI: 10.1021/acsphyschemau.3c00067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 12/29/2023] [Accepted: 01/02/2024] [Indexed: 04/04/2024]
Abstract
It is tenable to argue that nobody can predict the future with certainty, yet one can learn from the past and make informed projections for the years ahead. In this Perspective, we review how theory and computation can be exploited to obtain chemical understanding from wave function theory and density functional theory, and then consider the likely impact of machine learning (ML) and quantum computers (QC) on how traditional chemical concepts will be appreciated in the decades to come. It is maintained that the development and maturation of ML and QC methods in theoretical and computational chemistry represent two paradigm shifts in how the Schrödinger equation can be solved. New chemical understanding can be harnessed in these two new paradigms by making respective use of ML features and QC qubits. Before that happens, however, there are still hurdles to face and obstacles to overcome in both the ML and QC arenas. Possible pathways to tackle these challenges are proposed. We anticipate that hierarchical modeling, in contrast to multiscale modeling, will emerge and thrive, becoming the workhorse of in silico simulations in the next few decades.
5
Huang Y, Zheng Y, Lu X, Zhao Y, Zhou D, Zhang Y, Liu G. Simulation and Optimization: A New Direction in Supercritical Technology Based Nanomedicine. Bioengineering (Basel) 2023; 10:1404. [PMID: 38135995 PMCID: PMC10741229 DOI: 10.3390/bioengineering10121404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/04/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023] Open
Abstract
In recent years, nanomedicines prepared using supercritical technology have garnered widespread research attention due to their inherent attributes, including structural stability, high bioavailability, and commendable safety profiles. The preparation of these nanomedicines relies on drug solubility and mixing efficiency within supercritical fluids (SCFs). Solubility is closely tied to operational parameters such as temperature and pressure, while mixing efficiency is influenced not only by operational conditions but also by the shape and dimensions of the nozzle. Under supercritical conditions, these parameters are difficult to measure directly, which presents significant challenges for the preparation and optimization of nanomedicines. Mathematical models can, to a certain extent, predict solubility, while simulation models can visualize mixing efficiency during experimental procedures, offering novel avenues for advancing supercritical nanomedicines. We therefore review the application of mathematical models, artificial intelligence (AI) methodologies, and computational fluid dynamics (CFD) techniques within the medical domain of supercritical technology. We summarize and discuss methodologies for calculating drug solubility in SCFs, as well as the influence of operational conditions and experimental apparatus on the outcomes of nanomedicine preparation using supercritical technology. Through this review, we describe the implementation procedures and commonly employed models of the various methodologies and compare the merits and demerits of these models. Furthermore, we assert the dependability of employing models to compute drug solubility in SCFs and to simulate the experimental processes; such models can serve as valuable tools for aiding and optimizing experiments and for guiding the selection of appropriate operational conditions. This, in turn, fosters innovative avenues for the development of supercritical pharmaceuticals.
Affiliation(s)
- Yulan Huang
  - State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, National Innovation Platform for Industry-Education Integration in Vaccine Research, State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen 361102, China
- Yating Zheng
  - State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, National Innovation Platform for Industry-Education Integration in Vaccine Research, State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen 361102, China
- Xiaowei Lu
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361002, China
- Yang Zhao
  - Shenzhen Research Institute, Xiamen University, Shenzhen 518000, China
- Da Zhou
  - School of Mathematical Sciences, Xiamen University, Xiamen 361005, China
- Yang Zhang
  - State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, National Innovation Platform for Industry-Education Integration in Vaccine Research, State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen 361102, China
- Gang Liu
  - State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, National Innovation Platform for Industry-Education Integration in Vaccine Research, State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen 361102, China