1
Bhowmik D, Zhang P, Fox Z, Irle S, Gounley J. Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms. Patterns (N Y) 2024; 5:100947. [PMID: 38645768 PMCID: PMC11026973 DOI: 10.1016/j.patter.2024.100947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/14/2023] [Accepted: 02/08/2024] [Indexed: 04/23/2024]
Abstract
This study examines the effectiveness of generative models in drug discovery, materials science, and polymer science, aiming to overcome the constraints of traditional inverse design methods that rely on heuristic rules. Generative models produce synthetic data resembling real data, enabling deep learning models to be trained without extensive labeled datasets. They are valuable for creating virtual libraries of molecules in materials science and for facilitating drug discovery by generating molecules with specific properties. Generative adversarial networks (GANs) have been explored for these purposes, but mode collapse restricts their efficacy by limiting the variability of novel structures. To address this, we introduce a masked language model (LM) inspired by natural language processing. Because LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to generate new molecules efficiently, which outperforms standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture improves efficiency in optimizing properties and generating novel samples.
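The title and abstract describe a genetic algorithm in which a masked language model proposes new molecules. As a minimal sketch only, assuming a toy token vocabulary, a placeholder `fill_mask` that samples uniformly instead of querying a trained model, and a made-up heteroatom-count fitness (all three are hypothetical stand-ins, not the authors' components), the loop below masks one token per parent, refills it, and keeps the fittest individuals. The GAN half of the hybrid architecture is not modeled.

```python
import random

# Hypothetical toy vocabulary of SMILES-like tokens (not the paper's tokenizer).
VOCAB = ["C", "N", "O", "F", "Cl", "c1ccccc1"]
HETEROATOMS = {"N", "O", "F", "Cl"}


def fill_mask(tokens, position, rng):
    """Placeholder for a masked language model (hypothetical).

    A trained model would rank candidate tokens for tokens[position] given the
    unmasked context; this stand-in ignores the context and samples uniformly.
    """
    return rng.choice(VOCAB)


def mutate(tokens, rng):
    """Mask one random position and let the placeholder model refill it."""
    child = list(tokens)
    pos = rng.randrange(len(child))
    child[pos] = fill_mask(child, pos, rng)
    return child


def fitness(tokens):
    """Hypothetical objective: number of heteroatom tokens."""
    return sum(tok in HETEROATOMS for tok in tokens)


def genetic_algorithm(population, generations, rng):
    """Mutation-only GA with elitist (mu + lambda) selection."""
    pop = [list(ind) for ind in population]
    size = len(pop)
    for _ in range(generations):
        children = [mutate(ind, rng) for ind in pop]
        # Parents and children compete; the `size` fittest survive.
        pop = sorted(pop + children, key=fitness, reverse=True)[:size]
    return pop


if __name__ == "__main__":
    rng = random.Random(0)
    start = [["C", "C", "C", "C"], ["C", "C", "O", "C"]]
    best = genetic_algorithm(start, generations=20, rng=rng)
    print(best[0], fitness(best[0]))
```

Because parents compete with their children, the best fitness in the population can never decrease from one generation to the next; population size is the knob the abstract reports as most affecting the LM-only versus LM-GAN comparison.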
Affiliation(s)
- Debsindhu Bhowmik
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Pei Zhang
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Zachary Fox
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Stephan Irle
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- John Gounley
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
2
Arrowsmith CH. Structure-guided drug discovery: back to the future. Nat Struct Mol Biol 2024; 31:395-396. [PMID: 38486110 DOI: 10.1038/s41594-024-01244-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Affiliation(s)
- Cheryl H Arrowsmith
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada.
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
3
Kalikadien AV, Mirza A, Hossaini AN, Sreenithya A, Pidko EA. Paving the road towards automated homogeneous catalyst design. Chempluschem 2024:e202300702. [PMID: 38279609 DOI: 10.1002/cplu.202300702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 12/20/2023] [Indexed: 01/28/2024]
Abstract
Over the past decade, computational tools have become integral to catalyst design, and they continue to offer significant support to experimental researchers in organic synthesis and catalysis who aim for optimal reaction outcomes. More recently, data-driven approaches based on machine learning have attracted considerable attention for their broad capabilities. This Perspective provides an overview of diverse initiatives in computational catalyst design and introduces our automated tools for high-throughput in silico exploration of chemical space. Methods for high-throughput in silico exploration and analysis of chemical space yield valuable insights, but their degree of automation and modularity is critical. We argue that integrating data-driven, automated, and modular workflows is key to advancing homogeneous catalyst design on an unprecedented scale and, with it, catalysis research.
Affiliation(s)
- Adarsh V Kalikadien
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Adrian Mirza
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Aydin Najl Hossaini
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Avadakkam Sreenithya
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Evgeny A Pidko
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
4
Nuzillard JM. Use of carbon-13 NMR to identify known natural products by querying a nuclear magnetic resonance database: An assessment. Magn Reson Chem 2023; 61:582-588. [PMID: 37583258 DOI: 10.1002/mrc.5386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 07/26/2023] [Accepted: 07/29/2023] [Indexed: 08/17/2023]
Abstract
The quick identification of known low-molecular-weight organic compounds, also known as structural dereplication, is an important task in the chemical profiling of natural resource extracts. To that end, a method based on carbon-13 nuclear magnetic resonance (NMR) spectroscopy, developed in earlier work by the author's research group, requires a dedicated database that links chemical structures, biological and chemical taxonomy, and spectroscopic data. The construction of such a database, called acd_lotus, was reported earlier, but its usefulness was illustrated by only three examples. This article presents the results of structure searches carried out on 58 carbon-13 NMR data sets recorded for compounds selected from the metabolomics section of the Biological Magnetic Resonance Bank (BMRB). Two compound retrieval methods were employed. The first searches the acd_lotus database with commercial software. The second uses the freely accessible web interface of the nmrshiftdb2 database, which includes the compounds present in acd_lotus and many others. Both structural dereplication methods proved efficient and can be used together in a complementary way.
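Both retrieval routes compare an experimental carbon-13 shift list against stored shift lists. A minimal sketch of such a comparison, assuming a greedy one-to-one pairing within a fixed tolerance (2 ppm here, an arbitrary choice) and a hypothetical two-entry database with invented shift values. The commercial software and nmrshiftdb2 use their own scoring, which this does not reproduce.

```python
def match_score(query, reference, tol=2.0):
    """Fraction of query 13C shifts (ppm) paired with a reference shift within tol.

    Each reference shift is used at most once; query shifts are processed in
    ascending order and greedily paired with the nearest unused reference shift.
    """
    if not query:
        return 0.0
    remaining = list(reference)
    matched = 0
    for shift in sorted(query):
        nearest = min(remaining, key=lambda r: abs(r - shift), default=None)
        if nearest is not None and abs(nearest - shift) <= tol:
            remaining.remove(nearest)
            matched += 1
    return matched / len(query)


def rank_candidates(query, database, tol=2.0):
    """Sort database entries (name -> shift list) by descending match score."""
    scores = {name: match_score(query, shifts, tol) for name, shifts in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented shift lists for illustration only (hypothetical compounds).
    database = {
        "compound_A": [20.1, 35.4, 77.0, 170.2],
        "compound_B": [14.0, 60.5, 128.3, 130.1],
    }
    query = [20.5, 35.0, 76.2, 171.0]
    for name, score in rank_candidates(query, database):
        print(f"{name}: {score:.2f}")
```

The one-to-one constraint keeps a single reference carbon from "explaining" several query signals, which would otherwise inflate scores for small candidate structures.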
5
Stankevičiūtė K, Woillard JB, Peck RW, Marquet P, van der Schaar M. Bridging the Worlds of Pharmacometrics and Machine Learning. Clin Pharmacokinet 2023; 62:1551-1565. [PMID: 37803104 DOI: 10.1007/s40262-023-01310-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 09/19/2023] [Indexed: 10/08/2023]
Abstract
Precision medicine requires individualized modeling of disease and drug dynamics, and machine learning-based computational techniques are gaining popularity for this purpose. The complexity of each field, however, makes current pharmacological problems opaque to machine learning practitioners and state-of-the-art machine learning methods inaccessible to pharmacometricians. To help bridge the two worlds, we introduce current problems and techniques in pharmacometrics, ranging from pharmacokinetic and pharmacodynamic modeling to pharmacometric simulations, model-informed precision dosing, and systems pharmacology, and review some of the machine learning approaches that address them. We hope this will facilitate collaboration between experts, with the complementary strengths of principled pharmacometric modeling and flexible machine learning producing synergistic effects in pharmacological applications.
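Pharmacokinetic modeling, the starting point of the review, is often introduced through the one-compartment model with first-order elimination after an intravenous bolus, C(t) = (Dose/V) exp(-(CL/V) t). The sketch below uses that textbook model as an illustrative choice, not a model taken from the review; the parameter names (`dose` in mg, clearance `cl` in L/h, volume `v` in L) and the example values are assumptions. The dosing helper shows the simplest form of model-informed dosing: for an IV bolus with complete availability, AUC from zero to infinity equals dose divided by clearance.

```python
import math


def concentration(t, dose, cl, v):
    """One-compartment IV bolus: C(t) = dose / v * exp(-(cl / v) * t)."""
    k_el = cl / v  # first-order elimination rate constant (1/h)
    return dose / v * math.exp(-k_el * t)


def half_life(cl, v):
    """Elimination half-life t1/2 = ln(2) * V / CL (h)."""
    return math.log(2) * v / cl


def dose_for_target_auc(target_auc, cl):
    """IV bolus, complete availability: AUC(0-inf) = dose / CL, so dose = AUC * CL."""
    return target_auc * cl


if __name__ == "__main__":
    dose, cl, v = 100.0, 5.0, 50.0  # mg, L/h, L (hypothetical values)
    t_half = half_life(cl, v)
    print(f"C(0) = {concentration(0.0, dose, cl, v):.2f} mg/L")
    print(f"t1/2 = {t_half:.2f} h, C(t1/2) = {concentration(t_half, dose, cl, v):.2f} mg/L")
    print(f"dose for AUC 40 mg*h/L: {dose_for_target_auc(40.0, cl):.0f} mg")
```

Clearance and volume are the quantities that vary between patients; estimating them per individual (for example from covariates or measured levels) is where the review places machine learning alongside classical population models.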
Affiliation(s)
- Kamilė Stankevičiūtė
- Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
- Jean-Baptiste Woillard
- INSERM U1248 P&T, University of Limoges, 2 rue du Pr Descottes, 87000, Limoges, France
- Department of Pharmacology and Toxicology, CHU Limoges, Limoges, France
- Richard W Peck
- Department of Pharmacology and Therapeutics, University of Liverpool, Liverpool, UK
- Pharma Research and Development, Roche Innovation Center, Basel, Switzerland
- Pierre Marquet
- INSERM U1248 P&T, University of Limoges, 2 rue du Pr Descottes, 87000, Limoges, France
- Department of Pharmacology and Toxicology, CHU Limoges, Limoges, France
- Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- The Alan Turing Institute, London, UK
6
Götz J, Jackl MK, Jindakun C, Marziale AN, André J, Gosling DJ, Springer C, Palmieri M, Reck M, Luneau A, Brocklehurst CE, Bode JW. High-throughput synthesis provides data for predicting molecular properties and reaction success. Sci Adv 2023; 9:eadj2314. [PMID: 37889964 PMCID: PMC10610918 DOI: 10.1126/sciadv.adj2314] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/14/2023] [Accepted: 09/26/2023] [Indexed: 10/29/2023]
Abstract
The generation of attractive scaffolds for drug discovery efforts requires the expeditious synthesis of diverse analogues from readily available building blocks. This endeavor necessitates a trade-off between diversity and ease of access and is further complicated by uncertainty about the synthesizability and pharmacokinetic properties of the resulting compounds. Here, we document a platform that leverages photocatalytic N-heterocycle synthesis, high-throughput experimentation, automated purification, and physicochemical assays across 1152 discrete reactions. Together, the data generated allow rational predictions, using deep learning, of the synthesizability of stereochemically diverse C-substituted N-saturated heterocycles and reveal unexpected trends in the relationship between structure and properties. This study exemplifies how organic chemists can exploit state-of-the-art technologies to markedly increase throughput and confidence in the preparation of drug-like molecules.
Affiliation(s)
- Julian Götz
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
- Moritz K. Jackl
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
- Chalupat Jindakun
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
- Alexander N. Marziale
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
- Jérôme André
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
- Daniel J. Gosling
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
- Clayton Springer
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, Cambridge, MA 02139, USA
- Marco Palmieri
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
- Marcel Reck
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
- Alexandre Luneau
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
- Cara E. Brocklehurst
- Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Pharma AG, 4056 Basel, Switzerland
- Jeffrey W. Bode
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
7
Parmar SV, Deshmukh P, Sankpal R, Watharkar S, Avasare V. Machine Learning-Enabled Predictions of Condensed Fukui Functions and Designing of Metal Pincer Complexes for Catalytic Hydrogenation of CO2. J Phys Chem A 2023; 127:8338-8346. [PMID: 37756223 DOI: 10.1021/acs.jpca.3c04494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2023]
Abstract
This research showcases machine learning (ML)-enabled discovery of homogeneous catalysts for carbon dioxide hydrogenation. In transition metal pincer complexes, the electrophilicity of the central metal atom is a crucial factor in achieving the desired turnover frequency (TOF). The condensed Fukui function is a direct measure of the catalytic performance of these pincer complexes. Herein, we demonstrate that machine learning is a convenient and efficient method for calculating condensed Fukui functions of the central metal atom. The electrophilicity values of 202 pincer complexes were calculated with density functional theory (DFT) to train the ML model. Test data for experimentally established pincer complexes show a direct link between calculated electrophilicity and experimental TOF. These data were further used to develop an ML protocol that screened 284,062 catalyst complexes to obtain electrophilicity values for Mn, Fe, Co, and Ni centers across combinations of PNP, PNN, NNN, and PCP pincer ligands. These findings validate the efficacy of machine learning for the rapid screening of metal pincer catalysts based on condensed Fukui functions.
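Condensed Fukui functions are commonly computed by finite differences of atomic partial charges for the N, N+1 and N-1 electron systems (the Yang-Mortier scheme): f_k^+ = q_k(N) - q_k(N+1) and f_k^- = q_k(N-1) - q_k(N). A larger f_k^+ marks an atom that more readily accepts electron density, that is, a more electrophilic site. The abstract does not state which charge model or definition the authors used, so this is a generic sketch under that assumption; it computes the kind of DFT-derived target an ML model like theirs would be trained on, and the three-atom charge sets in the usage example are invented, not DFT results.

```python
def condensed_fukui(q_n, q_n_plus_1, q_n_minus_1):
    """Condensed Fukui indices from atomic partial charges (finite differences).

    Each argument maps atom labels to partial charges for the N, N+1 and N-1
    electron systems, conventionally all at the N-electron geometry.
    f+ = q(N) - q(N+1): electron acceptance at the atom (electrophilic site).
    f- = q(N-1) - q(N): electron donation from the atom (nucleophilic site).
    f0 = (f+ + f-) / 2: radical attack.
    """
    indices = {}
    for atom, q in q_n.items():
        f_plus = q - q_n_plus_1[atom]
        f_minus = q_n_minus_1[atom] - q
        indices[atom] = {"f+": f_plus, "f-": f_minus, "f0": 0.5 * (f_plus + f_minus)}
    return indices


if __name__ == "__main__":
    # Invented charges for a hypothetical three-atom fragment (not DFT output).
    q_n = {"Mn": 0.40, "N": -0.30, "P": -0.10}
    q_anion = {"Mn": 0.15, "N": -0.55, "P": -0.60}
    q_cation = {"Mn": 0.60, "N": 0.10, "P": 0.30}
    for atom, f in condensed_fukui(q_n, q_anion, q_cation).items():
        print(atom, {key: round(val, 3) for key, val in f.items()})
```

When the charges cover every atom of the molecule, the f+ values (and likewise the f- values) sum to one, because adding or removing one electron changes the total charge by exactly one; this is a quick sanity check on charge data before training.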
Affiliation(s)
- Saurabh V Parmar
- Department of Chemistry, Ashoka University, Sonipat, Haryana 131029, India
- Pratham Deshmukh
- Department of Chemistry, Sir Parashurambhau College, Pune, Maharashtra 411030, India
- Rutuja Sankpal
- Department of Chemistry, Sir Parashurambhau College, Pune, Maharashtra 411030, India
- Siddhika Watharkar
- Department of Chemistry, Sir Parashurambhau College, Pune, Maharashtra 411030, India
- Vidya Avasare
- Department of Chemistry, Ashoka University, Sonipat, Haryana 131029, India
- Department of Chemistry, Sir Parashurambhau College, Pune, Maharashtra 411030, India
8
Yu X, Tang D, Chng JY, Sholl DS. Efficient Exploration of Adsorption Space for Separations in Metal-Organic Frameworks Combining the Use of Molecular Simulations, Machine Learning, and Ideal Adsorbed Solution Theory. J Phys Chem C Nanomater Interfaces 2023; 127:19229-19239. [PMID: 37791097 PMCID: PMC10544990 DOI: 10.1021/acs.jpcc.3c04533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 08/15/2023] [Indexed: 10/05/2023]
Abstract
Adsorption-based separations using metal-organic frameworks (MOFs) are promising candidates for replacing common energy-intensive separation processes. The so-called adsorption space, formed by combining billions of possible molecules with thousands of reported MOFs, is vast, and comprehensively evaluating MOF performance for chemical separations through experiments is very challenging. Molecular simulations and machine learning (ML) have been widely applied to predict adsorption-based separations. Previous ML approaches to these problems were typically limited to smaller molecules and often had poor accuracy in the dilute limit. To enable exploration of a wider adsorption space, we carefully selected a diverse set of 45 molecules and 335 MOFs and generated single-component isotherms for 15,075 MOF-molecule pairs by grand canonical Monte Carlo. Using this database, we developed accurate (r² > 0.9) ML models that predict adsorption isotherms of diverse molecules in large libraries of MOFs. With this approach, we can efficiently make predictions for large collections of MOFs for arbitrary mixture separations. By combining molecular simulation data and ML predictions with Ideal Adsorbed Solution Theory, we tested the ability of these approaches to predict adsorption selectivity and loading for challenging near-azeotropic mixtures.
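Ideal Adsorbed Solution Theory predicts mixture adsorption from pure-component isotherms alone. The sketch below assumes each pure-component isotherm is fitted by a single-site Langmuir model, q(P) = q_sat K P / (1 + K P), whose reduced spreading pressure has the closed form q_sat ln(1 + K P⁰), and solves the binary IAST conditions (y_i P = x_i P_i⁰, x_1 + x_2 = 1, equal spreading pressures) by bisection. The Langmuir form and the parameter values in the usage example are assumptions; the paper's GCMC and ML isotherms need not follow this form.

```python
import math


def langmuir_loading(p, q_sat, k):
    """Single-site Langmuir isotherm q(P) = q_sat * K * P / (1 + K * P)."""
    return q_sat * k * p / (1.0 + k * p)


def reduced_spreading_pressure(p0, q_sat, k):
    """Integral of q(P)/P from 0 to p0; closed form for the Langmuir model."""
    return q_sat * math.log(1.0 + k * p0)


def iast_binary(p_total, y1, iso1, iso2, tol=1e-12):
    """Binary IAST for Langmuir components.

    iso1, iso2: (q_sat, K) tuples. Returns (x1, q1, q2): adsorbed-phase mole
    fraction of component 1 and the two component loadings.
    """
    if not 0.0 < y1 < 1.0:
        raise ValueError("y1 must lie strictly between 0 and 1")
    y2 = 1.0 - y1

    def mismatch(x1):
        # y_i * P = x_i * P_i^0 fixes each pure-component reference pressure.
        p1_0 = y1 * p_total / x1
        p2_0 = y2 * p_total / (1.0 - x1)
        return reduced_spreading_pressure(p1_0, *iso1) - reduced_spreading_pressure(p2_0, *iso2)

    # mismatch falls monotonically from +inf (x1 -> 0) to -inf (x1 -> 1).
    lo, hi = tol, 1.0 - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:
            lo = mid  # component 1 spreading pressure too high: x1 must grow
        else:
            hi = mid
    x1 = 0.5 * (lo + hi)
    x2 = 1.0 - x1
    q1_pure = langmuir_loading(y1 * p_total / x1, *iso1)
    q2_pure = langmuir_loading(y2 * p_total / x2, *iso2)
    q_total = 1.0 / (x1 / q1_pure + x2 / q2_pure)  # ideal-mixing total loading
    return x1, x1 * q_total, x2 * q_total


if __name__ == "__main__":
    strong, weak = (5.0, 1.0), (5.0, 0.1)  # hypothetical (q_sat in mol/kg, K in 1/bar)
    y1 = 0.5
    x1, q1, q2 = iast_binary(p_total=1.0, y1=y1, iso1=strong, iso2=weak)
    s12 = (x1 / y1) / ((1.0 - x1) / (1.0 - y1))
    print(f"x1 = {x1:.4f}, q1 = {q1:.4f}, q2 = {q2:.4f}, selectivity = {s12:.2f}")
```

Two limits make convenient checks: with identical isotherms the adsorbed phase mirrors the gas phase (x1 = y1), and with equal saturation capacities IAST reduces to the extended Langmuir result, so the selectivity equals the ratio of the K values.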
Affiliation(s)
- Xiaohan Yu
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Dai Tang
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Jia Yuan Chng
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- David S. Sholl
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States