1
Kevlishvili I, St Michel RG, Garrison AG, Toney JW, Adamji H, Jia H, Román-Leshkov Y, Kulik HJ. Leveraging natural language processing to curate the tmCAT, tmPHOTO, tmBIO, and tmSCO datasets of functional transition metal complexes. Faraday Discuss 2024. [PMID: 39301698 DOI: 10.1039/d4fd00087k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2024]
Abstract
The breadth of transition metal chemical space covered by databases such as the Cambridge Structural Database and the derived computational database tmQM is not conducive to application-specific modeling and the development of structure-property relationships. Here, we employ both supervised and unsupervised natural language processing (NLP) techniques to link experimentally synthesized compounds in the tmQM database to their respective applications. Leveraging NLP models, we curate four distinct datasets: tmCAT for catalysis, tmPHOTO for photophysical activity, tmBIO for biological relevance, and tmSCO for magnetism. Analyzing the chemical substructures within each dataset reveals common chemical motifs in each of the designated applications. We then use these common chemical structures to augment our initial datasets for each application, yielding a total of 21 631 compounds in tmCAT, 4599 in tmPHOTO, 2782 in tmBIO, and 983 in tmSCO. These datasets are expected to accelerate the more targeted computational screening and development of refined structure-property relationships with machine learning.
Affiliation(s)
- Ilia Kevlishvili
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Roland G St Michel
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Aaron G Garrison
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jacob W Toney
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Husain Adamji
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Haojun Jia
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Yuriy Román-Leshkov
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2
Kneiding H, Balcells D. Augmenting genetic algorithms with machine learning for inverse molecular design. Chem Sci 2024:d4sc02934h. [PMID: 39296997 PMCID: PMC11404003 DOI: 10.1039/d4sc02934h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Accepted: 09/09/2024] [Indexed: 09/21/2024] Open
Abstract
Evolutionary and machine learning methods have been successfully applied to the generation of molecules and materials exhibiting desired properties. The combination of these two paradigms in inverse design tasks can yield powerful methods that explore massive chemical spaces more efficiently, improving the quality of the generated compounds. However, such synergistic approaches are still an incipient area of research and appear underexplored in the literature. This perspective covers different ways of incorporating machine learning approaches into evolutionary learning frameworks, with the overall goal of increasing the optimization efficiency of genetic algorithms. In particular, machine learning surrogate models for faster fitness function evaluation, discriminator models to control population diversity on-the-fly, machine learning based crossover operations, and evolution in latent space are discussed. The further potential of these synergistic approaches in generative tasks is also assessed, outlining promising directions for future developments.
Affiliation(s)
- Hannes Kneiding
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
- David Balcells
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
3
Liu S, Yu J, Ni N, Wang Z, Chen M, Li Y, Xu C, Ding Y, Zhang J, Yao X, Liu H. Versatile Framework for Drug-Target Interaction Prediction by Considering Domain-Specific Features. J Chem Inf Model 2024; 64:5646-5656. [PMID: 38976879 DOI: 10.1021/acs.jcim.4c00403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting drug-target interactions (DTIs) is one of the crucial tasks in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction due to its powerful performance. However, the models trained on limited known DTI data struggle to generalize effectively to novel drug-target pairs. In this work, we propose a strategy to train an ensemble of models by capturing both domain-generic and domain-specific features (E-DIS) to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. E-DIS provides a comprehensive representation of proteins and ligands by capturing diverse features. Experimental results on four benchmark data sets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared to existing methods. Our approach presents a significant advancement in DTI prediction by combining domain-generic and domain-specific features, enhancing the generalization ability of the DTI prediction model.
Affiliation(s)
- Shuo Liu
  - School of Pharmacy, Lanzhou University, Gansu 730000, China
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jialiang Yu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Ningxi Ni
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Zidong Wang
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Mengyun Chen
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yuquan Li
  - College of Chemistry and Chemical Engineering, Lanzhou University, Gansu 730000, China
- Chen Xu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yahao Ding
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jun Zhang
  - Changping Laboratory, Beijing 102200, China
- Xiaojun Yao
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
4
Bhowmik D, Zhang P, Fox Z, Irle S, Gounley J. Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms. Patterns (New York, N.Y.) 2024; 5:100947. [PMID: 38645768 PMCID: PMC11026973 DOI: 10.1016/j.patter.2024.100947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/14/2023] [Accepted: 02/08/2024] [Indexed: 04/23/2024]
Abstract
This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.
Affiliation(s)
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Pei Zhang
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Zachary Fox
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Stephan Irle
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- John Gounley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
5
Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281 DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
Affiliation(s)
- Sai Mahit Vadaddi
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Qiyuan Zhao
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brett M Savoie
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
6
Lin YC, Chien WC, Wang YX, Wang YH, Yang FS, Tseng LP, Hung JH. PS2MS: A Deep Learning-Based Prediction System for Identifying New Psychoactive Substances Using Mass Spectrometry. Anal Chem 2024; 96:4835-4844. [PMID: 38488022 PMCID: PMC10974679 DOI: 10.1021/acs.analchem.3c05019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/12/2024] [Accepted: 03/04/2024] [Indexed: 03/27/2024]
Abstract
The rapid proliferation of new psychoactive substances (NPS) poses significant challenges to conventional mass-spectrometry-based identification methods due to the absence of reference spectra for these emerging substances. This paper introduces PS2MS, an AI-powered predictive system designed specifically to address the limitations of identifying the emergence of unidentified novel illicit drugs. PS2MS builds a synthetic NPS database by enumerating feasible derivatives of known substances and uses deep learning to generate mass spectra and chemical fingerprints. When the mass spectrum of an analyte does not match any known reference, PS2MS simultaneously examines the chemical fingerprint and mass spectrum against the putative NPS database using integrated metrics to deduce possible identities. Experimental results affirm the effectiveness of PS2MS in identifying cathinone derivatives within real evidence specimens, signifying its potential for practical use in identifying emerging drugs of abuse for researchers and forensic experts.
Affiliation(s)
- Yi-Ching Lin
  - Department of Laboratory Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 807, Taiwan
  - Department of Laboratory Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan
  - Doctoral Degree Program of Toxicology, College of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan
  - Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Wei-Chen Chien
  - Department of Computer Science, National Yang Ming Chiao Tung University, HsinChu 300, Taiwan
- Yu-Xuan Wang
  - Department of Computer Science, National Yang Ming Chiao Tung University, HsinChu 300, Taiwan
- Ying-Hau Wang
  - Department of Computer Science, National Yang Ming Chiao Tung University, HsinChu 300, Taiwan
- Feng-Shuo Yang
  - Department of Laboratory Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 807, Taiwan
  - Department of Medicinal and Applied Chemistry, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Li-Ping Tseng
  - Department of Laboratory Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Jui-Hung Hung
  - Department of Computer Science, National Yang Ming Chiao Tung University, HsinChu 300, Taiwan
  - Program in Biomedical Artificial Intelligence, National Tsing Hua University, HsinChu 300, Taiwan
7
Hönig SMN, Flachsenberg F, Ehrt C, Neumann A, Schmidt R, Lemmen C, Rarey M. SpaceGrow: efficient shape-based virtual screening of billion-sized combinatorial fragment spaces. J Comput Aided Mol Des 2024; 38:13. [PMID: 38493240 PMCID: PMC10944417 DOI: 10.1007/s10822-024-00551-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 02/13/2024] [Indexed: 03/18/2024]
Abstract
The growing size of make-on-demand chemical libraries is posing new challenges to cheminformatics. These ultra-large chemical libraries became too large for exhaustive enumeration. Using a combinatorial approach instead, the resource requirement scales approximately with the number of synthons instead of the number of molecules. This gives access to billions or trillions of compounds as so-called chemical spaces with moderate hardware and in a reasonable time frame. While extremely performant ligand-based 2D methods exist in this context, 3D methods still largely rely on exhaustive enumeration and therefore fail to apply. Here, we present SpaceGrow: a novel shape-based 3D approach for ligand-based virtual screening of billions of compounds within hours on a single CPU. Compared to a conventional superposition tool, SpaceGrow shows comparable pose reproduction capacity based on RMSD and superior ranking performance while being orders of magnitude faster. Result assessment of two differently sized subsets of the eXplore space reveals a higher probability of finding superior results in larger spaces highlighting the potential of searching in ultra-large spaces. Furthermore, the application of SpaceGrow in a drug discovery workflow was investigated in four examples involving G protein-coupled receptors (GPCRs) with the aim to identify compounds with similar binding capabilities and molecular novelty.
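The scaling argument in this abstract can be illustrated with simple arithmetic. The sketch below is not taken from SpaceGrow; it assumes a single hypothetical reaction in which every synthon at one position can combine with every synthon at the other positions, and the function name `space_size` and the synthon counts are invented for illustration.

```python
from math import prod


def space_size(synthon_counts):
    """Return (enumerated molecules, stored synthons) for one reaction.

    Assumes full combinatorial compatibility between positions, which is
    a simplification made for this illustration.
    """
    n_molecules = prod(synthon_counts)  # grows multiplicatively
    n_synthons = sum(synthon_counts)    # grows additively
    return n_molecules, n_synthons


# Hypothetical three-component reaction with 1,000 synthons per position.
n_molecules, n_synthons = space_size([1000, 1000, 1000])
print(f"{n_molecules:,} molecules from {n_synthons:,} synthons")
```

Under these assumptions, a billion products are represented by only three thousand building blocks, which is why a method that works on synthons can avoid exhaustive enumeration.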
Affiliation(s)
- Sophia M N Hönig
  - BioSolveIT, An der Ziegelei 79, 53757, Sankt Augustin, Germany
  - Universität Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
- Christiane Ehrt
  - Universität Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
- Robert Schmidt
  - BioSolveIT, An der Ziegelei 79, 53757, Sankt Augustin, Germany
- Matthias Rarey
  - Universität Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
8
Tutone M, Almerico AM. Computational Approaches and Drug Discovery: Where Are We Going? Molecules 2024; 29:969. [PMID: 38474481 DOI: 10.3390/molecules29050969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 02/21/2024] [Indexed: 03/14/2024] Open
Abstract
Science is a point of view [...].
Affiliation(s)
- Marco Tutone
  - Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università degli Studi di Palermo, Via Archirafi 32, 90123 Palermo, Italy
- Anna Maria Almerico
  - Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università degli Studi di Palermo, Via Archirafi 32, 90123 Palermo, Italy
9
Yuan T, Song X, Shi Y, Wei S, Han Y, Yang L, Zhang Y, Li X, Li Y, Shen L, Fan L. Perspectives on development of optoelectronic materials in artificial intelligence age. Chem Asian J 2024:e202301088. [PMID: 38317532 DOI: 10.1002/asia.202301088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/28/2024] [Accepted: 02/05/2024] [Indexed: 02/07/2024]
Abstract
Optoelectronic devices, such as light-emitting diodes, are among the most sought-after emerging display and lighting technologies because of their low cost, low power consumption, high brightness, and high contrast. Improving device performance relies on advances in the precise design of novel functional materials, including light-emitting materials, hosts, and hole/electron transport materials, which is a time-consuming, laborious, and resource-intensive task. Recently, machine learning (ML) has shown great promise for accelerating material discovery and property enhancement. This review summarizes the workflow of ML in optoelectronic materials discovery, including data collection, feature engineering, model selection, model evaluation, and model application. We highlight multiple recent applications of machine-learned potentials in various optoelectronic functional materials, ranging from semiconductor quantum dots (QDs) and perovskite QDs to organic molecules and carbon-based nanomaterials. We furthermore discuss the current challenges to fully realizing the potential of ML-assisted materials design for optoelectronic applications. It is anticipated that this review will provide critical insights to inspire exciting new discoveries in the ML-guided design of high-performance optoelectronic devices through a combined effort from different disciplines.
Affiliation(s)
- Ting Yuan
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Xianzhi Song
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Yuxin Shi
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Shuyan Wei
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Yuyi Han
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Linjuan Yang
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Yang Zhang
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Xiaohong Li
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Yunchao Li
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Lin Shen
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
- Louzhen Fan
  - College of Chemistry, Key Laboratory of Theoretical & Computational Photochemistry of Ministry of Education, Beijing Normal University, Beijing, 100875, China
10
Vennelakanti V, Kilic IB, Terrones GG, Duan C, Kulik HJ. Machine Learning Prediction of the Experimental Transition Temperature of Fe(II) Spin-Crossover Complexes. J Phys Chem A 2024; 128:204-216. [PMID: 38148525 DOI: 10.1021/acs.jpca.3c07104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
Spin-crossover (SCO) complexes are materials that exhibit changes in the spin state in response to external stimuli, with potential applications in molecular electronics. It is challenging to know a priori how to design ligands to achieve the delicate balance of entropic and enthalpic contributions needed to tailor a transition temperature close to room temperature. We leverage the SCO complexes from the previously curated SCO-95 data set [Vennelakanti et al. J. Chem. Phys. 159, 024120 (2023)] to train three machine learning (ML) models for transition temperature (T1/2) prediction using graph-based revised autocorrelations as features. We perform feature selection using random forest-ranked recursive feature addition (RF-RFA) to identify the features essential to model transferability. Of the ML models considered, the full feature set RF and recursive feature addition RF models perform best, achieving moderate correlation to experimental T1/2 values. We then compare ML T1/2 predictions to those from three previously identified best-performing density functional approximations (DFAs) which accurately predict SCO behavior across SCO-95, finding that the ML models predict T1/2 more accurately than the best-performing DFAs. In addition, we study ML model predictions for a set of 18 SCO complexes for which only estimated T1/2 values are available. Upon excluding outliers from this set, the RF-RFA RF model shows a strong correlation to estimated T1/2 values with a Pearson's r of 0.82. In contrast, DFA-predicted T1/2 values have large errors and show no correlation to estimated T1/2 values over the same set of complexes. Overall, our study demonstrates slightly superior performance of ML models in comparison with some of the best-performing DFAs, and we expect ML models to improve further as larger data sets of SCO complexes are curated and become available for model training.
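For readers unfamiliar with the correlation metric quoted in this abstract, the following is a minimal sketch of how a Pearson's r between predicted and reference values is computed. It is a generic textbook formula, not the authors' analysis code, and the temperature values are made up purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and spreads share the same 1/n factor, so it cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical reference and predicted transition temperatures in kelvin.
reference = [150.0, 220.0, 300.0, 340.0]
predicted = [170.0, 210.0, 280.0, 360.0]
print(round(pearson_r(reference, predicted), 3))
```

A value near 1 indicates that predictions rise and fall with the reference values, even if individual errors remain sizeable, so r is usually reported alongside an absolute error measure.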
Affiliation(s)
- Vyshnavi Vennelakanti
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Irem B Kilic
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Gianmarco G Terrones
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
11
Karandashev K, Weinreich J, Heinen S, Arismendi Arrieta DJ, von Rudorff GF, Hermansson K, von Lilienfeld OA. Evolutionary Monte Carlo of QM Properties in Chemical Space: Electrolyte Design. J Chem Theory Comput 2023; 19:8861-8870. [PMID: 38009856 PMCID: PMC10720348 DOI: 10.1021/acs.jctc.3c00822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/29/2023] [Accepted: 10/30/2023] [Indexed: 11/29/2023]
Abstract
Optimizing a target function over the space of organic molecules is an important problem appearing in many fields of applied science but also a very difficult one due to the vast number of possible molecular systems. We propose an evolutionary Monte Carlo algorithm for solving such problems which is capable of straightforwardly tuning both exploration and exploitation characteristics of an optimization procedure while retaining favorable properties of genetic algorithms. The method, dubbed MOSAiCS (Metropolis Optimization by Sampling Adaptively in Chemical Space), is tested on problems related to optimizing components of battery electrolytes, namely, minimizing solvation energy in water or maximizing dipole moment while enforcing a lower bound on the HOMO-LUMO gap; optimization was carried out over sets of molecular graphs inspired by QM9 and Electrolyte Genome Project (EGP) data sets. MOSAiCS reliably generated molecular candidates with good target quantity values, which were in most cases better than the ones found in QM9 or EGP. While the optimization results presented in this work sometimes required up to 10^6 QM calculations and were thus feasible only thanks to computationally efficient ab initio approximations of properties of interest, we discuss possible strategies for accelerating MOSAiCS using machine learning approaches.
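The "Metropolis" in the method's name refers to a standard acceptance rule from Monte Carlo sampling. The sketch below shows only that generic rule for a minimization problem; it is not the MOSAiCS implementation, and the function name `metropolis_accept` and its parameters are chosen here for illustration.

```python
import math
import random


def metropolis_accept(f_current, f_proposed, temperature, rng=random):
    """Decide whether to accept a proposed move when minimizing f.

    Downhill (or equal) moves are always accepted; uphill moves are
    accepted with probability exp(-(f_proposed - f_current) / temperature).
    """
    if f_proposed <= f_current:
        return True
    acceptance_probability = math.exp(-(f_proposed - f_current) / temperature)
    return rng.random() < acceptance_probability


# A higher temperature accepts more uphill moves (exploration);
# a lower temperature accepts fewer (exploitation).
rng = random.Random(0)
trials = 10_000
hot = sum(metropolis_accept(0.0, 1.0, 5.0, rng) for _ in range(trials))
cold = sum(metropolis_accept(0.0, 1.0, 0.2, rng) for _ in range(trials))
print(hot / trials, cold / trials)
```

The temperature parameter is the knob that trades exploration against exploitation in this rule, which is the kind of tuning the abstract refers to.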
Affiliation(s)
- Jan Weinreich
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
- Stefan Heinen
  - Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
- Guido Falk von Rudorff
  - Department of Chemistry, University Kassel, Heinrich-Plett-Str. 40, 34132 Kassel, Germany
  - Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
- Kersti Hermansson
  - Department of Chemistry-Ångström Laboratory, Uppsala University, Box 538, SE-75121 Uppsala, Sweden
- O. Anatole von Lilienfeld
  - Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
  - Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, M5S 1A1 Ontario, Canada
  - Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
12
Gaytán-Hernández D, Chávez-Hernández AL, López-López E, Miranda-Salas J, Saldívar-González FI, Medina-Franco JL. Art driven by visual representations of chemical space. J Cheminform 2023; 15:100. [PMID: 37865794 PMCID: PMC10590523 DOI: 10.1186/s13321-023-00770-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 10/13/2023] [Indexed: 10/23/2023] Open
Abstract
Science and art have been connected for centuries. With the development of new computational methods, new scientific disciplines have emerged, such as computational chemistry, and related fields, such as cheminformatics. Chemoinformatics is grounded on the chemical space concept: a multi-descriptor space in which chemical structures are described. In several practical applications, visual representations of the chemical space of compound datasets are low-dimensional plots helpful in identifying patterns. However, the authors propose that the plots can also be used as artistic expressions. This manuscript introduces an approach to merging art with chemoinformatics through visual and artistic representations of chemical space. As case studies, we portray the chemical space of food chemicals and other compounds to generate visually appealing graphs with twofold benefits: sharing chemical knowledge and developing pieces of art driven by chemoinformatics. The art driven by chemical space visualization will help increase the application of chemistry and art and contribute to general education and dissemination of chemoinformatics and chemistry through artistic expressions. All the code and data sets to reproduce the visual representation of the chemical space presented in the manuscript are freely available at https://github.com/DIFACQUIM/Art-Driven-by-Visual-Representations-of-Chemical-Space- . Scientific contribution: Chemical space as a concept to create digital art and as a tool to train and introduce students to cheminformatics.
Affiliation(s)
- Daniela Gaytán-Hernández
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- Ana L Chávez-Hernández
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, 07000, Mexico City, Mexico
- Jazmín Miranda-Salas
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- Fernanda I Saldívar-González
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
13
|
Kerstjens A, De Winter H. A molecule perturbation software library and its application to study the effects of molecular design constraints. J Cheminform 2023; 15:89. [PMID: 37752561 PMCID: PMC10523775 DOI: 10.1186/s13321-023-00761-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Accepted: 09/15/2023] [Indexed: 09/28/2023] Open
Abstract
Computational molecular design can yield chemically unreasonable compounds when performed carelessly. A popular strategy to mitigate this risk is mimicking reference chemistry. This is commonly achieved by restricting the way in which molecules are constructed or modified. While it is well established that such an approach helps in designing chemically appealing molecules, concerns about these restrictions impacting chemical space exploration negatively linger. In this work we present a software library for constrained graph-based molecule manipulation and showcase its functionality by developing a molecule generator. Said generator designs molecules mimicking reference chemical features of differing granularity. We find that restricting molecular construction lightly, beyond the usual positive effects on drug-likeness and synthesizability of designed molecules, provides guidance to optimization algorithms navigating chemical space. Nonetheless, restricting molecular construction excessively can indeed hinder effective chemical space exploration.
Affiliation(s)
- Alan Kerstjens
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium
- Hans De Winter
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium

14
López-Pérez K, López-López E, Medina-Franco JL, Miranda-Quintana RA. Sampling and Mapping Chemical Space with Extended Similarity Indices. Molecules 2023; 28:6333. [PMID: 37687162 PMCID: PMC10489020 DOI: 10.3390/molecules28176333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 08/24/2023] [Accepted: 08/26/2023] [Indexed: 09/10/2023] Open
Abstract
Visualization of the chemical space is useful in many aspects of chemistry, including compound library design, diversity analysis, and exploring structure-property relationships, to name a few. Examples of notable research areas where the visualization of chemical space has strong applications are drug discovery and natural product research. However, the sheer volume of even comparatively small sub-sections of chemical space implies that we need to use approximations at the time of navigating through chemical space. ChemMaps is a visualization methodology that approximates the distribution of compounds in large datasets based on the selection of satellite compounds that yield a similar mapping of the whole dataset when principal component analysis on a similarity matrix is performed. Here, we show how the recently proposed extended similarity indices can help find regions that are relevant to sample satellites and reduce the amount of high-dimensional data needed to describe a library's chemical space.
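The satellite idea described in this abstract can be illustrated with a minimal sketch. It assumes fingerprints are stored as sets of on-bit indices and uses Tanimoto similarity. Both choices, the function names, and the toy data are illustrative assumptions rather than the ChemMaps implementation. The sketch builds only the reduced N x k similarity matrix against k satellites; the subsequent principal component analysis and the extended-similarity-based satellite selection are not shown.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(a | b)


def satellite_similarity_matrix(fingerprints, satellite_idx):
    """Return an N x k matrix: similarity of every compound to k satellites.

    This replaces the full N x N similarity matrix with a much smaller one.
    """
    satellites = [fingerprints[i] for i in satellite_idx]
    return [[tanimoto(fp, s) for s in satellites] for fp in fingerprints]


# Hypothetical toy fingerprints (sets of on-bit positions).
fps = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({7, 8}),
    frozenset({1, 2, 3, 4}),
]

# Compounds 0 and 2 are picked as satellites purely for illustration.
matrix = satellite_similarity_matrix(fps, satellite_idx=[0, 2])
for row in matrix:
    print(row)
```

With four compounds and two satellites the matrix has eight entries instead of sixteen; the saving grows with library size because k stays small while N increases.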
Affiliation(s)
- Kenneth López-Pérez
- Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico
- José L. Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico

15
Blanchard AE, Bhowmik D, Fox Z, Gounley J, Glaser J, Akpa BS, Irle S. Adaptive language model training for molecular design. J Cheminform 2023; 15:59. [PMID: 37291633 PMCID: PMC10249556 DOI: 10.1186/s13321-023-00719-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2022] [Accepted: 04/03/2023] [Indexed: 06/10/2023] Open
Abstract
The vast size of chemical space necessitates computational approaches to automate and accelerate the design of molecular sequences to guide experimental efforts for drug discovery. Genetic algorithms provide a useful framework to incrementally generate molecules by applying mutations to known chemical structures. Recently, masked language models have been applied to automate the mutation process by leveraging large compound libraries to learn commonly occurring chemical sequences (i.e., using tokenization) and predict rearrangements (i.e., using mask prediction). Here, we consider how language models can be adapted to improve molecule generation for different optimization tasks. We use two different generation strategies for comparison, fixed and adaptive. The fixed strategy uses a pre-trained model to generate mutations; the adaptive strategy trains the language model on each new generation of molecules selected for target properties during optimization. Our results show that the adaptive strategy allows the language model to more closely fit the distribution of molecules in the population. Therefore, for enhanced fitness optimization, we suggest the use of the fixed strategy during an initial phase followed by the use of the adaptive strategy. We demonstrate the impact of adaptive training by searching for molecules that optimize both heuristic metrics, drug-likeness and synthesizability, as well as predicted protein binding affinity from a surrogate model. Our results show that the adaptive strategy provides a significant improvement in fitness optimization compared to the fixed pre-trained model, empowering the application of language models to molecular design tasks.
Affiliation(s)
- Andrew E Blanchard
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Debsindhu Bhowmik
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Zachary Fox
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- John Gounley
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Jens Glaser
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Belinda S Akpa
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Chemical & Biomolecular Engineering, University of Tennessee, Knoxville, TN, 37996, USA
- Stephan Irle
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA

16
Cytter Y, Nandy A, Duan C, Kulik HJ. Insights into the deviation from piecewise linearity in transition metal complexes from supervised machine learning models. Phys Chem Chem Phys 2023; 25:8103-8116. [PMID: 36876903 DOI: 10.1039/d3cp00258f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) with density functional theory (DFT) suffer from inaccuracies from the underlying density functional approximation (DFA). Many of these inaccuracies can be traced to the lack of derivative discontinuity that leads to a curvature in the energy with electron addition or removal. Over a dataset of nearly one thousand transition metal complexes typical of VHTS applications, we computed and analyzed the average curvature (i.e., deviation from piecewise linearity) for 23 density functional approximations spanning multiple rungs of "Jacob's ladder". While we observe the expected dependence of the curvatures on Hartree-Fock exchange, we note limited correlation of curvature values between different rungs of "Jacob's ladder". We train ML models (i.e., artificial neural networks or ANNs) to predict the curvature and the associated frontier orbital energies for each of these 23 functionals and then interpret differences in curvature among the different DFAs through analysis of the ML models. Notably, we observe spin to play a much more important role in determining the curvature of range-separated and double hybrids in comparison to semi-local functionals, explaining why curvature values are weakly correlated between these and other families of functionals. Over a space of 187.2k hypothetical compounds, we use our ANNs to pinpoint DFAs for which representative transition metal complexes have near-zero curvature with low uncertainty, demonstrating an approach to accelerate screening of complexes with targeted optical gaps.
Affiliation(s)
- Yael Cytter
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

17
Zhang S, Xu L, Li S, Oliveira JCA, Li X, Ackermann L, Hong X. Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis. Chemistry 2023; 29:e202202834. [PMID: 36206170 PMCID: PMC10099903 DOI: 10.1002/chem.202202834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Indexed: 11/29/2022]
Abstract
Recent years have witnessed a boom of machine learning (ML) applications in chemistry, which reveals the potential of data-driven prediction of synthesis performance. Digitalization and ML modelling are the key strategies to fully exploit the unique potential within the synergistic interplay between experimental data and the robust prediction of performance and selectivity. A series of exciting studies have demonstrated the importance of chemical knowledge implementation in ML, which improves the model's capability for making predictions that are challenging and often go beyond the abilities of human beings. This Minireview summarizes the cutting-edge embedding techniques and model designs in synthetic performance prediction, elaborating how chemical knowledge can be incorporated into machine learning until June 2022. By merging organic synthesis tactics and chemical informatics, we hope this Review can provide a guide map and intrigue chemists to revisit the digitalization and computerization of organic chemistry principles.
Affiliation(s)
- Shuo‐Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- Li‐Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- Shu‐Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- João C. A. Oliveira
- Institut für Organische und Biomolekulare Chemie, Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität, Tammannstraße 2, 37077 Göttingen, Germany
- Xin Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- Lutz Ackermann
- Institut für Organische und Biomolekulare Chemie, Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität, Tammannstraße 2, 37077 Göttingen, Germany
- Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, 38 Zheda Road, Hangzhou 310027, P. R. China
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing 100190, P. R. China
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, P. R. China

18
López-López E, Medina-Franco JL. Towards Decoding Hepatotoxicity of Approved Drugs through Navigation of Multiverse and Consensus Chemical Spaces. Biomolecules 2023; 13:biom13010176. [PMID: 36671561 PMCID: PMC9855470 DOI: 10.3390/biom13010176] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/22/2022] [Revised: 01/11/2023] [Accepted: 01/12/2023] [Indexed: 01/18/2023] Open
Abstract
Drug-induced liver injury (DILI) is the principal reason for failure in developing drug candidates. It is also the most common reason for withdrawing a drug from the market after it has been approved for clinical use. In this context, data from animal models, liver function tests, and chemical properties could complement each other to understand DILI events better and prevent them. Because the chemical space concept supports decision-making in drug design, including the prediction of structure-property relationships, side effects, and polypharmacology (to mention only the most recent advances), it is an attractive approach for combining the different phenomena influencing DILI events (e.g., individual "chemical spaces") and exploring all of them simultaneously in an integrated analysis of the DILI-relevant chemical space. However, no systematic method currently allows a collection of different chemical spaces, built from different types of data, to be fused into a single chemical space representation, namely a "consensus chemical space." This study is the first report that implements data fusion to consider different criteria simultaneously to facilitate the analysis of DILI-related events. In particular, the study highlights the importance of analyzing in vitro and chemical data together (e.g., topology, bond order, atom types, presence of rings, ring sizes, and aromaticity of compounds encoded in RDKit fingerprints). These properties could help improve the understanding of DILI events.
Affiliation(s)
- Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Department of Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV), Mexico City 07360, Mexico
- Correspondence: (E.L.-L.); (J.L.M.-F.)
- José L. Medina-Franco
- Department of Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV), Mexico City 07360, Mexico
- Correspondence: (E.L.-L.); (J.L.M.-F.)

19
Blanchard AE, Gounley J, Bhowmik D, Chandra Shekar M, Lyngaas I, Gao S, Yin J, Tsaris A, Wang F, Glaser J. Language models for the prediction of SARS-CoV-2 inhibitors. The International Journal of High Performance Computing Applications 2022; 36:587-602. [PMID: 38603308 PMCID: PMC9548488 DOI: 10.1177/10943420221121804] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 06/10/2023]
Abstract
The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
Affiliation(s)
- John Gounley
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Shang Gao
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Junqi Yin
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Feiyi Wang
- Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jens Glaser
- Oak Ridge National Laboratory, Oak Ridge, TN, USA

20
Medina‐Franco JL, Chávez‐Hernández AL, López‐López E, Saldívar‐González FI. Chemical Multiverse: An Expanded View of Chemical Space. Mol Inform 2022; 41:e2200116. [PMID: 35916110 PMCID: PMC9787733 DOI: 10.1002/minf.202200116] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 05/19/2022] [Accepted: 08/01/2022] [Indexed: 12/30/2022]
Abstract
Technological advances and practical applications of the chemical space concept in drug discovery, natural product research, and other research areas have attracted the scientific community's attention. Large and ultra-large chemical spaces are associated with a significant increase in the number of compounds that can potentially be made, and with the growing number of experimental and calculated descriptors that encode the structure and/or properties of molecules. Given the importance and continued evolution of compound libraries, we discuss definitions of chemical space proposed in the literature and emphasize the value, also noted in the literature, of using complementary descriptors to obtain a comprehensive view of the chemical space of compound data sets. In this regard, we introduce the term chemical multiverse to refer to the comprehensive analysis of compound data sets through several chemical spaces, each defined by a different set of chemical representations. The chemical multiverse is contrasted with a related idea: consensus chemical space.
Affiliation(s)
- José L. Medina‐Franco
- DIFACQUIM research group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Ana L. Chávez‐Hernández
- DIFACQUIM research group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Edgar López‐López
- Department of Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV), Mexico City 07360, Mexico
- Fernanda I. Saldívar‐González
- DIFACQUIM research group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico

21
Kuzhagaliyeva N, Horváth S, Williams J, Nicolle A, Sarathy SM. Artificial intelligence-driven design of fuel mixtures. Commun Chem 2022; 5:111. [PMID: 36697675 PMCID: PMC9814251 DOI: 10.1038/s42004-022-00722-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/22/2022] [Accepted: 08/15/2022] [Indexed: 01/28/2023] Open
Abstract
High-performance fuel design is imperative to achieve cleaner burning and high-efficiency engine systems. We introduce a data-driven artificial intelligence (AI) framework to design liquid fuels exhibiting tailor-made properties for combustion engine applications to improve efficiency and lower carbon emissions. The fuel design approach is a constrained optimization task integrating two parts: (i) a deep learning (DL) model to predict the properties of pure components and mixtures and (ii) search algorithms to efficiently navigate in the chemical space. Our approach presents the mixture-hidden vector as a linear combination of each single component's vectors in each blend and incorporates it into the network architecture (the mixing operator (MO)). We demonstrate that the DL model exhibits similar accuracy as competing computational techniques in predicting the properties for pure components, while the search tool can generate multiple candidate fuel mixtures. The integrated framework was evaluated to showcase the design of high-octane and low-sooting tendency fuel that is subject to gasoline specification constraints. This AI fuel design methodology enables rapidly developing fuel formulations to optimize engine efficiency and lower emissions.
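The mixing operator mentioned in this abstract, in which the mixture-hidden vector is a linear combination of the component vectors, can be sketched as follows. The abstract does not state what the combination weights are; using blend fractions as weights is an assumption made here for illustration, and the function name and toy vectors are hypothetical.

```python
def mix_hidden_vectors(component_vectors, fractions):
    """Combine per-component hidden vectors into one mixture vector.

    Computes h_mix = sum_i x_i * h_i, where x_i is the weight of component i.
    Treating x_i as a blend fraction is an assumption, not taken from the paper.
    """
    if len(component_vectors) != len(fractions):
        raise ValueError("need exactly one fraction per component vector")
    dim = len(component_vectors[0])
    if any(len(vec) != dim for vec in component_vectors):
        raise ValueError("all component vectors must have the same length")

    mixed = [0.0] * dim
    for vec, frac in zip(component_vectors, fractions):
        for j, value in enumerate(vec):
            mixed[j] += frac * value
    return mixed


# Hypothetical two-component blend with 2-dimensional hidden vectors.
h_mix = mix_hidden_vectors([[1.0, 0.0], [0.0, 2.0]], [0.25, 0.75])
print(h_mix)
```

Because the operation is a weighted sum, the result does not depend on the order in which components are listed, which suits a blend with no natural component ordering.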
Affiliation(s)
- Nursulu Kuzhagaliyeva
- Clean Combustion Research Center (CCRC), Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
- Samuel Horváth
- Visual Computing Center (VCC), Computer, Electrical and Mathematical Sciences & Engineering Division, KAUST, Thuwal, 23955-6900, Saudi Arabia
- Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- John Williams
- Aramco Fuel Research Center, 232 Avenue Bonaparte, Rueil-Malmaison, 92852, France
- Andre Nicolle
- Aramco Fuel Research Center, 232 Avenue Bonaparte, Rueil-Malmaison, 92852, France
- S Mani Sarathy
- Clean Combustion Research Center (CCRC), Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia

22
A pocket-based 3D molecule generative model fueled by experimental electron density. Sci Rep 2022; 12:15100. [PMID: 36068257 PMCID: PMC9448726 DOI: 10.1038/s41598-022-19363-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/17/2022] [Accepted: 08/29/2022] [Indexed: 11/08/2022] Open
Abstract
We report for the first time the use of experimental electron density (ED) as training data for the generation of drug-like three-dimensional molecules based on the structure of a target protein pocket. Similar to a structural biologist building molecules based on their ED, our model functions with two main components: a generative adversarial network (GAN) to generate the ligand ED in the input pocket and an ED interpretation module for molecule generation. The model was tested on three targets: a kinase (hematopoietic progenitor kinase 1), protease (SARS-CoV-2 main protease), and nuclear receptor (vitamin D receptor), and evaluated with a reference dataset composed of over 8000 compounds that have their activities reported in the literature. The evaluation considered the chemical validity, chemical space distribution-based diversity, and similarity with reference active compounds concerning the molecular structure and pocket-binding mode. Our model can generate molecules with similar structures to classical active compounds and novel compounds sharing similar binding modes with active compounds, making it a promising tool for library generation supporting high-throughput virtual screening. The ligand ED generated can also be used to support fragment-based drug design. Our model is available as an online service to academic users via https://edmg.stonewise.cn/#/create .

23
Wang J, Wang X, Sun H, Wang M, Zeng Y, Jiang D, Wu Z, Liu Z, Liao B, Yao X, Hsieh CY, Cao D, Chen X, Hou T. ChemistGA: A Chemical Synthesizable Accessible Molecular Generation Algorithm for Real-World Drug Discovery. J Med Chem 2022; 65:12482-12496. [PMID: 36065998 DOI: 10.1021/acs.jmedchem.2c01179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Many deep learning (DL)-based molecular generative models have been proposed to design novel molecules. These models may perform well on benchmarks, but they usually do not take real-world constraints into account, such as available training data set, synthetic accessibility, and scaffold diversity in drug discovery. In this study, a new algorithm, ChemistGA, was proposed by combining the traditional heuristic algorithm with DL, in which the crossover of the traditional genetic algorithm (GA) was redefined by DL in conjunction with GA, and an innovative backcrossing operation was implemented to generate desired molecules. Our results clearly show that ChemistGA not only retains the strength of the traditional GA but also greatly enhances the synthetic accessibility and success rate of the generated molecules with desired properties. Calculations on the two benchmarks illustrate that ChemistGA achieves impressive performance among the state-of-the-art baselines, and it opens a new avenue for the application of generative models to real-world drug discovery scenarios.
Affiliation(s)
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, P. R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Xiaorui Wang
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau (SAR), P. R. China
- Huiyong Sun
- Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Yundian Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Zeyi Liu
- DAMTP, Centre for Mathematical Sciences, University of Cambridge, Cambridge CB3 0WA, U.K.
- Ben Liao
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
- Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau (SAR), P. R. China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
- Xi Chen
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, P. R. China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China

24
Saldívar-González FI, Medina-Franco JL. Approaches for enhancing the analysis of chemical space for drug discovery. Expert Opin Drug Discov 2022; 17:789-798. [PMID: 35640229 DOI: 10.1080/17460441.2022.2084608] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/22/2022]
Abstract
INTRODUCTION: Chemical space is a powerful, general, and practical conceptual framework in drug discovery and other areas in chemistry that addresses the diversity of molecules and it has various applications. Moreover, chemical space is a cornerstone of chemoinformatics as a scientific discipline. In response to the increase in the set of chemical compounds in databases, generators of chemical structures, and tools to calculate molecular descriptors, novel approaches to generate visual representations of chemical space in low dimensions are emerging and evolving. Such approaches include a wide range of commercial and free applications, software, and open-source methods.
AREAS COVERED: The current state of chemical space in drug design and discovery is reviewed. The topics discussed herein include advances for efficient navigation in chemical space, the use of this concept in assessing the diversity of different data sets, exploring structure-property/activity relationships for one or multiple endpoints, and compound library design. Recent advances in methodologies for generating visual representations of chemical space have been highlighted, thereby emphasizing open-source methods.
EXPERT OPINION: Quantitative and qualitative generation and analysis of chemical space require novel approaches for handling the increasing number of molecules and their information available in chemical databases (including emerging ultra-large libraries). In addition, it is of utmost importance to note that chemical space is a conceptual framework that goes beyond visual representation in low dimensions. However, the graphical representation of chemical space has several practical applications in drug discovery and beyond.
Affiliation(s)
- Fernanda I Saldívar-González
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
| | - José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
| |
|
25
|
Modeling of the Crystallization Conditions for Organic Synthesis Product Purification Using Deep Learning. Electronics 2022. [DOI: 10.3390/electronics11091360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Crystallization is an important purification technique for solid products in a chemical laboratory. However, the correct selection of a solvent is important for the success of the procedure. In order to accelerate the solvent or solvent mixture search process, we offer an in silico alternative, i.e., a never previously demonstrated approach that can model the reaction mixture crystallization conditions which are invariant to the reaction type. The offered deep learning-based method is trained to directly predict the solvent labels used in the crystallization steps of the synthetic procedure. Our solvent label prediction task is a multi-label multi-class classification task during which the method must correctly choose one or several solvents from 13 possible examples. During the experimental investigation, we tested two multi-label classifiers (i.e., Feed-Forward and Long Short-Term Memory neural networks) applied on top of vectors. For the vectorization, we used two methods (i.e., extended-connectivity fingerprints and autoencoders) with various parameters. Our optimized technique was able to reach the accuracy of 0.870 ± 0.004 (which is 0.693 above the baseline) on the testing dataset. This allows us to assume that the proposed approach can help to accelerate manual R&D processes in chemical laboratories.
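The multi-label setup described in this abstract can be illustrated with a minimal sketch. It assumes a placeholder vocabulary of 13 solvent names (the abstract gives only the count) and an exact-match definition of accuracy, which the abstract does not confirm. The helper names `multi_hot` and `subset_accuracy` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Sequence

# Hypothetical vocabulary: the abstract gives the count (13) but not the names.
SOLVENTS: List[str] = [f"solvent_{i}" for i in range(13)]


def multi_hot(labels: Iterable[str], vocab: Sequence[str] = SOLVENTS) -> List[int]:
    """Encode a set of solvent labels as a 0/1 vector over the vocabulary."""
    chosen = set(labels)
    unknown = chosen - set(vocab)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in chosen else 0 for name in vocab]


def subset_accuracy(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose predicted vector matches the target exactly.

    Assumption: exact-match scoring; the paper may define accuracy differently.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


# Baseline implied by the reported figures: 0.870 is 0.693 above the baseline.
implied_baseline = round(0.870 - 0.693, 3)
```

Under these assumptions, the reported figures imply a baseline accuracy of about 0.177.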
|
26
|
Liu Y, Hong W, Cao B. MolNet-3D: Deep Learning of Molecular Representations and Properties from 3D Topography. Advanced Theory and Simulations 2022. [DOI: 10.1002/adts.202200037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
Affiliation(s)
- Yuanbin Liu
- Key Laboratory for Thermal Science and Power Engineering of Ministry of Education Department of Engineering Mechanics Tsinghua University Beijing 100084 China
| | - Weixiang Hong
- Institute of Systems Science National University of Singapore Singapore 119615 Singapore
| | - Bingyang Cao
- Key Laboratory for Thermal Science and Power Engineering of Ministry of Education Department of Engineering Mechanics Tsinghua University Beijing 100084 China
| |
|
27
|
Duan C, Nandy A, Kulik HJ. Machine Learning for the Discovery, Design, and Engineering of Materials. Annu Rev Chem Biomol Eng 2022; 13:405-429. [PMID: 35320698 DOI: 10.1146/annurev-chembioeng-092320-120230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) has become a part of the fabric of high-throughput screening and computational discovery of materials. Despite its increasingly central role, challenges remain in fully realizing the promise of ML. This is especially true for the practical acceleration of the engineering of robust materials and the development of design strategies that surpass trial and error or high-throughput screening alone. Depending on the quantity being predicted and the experimental data available, ML can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance. We cover recent advances in algorithms and in their application that are starting to make inroads toward (a) the discovery of new materials through large-scale enumerative screening, (b) the design of materials through identification of rules and principles that govern materials properties, and (c) the engineering of practical materials by satisfying multiple objectives. We conclude with opportunities for further advancement to realize ML as a widespread tool for practical computational materials design.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
28
|
A Survey of Datasets, Preprocessing, Modeling Mechanisms, and Simulation Tools Based on AI for Material Analysis and Discovery. Materials 2022; 15:1428. [PMID: 35207968 PMCID: PMC8875409 DOI: 10.3390/ma15041428] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/18/2021] [Revised: 01/17/2022] [Accepted: 01/26/2022] [Indexed: 02/01/2023]
Abstract
Research has become increasingly more interdisciplinary over the past few years. Artificial intelligence and its sub-fields have proven valuable for interdisciplinary research applications, especially physical sciences. Recently, machine learning-based mechanisms have been adapted for material science applications, meeting traditional experiments' challenges in a time and cost-efficient manner. The scientific community focuses on harnessing varying mechanisms to process big data sets extracted from material databases to derive hidden knowledge that can successfully be employed in technical frameworks of material screening, selection, and recommendation. However, a plethora of underlying aspects of the existing material discovery methods needs to be critically assessed to have a precise and collective analysis that can serve as a baseline for various forthcoming material discovery problems. This study presents a comprehensive survey of state-of-the-art benchmark data sets, detailed pre-processing and analysis, appropriate learning model mechanisms, and simulation techniques for material discovery. We believe that such an in-depth analysis of the mentioned aspects provides promising directions to the young interdisciplinary researchers from computing and material science fields. This study will help devise useful modeling in the materials discovery to positively contribute to the material industry, reducing the manual effort involved in the traditional material discovery. Moreover, we also present a detailed analysis of experimental and computation-based artificial intelligence mechanisms suggested by the existing literature.
|
29
|
On the Investigation of Effective Factors on Electronic Structure Properties of Transition Metal Complexes: Robust Modeling Using GPR Approach. International Journal of Chemical Engineering 2022. [DOI: 10.1155/2022/8264297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Materials discovery is usually done using high-throughput computational screening. Costly and complex direct density functional theory (DFT) simulations are commonly used to determine subtle trends in spin-state ordering and bonding of inorganic materials and, in general, to predict the electronic structure properties of transition metal complexes. A Gaussian process regression (GPR) framework consisting of four kernel functions is introduced for spin-state splitting estimation through inorganic chemistry-appropriate empirical inputs. To this end, the present study reviewed an extensive range of data values from earlier works. According to statistical analysis, the GPR model showed very good performance. The coefficients of determination were calculated to be 0.986 for the exponential and Matern kernel functions, suggesting the highest predictive power of these methods. Moreover, the sensitivity of output to inputs was measured. Artificial intelligence (AI) helped accurately predict the target values through various input ranges.
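For readers unfamiliar with the quantities named in this abstract, the following sketch writes out the exponential kernel, a Matern kernel, and the coefficient of determination in their standard textbook forms. The smoothness value (nu = 3/2) and the length-scale parameter are assumptions, since the abstract does not state which Matern variant or hyperparameters were used. The function names are illustrative only.

```python
import math
from typing import Sequence


def exponential_kernel(r: float, length_scale: float = 1.0) -> float:
    """Exponential kernel k(r) = exp(-r / l) for a distance r >= 0."""
    return math.exp(-r / length_scale)


def matern32_kernel(r: float, length_scale: float = 1.0) -> float:
    """Matern kernel with nu = 3/2 (an assumed choice of smoothness)."""
    s = math.sqrt(3.0) * r / length_scale
    return (1.0 + s) * math.exp(-s)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot
```

Both kernels equal 1 at zero distance and decay as the distance grows. A model that always predicts the mean of the targets scores an R^2 of 0, which puts the reported 0.986 in context.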
|
30
|
Harper DR, Nandy A, Arunachalam N, Duan C, Janet JP, Kulik HJ. Representations and strategies for transferable machine learning improve model performance in chemical discovery. J Chem Phys 2022; 156:074101. [DOI: 10.1063/5.0082964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Affiliation(s)
- Daniel R Harper
- Massachusetts Institute of Technology, United States of America
| | - Aditya Nandy
- Massachusetts Institute of Technology, United States of America
| | | | - Chenru Duan
- Massachusetts Institute of Technology, United States of America
| | | | - Heather J. Kulik
- Dept of Chemical Engineering, Massachusetts Institute of Technology, United States of America
| |
|
31
|
Kerstjens A, De Winter H. LEADD: Lamarckian evolutionary algorithm for de novo drug design. J Cheminform 2022; 14:3. [PMID: 35033209 PMCID: PMC8760751 DOI: 10.1186/s13321-022-00582-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/11/2021] [Accepted: 12/30/2021] [Indexed: 11/10/2022]
Abstract
Given an objective function that predicts key properties of a molecule, goal-directed de novo molecular design is a useful tool to identify molecules that maximize or minimize said objective function. Nonetheless, a common drawback of these methods is that they tend to design synthetically unfeasible molecules. In this paper we describe a Lamarckian evolutionary algorithm for de novo drug design (LEADD). LEADD attempts to strike a balance between optimization power, synthetic accessibility of designed molecules and computational efficiency. To increase the likelihood of designing synthetically accessible molecules, LEADD represents molecules as graphs of molecular fragments, and limits the bonds that can be formed between them through knowledge-based pairwise atom type compatibility rules. A reference library of drug-like molecules is used to extract fragments, fragment preferences and compatibility rules. A novel set of genetic operators that enforce these rules in a computationally efficient manner is presented. To sample chemical space more efficiently we also explore a Lamarckian evolutionary mechanism that adapts the reproductive behavior of molecules. LEADD has been compared to both standard virtual screening and a comparable evolutionary algorithm using a standardized benchmark suite and was shown to be able to identify fitter molecules more efficiently. Moreover, the designed molecules are predicted to be easier to synthesize than those designed by other evolutionary algorithms.
Affiliation(s)
- Alan Kerstjens
- Department of Pharmaceutical Sciences, Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, University of Antwerp, Universiteitsplein 1A, 2610, Wilrijk, Belgium
| | - Hans De Winter
- Department of Pharmaceutical Sciences, Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, University of Antwerp, Universiteitsplein 1A, 2610, Wilrijk, Belgium.
| |
|
32
|
Abstract
Artificial intelligence (AI) tools find increasing application in drug discovery supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is the application in molecular generation with the aid of deep neural networks (DNN). We present a historical overview of the main advances in the field. We analyze the concepts of distribution and goal-directed learning and then highlight some of the recent applications of generative models in drug design with a focus into research work from the biopharmaceutical industry. We present in some more detail REINVENT which is an open-source software developed within our group in AstraZeneca and the main platform for AI molecular design support for a number of medicinal chemistry projects in the company and we also demonstrate some of our work in library design. Finally, we present some of the main challenges in the application of AI in Drug Discovery and different approaches to respond to these challenges which define areas for current and future work.
|
33
|
Kim H, Na J, Lee WB. Generative Chemical Transformer: Neural Machine Learning of Molecular Geometric Structures from Chemical Language via Attention. J Chem Inf Model 2021; 61:5804-5814. [PMID: 34855384 DOI: 10.1021/acs.jcim.1c01289] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/29/2022]
Abstract
Discovering new materials better suited to specific purposes is an important issue in improving the quality of human life. Here, a neural network that creates molecules that meet some desired multiple target conditions based on a deep understanding of chemical language is proposed (generative chemical Transformer, GCT). The attention mechanism in GCT allows a deeper understanding of molecular structures beyond the limitations of chemical language itself which cause semantic discontinuity by paying attention to characters sparsely. The significance of language models for inverse molecular design problems is investigated by quantitatively evaluating the quality of the generated molecules. GCT generates highly realistic chemical strings that satisfy both chemical and linguistic grammar rules. Molecules parsed from the generated strings simultaneously satisfy the multiple target properties and vary for a single condition set. These advances will contribute to improving the quality of human life by accelerating the process of desired material discovery.
Affiliation(s)
- Hyunseung Kim
- School of Chemical and Biological Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Jonggeol Na
- Department of Chemical Engineering and Materials Science, Graduate Program in System Health Science and Engineering, Ewha Womans University, Seoul 03760, Republic of Korea
| | - Won Bo Lee
- School of Chemical and Biological Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, Republic of Korea
| |
|
34
|
Ueda H, Wipf P, Nakamura H. Synthesis of sp3-rich chiral bicyclo[3.3.1]nonanes for chemical space expansion and study of biological activities. Bioorg Med Chem 2021; 54:116561. [PMID: 34920311 DOI: 10.1016/j.bmc.2021.116561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Revised: 12/04/2021] [Accepted: 12/07/2021] [Indexed: 11/30/2022]
Abstract
Chiral sp3-rich bicyclo[3.3.1]nonane scaffolds 10-12 were synthesized as single diastereomers from aldehyde 9, which was prepared from 4,4-dimethoxycyclohexa-2,5-dienone through a copper-catalyzed enantioselective reduction. Three different types of intramolecular addition reactions were studied: SmI2-mediated reductive cyclization, base-promoted aldol reaction, and one-pot Mannich reaction. We succeeded in introducing three side-chains to scaffold 11 and construct an sp3-rich compound library in both enantiomeric variants by simply changing the chirality of the ligands. The biological evaluation revealed that all synthesized compounds exhibited a concentration-dependent inhibition of hypoxia-inducible factor-1 (HIF-1) transcriptional activity, with IC50 values in the range of 17.2-31.7 µM, whereas their effects on cell viability were varied (IC50 = 3.5 to > 100 µM). The most active compound 16f inhibits the accumulation of HIF-1α protein and mRNA in hypoxia, indicating that it has a mechanism of action distinctly different from other known compounds bearing the common bicyclo[3.3.1]nonane skeleton.
Affiliation(s)
- Hiroki Ueda
- School of Life Science and Technology, Tokyo Institute of Technology, 4259 Nagatsuta‑cho, Midori‑ku, Yokohama 226‑8501, Japan
| | - Peter Wipf
- Department of Chemistry, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| | - Hiroyuki Nakamura
- School of Life Science and Technology, Tokyo Institute of Technology, 4259 Nagatsuta‑cho, Midori‑ku, Yokohama 226‑8501, Japan; Laboratory for Chemistry and Life Science, Institute of Innovative Research, Tokyo Institute of Technology, 4259 Nagatsuta‑cho, Midori‑ku, Yokohama 226‑8503, Japan.
| |
|
35
|
Joshi RP, Kumar N. Artificial Intelligence for Autonomous Molecular Design: A Perspective. Molecules 2021; 26:6761. [PMID: 34833853 PMCID: PMC8619999 DOI: 10.3390/molecules26226761] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/16/2021] [Revised: 10/23/2021] [Accepted: 10/29/2021] [Indexed: 11/23/2022]
Abstract
Domain-aware artificial intelligence has been increasingly adopted in recent years to expedite molecular design in various applications, including drug design and discovery. Recent advances in areas such as physics-informed machine learning and reasoning, software engineering, high-end hardware development, and computing infrastructures are providing opportunities to build scalable and explainable AI molecular discovery systems. This could improve a design hypothesis through feedback analysis, data integration that can provide a basis for the introduction of end-to-end automation for compound discovery and optimization, and enable more intelligent searches of chemical space. Several state-of-the-art ML architectures are predominantly and independently used for predicting the properties of small molecules, their high throughput synthesis, and screening, iteratively identifying and optimizing lead therapeutic candidates. However, such deep learning and ML approaches also raise considerable conceptual, technical, scalability, and end-to-end error quantification challenges, as well as skepticism about the current AI hype to build automated tools. To this end, synergistically and intelligently using these individual components along with robust quantum physics-based molecular representation and data generation tools in a closed-loop holds enormous promise for accelerated therapeutic design to critically analyze the opportunities and challenges for their more widespread application. This article aims to identify the most recent technology and breakthrough achieved by each of the components and discusses how such autonomous AI and ML workflows can be integrated to radically accelerate the protein target or disease model-based probe design that can be iteratively validated experimentally. Taken together, this could significantly reduce the timeline for end-to-end therapeutic discovery and optimization upon the arrival of any novel zoonotic transmission event. 
Our article serves as a guide for medicinal, computational chemistry and biology, analytical chemistry, and the ML community to practice autonomous molecular design in precision medicine and drug discovery.
Affiliation(s)
| | - Neeraj Kumar
- Computational Biology Group, Biological Science Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA;
| |
|
36
|
Takács G, Sándor M, Szalai Z, Kiss R, Balogh GT. Analysis of the uncharted, druglike property space by self-organizing maps. Mol Divers 2021; 26:2427-2441. [PMID: 34709525 PMCID: PMC9532340 DOI: 10.1007/s11030-021-10343-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/04/2021] [Accepted: 10/15/2021] [Indexed: 12/29/2022]
Abstract
Physicochemical properties are fundamental to predict the pharmacokinetic and pharmacodynamic behavior of drug candidates. Easily calculated descriptors such as molecular weight and logP have been found to correlate with the success rate of clinical trials. These properties have been previously shown to highlight a sweet-spot in the chemical space associated with favorable pharmacokinetics, which is superior against other regions during hit identification and optimization. In this study, we applied self-organizing maps (SOMs) trained on sixteen calculated properties of a subset of known drugs for the analysis of commercially available compound databases, as well as public biological and chemical databases frequently used for drug discovery. Interestingly, several regions of the property space have been identified that are highly overrepresented by commercially available chemical libraries, while we found almost completely unoccupied regions of the maps (commercially neglected chemical space resembling the properties of known drugs). Moreover, these underrepresented portions of the chemical space are compatible with most rigorous property filters applied by the pharma industry in medicinal chemistry optimization programs. Our results suggest that SOMs may be directly utilized in the strategy of library design for drug discovery to sample previously unexplored parts of the chemical space to aim at yet-undruggable targets.
Collapse
Affiliation(s)
- Gergely Takács
- Department of Chemical and Environmental Process Engineering, Budapest University of Technology and Economics, Műegyetem rakpart 3, Budapest, 1111, Hungary
- Mcule.com Kft, Bartók Béla út 105-113, Budapest, 1115, Hungary
| | - Márk Sándor
- Mcule.com Kft, Bartók Béla út 105-113, Budapest, 1115, Hungary
| | - Zoltán Szalai
- Mcule.com Kft, Bartók Béla út 105-113, Budapest, 1115, Hungary
| | - Róbert Kiss
- Mcule.com Kft, Bartók Béla út 105-113, Budapest, 1115, Hungary.
| | - György T Balogh
- Department of Chemical and Environmental Process Engineering, Budapest University of Technology and Economics, Műegyetem rakpart 3, Budapest, 1111, Hungary.
- Department of Pharmacodynamics and Biopharmacy, University of Szeged, Szeged, 6720, Hungary.
| |
|
37
|
Taylor MG, Nandy A, Lu CC, Kulik HJ. Deciphering Cryptic Behavior in Bimetallic Transition-Metal Complexes with Machine Learning. J Phys Chem Lett 2021; 12:9812-9820. [PMID: 34597514 DOI: 10.1021/acs.jpclett.1c02852] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
We demonstrate an alternative, data-driven approach to uncovering structure-property relationships for the rational design of heterobimetallic transition-metal complexes that exhibit metal-metal bonding. We tailor graph-based representations of the metal-local environment for these complexes for use in multiple linear regression and kernel ridge regression (KRR) models. We curate a set of 28 experimentally characterized complexes to develop a multiple linear regression model for oxidation potentials. We achieve good accuracy (mean absolute error of 0.25 V) and preserve transferability to unseen experimental data with a new ligand structure. We also train a KRR model on a subset of 330 structurally characterized heterobimetallics to predict the degree of metal-metal bonding. This KRR model predicts relative metal-metal bond lengths in the test set to within 5%, and analysis of key features reveals the fundamental atomic contributions (e.g., the valence electron configuration) that most strongly influence the behavior of these complexes. Our work provides guidance for rational bimetallic design, suggesting that properties, including the formal shortness ratio, should be transferable from one period to another.
Affiliation(s)
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Connie C Lu
- Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
38
|
Duan C, Chen S, Taylor MG, Liu F, Kulik HJ. Machine learning to tame divergent density functional approximations: a new path to consensus materials design principles. Chem Sci 2021; 12:13021-13036. [PMID: 34745533 PMCID: PMC8513898 DOI: 10.1039/d1sc03701c] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/07/2021] [Accepted: 09/01/2021] [Indexed: 01/17/2023]
Abstract
Virtual high-throughput screening (VHTS) with density functional theory (DFT) and machine-learning (ML)-acceleration is essential in rapid materials discovery. By necessity, efficient DFT-based workflows are carried out with a single density functional approximation (DFA). Nevertheless, properties evaluated with different DFAs can be expected to disagree for cases with challenging electronic structure (e.g., open-shell transition-metal complexes, TMCs) for which rapid screening is most needed and accurate benchmarks are often unavailable. To quantify the effect of DFA bias, we introduce an approach to rapidly obtain property predictions from 23 representative DFAs spanning multiple families, “rungs” (e.g., semi-local to double hybrid) and basis sets on over 2000 TMCs. Although computed property values (e.g., spin state splitting and frontier orbital gap) differ by DFA, high linear correlations persist across all DFAs. We train independent ML models for each DFA and observe convergent trends in feature importance, providing DFA-invariant, universal design rules. We devise a strategy to train artificial neural network (ANN) models informed by all 23 DFAs and use them to predict properties (e.g., spin-splitting energy) of over 187k TMCs. By requiring consensus of the ANN-predicted DFA properties, we improve correspondence of computational lead compounds with literature-mined, experimental compounds over the typically employed single-DFA approach. Machine learning (ML)-based feature analysis reveals universal design rules regardless of density functional choices. Using the consensus among multiple functionals, we identify robust lead complexes in ML-accelerated chemical discovery.
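The idea of requiring consensus across functionals can be sketched as a simple filter. This is an illustration only: the abstract does not state the consensus criterion the authors used, so the target window, the `min_fraction` threshold, and the function name `consensus_leads` are all hypothetical.

```python
from typing import Dict, List


def consensus_leads(predictions: Dict[str, List[float]],
                    lower: float,
                    upper: float,
                    min_fraction: float = 1.0) -> List[str]:
    """Return compounds whose per-functional predictions agree on a target window.

    predictions maps a compound name to one predicted property value per
    functional. A compound is kept when at least min_fraction of those values
    lie in [lower, upper]. This criterion is an assumption, not the paper's.
    """
    leads = []
    for name, values in predictions.items():
        if not values:
            continue  # no predictions available for this compound
        in_window = sum(1 for v in values if lower <= v <= upper)
        if in_window / len(values) >= min_fraction:
            leads.append(name)
    return leads
```

With `min_fraction=1.0` every functional must agree; lowering it trades robustness for a larger candidate pool.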
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584; Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Shuxin Chen
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584; Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584
| |
|
39
|
Cincilla G, Masoni S, Blobel J. Individual and collective human intelligence in drug design: evaluating the search strategy. J Cheminform 2021; 13:80. [PMID: 34635158 PMCID: PMC8507178 DOI: 10.1186/s13321-021-00556-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Accepted: 09/18/2021] [Indexed: 11/10/2022]
Abstract
In recent years, individual and collective human intelligence, defined as the knowledge, skills, reasoning and intuition of individuals and groups, have been used in combination with computer algorithms to solve complex scientific problems. Such an approach was successfully used in different research fields such as structural biology, comparative genomics, macromolecular crystallography and RNA design. Herein we describe an attempt to use a similar approach in small-molecule drug discovery, specifically to drive search strategies of de novo drug design. This is assessed with a case study that consists of a series of public experiments in which participants had to explore the huge chemical space in silico to find predefined compounds by designing molecules and analyzing the score associated with them. Such a process may be seen as an instantaneous surrogate of the classical design-make-test cycles carried out by medicinal chemists during the drug discovery hit-to-lead phase but not hindered by long synthesis and testing times. We present first findings on (1) assessing human intelligence in chemical space exploration, (2) comparing individual and collective human intelligence performance in this task and (3) contrasting some human and artificial intelligence achievements in de novo drug design.
Affiliation(s)
- Giovanni Cincilla
- Molomics, Barcelona Science Park, c/Baldiri i Reixac 4-12, 08028, Barcelona, Spain.
| | - Simone Masoni
- Molomics, Barcelona Science Park, c/Baldiri i Reixac 4-12, 08028, Barcelona, Spain.
| | - Jascha Blobel
- Molomics, Barcelona Science Park, c/Baldiri i Reixac 4-12, 08028, Barcelona, Spain.
| |
|
40
|
Kalogirou AS, East MP, Laitinen T, Torrice CD, Maffuid KA, Drewry DH, Koutentis PA, Johnson GL, Crona DJ, Asquith CRM. Synthesis and Evaluation of Novel 1,2,6-Thiadiazinone Kinase Inhibitors as Potent Inhibitors of Solid Tumors. Molecules 2021; 26:5911. [PMID: 34641454 PMCID: PMC8513058 DOI: 10.3390/molecules26195911] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/06/2021] [Revised: 09/22/2021] [Accepted: 09/24/2021] [Indexed: 11/16/2022]
Abstract
A focused series of substituted 4H-1,2,6-thiadiazin-4-ones was designed and synthesized to probe the anti-cancer properties of this scaffold. Insights from previous kinase inhibitor programs were used to carefully select several different substitution patterns. Compounds were tested on bladder, prostate, pancreatic, breast, chordoma, and lung cancer cell lines with an additional skin fibroblast cell line as a toxicity control. This resulted in the identification of several low single-digit micromolar compounds with promising therapeutic windows, particularly for bladder and prostate cancer. A number of key structural features of the 4H-1,2,6-thiadiazin-4-one scaffold are discussed that show promising scope for future improvement.
Affiliation(s)
- Andreas S. Kalogirou
- Department of Life Sciences, School of Sciences, European University Cyprus, 6 Diogenis Str., Engomi, P.O. Box 22006, Nicosia 1516, Cyprus
- Department of Chemistry, University of Cyprus, P.O. Box 20537, Nicosia 1678, Cyprus
- Correspondence: (A.S.K.); (C.R.M.A.); Tel.: +357-22-559655 (A.S.K.); +1-919-491-3177 (C.R.M.A.)
| | - Michael P. East
- Department of Pharmacology, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA; (M.P.E.); (G.L.J.)
| | - Tuomo Laitinen
- School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, 70211 Kuopio, Finland
| | - Chad D. Torrice
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA; (C.D.T.); (K.A.M.); (D.J.C.)
| | - Kaitlyn A. Maffuid
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA; (C.D.T.); (K.A.M.); (D.J.C.)
| | - David H. Drewry
- Structural Genomics Consortium, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
- Lineberger Comprehensive Cancer Center, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
| | | | - Gary L. Johnson
- Department of Pharmacology, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA; (M.P.E.); (G.L.J.)
- Lineberger Comprehensive Cancer Center, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Daniel J. Crona
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA; (C.D.T.); (K.A.M.); (D.J.C.)
- Lineberger Comprehensive Cancer Center, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Christopher R. M. Asquith
- Department of Pharmacology, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA; (M.P.E.); (G.L.J.)
- Correspondence: (A.S.K.); (C.R.M.A.); Tel.: +357-22-559655 (A.S.K.); +1-919-491-3177 (C.R.M.A.)
|
41
|
Liu Y, Mathis C, Bajczyk MD, Marshall SM, Wilbraham L, Cronin L. Exploring and mapping chemical space with molecular assembly trees. Science Advances 2021; 7:eabj2465. [PMID: 34559562 PMCID: PMC8462901 DOI: 10.1126/sciadv.abj2465] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/29/2021] [Accepted: 08/03/2021] [Indexed: 06/01/2023]
Abstract
The rule-based search of chemical space can generate an almost infinite number of molecules, but exploration of known molecules as a function of the minimum number of steps needed to build up the target graphs promises to uncover new motifs and transformations. Assembly theory is an approach to compare the intrinsic complexity and properties of molecules by the minimum number of steps needed to build up the target graphs. Here, we apply this approach to prebiotic chemistry, gene sequences, plasticizers, and opiates. This allows us to explore molecules connected to the assembly tree, rather than the entire space of molecules possible. Last, by developing a reassembly method, based on assembly trees, we found that in the case of the opiates, a new set of drug candidates could be generated that would not be accessible via conventional fragment-based drug design, thereby demonstrating how this approach might find application in drug discovery.
Affiliation(s)
- Yu Liu
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
| | - Cole Mathis
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
| | | | - Stuart M. Marshall
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
| | - Liam Wilbraham
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
| | - Leroy Cronin
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
|
42
|
Webb EW, Scott PJH. Potential Applications of Artificial Intelligence and Machine Learning in Radiochemistry and Radiochemical Engineering. PET Clin 2021; 16:525-532. [PMID: 34537128 DOI: 10.1016/j.cpet.2021.06.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/26/2022]
Abstract
Artificial intelligence and machine learning are poised to disrupt PET imaging from bench to clinic. In this perspective, the authors offer insights into how the technology could be applied to improve the radiosynthesis of new radiopharmaceuticals for PET imaging, including identification of an optimal labeling approach as well as strategies for radiolabeling reaction optimization.
Affiliation(s)
- E William Webb
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Peter J H Scott
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA.
|
43
|
Sattari K, Xie Y, Lin J. Data-driven algorithms for inverse design of polymers. Soft Matter 2021; 17:7607-7622. [PMID: 34397078 DOI: 10.1039/d1sm00725d] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 06/13/2023]
Abstract
The ever-increasing demand for novel polymers with superior properties requires a deeper understanding and exploration of the chemical space. Recently, data-driven approaches to explore the chemical space for polymer design have emerged. Among them, inverse design strategies for designing polymers with specific properties have evolved to be a significant materials informatics platform by learning hidden knowledge from materials data as well as smartly navigating the chemical space in an optimized way. In this review, we first summarize the progress in the representation of polymers, a prerequisite step for the inverse design of polymers. Then, we systematically introduce three data-driven strategies implemented for the inverse design of polymers, i.e., high-throughput virtual screening, global optimization, and generative models. Finally, we discuss the challenges and opportunities of the data-driven strategies as well as optimization algorithms employed in the inverse design of polymers.
Affiliation(s)
- Kianoosh Sattari
- Department of Mechanical and Aerospace Engineering, University of Missouri, Columbia, MO 65211, USA.
|
44
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
|
45
|
Nandy A, Duan C, Taylor MG, Liu F, Steeves AH, Kulik HJ. Computational Discovery of Transition-metal Complexes: From High-throughput Screening to Machine Learning. Chem Rev 2021; 121:9927-10000. [PMID: 34260198 DOI: 10.1021/acs.chemrev.1c00347] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Indexed: 12/13/2022]
Abstract
Transition-metal complexes are attractive targets for the design of catalysts and functional materials. The behavior of the metal-organic bond, while very tunable for achieving target properties, is challenging to predict and necessitates searching a wide and complex space to identify needles in haystacks for target applications. This review will focus on the techniques that make high-throughput search of transition-metal chemical space feasible for the discovery of complexes with desirable properties. The review will cover the development, promise, and limitations of "traditional" computational chemistry (i.e., force field, semiempirical, and density functional theory methods) as it pertains to data generation for inorganic molecular discovery. The review will also discuss the opportunities and limitations in leveraging experimental data sources. We will focus on how advances in statistical modeling, artificial intelligence, multiobjective optimization, and automation accelerate discovery of lead compounds and design rules. The overall objective of this review is to showcase how bringing together advances from diverse areas of computational chemistry and computer science has enabled the rapid uncovering of structure-property relationships in transition-metal chemistry. We aim to highlight how unique considerations in motifs of metal-organic bonding (e.g., variable spin and oxidation state, and bonding strength/nature) set them and their discovery apart from more commonly considered organic molecules. We will also highlight how uncertainty and relative data scarcity in transition-metal chemistry motivate specific developments in machine learning representations, model training, and in computational chemistry. Finally, we will conclude with an outlook of areas of opportunity for the accelerated discovery of transition-metal complexes.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Adam H Steeves
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
46
|
|
47
|
Medina-Franco JL, Sánchez-Cruz N, López-López E, Díaz-Eufracio BI. Progress on open chemoinformatic tools for expanding and exploring the chemical space. J Comput Aided Mol Des 2021; 36:341-354. [PMID: 34143323 PMCID: PMC8211976 DOI: 10.1007/s10822-021-00399-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/10/2021] [Accepted: 06/14/2021] [Indexed: 01/10/2023]
Abstract
The concept of chemical space is a cornerstone in chemoinformatics, and it has broad conceptual and practical applicability in many areas of chemistry, including drug design and discovery. One of the most considerable impacts is in the study of structure-property relationships where the property can be a biological activity or any other characteristic of interest to a particular chemistry discipline. The chemical space is highly dependent on the molecular representation that is also a cornerstone concept in computational chemistry. Herein, we discuss the recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces. We emphasize the development of methods and freely available tools focusing on drug discovery applications. We also comment on the general advantages and shortcomings of using freely available and easy-to-use tools and discuss the value of using such open resources for research, education, and scientific dissemination.
Affiliation(s)
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico.
| | - Norberto Sánchez-Cruz
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
| | - Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Departamento de Química y Programa de Posgrado en Farmacología, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Apartado 14-740, 07000, Mexico City, Mexico
| | - Bárbara I Díaz-Eufracio
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
|
48
|
Bur SK, Pomerantz WCK, Bade ML, Gee CT. Fragment-Based Ligand Discovery Using Protein-Observed 19F NMR: A Second Semester Organic Chemistry CURE Project. Journal of Chemical Education 2021; 98:1963-1973. [PMID: 37274366 PMCID: PMC10237086 DOI: 10.1021/acs.jchemed.1c00028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
Abstract
Curriculum-based undergraduate research experiences (CUREs) have been shown to increase student retention in STEM fields and are starting to become more widely adopted in chemistry curricula. Here we describe a 10-week CURE that is suitable for a second-semester organic chemistry laboratory course. Students synthesize small molecules and use protein-observed 19F (PrOF) NMR to assess the small molecule's binding affinity to a target protein. The research project introduced students to multistep organic synthesis, structure-activity relationship studies, quantitative biophysical measurements (measuring Kd from PrOF NMR experiments), and scientific literacy. Docking experiments could be added to help students understand how changes in a ligand structure may affect binding to a protein. Assessment using the CURE survey indicates self-perceived skill gains from the course that exceed gains measured in a traditional and an inquiry-based laboratory experience. Given the speed of the binding experiment and the alignment of the synthetic methods with a second-semester organic chemistry laboratory course, a PrOF NMR fragment-based ligand discovery lab can be readily implemented in the undergraduate chemistry curriculum.
Affiliation(s)
- Scott K Bur
- Department of Chemistry, Gustavus Adolphus College, St. Peter, Minnesota 56028, United States
| | - William C K Pomerantz
- Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Morgan L Bade
- Department of Chemistry, Gustavus Adolphus College, St. Peter, Minnesota 56028, United States
| | - Clifford T Gee
- Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
|
49
|
Ashraf C, Joshi N, Beck DAC, Pfaendtner J. Data Science in Chemical Engineering: Applications to Molecular Science. Annu Rev Chem Biomol Eng 2021; 12:15-37. [PMID: 33710940 DOI: 10.1146/annurev-chembioeng-101220-102232] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Chemical engineering is being rapidly transformed by the tools of data science. On the horizon, artificial intelligence (AI) applications will impact a huge swath of our work, ranging from the discovery and design of new molecules to operations and manufacturing and many areas in between. Early adoption of data science, machine learning, and early examples of AI in chemical engineering has been rich with examples of molecular data science, that is, the application of these tools to molecular discovery and property optimization at the atomic scale. We summarize key advances in this nascent subfield while introducing molecular data science for a broad chemical engineering readership. We introduce the field through the concept of a molecular data science life cycle and discuss relevant aspects of five distinct phases of this process: creation of curated data sets, molecular representations, data-driven property prediction, generation of new molecules, and feasibility and synthesizability considerations.
Affiliation(s)
- Chowdhury Ashraf
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA
| | - Nisarg Joshi
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA
| | - David A C Beck
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA
- eScience Institute, University of Washington, Seattle, Washington 98195, USA
| | - Jim Pfaendtner
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA
|
50
|
Duan C, Liu F, Nandy A, Kulik HJ. Putting Density Functional Theory to the Test in Machine-Learning-Accelerated Materials Discovery. J Phys Chem Lett 2021; 12:4628-4637. [PMID: 33973793 DOI: 10.1021/acs.jpclett.1c00631] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 05/20/2023]
Abstract
Accelerated discovery with machine learning (ML) has begun to provide the advances in efficiency needed to overcome the combinatorial challenge of computational materials design. Nevertheless, ML-accelerated discovery both inherits the biases of training data derived from density functional theory (DFT) and leads to many attempted calculations that are doomed to fail. Many compelling functional materials and catalytic processes involve strained chemical bonds, open-shell radicals and diradicals, or metal-organic bonds to open-shell transition-metal centers. Although promising targets, these materials present unique challenges for electronic structure methods and combinatorial challenges for their discovery. In this Perspective, we describe the advances needed in accuracy, efficiency, and approach beyond what is typical in conventional DFT-based ML workflows. These challenges have begun to be addressed through ML models trained to predict the results of multiple methods or the differences between them, enabling quantitative sensitivity analysis. For DFT to be trusted for a given data point in a high-throughput screen, it must pass a series of tests. ML models that predict the likelihood of calculation success and detect the presence of strong correlation will enable rapid diagnoses and adaptation strategies. These "decision engines" represent the first steps toward autonomous workflows that avoid the need for expert determination of the robustness of DFT-based materials discoveries.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|