1
Guo J, Schwaller P. Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning. JACS Au 2024; 4:2160-2172. [PMID: 38938817 PMCID: PMC11200228 DOI: 10.1021/jacsau.4c00066] [Received: 01/23/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 06/29/2024]
Abstract
Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal calls to oracles (computational property predictors). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose significant computational cost. Consequently, designing molecules that are optimized for such oracles cannot be achieved under a practical computational budget. Molecular generative models based on simplified molecular-input line-entry system (SMILES) have shown remarkable sample efficiency when coupled with reinforcement learning, as demonstrated in the practical molecular optimization (PMO) benchmark. Here, we first show that experience replay drastically improves the performance of multiple previously proposed algorithms. Next, we propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times. We compare Augmented Memory to previously proposed algorithms and show significantly enhanced sample efficiency in an exploitation task, a drug discovery case study requiring both exploration and exploitation, and a materials design case study optimizing explicitly for quantum-mechanical properties. Our method achieves a new state-of-the-art in sample-efficient de novo molecular design, outperforming all of the previously reported methods. The code is available at https://github.com/schwallergroup/augmented_memory.
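The idea of reusing oracle scores through experience replay plus augmentation can be illustrated with a minimal sketch. This is not the authors' implementation (that lives in the repository linked above); `ReplayBuffer`, `toy_augment`, and `replay_batches` are hypothetical names, and the toy augmentation simply reverses the string, which only yields an equivalent SMILES for unbranched, ring-free chains of single-letter atoms. A real pipeline would use a cheminformatics toolkit for SMILES randomization.

```python
import heapq
import random
from typing import Callable, Iterator, List, Tuple


class ReplayBuffer:
    """Keeps the top-k highest-scoring (score, smiles) pairs seen so far."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[float, str]] = []  # min-heap keyed on score
        self._seen = set()

    def add(self, smiles: str, score: float) -> None:
        if smiles in self._seen:
            return  # already stored; no need to keep a duplicate
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (score, smiles))
            self._seen.add(smiles)
        elif score > self._heap[0][0]:
            # Replace the current worst entry with the better newcomer.
            _, evicted = heapq.heapreplace(self._heap, (score, smiles))
            self._seen.discard(evicted)
            self._seen.add(smiles)

    def items(self) -> List[Tuple[str, float]]:
        """Return (smiles, score) pairs, best score first."""
        return [(s, sc) for sc, s in sorted(self._heap, reverse=True)]


def toy_augment(smiles: str, rng: random.Random) -> str:
    """Placeholder augmentation (assumption): reverse a linear SMILES string."""
    return smiles[::-1] if rng.random() < 0.5 else smiles


def replay_batches(
    buffer: ReplayBuffer,
    augment: Callable[[str, random.Random], str],
    rounds: int,
    rng: random.Random,
) -> Iterator[List[Tuple[str, float]]]:
    """Yield `rounds` update batches that reuse stored scores (no new oracle calls)."""
    for _ in range(rounds):
        yield [(augment(s, rng), score) for s, score in buffer.items()]


buffer = ReplayBuffer(capacity=2)
for smiles, score in [("CCO", 0.9), ("CCN", 0.5), ("CC=O", 0.7)]:
    buffer.add(smiles, score)  # scores stand in for three oracle calls

for batch in replay_batches(buffer, toy_augment, rounds=3, rng=random.Random(0)):
    print(batch)  # each batch could drive one model update
```

In this sketch three oracle calls feed three update batches, each presenting the stored molecules in a possibly different string form with their original scores attached.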
Affiliation(s)
- Jeff Guo
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
2
Zhang J, Li L, Xie X, Song XQ, Schaefer HF. Biomimetic Frustrated Lewis Pair Catalysts for Hydrogenation of CO to Methanol at Low Temperatures. ACS Organic & Inorganic Au 2024; 4:258-267. [PMID: 38585511 PMCID: PMC10996047 DOI: 10.1021/acsorginorgau.3c00064] [Received: 11/27/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 04/09/2024]
Abstract
The industrial production of methanol through CO hydrogenation using the Cu/ZnO/Al2O3 catalyst requires harsh conditions, and the development of new catalysts with low operating temperatures is highly desirable. In this study, organic biomimetic FLP catalysts with good tolerance to CO poison are theoretically designed. The base-free catalytic reaction contains the 1,1-addition of CO into a formic acid intermediate and the hydrogenation of the formic acid intermediate into methanol. Low-energy spans (25.6, 22.1, and 20.6 kcal/mol) are achieved, indicating that CO can be hydrogenated into methanol at low temperatures. The new extended aromatization-dearomatization effect involving multiple rings is proposed to effectively facilitate the rate-determining CO 1,1-addition step, and a new CO activation model is proposed for organic catalysts.
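To give a feel for what the reported energy spans mean kinetically, the sketch below converts each one to a rate constant with the Eyring equation, k = (k_B T / h) exp(-ΔG / RT). Two assumptions are not stated in the abstract: that each span can be treated as an effective free-energy barrier, and that the temperature is 298.15 K. The function name `eyring_rate` is illustrative only.

```python
import math

BOLTZMANN = 1.380649e-23     # J/K
PLANCK = 6.62607015e-34      # J*s
GAS_CONSTANT = 8.314462618   # J/(mol*K)
JOULES_PER_KCAL = 4184.0


def eyring_rate(barrier_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Eyring rate constant (1/s) for an effective barrier given in kcal/mol."""
    barrier_j_per_mol = barrier_kcal_per_mol * JOULES_PER_KCAL
    prefactor = BOLTZMANN * temperature_k / PLANCK
    return prefactor * math.exp(-barrier_j_per_mol / (GAS_CONSTANT * temperature_k))


# Energy spans quoted in the abstract; the temperature is an assumption.
for span in (25.6, 22.1, 20.6):
    print(f"{span:5.1f} kcal/mol -> {eyring_rate(span):.2e} 1/s")
```

Because the barrier sits in the exponent, a reduction of a few kcal/mol changes the estimated rate by orders of magnitude, which is why small differences between the three spans matter.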
Affiliation(s)
- Jiejing Zhang
- College of Pharmacy, Key Laboratory of Pharmaceutical Quality Control of Hebei Province, Key Laboratory of Medicinal Chemistry and Molecular Diagnosis of Ministry of Education, Hebei University, Baoding 071002, Hebei, P. R. China
- Longfei Li
- College of Pharmacy, Key Laboratory of Pharmaceutical Quality Control of Hebei Province, Key Laboratory of Medicinal Chemistry and Molecular Diagnosis of Ministry of Education, Hebei University, Baoding 071002, Hebei, P. R. China
- Xiaofeng Xie
- College of Pharmacy, Key Laboratory of Pharmaceutical Quality Control of Hebei Province, Key Laboratory of Medicinal Chemistry and Molecular Diagnosis of Ministry of Education, Hebei University, Baoding 071002, Hebei, P. R. China
- Xue-Qing Song
- College of Pharmacy, Key Laboratory of Pharmaceutical Quality Control of Hebei Province, Key Laboratory of Medicinal Chemistry and Molecular Diagnosis of Ministry of Education, Hebei University, Baoding 071002, Hebei, P. R. China
- Henry F. Schaefer
- Center for Computational Quantum Chemistry, University of Georgia, Athens, Georgia 30602, United States
3
Wu Y, Wang CF, Ju MG, Jia Q, Zhou Q, Lu S, Gao X, Zhang Y, Wang J. Universal machine learning aided synthesis approach of two-dimensional perovskites in a typical laboratory. Nat Commun 2024; 15:138. [PMID: 38167836 PMCID: PMC10761762 DOI: 10.1038/s41467-023-44236-5] [Received: 04/23/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
The past decade has witnessed significant efforts in novel material discovery using data-driven techniques, in particular machine learning (ML). However, because it must account for precursors, experimental conditions, and the availability of reactants, material synthesis is generally much more complex than property and structure prediction, and very few computational predictions are experimentally realized. To address these challenges, a universal framework that integrates high-throughput experiments, a priori knowledge of chemistry, and ML techniques such as subgroup discovery and support vector machines is proposed to guide the experimental synthesis of materials. The framework can disclose structure-property relationships hidden in high-throughput experiments and rapidly screen out materials with high synthesis feasibility from a vast chemical space. By applying our approach to the challenging and consequential synthesis problem of 2D silver/bismuth organic-inorganic hybrid perovskites, we have increased the synthesis success rate by a factor of four relative to traditional approaches. This study provides a practical route for solving multidimensional chemical acceleration problems with small datasets from a typical laboratory with limited experimental resources.
Affiliation(s)
- Yilei Wu
- Key Laboratory of Quantum Materials and Devices of Ministry of Education, School of Physics, Southeast University, 211189, Nanjing, China
- Chang-Feng Wang
- Institute for Science and Applications of Molecular Ferroelectrics, Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, Zhejiang Normal University, 321004, Jinhua, China
- Ming-Gang Ju
- Key Laboratory of Quantum Materials and Devices of Ministry of Education, School of Physics, Southeast University, 211189, Nanjing, China
- Qiangqiang Jia
- Institute for Science and Applications of Molecular Ferroelectrics, Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, Zhejiang Normal University, 321004, Jinhua, China
- Qionghua Zhou
- Key Laboratory of Quantum Materials and Devices of Ministry of Education, School of Physics, Southeast University, 211189, Nanjing, China
- Shuaihua Lu
- Key Laboratory of Quantum Materials and Devices of Ministry of Education, School of Physics, Southeast University, 211189, Nanjing, China
- Xinying Gao
- Key Laboratory of Quantum Materials and Devices of Ministry of Education, School of Physics, Southeast University, 211189, Nanjing, China
- Yi Zhang
- Institute for Science and Applications of Molecular Ferroelectrics, Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, Zhejiang Normal University, 321004, Jinhua, China
- Jinlan Wang
- Key Laboratory of Quantum Materials and Devices of Ministry of Education, School of Physics, Southeast University, 211189, Nanjing, China
- Suzhou Laboratory, Suzhou, China
4
Beran GJO. Frontiers of molecular crystal structure prediction for pharmaceuticals and functional organic materials. Chem Sci 2023; 14:13290-13312. [PMID: 38033897 PMCID: PMC10685338 DOI: 10.1039/d3sc03903j] [Received: 07/27/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023]
Abstract
The reliability of organic molecular crystal structure prediction has improved tremendously in recent years. Crystal structure predictions for small, mostly rigid molecules are quickly becoming routine. Structure predictions for larger, highly flexible molecules are more challenging, but their crystal structures can also now be predicted with increasing rates of success. These advances are ushering in a new era where crystal structure prediction drives the experimental discovery of new solid forms. After briefly discussing the computational methods that enable successful crystal structure prediction, this perspective presents case studies from the literature that demonstrate how state-of-the-art crystal structure prediction can transform how scientists approach problems involving the organic solid state. Applications to pharmaceuticals, porous organic materials, photomechanical crystals, organic semi-conductors, and nuclear magnetic resonance crystallography are included. Finally, efforts to improve our understanding of which predicted crystal structures can actually be produced experimentally and other outstanding challenges are discussed.
Affiliation(s)
- Gregory J O Beran
- Department of Chemistry, University of California Riverside, Riverside, CA 92521, USA
5
Lee J, Park D, Lee M, Lee H, Park K, Lee I, Ryu S. Machine learning-based inverse design methods considering data characteristics and design space size in materials design and manufacturing: a review. Materials Horizons 2023; 10:5436-5456. [PMID: 37560794 DOI: 10.1039/d3mh00039g] [Indexed: 08/11/2023]
Abstract
In the last few decades, the influence of machine learning has permeated many areas of science and technology, including the field of materials science. This toolkit of data-driven methods has accelerated the discovery and production of new materials by accurately predicting the complicated physical processes and mechanisms that are not fully described by existing materials theories. However, the availability of a growing number of increasingly complex machine learning models confronts us with the question of "which machine learning algorithm to employ". In this review, we provide a comprehensive overview of common machine learning algorithms used for materials design, as well as a guideline for selecting the most appropriate model considering the nature of the design problem. To this end, we classify material design problems into four categories: (i) the training data set being sufficiently large to capture the trend of design space (interpolation problem), (ii) a vast design space that cannot be explored thoroughly with the initial training data set alone (extrapolation problem), (iii) multi-fidelity datasets (small accurate dataset and large approximate dataset), and (iv) only a small dataset available. The most successful machine learning-based surrogate models and design approaches are discussed for each case along with pertinent literature. This review focuses mostly on the use of ML algorithms for the inverse design of complicated composite structures, a topic that has received a lot of attention recently with the rise of additive manufacturing.
Affiliation(s)
- Junhyeong Lee
- Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
- Donggeun Park
- Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
- Mingyu Lee
- Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
- Hugon Lee
- Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
- Kundo Park
- Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
- Ikjin Lee
- Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
- Seunghwa Ryu
- Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
6
Schrier J, Norquist AJ, Buonassisi T, Brgoch J. In Pursuit of the Exceptional: Research Directions for Machine Learning in Chemical and Materials Science. J Am Chem Soc 2023; 145:21699-21716. [PMID: 37754929 DOI: 10.1021/jacs.3c04783] [Indexed: 09/28/2023]
Abstract
Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.
Affiliation(s)
- Joshua Schrier
- Department of Chemistry, Fordham University, The Bronx, New York 10458, United States
- Alexander J Norquist
- Department of Chemistry, Haverford College, Haverford, Pennsylvania 19041, United States
- Tonio Buonassisi
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Jakoah Brgoch
- Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204, United States
7
Williamson E, Sun Z, Tappan BA, Brutchey RL. Predictive Synthesis of Copper Selenides Using a Multidimensional Phase Map Constructed with a Data-Driven Classifier. J Am Chem Soc 2023; 145:17954-17964. [PMID: 37540836 PMCID: PMC10436277 DOI: 10.1021/jacs.3c05490] [Received: 05/25/2023] [Indexed: 08/06/2023]
Abstract
Copper selenides are an important family of materials with applications in catalysis, plasmonics, photovoltaics, and thermoelectrics. Despite being a binary material system, the Cu-Se phase diagram is complex and contains multiple crystal structures in addition to several metastable structures that are not found on the thermodynamic phase diagram. Consequently, the ability to synthetically navigate this complex phase space poses a significant challenge. We demonstrate that data-driven learning can successfully map this phase space in a minimal number of experiments. We combine soft chemistry (chimie douce) synthetic methods with multivariate analyses via classification techniques to enable predictive phase determination. A surrogate model was constructed with experimental data derived from a design matrix of four experimental variables: C-Se bond strength of the selenium precursor, time, temperature, and solvent composition. The reactions in the surrogate model resulted in 11 distinct phase combinations of copper selenide. These data were used to train a classification model that predicts the phase with 95.7% accuracy. The resulting decision tree enabled conclusions to be drawn about how the experimental variables affect the phase and provided prescriptive synthetic conditions for specific phase isolation. This guided the accelerated phase targeting in a minimum number of experiments of klockmannite CuSe, which could not be isolated in any of the reactions used to construct the surrogate model. The reaction conditions that the model predicted to synthesize klockmannite CuSe were experimentally validated, highlighting the utility of this approach.
Affiliation(s)
- Emily M. Williamson
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Zhaohong Sun
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Bryce A. Tappan
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Richard L. Brutchey
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
8
Anstine D, Isayev O. Generative Models as an Emerging Paradigm in the Chemical Sciences. J Am Chem Soc 2023; 145:8736-8750. [PMID: 37052978 PMCID: PMC10141264 DOI: 10.1021/jacs.2c13467] [Received: 12/17/2022] [Indexed: 04/14/2023]
Abstract
Traditional computational approaches to design chemical species are limited by the need to compute properties for a vast number of candidates, e.g., by discriminative modeling. Therefore, inverse design methods aim to start from the desired property and optimize a corresponding chemical structure. From a machine learning viewpoint, the inverse design problem can be addressed through so-called generative modeling. Mathematically, discriminative models are defined by learning the probability distribution function of properties given the molecular or material structure. In contrast, a generative model seeks to exploit the joint probability of a chemical species with target characteristics. The overarching idea of generative modeling is to implement a system that produces novel compounds that are expected to have a desired set of chemical features, effectively sidestepping issues found in the forward design process. In this contribution, we overview and critically analyze popular generative algorithms like generative adversarial networks, variational autoencoders, flow, and diffusion models. We highlight key differences between each of the models, provide insights into recent success stories, and discuss outstanding challenges for realizing generative modeling discovered solutions in chemical applications.
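The distinction drawn in this abstract can be written compactly. The notation is an assumption introduced here for illustration: x denotes a molecular or material structure and y its properties.

```latex
% Discriminative model: properties given structure
p(y \mid x)

% Generative model: joint distribution over structure and properties
p(x, y) = p(y \mid x)\, p(x)

% Inverse design: sample structures conditioned on target properties y^{*}
p(x \mid y^{*}) = \frac{p(x, y^{*})}{p(y^{*})}
```

The last line is Bayes' rule applied to the joint distribution, and it is what lets a generative model start from the desired property rather than scoring a fixed list of candidates.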
Affiliation(s)
- Dylan M. Anstine
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
9
Estévez Ruiz EP, Lago JL, Thirumuruganandham SP. Experimental Studies on TiO2 NT with Metal Dopants through Co-Precipitation, Sol-Gel, Hydrothermal Scheme and Corresponding Computational Molecular Evaluations. Materials (Basel, Switzerland) 2023; 16:3076. [PMID: 37109913 PMCID: PMC10143655 DOI: 10.3390/ma16083076] [Received: 01/19/2023] [Revised: 03/19/2023] [Accepted: 03/23/2023] [Indexed: 06/19/2023]
Abstract
In the last decade, TiO2 nanotubes have attracted the attention of the scientific community and industry due to their exceptional photocatalytic properties, opening a wide range of additional applications in the fields of renewable energy, sensors, supercapacitors, and the pharmaceutical industry. However, their use is limited because their band gap is tied to the visible light spectrum. Therefore, it is essential to dope them with metals to extend their physicochemical advantages. In this review, we provide a brief overview of the preparation of metal-doped TiO2 nanotubes. We address hydrothermal and alteration methods that have been used to study the effects of different metal dopants on the structural, morphological, and optoelectrical properties of anatase and rutile nanotubes. The progress of DFT studies on the metal doping of TiO2 nanoparticles is discussed. In addition, the traditional models and their confirmation of the results of the experiment with TiO2 nanotubes are reviewed, as well as the use of TNT in various applications and the future prospects for its development in other fields. We focus on the comprehensive analysis and practical significance of the development of TiO2 hybrid materials and the need for a better understanding of the structural-chemical properties of anatase TiO2 nanotubes with metal doping for ion storage devices such as batteries.
Affiliation(s)
- Eduardo Patricio Estévez Ruiz
- Centro de Investigación de Ciencias Humanas y de la Educación (CICHE), Universidad Indoamérica, Ambato 180103, Ecuador
- Grupo de Polímeros, Departamento de Física y Ciencias de la Tierra, Escuela Universitaria Politécnica, Universidade da Coruña, 15471 Ferrol, Spain
- Joaquín López Lago
- Grupo de Polímeros, Departamento de Física y Ciencias de la Tierra, Escuela Universitaria Politécnica, Universidade da Coruña, 15471 Ferrol, Spain