1
Schrier J, Norquist AJ, Buonassisi T, Brgoch J. In Pursuit of the Exceptional: Research Directions for Machine Learning in Chemical and Materials Science. J Am Chem Soc 2023; 145:21699-21716. [PMID: 37754929 DOI: 10.1021/jacs.3c04783] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 09/28/2023]
Abstract
Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.
Affiliation(s)
- Joshua Schrier
  - Department of Chemistry, Fordham University, The Bronx, New York 10458, United States
- Alexander J Norquist
  - Department of Chemistry, Haverford College, Haverford, Pennsylvania 19041, United States
- Tonio Buonassisi
  - Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Jakoah Brgoch
  - Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204, United States
2
Vargas-Hernández RA, Jorner K, Pollice R, Aspuru-Guzik A. Inverse molecular design and parameter optimization with Hückel theory using automatic differentiation. J Chem Phys 2023; 158:104801. [PMID: 36922116 DOI: 10.1063/5.0137103] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/19/2023] [Open Access]
Abstract
Semiempirical quantum chemistry has recently seen a renaissance with applications in high-throughput virtual screening and machine learning. The simplest semiempirical model still in widespread use in chemistry is Hückel's π-electron molecular orbital theory. In this work, we implemented a Hückel program using differentiable programming with the JAX framework based on limited modifications of a pre-existing NumPy version. The auto-differentiable Hückel code enabled efficient gradient-based optimization of model parameters tuned for excitation energies and molecular polarizabilities, respectively, based on as few as 100 data points from density functional theory simulations. In particular, the facile computation of the polarizability, a second-order derivative, via auto-differentiation shows the potential of differentiable programming to bypass the need for numeric differentiation or derivation of analytical expressions. Finally, we employ gradient-based optimization of atom identity for inverse design of organic electronic materials with targeted orbital energy gaps and polarizabilities. Optimized structures are obtained after as little as 15 iterations using standard gradient-based optimization algorithms.
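The idea of obtaining the polarizability as a second derivative by automatic differentiation can be illustrated with a minimal sketch. This is not the authors' code: the four-site chain, the site positions, the parameter values, and the names `huckel_matrix`, `total_energy`, and `polarizability` are all assumptions chosen for illustration. A uniform field is added to the diagonal of a Hückel matrix, and the polarizability is taken as the negative second derivative of the total π-electron energy with respect to that field.

```python
import jax
import jax.numpy as jnp

# Double precision keeps the second derivative numerically clean.
jax.config.update("jax_enable_x64", True)

# Hypothetical toy system: a linear four-site pi chain (butadiene-like).
# All values below are illustrative assumptions, not fitted parameters.
COULOMB = 0.0                            # on-site (Coulomb) integral, energy zero
RESONANCE = -1.0                         # nearest-neighbour (resonance) integral
X = jnp.array([-1.5, -0.5, 0.5, 1.5])    # site positions along the field axis
BONDS = [(0, 1), (1, 2), (2, 3)]         # bonded pairs of sites
N_OCC = 2                                # doubly occupied orbitals (4 electrons)

# Symmetric adjacency matrix of the chain, built once.
ADJACENCY = jnp.zeros((4, 4))
for i, j in BONDS:
    ADJACENCY = ADJACENCY.at[i, j].set(1.0).at[j, i].set(1.0)


def huckel_matrix(field):
    """Hückel matrix with a uniform field coupled to the site positions."""
    return RESONANCE * ADJACENCY + jnp.diag(COULOMB + field * X)


def total_energy(field):
    """Total pi-electron energy: twice the sum of the occupied orbital energies."""
    orbital_energies = jnp.linalg.eigvalsh(huckel_matrix(field))  # ascending order
    return 2.0 * jnp.sum(orbital_energies[:N_OCC])


def polarizability(field=0.0):
    """Negative second derivative of the energy with respect to the field."""
    return -jax.grad(jax.grad(total_energy))(field)


print(float(polarizability()))
```

Nesting `jax.grad` twice replaces both a finite-difference stencil and a hand-derived perturbation expression. The sketch relies on the orbital energies being non-degenerate, since derivatives of eigenvectors are ill-defined at degeneracies.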
Affiliation(s)
- Rodrigo A Vargas-Hernández
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Kjell Jorner
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Robert Pollice
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Alán Aspuru-Guzik
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
3
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] [Open Access]
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Thijs Stuyver
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
4
Shirasawa R, Takemura I, Hattori S, Nagata Y. A semi-automated material exploration scheme to predict the solubilities of tetraphenylporphyrin derivatives. Commun Chem 2022; 5:158. [PMID: 36697881 PMCID: PMC9814751 DOI: 10.1038/s42004-022-00770-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 11/04/2022] [Indexed: 11/24/2022] [Open Access]
Abstract
Acceleration of material discovery has been tackled by informatics and laboratory automation. Here we show a semi-automated material exploration scheme to model the solubility of tetraphenylporphyrin derivatives. The scheme involved the following steps: definition of a practical chemical search space, prioritization of molecules in the space using an extended algorithm for submodular function maximization without requiring biased variable selection or pre-existing data, synthesis and automated measurement, and machine-learning model estimation. The optimal evaluation order selected using the algorithm covered several similar molecules (32% of all targeted molecules, whereas that obtained by random sampling and uncertainty sampling was ~7% and ~4%, respectively) with a small number of evaluations (10 molecules: 0.13% of all targeted molecules). The derived binary classification models predicted 'good solvents' with an accuracy >0.8. Overall, we confirmed the effectiveness of the proposed semi-automated scheme in early-stage material search projects for accelerating a wider range of material research.
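The prioritization step can be illustrated with a generic greedy routine for submodular maximization. The abstract does not specify the authors' extended algorithm, so what follows is only the textbook greedy heuristic applied to a facility-location coverage objective; the objective, the toy similarity matrix `SIMILARITY`, and the function names are assumptions made for illustration.

```python
def facility_location(selected, similarity):
    """Coverage score: each candidate is credited with its best similarity
    to any already selected candidate (0 if nothing is selected)."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in similarity)


def greedy_select(similarity, budget):
    """Pick up to `budget` candidates, each time adding the one with the
    largest marginal gain in coverage. Returns indices in evaluation order."""
    selected = []
    remaining = set(range(len(similarity)))
    for _ in range(min(budget, len(similarity))):
        current = facility_location(selected, similarity)
        best = max(
            sorted(remaining),
            key=lambda j: facility_location(selected + [j], similarity) - current,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical pairwise similarities for five candidate molecules:
# candidates 0-2 form one cluster, candidates 3-4 another.
SIMILARITY = [
    [1.0, 0.9, 0.7, 0.1, 0.0],
    [0.9, 1.0, 0.85, 0.1, 0.1],
    [0.7, 0.85, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
]

print(greedy_select(SIMILARITY, budget=2))
```

With non-negative similarities this objective is monotone and submodular, so a small evaluation budget tends to be spread over distinct clusters rather than spent on near-duplicates. In the toy example the two picks come from different clusters.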
Affiliation(s)
- Raku Shirasawa
  - Advanced Research Laboratory, R&D Center, Sony Group Corporation, Atsugi Tec. 4-14-1 Asahi-cho, Atsugi-shi, Kanagawa, 243-0014, Japan
- Ichiro Takemura
  - Tokyo Laboratory 26, R&D Center, Sony Group Corporation, Atsugi Tec. 4-14-1 Asahi-cho, Atsugi-shi, Kanagawa, 243-0014, Japan
- Shinnosuke Hattori
  - Advanced Research Laboratory, R&D Center, Sony Group Corporation, Atsugi Tec. 4-14-1 Asahi-cho, Atsugi-shi, Kanagawa, 243-0014, Japan
- Yuuya Nagata
  - Institute for Chemical Reaction Design and Discovery, Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, Hokkaido, 001-0021, Japan
5
Seifrid M, Pollice R, Aguilar-Granda A, Morgan Chan Z, Hotta K, Ser CT, Vestfrid J, Wu TC, Aspuru-Guzik A. Autonomous Chemical Experiments: Challenges and Perspectives on Establishing a Self-Driving Lab. Acc Chem Res 2022; 55:2454-2466. [PMID: 35948428 PMCID: PMC9454899 DOI: 10.1021/acs.accounts.2c00220] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/11/2022] [Indexed: 01/19/2023]
Abstract
We must accelerate the pace at which we make technological advancements to address climate change and disease risks worldwide. This swifter pace of discovery requires faster research and development cycles enabled by better integration between hypothesis generation, design, experimentation, and data analysis. Typical research cycles take months to years. However, data-driven automated laboratories, or self-driving laboratories, can significantly accelerate molecular and materials discovery. Recently, substantial advancements have been made in the areas of machine learning and optimization algorithms that have allowed researchers to extract valuable knowledge from multidimensional data sets. Machine learning models can be trained on large data sets from the literature or databases, but their performance can often be hampered by a lack of negative results or metadata. In contrast, data generated by self-driving laboratories can be information-rich, containing precise details of the experimental conditions and metadata. Consequently, much larger amounts of high-quality data are gathered in self-driving laboratories. When placed in open repositories, this data can be used by the research community to reproduce experiments, for more in-depth analysis, or as the basis for further investigation. Accordingly, high-quality open data sets will increase the accessibility and reproducibility of science, which is sorely needed.
In this Account, we describe our efforts to build a self-driving lab for the development of a new class of materials: organic semiconductor lasers (OSLs). Since they have only recently been demonstrated, little is known about the molecular and material design rules for thin-film, electrically-pumped OSL devices as compared to other technologies such as organic light-emitting diodes or organic photovoltaics. To realize high-performing OSL materials, we are developing a flexible system for automated synthesis via iterative Suzuki-Miyaura cross-coupling reactions. This automated synthesis platform is directly coupled to the analysis and purification capabilities. Subsequently, the molecules of interest can be transferred to an optical characterization setup. We are currently limited to optical measurements of the OSL molecules in solution. However, material properties are ultimately most important in the solid state (e.g., as a thin-film device). To that end and for a different scientific goal, we are developing a self-driving lab for inorganic thin-film materials focused on the oxygen evolution reaction.
While the future of self-driving laboratories is very promising, numerous challenges still need to be overcome. These challenges can be split into cognition and motor function. Generally, the cognitive challenges are related to optimization with constraints or unexpected outcomes for which general algorithmic solutions have yet to be developed. A more practical challenge that could be resolved in the near future is that of software control and integration because few instrument manufacturers design their products with self-driving laboratories in mind. Challenges in motor function are largely related to handling heterogeneous systems, such as dispensing solids or performing extractions. As a result, it is critical to understand that adapting experimental procedures that were designed for human experimenters is not as simple as transferring those same actions to an automated system, and there may be more efficient ways to achieve the same goal in an automated fashion. Accordingly, for self-driving laboratories, we need to carefully rethink the translation of manual experimental protocols.
Affiliation(s)
- Martin Seifrid
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Robert Pollice
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Zamyla Morgan Chan
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Acceleration Consortium, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Kazuhiro Hotta
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Science & Innovation Center, Mitsubishi Chemical Corporation, 1000 Kamoshidacho, Aoba, Yokohama, Kanagawa 227-8502, Japan
- Cher Tian Ser
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Jenya Vestfrid
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Tony C. Wu
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
  - Department of Materials Science, University of Toronto, Toronto, Ontario M5S 3E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5S 1M1, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research, Toronto, Ontario M5S 1M1, Canada