1
Fallani A, Medrano Sandonas L, Tkatchenko A. Inverse mapping of quantum properties to structures for chemical space of small organic molecules. Nat Commun 2024; 15:6061. [PMID: 39025883 PMCID: PMC11258234 DOI: 10.1038/s41467-024-50401-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 07/01/2024] [Indexed: 07/20/2024] Open
Abstract
Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
Affiliation(s)
- Alessio Fallani
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
  - Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany.
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
2
Xiao H, Li R, Shi X, Chen Y, Zhu L, Chen X, Wang L. An invertible, invariant crystal representation for inverse design of solid-state materials using generative deep learning. Nat Commun 2023; 14:7027. [PMID: 37919277 PMCID: PMC10622439 DOI: 10.1038/s41467-023-42870-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 10/24/2023] [Indexed: 11/04/2023] Open
Abstract
The past decade has witnessed rapid progress in deep learning for molecular design, owing to the availability of invertible and invariant representations for molecules such as simplified molecular-input line-entry system (SMILES), which has powered cheminformatics since the late 1980s. However, the design of elemental components and their structural arrangement in solid-state materials to achieve certain desired properties is still a long-standing challenge in physics, chemistry and biology. This is primarily due to, unlike molecular inverse design, the lack of an invertible crystal representation that satisfies translational, rotational, and permutational invariances. To address this issue, we have developed a simplified line-input crystal-encoding system (SLICES), which is a string-based crystal representation that satisfies both invertibility and invariances. The reconstruction routine of SLICES successfully reconstructed 94.95% of over 40,000 structurally and chemically diverse crystal structures, showcasing an unprecedented invertibility. Furthermore, by only encoding compositional and topological data, SLICES guarantees invariances. We demonstrate the application of SLICES in the inverse design of direct narrow-gap semiconductors for optoelectronic applications. As a string-based, invertible, and invariant crystal representation, SLICES shows promise as a useful tool for in silico materials discovery.
Affiliation(s)
- Hang Xiao
  - School of Interdisciplinary Studies, Lingnan University, Tuen Mun, Hong Kong SAR, China
- Rong Li
  - School of Chemical Engineering, Northwest University, Xi'an, 710069, China
- Xiaoyang Shi
  - Department of Environmental and Sustainable Engineering, State University of New York at Albany, Albany, NY, 12222, USA
- Yan Chen
  - Laboratory for Multiscale Mechanics and Medical Science, SV LAB, School of Aerospace, Xi'an Jiaotong University, Xi'an, 710049, China.
- Liangliang Zhu
  - School of Chemical Engineering, Northwest University, Xi'an, 710069, China.
  - Shaanxi Institute of Energy and Chemical Engineering, Xi'an, 710069, China.
- Xi Chen
  - School of Interdisciplinary Studies, Lingnan University, Tuen Mun, Hong Kong SAR, China.
- Lei Wang
  - National Laboratory of Solid-State Microstructures, School of Physics, Nanjing University, Nanjing, 210093, China.
  - Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China.
3
Garg R, Patra NR, Samal S, Babbar S, Parida K. A review on accelerated development of skin-like MXene electrodes: from experimental to machine learning. Nanoscale 2023; 15:8110-8133. [PMID: 37096943 DOI: 10.1039/d2nr05969j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Foreshadowing future needs has catapulted the progress of skin-like electronic devices for human-machine interactions. These devices possess human skin-like properties such as stretchability, self-healability, transparency, biocompatibility, and wearability. This review highlights the recent progress in a promising material, MXenes, to realize soft, deformable, skin-like electrodes. Various structural designs, fabrication strategies, and rational guidelines adopted to realize MXene-based skin-like electrodes are outlined. We explicitly discussed machine learning-based material informatics to understand and predict the properties of MXenes. Finally, an outlook on the existing challenges and the future roadmap to realize soft skin-like MXene electrodes to facilitate technological advances in the next-generation human-machine interactions has been described.
Affiliation(s)
- Romy Garg
  - Institute of Nano Science and Technology, Mohali, Punjab, India
- Shubham Babbar
  - Institute of Nano Science and Technology, Mohali, Punjab, India
4
Jelfs KE. Computational modeling to assist in the discovery of supramolecular materials. Ann N Y Acad Sci 2022; 1518:106-119. [PMID: 36251351 PMCID: PMC10091946 DOI: 10.1111/nyas.14913] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/05/2023]
Abstract
Computational modeling is increasingly used to assist in the discovery of supramolecular materials. Supramolecular materials are typically primarily built from organic components that are self-assembled through noncovalent bonding and have potential applications, including in selective binding, sorption, molecular separations, catalysis, optoelectronics, sensing, and as molecular machines. In this review, the key areas where computational prediction can assist in the discovery of supramolecular materials, including in structure prediction, property prediction, and the prediction of how to synthesize a hypothetical material are discussed, before exploring the potential impact of artificial intelligence techniques on the field. Throughout, the importance of close integration with experimental materials discovery programs will be highlighted. A series of case studies from the author's work across some different supramolecular material classes will be discussed, before finishing with a discussion of the outlook for the field.
Affiliation(s)
- Kim E Jelfs
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, London, UK
5
Mroz AM, Posligua V, Tarzia A, Wolpert EH, Jelfs KE. Into the Unknown: How Computation Can Help Explore Uncharted Material Space. J Am Chem Soc 2022; 144:18730-18743. [PMID: 36206484 PMCID: PMC9585593 DOI: 10.1021/jacs.2c06833] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/28/2022]
Abstract
Novel functional materials are urgently needed to help combat the major global challenges facing humanity, such as climate change and resource scarcity. Yet, the traditional experimental materials discovery process is slow and the material space at our disposal is too vast to effectively explore using intuition-guided experimentation alone. Most experimental materials discovery programs necessarily focus on exploring the local space of known materials, so we are not fully exploiting the enormous potential material space, where more novel materials with unique properties may exist. Computation, facilitated by improvements in open-source software and databases, as well as computer hardware has the potential to significantly accelerate the rational development of materials, but all too often is only used to postrationalize experimental observations. Thus, the true predictive power of computation, where theory leads experimentation, is not fully utilized. Here, we discuss the challenges to successful implementation of computation-driven materials discovery workflows, and then focus on the progress of the field, with a particular emphasis on the challenges to reaching novel materials.
Affiliation(s)
- Austin M Mroz
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, U.K
- Victor Posligua
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, U.K
- Andrew Tarzia
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, U.K
- Emma H Wolpert
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, U.K
- Kim E Jelfs
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, U.K
6
Li C, Wang C, Sun M, Zeng Y, Yuan Y, Gou Q, Wang G, Guo Y, Pu X. Correlated RNN Framework to Quickly Generate Molecules with Desired Properties for Energetic Materials in the Low Data Regime. J Chem Inf Model 2022; 62:4873-4887. [PMID: 35998331 DOI: 10.1021/acs.jcim.2c00997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
Abstract
Motivated by the challenge of deep learning in the low-data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) correlated by the transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity when very limited data are available. To avoid the dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is utilized to produce 500,000 molecules to pretrain an RNN, through which the model can learn sufficient structure knowledge. Then the pretrained RNN is fine-tuned by focusing on the 303 energetic compounds to generate 7153 molecules similar to the energetic compounds. In order to more reliably screen the molecules with a high detonation velocity, SMILES enumeration augmentation coupled with the pretrained knowledge is utilized to build an RNN-based prediction model, through which R² is boosted from 0.4446 to 0.9572. The comparable performance with the transfer learning strategy based on an existing big database (ChEMBL) to produce the energetic molecules and drug-like ones further supports the effectiveness and generality of our strategy in the low-data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in the detonation velocity. All the source codes and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.
Affiliation(s)
- Chuan Li
  - College of Computer Science, Sichuan University, Chengdu 610064, China
- Chenghui Wang
  - College of Computer Science, Sichuan University, Chengdu 610064, China
- Ming Sun
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Yan Zeng
  - College of Computer Science, Sichuan University, Chengdu 610064, China
- Yuan Yuan
  - College of Management, Southwest University for Nationalities, Chengdu 610041, China
- Qiaolin Gou
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Guangchuan Wang
  - College of Computer Science, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
7
Menon D, Ranganathan R. A Generative Approach to Materials Discovery, Design, and Optimization. ACS Omega 2022; 7:25958-25973. [PMID: 35936396 PMCID: PMC9352221 DOI: 10.1021/acsomega.2c03264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Accepted: 07/11/2022] [Indexed: 05/25/2023]
Abstract
Despite its potential to transform society, materials research suffers from a major drawback: its long research timeline. Recently, machine-learning techniques have emerged as a viable solution to this drawback and have shown accuracies comparable to other computational techniques like density functional theory (DFT) at a fraction of the computational time. One particular class of machine-learning models, known as "generative models", is of particular interest owing to its ability to approximate high-dimensional probability distribution functions, which in turn can be used to generate novel data such as molecular structures by sampling these approximated probability distribution functions. This review article aims to provide an in-depth understanding of the underlying mathematical principles of popular generative models such as recurrent neural networks, variational autoencoders, and generative adversarial networks and discuss their state-of-the-art applications in the domains of biomaterials and organic drug-like materials, energy materials, and structural materials. Here, we discuss a broad range of applications of these models spanning from the discovery of drugs that treat cancer to finding the first room-temperature superconductor and from the discovery and optimization of battery and photovoltaic materials to the optimization of high-entropy alloys. We conclude by presenting a brief outlook of the major challenges that lie ahead for the mainstream usage of these models for materials research.
8
Singh S, Sunoj RB. A Transfer Learning Approach for Reaction Discovery in Small Data Situations Using Generative Model. iScience 2022; 25:104661. [PMID: 35832891 PMCID: PMC9272387 DOI: 10.1016/j.isci.2022.104661] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/03/2022] [Revised: 05/20/2022] [Accepted: 06/16/2022] [Indexed: 11/01/2022] Open
Abstract
Sustainable practices in chemical sciences can be better realized by adopting interdisciplinary approaches that combine the advantages of machine learning (ML) on the initially acquired small data in reaction discovery. Developing new reactions generally remains heuristic and even time and resource intensive. For instance, synthesis of fluorine-containing compounds, which constitute ∼20% of the marketed drugs, relies on deoxyfluorination of abundantly available alcohols. Herein, we demonstrate the use of a recurrent neural network-based deep generative model built on a library of just 37 alcohols for effective learning and exploration of the chemical space. The proof-of-concept ML model is able to generate good quality, synthetically accessible, higher-yielding novel alcohol molecules. This protocol would have superior utility for deployment into a practical reaction discovery pipeline.
- Dual pronged transfer learning, both to generate and predict yields of new molecules
- Demonstrated the utility for an important family of deoxyfluorination of alcohols
- Applicable for practically more likely situations with relatively smaller data
- Extendable to other reaction manifolds to facilitate expedited reaction discovery
9
Kadulkar S, Sherman ZM, Ganesan V, Truskett TM. Machine Learning-Assisted Design of Material Properties. Annu Rev Chem Biomol Eng 2022; 13:235-254. [PMID: 35300515 DOI: 10.1146/annurev-chembioeng-092220-024340] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Designing functional materials requires a deep search through multidimensional spaces for system parameters that yield desirable material properties. For cases where conventional parameter sweeps or trial-and-error sampling are impractical, inverse methods that frame design as a constrained optimization problem present an attractive alternative. However, even efficient algorithms require time- and resource-intensive characterization of material properties many times during optimization, imposing a design bottleneck. Approaches that incorporate machine learning can help address this limitation and accelerate the discovery of materials with targeted properties. In this article, we review how to leverage machine learning to reduce dimensionality in order to effectively explore design space, accelerate property evaluation, and generate unconventional material structures with optimal properties. We also discuss promising future directions, including integration of machine learning into multiple stages of a design algorithm and interpretation of machine learning models to understand how design parameters relate to material properties. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 13 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Sanket Kadulkar
  - McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas, USA
- Zachary M Sherman
  - McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas, USA
- Venkat Ganesan
  - McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas, USA
- Thomas M Truskett
  - McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas, USA
  - Department of Physics, University of Texas at Austin, Austin, Texas, USA
10
Yuan Q, Szczypiński FT, Jelfs KE. Explainable graph neural networks for organic cages. Digital Discovery 2022; 1:127-138. [PMID: 35515082 PMCID: PMC8996732 DOI: 10.1039/d1dd00039j] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/18/2021] [Accepted: 02/09/2022] [Indexed: 01/12/2023]
Abstract
The development of accurate and explicable machine learning models to predict the properties of topologically complex systems is a challenge in materials science. Porous organic cages, a class of polycyclic molecular materials, have potential application in molecular separations, catalysis and encapsulation. For most applications of porous organic cages, having a permanent internal cavity in the absence of solvent, a property termed “shape persistence”, is critical. Here, we report the development of Graph Neural Networks (GNNs) to predict the shape persistence of organic cages. Graph neural networks are a class of neural networks where the data, in our case that of organic cages, are represented by graphs. The performance of the GNN models was measured against a previously reported computational database of organic cages formed through a range of [4 + 6] reactions with a variety of reaction chemistries. The reported GNNs have an improved prediction accuracy and transferability compared to random forest predictions. Apart from the improvement in predictive power, we explored the explicability of the GNNs by computing the integrated gradient of the GNN input. The contribution of monomers and molecular fragments to the shape persistence of the organic cages could be quantitatively evaluated with integrated gradients. With the added explicability of the GNNs, it was possible not only to accurately predict the property of organic materials, but also to interpret the predictions of the deep learning models and provide structural insights for the discovery of future materials.
We report the development of explainable Graph Neural Networks to predict shape persistence of organic cages. Integrated gradient analysis identifies collapse-inducing molecular fragments and helps chemists design more shape persistent structures.
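As a rough illustration of the integrated-gradient attribution mentioned in this abstract, the minimal Python sketch below applies the method to a toy differentiable function rather than to a GNN. The function `model`, its analytic gradient `model_grad`, the zero baseline, and the number of integration steps are all assumptions made for illustration; none of them is taken from the paper.

```python
def model(x):
    # Hypothetical stand-in for a trained network: a smooth scalar function.
    return sum(v * v for v in x) + x[0] * x[1]

def model_grad(x):
    # Analytic gradient of the toy model above.
    g = [2.0 * v for v in x]
    g[0] += x[1]
    g[1] += x[0]
    return g

def integrated_gradients(x, baseline, grad_fn, steps=50):
    # Midpoint Riemann sum of the gradient along the straight path from
    # the baseline to the input, scaled per feature by (x - baseline).
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(v - b) * ag for v, b, ag in zip(x, baseline, avg_grad)]

x = [1.0, 2.0, -0.5]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, model_grad)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to `model(x) - model(baseline)`. In a graph setting the same per-feature attributions would be aggregated over the atoms or fragments of interest, which is beyond this sketch.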
Affiliation(s)
- Qi Yuan
  - Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, UK
- Filip T. Szczypiński
  - Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, UK
- Kim E. Jelfs
  - Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, UK
11
Sousa T, Correia J, Pereira V, Rocha M. Generative Deep Learning for Targeted Compound Design. J Chem Inf Model 2021; 61:5343-5361. [PMID: 34699719 DOI: 10.1021/acs.jcim.0c01496] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 01/10/2023]
Abstract
In the past few years, de novo molecular design has increasingly been using generative models from the emergent field of Deep Learning, proposing novel compounds that are likely to possess desired properties or activities. De novo molecular design finds applications in different fields ranging from drug discovery and materials sciences to biotechnology. A panoply of deep generative models, including architectures such as Recurrent Neural Networks, Autoencoders, and Generative Adversarial Networks, can be trained on existing data sets and provide for the generation of novel compounds. Typically, the new compounds follow the same underlying statistical distributions of properties exhibited on the training data set. Additionally, different optimization strategies, including transfer learning, Bayesian optimization, reinforcement learning, and conditional generation, can direct the generation process toward desired aims, regarding their biological activities, synthesis processes or chemical features. Given the recent emergence of these technologies and their relevance, this work presents a systematic and critical review on deep generative models and related optimization methods for targeted compound design, and their applications.
Affiliation(s)
- Tiago Sousa
  - Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
- João Correia
  - Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
- Vítor Pereira
  - Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
- Miguel Rocha
  - Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
12
Omar ÖH, Del Cueto M, Nematiaram T, Troisi A. High-throughput virtual screening for organic electronics: a comparative study of alternative strategies. Journal of Materials Chemistry C 2021; 9:13557-13583. [PMID: 34745630 PMCID: PMC8515942 DOI: 10.1039/d1tc03256a] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/12/2021] [Accepted: 09/13/2021] [Indexed: 06/01/2023]
Abstract
We present a review of the field of high-throughput virtual screening for organic electronics materials focusing on the sequence of methodological choices that determine each virtual screening protocol. These choices are present in all high-throughput virtual screenings and addressing them systematically will lead to optimised workflows and improve their applicability. We consider the range of properties that can be computed and illustrate how their accuracy can be determined depending on the quality and size of the experimental datasets. The approaches to generate candidates for virtual screening are also extremely varied and their relative strengths and weaknesses are discussed. The analysis of high-throughput virtual screening is almost never limited to the identification of top candidates and often new patterns and structure-property relations are the most interesting findings of such searches. The review reveals a very dynamic field constantly adapting to match an evolving landscape of applications, methodologies and datasets.
Affiliation(s)
- Ömer H Omar
  - Department of Chemistry, University of Liverpool Liverpool L69 3BX UK
- Marcos Del Cueto
  - Department of Chemistry, University of Liverpool Liverpool L69 3BX UK
- Alessandro Troisi
  - Department of Chemistry, University of Liverpool Liverpool L69 3BX UK
13
Zhao ZW, Omar ÖH, Padula D, Geng Y, Troisi A. Computational Identification of Novel Families of Nonfullerene Acceptors by Modification of Known Compounds. J Phys Chem Lett 2021; 12:5009-5015. [PMID: 34018746 DOI: 10.1021/acs.jpclett.1c01010] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
We considered a database of tens of thousands of known organic semiconductors and identified those compounds with computed electronic properties (orbital energies, excited state energies, and oscillator strengths) that would make them suitable as nonfullerene electron acceptors in organic solar cells. The range of parameters for the desirable acceptors is determined from a set of experimentally characterized high-efficiency nonfullerene acceptors. This search leads to ∼30 lead compounds never considered before for organic photovoltaic applications. We then proceed to modify these compounds to bring their computed solubility in line with that of the best small-molecule nonfullerene acceptors. A further refinement of the search can be based on additional properties like the reorganization energy for chemical reduction. This simple strategy, which relies on a few easily computable parameters and can be expanded to a larger set of molecules, enables the identification of completely new chemical families to be explored experimentally.
Affiliation(s)
- Zhi-Wen Zhao
  - Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun 130024, Jilin, P. R. China
- Ömer H Omar
  - Department of Chemistry, University of Liverpool, Liverpool L69 3BX, U.K
- Daniele Padula
  - Dipartimento di Biotecnologie, Chimica e Farmacia, Università di Siena, via A. Moro 2, Siena 53100, Italy
- Yun Geng
  - Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun 130024, Jilin, P. R. China
- Alessandro Troisi
  - Department of Chemistry, University of Liverpool, Liverpool L69 3BX, U.K
14
Zhang J, Mercado R, Engkvist O, Chen H. Comparative Study of Deep Generative Models on Chemical Space Coverage. J Chem Inf Model 2021; 61:2572-2581. [PMID: 34015916 DOI: 10.1021/acs.jcim.0c01328] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Abstract
In recent years, deep molecular generative models have emerged as promising methods for de novo molecular design. Thanks to the rapid advance of deep learning techniques, deep learning architectures such as recurrent neural networks, variational autoencoders, and adversarial networks have been successfully employed for constructing generative models. Recently, quite a few metrics have been proposed to evaluate these deep generative models. However, many of these metrics cannot evaluate the chemical space coverage of sampled molecules. This work presents a novel and complementary metric for evaluating deep molecular generative models. The metric is based on the chemical space coverage of a reference dataset, GDB-13. The performance of seven different molecular generative models was compared by calculating what fraction of the structures, ring systems, and functional groups could be reproduced from the largely unseen reference set when using only a small fraction of GDB-13 for training. The results show that the performance of the generative models studied varies significantly using the benchmark metrics introduced herein, such that the generalization capabilities of the generative models can be clearly differentiated. In addition, the coverages of GDB-13 ring systems and functional groups were compared between the models. Our study provides a useful new metric that can be used for evaluating and comparing generative models.
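The coverage idea described in this abstract, the fraction of a reference set that a model's samples reproduce, can be sketched in a few lines of Python. This is only an illustrative assumption of how such a fraction might be computed: the function name `coverage`, the toy SMILES strings, and the premise that all molecules are already canonicalized to comparable strings are hypothetical and not taken from the paper.

```python
def coverage(sampled, reference):
    # Fraction of distinct reference items that appear among the samples.
    # Assumes both collections hold canonical, directly comparable strings.
    reference_set = set(reference)
    if not reference_set:
        return 0.0
    recovered = set(sampled) & reference_set
    return len(recovered) / len(reference_set)

# Toy data for illustration only (not GDB-13).
reference = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
sampled = ["CCO", "CCO", "CCN", "CCCl"]
score = coverage(sampled, reference)
```

Duplicate samples do not inflate the score because both sides are reduced to sets, and samples outside the reference set are simply ignored. The same function could be applied to ring systems or functional groups once these are extracted as comparable identifiers.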
Affiliation(s)
- Jie Zhang
  - Guangdong Provincial Key Laboratory of Laboratory Animals, Guangdong Laboratory Animals Monitoring Institute, Guangzhou 510663, P. R. China
  - State Key Laboratory of Respiratory Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, P. R. China
  - Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 510530, P. R. China
- Rocío Mercado
  - Discovery Sciences, R&D, AstraZeneca, Gothenburg 43183, Sweden
- Ola Engkvist
  - Discovery Sciences, R&D, AstraZeneca, Gothenburg 43183, Sweden
- Hongming Chen
  - Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 510530, P. R. China
15
Feng J, Wang H, Ji Y, Li Y. Molecular design and performance improvement in organic solar cells guided by high‐throughput screening and machine learning. Nano Select 2021. [DOI: 10.1002/nano.202100006] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022] Open
Affiliation(s)
- Jie Feng
- Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon‐Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu, China
- Hongshuai Wang
- Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon‐Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu, China
- Yujin Ji
- Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon‐Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu, China
- Youyong Li
- Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon‐Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu, China
- Macao Institute of Materials Science and Engineering, Macau University of Science and Technology, Taipa, Macau SAR, Macau, China
16
Greenaway RL, Jelfs KE. Integrating Computational and Experimental Workflows for Accelerated Organic Materials Discovery. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2021; 33:e2004831. [PMID: 33565203 DOI: 10.1002/adma.202004831] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Revised: 09/28/2020] [Indexed: 06/12/2023]
Abstract
Organic materials find application in a range of areas, including optoelectronics, sensing, encapsulation, molecular separations, and photocatalysis. The discovery of materials is frustratingly slow, however, particularly when contrasted with the vast chemical space of possibilities based on the near limitless options for organic molecular precursors. The difficulty in predicting the material assembly, and consequent properties, of any molecule is another significant roadblock to targeted materials design. There has been significant progress in the development of computational approaches to screen large numbers of materials, for both their structure and properties, helping guide synthetic researchers toward promising materials. In particular, artificial intelligence techniques have the potential to make significant impact in many elements of the discovery process. Alongside this, automation and robotics are increasing the scale and speed with which materials synthesis can be realized. Herein, the focus is on demonstrating the power of integrating computational and experimental materials discovery programmes, including both a summary of key situations where approaches can be combined and a series of case studies that demonstrate recent successes.
Affiliation(s)
- Rebecca L Greenaway
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, W12 0BZ, UK
- Kim E Jelfs
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, W12 0BZ, UK
17
Leguy J, Cauchy T, Glavatskikh M, Duval B, Da Mota B. EvoMol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation. J Cheminform 2020; 12:55. [PMID: 33431049 PMCID: PMC7494000 DOI: 10.1186/s13321-020-00458-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/19/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022] Open
Abstract
The objective of this work is to design a molecular generator capable of exploring known as well as unfamiliar areas of the chemical space. Our method must be flexible to adapt to very different problems. Therefore, it has to be able to work with or without the influence of prior data and knowledge. Moreover, regardless of the success, it should be as interpretable as possible to allow for diagnosis and improvement. We propose here a new open source generation method using an evolutionary algorithm to sequentially build molecular graphs. It is independent of starting data and can generate totally unseen compounds. To be able to search a large part of the chemical space, we define an original set of 7 generic mutations close to the atomic level. Our method achieves excellent performances and even records on the QED, penalised logP, SAscore, CLscore as well as the set of goal-directed functions defined in GuacaMol. To demonstrate its flexibility, we tackle a very different objective issued from the organic molecular materials domain. We show that EvoMol can generate sets of optimised molecules having high energy HOMO or low energy LUMO, starting only from methane. We can also set constraints on a synthesizability score and structural features. Finally, the interpretability of EvoMol allows for the visualisation of its exploration process as a chemically relevant tree.
Affiliation(s)
- Jules Leguy
- Laboratoire LERIA, UNIV Angers, SFR MathSTIC, 2 Bd Lavoisier, 49045, Angers, France
- Thomas Cauchy
- Laboratoire MOLTECH-Anjou, UMR CNRS 6200, UNIV Angers, SFR MATRIX, 2 Bd Lavoisier, 49045, Angers, France
- Marta Glavatskikh
- Laboratoire LERIA, UNIV Angers, SFR MathSTIC, 2 Bd Lavoisier, 49045, Angers, France
- Laboratoire MOLTECH-Anjou, UMR CNRS 6200, UNIV Angers, SFR MATRIX, 2 Bd Lavoisier, 49045, Angers, France
- Béatrice Duval
- Laboratoire LERIA, UNIV Angers, SFR MathSTIC, 2 Bd Lavoisier, 49045, Angers, France
- Benoit Da Mota
- Laboratoire LERIA, UNIV Angers, SFR MathSTIC, 2 Bd Lavoisier, 49045, Angers, France
18
Zhang J, Terayama K, Sumita M, Yoshizoe K, Ito K, Kikuchi J, Tsuda K. NMR-TS: de novo molecule identification from NMR spectra. SCIENCE AND TECHNOLOGY OF ADVANCED MATERIALS 2020; 21:552-561. [PMID: 32939179 PMCID: PMC7476483 DOI: 10.1080/14686996.2020.1793382] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/06/2020] [Revised: 07/05/2020] [Accepted: 07/05/2020] [Indexed: 05/09/2023]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is an effective tool for identifying molecules in a sample. Although many previously observed NMR spectra are accumulated in public databases, they cover only a tiny fraction of the chemical space, and molecule identification is typically accomplished manually based on expert knowledge. Herein, we propose NMR-TS, a machine-learning-based Python library, to automatically identify a molecule from its NMR spectrum. NMR-TS discovers candidate molecules whose NMR spectra match the target spectrum by using deep learning and density functional theory (DFT)-computed spectra. As a proof-of-concept, we identify prototypical metabolites from their computed spectra. After an average of 5451 DFT runs for each spectrum, six of the nine molecules are identified correctly, and proximal molecules are obtained in the other cases. This encouraging result implies that de novo molecule generation can contribute to the fully automated identification of chemical structures. NMR-TS is available at https://github.com/tsudalab/NMR-TS.
Affiliation(s)
- Jinzhe Zhang
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Kei Terayama
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Graduate School of Medicine, Kyoto University, Kyoto, Japan
- RIKEN Medical Sciences Innovation Hub Program (MIH), Yokohama, Japan
- Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan
- Masato Sumita
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, Tsukuba, Japan
- Kazuki Yoshizoe
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Kengo Ito
- Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan
- RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Jun Kikuchi
- Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan
- RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Graduate School of Bioagricultural Sciences, Nagoya University, Nagoya, Japan
- Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba, Japan
|