1
Bhutto JA, Siddique B, Moussa IM, El-Sheikh MA, Hu Z, Yurong G. Machine learning assisted designing of non-fullerene electron acceptors: A quest for lower exciton binding energy. Heliyon 2024; 10:e30473. [PMID: 38711638 PMCID: PMC11070922 DOI: 10.1016/j.heliyon.2024.e30473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 04/05/2024] [Accepted: 04/27/2024] [Indexed: 05/08/2024]
Abstract
Designing acceptor materials for organic solar cells is an active research topic. Conventional experimental methods are tedious and expensive for large-scale screening, so machine-learning-guided exploration is a more suitable solution. Bagging regression, random forest regression, gradient boosting regression, and linear regression models are trained to predict exciton binding energy. The Breaking Retrosynthetically Interesting Chemical Substructures (BRICS) methodology is used to design new non-fullerene acceptors (NFAs). The predicted values are used to select the designed NFAs. Clustering and chemical similarity analyses, based on chemical fingerprints, are also performed on the selected NFAs, and the synthetic accessibility score of the new NFAs is investigated. Thirty NFAs with low exciton binding energy values are selected. This approach allows rapid screening of NFAs for organic solar cells. The proposed framework is a valuable tool for strategically selecting the most effective NFAs for organic solar cells and offers a streamlined approach to material discovery.
Affiliation(s)
- Jameel Ahmed Bhutto
  - College of Computer Science, Huang Gang Normal University, Huanggang, 438000, China
- Bilal Siddique
  - Department of Chemistry, Division of Science and Technology, University of Education, Lahore, 54770, Pakistan
- Ihab Mohamed Moussa
  - Department of Botany and Microbiology, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia
- Mohamed A. El-Sheikh
  - Department of Botany and Microbiology, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia
- Zhihua Hu
  - College of Computer Science, Huang Gang Normal University, Huanggang, 438000, China
- Guan Yurong
  - College of Computer Science, Huang Gang Normal University, Huanggang, 438000, China
2
Akbar B, Tayara H, Chong KT. Unveiling dominant recombination loss in perovskite solar cells with a XGBoost-based machine learning approach. iScience 2024; 27:109200. [PMID: 38420582 PMCID: PMC10901077 DOI: 10.1016/j.isci.2024.109200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/12/2023] [Accepted: 02/07/2024] [Indexed: 03/02/2024]
Abstract
Remarkable and intelligent perovskite solar cells (PSCs) have attracted substantial attention from researchers and are undergoing rapid advancements in photovoltaic technology. These developments aim to create highly efficient energy devices with fewer dominant recombination losses within the realm of third-generation solar cells. Diverse machine learning (ML) algorithms are implemented to address the dominant losses due to recombination in PSCs, focusing on grain boundaries (GBs), interfaces, and band-to-band recombination. The extreme gradient boosting (XGBoost) classifier effectively predicts the recombination losses. The model was trained with 7-fold cross-validation to ensure generalizability and robustness. Using Optuna for hyperparameter optimization and Shapley additive explanations (SHAP) to investigate the influence of features on the target variables, the model achieved 85% accuracy on over 2 million simulated data points. Because of the input parameters (light intensity and open-circuit voltage), the performance evaluation measures for the dominant recombination losses predicted by the proposed model were superior to those of state-of-the-art models.
Affiliation(s)
- Basir Akbar
  - Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
  - Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
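The 7-fold cross-validation mentioned in the Akbar et al. abstract can be illustrated with a minimal sketch. It only shows how sample indices might be partitioned into seven train/validation splits; the function name `k_fold_indices`, the seed, and the sample count are illustrative assumptions and are not taken from the paper, which does not describe its splitting code here.

```python
import random


def k_fold_indices(n_samples, k=7, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


if __name__ == "__main__":
    # Each sample appears in exactly one validation fold.
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(20, k=7)):
        print(fold, len(train_idx), len(val_idx))
```

In practice a model would be fitted on each `train_idx` and scored on the matching `val_idx`, and the seven scores averaged.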
3
AlFaraj Y, Mohapatra S, Shieh P, Husted KEL, Ivanoff DG, Lloyd EM, Cooper JC, Dai Y, Singhal AP, Moore JS, Sottos NR, Gomez-Bombarelli R, Johnson JA. A Model Ensemble Approach Enables Data-Driven Property Prediction for Chemically Deconstructable Thermosets in the Low-Data Regime. ACS Cent Sci 2023; 9:1810-1819. [PMID: 37780353 PMCID: PMC10540282 DOI: 10.1021/acscentsci.3c00502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Indexed: 10/03/2023]
Abstract
Thermosets present sustainability challenges that could potentially be addressed through the design of deconstructable variants with tunable properties; however, the combinatorial space of possible thermoset molecular building blocks (e.g., monomers, cross-linkers, and additives) and manufacturing conditions is vast, and predictive knowledge for how combinations of these molecular components translate to bulk thermoset properties is lacking. Data science could overcome these problems, but computational methods are difficult to apply to multicomponent, amorphous, statistical copolymer materials for which little data exist. Here, leveraging a data set with 101 examples, we introduce a closed-loop experimental, machine learning (ML), and virtual screening strategy to enable predictions of the glass transition temperature (Tg) of polydicyclopentadiene (pDCPD) thermosets containing cleavable bifunctional silyl ether (BSE) comonomers and/or cross-linkers with varied compositions and loadings. Molecular features and formulation variables are used as model inputs, and uncertainty is quantified through model ensembling, which together with heavy regularization helps to avoid overfitting and ultimately achieves predictions within <15 °C for thermosets with compositionally diverse BSEs. This work offers a path to predicting the properties of thermosets based on their molecular building blocks, which may accelerate the discovery of promising plastics, rubbers, and composites with improved functionality and controlled deconstructability.
Affiliation(s)
- Yasmeen S. AlFaraj
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Somesh Mohapatra
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Peyton Shieh
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Keith E. L. Husted
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Douglass G. Ivanoff
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Evan M. Lloyd
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Julian C. Cooper
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Yutong Dai
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Avni P. Singhal
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Jeffrey S. Moore
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Nancy R. Sottos
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Rafael Gomez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Jeremiah A. Johnson
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
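The idea of quantifying uncertainty through model ensembling, as described in the AlFaraj et al. abstract, can be sketched in a few lines. This is a simplified stand-in and not the authors' method: it uses a bootstrap ensemble of one-dimensional least-squares lines rather than the regularized models in the paper, and the toy numbers (a hypothetical comonomer loading against a glass transition temperature) are made up for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for 1-D data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def bootstrap_ensemble(xs, ys, n_models=50, seed=0):
    """Fit one line per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with a single x value
        by = [ys[i] for i in idx]
        models.append(fit_line(bx, by))
    return models


def predict_with_uncertainty(models, x):
    """Return the ensemble mean and the spread (sample std) at x."""
    preds = [a * x + b for a, b in models]
    return statistics.fmean(preds), statistics.stdev(preds)


if __name__ == "__main__":
    # Made-up toy data: loading (%) vs. Tg (deg C); not from the paper.
    loading = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
    tg = [160.0, 151.0, 139.0, 133.0, 118.0, 112.0]
    ensemble = bootstrap_ensemble(loading, tg)
    mean, spread = predict_with_uncertainty(ensemble, 12.0)
    print(f"predicted Tg = {mean:.1f} +/- {spread:.1f}")
```

The spread across ensemble members serves as a rough uncertainty estimate: predictions where the members disagree strongly are natural candidates for the next round of experiments in a closed-loop workflow.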
4
Bhat V, Sornberger P, Pokuri BSS, Duke R, Ganapathysubramanian B, Risko C. Electronic, redox, and optical property prediction of organic π-conjugated molecules through a hierarchy of machine learning approaches. Chem Sci 2022; 14:203-213. [PMID: 36605753 PMCID: PMC9769113 DOI: 10.1039/d2sc04676h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022]
Abstract
Accelerating the development of π-conjugated molecules for applications such as energy generation and storage, catalysis, sensing, pharmaceuticals, and (semi)conducting technologies requires rapid and accurate evaluation of the electronic, redox, or optical properties. While high-throughput computational screening has proven to be a tremendous aid in this regard, machine learning (ML) and other data-driven methods can further enable orders of magnitude reduction in time while at the same time providing dramatic increases in the chemical space that is explored. However, the lack of benchmark datasets containing the electronic, redox, and optical properties that characterize the diverse, known chemical space of organic π-conjugated molecules limits ML model development. Here, we present a curated dataset containing 25k molecules with density functional theory (DFT) and time-dependent DFT (TDDFT) evaluated properties that include frontier molecular orbitals, ionization energies, relaxation energies, and low-lying optical excitation energies. Using the dataset, we train a hierarchy of ML models, ranging from classical models such as ridge regression to sophisticated graph neural networks, with molecular SMILES representation as input. We observe that graph neural networks augmented with contextual information allow for significantly better predictions across a wide array of properties. Our best-performing models also provide an uncertainty quantification for the predictions. To democratize access to the data and trained models, an interactive web platform has been developed and deployed.
Affiliation(s)
- Vinayak Bhat
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
- Parker Sornberger
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
- Balaji Sesha Sarath Pokuri
  - Department of Mechanical Engineering and Translational AI Center, Iowa State University, Ames, Iowa 50010, USA
- Rebekah Duke
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
- Chad Risko
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
5
Abstract
Computational modeling is increasingly used to assist in the discovery of supramolecular materials. Supramolecular materials are typically primarily built from organic components that are self-assembled through noncovalent bonding and have potential applications, including in selective binding, sorption, molecular separations, catalysis, optoelectronics, sensing, and as molecular machines. In this review, the key areas where computational prediction can assist in the discovery of supramolecular materials, including in structure prediction, property prediction, and the prediction of how to synthesize a hypothetical material are discussed, before exploring the potential impact of artificial intelligence techniques on the field. Throughout, the importance of close integration with experimental materials discovery programs will be highlighted. A series of case studies from the author's work across some different supramolecular material classes will be discussed, before finishing with a discussion of the outlook for the field.
Affiliation(s)
- Kim E. Jelfs
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, London, UK
6
Tao L, Arbaugh T, Byrnes J, Varshney V, Li Y. Unified machine learning protocol for copolymer structure-property predictions. STAR Protoc 2022; 3:101875. [PMID: 36595914 PMCID: PMC9700038 DOI: 10.1016/j.xpro.2022.101875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2022] [Revised: 10/06/2022] [Accepted: 11/01/2022] [Indexed: 11/23/2022]
Abstract
Structure-property relationships are extremely valuable when predicting the properties of polymers. This protocol demonstrates a step-by-step approach, based on multiple machine learning (ML) architectures, which is capable of processing copolymer types such as alternating, random, block, and gradient copolymers. We detail steps for necessary software installation and construction of datasets. We further describe training and optimization steps for four neural network models and subsequent model visualization and comparison using training and test values. For complete details on the use and execution of this protocol, please refer to Tao et al. (2022).
Affiliation(s)
- Lei Tao
  - Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Tom Arbaugh
  - Department of Physics, Wesleyan University, Middletown, CT 06459, USA
- Vikas Varshney
  - Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Dayton, OH 45433, USA
- Ying Li
  - Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706-1572, USA
  - Corresponding author
7
Aldeghi M, Coley CW. A graph representation of molecular ensembles for polymer property prediction. Chem Sci 2022; 13:10486-10498. [PMID: 36277616 PMCID: PMC9473492 DOI: 10.1039/d2sc02839e] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/20/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022]
Abstract
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles.
Affiliation(s)
- Matteo Aldeghi
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
8
Tao L, Byrnes J, Varshney V, Li Y. Machine learning strategies for the structure-property relationship of copolymers. iScience 2022; 25:104585. [PMID: 35789847 PMCID: PMC9249671 DOI: 10.1016/j.isci.2022.104585] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Revised: 05/26/2022] [Accepted: 06/07/2022] [Indexed: 11/15/2022]
Abstract
Establishing the structure-property relationship is extremely valuable for the molecular design of copolymers. However, machine learning (ML) models that can incorporate both the chemical composition and the sequence distribution of monomers, and that have the generalization ability to process various copolymer types (e.g., alternating, random, block, and gradient copolymers) with a unified approach, are missing. To address this challenge, we formulate four different ML models for investigation, including a feedforward neural network (FFNN) model, a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, and a combined FFNN/RNN (Fusion) model. We use various copolymer types to systematically validate the performance and generalizability of different models. We find that the RNN architecture that processes the monomer sequence information both forward and backward is a more suitable ML model for copolymers with better generalizability. As a supplement to polymer informatics, our proposed approach provides an efficient way for the evaluation of copolymers.
Affiliation(s)
- Lei Tao
  - Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Vikas Varshney
  - Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, USA
- Ying Li
  - Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
  - Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
9
Verma S, Rivera M, Scanlon DO, Walsh A. Machine learned calibrations to high-throughput molecular excited state calculations. J Chem Phys 2022; 156:134116. [PMID: 35395896 DOI: 10.1063/5.0084535] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Abstract
Understanding the excited state properties of molecules provides insight into how they interact with light. These interactions can be exploited to design compounds for photochemical applications, including enhanced spectral conversion of light to increase the efficiency of photovoltaic cells. While chemical discovery is time- and resource-intensive experimentally, computational chemistry can be used to screen large-scale databases for molecules of interest in a procedure known as high-throughput virtual screening. The first step usually involves a high-speed but low-accuracy method to screen large numbers of molecules (potentially millions), so only the best candidates are evaluated with expensive methods. However, use of a coarse first-pass screening method can potentially result in high false positive or false negative rates. Therefore, this study uses machine learning to calibrate a high-throughput technique [eXtended Tight Binding based simplified Tamm-Dancoff approximation (xTB-sTDA)] against a higher accuracy one (time-dependent density functional theory). Testing the calibration model shows an approximately sixfold decrease in the error in-domain and an approximately threefold decrease in the out-of-domain. The resulting mean absolute error of ∼0.14 eV is in line with previous work in machine learning calibrations and out-performs previous work in linear calibration of xTB-sTDA. We then apply the calibration model to screen a 250k molecule database and map inaccuracies of xTB-sTDA in chemical space. We also show generalizability of the workflow by calibrating against a higher-level technique (CC2), yielding a similarly low error. Overall, this work demonstrates that machine learning can be used to develop a cost-effective and accurate method for large-scale excited state screening, enabling accelerated molecular discovery across a variety of disciplines.
Affiliation(s)
- Shomik Verma
  - Department of Materials, Imperial College London, Exhibition Road, London SW7 2AZ, United Kingdom
- Miguel Rivera
  - Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
- David O Scanlon
  - Department of Chemistry and Thomas Young Centre, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
- Aron Walsh
  - Department of Materials, Imperial College London, Exhibition Road, London SW7 2AZ, United Kingdom
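The calibration idea in the Verma et al. abstract (correcting a cheap method against a more accurate reference) can be sketched with the simplest possible model. The example below fits a linear calibration, which the abstract mentions only as the earlier baseline that the machine-learned calibration out-performs; it is not the authors' model, and the function names and the energies in the demo are hypothetical.

```python
import statistics


def fit_linear_calibration(cheap, reference):
    """Least-squares slope and intercept mapping cheap values to reference."""
    mc, mr = statistics.fmean(cheap), statistics.fmean(reference)
    scc = sum((c - mc) ** 2 for c in cheap)
    scr = sum((c - mc) * (r - mr) for c, r in zip(cheap, reference))
    slope = scr / scc
    return slope, mr - slope * mc


def mean_absolute_error(predicted, reference):
    """Average absolute deviation between two equal-length sequences."""
    return statistics.fmean([abs(p - r) for p, r in zip(predicted, reference)])


if __name__ == "__main__":
    # Made-up excitation energies in eV; not taken from the paper.
    cheap = [2.10, 2.85, 3.40, 3.95, 4.60]
    reference = [2.45, 3.05, 3.55, 3.90, 4.50]
    slope, intercept = fit_linear_calibration(cheap, reference)
    calibrated = [slope * c + intercept for c in cheap]
    print("MAE before:", mean_absolute_error(cheap, reference))
    print("MAE after: ", mean_absolute_error(calibrated, reference))
```

A fair evaluation would measure the error on molecules held out from the fit, which corresponds to the in-domain and out-of-domain tests the abstract reports.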
10
Nguyen D, Tao L, Li Y. Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design. Front Chem 2022; 9:820417. [PMID: 35141207 PMCID: PMC8819075 DOI: 10.3389/fchem.2021.820417] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/24/2021] [Accepted: 12/31/2021] [Indexed: 12/21/2022]
Abstract
In recent years, the synthesis of monomer sequence-defined polymers has expanded into broad-spectrum applications in biomedical, chemical, and materials science fields. Pursuing the characterization and inverse design of these polymer systems requires our fundamental understanding not only at the individual monomer level, but also considering the chain scales, such as polymer configuration, self-assembly, and phase separation. However, our accessibility to this field is still rudimentary due to the limitations of traditional design approaches, the complexity of chemical space along with the burdened cost and time issues that prevent us from unveiling the underlying monomer sequence-structure-property relationships. Fortunately, thanks to the recent advancements in molecular dynamics simulations and machine learning (ML) algorithms, the bottlenecks in the tasks of establishing the structure-function correlation of the polymer chains can be overcome. In this review, we will discuss the applications of the integration between ML techniques and coarse-grained molecular dynamics (CGMD) simulations to solve the current issues in polymer science at the chain level. In particular, we focus on the case studies in three important topics (polymeric configuration characterization, feed-forward property prediction, and inverse design) in which CGMD simulations are leveraged to generate training datasets to develop ML-based surrogate models for specific polymer systems and designs. By doing so, this computational hybridization allows us to well establish the monomer sequence-functional behavior relationship of the polymers as well as guide us toward the best polymer chain candidates for the inverse design in undiscovered chemical space with reasonable computational cost and time. Even though there are still limitations and challenges ahead in this field, we finally conclude that this CGMD/ML integration is very promising, not only in the attempt of bridging the monomeric and macroscopic characterizations of polymer materials, but also enabling further tailored designs for sequence-specific polymers with superior properties in many practical applications.
Affiliation(s)
- Danh Nguyen
  - Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Lei Tao
  - Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Ying Li
  - Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
  - Polymer Program, Institute of Materials Science, University of Connecticut, Mansfield, CT, United States
11
Xu Y, Ju CW, Li B, Ma QS, Chen Z, Zhang L, Chen J. Hydrogen Evolution Prediction for Alternating Conjugated Copolymers Enabled by Machine Learning with Multidimension Fragmentation Descriptors. ACS Appl Mater Interfaces 2021; 13:34033-34042. [PMID: 34269560 DOI: 10.1021/acsami.1c05536] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Hydrogen evolution by alternating conjugated copolymers has attracted much attention in recent years. To study alternating copolymers with data-driven strategies, two types of multidimension fragmentation descriptors (MDFD), structure-based MDFD (SMDFD), and electronic property-based MDFD (EPMDFD), have been developed with machine learning (ML) algorithms for the first time. The superiority of SMDFD-based models has been demonstrated by the highly accurate and universal predictions of electronic properties. Moreover, EPMDFD-based, experimental-parameter-free ML models were developed for the prediction of the hydrogen evolution reaction, displaying excellent accuracy (real-test accuracy = 0.91). The combination of explainable ML approaches and first-principles calculations was employed to explore photocatalytic dynamics, revealing the importance of electron delocalization in the excited state. Virtual designing of high-performance candidates can also be achieved. Our work illustrates the huge potential of ML-based material design in the field of polymeric photocatalysts toward high-performance photocatalysis.
Affiliation(s)
- Yuzhi Xu
  - Institute of Polymer Optoelectronic Materials and Devices, State Key Laboratory of Luminescent Materials and Devices, College of Materials Science and Engineering, South China University of Technology, Guangzhou 510640, China
- Cheng-Wei Ju
  - College of Chemistry, Nankai University, Tianjin 300071, China
- Bo Li
  - Department of Chemistry, School of Science, Tianjin University, Tianjin 300072, China
- Qiu-Shi Ma
  - School of Resource and Environmental Engineering, Hefei University of Technology, Hefei 230009, China
- Zhenyu Chen
  - School of Materials Science and Engineering, Nankai University, Tianjin 300350, China
- Lianjie Zhang
  - Institute of Polymer Optoelectronic Materials and Devices, State Key Laboratory of Luminescent Materials and Devices, College of Materials Science and Engineering, South China University of Technology, Guangzhou 510640, China
- Junwu Chen
  - Institute of Polymer Optoelectronic Materials and Devices, State Key Laboratory of Luminescent Materials and Devices, College of Materials Science and Engineering, South China University of Technology, Guangzhou 510640, China
12
Turcani L, Tarzia A, Szczypiński FT, Jelfs KE. stk: An extendable Python framework for automated molecular and supramolecular structure assembly and discovery. J Chem Phys 2021; 154:214102. [PMID: 34240979 DOI: 10.1063/5.0049708] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/17/2022]
Abstract
Computational software workflows are emerging as all-in-one solutions to speed up the discovery of new materials. Many computational approaches require the generation of realistic structural models for property prediction and candidate screening. However, molecular and supramolecular materials represent classes of materials with many potential applications for which there is no go-to database of existing structures or general protocol for generating structures. Here, we report a new version of the supramolecular toolkit, stk, an open-source, extendable, and modular Python framework for general structure generation of (supra)molecular structures. Our construction approach works on arbitrary building blocks and topologies and minimizes the input required from the user, making stk user-friendly and applicable to many material classes. This version of stk includes metal-containing structures and rotaxanes as well as general implementation and interface improvements. Additionally, this version includes built-in tools for exploring chemical space with an evolutionary algorithm and tools for database generation and visualization. The latest version of stk is freely available at github.com/lukasturcani/stk.
Affiliation(s)
- Lukas Turcani
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
- Andrew Tarzia
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
- Filip T Szczypiński
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
- Kim E Jelfs
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
13
Greenaway RL, Jelfs KE. Integrating Computational and Experimental Workflows for Accelerated Organic Materials Discovery. Adv Mater 2021; 33:e2004831. [PMID: 33565203 PMCID: PMC11468036 DOI: 10.1002/adma.202004831] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Revised: 09/28/2020] [Indexed: 06/12/2023]
Abstract
Organic materials find application in a range of areas, including optoelectronics, sensing, encapsulation, molecular separations, and photocatalysis. The discovery of materials is frustratingly slow however, particularly when contrasted to the vast chemical space of possibilities based on the near limitless options for organic molecular precursors. The difficulty in predicting the material assembly, and consequent properties, of any molecule is another significant roadblock to targeted materials design. There has been significant progress in the development of computational approaches to screen large numbers of materials, for both their structure and properties, helping guide synthetic researchers toward promising materials. In particular, artificial intelligence techniques have the potential to make significant impact in many elements of the discovery process. Alongside this, automation and robotics are increasing the scale and speed with which materials synthesis can be realized. Herein, the focus is on demonstrating the power of integrating computational and experimental materials discovery programmes, including both a summary of key situations where approaches can be combined and a series of case studies that demonstrate recent successes.
Affiliation(s)
- Rebecca L. Greenaway
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London W12 0BZ, UK
| | - Kim E. Jelfs
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London W12 0BZ, UK
| |
|
14
|
Tu KH, Huang H, Lee S, Lee W, Sun Z, Alexander-Katz A, Ross CA. Machine Learning Predictions of Block Copolymer Self-Assembly. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2020; 32:e2005713. [PMID: 33206426 DOI: 10.1002/adma.202005713] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 08/22/2020] [Revised: 10/15/2020] [Indexed: 06/11/2023]
Abstract
Directed self-assembly of block copolymers is a key enabler for nanofabrication of devices with sub-10 nm feature sizes, allowing patterning far below the resolution limit of conventional photolithography. Among all the process steps involved in block copolymer self-assembly, solvent annealing plays a dominant role in determining the film morphology and pattern quality, yet the interplay of the multiple parameters during solvent annealing, including the initial thickness, swelling, time, and solvent ratio, makes it difficult to predict and control the resultant self-assembled pattern. Here, machine learning tools are applied to analyze the solvent annealing process and predict the effect of process parameters on morphology and defectivity. Two neural networks are constructed and trained, yielding accurate prediction of the final morphology in agreement with experimental data. A ridge regression model is constructed to identify the critical parameters that determine the quality of line/space patterns. These results illustrate the potential of machine learning to inform nanomanufacturing processes.
Affiliation(s)
- Kun-Hua Tu
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Hejin Huang
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Sangho Lee
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Wonmoo Lee
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Zehao Sun
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Alfredo Alexander-Katz
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Caroline A Ross
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| |
|
15
|
Analysis of Photosynthetic Systems and Their Applications with Mathematical and Computational Models. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10196821] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
In biological and life science applications, photosynthesis is an important process that involves the absorption and transformation of sunlight into chemical energy. During the photosynthesis process, the light photons are captured by the green chlorophyll pigments in their photosynthetic antennae and further funneled to the reaction center. One of the most important light harvesting complexes that are highly important in the study of photosynthesis is the membrane-attached Fenna–Matthews–Olson (FMO) complex found in the green sulfur bacteria. In this review, we discuss the mathematical formulations and computational modeling of some of the light harvesting complexes including FMO. The most recent research developments in the photosynthetic light harvesting complexes are thoroughly discussed. The theoretical background related to the spectral density, quantum coherence and density functional theory has been elaborated. Furthermore, details about the transfer and excitation of energy in different sites of the FMO complex along with other vital photosynthetic light harvesting complexes have also been provided. Finally, we conclude this review by providing the current and potential applications in environmental science, energy, health and medicine, where such mathematical and computational studies of the photosynthesis and the light harvesting complexes can be readily integrated.
|
16
|
Hanaoka K. Deep Neural Networks for Multicomponent Molecular Systems. ACS OMEGA 2020; 5:21042-21053. [PMID: 32875241 PMCID: PMC7450624 DOI: 10.1021/acsomega.0c02599] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/01/2020] [Accepted: 07/20/2020] [Indexed: 06/11/2023]
Abstract
Deep neural networks (DNNs) represent promising approaches to molecular machine learning (ML). However, their applicability remains limited to single-component materials and a general DNN model capable of handling various multicomponent molecular systems with composition data is still elusive, while current ML approaches for multicomponent molecular systems are still molecular descriptor-based. Here, a general DNN architecture extending existing molecular DNN models to multicomponent systems called MEIA is proposed. Case studies showed that the MEIA architecture could extend two existing molecular DNN models to multicomponent systems with the same procedure, and that the obtained models could learn both the molecular structure and composition information with equal or better accuracies compared to a well-used molecular descriptor-based model in the best model for each case study. Furthermore, the case studies also showed that, for ML tasks where the molecular structure information plays a minor role, the performance improvements by DNN models were small; while for ML tasks where the molecular structure information plays a major role, the performance improvements by DNN models were large, and DNN models showed notable predictive accuracies for an extremely sparse dataset, which cannot be modeled without the molecular structure information. The enhanced predictive ability of DNN models for sparse datasets of multicomponent systems will extend the applicability of ML in the multicomponent material design. Furthermore, the general capability of MEIA to extend DNN models to multicomponent systems will provide new opportunities to utilize the progress of actively developed single-component DNNs for the modeling of multicomponent systems.
|
17
|
Bannwarth C, Caldeweyher E, Ehlert S, Hansen A, Pracht P, Seibert J, Spicher S, Grimme S. Extended tight-binding quantum chemistry methods. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1493] [Citation(s) in RCA: 218] [Impact Index Per Article: 54.5] [Indexed: 12/17/2022]
Affiliation(s)
- Christoph Bannwarth
- Department of Chemistry and The PULSE Institute, Stanford University, Stanford, California, USA
| | - Eike Caldeweyher
- Mulliken Center for Theoretical Chemistry, Rheinische Friedrich‐Wilhelms‐Universität Bonn, Bonn, Germany
| | - Sebastian Ehlert
- Mulliken Center for Theoretical Chemistry, Rheinische Friedrich‐Wilhelms‐Universität Bonn, Bonn, Germany
| | - Andreas Hansen
- Mulliken Center for Theoretical Chemistry, Rheinische Friedrich‐Wilhelms‐Universität Bonn, Bonn, Germany
| | - Philipp Pracht
- Mulliken Center for Theoretical Chemistry, Rheinische Friedrich‐Wilhelms‐Universität Bonn, Bonn, Germany
| | - Jakob Seibert
- Mulliken Center for Theoretical Chemistry, Rheinische Friedrich‐Wilhelms‐Universität Bonn, Bonn, Germany
| | - Sebastian Spicher
- Mulliken Center for Theoretical Chemistry, Rheinische Friedrich‐Wilhelms‐Universität Bonn, Bonn, Germany
| | - Stefan Grimme
- Mulliken Center for Theoretical Chemistry, Rheinische Friedrich‐Wilhelms‐Universität Bonn, Bonn, Germany
| |
|
18
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part I: Progress. Angew Chem Int Ed Engl 2020; 59:22858-22893. [DOI: 10.1002/anie.201909987] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 08/06/2019] [Indexed: 01/05/2023]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|
19
|
Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil I: Fortschritt. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909987] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|
20
|
Yuan Q, Santana-Bonilla A, Zwijnenburg MA, Jelfs KE. Molecular generation targeting desired electronic properties via deep generative models. NANOSCALE 2020; 12:6744-6758. [PMID: 32163074 DOI: 10.1039/c9nr10687a] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 06/10/2023]
Abstract
As we seek to discover new functional materials, we need ways to explore the vast chemical space of precursor building blocks, not only generating large numbers of possible building blocks to investigate, but trying to find non-obvious options, that we might not suggest by chemical experience alone. Artificial intelligence techniques provide a possible avenue to generate large numbers of organic building blocks for functional materials, and can even do so from very small initial libraries of known building blocks. Specifically, we demonstrate the application of deep recurrent neural networks for the exploration of the chemical space of building blocks for a test case of donor-acceptor oligomers with specific electronic properties. The recurrent neural network learned how to produce novel donor-acceptor oligomers by trading off between selected atomic substitutions, such as halogenation or methylation, and molecular features such as the oligomer's size. The electronic and structural properties of the generated oligomers can be tuned by sampling from different subsets of the training database, which enabled us to enrich the library of donor-acceptors towards desired properties. We generated approximately 1700 new donor-acceptor oligomers with a recurrent neural network tuned to target oligomers with a HOMO-LUMO gap <2 eV and a dipole moment <2 Debye, which could have potential application in organic photovoltaics.
Affiliation(s)
- Qi Yuan
- Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, W12 0BZ, UK.
| | - Alejandro Santana-Bonilla
- Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, W12 0BZ, UK.
| | - Martijn A Zwijnenburg
- Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, UK
| | - Kim E Jelfs
- Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, W12 0BZ, UK.
| |
|
21
|
Mapping the optoelectronic property space of small aromatic molecules. Commun Chem 2020; 3:14. [PMID: 36703446 PMCID: PMC9814262 DOI: 10.1038/s42004-020-0256-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/23/2019] [Accepted: 12/19/2019] [Indexed: 01/29/2023] Open
Abstract
Small aromatic molecules and their quinone derivatives find use in organic transistors, solar-cells, thermoelectrics, batteries and photocatalysts. These applications exploit the optoelectronic properties of these molecules and the ease by which such properties can be tuned by the introduction of heteroatoms and/or the addition of functional groups. We perform a high-throughput virtual screening using the xTB family of density functional tight-binding methods to map the optoelectronic property space of ~250,000 molecules. The large volume of data generated allows for a broad understanding of how the presence of heteroatoms and functional groups affect the ionisation potential, electron affinity and optical gap values of these molecular semiconductors, and how the structural features - on their own or in combination with one another - allow access to particular regions of the optoelectronic property space. Finally, we identify the apparent boundaries of the optoelectronic property space for these molecules: regions of property space that appear off limits for any small aromatic molecule.
|
22
|
St John PC, Phillips C, Kemper TW, Wilson AN, Guan Y, Crowley MF, Nimlos MR, Larsen RE. Message-passing neural networks for high-throughput polymer screening. J Chem Phys 2019; 150:234111. [PMID: 31228909 DOI: 10.1063/1.5099132] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 11/15/2022] Open
Abstract
Machine learning methods have shown promise in predicting molecular properties, and given sufficient training data, machine learning approaches can enable rapid high-throughput virtual screening of large libraries of compounds. Graph-based neural network architectures have emerged in recent years as the most successful approach for predictions based on molecular structure and have consistently achieved the best performance on benchmark quantum chemical datasets. However, these models have typically required optimized 3D structural information for the molecule to achieve the highest accuracy. These 3D geometries are costly to compute for high levels of theory, limiting the applicability and practicality of machine learning methods in high-throughput screening applications. In this study, we present a new database of candidate molecules for organic photovoltaic applications, comprising approximately 91 000 unique chemical structures. Compared to existing datasets, this dataset contains substantially larger molecules (up to 200 atoms) as well as extrapolated properties for long polymer chains. We show that message-passing neural networks trained with and without 3D structural information for these molecules achieve similar accuracy, comparable to state-of-the-art methods on existing benchmark datasets. These results therefore emphasize that for larger molecules with practical applications, near-optimal prediction results can be obtained without using optimized 3D geometry as an input. We further show that learned molecular representations can be leveraged to reduce the training data required to transfer predictions to a new density functional theory functional.
Affiliation(s)
- Peter C St John
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado 80401-3393, USA
| | - Caleb Phillips
- Computational Science Center, National Renewable Energy Laboratory, Golden, Colorado 80401-3393, USA
| | - Travis W Kemper
- Computational Science Center, National Renewable Energy Laboratory, Golden, Colorado 80401-3393, USA
| | - A Nolan Wilson
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401-3393, USA
| | - Yanfei Guan
- Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523-1872, USA
| | - Michael F Crowley
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado 80401-3393, USA
| | - Mark R Nimlos
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401-3393, USA
| | - Ross E Larsen
- Computational Science Center, National Renewable Energy Laboratory, Golden, Colorado 80401-3393, USA
| |
|