51
Enhancing Carbon Acid pKa Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values. Molecules 2021; 26:molecules26041048. [PMID: 33671348 PMCID: PMC7922142 DOI: 10.3390/molecules26041048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/20/2020] [Revised: 02/08/2021] [Accepted: 02/11/2021] [Indexed: 11/25/2022] Open
Abstract
The prediction of the aqueous pKa of carbon acids by Quantitative Structure Property Relationship or cheminformatics-based methods is a rather arduous problem. Primarily, there are insufficient high-quality experimental data points measured in homogeneous conditions to allow for a good global model to be generated. In our computationally efficient pKa prediction method, we generate an atom-type feature vector, called a distance spectrum, from the assigned ionisation atom, and learn coefficients for those atom-types that show the impact each atom-type has on the pKa of the ionisable centre. In the current work, we augment our dataset with pKa values from a series of high performing local models derived from the Ab Initio Bond Lengths method (AIBL). We find that, in distilling the knowledge available from multiple models into one general model, the prediction error for an external test set is reduced compared to that using literature experimental data alone.

52
Lee CK, Lu C, Yu Y, Sun Q, Hsieh CY, Zhang S, Liu Q, Shi L. Transfer learning with graph neural networks for optoelectronic properties of conjugated oligomers. J Chem Phys 2021; 154:024906. [PMID: 33445906 DOI: 10.1063/5.0037863] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/16/2022] Open
Abstract
Despite the remarkable progress of machine learning (ML) techniques in chemistry, modeling the optoelectronic properties of long conjugated oligomers and polymers with ML remains challenging due to the difficulty in obtaining sufficient training data. Here, we use transfer learning to address the data scarcity issue by pre-training graph neural networks using data from short oligomers. With only a few hundred training data, we are able to achieve an average error of about 0.1 eV for the excited-state energy of oligothiophenes against time-dependent density functional theory (TDDFT) calculations. We show that the success of our transfer learning approach relies on the relative locality of low-lying electronic excitations in long conjugated oligomers. Finally, we demonstrate the transferability of our approach by modeling the lowest-lying excited-state energies of poly(3-hexylthiophene) in its single-crystal and solution phases using the transfer learning models trained with the data of gas-phase oligothiophenes. The transfer learning predicted excited-state energy distributions agree quantitatively with TDDFT calculations and capture some important qualitative features observed in experimental absorption spectra.
Affiliation(s)
- Chengqiang Lu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yue Yu
- Chemistry and Chemical Biology, University of California, Merced, California 95343, USA
- Qiming Sun
- Tencent America, Palo Alto, California 94306, USA
- Qi Liu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei, Anhui 230026, China
- Liang Shi
- Chemistry and Chemical Biology, University of California, Merced, California 95343, USA

53
Rodrigues JF, Florea L, de Oliveira MCF, Diamond D, Oliveira ON. Big data and machine learning for materials science. Discover Materials 2021; 1:12. [PMID: 33899049 PMCID: PMC8054236 DOI: 10.1007/s43939-021-00012-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 05/11/2023]
Abstract
Herein, we review aspects of leading-edge research and innovation in materials science that exploit big data and machine learning (ML), two computer science concepts that combine to yield computational intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. However, the potential benefits of ML come at the cost of big data production; that is, the algorithms demand large volumes of data of various natures and from different sources, from material properties to sensor data. In the survey, we propose a roadmap for future developments with emphasis on computer-aided discovery of new materials and analysis of chemical sensing compounds, both prominent research fields for ML in the context of materials science. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to materials science, outlining processes, discussing pitfalls, and reviewing cases of success and failure.
Affiliation(s)
- Jose F. Rodrigues
- Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP, Brazil
- Larisa Florea
- SFI Research Centre for Advanced Materials and BioEngineering Research, Trinity College Dublin, The University of Dublin, Dublin, Ireland
- Maria C. F. de Oliveira
- Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP, Brazil
- Dermot Diamond
- Insight Centre for Data Analytics, National Centre for Sensor Research, Dublin City University, Dublin 9, Dublin, Ireland
- Osvaldo N. Oliveira
- São Carlos Institute of Physics, University of São Paulo (USP), São Carlos, SP, Brazil

54
Khrenova MG, Mulashkin FD, Bulavko ES, Zakharova TM, Nemukhin AV. Dipole Moment Variation Clears Up Electronic Excitations in the π-Stacked Complexes of Fluorescent Protein Chromophores. J Chem Inf Model 2020; 60:6288-6297. [PMID: 33206518 DOI: 10.1021/acs.jcim.0c01028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
We propose a quantitative structure-property relationship (QSPR) model for prediction of spectral tuning in cyan, green, orange, and red fluorescent proteins, which are engineered by motifs of the green fluorescent protein. Protein variants, in which their chromophores are involved in the π-stacking interaction with amino acid residues tyrosine, phenylalanine, and histidine, are prospective markers useful in bioimaging and super-resolution microscopy. In this work, we constructed training sets of the π-stacked complexes of four fluorescent protein chromophores (of the green, orange, red, and cyan series) with various substituted benzenes and imidazoles and tested the use of dipole moment variation upon excitation (DMV) as a descriptor to evaluate the vertical excitation energies in these systems. To validate this approach, we computed and analyzed electron density distributions of the π-stacked complexes and correlated the QSPR predictions with the reference values of the transition energies obtained using the high-level ab initio quantum chemistry methods. According to our results, the use of the DMV descriptor allows one to predict excitation energies in the π-stacked complexes with errors not exceeding 0.1 eV, which makes this model a practically useful tool in the development of efficient fluorescent markers for in vivo imaging.
Affiliation(s)
- Maria G Khrenova
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russian Federation; Bach Institute of Biochemistry, Federal Research Centre "Fundamentals of Biotechnology" of the Russian Academy of Sciences, Moscow 119071, Russian Federation
- Fedor D Mulashkin
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russian Federation
- Egor S Bulavko
- Department of Biology, Lomonosov Moscow State University, Moscow 119991, Russian Federation
- Tatiana M Zakharova
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russian Federation
- Alexander V Nemukhin
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russian Federation; Emanuel Institute of Biochemical Physics, Russian Academy of Sciences, Moscow 119334, Russian Federation

55
Rahaman O, Gagliardi A. Deep Learning Total Energies and Orbital Energies of Large Organic Molecules Using Hybridization of Molecular Fingerprints. J Chem Inf Model 2020; 60:5971-5983. [PMID: 33118351 DOI: 10.1021/acs.jcim.0c00687] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022]
Abstract
The ability to predict material properties without the need for resource-consuming experimental efforts can immensely accelerate material and drug discovery. Although ab initio methods can be reliable and accurate in making such predictions, they are computationally too expensive on a large scale. The recent advancements in artificial intelligence and machine learning as well as the availability of large quantum mechanics derived datasets enable us to train models on these datasets as a benchmark and to make fast predictions on much larger datasets. The success of these machine learning models highly depends on the machine-readable fingerprints of the molecules that capture their chemical properties as well as topological information. In this work, we propose a common deep learning-based framework to combine different types of molecular fingerprints to enhance prediction accuracy. A graph neural network (GNN), many-body tensor representation (MBTR), and a set of simple molecular descriptors (MD) were used to predict the total energies, highest occupied molecular orbital (HOMO) energies, and lowest unoccupied molecular orbital (LUMO) energies of a dataset containing ∼62k large organic molecules with complex aromatic rings and remarkably diverse functional groups. The results demonstrate that a combination of best performing molecular fingerprints can produce better results than the individual ones. The simple and flexible deep learning framework developed in this work can be easily adapted to incorporate other types of molecular fingerprints.
Affiliation(s)
- Obaidur Rahaman
- Technische Universität München, Karlstr. 45, 80333 Munich, Germany

56
Aggarwal A, Vinayak V, Bag S, Bhattacharyya C, Waghmare UV, Maiti PK. Predicting the DNA Conductance Using a Deep Feedforward Neural Network Model. J Chem Inf Model 2020; 61:106-114. [PMID: 33320660 DOI: 10.1021/acs.jcim.0c01072] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Abstract
Double-stranded DNA (dsDNA) has been established as an efficient medium for charge migration, bringing it to the forefront of the field of molecular electronics and biological research. The charge migration rate is controlled by the electronic couplings between the two nucleobases of DNA/RNA. These electronic couplings strongly depend on the intermolecular geometry and orientation. Estimating these electronic couplings for all the possible relative geometries of molecules using the computationally demanding first-principles calculations requires a lot of time and computational resources. In this article, we present a machine learning (ML)-based model to calculate the electronic coupling between any two bases of dsDNA/dsRNA and bypass the computationally expensive first-principles calculations. Using the Coulomb matrix representation which encodes the atomic identities and coordinates of the DNA base pairs to prepare the input dataset, we train a feedforward neural network model. Our neural network (NN) model can predict the electronic couplings between dsDNA base pairs with any structural orientation with a mean absolute error (MAE) of less than 0.014 eV. We further use the NN-predicted electronic coupling values to compute the dsDNA/dsRNA conductance.
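For readers unfamiliar with the Coulomb matrix named in this abstract, the following is a minimal sketch of its commonly used definition: diagonal entries 0.5·Z_i^2.4 and off-diagonal entries Z_i·Z_j/|R_i − R_j|. The function name `coulomb_matrix` and the toy two-atom input are assumptions made for illustration; the abstract does not describe the authors' actual preprocessing, units, or atom ordering.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix as a list of rows (illustrative sketch).

    charges: nuclear charges Z_i.
    coords:  (x, y, z) positions, in any consistent length unit.
    Diagonal entries are 0.5 * Z_i**2.4; off-diagonal entries are
    Z_i * Z_j / |R_i - R_j|.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical toy input: two hydrogen-like atoms 1.4 length units apart.
m = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
for row in m:
    print(row)
```

In a setting like the one described, a matrix of this kind (typically flattened) would serve as the input vector to the feedforward network; how the authors handle ordering and padding is not stated here.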
Affiliation(s)
- Abhishek Aggarwal
- Center for Condensed Matter Theory, Department of Physics, Indian Institute of Science, Bangalore 560012, India
- Vinayak Vinayak
- Undergraduate Program, Indian Institute of Science, Bangalore 560012, India
- Saientan Bag
- Center for Condensed Matter Theory, Department of Physics, Indian Institute of Science, Bangalore 560012, India
- Chiranjib Bhattacharyya
- Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India
- Umesh V Waghmare
- Theoretical Sciences Unit, Jawaharlal Nehru Center for Advanced Scientific Research, Jakkur P.O., Bangalore 560064, India
- Prabal K Maiti
- Center for Condensed Matter Theory, Department of Physics, Indian Institute of Science, Bangalore 560012, India

57
Prediction of Henry's law constants of CO2 in imidazole ionic liquids using machine learning methods based on empirical descriptors. Chemical Papers 2020. [DOI: 10.1007/s11696-020-01415-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/22/2022]

58
Ayoubi‐Chianeh M, Kassaee MZ. Stable four‐membered cyclosilylenes at theoretical levels. J Chin Chem Soc-Taip 2020. [DOI: 10.1002/jccs.202000338] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]

59
Nakata M, Shimazaki T, Hashimoto M, Maeda T. PubChemQC PM6: Data Sets of 221 Million Molecules with Optimized Molecular Geometries and Electronic Properties. J Chem Inf Model 2020; 60:5891-5899. [PMID: 33104339 DOI: 10.1021/acs.jcim.0c00740] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/20/2022]
Abstract
We report on optimized molecular geometries and electronic properties calculated by the PM6 method for 94.0% of the 91.6 million molecules cataloged in PubChem Compounds retrieved on August 29, 2016. In addition to neutral states, we also calculated those for cationic, anionic, and spin flipped electronic states of 56.2%, 49.7%, and 41.3% of the molecules, respectively. Thus, the grand total of the PM6 calculations amounted to 221 million. We compared the resulting molecular geometries with B3LYP/6-31G* optimized geometries for 2.6 million molecules. The root-mean-square deviations in bond length and bond angle were approximately 0.016 Å and 1.7°, respectively. Then, using linear regression to examine the HOMO energy levels E(HOMO) in the B3LYP and PM6 calculations, we found that E_B3LYP(HOMO) = 0.876 E_PM6(HOMO) + 1.975 (eV) and calculated the coefficient of determination to be 0.803. Likewise, we examined the LUMO energy levels and found E_B3LYP(LUMO) = 1.069 E_PM6(LUMO) - 0.420 (eV); the coefficient of determination was 0.842. We also generated four subdata sets, each of which was composed of molecules with molecular weights less than 500. Subdata set i contained C, H, O and N, ii contained C, H, N, O, P, and S, iii contained C, H, N, O, P, S, F, and Cl, and iv contained C, H, N, O, P, S, F, Cl, Na, K, Mg, and Ca. The data sets are available at http://pubchemqc.riken.jp/pm6_datasets.html under a Creative Commons Attribution 4.0 International license.
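The two regression lines quoted in this abstract can be applied directly to rescale PM6 orbital energies toward B3LYP values. The sketch below only wraps the published coefficients; the function names and the sample input energies are hypothetical, and with coefficients of determination of 0.803 and 0.842 the results should be read as rough estimates rather than substitutes for a B3LYP calculation.

```python
def b3lyp_homo_from_pm6(e_pm6_ev):
    """Estimate the B3LYP HOMO energy (eV) from a PM6 HOMO energy (eV).

    Uses the linear fit quoted in the abstract:
    E_B3LYP(HOMO) = 0.876 * E_PM6(HOMO) + 1.975
    """
    return 0.876 * e_pm6_ev + 1.975


def b3lyp_lumo_from_pm6(e_pm6_ev):
    """Estimate the B3LYP LUMO energy (eV) from a PM6 LUMO energy (eV).

    Uses the linear fit quoted in the abstract:
    E_B3LYP(LUMO) = 1.069 * E_PM6(LUMO) - 0.420
    """
    return 1.069 * e_pm6_ev - 0.420


# Hypothetical PM6 orbital energies, chosen only to exercise the functions.
homo_estimate = b3lyp_homo_from_pm6(-9.0)
lumo_estimate = b3lyp_lumo_from_pm6(-1.0)
print(f"HOMO estimate: {homo_estimate:.3f} eV")
print(f"LUMO estimate: {lumo_estimate:.3f} eV")
```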
Affiliation(s)
- Maho Nakata
- RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako-City, Saitama 351-0198, Japan
- Tomomi Shimazaki
- Graduate School of System Informatics, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, Hyogo 657-8501, Japan
- Masatomo Hashimoto
- Software Technology and Artificial Intelligence Research Laboratory, Chiba Institute of Technology, 2-17-1 Tsudanuma, Narashino, Chiba 275-0016, Japan
- Toshiyuki Maeda
- Software Technology and Artificial Intelligence Research Laboratory, Chiba Institute of Technology, 2-17-1 Tsudanuma, Narashino, Chiba 275-0016, Japan

60
Westermayr J, Marquetand P. Machine learning and excited-state molecular dynamics. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/ab9c3e] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/21/2022]

61
Thermodynamic radii of lanthanide ions derived from metal–ligand complexes stability constants. J Incl Phenom Macro 2020. [DOI: 10.1007/s10847-020-01010-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/20/2022]

62
Yu H, Wang Y, Wang X, Zhang J, Ye S, Huang Y, Luo Y, Sharman E, Chen S, Jiang J. Using Machine Learning to Predict the Dissociation Energy of Organic Carbonyls. J Phys Chem A 2020; 124:3844-3850. [PMID: 32315178 DOI: 10.1021/acs.jpca.0c01280] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/30/2022]
Abstract
Bond dissociation energy (BDE), an indicator of the strength of chemical bonds, exhibits great potential for evaluating and screening high-performance materials and catalysts, which are of critical importance in industrial applications. However, the measurement or computation of BDE via conventional experimental or theoretical methods is usually costly and involved, substantially preventing the BDE from being applied to large-scale and high-throughput studies. Therefore, a potentially more efficient approach for estimating BDE is highly desirable. To this end, we combined first-principles calculations and machine learning techniques, including neural networks and random forest, to explore the inner relationships between carbonyl structure and its BDE. Results show that machine learning can not only effectively reproduce the computed BDEs of carbonyls but also in turn serve as guidance for the rational design of carbonyl structure aimed at optimizing performance.
Affiliation(s)
- Haishan Yu
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Ying Wang
- Key Laboratory of Cluster Science of Ministry of Education, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, China
- Xijun Wang
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh 27606, North Carolina, United States
- Jinxiao Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Sheng Ye
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Yan Huang
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Yi Luo
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
- Edward Sharman
- Department of Neurology, University of California, Irvine 92697, California, United States
- Shilu Chen
- Key Laboratory of Cluster Science of Ministry of Education, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, China
- Jun Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China

63
Antimicrobial Isoflavones and Derivatives from Erythrina (Fabaceae): Structure Activity Perspective (SAR & QSAR) on Experimental and Mined Values Against Staphylococcus aureus. Antibiotics (Basel) 2020; 9:antibiotics9050223. [PMID: 32365905 PMCID: PMC7277434 DOI: 10.3390/antibiotics9050223] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/26/2020] [Revised: 04/28/2020] [Accepted: 04/28/2020] [Indexed: 12/14/2022] Open
Abstract
Prenylated (iso)flavonoids, -flavans and pterocarpans from taxa in Erythrina are repeatedly flagged as potent antimicrobial compounds. In the current study, bark from E. lysistemon was extracted and seven isoflavone derivatives were purified: erybraedin A (1), phaseollidin (2), abyssinone V-4′ methyl ether (3), eryzerin C (4), alpumisoflavone (5), cristacarpin (6) and lysisteisoflavone (7). Minimum inhibition concentration (MIC) values were determined against a range of species of bacteria (skin pathogens), then values for another 67 derivatives from Erythrina, only against Staphylococcus aureus, were mined from the literature. Of the seven isolates, MIC values widely ranged from 1–600 μg/mL, with no obvious pattern of selectivity for Gram-types. Nevertheless, using the mined and experimentally determined values against S. aureus, Klekota-Roth fragments (Structure Activity Relationship: SAR) were determined then used as molecular descriptors to make a ‘decision tree’ based on structural characters inspired by the classes of antimicrobial potency (classes A-D). Furthermore, to make quantitative predictions of MIC values (Quantitative SAR: QSAR) ‘pace regression’ was utilized and validated (R² = 0.778, Q² = 0.727 and P² = 0.555). Evidently, the position and degree of prenylation is important; however, the presence of hydroxyl groups at positions 5 and 7 in ring A and 4′ in ring B is associated with lower MIC values. While antimicrobial results continue to validate the traditional use of E. lysistemon extracts (or Erythrina generally) in therapeutic applications consistent with anti-infection, it is surprising that this class of compound is not being utilized more often in general industry applications, such as food or cosmetic preservation, or in topical antimicrobial creams. Prenylated (iso)flavonoids are derived from several other Genera, such as Dorstenia (Moraceae), Ficus (Moraceae), Glycyrrhiza (Fabaceae), Paulownia (Lamiales) or Pomifera (Moraceae).

64
Fine JA, Rajasekar AA, Jethava KP, Chopra G. Spectral deep learning for prediction and prospective validation of functional groups. Chem Sci 2020; 11:4618-4630. [PMID: 34122917 PMCID: PMC8152587 DOI: 10.1039/c9sc06240h] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 12/10/2019] [Accepted: 03/13/2020] [Indexed: 01/06/2023] Open
Abstract
State-of-the-art identification of the functional groups present in an unknown chemical entity requires the expertise of a skilled spectroscopist to analyse and interpret Fourier transform infra-red (FTIR), mass spectroscopy (MS) and/or nuclear magnetic resonance (NMR) data. This process can be time-consuming and error-prone, especially for complex chemical entities that are poorly characterised in the literature, or inefficient to use with synthetic robots producing molecules at an accelerated rate. Herein, we introduce a fast, multi-label deep neural network for accurately identifying all the functional groups of unknown compounds using a combination of FTIR and MS spectra. We do not use any database, pre-established rules, procedures, or peak-matching methods. Our trained neural network reveals patterns typically used by human chemists to identify standard groups. Finally, we experimentally validated our neural network, trained on single compounds, to predict functional groups in compound mixtures. Our methodology showcases practical utility for future use in autonomous analytical detection.
Affiliation(s)
- Jonathan A Fine
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, IN 47907, USA
- Anand A Rajasekar
- Department of Biological Engineering, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Krupal P Jethava
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, IN 47907, USA
- Gaurav Chopra
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, IN 47907, USA

65
Kammeraad JA, Goetz J, Walker EA, Tewari A, Zimmerman PM. What Does the Machine Learn? Knowledge Representations of Chemical Reactivity. J Chem Inf Model 2020; 60:1290-1301. [PMID: 32091880 DOI: 10.1021/acs.jcim.9b00721] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/30/2022]
Abstract
In a departure from conventional chemical approaches, data-driven models of chemical reactions have recently been shown to be statistically successful using machine learning. These models, however, are largely black box in character and have not provided the kind of chemical insights that historically advanced the field of chemistry. To examine the knowledgebase of machine-learning models (what does the machine learn?), this article deconstructs black-box machine-learning models of a diverse chemical reaction data set. Through experimentation with chemical representations and modeling techniques, the analysis provides insights into the nature of how statistical accuracy can arise, even when the model lacks informative physical principles. By peeling back the layers of these complicated models we arrive at a minimal, chemically intuitive model (with no machine learning involved). This model is based on systematic reaction-type classification and Evans-Polanyi relationships within reaction types, which are easily visualized and interpreted. Through exploring this simple model, we gain deeper understanding of the data set and uncover a means for expert interactions to improve the model's reliability.
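An Evans-Polanyi relationship is a linear relation between activation energy and reaction energy, E_a = E_0 + α·ΔH, fitted separately within each family of similar reactions. The sketch below illustrates that idea with an ordinary least-squares line per reaction type. It is not the authors' code: the function names, the reaction-type labels, and every number in the toy data set are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_evans_polanyi(reactions):
    """Fit one (alpha, E0) pair per reaction type.

    reactions: iterable of (reaction_type, reaction_energy, activation_energy).
    """
    grouped = defaultdict(list)
    for reaction_type, delta_h, barrier in reactions:
        grouped[reaction_type].append((delta_h, barrier))
    return {
        reaction_type: fit_line([d for d, _ in points], [b for _, b in points])
        for reaction_type, points in grouped.items()
    }


def predict_barrier(models, reaction_type, delta_h):
    """Predict E_a = E0 + alpha * delta_h for a known reaction type."""
    alpha, e0 = models[reaction_type]
    return e0 + alpha * delta_h


# Hypothetical toy data: (type, reaction energy, activation energy).
toy_reactions = [
    ("A", -10.0, 15.0), ("A", 0.0, 20.0), ("A", 10.0, 25.0),
    ("B", -20.0, 32.0), ("B", 20.0, 48.0),
]
models = fit_evans_polanyi(toy_reactions)
print(predict_barrier(models, "A", 4.0))
print(predict_barrier(models, "B", 10.0))
```

Because each reaction type carries only two fitted numbers, a model of this form is easy to inspect and to correct by hand, which is the kind of interpretability the abstract emphasizes.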
Affiliation(s)
- Joshua A Kammeraad
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
- Jack Goetz
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, United States
- Eric A Walker
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
- Ambuj Tewari
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, United States
- Paul M Zimmerman
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States

66
Selvaratnam B, Koodali RT, Miró P. Application of Symmetry Functions to Large Chemical Spaces Using a Convolutional Neural Network. J Chem Inf Model 2020; 60:1928-1935. [PMID: 32053367 DOI: 10.1021/acs.jcim.9b00835] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Affiliation(s)
- Balaranjan Selvaratnam
- Department of Chemistry, University of South Dakota, 57069 Vermillion, South Dakota, United States
- Ranjit T. Koodali
- Department of Chemistry, University of South Dakota, 57069 Vermillion, South Dakota, United States
- Pere Miró
- Department of Chemistry, University of South Dakota, 57069 Vermillion, South Dakota, United States

67
Chowdhury AJ, Yang W, Abdelfatah KE, Zare M, Heyden A, Terejanu GA. A Multiple Filter Based Neural Network Approach to the Extrapolation of Adsorption Energies on Metal Surfaces for Catalysis Applications. J Chem Theory Comput 2020; 16:1105-1114. [PMID: 31962041 DOI: 10.1021/acs.jctc.9b00986] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/18/2023]
Abstract
Computational catalyst discovery involves the development of microkinetic reactor models based on estimated parameters determined from density functional theory (DFT). For complex surface chemistries, the number of reaction intermediates can be very large, and the cost of calculating the adsorption energies by DFT for all surface intermediates even for one active site model can become prohibitive. In this paper, we have identified appropriate descriptors and machine learning models that can be used to predict a significant part of these adsorption energies given data on the rest of them. Moreover, our investigations also included the case when the species data used to train the predictive model are of different size relative to the species the model tries to predict; this is an extrapolation in the data space, which is typically difficult with regular machine learning models. Due to the relative size of the available data sets, we have attempted to extrapolate from the larger species to the smaller ones in the current work. Here, we have developed a neural network based predictive model that combines an established additive atomic contribution based model with the concepts of a convolutional neural network that, when extrapolating, achieves a statistically significant improvement over the previous models.
Affiliation(s)
- Gabriel A Terejanu
- Department of Computer Science, University of North Carolina at Charlotte, Charlotte, North Carolina 28223, United States

68
Glavatskikh M, Leguy J, Hunault G, Cauchy T, Da Mota B. Dataset's chemical diversity limits the generalizability of machine learning predictions. J Cheminform 2019; 11:69. [PMID: 33430991 PMCID: PMC6852905 DOI: 10.1186/s13321-019-0391-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 07/26/2019] [Accepted: 10/28/2019] [Indexed: 01/18/2023] Open
Abstract
The QM9 dataset has become the gold standard for Machine Learning (ML) predictions of various chemical properties. QM9 is based on the GDB, which is a combinatorial exploration of the chemical space. ML molecular predictions have been recently published with an accuracy on par with Density Functional Theory calculations. Such ML models need to be tested and generalized on real data. PC9, a new QM9 equivalent dataset (only H, C, N, O and F and up to 9 “heavy” atoms) of the PubChemQC project is presented in this article. A statistical study of bonding distances and chemical functions shows that this new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the Neural Network model provided by SchNet have been used on both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset.
Affiliation(s)
- Marta Glavatskikh
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France; Laboratoire MOLTECH-Anjou, UMR CNRS 6200, SFR MATRIX, UNIV Angers, 2 Bd Lavoisier, 49045, Angers, France
| | - Jules Leguy
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France
| | - Gilles Hunault
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France; HIFIH, EA 3859, Institut de Biologie en Santé PBH-IRIS, CHU, University of Angers, 4, Rue Larrey, 49933, Angers, France
| | - Thomas Cauchy
- Laboratoire MOLTECH-Anjou, UMR CNRS 6200, SFR MATRIX, UNIV Angers, 2 Bd Lavoisier, 49045, Angers, France.
| | - Benoit Da Mota
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France
| |
|
69
|
Guda AA, Guda SA, Lomachenko KA, Soldatov MA, Pankin IA, Soldatov AV, Braglia L, Bugaev AL, Martini A, Signorile M, Groppo E, Piovano A, Borfecchia E, Lamberti C. Quantitative structural determination of active sites from in situ and operando XANES spectra: From standard ab initio simulations to chemometric and machine learning approaches. Catal Today 2019. [DOI: 10.1016/j.cattod.2018.10.071] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 01/19/2023]
|
70
|
Wang H, Ji Y, Li Y. Simulation and design of energy materials accelerated by machine learning. Wiley Interdiscip Rev Comput Mol Sci 2019. [DOI: 10.1002/wcms.1421] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/15/2022]
Affiliation(s)
- Hongshuai Wang
- Jiangsu Key Laboratory for Carbon‐Based Functional Materials and Devices, Institute of Functional Nano & Soft Materials (FUNSOM) Soochow University Suzhou PR China
| | - Yujin Ji
- Jiangsu Key Laboratory for Carbon‐Based Functional Materials and Devices, Institute of Functional Nano & Soft Materials (FUNSOM) Soochow University Suzhou PR China
| | - Youyong Li
- Jiangsu Key Laboratory for Carbon‐Based Functional Materials and Devices, Institute of Functional Nano & Soft Materials (FUNSOM) Soochow University Suzhou PR China
| |
|
71
|
Chang AM, Freeze JG, Batista VS. Hammett neural networks: prediction of frontier orbital energies of tungsten-benzylidyne photoredox complexes. Chem Sci 2019; 10:6844-6854. [PMID: 31391907 PMCID: PMC6657405 DOI: 10.1039/c9sc02339a] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/14/2019] [Accepted: 06/02/2019] [Indexed: 11/21/2022] Open
Abstract
The successful application of Hammett parameters as input features for regressive machine learning models is demonstrated and applied to predict energies of frontier orbitals of highly reducing tungsten–benzylidyne complexes of the form W(≡CArR)L4X. Using a reference molecular framework and the meta- and para-substituent Hammett parameters of the ligands, the models predict energies of frontier orbitals that correlate with redox potentials. The regressive models capture the multivariate character of electron-donating trends as influenced by multiple substituents even for non-aryl ligands, harnessing the breadth of Hammett parameters in a generalized model. We find a tungsten catalyst with tetramethylethylenediamine (tmeda) equatorial ligands and axial methoxyl substituents that should attract significant experimental interest since it is predicted to be highly reducing when photoactivated with visible light. The utilization of Hammett parameters in this study presents a generalizable and compact representation for exploring the effects of ligand substitutions.
Affiliation(s)
- Alexander M Chang
- Department of Chemistry, Energy Sciences Institute, Yale University, New Haven, CT 06520, USA
| | - Jessica G Freeze
- Department of Chemistry, Energy Sciences Institute, Yale University, New Haven, CT 06520, USA
| | - Victor S Batista
- Department of Chemistry, Energy Sciences Institute, Yale University, New Haven, CT 06520, USA
| |
|
72
|
Stuke A, Todorović M, Rupp M, Kunkel C, Ghosh K, Himanen L, Rinke P. Chemical diversity in molecular orbital energy predictions with kernel ridge regression. J Chem Phys 2019; 150:204121. [DOI: 10.1063/1.5086105] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 02/06/2023] Open
Affiliation(s)
- Annika Stuke
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
| | - Milica Todorović
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
| | - Matthias Rupp
- Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195 Berlin, Germany
| | - Christian Kunkel
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, 85747 Garching, Germany
| | - Kunal Ghosh
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Department of Computer Science, Aalto University, P.O. Box 15400, Aalto FI-00076, Finland
| | - Lauri Himanen
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
| | - Patrick Rinke
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, 85747 Garching, Germany
| |
|
73
|
Ghosh K, Stuke A, Todorović M, Jørgensen PB, Schmidt MN, Vehtari A, Rinke P. Deep Learning Spectroscopy: Neural Networks for Molecular Excitation Spectra. Adv Sci (Weinh) 2019; 6:1801367. [PMID: 31065514 PMCID: PMC6498126 DOI: 10.1002/advs.201801367] [Citation(s) in RCA: 105] [Impact Index Per Article: 21.0] [Received: 08/16/2018] [Revised: 12/21/2018] [Indexed: 05/19/2023]
Abstract
Deep learning methods for the prediction of molecular excitation spectra are presented. For the example of the electronic density of states of 132k organic molecules, three different neural network architectures: multilayer perceptron (MLP), convolutional neural network (CNN), and deep tensor neural network (DTNN) are trained and assessed. The inputs for the neural networks are the coordinates and charges of the constituent atoms of each molecule. Already, the MLP is able to learn spectra, but the root mean square error (RMSE) is still as high as 0.3 eV. The learning quality improves significantly for the CNN (RMSE = 0.23 eV) and reaches its best performance for the DTNN (RMSE = 0.19 eV). Both CNN and DTNN capture even small nuances in the spectral shape. In a showcase application of this method, the structures of 10k previously unseen organic molecules are scanned and instant spectra predictions are obtained to identify molecules for potential applications.
Affiliation(s)
- Kunal Ghosh
- Department of Computer Science, Aalto University, P.O. Box 15400, Aalto FI‐00076, Finland
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI‐00076, Finland
| | - Annika Stuke
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI‐00076, Finland
| | - Milica Todorović
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI‐00076, Finland
| | - Peter Bjørn Jørgensen
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, 2800 Kgs. Lyngby, Denmark
| | - Mikkel N. Schmidt
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, 2800 Kgs. Lyngby, Denmark
| | - Aki Vehtari
- Department of Computer Science, Aalto University, P.O. Box 15400, Aalto FI‐00076, Finland
| | - Patrick Rinke
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI‐00076, Finland
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, D‐85747 Garching, Germany
| |
|
74
|
Panapitiya G, Avendaño-Franco G, Ren P, Wen X, Li Y, Lewis JP. Machine-Learning Prediction of CO Adsorption in Thiolated, Ag-Alloyed Au Nanoclusters. J Am Chem Soc 2018; 140:17508-17514. [PMID: 30406644 DOI: 10.1021/jacs.8b08800] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Indexed: 11/28/2022]
Affiliation(s)
- Gihan Panapitiya
- Department of Physics and Astronomy, West Virginia University, Morgantown, West Virginia 26506-6315, United States
| | - Guillermo Avendaño-Franco
- Department of Physics and Astronomy, West Virginia University, Morgantown, West Virginia 26506-6315, United States
| | - Pengju Ren
- State Key Laboratory of Coal Conversion, Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan, Shanxi 030001, China
- Synfuels China Co. Ltd., Huairou, Beijing 101407, China
| | - Xiaodong Wen
- State Key Laboratory of Coal Conversion, Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan, Shanxi 030001, China
- Synfuels China Co. Ltd., Huairou, Beijing 101407, China
| | - Yongwang Li
- State Key Laboratory of Coal Conversion, Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan, Shanxi 030001, China
- Synfuels China Co. Ltd., Huairou, Beijing 101407, China
| | - James P. Lewis
- Department of Physics and Astronomy, West Virginia University, Morgantown, West Virginia 26506-6315, United States
- State Key Laboratory of Coal Conversion, Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan, Shanxi 030001, China
| |
|
75
|
Pereira F, Aires-de-Sousa J. Machine learning for the prediction of molecular dipole moments obtained by density functional theory. J Cheminform 2018; 10:43. [PMID: 30136001 PMCID: PMC6104469 DOI: 10.1186/s13321-018-0296-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/10/2018] [Accepted: 08/11/2018] [Indexed: 11/29/2022] Open
Abstract
Machine learning (ML) algorithms were explored for the fast estimation of molecular dipole moments calculated by density functional theory (DFT) at the B3LYP/6-31G(d,p) level on the basis of molecular descriptors generated from DFT-optimized geometries and partial atomic charges obtained by empirical or ML schemes. A database of 10,071 structures was used, new molecular descriptors were designed, and the models were validated with external test sets. Several ML algorithms were screened. Random forest regression models predicted an external test set of 3368 compounds achieving mean absolute error up to 0.44 D. The results represent a significant improvement over the dipole moments calculated using empirical point charges located at the nuclei, even assuming the DFT-optimized geometry (root mean square error, RMSE, of 0.68 D vs. 1.53 D and R² = 0.87 vs. 0.66).
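The point-charge baseline mentioned in this abstract can be illustrated with a short sketch. It computes a dipole moment as the sum of partial charge times nuclear position and converts the result to debye. The function name and the two-charge toy system are assumptions made for illustration; they are not taken from the paper.

```python
import math

# Approximate conversion factor: 1 e·Å ≈ 4.80320 D.
E_ANGSTROM_TO_DEBYE = 4.80320

def point_charge_dipole(charges, coords):
    """Dipole moment from point charges placed at the nuclei.

    charges: partial atomic charges in units of the elementary charge e
    coords:  matching (x, y, z) nuclear positions in ångström
    Returns (vector_in_debye, magnitude_in_debye).
    """
    mu = [0.0, 0.0, 0.0]
    for q, position in zip(charges, coords):
        for axis in range(3):
            mu[axis] += q * position[axis]
    mu_debye = [component * E_ANGSTROM_TO_DEBYE for component in mu]
    magnitude = math.sqrt(sum(c * c for c in mu_debye))
    return mu_debye, magnitude

# Toy neutral system (hypothetical): charges of +0.2 e and -0.2 e, 1 Å apart on x.
vector, magnitude = point_charge_dipole(
    [0.2, -0.2],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
)
print(vector, magnitude)
```

For a neutral set of charges the result does not depend on the choice of origin, which is why no reference point appears in the function signature.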
Affiliation(s)
- Florbela Pereira
- LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
| | - João Aires-de-Sousa
- LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal.
| |
|
76
|
Solov'ev VP, Ustynyuk YA, Zhokhova NI, Karpov KV. Predictive Models for HOMO and LUMO Energies of N-Donor Heterocycles as Ligands for Lanthanides Separation. Mol Inform 2018; 37:e1800025. [PMID: 29971949 DOI: 10.1002/minf.201800025] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/03/2018] [Accepted: 06/20/2018] [Indexed: 11/11/2022]
Abstract
Quantum chemical calculations combined with QSPR methodology reveal challenging perspectives for the solution of a number of fundamental and applied problems. In this work, we performed PM7 and DFT calculations and QSPR modeling of HOMO and LUMO energies for polydentate N-heterocyclic ligands that are promising for the extractive separation of lanthanides, because these values are related to the ligands' selectivity with respect to the target cations. Data for QSPR modeling comprised the PM7 calculated HOMO and LUMO energies of N-donor heterocycles, including several types of both known and virtual, previously undescribed polydentate ligands. Ensemble modeling included various molecular fragments as descriptors and different variable selection techniques to build consensus models (CMs) on a training set of 388 ligands using external cross-validation. CMs were then used to make predictions for two external test sets: 45 ligands (T1) that were similar to the ligands of the training set, and 1546 structures (T2), which were substantially different from the ligands of the training set. The consensus models predict well in 5-fold cross-validation (RMSE_HOMO = 0.097 eV, RMSE_LUMO = 0.064 eV), and on the external test sets (T1: RMSE_HOMO = 0.26 eV, RMSE_LUMO = 0.24 eV; T2: RMSE_HOMO = 0.26 eV, RMSE_LUMO = 0.17 eV). An analysis of the results reveals that substituents in heteroaromatic rings of the ligands and at the amide nitrogens can deeply influence their metal binding properties.
Affiliation(s)
- Vitaly P Solov'ev
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninskiy prosp., 31, 119071, Moscow, Russia
| | - Yuri A Ustynyuk
- Chemistry Department, M.V. Lomonosov Moscow State University, 119991, Moscow, Russia
| | - Nelly I Zhokhova
- Faculty of Physics, M.V. Lomonosov Moscow State University, 119991, Moscow, Russia
| | - Kirill V Karpov
- Faculty of Physics, M.V. Lomonosov Moscow State University, 119991, Moscow, Russia
| |
|
77
|
Jørgensen PB, Mesta M, Shil S, García Lastra JM, Jacobsen KW, Thygesen KS, Schmidt MN. Machine learning-based screening of complex molecules for polymer solar cells. J Chem Phys 2018; 148:241735. [DOI: 10.1063/1.5023563] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Indexed: 11/14/2022] Open
Affiliation(s)
- Peter Bjørn Jørgensen
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, 2800 Kongens Lyngby, Denmark
| | - Murat Mesta
- Department of Energy Conversion and Storage, Technical University of Denmark, Fysikvej, 2800 Kongens Lyngby, Denmark
| | - Suranjan Shil
- Department of Physics, Technical University of Denmark, Fysikvej, 2800 Kongens Lyngby, Denmark
| | - Juan Maria García Lastra
- Department of Energy Conversion and Storage, Technical University of Denmark, Fysikvej, 2800 Kongens Lyngby, Denmark
| | - Karsten Wedel Jacobsen
- Department of Physics, Technical University of Denmark, Fysikvej, 2800 Kongens Lyngby, Denmark
| | | | - Mikkel N. Schmidt
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, 2800 Kongens Lyngby, Denmark
| |
|
78
|
Lu SY, Mukhopadhyay S, Froese R, Zimmerman PM. Virtual Screening of Hole Transport, Electron Transport, and Host Layers for Effective OLED Design. J Chem Inf Model 2018; 58:2440-2449. [DOI: 10.1021/acs.jcim.8b00044] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Affiliation(s)
- Shao-Yu Lu
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| | | | - Robert Froese
- The Dow Chemical Company, Midland, Michigan 48674, United States
| | - Paul M. Zimmerman
- Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| |
|
79
|
Hiszpanski AM, Dsilva CJ, Kevrekidis IG, Loo YL. Data Mining for Parameters Affecting Polymorph Selection in Contorted Hexabenzocoronene Derivatives. Chem Mater 2018; 30:3330-3337. [PMID: 31178626 PMCID: PMC6550467 DOI: 10.1021/acs.chemmater.8b00679] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/09/2023]
Abstract
The macroscopic properties of molecular materials can be drastically influenced by their solid-state packing arrangements, of which there can be many (e.g., polymorphism). Strategies to controllably and predictively access select polymorphs are thus highly desired, but computationally predicting the conditions necessary to access a given polymorph is challenging with the current state of the art. Using derivatives of contorted hexabenzocoronene, cHBC, we employed data mining, rather than first-principles approaches, to find relationships between the crystallizing molecule, postdeposition solvent-vapor annealing conditions that induce polymorphic transformation, and the resulting polymorphs. This analysis yields a correlative function that can be used to successfully predict the appearance of either one of two polymorphs in thin films of cHBC derivatives. Within the postdeposition processing phase space of cHBC derivatives, we have demonstrated an approach to generate guidelines to select crystallization conditions to bias polymorph access. We believe this approach can be applied more broadly to accelerate the predictions of processing conditions to access desired molecular polymorphs, making progress toward one of the grand challenges identified by the Materials Genome Initiative.
Affiliation(s)
- Anna M. Hiszpanski
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Materials Science Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
| | - Carmeline J. Dsilva
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| | - Ioannis G. Kevrekidis
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Yueh-Lin Loo
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
| |
|
80
|
Abstract
Ab initio molecular dynamics is an irreplaceable technique for the realistic simulation of complex molecular systems and processes from first principles. This paper proposes a comprehensive and self-contained review of ab initio molecular dynamics from a computational perspective and from first principles. Quantum mechanics is presented from a molecular dynamics perspective. Various approximations and formulations are proposed, including the Ehrenfest, Born–Oppenheimer, and Hartree–Fock molecular dynamics. Subsequently, the Kohn–Sham formulation of molecular dynamics is introduced as well as the afferent concept of density functional. As a result, Car–Parrinello molecular dynamics is discussed, together with its extension to isothermal and isobaric processes. Car–Parrinello molecular dynamics is then reformulated in terms of path integrals. Finally, some implementation issues are analysed, namely, the pseudopotential, the orbital functional basis, and hybrid molecular dynamics.
|
81
|
Random Forest Approach to QSPR Study of Fluorescence Properties Combining Quantum Chemical Descriptors and Solvent Conditions. J Fluoresc 2018; 28:695-706. [DOI: 10.1007/s10895-018-2233-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/01/2018] [Accepted: 04/11/2018] [Indexed: 10/17/2022]
|
82
|
Carpenter BK, Ezra GS, Farantos SC, Kramer ZC, Wiggins S. Empirical Classification of Trajectory Data: An Opportunity for the Use of Machine Learning in Molecular Dynamics. J Phys Chem B 2017; 122:3230-3241. [PMID: 28968092 DOI: 10.1021/acs.jpcb.7b08707] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022]
Abstract
Classical Hamiltonian trajectories are initiated at random points in phase space on a fixed energy shell of a model two degrees of freedom potential, consisting of two interacting minima in an otherwise flat energy plane of infinite extent. Below the energy of the plane, the dynamics are demonstrably chaotic. However, most of the work in this paper involves trajectories at a fixed energy that is 1% above that of the plane, in which regime the dynamics exhibit behavior characteristic of chaotic scattering. The trajectories are analyzed without reference to the potential, as if they had been generated in a typical direct molecular dynamics simulation. The questions addressed are whether one can recover useful information about the structures controlling the dynamics in phase space from the trajectory data alone, and whether, despite the at least partially chaotic nature of the dynamics, one can make statistically meaningful predictions of trajectory outcomes from initial conditions. It is found that key unstable periodic orbits, which can be identified on the analytical potential, appear by simple classification of the trajectories, and that the specific roles of these periodic orbits in controlling the dynamics are also readily discerned from the trajectory data alone. Two different approaches to predicting trajectory outcomes from initial conditions are evaluated, and it is shown that the more successful of them has ∼90% success. The results are compared with those from a simple neural network, which has higher predictive success (97%) but requires the information obtained from the "by-hand" analysis to achieve that level. Finally, the dynamics, which occur partly on the very flat region of the potential, show characteristics of the much-studied phenomenon called "roaming." 
On this potential, it is found that roaming trajectories are effectively "failed" periodic orbits and that angular momentum can be identified as a key controlling factor, despite the fact that it is not a strictly conserved quantity. It is also noteworthy that roaming on this potential occurs in the absence of a "roaming saddle," which has previously been hypothesized to be a necessary feature for roaming to occur.
Affiliation(s)
- Barry K Carpenter
- School of Chemistry, Cardiff University, Cardiff CF10 3AT, United Kingdom
| | - Gregory S Ezra
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853-1301, United States
| | - Stavros C Farantos
- Institute of Electronic Structure and Laser, Foundation for Research and Technology - Hellas, and Department of Chemistry, University of Crete, Iraklion 711 10, Greece
| | - Zeb C Kramer
- Department of Chemistry and Biochemistry, La Salle University, 1900 West Olney Avenue, Philadelphia, Pennsylvania 19141, United States
| | - Stephen Wiggins
- School of Mathematics, University of Bristol, Bristol BS8 1TW, United Kingdom
| |
|