Reference Citation Analysis

For: St John PC, Phillips C, Kemper TW, Wilson AN, Guan Y, Crowley MF, Nimlos MR, Larsen RE. Message-passing neural networks for high-throughput polymer screening. J Chem Phys 2019;150:234111. [PMID: 31228909 DOI: 10.1063/1.5099132] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 11/15/2022] Open

Cited by Other Article(s)

Zhang C, Zhai Y, Gong Z, Duan H, She YB, Yang YF, Su A. Transfer learning across different chemical domains: virtual screening of organic materials with deep learning models pretrained on small molecule and chemical reaction data. J Cheminform 2024;16:89. [PMID: 39080777 PMCID: PMC11290278 DOI: 10.1186/s13321-024-00886-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 07/21/2024] [Indexed: 08/02/2024] Open

Abstract

Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecules and chemical reactions to pretrain the BERT model, enhancing its performance in the virtual screening of organic materials. By fine-tuning the BERT models with data from five virtual screening tasks, the version pretrained with the USPTO-SMILES dataset achieved R² scores exceeding 0.94 for three tasks and over 0.81 for two others. This performance surpasses that of models pretrained on the small molecule or organic materials databases and outperforms three traditional machine learning models trained directly on virtual screening data. The success of the USPTO-SMILES pretrained BERT model can be attributed to the diverse array of organic building blocks in the USPTO database, offering a broader exploration of the chemical space. The study further suggests that accessing a reaction database with a wider range of reactions than the USPTO could further enhance model performance. Overall, this research validates the feasibility of applying transfer learning across different chemical domains for the efficient virtual screening of organic materials.

Scientific contribution: This study verifies the feasibility of applying transfer learning to large language models in different chemical fields to help organic materials perform virtual screening. Through the comparison of transfer learning from different chemical fields to a variety of organic material molecules, the high precision virtual screening of organic materials is realized.

Affiliation(s)

Chengwei Zhang: State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China
Yushuang Zhai: State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China
Ziyang Gong: Key Laboratory of Pharmaceutical Engineering of Zhejiang Province, Key Laboratory for Green Pharmaceutical Technologies and Related Equipment of Ministry of Education, Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China
Hongliang Duan: Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
Yuan-Bin She: State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China
Yun-Fang Yang: State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China
An Su: State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, Key Laboratory of Green Chemistry-Synthesis Technology of Zhejiang Province, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, 310014, Zhejiang, China; Key Laboratory of Pharmaceutical Engineering of Zhejiang Province, Key Laboratory for Green Pharmaceutical Technologies and Related Equipment of Ministry of Education, Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, 310014, People's Republic of China

Wu T, Zhou M, Zou J, Chen Q, Qian F, Kurths J, Liu R, Tang Y. AI-guided few-shot inverse design of HDP-mimicking polymers against drug-resistant bacteria. Nat Commun 2024;15:6288. [PMID: 39060236 PMCID: PMC11282099 DOI: 10.1038/s41467-024-50533-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 07/11/2024] [Indexed: 07/28/2024] Open

Affiliation(s)

Tianyu Wu: Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
Min Zhou: State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
Jingcheng Zou: Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
Qi Chen: Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
Feng Qian: Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
Jürgen Kurths: Potsdam Institute for Climate Impact Research (PIK), Potsdam, 14473, Germany; Institut für Physik, Humboldt-Universität zu Berlin, Berlin, 10115, Germany; The Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, 200433, China
Runhui Liu: State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China; Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
Yang Tang: Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China

Wang S, Yue H, Yuan X. Accelerating Polymer Discovery with Uncertainty-Guided PGCNN: Explainable AI for Predicting Properties and Mechanistic Insights. J Chem Inf Model 2024;64:5500-5509. [PMID: 38953249 DOI: 10.1021/acs.jcim.4c00555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]

Huynh H, Le K, Vu L, Nguyen T, Holcomb M, Forli S, Phan H. Synergy of machine learning and density functional theory calculations for predicting experimental Lewis base affinity and Lewis polybase binding atoms. J Comput Chem 2024;45:1552-1561. [PMID: 38500409 PMCID: PMC11099847 DOI: 10.1002/jcc.27329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 01/26/2024] [Accepted: 01/31/2024] [Indexed: 03/20/2024]

Chen J, Schwaller P. Molecular hypergraph neural networks. J Chem Phys 2024;160:144307. [PMID: 38597317 DOI: 10.1063/5.0193557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 03/14/2024] [Indexed: 04/11/2024] Open

Sanchez Medina E, Kunchapu S, Sundmacher K. Gibbs-Helmholtz Graph Neural Network for the Prediction of Activity Coefficients of Polymer Solutions at Infinite Dilution. J Phys Chem A 2023;127:9863-9873. [PMID: 37943172 PMCID: PMC10683018 DOI: 10.1021/acs.jpca.3c05892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 10/18/2023] [Accepted: 10/25/2023] [Indexed: 11/10/2023]

Hu J, Li Z, Lin J, Zhang L. Prediction and Interpretability of Glass Transition Temperature of Homopolymers by Data-Augmented Graph Convolutional Neural Networks. ACS Appl Mater Interfaces 2023;15:54006-54017. [PMID: 37934171 DOI: 10.1021/acsami.3c13698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]

Ochiai T, Inukai T, Akiyama M, Furui K, Ohue M, Matsumori N, Inuki S, Uesugi M, Sunazuka T, Kikuchi K, Kakeya H, Sakakibara Y. Variational autoencoder-based chemical latent space for large molecular structures with 3D complexity. Commun Chem 2023;6:249. [PMID: 37973971 PMCID: PMC10654724 DOI: 10.1038/s42004-023-01054-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023] Open

Wilson AN, St John PC, Marin DH, Hoyt CB, Rognerud EG, Nimlos MR, Cywar RM, Rorrer NA, Shebek KM, Broadbelt LJ, Beckham GT, Crowley MF. PolyID: Artificial Intelligence for Discovering Performance-Advantaged and Sustainable Polymers. Macromolecules 2023;56:8547-8557. [PMID: 38024155 PMCID: PMC10653284 DOI: 10.1021/acs.macromol.3c00994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 09/30/2023] [Indexed: 12/01/2023]

Abstract

A necessary transformation for a sustainable economy is the transition from fossil-derived plastics to polymers derived from biomass and waste resources. While renewable feedstocks can enhance material performance through unique chemical moieties, probing the vast material design space by experiment alone is not practically feasible. Here, we develop a machine-learning-based tool, PolyID, to reduce the design space of renewable feedstocks to enable efficient discovery of performance-advantaged, biobased polymers. PolyID is a multioutput, graph neural network specifically designed to increase accuracy and to enable quantitative structure-property relationship (QSPR) analysis for polymers. It includes a novel domain-of-validity method that was developed and applied to demonstrate how gaps in training data can be filled to improve accuracy. The model was benchmarked with both a 20% held-out subset of the original training data and 22 experimentally synthesized polymers. A mean absolute error for the glass transition temperatures of 19.8 and 26.4 °C was achieved for the test and experimental data sets, respectively. Predictions were made on polymers composed of monomers from four databases that contain biologically accessible small molecules: MetaCyc, MINEs, KEGG, and BiGG. From 1.4 × 10⁶ accessible biobased polymers, we identified five poly(ethylene terephthalate) (PET) analogues with predicted improvements to thermal and transport performance. Experimental validation for one of the PET analogues demonstrated a glass transition temperature between 85 and 112 °C, which is higher than PET and within the predicted range of the PolyID tool. In addition to accurate predictions, we show how the model's predictions are explainable through analysis of individual bond importance for a biobased nylon. Overall, PolyID can aid the biobased polymer practitioner to navigate the vast number of renewable polymers to discover sustainable materials with enhanced performance.

Affiliation(s)

A. Nolan Wilson: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Peter C. St John: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Daniela H. Marin: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Caroline B. Hoyt: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Erik G. Rognerud: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Mark R. Nimlos: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Robin M. Cywar: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Nicholas A. Rorrer: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Kevin M. Shebek: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States; Department of Chemical and Biological Engineering and Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States; Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois 60208, United States
Linda J. Broadbelt: Department of Chemical and Biological Engineering and Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
Gregg T. Beckham: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
Michael F. Crowley: Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States

Ohno M, Hayashi Y, Zhang Q, Kaneko Y, Yoshida R. SMiPoly: Generation of a Synthesizable Polymer Virtual Library Using Rule-Based Polymerization Reactions. J Chem Inf Model 2023;63:5539-5548. [PMID: 37604495 PMCID: PMC10498440 DOI: 10.1021/acs.jcim.3c00329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Indexed: 08/23/2023]

McDonald SM, Augustine EK, Lanners Q, Rudin C, Brinson LC, Becker ML. Applied machine learning as a driver for polymeric biomaterials design. Nat Commun 2023;14:4838. [PMID: 37563117 PMCID: PMC10415291 DOI: 10.1038/s41467-023-40459-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 07/24/2023] [Indexed: 08/12/2023] Open

Shah HA, Liu J, Yang Z, Yang F, Zhang Q, Feng J. DeepRT: Predicting compounds presence in pathway modules and classifying into module classes using deep neural networks based on molecular properties. J Bioinform Comput Biol 2023;21:2350017. [PMID: 37632195 DOI: 10.1142/s0219720023500178] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/27/2023]

Ngo NK, Hy TS, Kondor R. Multiresolution graph transformers and wavelet positional encoding for learning long-range and hierarchical structures. J Chem Phys 2023;159:034109. [PMID: 37466225 DOI: 10.1063/5.0152833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 06/29/2023] [Indexed: 07/20/2023] Open

Ryu JY, Elala E, Rhee JKK. Quantum Graph Neural Network Models for Materials Search. Materials (Basel) 2023;16:4300. [PMID: 37374486 DOI: 10.3390/ma16124300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 06/03/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023]

Ricci E, Vergadou N. Integrating Machine Learning in the Coarse-Grained Molecular Simulation of Polymers. J Phys Chem B 2023;127:2302-2322. [PMID: 36888553 DOI: 10.1021/acs.jpcb.2c06354] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/09/2023]

Sikorski EL, Cusentino MA, McCarthy MJ, Tranchida J, Wood MA, Thompson AP. Machine learned interatomic potential for dispersion strengthened plasma facing components. J Chem Phys 2023;158:114101. [PMID: 36948804 DOI: 10.1063/5.0135269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023] Open

Nguyen T, Bavarian M. Machine learning approach to polymer reaction engineering: Determining monomers reactivity ratios. Polymer 2023. [DOI: 10.1016/j.polymer.2023.125866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]

Xie S. Perspectives on development of biomedical polymer materials in artificial intelligence age. J Biomater Appl 2023;37:1355-1375. [PMID: 36629787 DOI: 10.1177/08853282231151822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]

Gurnani R, Kuenneth C, Toland A, Ramprasad R. Polymer Informatics at Scale with Multitask Graph Neural Networks. Chem Mater 2023;35:1560-1567. [PMID: 36873627 PMCID: PMC9979603 DOI: 10.1021/acs.chemmater.2c02991] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/29/2022] [Revised: 02/03/2023] [Indexed: 06/18/2023]

Volgin IV, Batyr PA, Matseevich AV, Dobrovskiy AY, Andreeva MV, Nazarychev VM, Larin SV, Goikhman MY, Vizilter YV, Askadskii AA, Lyulin SV. Machine Learning with Enormous "Synthetic" Data Sets: Predicting Glass Transition Temperature of Polyimides Using Graph Convolutional Neural Networks. ACS Omega 2022;7:43678-43691. [PMID: 36506114 PMCID: PMC9730753 DOI: 10.1021/acsomega.2c04649] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/22/2022] [Accepted: 10/28/2022] [Indexed: 06/17/2023]

Abstract

In the present work, we address the problem of utilizing machine learning (ML) methods to predict the thermal properties of polymers by establishing "structure-property" relationships. Having focused on a particular class of heterocyclic polymers, namely polyimides (PIs), we developed a graph convolutional neural network (GCNN), being one of the most promising tools for working with big data, to predict the PI glass transition temperature Tg as an example of the fundamental property of polymers. To train the GCNN, we propose an original methodology based on using a "transfer learning" approach with an enormous "synthetic" data set for pretraining and a small experimental data set for its fine-tuning. The "synthetic" data set contains more than 6 million combinatorically generated repeating units of PIs and theoretical values of their Tg values calculated using the well-established Askadskii's quantitative structure-property relationship (QSPR) computational scheme. Additionally, an experimental data set for 214 PIs was also collected from the literature for training, fine-tuning, and validation of the GCNN. Both "synthetic" and experimental data sets are included into a PolyAskInG database (Polymer Askadskii's Intelligent Gateway). By using the PolyAskInG database, we developed GCNN which allows estimation of Tg of PI with a mean absolute error (MAE) of about 20 K, which is 1.5 times lower than in the case of Askadskii QSPR analysis (33 K). To prove the efficiency and usability of the proposed GCNN architecture and training methodology for predicting polymer properties, we also employed "transfer learning" to develop alternative GCNN pretrained on proxy-characteristics taken from the popular quantum-chemical QM9 database for small compounds and fine-tuned on an experimental Tg values data set from PolyAskInG database. The obtained results indicate that pretraining of GCNN on the "synthetic" polymer data set provides MAE which is almost twice as low as that in the case of using the QM9 data set in the pretraining stage (∼41 K). Furthermore, we address the questions associated with the influence of the differences in the size of the experimental and "synthetic" data sets (so-called "reality gap" problem), as well as their chemical composition on the training quality. Our results state the overall priority of using polymer data sets for developing deep neural networks, and GCNN in particular, for efficient prediction of polymer properties. Moreover, our work opens up a challenge for the theoretically supported generation of large "synthetic" data sets of polymer properties for the training of the complex ML models. The proposed methodology is rather versatile and may be generalized for predicting other properties of different polymers and copolymers synthesized through the polycondensation reaction.

Antoniuk ER, Li P, Kailkhura B, Hiszpanski AM. Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions. J Chem Inf Model 2022;62:5435-5445. [PMID: 36315033 DOI: 10.1021/acs.jcim.2c00875] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/29/2022]

Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. Commun Mater 2022;3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]

Affiliation(s)

Patrick Reiser: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany; Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
Marlen Neubert: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
André Eberhard: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
Luca Torresi: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
Chen Zhou: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
Chen Shao: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany; Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
Houssam Metni: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany; ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
Clint van Hoesel: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany; Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
Henrik Schopmans: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany; Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
Timo Sommer: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany; Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany; Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
Pascal Friederich: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany; Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany

Aldeghi M, Coley CW. A graph representation of molecular ensembles for polymer property prediction. Chem Sci 2022;13:10486-10498. [PMID: 36277616 PMCID: PMC9473492 DOI: 10.1039/d2sc02839e] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/20/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open

Xie T, France-Lanord A, Wang Y, Lopez J, Stolberg MA, Hill M, Leverick GM, Gomez-Bombarelli R, Johnson JA, Shao-Horn Y, Grossman JC. Accelerating amorphous polymer electrolyte screening by learning to reduce errors in molecular dynamics simulated properties. Nat Commun 2022;13:3415. [PMID: 35701416 PMCID: PMC9197847 DOI: 10.1038/s41467-022-30994-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/14/2021] [Accepted: 03/02/2022] [Indexed: 12/03/2022] Open

Affiliation(s)

Tian Xie: Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Arthur France-Lanord: Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Yanming Wang: Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Jeffrey Lopez: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Michael A Stolberg: Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Megan Hill: Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Graham Michael Leverick: Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Rafael Gomez-Bombarelli: Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Jeremiah A Johnson: Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Yang Shao-Horn: Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Jeffrey C Grossman: Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Flam-Shepherd D, Zhu K, Aspuru-Guzik A. Language models can learn complex molecular distributions. Nat Commun 2022;13:3293. [PMID: 35672310 PMCID: PMC9174447 DOI: 10.1038/s41467-022-30839-x] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 12/06/2021] [Accepted: 05/16/2022] [Indexed: 11/09/2022] Open

Mohapatra S, An J, Gómez-Bombarelli R. Chemistry-informed macromolecule graph representation for similarity computation, unsupervised and supervised learning. Mach Learn Sci Technol 2022. [DOI: 10.1088/2632-2153/ac545e] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/17/2022] Open

Miyake Y, Saeki A. Machine Learning-Assisted Development of Organic Solar Cell Materials: Issues, Analyses, and Outlooks. J Phys Chem Lett 2021;12:12391-12401. [PMID: 34939806 DOI: 10.1021/acs.jpclett.1c03526] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]

Werner M. Decoding Interaction Patterns from the Chemical Sequence of Polymers Using Neural Networks. ACS Macro Lett 2021;10:1333-1338. [PMID: 35549009 DOI: 10.1021/acsmacrolett.1c00325] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/16/2022]

Joshi RP, Kumar N. Artificial Intelligence for Autonomous Molecular Design: A Perspective. Molecules 2021;26:6761. [PMID: 34833853 PMCID: PMC8619999 DOI: 10.3390/molecules26226761] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/16/2021] [Revised: 10/23/2021] [Accepted: 10/29/2021] [Indexed: 11/23/2022] Open

Abstract

Domain-aware artificial intelligence has been increasingly adopted in recent years to expedite molecular design in various applications, including drug design and discovery. Recent advances in areas such as physics-informed machine learning and reasoning, software engineering, high-end hardware development, and computing infrastructures are providing opportunities to build scalable and explainable AI molecular discovery systems. This could improve a design hypothesis through feedback analysis, data integration that can provide a basis for the introduction of end-to-end automation for compound discovery and optimization, and enable more intelligent searches of chemical space. Several state-of-the-art ML architectures are predominantly and independently used for predicting the properties of small molecules, their high throughput synthesis, and screening, iteratively identifying and optimizing lead therapeutic candidates. However, such deep learning and ML approaches also raise considerable conceptual, technical, scalability, and end-to-end error quantification challenges, as well as skepticism about the current AI hype to build automated tools. To this end, synergistically and intelligently using these individual components along with robust quantum physics-based molecular representation and data generation tools in a closed-loop holds enormous promise for accelerated therapeutic design to critically analyze the opportunities and challenges for their more widespread application. This article aims to identify the most recent technology and breakthrough achieved by each of the components and discusses how such autonomous AI and ML workflows can be integrated to radically accelerate the protein target or disease model-based probe design that can be iteratively validated experimentally. Taken together, this could significantly reduce the timeline for end-to-end therapeutic discovery and optimization upon the arrival of any novel zoonotic transmission event. 
Our article serves as a guide for medicinal, computational chemistry and biology, analytical chemistry, and the ML community to practice autonomous molecular design in precision medicine and drug discovery.

Guan Y, Shree Sowndarya SV, Gallegos LC, St John PC, Paton RS. Real-time prediction of ¹H and ¹³C chemical shifts with DFT accuracy using a 3D graph neural network. Chem Sci 2021;12:12012-12026. [PMID: 34667567 PMCID: PMC8457395 DOI: 10.1039/d1sc03343c] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2021] [Accepted: 07/19/2021] [Indexed: 11/23/2022] Open

Abstract

Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, the computational demands of calculating multiple structural- and stereo-isomers, each of which may typically exist as an ensemble of rapidly-interconverting conformations, are expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict ¹H and ¹³C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed ¹³C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/. 
We further demonstrate the model in three applications: first, we use the model to decide the correct organic structure from candidates through experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution.

From quantum chemical and experimental NMR data, a 3D graph neural network, CASCADE, has been developed to predict carbon and proton chemical shifts. Stereoisomers and conformers of organic molecules can be correctly distinguished.

Sattari K, Xie Y, Lin J. Data-driven algorithms for inverse design of polymers. SOFT MATTER 2021;17:7607-7622. [PMID: 34397078 DOI: 10.1039/d1sm00725d] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Ward L, Dandu N, Blaiszik B, Narayanan B, Assary RS, Redfern PC, Foster I, Curtiss LA. Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models. J Phys Chem A 2021;125:5990-5998. [PMID: 34191512 DOI: 10.1021/acs.jpca.1c01960] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Bertoni M, Duran-Frigola M, Badia-I-Mompel P, Pauls E, Orozco-Ruiz M, Guitart-Pla O, Alcalde V, Diaz VM, Berenguer-Llergo A, Brun-Heath I, Villegas N, de Herreros AG, Aloy P. Bioactivity descriptors for uncharacterized chemical compounds. Nat Commun 2021;12:3932. [PMID: 34168145 PMCID: PMC8225676 DOI: 10.1038/s41467-021-24150-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Accepted: 05/27/2021] [Indexed: 01/20/2023] Open

Affiliation(s)

Martino Bertoni Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Miquel Duran-Frigola Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain. Ersilia Open Source Initiative, Cambridge, UK.
Pau Badia-I-Mompel Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Eduardo Pauls Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Modesto Orozco-Ruiz Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Oriol Guitart-Pla Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Víctor Alcalde Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Víctor M Diaz Programa de Recerca en Càncer, Institut Hospital del Mar d'Investigacions Mèdiques (IMIM) and Departament de Ciències de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. Faculty of Medicine and Health Sciences, International University of Catalonia, Barcelona, Catalonia, Spain
Antoni Berenguer-Llergo Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Isabelle Brun-Heath Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Núria Villegas Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Antonio García de Herreros Programa de Recerca en Càncer, Institut Hospital del Mar d'Investigacions Mèdiques (IMIM) and Departament de Ciències de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
Patrick Aloy Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain. Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain.

Westermayr J, Gastegger M, Schütt KT, Maurer RJ. Perspective on integrating machine learning into computational chemistry and materials science. J Chem Phys 2021;154:230903. [PMID: 34241249 DOI: 10.1063/5.0047760] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open

Deng D, Chen X, Zhang R, Lei Z, Wang X, Zhou F. XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties. J Chem Inf Model 2021;61:2697-2705. [PMID: 34009965 DOI: 10.1021/acs.jcim.0c01489] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]

Abstract

Determining the properties of chemical molecules is essential for screening candidates similar to a specific drug. These candidate molecules are further evaluated for their target binding affinities, side effects, target missing probabilities, etc. Conventional machine learning algorithms demonstrated satisfying prediction accuracies of molecular properties. A molecule cannot be directly loaded into a machine learning model, and a set of engineered features needs to be designed and calculated from a molecule. Such hand-crafted features rely heavily on the experiences of the investigating researchers. The concept of graph neural networks (GNNs) was recently introduced to describe the chemical molecules. The features may be automatically and objectively extracted from the molecules through various types of GNNs, e.g., GCN (graph convolution network), GGNN (gated graph neural network), DMPNN (directed message passing neural network), etc. However, the training of a stable GNN model requires a huge number of training samples and a large amount of computing power, compared with the conventional machine learning strategies. This study proposed the integrated framework XGraphBoost to extract the features using a GNN and build an accurate prediction model of molecular properties using the classifier XGBoost. The proposed framework XGraphBoost fully inherits the merits of the GNN-based automatic molecular feature extraction and XGBoost-based accurate prediction performance. Both classification and regression problems were evaluated using the framework XGraphBoost. The experimental results strongly suggest that XGraphBoost may facilitate the efficient and accurate predictions of various molecular properties. The source code is freely available to academic users at https://github.com/chenxiaowei-vincent/XGraphBoost.git.

Mercado R, Rastemo T, Lindelöf E, Klambauer G, Engkvist O, Chen H, Bjerrum EJ. Practical notes on building molecular graph generative models. Applied AI Letters 2021. [DOI: 10.1002/ail2.18] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]

Xu P, Lu T, Ju L, Tian L, Li M, Lu W. Machine Learning Aided Design of Polymer with Targeted Band Gap Based on DFT Computation. J Phys Chem B 2021;125:601-611. [DOI: 10.1021/acs.jpcb.0c08674] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]

Kim Y, Thomas AE, Robichaud DJ, Iisa K, St John PC, Etz BD, Fioroni GM, Dutta A, McCormick RL, Mukarakate C, Kim S. A perspective on biomass-derived biofuels: From catalyst design principles to fuel properties. JOURNAL OF HAZARDOUS MATERIALS 2020;400:123198. [PMID: 32585513 DOI: 10.1016/j.jhazmat.2020.123198] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Revised: 06/05/2020] [Accepted: 06/09/2020] [Indexed: 05/24/2023]

Webb MA, Jackson NE, Gil PS, de Pablo JJ. Targeted sequence design within the coarse-grained polymer genome. SCIENCE ADVANCES 2020;6:eabc6216. [PMID: 33087352 PMCID: PMC7577717 DOI: 10.1126/sciadv.abc6216] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/04/2020] [Accepted: 09/02/2020] [Indexed: 05/05/2023]

Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nat Commun 2020;11:2328. [PMID: 32393773 PMCID: PMC7214445 DOI: 10.1038/s41467-020-16201-z] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2019] [Accepted: 04/15/2020] [Indexed: 12/31/2022] Open

Lambard G, Gracheva E. SMILES-X: autonomous molecular compounds characterization for small datasets without descriptors. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2020. [DOI: 10.1088/2632-2153/ab57f3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Abstract

There is more and more evidence that machine learning can be successfully applied in materials science and related fields. However, datasets in these fields are often quite small (from tens to several thousands of samples). This means the most advanced machine learning techniques remain neglected, as they are considered to be applicable to big data only. Moreover, materials informatics methods often rely on human-engineered descriptors, that should be carefully chosen, or even created, to fit the physicochemical property that one intends to predict. In this article, we propose a new method that tackles both the issue of small datasets and the difficulty of developing task-specific descriptors. The SMILES-X is an autonomous pipeline for molecular compounds characterisation based on a {Embed-Encode-Attend-Predict} neural architecture with a data-specific Bayesian hyper-parameters optimisation. The only input to the architecture, the SMILES strings, are de-canonicalised in order to efficiently augment the data. One of the key features of the architecture is the attention mechanism, which enables the interpretation of output predictions without extra computational cost. The SMILES-X achieves state-of-the-art results in the inference of aqueous solubility ($\overline{\mathrm{RMSE}}_{\mathrm{test}} \simeq 0.57 \pm 0.07$ mols/L), hydration free energy ($\overline{\mathrm{RMSE}}_{\mathrm{test}} \simeq 0.81 \pm 0.22$ kcal/mol, which is ∼24.5% better than molecular dynamics simulations), and octanol/water distribution coefficient ($\overline{\mathrm{RMSE}}_{\mathrm{test}} \simeq 0.59 \pm 0.02$ for LogD at pH 7.4) of molecular compounds. The SMILES-X is intended to become an important asset in the toolkit of materials scientists and chemists. The source code for the SMILES-X is available at github.com/GLambard/SMILES-X.

Swanson K, Trivedi S, Lequieu J, Swanson K, Kondor R. Deep learning for automated classification and characterization of amorphous materials. SOFT MATTER 2020;16:435-446. [PMID: 31803878 DOI: 10.1039/c9sm01903k] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]

Peng SP, Zhao Y. Convolutional Neural Networks for the Design and Analysis of Non-Fullerene Acceptors. J Chem Inf Model 2019;59:4993-5001. [DOI: 10.1021/acs.jcim.9b00732] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]