1
Dong C, Li D, Liu J. Glass Transition Temperature Prediction of Polymers via Graph Reinforcement Learning. Langmuir: The ACS Journal of Surfaces and Colloids 2024; 40:18568-18580. [PMID: 39166275 DOI: 10.1021/acs.langmuir.4c01906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2024]
Abstract
An expansive array of graph-based models has been utilized for accurate prediction of the structure-property relation of polymers. However, these approaches notably underutilize unsupervised structural information. Concentrating on the domain of heterocyclic polymers, particularly polyimides, this study delves into the glass transition temperature (Tg) prediction, aiming to fully exploit the potential within both the global and local structures of molecules. To achieve this, a graph reinforcement learning framework termed Molecular Structural Regularized Graph Convolutional Network with Reinforcement Learning (MSRGCN-RL) is proposed. Experimental results highlight the crucial role of both global and local structural regularization in precise Tg prediction. Concurrently, optimization of MSRGCN training through RL proves essential. This research leads the way in integrating Graph Neural Networks (GNNs) with reinforcement learning methodologies for the property prediction of polymers.
Affiliation(s)
- Caibo Dong
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Dazi Li
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Jun Liu
  - Key Laboratory of Beijing City on Preparation and Processing of Novel Polymer Materials, Beijing University of Chemical Technology, Beijing 100029, China
  - State Key Laboratory of Organic-Inorganic Composites, College of Materials Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
2
Wang S, Yue H, Yuan X. Accelerating Polymer Discovery with Uncertainty-Guided PGCNN: Explainable AI for Predicting Properties and Mechanistic Insights. J Chem Inf Model 2024; 64:5500-5509. [PMID: 38953249 DOI: 10.1021/acs.jcim.4c00555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]
Abstract
Deep learning holds great potential for expediting the discovery of new polymers from the vast chemical space. However, accurately predicting polymer properties for practical applications based on their monomer composition has long been a challenge. The main obstacles include insufficient data, ineffective representation encoding, and lack of explainability. To address these issues, we propose an interpretable model called the Polymer Graph Convolutional Neural Network (PGCNN) that can accurately predict various polymer properties. This model is trained using the RadonPy data set and validated using experimental data samples. By integrating evidential deep learning with the model, we can quantify the uncertainty of predictions and enable sample-efficient training through uncertainty-guided active learning. Additionally, we demonstrate that the global attention of the graph embedding can aid in discovering underlying physical principles by identifying important functional groups within polymers and associating them with specific material attributes. Lastly, we explore the high-throughput screening capability of our model by rapidly identifying thousands of promising candidates with low and high thermal conductivity from a pool of one million hypothetical polymers. In summary, our research not only advances our mechanistic understanding of polymers using explainable AI but also paves the way for data-driven trustworthy discovery of polymer materials.
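The uncertainty-guided selection step mentioned in this abstract can be illustrated with a minimal sketch. This is not the PGCNN code: the function name `select_most_uncertain` and the example values are hypothetical, and the sketch assumes a model has already produced one uncertainty score per unlabeled candidate.

```python
from typing import Sequence


def select_most_uncertain(uncertainties: Sequence[float], batch_size: int) -> list[int]:
    """Return the indices of the `batch_size` candidates with the largest uncertainty.

    Hypothetical helper: in an active-learning loop these candidates would be
    labeled next and added to the training set.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    # Rank candidate indices from most to least uncertain.
    ranked = sorted(range(len(uncertainties)),
                    key=lambda i: uncertainties[i],
                    reverse=True)
    return ranked[:batch_size]


# Illustrative uncertainty scores for four unlabeled polymers (made-up numbers).
pool_uncertainty = [0.05, 0.40, 0.12, 0.33]
picked = select_most_uncertain(pool_uncertainty, batch_size=2)
```

Selecting the highest-uncertainty candidates is only one common acquisition rule; the paper may combine uncertainty with other criteria.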
Affiliation(s)
- Shuyu Wang
  - Department of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
- Hongxing Yue
  - Department of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
- Xiaoming Yuan
  - Department of Computer Science and Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
3
Tian Z, Dai Y, Hu F, Shen Z, Xu H, Zhang H, Xu J, Hu Y, Diao Y, Li H. Enhancing Chemical Reaction Monitoring with a Deep Learning Model for NMR Spectra Image Matching to Target Compounds. J Chem Inf Model 2024; 64:5624-5633. [PMID: 38979856 DOI: 10.1021/acs.jcim.4c00522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
In the synthetic laboratory, researchers typically rely on nuclear magnetic resonance (NMR) spectra to elucidate structures of synthesized products and confirm whether they match the desired target compounds. As chemical synthesis technology evolves toward intelligence and continuity, efficient computer-assisted structure elucidation (CASE) techniques are required to replace time-consuming manual analysis and provide the necessary speed. However, current CASE methods typically aim to derive precise chemical structures from spectroscopic data, yet they suffer from drawbacks such as low accuracy, high computational cost, and reliance on chemical libraries. In meticulously designed chemical synthesis reactions, researchers prioritize confirming the attainment of the target product based on NMR spectra, rather than focusing on identifying the specific product obtained. For this purpose, we innovatively developed a binary classification model, termed as MatCS, to directly predict the relationship between NMR spectra image (including 1H NMR and 13C NMR) and the molecular structure of the target compound. After evaluating various feature extraction methods, MatCS employs a combination of the Graph Attention Networks and Graph Convolutional Networks to learn the structural features of molecular graphs and the pretrained ResNet101 network with a Convolutional Block Attention Module to extract features from NMR spectra images. The results show that on a challenging Testsim data set, which poses difficulty in distinguishing spectra of similar molecular structures, MatCS achieves comprehensive evaluation metrics with an F1-score of 0.81 and an AUC value of 0.87. Simultaneously, it exhibited commendable performance on an external SDBS data set containing experimental NMR spectra, showcasing substantial potential for structural verification tasks in real automated chemical synthesis.
Affiliation(s)
- ZiJing Tian
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yan Dai
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Feng Hu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- ZiHao Shen
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- HongLing Xu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- HongWen Zhang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- JinHang Xu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- YuTing Hu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- YanYan Diao
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai 200062, China
  - Lingang Laboratory, Shanghai 200031, China
- HongLin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai 200062, China
  - Lingang Laboratory, Shanghai 200031, China
4
Zhao Q, Zheng Y, Qiu Y, Yu Y, Huang M, Wu Y, Chen X, Huang Y, Cui S, Zhuang S. Graph Convolutional Network-Enhanced Model for Screening Persistent, Mobile, and Toxic and Very Persistent and Very Mobile Substances. Environmental Science & Technology 2024; 58:6149-6157. [PMID: 38556993 DOI: 10.1021/acs.est.4c01201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The global management for persistent, mobile, and toxic (PMT) and very persistent and very mobile (vPvM) substances has been further strengthened with the rapid increase of emerging contaminants. The development of a ready-to-use and publicly available tool for the high-throughput screening of PMT/vPvM substances is thus urgently needed. However, the current model building with the coupling of conventional algorithms, small-scale data set, and simplistic features hinders the development of a robust model for screening PMT/vPvM with wide application domains. Here, we construct a graph convolutional network (GCN)-enhanced model with feature fusion of a molecular graph and molecular descriptors to effectively utilize the significant correlation between critical descriptors and PMT/vPvM substances. The model is built with 213,084 substances following the latest PMT classification criteria. The application domains of the GCN-enhanced model assessed by kernel density estimation demonstrate the high suitability for high-throughput screening PMT/vPvM substances with both a high accuracy rate (86.6%) and a low false-negative rate (6.8%). An online server named PMT/vPvM profiler is further developed with a user-friendly web interface (http://www.pmt.zj.cn/). Our study facilitates a more efficient evaluation of PMT/vPvM substances with a globally accessible screening platform.
Affiliation(s)
- Qiming Zhao
  - College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yuting Zheng
  - Solid Waste and Chemicals Management Center, Ministry of Ecology and Environment of the People's Republic of China, Beijing 100029, China
- Yu Qiu
  - College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yang Yu
  - Solid Waste and Chemicals Management Center, Ministry of Ecology and Environment of the People's Republic of China, Beijing 100029, China
- Meiling Huang
  - College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yiqu Wu
  - College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Xiyu Chen
  - College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yizhou Huang
  - College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Shixuan Cui
  - College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Shulin Zhuang
  - College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
5
Han S, Kang Y, Park H, Yi J, Park G, Kim J. Multimodal Transformer for Property Prediction in Polymers. ACS Applied Materials & Interfaces 2024; 16:16853-16860. [PMID: 38501934 DOI: 10.1021/acsami.4c01207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
In this work, we designed a multimodal transformer that combines both the Simplified Molecular Input Line Entry System (SMILES) and molecular graph representations to enhance the prediction of polymer properties. Three models with different embeddings (SMILES, SMILES + monomer, and SMILES + dimer) were employed to assess the performance of incorporating multimodal features into transformer architectures. Fine-tuning results across five properties (i.e., density, glass-transition temperature (Tg), melting temperature (Tm), volume resistivity, and conductivity) demonstrated that the multimodal transformer with both the SMILES and the dimer configuration as inputs outperformed the transformer using only SMILES across all five properties. Furthermore, our model facilitates in-depth analysis by examining attention scores, providing deeper insights into the relationship between the deep learning model and the polymer attributes. We believe that our work, shedding light on the potential of multimodal transformers in predicting polymer properties, paves a new direction for understanding and refining polymer properties.
Affiliation(s)
- Seunghee Han
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Yeonghun Kang
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Hyunsoo Park
  - Department of Materials, Imperial College London, Exhibition Road, London SW7 2AZ, United Kingdom
- Jeesung Yi
  - KOLON One&Only TOWER, 110, Magokdong-ro, Gangseo-gu, Seoul 07793, Republic of Korea
- Geunyeong Park
  - KOLON One&Only TOWER, 110, Magokdong-ro, Gangseo-gu, Seoul 07793, Republic of Korea
- Jihan Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
6
Chen J, Zhu L, Wang J. Quantitative structure-property relationship modelling on autoignition temperature: evaluation and comparative analysis. SAR and QSAR in Environmental Research 2024; 35:199-218. [PMID: 38372083 DOI: 10.1080/1062936x.2024.2312527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 01/25/2024] [Indexed: 02/20/2024]
Abstract
The autoignition temperature (AIT) serves as a crucial indicator for assessing the potential hazards associated with a chemical substance. In order to gain deeper insights into model performance and facilitate the establishment of effective methodological practices for AIT predictions, this study conducts a benchmark investigation on Quantitative Structure-Property Relationship (QSPR) modelling for AIT. As novelties of this work, three significant advancements are implemented in the AIT modelling process, including explicit consideration of data quality, utilization of state-of-the-art feature engineering workflows, and the innovative application of graph-based deep learning techniques, which are employed for the first time in AIT prediction. Specifically, three traditional QSPR models (multi-linear regression, support vector regression, and artificial neural networks) are evaluated, alongside the assessment of a deep-learning model employing message passing neural network architecture supplemented by graph-data augmentation techniques.
Affiliation(s)
- J Chen
  - College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- L Zhu
  - College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- J Wang
  - College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
7
Sanchez Medina E, Kunchapu S, Sundmacher K. Gibbs-Helmholtz Graph Neural Network for the Prediction of Activity Coefficients of Polymer Solutions at Infinite Dilution. J Phys Chem A 2023; 127:9863-9873. [PMID: 37943172 PMCID: PMC10683018 DOI: 10.1021/acs.jpca.3c05892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 10/18/2023] [Accepted: 10/25/2023] [Indexed: 11/10/2023]
Abstract
Machine learning models have gained prominence for predicting pure-component properties, yet their application to mixture property prediction remains relatively limited. However, the significance of mixtures in our daily lives is undeniable, particularly in industries such as polymer processing. This study presents a modification of the Gibbs-Helmholtz graph neural network (GH-GNN) model for predicting weight-based activity coefficients at infinite dilution (Ωij∞) in polymer solutions. We evaluate various polymer representations ranging from monomer, repeating unit, periodic unit, and oligomer and observe that, in data-scarce scenarios of polymer-solvent mixtures, polymer representation specifics have a reduced impact compared to data-rich environments. Leveraging transfer learning, we harness richer activity coefficient data from small-size systems, enhancing model accuracy and reducing prediction variability. The modified GH-GNN model achieves remarkable prediction results in mixture interpolation and solvent extrapolation tasks having an overall mean absolute error of 0.15, showcasing the potential of graph-neural-network-based models for property prediction of polymer solutions. Comparative analysis with the established models UNIFAC-ZM and Entropic-FV suggests a promising avenue for future research on the use of data-driven models for the prediction of the thermodynamic properties of polymer solutions.
Affiliation(s)
- Edgar Ivan Sanchez Medina
  - Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Sreekanth Kunchapu
  - Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Kai Sundmacher
  - Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
  - Process Systems Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstraße 1, Magdeburg 39106, Germany
8
Hu J, Li Z, Lin J, Zhang L. Prediction and Interpretability of Glass Transition Temperature of Homopolymers by Data-Augmented Graph Convolutional Neural Networks. ACS Applied Materials & Interfaces 2023; 15:54006-54017. [PMID: 37934171 DOI: 10.1021/acsami.3c13698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
Establishing the structure-property relationship by machine learning (ML) models is extremely valuable for accelerating the molecular design of polymers. However, existing ML models for the polymers are subject to scarcity issues of training data and fewer variations of graph structures of molecules. In addition, limited works have explored the interpretability of ML models to infer the latent knowledge in the field of polymer science that could inspire ML-assisted molecular design. In this contribution, we integrate graph convolutional neural networks (GCNs) with data augmentation strategy to predict the glass transition temperature Tg of polymers. It is demonstrated that the data-augmented GCN model outperforms the conventional models and achieves a higher accuracy for the prediction of Tg despite a small amount of training data. Furthermore, taking advantage of molecular graph representations, the data-augmented GCN model has the capability to infer the importance of atoms or substructures from the understanding of Tg, which generally agrees with the experimental findings in the field of polymer science. The inferred knowledge of the GCN model is used to advise on the design of functional polymers with specific Tg. The data-augmented GCN model possesses prominent superiorities in the establishment of structure-property relationship and also provides an efficient way for accelerating the rational design of polymer molecules.
Affiliation(s)
- Junyang Hu
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zean Li
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
9
Kuenneth C, Ramprasad R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat Commun 2023; 14:4099. [PMID: 37433807 DOI: 10.1038/s41467-023-39868-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/24/2022] [Accepted: 06/28/2023] [Indexed: 07/13/2023] Open
Abstract
Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.
Affiliation(s)
- Christopher Kuenneth
  - School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
  - Faculty of Engineering Science, University of Bayreuth, 95447, Bayreuth, Germany
- Rampi Ramprasad
  - School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
10
Fransen KA, Av-Ron SHM, Buchanan TR, Walsh DJ, Rota DT, Van Note L, Olsen BD. High-throughput experimentation for discovery of biodegradable polyesters. Proc Natl Acad Sci U S A 2023; 120:e2220021120. [PMID: 37252959 PMCID: PMC10266013 DOI: 10.1073/pnas.2220021120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 03/08/2023] [Indexed: 06/01/2023] Open
Abstract
The consistent rise of plastic pollution has stimulated interest in the development of biodegradable plastics. However, the study of polymer biodegradation has historically been limited to a small number of polymers due to costly and slow standard methods for measuring degradation, slowing new material innovation. High-throughput polymer synthesis and a high-throughput polymer biodegradation method are developed and applied to generate a biodegradation dataset for 642 chemically distinct polyesters and polycarbonates. The biodegradation assay was based on the clear-zone technique, using automation to optically observe the degradation of suspended polymer particles under the action of a single Pseudomonas lemoignei bacterial colony. Biodegradability was found to depend strongly on aliphatic repeat unit length, with chains less than 15 carbons and short side chains improving biodegradability. Aromatic backbone groups were generally detrimental to biodegradability; however, ortho- and para-substituted benzene rings in the backbone were more likely to be degradable than meta-substituted rings. Additionally, backbone ether groups improved biodegradability. While other heteroatoms did not show a clear improvement in biodegradability, they did demonstrate increases in biodegradation rates. Machine learning (ML) models were leveraged to predict biodegradability on this large dataset with accuracies over 82% using only chemical structure descriptors.
Affiliation(s)
- Katharina A. Fransen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Sarah H. M. Av-Ron
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Tess R. Buchanan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Dylan J. Walsh
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Dechen T. Rota
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Lana Van Note
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Bradley D. Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
11
Volgin IV, Batyr PA, Matseevich AV, Dobrovskiy AY, Andreeva MV, Nazarychev VM, Larin SV, Goikhman MY, Vizilter YV, Askadskii AA, Lyulin SV. Machine Learning with Enormous "Synthetic" Data Sets: Predicting Glass Transition Temperature of Polyimides Using Graph Convolutional Neural Networks. ACS Omega 2022; 7:43678-43691. [PMID: 36506114 PMCID: PMC9730753 DOI: 10.1021/acsomega.2c04649] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/22/2022] [Accepted: 10/28/2022] [Indexed: 06/17/2023]
Abstract
In the present work, we address the problem of utilizing machine learning (ML) methods to predict the thermal properties of polymers by establishing "structure-property" relationships. Having focused on a particular class of heterocyclic polymers, namely polyimides (PIs), we developed a graph convolutional neural network (GCNN), being one of the most promising tools for working with big data, to predict the PI glass transition temperature Tg as an example of the fundamental property of polymers. To train the GCNN, we propose an original methodology based on using a "transfer learning" approach with an enormous "synthetic" data set for pretraining and a small experimental data set for its fine-tuning. The "synthetic" data set contains more than 6 million combinatorically generated repeating units of PIs and theoretical values of their Tg values calculated using the well-established Askadskii's quantitative structure-property relationship (QSPR) computational scheme. Additionally, an experimental data set for 214 PIs was also collected from the literature for training, fine-tuning, and validation of the GCNN. Both "synthetic" and experimental data sets are included into a PolyAskInG database (Polymer Askadskii's Intelligent Gateway). By using the PolyAskInG database, we developed GCNN which allows estimation of Tg of PI with a mean absolute error (MAE) of about 20 K, which is 1.5 times lower than in the case of Askadskii QSPR analysis (33 K). To prove the efficiency and usability of the proposed GCNN architecture and training methodology for predicting polymer properties, we also employed "transfer learning" to develop alternative GCNN pretrained on proxy-characteristics taken from the popular quantum-chemical QM9 database for small compounds and fine-tuned on an experimental Tg values data set from PolyAskInG database. The obtained results indicate that pretraining of GCNN on the "synthetic" polymer data set provides MAE which is almost twice as low as that in the case of using the QM9 data set in the pretraining stage (∼41 K). Furthermore, we address the questions associated with the influence of the differences in the size of the experimental and "synthetic" data sets (so-called "reality gap" problem), as well as their chemical composition on the training quality. Our results state the overall priority of using polymer data sets for developing deep neural networks, and GCNN in particular, for efficient prediction of polymer properties. Moreover, our work opens up a challenge for the theoretically supported generation of large "synthetic" data sets of polymer properties for the training of the complex ML models. The proposed methodology is rather versatile and may be generalized for predicting other properties of different polymers and copolymers synthesized through the polycondensation reaction.
Affiliation(s)
- Igor V. Volgin
  - Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
- Pavel A. Batyr
  - Federal State Unitary Enterprise “State Research Institute of Aviation Systems” (GosNIIAS), Moscow 125167, Russian Federation
- Andrey V. Matseevich
  - A.N. Nesmeyanov Institute of Organoelement Compounds of Russian Academy of Sciences (INEOS RAS), Moscow 119991, Russian Federation
- Alexey Yu. Dobrovskiy
  - Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
- Maria V. Andreeva
  - Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
- Victor M. Nazarychev
  - Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
- Sergey V. Larin
  - Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
- Mikhail Ya. Goikhman
  - Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
- Yury V. Vizilter
  - Federal State Unitary Enterprise “State Research Institute of Aviation Systems” (GosNIIAS), Moscow 125167, Russian Federation
- Andrey A. Askadskii
  - A.N. Nesmeyanov Institute of Organoelement Compounds of Russian Academy of Sciences (INEOS RAS), Moscow 119991, Russian Federation
  - Moscow State University of Civil Engineering (MGSU), Moscow 129337, Russian Federation
- Sergey V. Lyulin
  - Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
12
Recent advances and challenges in experiment-oriented polymer informatics. Polym J 2022. [DOI: 10.1038/s41428-022-00734-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
13
Antoniuk ER, Li P, Kailkhura B, Hiszpanski AM. Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions. J Chem Inf Model 2022; 62:5435-5445. [PMID: 36315033 DOI: 10.1021/acs.jcim.2c00875] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/29/2022]
Abstract
Accurately predicting new polymers' properties with machine learning models prior to synthesis has the potential to significantly accelerate the discovery and development of new polymers. However, accurately and efficiently capturing polymers' complex, periodic structures in machine learning models remains a grand challenge for the polymer cheminformatics community. Specifically, there has yet to be an ideal solution for the problems of how to capture the periodicity of polymers, as well as how to optimally develop polymer descriptors without requiring human-based feature design. In this work, we tackle these problems by utilizing a periodic polymer graph representation that accounts for polymers' periodicity and coupling it with a message-passing neural network that leverages the power of graph deep learning to automatically learn chemically relevant polymer descriptors. Remarkably, this approach achieves state-of-the-art performance on 8 out of 10 distinct polymer property prediction tasks. These results highlight the advancement in predictive capability that is possible through learning descriptors that are specifically optimized for capturing the unique chemical structure of polymers.
Affiliation(s)
- Evan R Antoniuk
- Materials Science Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550-5507, United States
| | - Peggy Li
- Global Security Computing Applications Division, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550-5507, United States
| | - Bhavya Kailkhura
- Machine Intelligence Group/Center for Applied Scientific Computing, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550-5507, United States
| | - Anna M Hiszpanski
- Materials Science Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550-5507, United States
| |
|
14
|
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. Communications Materials 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
Affiliation(s)
- Patrick Reiser
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Marlen Neubert
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - André Eberhard
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Luca Torresi
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Chen Zhou
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Chen Shao
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
| | - Houssam Metni
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
| | - Clint van Hoesel
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
| | - Henrik Schopmans
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Timo Sommer
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
| | - Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| |
|
15
|
Aldeghi M, Coley CW. A graph representation of molecular ensembles for polymer property prediction. Chem Sci 2022; 13:10486-10498. [PMID: 36277616 PMCID: PMC9473492 DOI: 10.1039/d2sc02839e] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/20/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open
Abstract
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles.
Affiliation(s)
- Matteo Aldeghi
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|