1
Shimakawa H, Kumada A, Sato M. Prevention of Leakage in Machine Learning Prediction for Polymer Composite Properties. J Chem Inf Model 2024; 64:3621-3629. [PMID: 38642039 DOI: 10.1021/acs.jcim.3c01894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2024]
Abstract
Machine learning (ML) has facilitated property prediction for intricate materials by integrating materials and experimental features such as processing and measurement conditions. However, ML models designed for material properties have often disregarded a common issue of "leakage," resulting in an overestimation of model performance and a decrease in model transferability. This issue can arise from biases inherent in multiple data points obtained from the same experimental group. We critically examine leakage in property prediction for polymer composites and provide a method to prevent it. Our proposed method partitions data by experimental group so that data from the same group never appear in both the training and test sets. Evaluation results show that conventional random partitioning unintentionally inflates ML performance: experimental features are misused to leak data bias within the same experimental group rather than to explain physical causality. In contrast, the proposed method enables the leakage-free utilization of experimental features to improve prediction accuracy while ensuring model transferability. Specifically, when experimental features are integrated with polymer and filler features for electrical conductivity prediction, the conventional method reports a leakage-dependent 26% reduction in RMSE, which overestimates performance, whereas the proposed method achieves a 5% reduction in RMSE without leakage. These findings offer valuable guidance for the effective utilization of experimental features in data-driven materials science.
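The group-based partitioning described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `group_train_test_split`, the `paper_*` group labels, and the choice to hold out a fraction of groups (rather than of samples) are assumptions made here for illustration.

```python
import random
from collections import defaultdict


def group_train_test_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group appears in both sets.

    `groups[i]` is the experimental-group label of sample i.
    `test_fraction` is applied to the number of groups, not samples.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Shuffle whole groups, never individual samples.
    group_ids = sorted(by_group)
    random.Random(seed).shuffle(group_ids)

    n_test = max(1, round(len(group_ids) * test_fraction))
    test_idx = [i for g in group_ids[:n_test] for i in by_group[g]]
    train_idx = [i for g in group_ids[n_test:] for i in by_group[g]]
    return sorted(train_idx), sorted(test_idx)


# Hypothetical data: each sample is tagged with the experiment it came from.
groups = ["paper_A", "paper_A", "paper_B", "paper_C", "paper_C", "paper_D"]
train_idx, test_idx = group_train_test_split(groups, test_fraction=0.25, seed=1)

# No experimental group contributes to both sets.
train_groups = {groups[i] for i in train_idx}
test_groups = {groups[i] for i in test_idx}
assert train_groups.isdisjoint(test_groups)
```

A random per-sample split over the same data could place one `paper_A` measurement in training and the other in testing, which is the situation the abstract identifies as leakage.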
Affiliation(s)
- Hajime Shimakawa
  - Department of Electrical Engineering and Information Systems, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
- Akiko Kumada
  - Department of Electrical Engineering and Information Systems, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
- Masahiro Sato
  - Department of Electrical Engineering and Information Systems, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
2
Xian W, Zhan YS, Maiti A, Saab AP, Li Y. Filled Elastomers: Mechanistic and Physics-Driven Modeling and Applications as Smart Materials. Polymers (Basel) 2024; 16:1387. [PMID: 38794580 PMCID: PMC11125212 DOI: 10.3390/polym16101387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/06/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Elastomers are made of chain-like molecules that form networks able to sustain large deformation. Rubbers are thermosetting elastomers that are obtained from irreversible curing reactions. Curing reactions create permanent bonds between the molecular chains. On the other hand, thermoplastic elastomers do not need curing reactions. Incorporation of appropriate filler particles, as has been practiced for decades, can significantly enhance mechanical properties of elastomers. However, there are fundamental questions about polymer matrix composites (PMCs) that still elude complete understanding. This is because the macroscopic properties of PMCs depend not only on the overall volume fraction (ϕ) of the filler particles, but also on their spatial distribution (i.e., primary, secondary, and tertiary structure). This work aims at reviewing how the mechanical properties of PMCs are related to the microstructure of filler particles and to the interaction between filler particles and polymer matrices. Overall, soft rubbery matrices dictate the elasticity/hyperelasticity of the PMCs while the reinforcement involves polymer-particle interactions that can significantly influence the mechanical properties of the polymer matrix interface. For ϕ values higher than a threshold, percolation of the filler particles can lead to significant reinforcement. While viscoelastic behavior may be attributed to the soft rubbery component, inelastic behaviors like the Mullins and Payne effects are highly correlated to the microstructures of the polymer matrix and the filler particles, as well as that of the polymer-particle interface. Additionally, the incorporation of specific filler particles within intelligently designed polymer systems has been shown to yield a variety of functional and responsive materials, commonly termed smart materials. We review three types of smart PMCs, i.e., magnetoelastic (M-), shape-memory (SM-), and self-healing (SH-) PMCs, and discuss the constitutive models for these smart materials.
Affiliation(s)
- Weikang Xian
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- You-Shu Zhan
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Amitesh Maiti
  - Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Andrew P. Saab
  - Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Ying Li
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
3
Choi S, Lee J, Seo J, Han SW, Lee SH, Seo JH, Seok J. Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules. Sci Data 2024; 11:371. [PMID: 38605036 PMCID: PMC11009387 DOI: 10.1038/s41597-024-03212-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 04/02/2024] [Indexed: 04/13/2024]
Abstract
The simplified molecular-input line-entry system (SMILES) has been utilized in a variety of artificial intelligence analyses owing to its capability of representing chemical structures using line notation. However, its ease of representation is limited, which has led to the proposal of BigSMILES as an alternative method suitable for the representation of macromolecules. Nevertheless, research on BigSMILES remains limited due to its preprocessing requirements. Thus, this study proposes a conversion workflow of BigSMILES, focusing on its automated generation from SMILES representations of homopolymers. BigSMILES representations for 4,927,181 records are provided, thereby enabling its immediate use for various research and development applications. Our study presents detailed descriptions of a validation process to ensure the accuracy, interchangeability, and robustness of the conversion. Additionally, a systematic overview of the utilized codes and functions, emphasizing their relevance in the context of BigSMILES generation, is provided. This advancement is anticipated to significantly aid researchers and facilitate further studies in BigSMILES representation, including potential applications in deep learning and further extension to complex structures such as copolymers.
Affiliation(s)
- Sunho Choi
  - School of Electrical Engineering, Korea University, Seoul, South Korea
- Joonbum Lee
  - Department of Materials Science and Engineering, Korea University, Seoul, South Korea
- Jangwon Seo
  - School of Electrical Engineering, Korea University, Seoul, South Korea
- Sung Won Han
  - School of Industrial Management Engineering, Korea University, Seoul, South Korea
- Sang Hyun Lee
  - School of Electrical Engineering, Korea University, Seoul, South Korea
- Ji-Hun Seo
  - Department of Materials Science and Engineering, Korea University, Seoul, South Korea
- Junhee Seok
  - School of Electrical Engineering, Korea University, Seoul, South Korea
4
Uddin MJ, Fan J. Interpretable Machine Learning Framework to Predict the Glass Transition Temperature of Polymers. Polymers (Basel) 2024; 16:1049. [PMID: 38674969 PMCID: PMC11054142 DOI: 10.3390/polym16081049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 03/25/2024] [Accepted: 03/27/2024] [Indexed: 04/28/2024]
Abstract
The glass transition temperature of polymers is a key parameter in meeting the application requirements for energy absorption. Previous studies have provided some data from slow, expensive trial-and-error procedures. By recognizing these data, machine learning algorithms are able to extract valuable knowledge and disclose essential insights. In this study, a dataset of 7174 samples was utilized. The polymers were numerically represented using two methods: Morgan fingerprint and molecular descriptor. During preprocessing, the dataset was scaled using a standard scaler technique. We removed the features with small variance from the dataset and used the Pearson correlation technique to exclude the features that were highly correlated. Then, the most significant features were selected using the recursive feature elimination method. Nine machine learning techniques were employed to predict the glass transition temperature, and their hyperparameters were tuned. The models were compared using the performance metrics of mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²). We observed that the extra tree regressor provided the best results. Significant features were also identified using statistical machine learning methods. The SHAP method was also employed to demonstrate the influence of each feature on the model's output. This framework can be adapted to other properties at a low computational expense.
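The three comparison metrics named in this abstract (MAE, RMSE, and R²) follow standard definitions, sketched below in plain Python. The function name `regression_metrics` and the glass transition temperature values are made up for illustration and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical glass transition temperatures in kelvin.
y_true = [350.0, 400.0, 450.0, 500.0]
y_pred = [360.0, 390.0, 450.0, 520.0]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.2f} K, RMSE={rmse:.2f} K, R2={r2:.3f}")
```

RMSE penalizes the single 20 K miss more heavily than MAE does, which is one reason studies commonly report both alongside R².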
Affiliation(s)
- Jitang Fan
  - School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
5
Han S, Kang Y, Park H, Yi J, Park G, Kim J. Multimodal Transformer for Property Prediction in Polymers. ACS Appl Mater Interfaces 2024; 16:16853-16860. [PMID: 38501934 DOI: 10.1021/acsami.4c01207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
In this work, we designed a multimodal transformer that combines both the Simplified Molecular Input Line Entry System (SMILES) and molecular graph representations to enhance the prediction of polymer properties. Three models with different embeddings (SMILES, SMILES + monomer, and SMILES + dimer) were employed to assess the performance of incorporating multimodal features into transformer architectures. Fine-tuning results across five properties (i.e., density, glass-transition temperature (Tg), melting temperature (Tm), volume resistivity, and conductivity) demonstrated that the multimodal transformer with both the SMILES and the dimer configuration as inputs outperformed the transformer using only SMILES across all five properties. Furthermore, our model facilitates in-depth analysis by examining attention scores, providing deeper insights into the relationship between the deep learning model and the polymer attributes. We believe that our work, shedding light on the potential of multimodal transformers in predicting polymer properties, paves a new direction for understanding and refining polymer properties.
Affiliation(s)
- Seunghee Han
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Yeonghun Kang
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Hyunsoo Park
  - Department of Materials, Imperial College London, Exhibition Road, London SW7 2AZ, United Kingdom
- Jeesung Yi
  - KOLON One&Only TOWER, 110, Magokdong-ro, Gangseo-gu, Seoul 07793, Republic of Korea
- Geunyeong Park
  - KOLON One&Only TOWER, 110, Magokdong-ro, Gangseo-gu, Seoul 07793, Republic of Korea
- Jihan Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
6
Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS Appl Bio Mater 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Affiliation(s)
- Roshan A Patel
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
- Michael A Webb
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
7
Shi J, Walsh D, Zou W, Rebello NJ, Deagen ME, Fransen KA, Gao X, Olsen BD, Audus DJ. Calculating Pairwise Similarity of Polymer Ensembles via Earth Mover's Distance. ACS Polym Au 2024; 4:66-76. [PMID: 38371731 PMCID: PMC10870752 DOI: 10.1021/acspolymersau.3c00029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 02/20/2024]
Abstract
Synthetic polymers, in contrast to small molecules and deterministic biomacromolecules, are typically ensembles composed of polymer chains with varying numbers, lengths, sequences, chemistry, and topologies. While numerous approaches exist for measuring pairwise similarity among small molecules and sequence-defined biomacromolecules, accurately determining the pairwise similarity between two polymer ensembles remains challenging. This work proposes the earth mover's distance (EMD) metric to calculate the pairwise similarity score between two polymer ensembles. EMD offers a greater resolution of chemical differences between polymer ensembles than the averaging method and provides a quantitative numeric value representing the pairwise similarity between polymer ensembles in alignment with chemical intuition. The EMD approach for assessing polymer similarity enhances the development of accurate chemical search algorithms within polymer databases and can improve machine learning techniques for polymer design, optimization, and property prediction.
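To give a feel for what an earth mover's distance measures, the sketch below computes it for a deliberately simplified case: two ensembles described only by chain-length histograms over the same sorted lengths. This is an assumption made for illustration. The abstract compares full polymer ensembles, which in general requires a chemical ground distance and an optimal-transport solver rather than this one-dimensional shortcut. The name `emd_1d` and the example weights are hypothetical.

```python
def emd_1d(positions, p, q):
    """EMD between two normalized histograms on shared, sorted 1D positions.

    In one dimension the optimal transport cost equals the accumulated
    difference of the two cumulative distributions times the bin spacing.
    """
    if abs(sum(p) - 1.0) > 1e-9 or abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("weights must each sum to 1")

    total = 0.0
    carried = 0.0  # surplus mass of p over q that must cross to the next bin
    for i in range(len(positions) - 1):
        carried += p[i] - q[i]
        total += abs(carried) * (positions[i + 1] - positions[i])
    return total


# Hypothetical ensembles: weight fractions of chains with 10, 20, 30 repeat units.
lengths = [10, 20, 30]
ensemble_a = [0.5, 0.5, 0.0]
ensemble_b = [0.0, 0.5, 0.5]

distance = emd_1d(lengths, ensemble_a, ensemble_b)
print(distance)  # half of the mass moves by 20 repeat units in total
```

Comparing only the mean chain lengths (15 versus 25) would collapse each ensemble to a single number, whereas the transport view keeps the whole distribution in play.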
Affiliation(s)
- Jiale Shi
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Dylan Walsh
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Weizhong Zou
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Nathan J. Rebello
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael E. Deagen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Katharina A. Fransen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xian Gao
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Bradley D. Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Debra J. Audus
  - Materials Science and Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
8
Zhang P, Kearney L, Bhowmik D, Fox Z, Naskar AK, Gounley J. Transferring a Molecular Foundation Model for Polymer Property Predictions. J Chem Inf Model 2023; 63:7689-7698. [PMID: 38055952 DOI: 10.1021/acs.jcim.3c01650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and material discovery. Self-supervised pretraining of transformer models requires large-scale data sets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers conduct data augmentation to generate additional samples but unavoidably incur extra computational costs. In contrast, large-scale open-source data sets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that using transformers pretrained on small molecules and fine-tuned on polymer properties achieves comparable accuracy to those trained on augmented polymer data sets for a series of benchmark prediction tasks.
Affiliation(s)
- Pei Zhang
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Logan Kearney
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Zachary Fox
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Amit K Naskar
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- John Gounley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
9
Day EC, Chittari SS, Bogen MP, Knight AS. Navigating the Expansive Landscapes of Soft Materials: A User Guide for High-Throughput Workflows. ACS Polym Au 2023; 3:406-427. [PMID: 38107416 PMCID: PMC10722570 DOI: 10.1021/acspolymersau.3c00025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/02/2023] [Accepted: 11/07/2023] [Indexed: 12/19/2023]
Abstract
Synthetic polymers are highly customizable with tailored structures and functionality, yet this versatility generates challenges in the design of advanced materials due to the size and complexity of the design space. Thus, exploration and optimization of polymer properties using combinatorial libraries has become increasingly common, which requires careful selection of synthetic strategies, characterization techniques, and rapid processing workflows to obtain fundamental principles from these large data sets. Herein, we provide guidelines for strategic design of macromolecule libraries and workflows to efficiently navigate these high-dimensional design spaces. We describe synthetic methods for multiple library sizes and structures as well as characterization methods to rapidly generate data sets, including tools that can be adapted from biological workflows. We further highlight relevant insights from statistics and machine learning to aid in data featurization, representation, and analysis. This Perspective acts as a "user guide" for researchers interested in leveraging high-throughput screening toward the design of multifunctional polymers and predictive modeling of structure-property relationships in soft materials.
Affiliation(s)
- Matthew P. Bogen
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Abigail S. Knight
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
10
Sanchez Medina E, Kunchapu S, Sundmacher K. Gibbs-Helmholtz Graph Neural Network for the Prediction of Activity Coefficients of Polymer Solutions at Infinite Dilution. J Phys Chem A 2023; 127:9863-9873. [PMID: 37943172 PMCID: PMC10683018 DOI: 10.1021/acs.jpca.3c05892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 10/18/2023] [Accepted: 10/25/2023] [Indexed: 11/10/2023]
Abstract
Machine learning models have gained prominence for predicting pure-component properties, yet their application to mixture property prediction remains relatively limited. However, the significance of mixtures in our daily lives is undeniable, particularly in industries such as polymer processing. This study presents a modification of the Gibbs-Helmholtz graph neural network (GH-GNN) model for predicting weight-based activity coefficients at infinite dilution (Ωij∞) in polymer solutions. We evaluate various polymer representations ranging from monomer, repeating unit, periodic unit, and oligomer and observe that, in data-scarce scenarios of polymer-solvent mixtures, polymer representation specifics have a reduced impact compared to data-rich environments. Leveraging transfer learning, we harness richer activity coefficient data from small-size systems, enhancing model accuracy and reducing prediction variability. The modified GH-GNN model achieves remarkable prediction results in mixture interpolation and solvent extrapolation tasks having an overall mean absolute error of 0.15, showcasing the potential of graph-neural-network-based models for property prediction of polymer solutions. Comparative analysis with the established models UNIFAC-ZM and Entropic-FV suggests a promising avenue for future research on the use of data-driven models for the prediction of the thermodynamic properties of polymer solutions.
Affiliation(s)
- Edgar Ivan Sanchez Medina
  - Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Sreekanth Kunchapu
  - Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Kai Sundmacher
  - Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
  - Process Systems Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstraße 1, Magdeburg 39106, Germany
11
Hu J, Li Z, Lin J, Zhang L. Prediction and Interpretability of Glass Transition Temperature of Homopolymers by Data-Augmented Graph Convolutional Neural Networks. ACS Appl Mater Interfaces 2023; 15:54006-54017. [PMID: 37934171 DOI: 10.1021/acsami.3c13698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
Establishing the structure-property relationship by machine learning (ML) models is extremely valuable for accelerating the molecular design of polymers. However, existing ML models for the polymers are subject to scarcity issues of training data and fewer variations of graph structures of molecules. In addition, limited works have explored the interpretability of ML models to infer the latent knowledge in the field of polymer science that could inspire ML-assisted molecular design. In this contribution, we integrate graph convolutional neural networks (GCNs) with data augmentation strategy to predict the glass transition temperature Tg of polymers. It is demonstrated that the data-augmented GCN model outperforms the conventional models and achieves a higher accuracy for the prediction of Tg despite a small amount of training data. Furthermore, taking advantage of molecular graph representations, the data-augmented GCN model has the capability to infer the importance of atoms or substructures from the understanding of Tg, which generally agrees with the experimental findings in the field of polymer science. The inferred knowledge of the GCN model is used to advise on the design of functional polymers with specific Tg. The data-augmented GCN model possesses prominent superiorities in the establishment of structure-property relationship and also provides an efficient way for accelerating the rational design of polymer molecules.
Affiliation(s)
- Junyang Hu
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zean Li
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
12
AlFaraj Y, Mohapatra S, Shieh P, Husted KEL, Ivanoff DG, Lloyd EM, Cooper JC, Dai Y, Singhal AP, Moore JS, Sottos NR, Gomez-Bombarelli R, Johnson JA. A Model Ensemble Approach Enables Data-Driven Property Prediction for Chemically Deconstructable Thermosets in the Low-Data Regime. ACS Cent Sci 2023; 9:1810-1819. [PMID: 37780353 PMCID: PMC10540282 DOI: 10.1021/acscentsci.3c00502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Indexed: 10/03/2023]
Abstract
Thermosets present sustainability challenges that could potentially be addressed through the design of deconstructable variants with tunable properties; however, the combinatorial space of possible thermoset molecular building blocks (e.g., monomers, cross-linkers, and additives) and manufacturing conditions is vast, and predictive knowledge for how combinations of these molecular components translate to bulk thermoset properties is lacking. Data science could overcome these problems, but computational methods are difficult to apply to multicomponent, amorphous, statistical copolymer materials for which little data exist. Here, leveraging a data set with 101 examples, we introduce a closed-loop experimental, machine learning (ML), and virtual screening strategy to enable predictions of the glass transition temperature (Tg) of polydicyclopentadiene (pDCPD) thermosets containing cleavable bifunctional silyl ether (BSE) comonomers and/or cross-linkers with varied compositions and loadings. Molecular features and formulation variables are used as model inputs, and uncertainty is quantified through model ensembling, which together with heavy regularization helps to avoid overfitting and ultimately achieves predictions within <15 °C for thermosets with compositionally diverse BSEs. This work offers a path to predicting the properties of thermosets based on their molecular building blocks, which may accelerate the discovery of promising plastics, rubbers, and composites with improved functionality and controlled deconstructability.
Collapse
Affiliation(s)
- Yasmeen
S. AlFaraj
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States of America
| | - Somesh Mohapatra
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Peyton Shieh
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States of America
| | - Keith E. L. Husted
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States of America
| | - Douglass G. Ivanoff
- Department
of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Evan M. Lloyd
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Julian C. Cooper
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Yutong Dai
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Avni P. Singhal
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Jeffrey S. Moore
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Nancy R. Sottos
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Rafael Gomez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Jeremiah A. Johnson
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
|
13
|
McDonald SM, Augustine EK, Lanners Q, Rudin C, Brinson LC, Becker ML. Applied machine learning as a driver for polymeric biomaterials design. Nat Commun 2023; 14:4838. [PMID: 37563117 PMCID: PMC10415291 DOI: 10.1038/s41467-023-40459-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 07/24/2023] [Indexed: 08/12/2023] Open
Abstract
Polymers are ubiquitous to almost every aspect of modern society and their use in medical products is similarly pervasive. Despite this, the diversity in commercial polymers used in medicine is stunningly low. Considerable time and resources have been extended over the years towards the development of new polymeric biomaterials which address unmet needs left by the current generation of medical-grade polymers. Machine learning (ML) presents an unprecedented opportunity in this field to bypass the need for trial-and-error synthesis, thus reducing the time and resources invested into new discoveries critical for advancing medical treatments. Current efforts pioneering applied ML in polymer design have employed combinatorial and high throughput experimental design to address data availability concerns. However, the lack of available and standardized characterization of parameters relevant to medicine, including degradation time and biocompatibility, represents a nearly insurmountable obstacle to ML-aided design of biomaterials. Herein, we identify a gap at the intersection of applied ML and biomedical polymer design, highlight current works at this junction more broadly and provide an outlook on challenges and future directions.
Affiliation(s)
| | - Emily K Augustine
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
| | - Quinn Lanners
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Cynthia Rudin
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - L Catherine Brinson
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
| | - Matthew L Becker
- Department of Chemistry, Duke University, Durham, NC, USA.
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA.
|
14
|
Yue T, He J, Tao L, Li Y. High-Throughput Screening and Prediction of High Modulus of Resilience Polymers Using Explainable Machine Learning. J Chem Theory Comput 2023; 19:4641-4653. [PMID: 37338332 DOI: 10.1021/acs.jctc.3c00131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2023]
Abstract
The ability to store and release elastic strain energy, as well as mechanical strength, are crucial factors in both natural and man-made mechanical systems. The modulus of resilience (R) indicates a material's capacity to absorb and release elastic strain energy, with the yield strength (σy) and Young's modulus (E) as R = σy²/(2E) for linear elastic solids. To improve the R in linear elastic solids, a high σy and low E combination in materials is sought after. However, achieving this combination is a significant challenge as both properties typically increase together. To address this challenge, we propose a computational method to quickly identify polymers with a high modulus of resilience using machine learning (ML) and validate the predictions through high-fidelity molecular dynamics (MD) simulations. Our approach commences by training single-task ML models, multitask ML models, and Evidential Deep Learning models to forecast the mechanical properties of polymers based on experimentally reported values. Utilizing explainable ML models, we were able to determine the critical substructures that significantly impact the mechanical properties of polymers, such as E and σy. This information can be utilized to create and develop new polymers with improved mechanical characteristics. Our single-task and multitask ML models can predict the properties of 12 854 real polymers and 8 million hypothetical polyimides and uncover 10 new real polymers and 10 hypothetical polyimides with exceptional modulus of resilience. The improved modulus of resilience of these novel polymers was validated through MD simulations. Our method efficiently speeds up the discovery of high-performing polymers using ML predictions and MD validation and can be applied to other polymer material discovery challenges, such as polymer membranes, dielectric polymers, and more.
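As a small illustration of the relation quoted in this abstract, the sketch below evaluates R = σy²/(2E) for two made-up materials. The function name and the numerical values are hypothetical and are not taken from the cited paper; the sketch only shows why a high yield strength combined with a low Young's modulus raises R.

```python
def modulus_of_resilience(yield_strength: float, youngs_modulus: float) -> float:
    """Return R = sigma_y**2 / (2 * E) for a linear elastic solid.

    Both inputs must use the same stress unit (for example MPa); the result
    is then in that unit too, which is equivalent to MJ/m^3 when MPa is used.
    """
    if youngs_modulus <= 0:
        raise ValueError("Young's modulus must be positive")
    return yield_strength ** 2 / (2.0 * youngs_modulus)


# Hypothetical values in MPa, chosen only for illustration.
stiff = modulus_of_resilience(100.0, 4000.0)      # 10000 / 8000 = 1.25
compliant = modulus_of_resilience(100.0, 2000.0)  # 10000 / 4000 = 2.5

# Same yield strength, half the stiffness: R doubles.
print(stiff, compliant)
```

Because σy enters quadratically and E only linearly, raising the yield strength has a stronger effect on R than lowering the stiffness by the same factor.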
Affiliation(s)
- Tianle Yue
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Jinlong He
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Storrs, Connecticut 06269, United States
| | - Ying Li
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
|
15
|
Park NH, Manica M, Born J, Hedrick JL, Erdmann T, Zubarev DY, Adell-Mill N, Arrechea PL. Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language. Nat Commun 2023; 14:3686. [PMID: 37344485 PMCID: PMC10284867 DOI: 10.1038/s41467-023-39396-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/21/2022] [Accepted: 06/12/2023] [Indexed: 06/23/2023] Open
Abstract
Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity making it difficult to accommodate a broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. Here we show that a domain specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization, although we provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, we show how the CMDL tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.
Affiliation(s)
| | - Matteo Manica
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
| | - Jannis Born
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
| | - James L Hedrick
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
| | - Tim Erdmann
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
| | | | - Nil Adell-Mill
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Arctoris, 120E Olympic Avenue, Abingdon, OX14 4SA, Oxfordshire, UK
|
16
|
Yamada S, Tsuboi Y, Yokoyama D, Kikuchi J. Polymer composition optimization approach based on feature extraction of bound and free water using time-domain nuclear magnetic resonance. JOURNAL OF MAGNETIC RESONANCE (SAN DIEGO, CALIF. : 1997) 2023; 351:107438. [PMID: 37084520 DOI: 10.1016/j.jmr.2023.107438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 03/30/2023] [Accepted: 04/05/2023] [Indexed: 05/03/2023]
Abstract
As global environmental sustainability becomes increasingly emphasized, the development of eco-friendly materials, including solutions to the issue of marine plastics, is thriving. However, the material parameter space is vast, making efficient search a challenge. Time-domain nuclear magnetic resonance offers material property information through the complex T2 relaxation curves resulting from multiple mobilities. In this research, we used the Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence to evaluate the binding state of water (water affinity) in polymers synthesized with various monomer compositions, which were immersed in seawater. We also assessed the T2 relaxation property of the polymers using the magic sandwich echo, double quantum filter, and magic-and-polarization echo filter techniques. We separated the T2 relaxation curves of CPMG into free and bound water for polymers by employing semisupervised nonnegative matrix factorization. By employing the features of separated bound water and polymer properties, a polymer composition optimization method offered crucial factors to monomers through random forests, predicted the components of the polymer using generative topographic mapping regression, and determined expected values using Bayesian optimization for polymer composition candidates with the desired high water affinity and high rigidity.
Affiliation(s)
- Shunji Yamada
- RIKEN Center for Sustainable Resource Science, 1-7-22, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Yuuri Tsuboi
- RIKEN Center for Sustainable Resource Science, 1-7-22, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Daiki Yokoyama
- RIKEN Center for Sustainable Resource Science, 1-7-22, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Jun Kikuchi
- RIKEN Center for Sustainable Resource Science, 1-7-22, Tsurumi-ku, Yokohama 230-0045, Japan; Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan; Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
|
17
|
Armeli G, Peters JH, Koop T. Machine-Learning-Based Prediction of the Glass Transition Temperature of Organic Compounds Using Experimental Data. ACS OMEGA 2023; 8:12298-12309. [PMID: 37033862 PMCID: PMC10077449 DOI: 10.1021/acsomega.2c08146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Accepted: 03/07/2023] [Indexed: 06/19/2023]
Abstract
Knowledge of the glass transition temperature of molecular compounds that occur in atmospheric aerosol particles is important for estimating their viscosity, as it directly influences the kinetics of chemical reactions and particle phase state. While there is a great diversity of organic compounds present in aerosol particles, for only a minor fraction of them experimental glass transition temperatures are known. Therefore, we have developed a machine learning model designed to predict the glass transition temperature of organic molecular compounds based on molecule-derived input variables. The extremely randomized trees (extra trees) procedure was chosen for this purpose. Two approaches using different sets of input variables were followed. The first one uses the number of selected functional groups present in the compound, while the second one generates descriptors from a SMILES (Simplified Molecular Input Line Entry System) string. Organic compounds containing carbon, hydrogen, oxygen, nitrogen, and halogen atoms are included. For improved results, both approaches can be combined with the melting temperature of the compound as an additional input variable. The results show that the predictions of both approaches show a similar mean absolute error of about 12-13 K, with the SMILES-based predictions performing slightly better. In general, the model shows good predictive power considering the diversity of the experimental input data. Furthermore, we also show that its performance exceeds that of previous parameterizations developed for this purpose and also performs better than existing machine learning models. In order to provide user-friendly versions of the model for applications, we have developed a web site where the model can be run by interested scientists via a web-based interface without prior technical knowledge. We also provide Python code of the model. 
Additionally, all experimental input data are provided in form of the Bielefeld Molecular Organic Glasses (BIMOG) database. We believe that this model is a powerful tool for many applications in atmospheric aerosol science and material science.
|
18
|
Yu M, Shi Y, Jia Q, Wang Q, Luo ZH, Yan F, Zhou YN. Ring Repeating Unit: An Upgraded Structure Representation of Linear Condensation Polymers for Property Prediction. J Chem Inf Model 2023; 63:1177-1187. [PMID: 36651860 DOI: 10.1021/acs.jcim.2c01389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Unique structure representation of polymers plays a crucial role in developing models for polymer property prediction and polymer design by data-centric approaches. Currently, monomer and repeating unit (RU) approximations are widely used to represent polymer structures for generating feature descriptors in the modeling of quantitative structure-property relationships (QSPR). However, such conventional structure representations may not uniquely approximate heterochain polymers due to the diversity of monomer combinations and the potential multi-RUs. In this study, the so-called ring repeating unit (RRU) method that can uniquely represent polymers with a broad range of structure diversity is proposed for the first time. As a proof of concept, an RRU-based QSPR model was developed to predict the associated glass transition temperature (Tg) of polyimides (PIs) with deterministic values. Comprehensive model validations including external, internal, and Y-random validations were performed. Also, an RU-based QSPR model developed based on the same large database of 1321 PIs provides nonunique prediction results, which further prove the necessity of RRU-based structure representation. Promising results obtained by the application of the RRU-based model confirm that the as-developed RRU method provides an effective representation that accurately captures the sequence of repeat units and thus realizes reliable polymer property prediction by data-driven approaches.
Affiliation(s)
- Mengxian Yu
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
| | - Yajuan Shi
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
| | - Qingzhu Jia
- School of Marine and Environmental Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
| | - Qiang Wang
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
| | - Zheng-Hong Luo
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
| | - Fangyou Yan
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
| | - Yin-Ning Zhou
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
|
19
|
Martin TB, Audus DJ. Emerging Trends in Machine Learning: A Polymer Perspective. ACS POLYMERS AU 2023. [DOI: 10.1021/acspolymersau.2c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Affiliation(s)
- Tyler B. Martin
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| | - Debra J. Audus
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
|
20
|
Tao L, He J, Arbaugh T, McCutcheon JR, Li Y. Machine learning prediction on the fractional free volume of polymer membranes. J Memb Sci 2023. [DOI: 10.1016/j.memsci.2022.121131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
21
|
Volgin IV, Batyr PA, Matseevich AV, Dobrovskiy AY, Andreeva MV, Nazarychev VM, Larin SV, Goikhman MY, Vizilter YV, Askadskii AA, Lyulin SV. Machine Learning with Enormous "Synthetic" Data Sets: Predicting Glass Transition Temperature of Polyimides Using Graph Convolutional Neural Networks. ACS OMEGA 2022; 7:43678-43691. [PMID: 36506114 PMCID: PMC9730753 DOI: 10.1021/acsomega.2c04649] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/22/2022] [Accepted: 10/28/2022] [Indexed: 06/17/2023]
Abstract
In the present work, we address the problem of utilizing machine learning (ML) methods to predict the thermal properties of polymers by establishing "structure-property" relationships. Having focused on a particular class of heterocyclic polymers, namely polyimides (PIs), we developed a graph convolutional neural network (GCNN), being one of the most promising tools for working with big data, to predict the PI glass transition temperature Tg as an example of a fundamental property of polymers. To train the GCNN, we propose an original methodology based on using a "transfer learning" approach with an enormous "synthetic" data set for pretraining and a small experimental data set for its fine-tuning. The "synthetic" data set contains more than 6 million combinatorically generated repeating units of PIs and theoretical values of their Tg calculated using the well-established Askadskii's quantitative structure-property relationship (QSPR) computational scheme. Additionally, an experimental data set for 214 PIs was also collected from the literature for training, fine-tuning, and validation of the GCNN. Both "synthetic" and experimental data sets are included in the PolyAskInG database (Polymer Askadskii's Intelligent Gateway). By using the PolyAskInG database, we developed a GCNN that allows estimation of the Tg of PIs with a mean absolute error (MAE) of about 20 K, which is 1.5 times lower than in the case of Askadskii QSPR analysis (33 K). To prove the efficiency and usability of the proposed GCNN architecture and training methodology for predicting polymer properties, we also employed "transfer learning" to develop an alternative GCNN pretrained on proxy-characteristics taken from the popular quantum-chemical QM9 database for small compounds and fine-tuned on an experimental Tg data set from the PolyAskInG database. The obtained results indicate that pretraining of GCNN on the "synthetic" polymer data set provides MAE which is almost twice as low as that in the case of using the QM9 data set in the pretraining stage (∼41 K). Furthermore, we address the questions associated with the influence of the differences in the size of the experimental and "synthetic" data sets (so-called "reality gap" problem), as well as their chemical composition on the training quality. Our results state the overall priority of using polymer data sets for developing deep neural networks, and GCNN in particular, for efficient prediction of polymer properties. Moreover, our work opens up a challenge for the theoretically supported generation of large "synthetic" data sets of polymer properties for the training of the complex ML models. The proposed methodology is rather versatile and may be generalized for predicting other properties of different polymers and copolymers synthesized through the polycondensation reaction.
Affiliation(s)
- Igor V. Volgin
- Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
| | - Pavel A. Batyr
- Federal State Unitary Enterprise “State Research Institute of Aviation Systems” (GosNIIAS), Moscow 125167, Russian Federation
| | - Andrey V. Matseevich
- A.N. Nesmeyanov Institute of Organoelement Compounds of Russian Academy of Sciences (INEOS RAS), Moscow 119991, Russian Federation
| | - Alexey Yu. Dobrovskiy
- Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
| | - Maria V. Andreeva
- Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
| | - Victor M. Nazarychev
- Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
| | - Sergey V. Larin
- Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
| | - Mikhail Ya. Goikhman
- Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
| | - Yury V. Vizilter
- Federal State Unitary Enterprise “State Research Institute of Aviation Systems” (GosNIIAS), Moscow 125167, Russian Federation
| | - Andrey A. Askadskii
- A.N. Nesmeyanov Institute of Organoelement Compounds of Russian Academy of Sciences (INEOS RAS), Moscow 119991, Russian Federation
- Moscow State University of Civil Engineering (MGSU), Moscow 129337, Russian Federation
| | - Sergey V. Lyulin
- Institute of Macromolecular Compounds of the Russian Academy of Sciences (IMC RAS), St. Petersburg 199004, Russian Federation
|
22
|
Antoniuk ER, Li P, Kailkhura B, Hiszpanski AM. Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions. J Chem Inf Model 2022; 62:5435-5445. [PMID: 36315033 DOI: 10.1021/acs.jcim.2c00875] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
Accurately predicting new polymers' properties with machine learning models a priori, before synthesis, has the potential to significantly accelerate new polymers' discovery and development. However, accurately and efficiently capturing polymers' complex, periodic structures in machine learning models remains a grand challenge for the polymer cheminformatics community. Specifically, there has yet to be an ideal solution for the problems of how to capture the periodicity of polymers, as well as how to optimally develop polymer descriptors without requiring human-based feature design. In this work, we tackle these problems by utilizing a periodic polymer graph representation that accounts for polymers' periodicity and coupling it with a message-passing neural network that leverages the power of graph deep learning to automatically learn chemically relevant polymer descriptors. Remarkably, this approach achieves state-of-the-art performance on 8 out of 10 distinct polymer property prediction tasks. These results highlight the advancement in predictive capability that is possible through learning descriptors that are specifically optimized for capturing the unique chemical structure of polymers.
Affiliation(s)
- Evan R Antoniuk
- Materials Science Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550-5507, United States
| | - Peggy Li
- Global Security Computing Applications Division, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550-5507, United States
| | - Bhavya Kailkhura
- Machine Intelligence Group/Center for Applied Scientific Computing, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550-5507, United States
| | - Anna M Hiszpanski
- Materials Science Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550-5507, United States
|
23
|
Tao L, Arbaugh T, Byrnes J, Varshney V, Li Y. Unified machine learning protocol for copolymer structure-property predictions. STAR Protoc 2022; 3:101875. [PMID: 36595914 PMCID: PMC9700038 DOI: 10.1016/j.xpro.2022.101875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2022] [Revised: 10/06/2022] [Accepted: 11/01/2022] [Indexed: 11/23/2022] Open
Abstract
Structure-property relationships are extremely valuable when predicting the properties of polymers. This protocol demonstrates a step-by-step approach, based on multiple machine learning (ML) architectures, which is capable of processing copolymer types such as alternating, random, block, and gradient copolymers. We detail steps for necessary software installation and construction of datasets. We further describe training and optimization steps for four neural network models and subsequent model visualization and comparison using training and test values. For complete details on the use and execution of this protocol, please refer to Tao et al. (2022).
Affiliation(s)
- Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Tom Arbaugh
- Department of Physics, Wesleyan University, Middletown, CT 06459, USA
| | | | - Vikas Varshney
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Dayton, OH 45433, USA
| | - Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706-1572, USA
- Corresponding author
|
24
|
Schmid F. Understanding and Modeling Polymers: The Challenge of Multiple Scales. ACS POLYMERS AU 2022. [DOI: 10.1021/acspolymersau.2c00049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Affiliation(s)
- Friederike Schmid
- Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudingerweg 9, 55128 Mainz, Germany
|
25
|
Yang J, Tao L, He J, McCutcheon JR, Li Y. Machine learning enables interpretable discovery of innovative polymers for gas separation membranes. SCIENCE ADVANCES 2022; 8:eabn9545. [PMID: 35857839 PMCID: PMC9299556 DOI: 10.1126/sciadv.abn9545] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 01/04/2022] [Accepted: 06/07/2022] [Indexed: 05/21/2023]
Abstract
Polymer membranes perform innumerable separations with far-reaching environmental implications. Despite decades of research, design of new membrane materials remains a largely Edisonian process. To address this shortcoming, we demonstrate a generalizable, accurate machine learning (ML) implementation for the discovery of innovative polymers with ideal performance. Specifically, multitask ML models are trained on experimental data to link polymer chemistry to gas permeabilities of He, H2, O2, N2, CO2, and CH4. We interpret the ML models and extract valuable insights into the contributions of different chemical moieties to permeability and selectivity. We then screen over 9 million hypothetical polymers and identify thousands that lie well above current performance upper bounds, including hundreds of never-before-seen ultrapermeable polymer membranes with O2 and CO2 permeability greater than 10^4 and 10^5 Barrers, respectively. High-fidelity molecular dynamics simulations confirm the ML-predicted gas permeabilities of the promising candidates, which suggests that many can be translated to reality.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Jinlong He
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Jeffrey R. McCutcheon
- Department of Chemical & Biomolecular Engineering, Center for Environmental Sciences and Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
| | - Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
- Corresponding author.
|
26
|
Wang M, Jiang J. Accelerating Discovery of High Fractional Free Volume Polymers from a Data-Driven Approach. ACS APPLIED MATERIALS & INTERFACES 2022; 14:31203-31215. [PMID: 35767720 DOI: 10.1021/acsami.2c03917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
As a fundamental structure characteristic in polymers, fractional free volume (FFV) plays an indispensable role in governing polymer properties and performance. However, the design of new high-FFV polymers is challenging. In this study, we report a data-driven approach and aim to accelerate the discovery of high-FFV polymers. First, a computational method is proposed to calculate FFV, and a two-step fragmentation method is developed to construct a fragment library for digital representation of polymer structures. Data mining is employed to identify promising fragments for high FFV. Subsequently, machine learning (ML) models are trained using a data set with 1683 polymers and their excellent transferability is demonstrated by out-of-sample predictions in another data set with 11,479 polymers. Finally, the ML models are used to screen ∼1 million hypothetical polymers, and 29,482 polymers with FFV > 0.2 are shortlisted; representative high-FFV polymers are validated by molecular simulations, and design strategies are highlighted. To further facilitate the discovery of new high-FFV polymers, we develop an online interactive platform https://ffv-prediction.herokuapp.com, which allows for rapid FFV predictions, given polymer structures. The data-driven approach in this study might advance the development of new high-FFV polymers and further explore quantitative structure-property relationships for polymers.
Affiliation(s)
- Mao Wang
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 117576 Singapore, Singapore
| | - Jianwen Jiang
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 117576 Singapore, Singapore
27
Tao L, Byrnes J, Varshney V, Li Y. Machine learning strategies for the structure-property relationship of copolymers. iScience 2022; 25:104585. [PMID: 35789847 PMCID: PMC9249671 DOI: 10.1016/j.isci.2022.104585] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Revised: 05/26/2022] [Accepted: 06/07/2022] [Indexed: 11/15/2022]
Abstract
Establishing the structure-property relationship is extremely valuable for the molecular design of copolymers. However, machine learning (ML) models that can incorporate both the chemical composition and the sequence distribution of monomers, and that generalize across copolymer types (e.g., alternating, random, block, and gradient copolymers) within a unified approach, are missing. To address this challenge, we formulate four different ML models for investigation, including a feedforward neural network (FFNN) model, a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, and a combined FFNN/RNN (Fusion) model. We use various copolymer types to systematically validate the performance and generalizability of different models. We find that the RNN architecture that processes the monomer sequence information both forward and backward is a more suitable ML model for copolymers with better generalizability. As a supplement to polymer informatics, our proposed approach provides an efficient way for the evaluation of copolymers.
- Establish structure-property relationships of copolymers with machine learning (ML)
- Incorporate both chemical composition and sequential distribution of copolymers
- Analyze various copolymer types with different models in a unified approach
- Differentiate the effects of random, block, and gradient patterns of copolymers
Affiliation(s)
- Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Vikas Varshney
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, USA
- Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
- Corresponding author
28
Mairpady A, Mourad AHI, Mozumder MS. Accelerated Discovery of the Polymer Blends for Cartilage Repair through Data-Mining Tools and Machine-Learning Algorithm. Polymers (Basel) 2022; 14:1802. [PMID: 35566970 PMCID: PMC9104973 DOI: 10.3390/polym14091802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 04/20/2022] [Accepted: 04/20/2022] [Indexed: 11/23/2022]
Abstract
In designing successful cartilage substitutes, the selection of scaffold materials plays a central role, among several other important factors. In an empirical approach, the selection of the most appropriate polymer(s) for cartilage repair is an expensive and time-consuming affair, as traditionally it requires numerous trials. Moreover, it is humanly impossible to go through the huge library of literature available on the potential polymer(s) and to correlate the physical, mechanical, and biological properties that might be suitable for cartilage tissue engineering. Hence, the objective of this study is to implement an inverse design approach to predict the best polymer(s)/blend(s) for cartilage repair by using a machine-learning algorithm (i.e., multinomial logistic regression (MNLR)). Initially, a systematic bibliometric analysis on cartilage repair was performed by using the bibliometrix package in R. Then, the database was created by extracting the mechanical properties of the most frequently used polymers/blends from the PoLyInfo library by using data-mining tools. Then, an MNLR algorithm was run with the mechanical properties of the polymers that are similar to those of cartilage as the input and the polymer(s)/blends as the predicted output. The MNLR algorithm used in this study predicts the polyethylene/polyethylene-graft-poly(maleic anhydride) blend as the best candidate for cartilage repair.
Affiliation(s)
- Anusha Mairpady
- Chemical and Petroleum Engineering Department, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
- Abdel-Hamid I. Mourad
- Mechanical and Aerospace Engineering Department, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
- National Water and Energy Center, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
- Mohammad Sayem Mozumder
- Chemical and Petroleum Engineering Department, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
- Correspondence:
29
Ethier JG, Casukhela RK, Latimer JJ, Jacobsen MD, Rasin B, Gupta MK, Baldwin LA, Vaia RA. Predicting Phase Behavior of Linear Polymers in Solution Using Machine Learning. Macromolecules 2022. [DOI: 10.1021/acs.macromol.2c00245] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Jeffrey G. Ethier
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
- UES, Inc., Dayton, Ohio 45431, United States
- Rohan K. Casukhela
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
- UES, Inc., Dayton, Ohio 45431, United States
- Joshua J. Latimer
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
- UES, Inc., Dayton, Ohio 45431, United States
- Matthew D. Jacobsen
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
- Boris Rasin
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
- Maneesh K. Gupta
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
- Luke A. Baldwin
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
- Richard A. Vaia
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
30
Nguyen D, Tao L, Li Y. Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design. Front Chem 2022; 9:820417. [PMID: 35141207 PMCID: PMC8819075 DOI: 10.3389/fchem.2021.820417] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/24/2021] [Accepted: 12/31/2021] [Indexed: 12/21/2022]
Abstract
In recent years, the synthesis of monomer sequence-defined polymers has expanded into broad-spectrum applications in biomedical, chemical, and materials science fields. Pursuing the characterization and inverse design of these polymer systems requires our fundamental understanding not only at the individual monomer level, but also considering the chain scales, such as polymer configuration, self-assembly, and phase separation. However, our accessibility to this field is still rudimentary due to the limitations of traditional design approaches, the complexity of chemical space along with the burdened cost and time issues that prevent us from unveiling the underlying monomer sequence-structure-property relationships. Fortunately, thanks to the recent advancements in molecular dynamics simulations and machine learning (ML) algorithms, the bottlenecks in the tasks of establishing the structure-function correlation of the polymer chains can be overcome. In this review, we will discuss the applications of the integration between ML techniques and coarse-grained molecular dynamics (CGMD) simulations to solve the current issues in polymer science at the chain level. In particular, we focus on the case studies in three important topics—polymeric configuration characterization, feed-forward property prediction, and inverse design—in which CGMD simulations are leveraged to generate training datasets to develop ML-based surrogate models for specific polymer systems and designs. By doing so, this computational hybridization allows us to well establish the monomer sequence-functional behavior relationship of the polymers as well as guide us toward the best polymer chain candidates for the inverse design in undiscovered chemical space with reasonable computational cost and time. 
Even though there are still limitations and challenges ahead in this field, we finally conclude that this CGMD/ML integration is very promising, not only in the attempt of bridging the monomeric and macroscopic characterizations of polymer materials, but also enabling further tailored designs for sequence-specific polymers with superior properties in many practical applications.
Affiliation(s)
- Danh Nguyen
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Ying Li
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Polymer Program, Institute of Materials Science, University of Connecticut, Mansfield, CT, United States
- *Correspondence: Ying Li,
31
Aldeghi M, Coley CW. A graph representation of molecular ensembles for polymer property prediction. Chem Sci 2022; 13:10486-10498. [PMID: 36277616 PMCID: PMC9473492 DOI: 10.1039/d2sc02839e] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 05/20/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022]
Abstract
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles. A graph representation that captures critical features of polymeric materials and an associated graph neural network achieve superior accuracy to off-the-shelf cheminformatics methodologies.
Affiliation(s)
- Matteo Aldeghi
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA