1
Wu T, Zhou M, Zou J, Chen Q, Qian F, Kurths J, Liu R, Tang Y. AI-guided few-shot inverse design of HDP-mimicking polymers against drug-resistant bacteria. Nat Commun 2024; 15:6288. [PMID: 39060236 PMCID: PMC11282099 DOI: 10.1038/s41467-024-50533-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 07/11/2024] [Indexed: 07/28/2024]
Abstract
Host defense peptide (HDP)-mimicking polymers are promising therapeutic alternatives to antibiotics and have large-scale untapped potential. Artificial intelligence (AI) exhibits promising performance on large-scale chemical-content design; however, existing AI methods struggle with the scarcity of data in each family of HDP-mimicking polymers (<10²), which is much smaller than public polymer datasets (>10⁵), and with the multiple constraints on properties and structures that arise when exploring high-dimensional polymer space. Herein, we develop a universal AI-guided few-shot inverse design framework by designing multi-modal representations to enrich polymer information for predictions and creating a graph grammar distillation for chemical space restriction to improve the efficiency of multi-constrained polymer generation with reinforcement learning. Taking HDP-mimicking β-amino acid polymers as an example, we successfully simulate predictions of over 10⁵ polymers and identify 83 optimal polymers. Furthermore, we synthesize an optimal polymer, DM0.8iPen0.2, and find that it exhibits broad-spectrum and potent antibacterial activity against multiple clinically isolated antibiotic-resistant pathogens, validating the effectiveness of the AI-guided design strategy.
Affiliation(s)
- Tianyu Wu
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Min Zhou
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Jingcheng Zou
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Qi Chen
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Feng Qian
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Jürgen Kurths
  - Potsdam Institute for Climate Impact Research (PIK), Potsdam, 14473, Germany
  - Institut für Physik, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
  - The Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, 200433, China
- Runhui Liu
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Yang Tang
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
2
Xian W, Zhan YS, Maiti A, Saab AP, Li Y. Filled Elastomers: Mechanistic and Physics-Driven Modeling and Applications as Smart Materials. Polymers (Basel) 2024; 16:1387. [PMID: 38794580 PMCID: PMC11125212 DOI: 10.3390/polym16101387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/06/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Elastomers are made of chain-like molecules to form networks that can sustain large deformation. Rubbers are thermosetting elastomers that are obtained from irreversible curing reactions. Curing reactions create permanent bonds between the molecular chains. On the other hand, thermoplastic elastomers do not need curing reactions. Incorporation of appropriate filler particles, as has been practiced for decades, can significantly enhance mechanical properties of elastomers. However, there are fundamental questions about polymer matrix composites (PMCs) that still elude complete understanding. This is because the macroscopic properties of PMCs depend not only on the overall volume fraction (ϕ) of the filler particles, but also on their spatial distribution (i.e., primary, secondary, and tertiary structure). This work aims at reviewing how the mechanical properties of PMCs are related to the microstructure of filler particles and to the interaction between filler particles and polymer matrices. Overall, soft rubbery matrices dictate the elasticity/hyperelasticity of the PMCs while the reinforcement involves polymer-particle interactions that can significantly influence the mechanical properties of the polymer matrix interface. For ϕ values higher than a threshold, percolation of the filler particles can lead to significant reinforcement. While viscoelastic behavior may be attributed to the soft rubbery component, inelastic behaviors like the Mullins and Payne effects are highly correlated to the microstructures of the polymer matrix and the filler particles, as well as that of the polymer-particle interface. Additionally, the incorporation of specific filler particles within intelligently designed polymer systems has been shown to yield a variety of functional and responsive materials, commonly termed smart materials. We review three types of smart PMCs, i.e., magnetoelastic (M-), shape-memory (SM-), and self-healing (SH-) PMCs, and discuss the constitutive models for these smart materials.
Affiliation(s)
- Weikang Xian
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- You-Shu Zhan
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Amitesh Maiti
  - Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Andrew P. Saab
  - Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Ying Li
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
3
Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS Applied Bio Materials 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Affiliation(s)
- Roshan A Patel
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
- Michael A Webb
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
4
Shi J, Walsh D, Zou W, Rebello NJ, Deagen ME, Fransen KA, Gao X, Olsen BD, Audus DJ. Calculating Pairwise Similarity of Polymer Ensembles via Earth Mover's Distance. ACS Polymers Au 2024; 4:66-76. [PMID: 38371731 PMCID: PMC10870752 DOI: 10.1021/acspolymersau.3c00029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 02/20/2024]
Abstract
Synthetic polymers, in contrast to small molecules and deterministic biomacromolecules, are typically ensembles composed of polymer chains with varying numbers, lengths, sequences, chemistry, and topologies. While numerous approaches exist for measuring pairwise similarity among small molecules and sequence-defined biomacromolecules, accurately determining the pairwise similarity between two polymer ensembles remains challenging. This work proposes the earth mover's distance (EMD) metric to calculate the pairwise similarity score between two polymer ensembles. EMD offers a greater resolution of chemical differences between polymer ensembles than the averaging method and provides a quantitative numeric value representing the pairwise similarity between polymer ensembles in alignment with chemical intuition. The EMD approach for assessing polymer similarity enhances the development of accurate chemical search algorithms within polymer databases and can improve machine learning techniques for polymer design, optimization, and property prediction.
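As a rough illustration of why EMD can resolve differences that an averaging method hides, the sketch below computes a one-dimensional EMD between two hypothetical ensembles described only by chain length and weight fraction. This is a simplification assumed here for brevity and is not the authors' implementation, which compares ensembles through chemical differences between their components; the function name `emd_1d` and the example numbers are invented for illustration.

```python
def emd_1d(values_a, weights_a, values_b, weights_b):
    """Earth mover's distance between two discrete 1D distributions.

    Weights are normalized so each ensemble carries total mass 1.
    In one dimension, EMD equals the area between the two cumulative
    distribution functions, so no transport solver is needed.
    """
    def normalized_mass(values, weights):
        total = float(sum(weights))
        mass = {}
        for value, weight in zip(values, weights):
            mass[value] = mass.get(value, 0.0) + weight / total
        return mass

    mass_a = normalized_mass(values_a, weights_a)
    mass_b = normalized_mass(values_b, weights_b)
    support = sorted(set(mass_a) | set(mass_b))

    distance = 0.0
    cdf_gap = 0.0  # CDF_a(x) - CDF_b(x) on the current interval
    for left, right in zip(support, support[1:]):
        cdf_gap += mass_a.get(left, 0.0) - mass_b.get(left, 0.0)
        distance += abs(cdf_gap) * (right - left)
    return distance


# Hypothetical ensembles: (chain lengths, weight fractions).
# Ensemble A is a 50/50 mix of chains of length 10 and 30;
# ensemble B contains only chains of length 20.
lengths_a, fractions_a = [10, 30], [0.5, 0.5]
lengths_b, fractions_b = [20], [1.0]

mean_a = sum(l * f for l, f in zip(lengths_a, fractions_a))
mean_b = sum(l * f for l, f in zip(lengths_b, fractions_b))
# Both means are 20, so averaging cannot tell the ensembles apart,
# while the EMD between them is 10.
emd_ab = emd_1d(lengths_a, fractions_a, lengths_b, fractions_b)
```

For ensembles whose components differ in chemistry rather than along a single scalar, the closed form above no longer applies and a general optimal-transport solver with a chosen ground distance would be needed.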
Affiliation(s)
- Jiale Shi
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Dylan Walsh
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Weizhong Zou
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Nathan J. Rebello
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael E. Deagen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Katharina A. Fransen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xian Gao
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Bradley D. Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Debra J. Audus
  - Materials Science and Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
5
Zhang P, Kearney L, Bhowmik D, Fox Z, Naskar AK, Gounley J. Transferring a Molecular Foundation Model for Polymer Property Predictions. J Chem Inf Model 2023; 63:7689-7698. [PMID: 38055952 DOI: 10.1021/acs.jcim.3c01650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and material discovery. Self-supervised pretraining of transformer models requires large-scale data sets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers conduct data augmentation to generate additional samples but unavoidably incur extra computational costs. In contrast, large-scale open-source data sets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that using transformers pretrained on small molecules and fine-tuned on polymer properties achieves comparable accuracy to those trained on augmented polymer data sets for a series of benchmark prediction tasks.
Affiliation(s)
- Pei Zhang
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Logan Kearney
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Zachary Fox
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Amit K Naskar
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- John Gounley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
6
Toland A, Tran H, Chen L, Li Y, Zhang C, Gutekunst W, Ramprasad R. Accelerated Scheme to Predict Ring-Opening Polymerization Enthalpy: Simulation-Experimental Data Fusion and Multitask Machine Learning. J Phys Chem A 2023; 127:10709-10716. [PMID: 38055927 PMCID: PMC10749451 DOI: 10.1021/acs.jpca.3c05870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/10/2023] [Accepted: 11/16/2023] [Indexed: 12/08/2023]
Abstract
Ring-opening enthalpy (ΔH_ROP) is a fundamental thermodynamic quantity controlling the polymerization and depolymerization of an important class of recyclable polymers, namely, those created from ring-opening polymerization (ROP). Highly accurate first-principles-based methods for computing ΔH_ROP are too computationally demanding to efficiently guide the design of depolymerizable polymers. In this work, we develop a generalizable machine-learning model that was trained on experimental measurements and reliably computed simulation results of ΔH_ROP (the latter provides a pathway to systematically increase the chemical diversity of the data). Predictions of ΔH_ROP using this machine-learning model require essentially no time, while the prediction accuracy is about 8 kJ/mol, approaching the well-known chemical accuracy. We hope that this effort will contribute to the future development of new depolymerizable polymers.
Affiliation(s)
- Aubrey Toland
  - School of Materials Science & Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Huan Tran
  - School of Materials Science & Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Lihua Chen
  - School of Materials Science & Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Yinghao Li
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Chao Zhang
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Will Gutekunst
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Rampi Ramprasad
  - School of Materials Science & Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
7
Deng S, Chen C, Li K, Chen X, Xia K, Li S. Structure-Based Multilevel Descriptors for High-throughput Screening of Elastomers. J Phys Chem B 2023; 127:10077-10087. [PMID: 37942925 DOI: 10.1021/acs.jpcb.3c06025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2023]
Abstract
To discover new materials, high-throughput screening (HTS) with machine learning (ML) requires universally available descriptors that can accurately predict the desired properties. For elastomers, experimental and simulation data in current descriptors may not be available for all candidates of interest, hindering elastomer discovery through HTS. To address this challenge, we introduce structure-based multilevel (SM) descriptors of elastomers derived solely from molecular structure that is universally available. Our SM descriptors are hierarchically organized to capture both local soft and hard segment structures as well as the global structures of elastomers. With the SM-Morgan Fingerprint (SM-MF) descriptor, one of our SM descriptors, a machine learning model accurately predicts elastomer toughness with a remarkable accuracy of 0.91. Furthermore, an HTS pipeline is established to swiftly screen elastomers with targeted toughness. We also demonstrate the generality and applicability of SM descriptors by using them to construct HTS pipelines for screening elastomers with a targeted critical strain or Young's modulus. The user-friendliness and low computational cost of SM descriptors make them a promising tool to significantly enhance HTS in the search for novel materials.
Affiliation(s)
- Siyan Deng
  - School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Chao Chen
  - School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Ke Li
  - Institute of Materials Research and Engineering (IMRE), Agency for Science, Technology and Research (A*STAR), 2 Fusionopolis Way, Innovis #08-03, Singapore 138634, Republic of Singapore
- Xi Chen
  - School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Kelin Xia
  - School of Physical and Mathematical Sciences, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Shuzhou Li
  - School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
8
Phua YK, Fujigaya T, Kato K. Predicting the anion conductivities and alkaline stabilities of anion conducting membrane polymeric materials: development of explainable machine learning models. Science and Technology of Advanced Materials 2023; 24:2261833. [PMID: 37854121 PMCID: PMC10580864 DOI: 10.1080/14686996.2023.2261833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 09/18/2023] [Indexed: 10/20/2023]
Abstract
Anion exchange membranes (AEMs) are core components in fuel cells and water electrolyzers, which are crucial to realizing a sustainable hydrogen society. The low anion conductivity and durability of AEMs have hindered the commercialization of AEM-based devices, and research and development (R&D) to improve AEM materials is often resource-intensive. Although machine learning (ML) is commonly used in many fields to accelerate R&D while reducing resource consumption, it is rarely used in the AEM field. Three problems hinder the adoption of ML models, namely, the low explainability of ML models; the difficulty of expressing both homopolymers and copolymers in a unified way to train a single ML model; and the difficulty of building a single ML model that comprehends various polymer types. This study presents the first ML models that solve all three problems. Our models predicted the anion conductivity for a diverse set of unseen AEM materials with high accuracy (root mean squared error = 0.014 S cm⁻¹), regardless of their state (freshly synthesized or degraded). This enables virtual pre-synthesis screening of novel AEM materials, reducing resource consumption. Moreover, human-comprehensible prediction logic revealed new factors affecting the anion conductivity of AEM materials. Such capability to reveal new important variables for AEM materials design could shift the paradigm of AEM R&D. The proposed method is not limited to AEM materials; instead, it presents a technology that is applicable to the diverse set of polymers currently available.
Affiliation(s)
- Yin Kan Phua
  - Department of Applied Chemistry, Graduate School of Engineering, Kyushu University, Fukuoka, Japan
- Tsuyohiko Fujigaya
  - Department of Applied Chemistry, Graduate School of Engineering, Kyushu University, Fukuoka, Japan
  - International Institute for Carbon Neutral Energy Research, Kyushu University, Fukuoka, Japan
  - Center for Molecular Systems, Kyushu University, Fukuoka, Japan
- Koichiro Kato
  - Department of Applied Chemistry, Graduate School of Engineering, Kyushu University, Fukuoka, Japan
  - Center for Molecular Systems, Kyushu University, Fukuoka, Japan
  - Research Institute for Information Technology, Kyushu University, Fukuoka, Japan
9
AlFaraj Y, Mohapatra S, Shieh P, Husted KEL, Ivanoff DG, Lloyd EM, Cooper JC, Dai Y, Singhal AP, Moore JS, Sottos NR, Gomez-Bombarelli R, Johnson JA. A Model Ensemble Approach Enables Data-Driven Property Prediction for Chemically Deconstructable Thermosets in the Low-Data Regime. ACS Central Science 2023; 9:1810-1819. [PMID: 37780353 PMCID: PMC10540282 DOI: 10.1021/acscentsci.3c00502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Indexed: 10/03/2023]
Abstract
Thermosets present sustainability challenges that could potentially be addressed through the design of deconstructable variants with tunable properties; however, the combinatorial space of possible thermoset molecular building blocks (e.g., monomers, cross-linkers, and additives) and manufacturing conditions is vast, and predictive knowledge for how combinations of these molecular components translate to bulk thermoset properties is lacking. Data science could overcome these problems, but computational methods are difficult to apply to multicomponent, amorphous, statistical copolymer materials for which little data exist. Here, leveraging a data set with 101 examples, we introduce a closed-loop experimental, machine learning (ML), and virtual screening strategy to enable predictions of the glass transition temperature (Tg) of polydicyclopentadiene (pDCPD) thermosets containing cleavable bifunctional silyl ether (BSE) comonomers and/or cross-linkers with varied compositions and loadings. Molecular features and formulation variables are used as model inputs, and uncertainty is quantified through model ensembling, which together with heavy regularization helps to avoid overfitting and ultimately achieves predictions within <15 °C for thermosets with compositionally diverse BSEs. This work offers a path to predicting the properties of thermosets based on their molecular building blocks, which may accelerate the discovery of promising plastics, rubbers, and composites with improved functionality and controlled deconstructability.
Affiliation(s)
- Yasmeen S. AlFaraj
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Somesh Mohapatra
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Peyton Shieh
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Keith E. L. Husted
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Douglass G. Ivanoff
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Evan M. Lloyd
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Julian C. Cooper
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Yutong Dai
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Avni P. Singhal
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Jeffrey S. Moore
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Nancy R. Sottos
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
  - The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Rafael Gomez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
- Jeremiah A. Johnson
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
10
Himanshu, Chakraborty K, Patra TK. Developing efficient deep learning model for predicting copolymer properties. Phys Chem Chem Phys 2023; 25:25166-25176. [PMID: 37712405 DOI: 10.1039/d3cp03100d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2023]
Abstract
Deep learning models are gaining popularity and potency in predicting polymer properties. These models can be built using pre-existing data and are useful for the rapid prediction of polymer properties. However, the performance of a deep learning model is intricately connected to its topology and the volume of training data. There is no facile protocol available to select a deep learning architecture, and there is a lack of a large volume of homogeneous sequence-property data of polymers. These two factors are the primary bottleneck for the efficient development of deep learning models for polymers. Here we assess the severity of these factors and propose strategies to address them. We show that a linear layer-by-layer expansion of a neural network can help in identifying the best neural network topology for a given problem. Moreover, we map the discrete sequence space of a polymer to a continuous one-dimensional latent space using a feature extraction technique to identify minimal data points for training a deep learning model. We implement these approaches for two representative cases of building sequence-property surrogate models, viz., the single-molecule radius of gyration of a copolymer and copolymer compatibilizer. This work demonstrates efficient methods for building deep learning models with minimal data and hyperparameters for predicting sequence-defined properties of polymers.
Affiliation(s)
- Himanshu
  - Department of Chemical Engineering and Center for Atomistic Modeling and Materials Design, Indian Institute of Technology Madras, Chennai, TN 600036, India
- Kaushik Chakraborty
  - Department of Chemical Engineering and Center for Atomistic Modeling and Materials Design, Indian Institute of Technology Madras, Chennai, TN 600036, India
- Tarak K Patra
  - Department of Chemical Engineering and Center for Atomistic Modeling and Materials Design, Indian Institute of Technology Madras, Chennai, TN 600036, India
11
Kuenneth C, Ramprasad R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat Commun 2023; 14:4099. [PMID: 37433807 DOI: 10.1038/s41467-023-39868-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/24/2022] [Accepted: 06/28/2023] [Indexed: 07/13/2023]
Abstract
Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.
Affiliation(s)
- Christopher Kuenneth
  - School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
  - Faculty of Engineering Science, University of Bayreuth, 95447, Bayreuth, Germany
- Rampi Ramprasad
  - School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
12
Park NH, Manica M, Born J, Hedrick JL, Erdmann T, Zubarev DY, Adell-Mill N, Arrechea PL. Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language. Nat Commun 2023; 14:3686. [PMID: 37344485 PMCID: PMC10284867 DOI: 10.1038/s41467-023-39396-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/21/2022] [Accepted: 06/12/2023] [Indexed: 06/23/2023]
Abstract
Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity, making it difficult to accommodate the broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. Here we show that a domain-specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization, although we provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, we show how the CMDL-tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.
Affiliation(s)
- Matteo Manica
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Jannis Born
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
- James L Hedrick
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Tim Erdmann
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Nil Adell-Mill
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Arctoris, 120E Olympic Avenue, Abingdon, OX14 4SA, Oxfordshire, UK
|
13
|
Martin TB, Audus DJ. Emerging Trends in Machine Learning: A Polymer Perspective. ACS Polymers Au 2023; 3:239-258. [PMID: 37334191 PMCID: PMC10273415 DOI: 10.1021/acspolymersau.2c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/20/2022] [Accepted: 12/21/2022] [Indexed: 01/19/2023]
Abstract
In the last five years, there has been tremendous growth in machine learning and artificial intelligence as applied to polymer science. Here, we highlight the unique challenges presented by polymers and how the field is addressing them. We focus on emerging trends with an emphasis on topics that have received less attention in the review literature. Finally, we provide an outlook for the field, outline important growth areas in machine learning and artificial intelligence for polymer science and discuss important advances from the greater material science community.
Affiliation(s)
- Tyler B. Martin
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Debra J. Audus
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
|
14
|
Yu M, Shi Y, Jia Q, Wang Q, Luo ZH, Yan F, Zhou YN. Ring Repeating Unit: An Upgraded Structure Representation of Linear Condensation Polymers for Property Prediction. J Chem Inf Model 2023; 63:1177-1187. [PMID: 36651860 DOI: 10.1021/acs.jcim.2c01389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Unique structure representation of polymers plays a crucial role in developing models for polymer property prediction and polymer design by data-centric approaches. Currently, monomer and repeating unit (RU) approximations are widely used to represent polymer structures for generating feature descriptors in the modeling of quantitative structure-property relationships (QSPR). However, such conventional structure representations may not uniquely approximate heterochain polymers due to the diversity of monomer combinations and the potential multi-RUs. In this study, the so-called ring repeating unit (RRU) method that can uniquely represent polymers with a broad range of structure diversity is proposed for the first time. As a proof of concept, an RRU-based QSPR model was developed to predict the associated glass transition temperature (Tg) of polyimides (PIs) with deterministic values. Comprehensive model validations including external, internal, and Y-random validations were performed. Also, an RU-based QSPR model developed based on the same large database of 1321 PIs provides nonunique prediction results, which further proves the necessity of RRU-based structure representation. Promising results obtained by the application of the RRU-based model confirm that the as-developed RRU method provides an effective representation that accurately captures the sequence of repeat units and thus realizes reliable polymer property prediction by data-driven approaches.
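The non-uniqueness problem mentioned in this abstract can be illustrated with a small sketch. It assumes a repeating unit simplified to a list of fragment labels (the labels below are hypothetical) and picks the lexicographically smallest cyclic rotation as a canonical form. This only shows why different cuts of the same chain need to be reconciled; it is not the RRU method described by the authors.

```python
def canonical_rotation(units):
    """Return the lexicographically smallest cyclic rotation of a sequence.

    Cutting a periodic chain at different bonds yields different rotations
    of the same repeating unit; choosing one rotation by a fixed rule maps
    all of them to a single key.
    """
    units = list(units)
    if not units:
        return ()
    rotations = (tuple(units[i:] + units[:i]) for i in range(len(units)))
    return min(rotations)


# Two hypothetical ways of cutting the same chain ...-imide-phenyl-ether-...
ru_1 = ["imide", "phenyl", "ether"]
ru_2 = ["phenyl", "ether", "imide"]

# Both cuts reduce to the same canonical key.
same = canonical_rotation(ru_1) == canonical_rotation(ru_2)
print(canonical_rotation(ru_1), same)
```

The sketch ignores chain direction (a reversed reading of the same unit) and real chemical structure; both would need handling in any practical representation.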
Affiliation(s)
- Mengxian Yu
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Yajuan Shi
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Qingzhu Jia
- School of Marine and Environmental Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Qiang Wang
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Zheng-Hong Luo
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Fangyou Yan
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Yin-Ning Zhou
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
|
15
|
Aldeghi M, Coley CW. A graph representation of molecular ensembles for polymer property prediction. Chem Sci 2022; 13:10486-10498. [PMID: 36277616 PMCID: PMC9473492 DOI: 10.1039/d2sc02839e] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/20/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open
Abstract
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles.
Affiliation(s)
- Matteo Aldeghi
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Cambridge MA 02139 USA
|
16
|
Tao L, Byrnes J, Varshney V, Li Y. Machine learning strategies for the structure-property relationship of copolymers. iScience 2022; 25:104585. [PMID: 35789847 PMCID: PMC9249671 DOI: 10.1016/j.isci.2022.104585] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Revised: 05/26/2022] [Accepted: 06/07/2022] [Indexed: 11/15/2022] Open
Abstract
Establishing the structure-property relationship is extremely valuable for the molecular design of copolymers. However, machine learning (ML) models that can incorporate both chemical composition and sequence distribution of monomers, and that have the generalization ability to process various copolymer types (e.g., alternating, random, block, and gradient copolymers) with a unified approach, are missing. To address this challenge, we formulate four different ML models for investigation, including a feedforward neural network (FFNN) model, a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, and a combined FFNN/RNN (Fusion) model. We use various copolymer types to systematically validate the performance and generalizability of different models. We find that the RNN architecture that processes the monomer sequence information both forward and backward is a more suitable ML model for copolymers with better generalizability. As a supplement to polymer informatics, our proposed approach provides an efficient way for the evaluation of copolymers.
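The distinction this abstract draws between chemical composition and sequence distribution can be made concrete with a short sketch. It assumes copolymer chains written as strings of hypothetical monomer labels "A" and "B", and it computes simple descriptors only; it is not one of the FFNN, CNN, RNN, or Fusion models from the paper.

```python
from collections import Counter


def composition(seq):
    """Fraction of each monomer label in the chain (ignores ordering)."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    return {monomer: n / len(seq) for monomer, n in counts.items()}


def dyad_fractions(seq):
    """Fraction of each adjacent monomer pair (sensitive to ordering)."""
    if len(seq) < 2:
        raise ValueError("need at least two monomers to form a dyad")
    pairs = Counter(a + b for a, b in zip(seq, seq[1:]))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()}


# Hypothetical chains with identical composition but different sequences.
alternating = "ABABABAB"
block = "AAAABBBB"

print(composition(alternating), composition(block))
print(dyad_fractions(alternating))
print(dyad_fractions(block))
```

Both chains are 50% A and 50% B, so a composition-only descriptor cannot tell them apart, while the dyad fractions differ. This is the kind of information a sequence-aware model would need to capture.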
Affiliation(s)
- Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Vikas Varshney
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, USA
- Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
|
17
|
Kumar R. Materiomically Designed Polymeric Vehicles for Nucleic Acids: Quo Vadis? ACS Applied Bio Materials 2022; 5:2507-2535. [PMID: 35642794 DOI: 10.1021/acsabm.2c00346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Despite rapid advances in molecular biology, particularly in site-specific genome editing technologies, such as CRISPR/Cas9 and base editing, financial and logistical challenges hinder a broad population from accessing and benefiting from gene therapy. To improve the affordability and scalability of gene therapy, we need to deploy chemically defined, economical, and scalable materials, such as synthetic polymers. For polymers to deliver nucleic acids efficaciously to targeted cells, they must optimally combine design attributes, such as architecture, length, composition, spatial distribution of monomers, basicity, hydrophilic-hydrophobic phase balance, or protonation degree. Designing polymeric vectors for specific nucleic acid payloads is a multivariate optimization problem wherein even minuscule deviations from the optimum are poorly tolerated. To explore the multivariate polymer design space rapidly, efficiently, and fruitfully, we must integrate parallelized polymer synthesis, high-throughput biological screening, and statistical modeling. Although materiomics approaches promise to streamline polymeric vector development, several methodological ambiguities must be resolved. For instance, establishing a flexible polymer ontology that accommodates recent synthetic advances, enforcing uniform polymer characterization and data reporting standards, and implementing multiplexed in vitro and in vivo screening studies require considerable planning, coordination, and effort. This contribution will acquaint readers with the challenges associated with materiomics approaches to polymeric gene delivery and offers guidelines for overcoming these challenges. 
Here, we summarize recent developments in combinatorial polymer synthesis, high-throughput screening of polymeric vectors, omics-based approaches to polymer design, barcoding schemes for pooled in vitro and in vivo screening, and identify materiomics-inspired research directions that will realize the long-unfulfilled clinical potential of polymeric carriers in gene therapy.
Affiliation(s)
- Ramya Kumar
- Department of Chemical & Biological Engineering, Colorado School of Mines, 1613 Illinois St, Golden, Colorado 80401, United States
|
18
|
Quach CD, Gilmer JB, Pert D, Mason-Hogans A, Iacovella CR, Cummings PT, McCabe C. High-throughput screening of tribological properties of monolayer films using molecular dynamics and machine learning. J Chem Phys 2022; 156:154902. [PMID: 35459321 DOI: 10.1063/5.0080838] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022] Open
Abstract
Monolayer films have shown promise as a lubricating layer to reduce friction and wear of mechanical devices with separations on the nanoscale. These films have a vast design space with many tunable properties that can affect their tribological effectiveness. For example, terminal group chemistry, film composition, and backbone chemistry can all lead to films with significantly different tribological properties. This design space, however, is very difficult to explore without a combinatorial approach and an automatable, reproducible, and extensible workflow to screen for promising candidate films. Using the Molecular Simulation Design Framework (MoSDeF), a combinatorial screening study was performed to explore 9747 unique monolayer films (116 964 total simulations), and a machine learning (ML) model using a random forest regressor, an ensemble learning technique, was used to explore the role of terminal group chemistry and its effect on tribological effectiveness. The most promising films were found to contain small terminal groups such as cyano and ethylene. The ML model was subsequently applied to screen terminal group candidates identified from the ChEMBL small molecule library. Approximately 193 131 unique film candidates were screened, with a speed-up in analysis of approximately five orders of magnitude compared to simulation alone. The ML model could thus be used as a predictive tool to greatly speed up the initial screening of promising candidate films for future simulation studies, suggesting that computational screening in combination with ML can greatly increase the throughput in combinatorial approaches to generate in silico data and then train ML models in a controlled, self-consistent fashion.
Affiliation(s)
- Co D Quach
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
- Justin B Gilmer
- Interdisciplinary Materials Science, Vanderbilt University, Nashville, Tennessee 37235, USA
- Daniel Pert
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
- Akanke Mason-Hogans
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
- Christopher R Iacovella
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
- Peter T Cummings
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
- Clare McCabe
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
|