1
Bhowmik D, Zhang P, Fox Z, Irle S, Gounley J. Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms. PATTERNS (NEW YORK, N.Y.) 2024; 5:100947. [PMID: 38645768 PMCID: PMC11026973 DOI: 10.1016/j.patter.2024.100947] [Received: 09/26/2023] [Revised: 11/14/2023] [Accepted: 02/08/2024] [Indexed: 04/23/2024]
Abstract
This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.
Affiliation(s)
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Pei Zhang
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Zachary Fox
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Stephan Irle
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- John Gounley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
2
Zhang P, Kearney L, Bhowmik D, Fox Z, Naskar AK, Gounley J. Transferring a Molecular Foundation Model for Polymer Property Predictions. J Chem Inf Model 2023; 63:7689-7698. [PMID: 38055952 DOI: 10.1021/acs.jcim.3c01650] [Indexed: 12/08/2023]
Abstract
Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and material discovery. Self-supervised pretraining of transformer models requires large-scale data sets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers conduct data augmentation to generate additional samples but unavoidably incur extra computational costs. In contrast, large-scale open-source data sets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that using transformers pretrained on small molecules and fine-tuned on polymer properties achieves comparable accuracy to those trained on augmented polymer data sets for a series of benchmark prediction tasks.
Affiliation(s)
- Pei Zhang
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Logan Kearney
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Zachary Fox
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Amit K Naskar
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- John Gounley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
3
Day EC, Chittari SS, Bogen MP, Knight AS. Navigating the Expansive Landscapes of Soft Materials: A User Guide for High-Throughput Workflows. ACS POLYMERS AU 2023; 3:406-427. [PMID: 38107416 PMCID: PMC10722570 DOI: 10.1021/acspolymersau.3c00025] [Received: 09/15/2023] [Revised: 11/02/2023] [Accepted: 11/07/2023] [Indexed: 12/19/2023]
Abstract
Synthetic polymers are highly customizable with tailored structures and functionality, yet this versatility generates challenges in the design of advanced materials due to the size and complexity of the design space. Thus, exploration and optimization of polymer properties using combinatorial libraries has become increasingly common, which requires careful selection of synthetic strategies, characterization techniques, and rapid processing workflows to obtain fundamental principles from these large data sets. Herein, we provide guidelines for strategic design of macromolecule libraries and workflows to efficiently navigate these high-dimensional design spaces. We describe synthetic methods for multiple library sizes and structures as well as characterization methods to rapidly generate data sets, including tools that can be adapted from biological workflows. We further highlight relevant insights from statistics and machine learning to aid in data featurization, representation, and analysis. This Perspective acts as a "user guide" for researchers interested in leveraging high-throughput screening toward the design of multifunctional polymers and predictive modeling of structure-property relationships in soft materials.
Affiliation(s)
- Matthew P. Bogen
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Abigail S. Knight
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
4
Zheng X, Zhang X, Chen TT, Watanabe I. Deep Learning in Mechanical Metamaterials: From Prediction and Generation to Inverse Design. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2023; 35:e2302530. [PMID: 37332101 DOI: 10.1002/adma.202302530] [Received: 03/19/2023] [Revised: 05/27/2023] [Indexed: 06/20/2023]
Abstract
Mechanical metamaterials are meticulously designed structures with exceptional mechanical properties determined by their microstructures and constituent materials. Tailoring their material and geometric distribution unlocks the potential to achieve unprecedented bulk properties and functions. However, current mechanical metamaterial design considerably relies on experienced designers' inspiration through trial and error, while investigating their mechanical properties and responses entails time-consuming mechanical testing or computationally expensive simulations. Nevertheless, recent advancements in deep learning have revolutionized the design process of mechanical metamaterials, enabling property prediction and geometry generation without prior knowledge. Furthermore, deep generative models can transform conventional forward design into inverse design. Many recent studies on the implementation of deep learning in mechanical metamaterials are highly specialized, and their pros and cons may not be immediately evident. This critical review provides a comprehensive overview of the capabilities of deep learning in property prediction, geometry generation, and inverse design of mechanical metamaterials. Additionally, this review highlights the potential of leveraging deep learning to create universally applicable datasets, intelligently designed metamaterials, and material intelligence. This article is expected to be valuable not only to researchers working on mechanical metamaterials but also those in the field of materials informatics.
Affiliation(s)
- Xiaoyang Zheng
  - Center for Basic Research on Materials, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, 305-0047, Japan
  - Graduate School of Pure and Applied Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8573, Japan
- Xubo Zhang
  - Graduate School of Pure and Applied Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8573, Japan
- Ta-Te Chen
  - Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan
  - National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, 305-0047, Japan
- Ikumu Watanabe
  - Center for Basic Research on Materials, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, 305-0047, Japan
  - Graduate School of Pure and Applied Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8573, Japan
5
Zhao Y, Chen Z, Dong Y. Compliance Prediction for Structural Topology Optimization on the Basis of Moment Invariants and a Generalized Regression Neural Network. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1396. [PMID: 37895517 PMCID: PMC10606044 DOI: 10.3390/e25101396] [Received: 08/28/2023] [Revised: 09/18/2023] [Accepted: 09/25/2023] [Indexed: 10/29/2023]
Abstract
Topology optimization techniques are essential for manufacturing industries, such as designing fiber-reinforced polymer composites (FRPCs) and structures with outstanding strength-to-weight ratios and light weights. In the SIMP approach, artificial intelligence algorithms are commonly utilized to enhance traditional FEM-based compliance minimization procedures. Based on an effective generalized regression neural network (GRNN), a new deep learning algorithm of compliance prediction for structural topology optimization is proposed. The algorithm learns the structural information using a fourth-order moment invariant analysis of the structural topology obtained from FEA at different iterations of classical topology optimization. A cantilever and a simply supported beam problem are used as ground-truth datasets, and the moment invariants are used as independent variables for input features. By comparing it with the well-known convolutional neural network (CNN) and deep neural network (DNN) models, the proposed GRNN model achieves a high prediction accuracy (R² > 0.97) and drastically shortens the training and prediction cost. Furthermore, the GRNN algorithm exhibits excellent generalization ability on the prediction performance of the optimized topology with rotations and varied material volume fractions. This algorithm is promising for the replacement of the FEA calculation in the SIMP method, and can be applied to real-time optimization for advanced FRPC structure design.
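As a rough illustration of the GRNN idea named in this abstract, the sketch below shows the textbook form of a generalized regression neural network: a Gaussian-kernel weighted average of training targets. It is not the authors' implementation. The function name `grnn_predict`, the smoothing parameter `sigma`, and the toy feature values are assumptions made for this example; in the paper the inputs would be moment-invariant features and the targets compliance values.

```python
import math

def grnn_predict(train_x, train_y, query, sigma=0.5):
    """Predict a target as a Gaussian-kernel weighted average of training targets.

    train_x: list of feature vectors (stand-ins for moment-invariant features)
    train_y: list of scalar targets (stand-ins for compliance values)
    query:   one feature vector
    sigma:   kernel width (hypothetical default, would need tuning)
    """
    # Squared Euclidean distance from the query to every training sample.
    sq_dists = [sum((q - x) ** 2 for q, x in zip(query, xi)) for xi in train_x]
    # Shift by the smallest distance so the largest weight is exactly 1;
    # the shift cancels in the ratio and avoids underflow to 0/0.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Toy usage with made-up one-dimensional features.
features = [[0.0], [1.0]]
targets = [0.0, 1.0]
midpoint_estimate = grnn_predict(features, targets, [0.5])  # equidistant, so 0.5
```

Because a GRNN of this form stores the training samples and has no iterative weight fitting, its "training" cost is essentially the cost of choosing `sigma`, which is consistent with the low training cost the abstract reports.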
Affiliation(s)
- Yunmei Zhao
  - School of Aerospace Engineering and Applied Mechanics, Tongji University, Shanghai 200092, China
- Zhenyue Chen
  - School of Aerospace Engineering and Applied Mechanics, Tongji University, Shanghai 200092, China
- Yiqun Dong
  - Department of Aeronautics and Astronautics, Fudan University, Shanghai 200433, China
6
Himanshu, Chakraborty K, Patra TK. Developing efficient deep learning model for predicting copolymer properties. Phys Chem Chem Phys 2023; 25:25166-25176. [PMID: 37712405 DOI: 10.1039/d3cp03100d] [Indexed: 09/16/2023]
Abstract
Deep learning models are gaining popularity and potency in predicting polymer properties. These models can be built using pre-existing data and are useful for the rapid prediction of polymer properties. However, the performance of a deep learning model is intricately connected to its topology and the volume of training data. There is no facile protocol available to select a deep learning architecture, and there is a lack of a large volume of homogeneous sequence-property data of polymers. These two factors are the primary bottleneck for the efficient development of deep learning models for polymers. Here we assess the severity of these factors and propose strategies to address them. We show that a linear layer-by-layer expansion of a neural network can help in identifying the best neural network topology for a given problem. Moreover, we map the discrete sequence space of a polymer to a continuous one-dimensional latent space using a feature extraction technique to identify minimal data points for training a deep learning model. We implement these approaches for two representative cases of building sequence-property surrogate models, viz., the single-molecule radius of gyration of a copolymer and copolymer compatibilizer. This work demonstrates efficient methods for building deep learning models with minimal data and hyperparameters for predicting sequence-defined properties of polymers.
Affiliation(s)
- Himanshu
  - Department of Chemical Engineering and Center for Atomistic Modeling and Materials Design, Indian Institute of Technology Madras, Chennai, TN 600036, India
- Kaushik Chakraborty
  - Department of Chemical Engineering and Center for Atomistic Modeling and Materials Design, Indian Institute of Technology Madras, Chennai, TN 600036, India
- Tarak K Patra
  - Department of Chemical Engineering and Center for Atomistic Modeling and Materials Design, Indian Institute of Technology Madras, Chennai, TN 600036, India
7
Ohno M, Hayashi Y, Zhang Q, Kaneko Y, Yoshida R. SMiPoly: Generation of a Synthesizable Polymer Virtual Library Using Rule-Based Polymerization Reactions. J Chem Inf Model 2023; 63:5539-5548. [PMID: 37604495 PMCID: PMC10498440 DOI: 10.1021/acs.jcim.3c00329] [Received: 03/02/2023] [Indexed: 08/23/2023]
Abstract
Recent advances in machine learning have led to the rapid adoption of various computational methods for de novo molecular design in polymer research, including high-throughput virtual screening and inverse molecular design. In such workflows, molecular generators play an essential role in creation or sequential modification of candidate polymer structures. Machine learning-assisted molecular design has made great technical progress over the past few years. However, the difficulty of identifying synthetic routes to such designed polymers remains unresolved. To address this technical limitation, we present Small Molecules into Polymers (SMiPoly), a Python library for virtual polymer generation that implements 22 chemical rules for commonly applied polymerization reactions. For given small organic molecules to form a candidate monomer set, the SMiPoly generator conducts possible polymerization reactions to generate an exhaustive list of potentially synthesizable polymers. In this study, using 1083 readily available monomers, we generated 169,347 unique polymers forming seven different molecular types: polyolefin, polyester, polyether, polyamide, polyimide, polyurethane, and polyoxazolidone. By comparing the distribution of the virtually created polymers with approximately 16,000 real polymers synthesized so far, it was found that the coverage and novelty of the SMiPoly-generated polymers can reach 48 and 53%, respectively. Incorporating the SMiPoly library into a molecular design workflow will accelerate the process of de novo polymer synthesis by shortening the step to select synthesizable candidate polymers.
Affiliation(s)
- Mitsuru Ohno
  - Daicel Corporation, Kita-ku, 530-0011 Osaka, Japan
- Yoshihiro Hayashi
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
  - The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan
- Qi Zhang
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
- Yu Kaneko
  - Daicel Corporation, Kita-ku, 530-0011 Osaka, Japan
- Ryo Yoshida
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
  - The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan
  - National Institute for Materials Science, 305-0047 Ibaraki, Japan
8
McDonald SM, Augustine EK, Lanners Q, Rudin C, Brinson LC, Becker ML. Applied machine learning as a driver for polymeric biomaterials design. Nat Commun 2023; 14:4838. [PMID: 37563117 PMCID: PMC10415291 DOI: 10.1038/s41467-023-40459-8] [Received: 02/08/2023] [Accepted: 07/24/2023] [Indexed: 08/12/2023] [Open access]
Abstract
Polymers are ubiquitous to almost every aspect of modern society and their use in medical products is similarly pervasive. Despite this, the diversity in commercial polymers used in medicine is stunningly low. Considerable time and resources have been extended over the years towards the development of new polymeric biomaterials which address unmet needs left by the current generation of medical-grade polymers. Machine learning (ML) presents an unprecedented opportunity in this field to bypass the need for trial-and-error synthesis, thus reducing the time and resources invested into new discoveries critical for advancing medical treatments. Current efforts pioneering applied ML in polymer design have employed combinatorial and high throughput experimental design to address data availability concerns. However, the lack of available and standardized characterization of parameters relevant to medicine, including degradation time and biocompatibility, represents a nearly insurmountable obstacle to ML-aided design of biomaterials. Herein, we identify a gap at the intersection of applied ML and biomedical polymer design, highlight current works at this junction more broadly and provide an outlook on challenges and future directions.
Affiliation(s)
- Emily K Augustine
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
- Quinn Lanners
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Cynthia Rudin
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- L Catherine Brinson
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
- Matthew L Becker
  - Department of Chemistry, Duke University, Durham, NC, USA
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
9
Kim S, Schroeder CM, Jackson NE. Open Macromolecular Genome: Generative Design of Synthetically Accessible Polymers. ACS POLYMERS AU 2023; 3:318-330. [PMID: 37576712 PMCID: PMC10416319 DOI: 10.1021/acspolymersau.3c00003] [Received: 01/25/2023] [Revised: 03/13/2023] [Accepted: 03/14/2023] [Indexed: 03/31/2023]
Abstract
A grand challenge in polymer science lies in the predictive design of new polymeric materials with targeted functionality. However, de novo design of functional polymers is challenging due to the vast chemical space and an incomplete understanding of structure-property relations. Recent advances in deep generative modeling have facilitated the efficient exploration of molecular design space, but data sparsity in polymer science is a major obstacle hindering progress. In this work, we introduce a vast polymer database known as the Open Macromolecular Genome (OMG), which contains synthesizable polymer chemistries compatible with known polymerization reactions and commercially available reactants selected for synthetic feasibility. The OMG is used in concert with a synthetically aware generative model known as Molecule Chef to identify property-optimized constitutional repeating units, constituent reactants, and reaction pathways of polymers, thereby advancing polymer design into the realm of synthetic relevance. As a proof-of-principle demonstration, we show that polymers with targeted octanol-water solubilities are readily generated together with monomer reactant building blocks and associated polymerization reactions. Suggested reactants are further integrated with Reaxys polymerization data to provide hypothetical reaction conditions (e.g., temperature, catalysts, and solvents). Broadly, the OMG is a polymer design approach capable of enabling data-intensive generative models for synthetic polymer design. Overall, this work represents a significant advance, enabling the property targeted design of synthetic polymers subject to practical synthetic constraints.
Affiliation(s)
- Seonghwan Kim
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Charles M. Schroeder
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Nicholas E. Jackson
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
10
Shi J, Albreiki F, Colón YJ, Srivastava S, Whitmer JK. Transfer Learning Facilitates the Prediction of Polymer-Surface Adhesion Strength. J Chem Theory Comput 2023; 19:4631-4640. [PMID: 37068204 DOI: 10.1021/acs.jctc.2c01314] [Indexed: 04/19/2023]
Abstract
Machine learning (ML) accelerates the exploration of material properties and their links to the structure of the underlying molecules. In previous work [Shi et al. ACS Applied Materials & Interfaces 2022, 14, 37161-37169.], ML models were applied to predict the adhesive free energy of polymer-surface interactions with high accuracy from the knowledge of the sequence data, demonstrating successes in inverse-design of polymer sequence for known surface compositions. While the method was shown to be successful in designing polymers for a known surface, extensive data sets were needed for each specific surface in order to train the surrogate models. Ideally, one should be able to infer information about similar surfaces without having to regenerate a full complement of adhesion data for each new case. In the current work, we demonstrate a transfer learning (TL) technique using a deep neural network to improve the accuracy of ML models trained on small data sets by pretraining on a larger database from a related system and fine-tuning the weights of all layers with a small amount of additional data. The shared knowledge from the pretrained model facilitates the prediction accuracy significantly on small data sets. We also explore the limits of database size on accuracy and the optimal tuning of network architecture and parameters for our learning tasks. While applied to a relatively simple coarse-grained (CG) polymer model, the general lessons of this study apply to detailed modeling studies and the broader problems of inverse materials design.
Affiliation(s)
- Jiale Shi
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Fahed Albreiki
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
- Yamil J Colón
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Samanvaya Srivastava
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
  - California NanoSystems Institute, Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
  - Institute for Carbon Management, University of California, Los Angeles, Los Angeles, California 90095, United States
  - Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jonathan K Whitmer
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
11
Park NH, Manica M, Born J, Hedrick JL, Erdmann T, Zubarev DY, Adell-Mill N, Arrechea PL. Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language. Nat Commun 2023; 14:3686. [PMID: 37344485 PMCID: PMC10284867 DOI: 10.1038/s41467-023-39396-3] [Received: 07/21/2022] [Accepted: 06/12/2023] [Indexed: 06/23/2023] [Open access]
Abstract
Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity making it difficult to accommodate a broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. Here we show that a domain specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization, although we provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, we show how the CMDL tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.
Affiliation(s)
- Matteo Manica
  - IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Jannis Born
  - IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
- James L Hedrick
  - IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Tim Erdmann
  - IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Nil Adell-Mill
  - IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
  - Arctoris, 120E Olympic Avenue, Abingdon, OX14 4SA, Oxfordshire, UK
12
Lee S, Nam D, Yang DC, Choe W. Unveiling Hidden Zeolitic Imidazolate Frameworks Guided by Intuition-Based Geometrical Factors. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2023; 19:e2300036. [PMID: 36759958 DOI: 10.1002/smll.202300036] [Received: 01/03/2023] [Revised: 01/20/2023] [Indexed: 06/18/2023]
Abstract
Herein, synthesizable candidate topologies to form zeolitic imidazolate frameworks (ZIFs) are efficiently identified from over 2 000 000 hypothetical structures in zeolite databases, using structural descriptors extracted from known ZIFs. A combination of intuition-based structural descriptors, such as ring patterns, node numbers, and TOT bridging angles (T = tetrahedral metal nodes in zeolites and ZIFs), is used as data filters to eliminate topologies infeasible for ZIF formation. Carefully chosen structural descriptors facilitate the prediction of plausible ZIF topologies. To investigate potential applications as porous ZIFs, this work performs hydrogen adsorption screening and suggests notable target ZIFs. The collection of new plausible ZIFs, derived from the combined descriptors, will be a structural blueprint for synthetic chemists.
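The descriptor-based screening described in this abstract can be pictured as a chain of simple filters over a table of candidate topologies. The sketch below is only an illustration of that pattern, not the authors' workflow: the `Topology` record, the `filter_topologies` function, and every threshold and candidate value are hypothetical placeholders, since the abstract does not state the actual cutoffs.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class Topology:
    name: str
    node_count: int              # number of distinct tetrahedral (T) nodes
    ring_sizes: FrozenSet[int]   # ring sizes present in the net
    tot_angle_deg: float         # mean T-O-T bridging angle, in degrees

def filter_topologies(
    candidates: List[Topology],
    max_nodes: int,
    allowed_rings: FrozenSet[int],
    angle_range: Tuple[float, float],
) -> List[Topology]:
    """Keep only candidates that pass every descriptor filter."""
    low, high = angle_range
    return [
        t for t in candidates
        if t.node_count <= max_nodes          # node-number filter
        and t.ring_sizes <= allowed_rings     # ring-pattern filter (subset test)
        and low <= t.tot_angle_deg <= high    # bridging-angle filter
    ]

# Hypothetical candidates and thresholds, for illustration only.
candidates = [
    Topology("net-A", 1, frozenset({4, 6}), 145.0),
    Topology("net-B", 5, frozenset({4, 6}), 146.0),   # too many nodes
    Topology("net-C", 2, frozenset({3, 6}), 144.0),   # disallowed 3-ring
    Topology("net-D", 2, frozenset({4, 8}), 120.0),   # angle out of range
]
kept = filter_topologies(candidates, 3, frozenset({4, 6, 8}), (135.0, 155.0))
kept_names = [t.name for t in kept]
```

Applying cheap geometric filters before any adsorption calculation is what makes screening millions of hypothetical structures tractable; the expensive step only runs on the survivors.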
Affiliation(s)
- Soochan Lee
  - Department of Chemistry, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan, 44919, Republic of Korea
- Dongsik Nam
  - Department of Chemistry, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan, 44919, Republic of Korea
- David ChangMo Yang
  - Department of Chemistry, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan, 44919, Republic of Korea
- Wonyoung Choe
  - Department of Chemistry, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan, 44919, Republic of Korea
  - Graduate School of Carbon Neutrality, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan, 44919, Republic of Korea
13
Ricci E, Vergadou N. Integrating Machine Learning in the Coarse-Grained Molecular Simulation of Polymers. J Phys Chem B 2023; 127:2302-2322. [PMID: 36888553 DOI: 10.1021/acs.jpcb.2c06354] [Indexed: 03/09/2023]
Abstract
Machine learning (ML) is having an increasing impact on the physical sciences, engineering, and technology, and its integration into molecular simulation frameworks holds great potential to expand their scope of applicability to complex materials and facilitate fundamental knowledge and reliable property predictions, contributing to the development of efficient materials design routes. The application of ML in materials informatics in general, and polymer informatics in particular, has led to interesting results; however, great untapped potential lies in the integration of ML techniques into the multiscale molecular simulation methods for the study of macromolecular systems, specifically in the context of Coarse Grained (CG) simulations. In this Perspective, we aim at presenting the pioneering recent research efforts in this direction and discussing how these new ML-based techniques can contribute to critical aspects of the development of multiscale molecular simulation methods for bulk complex chemical systems, especially polymers. Prerequisites for the implementation of such ML-integrated methods and open challenges that need to be met toward the development of general systematic ML-based coarse graining schemes for polymers are discussed.
Affiliation(s)
- Eleonora Ricci
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
- Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
- Niki Vergadou
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
|
14
|
Dong Q, Gong X, Yuan K, Jiang Y, Zhang L, Li W. Inverse Design of Complex Block Copolymers for Exotic Self-Assembled Structures Based on Bayesian Optimization. ACS Macro Lett 2023; 12:401-407. [PMID: 36888723 DOI: 10.1021/acsmacrolett.3c00020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2023]
Abstract
Variable chain topologies of multiblock copolymers provide great opportunities for the formation of numerous self-assembled nanostructures with promising potential applications. However, the consequent large parameter space poses new challenges for searching the stable parameter region of desired novel structures. In this Letter, by combining Bayesian optimization (BO), fast Fourier transform-assisted 3D convolutional neural network (FFT-3DCNN), and self-consistent field theory (SCFT), we develop a data-driven and fully automated inverse design framework to search for the desired novel structures self-assembled by ABC-type multiblock copolymers. Stable phase regions of three exotic target structures are efficiently identified in high-dimensional parameter space. Our work advances the new research paradigm of inverse design in the field of block copolymers.
Affiliation(s)
- Qingshu Dong
- State Key Laboratory of Molecular Engineering of Polymers, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
- Xiangrui Gong
- School of Chemistry, Center of Soft Matter Physics and its Applications, Beihang University, Beijing 100191, China
- Kangrui Yuan
- State Key Laboratory of Molecular Engineering of Polymers, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
- Ying Jiang
- School of Chemistry, Center of Soft Matter Physics and its Applications, Beihang University, Beijing 100191, China
- Liangshun Zhang
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
- State Key Laboratory of Molecular Engineering of Polymers, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
|
15
|
Wu JQ, Gong XQ, Wang Q, Yan F, Li JJ. A QSPR study for predicting θ(LCST) and θ(UCST) in binary polymer solutions. Chem Eng Sci 2023. [DOI: 10.1016/j.ces.2022.118326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
16
|
Martin TB, Audus DJ. Emerging Trends in Machine Learning: A Polymer Perspective. ACS POLYMERS AU 2023. [DOI: 10.1021/acspolymersau.2c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Affiliation(s)
- Tyler B. Martin
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Debra J. Audus
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
|
17
|
Kumar R. Materiomically Designed Polymeric Vehicles for Nucleic Acids: Quo Vadis? ACS APPLIED BIO MATERIALS 2022; 5:2507-2535. [PMID: 35642794 DOI: 10.1021/acsabm.2c00346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Despite rapid advances in molecular biology, particularly in site-specific genome editing technologies, such as CRISPR/Cas9 and base editing, financial and logistical challenges hinder a broad population from accessing and benefiting from gene therapy. To improve the affordability and scalability of gene therapy, we need to deploy chemically defined, economical, and scalable materials, such as synthetic polymers. For polymers to deliver nucleic acids efficaciously to targeted cells, they must optimally combine design attributes, such as architecture, length, composition, spatial distribution of monomers, basicity, hydrophilic-hydrophobic phase balance, or protonation degree. Designing polymeric vectors for specific nucleic acid payloads is a multivariate optimization problem wherein even minuscule deviations from the optimum are poorly tolerated. To explore the multivariate polymer design space rapidly, efficiently, and fruitfully, we must integrate parallelized polymer synthesis, high-throughput biological screening, and statistical modeling. Although materiomics approaches promise to streamline polymeric vector development, several methodological ambiguities must be resolved. For instance, establishing a flexible polymer ontology that accommodates recent synthetic advances, enforcing uniform polymer characterization and data reporting standards, and implementing multiplexed in vitro and in vivo screening studies require considerable planning, coordination, and effort. This contribution will acquaint readers with the challenges associated with materiomics approaches to polymeric gene delivery and offers guidelines for overcoming these challenges. 
Here, we summarize recent developments in combinatorial polymer synthesis, high-throughput screening of polymeric vectors, omics-based approaches to polymer design, barcoding schemes for pooled in vitro and in vivo screening, and identify materiomics-inspired research directions that will realize the long-unfulfilled clinical potential of polymeric carriers in gene therapy.
Affiliation(s)
- Ramya Kumar
- Department of Chemical & Biological Engineering, Colorado School of Mines, 1613 Illinois St, Golden, Colorado 80401, United States
|
18
|
Data-driven approaches for structure-property relationships in polymer science for prediction and understanding. Polym J 2022. [DOI: 10.1038/s41428-022-00648-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
19
|
Wang J, Wang Y, Chen Y. Inverse Design of Materials by Machine Learning. MATERIALS 2022; 15:ma15051811. [PMID: 35269043 PMCID: PMC8911677 DOI: 10.3390/ma15051811] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/23/2021] [Revised: 02/13/2022] [Accepted: 02/24/2022] [Indexed: 02/04/2023]
Abstract
It is safe to say that every invention that has changed the world has depended on materials. At present, the demand for the development of materials and the invention or design of new materials is becoming more and more urgent since people's current production and lifestyle needs must be changed to help mitigate climate change. Structure-property relationships are a vital paradigm in materials science. However, these relationships are often nonlinear, and the pattern is likely to change with length scales and time scales, posing a huge challenge. With the development of physics, statistics, computer science, etc., machine learning offers the opportunity to systematically find new materials. Especially by inverse design based on machine learning, one can make use of the existing knowledge without attempting mathematical inversion of the relevant integrated differential equation of the electronic structure but by using backpropagation to overcome local minimax traps and perform a fast calculation of the gradient information for a target function concerning the design variable to find the optimizations. The methodologies have been applied to various materials including polymers, photonics, inorganic materials, porous materials, 2-D materials, etc. Different types of design problems require different approaches, for which many algorithms and optimization approaches have been demonstrated in different scenarios. In this mini-review, we will not specifically sum up machine learning methodologies, but will provide a more material perspective and summarize some cutting-edge studies.
Affiliation(s)
- Jia Wang
- School of Space and Environment, Beihang University, Beijing 102206, China
- Yingxue Wang
- National Engineering Laboratory for Risk Perception and Prevention, Beijing 100081, China
- Correspondence:
- Yanan Chen
- School of Materials Science and Engineering, Tianjin University, Tianjin 300072, China
|
20
|
Xu P, Chen H, Li M, Lu W. New Opportunity: Machine Learning for Polymer Materials Design and Discovery. ADVANCED THEORY AND SIMULATIONS 2022. [DOI: 10.1002/adts.202100565] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Affiliation(s)
- Pengcheng Xu
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- Huimin Chen
- Department of Mathematics, College of Sciences, Shanghai University, Shanghai 200444, China
- Minjie Li
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
- Wencong Lu
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
|
21
|
Nguyen D, Tao L, Li Y. Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design. Front Chem 2022; 9:820417. [PMID: 35141207 PMCID: PMC8819075 DOI: 10.3389/fchem.2021.820417] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/24/2021] [Accepted: 12/31/2021] [Indexed: 12/21/2022]
Abstract
In recent years, the synthesis of monomer sequence-defined polymers has expanded into broad-spectrum applications in biomedical, chemical, and materials science fields. Pursuing the characterization and inverse design of these polymer systems requires our fundamental understanding not only at the individual monomer level, but also considering the chain scales, such as polymer configuration, self-assembly, and phase separation. However, our accessibility to this field is still rudimentary due to the limitations of traditional design approaches, the complexity of chemical space along with the burdened cost and time issues that prevent us from unveiling the underlying monomer sequence-structure-property relationships. Fortunately, thanks to the recent advancements in molecular dynamics simulations and machine learning (ML) algorithms, the bottlenecks in the tasks of establishing the structure-function correlation of the polymer chains can be overcome. In this review, we will discuss the applications of the integration between ML techniques and coarse-grained molecular dynamics (CGMD) simulations to solve the current issues in polymer science at the chain level. In particular, we focus on the case studies in three important topics—polymeric configuration characterization, feed-forward property prediction, and inverse design—in which CGMD simulations are leveraged to generate training datasets to develop ML-based surrogate models for specific polymer systems and designs. By doing so, this computational hybridization allows us to well establish the monomer sequence-functional behavior relationship of the polymers as well as guide us toward the best polymer chain candidates for the inverse design in undiscovered chemical space with reasonable computational cost and time. 
Even though there are still limitations and challenges ahead in this field, we finally conclude that this CGMD/ML integration is very promising, not only in the attempt of bridging the monomeric and macroscopic characterizations of polymer materials, but also enabling further tailored designs for sequence-specific polymers with superior properties in many practical applications.
Affiliation(s)
- Danh Nguyen
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Ying Li
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Polymer Program, Institute of Materials Science, University of Connecticut, Mansfield, CT, United States
- *Correspondence: Ying Li,
|
22
|
Abstract
Optimal design of polymers is a challenging task due to their enormous chemical and configurational space. Recent advances in computations, machine learning, and increasing trends in data and software availability can potentially address this problem and accelerate the molecular-scale design of polymers. Here, the central problem of polymer design is reviewed, and the general ideas of data-driven methods and their working principles in the context of polymer design are discussed. This Review provides a historical perspective and a summary of current trends and outlines future scopes of data-driven methods for polymer research. A few representative case studies on the use of such data-driven methods for discovering new polymers with exceptional properties are presented. Moreover, attempts are made to highlight how data-driven strategies aid in establishing new correlations and advancing the fundamental understanding of polymers. This Review posits that the combination of machine learning, rapid computational characterization of polymers, and availability of large open-sourced homogeneous data will transform polymer research and development over the coming decades. It is hoped that this Review will serve as a useful reference to researchers who wish to develop and deploy data-driven methods for polymer research and education.
|
23
|
Cencer MM, Moore JS, Assary RS. Machine learning for polymeric materials: an introduction. POLYM INT 2021. [DOI: 10.1002/pi.6345] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]
Affiliation(s)
- Morgan M Cencer
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Materials Science Division, Argonne National Laboratory, Lemont, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jeffrey S Moore
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Rajeev S Assary
- Materials Science Division, Argonne National Laboratory, Lemont, IL, USA
|