1
|
Rebello NJ, Arora A, Mochigase H, Lin TS, Shi J, Audus DJ, Muckley ES, Osmani A, Olsen BD. The Block Copolymer Phase Behavior Database. J Chem Inf Model 2024; 64:6464-6476. [PMID: 39126359 DOI: 10.1021/acs.jcim.4c00242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/12/2024]
Abstract
The Block Copolymer Database (BCDB) is a platform that allows users to search, submit, visualize, benchmark, and download experimental phase measurements and their associated characterization information for di- and multiblock copolymers. To the best of our knowledge, there is no widely accepted data model for publishing experimental and simulation data on block copolymer self-assembly. This proposed data schema with traceable information can accommodate any number of blocks and at the time of publication contains over 5400 block copolymer total melt phase measurements mined from the literature and manually curated and simulation data points of the phase diagram generated from self-consistent field theory that can rapidly be augmented. This database can be accessed via the Community Resource for Innovation in Polymer Technology (CRIPT) web application and the Materials Data Facility. The chemical structure of the polymer is encoded in BigSMILES, an extension of the Simplified Molecular-Input Line-Entry System (SMILES) into the macromolecular domain, and the user can search repeat units and functional groups using the SMARTS search syntax (SMILES Arbitrary Target Specification). The user can also query characterization and phase information using Structured Query Language (SQL) and download custom sets of block copolymer data to train machine learning models. Finally, a protocol is presented in which GPT-4, an AI-powered large language model, can be used to rapidly screen and identify block copolymer papers from the literature using only the abstract text and determine whether they have BCDB data, allowing the database to grow as the number of published papers on the World Wide Web increases. The F1 score for this model is 0.74. This platform is an important step in making polymer data more accessible to the broader community.
Collapse
Affiliation(s)
- Nathan J Rebello
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Akash Arora
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Hidenobu Mochigase
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Tzyy-Shyang Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Jiale Shi
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Debra J Audus
- Materials Science and Engineering Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Eric S Muckley
- Citrine Informatics, Redwood City, California 94063-2483, United States
| | - Ardiana Osmani
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Bradley D Olsen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
Collapse
|
2
|
Dong Q, Xu Z, Song Q, Qiang Y, Cao Y, Li W. Automated Search Strategy for Novel Ordered Structures of Block Copolymers. ACS Macro Lett 2024; 13:987-993. [PMID: 39042468 DOI: 10.1021/acsmacrolett.4c00384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/25/2024]
Abstract
Block copolymers with different architectures can possibly generate innumerable stable or metastable structures and thus provide an irreplaceable platform for theoretically exploring novel structures. Self-consistent field theory (SCFT) is a powerful tool to predict the ordered structures of block copolymers; however, it is sensitively dependent on its initial condition. Here we propose to use multiple symmetry-adapted basis functions to generate the initial conditions of SCFT and then apply Bayesian optimization to search for ordered structures by navigating the coefficient space of these basis functions. Without any prior knowledge, our scheme can automatically recover hundreds of ordered structures for two simple block copolymers, including most of the common structures and complex Frank-Kasper structures, together with many novel structures. By applying the automated scheme to various block copolymers, a huge number of novel structures can be obtained to expand the structural library, which may create new opportunities for the scientific community.
Collapse
Affiliation(s)
- Qingshu Dong
- State Key Laboratory of Molecular Engineering of Polymers, Research Center of AI for Polymer Science, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Zhanwen Xu
- State Key Laboratory of Molecular Engineering of Polymers, Research Center of AI for Polymer Science, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Qingliang Song
- State Key Laboratory of Molecular Engineering of Polymers, Research Center of AI for Polymer Science, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Yicheng Qiang
- State Key Laboratory of Molecular Engineering of Polymers, Research Center of AI for Polymer Science, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Yu Cao
- Shaanxi International Research Center for Soft Matter, State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
| | - Weihua Li
- State Key Laboratory of Molecular Engineering of Polymers, Research Center of AI for Polymer Science, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| |
Collapse
|
3
|
Shi J, Walsh D, Zou W, Rebello NJ, Deagen ME, Fransen KA, Gao X, Olsen BD, Audus DJ. Calculating Pairwise Similarity of Polymer Ensembles via Earth Mover's Distance. ACS POLYMERS AU 2024; 4:66-76. [PMID: 38371731 PMCID: PMC10870752 DOI: 10.1021/acspolymersau.3c00029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 02/20/2024]
Abstract
Synthetic polymers, in contrast to small molecules and deterministic biomacromolecules, are typically ensembles composed of polymer chains with varying numbers, lengths, sequences, chemistry, and topologies. While numerous approaches exist for measuring pairwise similarity among small molecules and sequence-defined biomacromolecules, accurately determining the pairwise similarity between two polymer ensembles remains challenging. This work proposes the earth mover's distance (EMD) metric to calculate the pairwise similarity score between two polymer ensembles. EMD offers a greater resolution of chemical differences between polymer ensembles than the averaging method and provides a quantitative numeric value representing the pairwise similarity between polymer ensembles in alignment with chemical intuition. The EMD approach for assessing polymer similarity enhances the development of accurate chemical search algorithms within polymer databases and can improve machine learning techniques for polymer design, optimization, and property prediction.
Collapse
Affiliation(s)
- Jiale Shi
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Dylan Walsh
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Weizhong Zou
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Nathan J. Rebello
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Michael E. Deagen
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Katharina A. Fransen
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Xian Gao
- Department
of Chemical and Biomolecular Engineering, University of Notre Dame, Notre
Dame, Indiana 46556, United States
| | - Bradley D. Olsen
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Debra J. Audus
- Materials
Science and Engineering Division, National
Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| |
Collapse
|
4
|
Rebello NJ, Lin TS, Nazeer H, Olsen BD. BigSMARTS: A Topologically Aware Query Language and Substructure Search Algorithm for Polymer Chemical Structures. J Chem Inf Model 2023; 63:6555-6568. [PMID: 37874026 DOI: 10.1021/acs.jcim.3c00978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2023]
Abstract
Molecular search is important in chemistry, biology, and informatics for identifying molecular structures within large data sets, improving knowledge discovery and innovation, and making chemical data FAIR (findable, accessible, interoperable, reusable). Search algorithms for polymers are significantly less developed than those for small molecules because polymer search relies on searching by polymer name, which can be challenging because polymer naming is overly broad (i.e., polyethylene), complicated for complex chemical structures, and often does not correspond to official IUPAC conventions. Chemical structure search in polymers is limited to substructures, such as monomers, without awareness of connectivity or topology. This work introduces a novel query language and graph traversal search algorithm for polymers that provides the first search method able to fully capture all of the chemical structures present in polymers. The BigSMARTS query language, an extension of the small-molecule SMARTS language, allows users to write queries that localize monomer and functional group searches to different parts of the polymer, like the middle block of a triblock, the side chain of a graft, and the backbone of a repeat unit. The substructure search algorithm is based on the traversal of graph representations of the generating functions for the stochastic graphs of polymers. Operationally, the algorithm first identifies cycles representing the monomers and then the end groups and finally performs a depth-first search to match entire subgraphs. To validate the algorithm, hundreds of queries were searched against hundreds of target chemistries and topologies from the literature, with approximately 440,000 query-target pairs. This tool provides a detailed algorithm that can be implemented in search engines to provide search results with full matching of the monomer connectivity and polymer topology.
Collapse
Affiliation(s)
- Nathan J Rebello
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Tzyy-Shyang Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Heeba Nazeer
- Department of Computer Science, Wellesley College, 106 Central Street, Wellesley, Massachusetts 02481, United States
| | - Bradley D Olsen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
Collapse
|
5
|
Shi J, Albreiki F, Yamil J Colón, Srivastava S, Whitmer JK. Transfer Learning Facilitates the Prediction of Polymer-Surface Adhesion Strength. J Chem Theory Comput 2023; 19:4631-4640. [PMID: 37068204 DOI: 10.1021/acs.jctc.2c01314] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/19/2023]
Abstract
Machine learning (ML) accelerates the exploration of material properties and their links to the structure of the underlying molecules. In previous work [Shi et al. ACS Applied Materials & Interfaces 2022, 14, 37161-37169.], ML models were applied to predict the adhesive free energy of polymer-surface interactions with high accuracy from the knowledge of the sequence data, demonstrating successes in inverse-design of polymer sequence for known surface compositions. While the method was shown to be successful in designing polymers for a known surface, extensive data sets were needed for each specific surface in order to train the surrogate models. Ideally, one should be able to infer information about similar surfaces without having to regenerate a full complement of adhesion data for each new case. In the current work, we demonstrate a transfer learning (TL) technique using a deep neural network to improve the accuracy of ML models trained on small data sets by pretraining on a larger database from a related system and fine-tuning the weights of all layers with a small amount of additional data. The shared knowledge from the pretrained model facilitates the prediction accuracy significantly on small data sets. We also explore the limits of database size on accuracy and the optimal tuning of network architecture and parameters for our learning tasks. While applied to a relatively simple coarse-grained (CG) polymer model, the general lessons of this study apply to detailed modeling studies and the broader problems of inverse materials design.
Collapse
Affiliation(s)
- Jiale Shi
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
| | - Fahed Albreiki
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
| | - Yamil J Colón
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
| | - Samanvaya Srivastava
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
- California NanoSystems Institute, Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
- Institute for Carbon Management, University of California, Los Angeles, Los Angeles, California 90095, United States
- Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
| | - Jonathan K Whitmer
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
| |
Collapse
|
6
|
Liu Z, Liu YX, Yang Y, Li J. Template Design for Complex Block Copolymer Patterns Using a Machine Learning Method. ACS APPLIED MATERIALS & INTERFACES 2023. [PMID: 37335810 DOI: 10.1021/acsami.3c05018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/21/2023]
Abstract
This study represents the first attempt to address the inverse design problem of the guiding template for directed self-assembly (DSA) patterns using solely machine learning methods. By formulating the problem as a multi-label classification task, the study shows that it is possible to predict templates without requiring any forward simulations. A series of neural network (NN) models, ranging from the basic two-layer convolutional neural network (CNN) to the large NN models (32-layer CNN with 8 residual blocks), have been trained using simulated pattern samples generated by thousands of self-consistent field theory (SCFT) calculations; a number of augmentation techniques, especially suitable for predicting morphologies, have been also proposed to enhance the performance of the NN model. The exact match accuracy of the model in predicting the template of simulated patterns was significantly improved from 59.8% for the baseline model to 97.1% for the best model of this study. The best model also demonstrates an excellent generalization ability in predicting the template for human-designed DSA patterns, while the simplest baseline model is ineffective in this task.
Collapse
Affiliation(s)
- Zhihan Liu
- The State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Yi-Xin Liu
- The State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Yuliang Yang
- The State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Jianfeng Li
- The State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| |
Collapse
|
7
|
Martin TB, Audus DJ. Emerging Trends in Machine Learning: A Polymer Perspective. ACS POLYMERS AU 2023; 3:239-258. [PMID: 37334191 PMCID: PMC10273415 DOI: 10.1021/acspolymersau.2c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Revised: 12/20/2022] [Accepted: 12/21/2022] [Indexed: 01/19/2023]
Abstract
In the last five years, there has been tremendous growth in machine learning and artificial intelligence as applied to polymer science. Here, we highlight the unique challenges presented by polymers and how the field is addressing them. We focus on emerging trends with an emphasis on topics that have received less attention in the review literature. Finally, we provide an outlook for the field, outline important growth areas in machine learning and artificial intelligence for polymer science and discuss important advances from the greater material science community.
Collapse
Affiliation(s)
- Tyler B. Martin
- National Institute of Standards
and Technology, Gaithersburg, Maryland20899, United States
| | - Debra J. Audus
- National Institute of Standards
and Technology, Gaithersburg, Maryland20899, United States
| |
Collapse
|
8
|
Fransen KA, Av-Ron SHM, Buchanan TR, Walsh DJ, Rota DT, Van Note L, Olsen BD. High-throughput experimentation for discovery of biodegradable polyesters. Proc Natl Acad Sci U S A 2023; 120:e2220021120. [PMID: 37252959 PMCID: PMC10266013 DOI: 10.1073/pnas.2220021120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2022] [Accepted: 03/08/2023] [Indexed: 06/01/2023] Open
Abstract
The consistent rise of plastic pollution has stimulated interest in the development of biodegradable plastics. However, the study of polymer biodegradation has historically been limited to a small number of polymers due to costly and slow standard methods for measuring degradation, slowing new material innovation. High-throughput polymer synthesis and a high-throughput polymer biodegradation method are developed and applied to generate a biodegradation dataset for 642 chemically distinct polyesters and polycarbonates. The biodegradation assay was based on the clear-zone technique, using automation to optically observe the degradation of suspended polymer particles under the action of a single Pseudomonas lemoignei bacterial colony. Biodegradability was found to depend strongly on aliphatic repeat unit length, with chains less than 15 carbons and short side chains improving biodegradability. Aromatic backbone groups were generally detrimental to biodegradability; however, ortho- and para-substituted benzene rings in the backbone were more likely to be degradable than metasubstituted rings. Additionally, backbone ether groups improved biodegradability. While other heteroatoms did not show a clear improvement in biodegradability, they did demonstrate increases in biodegradation rates. Machine learning (ML) models were leveraged to predict biodegradability on this large dataset with accuracies over 82% using only chemical structure descriptors.
Collapse
Affiliation(s)
- Katharina A. Fransen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA02139
| | - Sarah H. M. Av-Ron
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA02139
| | - Tess R. Buchanan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA02139
| | - Dylan J. Walsh
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA02139
| | - Dechen T. Rota
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA02139
| | - Lana Van Note
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA02139
| | - Bradley D. Olsen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA02139
| |
Collapse
|
9
|
Xuan Y, Delaney KT, Ceniceros HD, Fredrickson GH. Machine learning and polymer self-consistent field theory in two spatial dimensions. J Chem Phys 2023; 158:144103. [PMID: 37061486 DOI: 10.1063/5.0142608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/17/2023] Open
Abstract
A computational framework that leverages data from self-consistent field theory simulations with deep learning to accelerate the exploration of parameter space for block copolymers is presented. This is a substantial two-dimensional extension of the framework introduced in the work of Xuan et al. [J. Comput. Phys. 443, 110519 (2021)]. Several innovations and improvements are proposed. (1) A Sobolev space-trained, convolutional neural network is employed to handle the exponential dimension increase of the discretized, local average monomer density fields and to strongly enforce both spatial translation and rotation invariance of the predicted, field-theoretic intensive Hamiltonian. (2) A generative adversarial network (GAN) is introduced to efficiently and accurately predict saddle point, local average monomer density fields without resorting to gradient descent methods that employ the training set. This GAN approach yields important savings of both memory and computational cost. (3) The proposed machine learning framework is successfully applied to 2D cell size optimization as a clear illustration of its broad potential to accelerate the exploration of parameter space for discovering polymer nanostructures. Extensions to three-dimensional phase discovery appear to be feasible.
Collapse
Affiliation(s)
- Yao Xuan
- Department of Mathematics, University of California, Santa Barbara, California 93106, USA
| | - Kris T Delaney
- Materials Research Laboratory, University of California, Santa Barbara, California 93106, USA
| | - Hector D Ceniceros
- Department of Mathematics, University of California, Santa Barbara, California 93106, USA
| | - Glenn H Fredrickson
- Materials Research Laboratory, University of California, Santa Barbara, California 93106, USA
| |
Collapse
|
10
|
Ricci E, Vergadou N. Integrating Machine Learning in the Coarse-Grained Molecular Simulation of Polymers. J Phys Chem B 2023; 127:2302-2322. [PMID: 36888553 DOI: 10.1021/acs.jpcb.2c06354] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/09/2023]
Abstract
Machine learning (ML) is having an increasing impact on the physical sciences, engineering, and technology and its integration into molecular simulation frameworks holds great potential to expand their scope of applicability to complex materials and facilitate fundamental knowledge and reliable property predictions, contributing to the development of efficient materials design routes. The application of ML in materials informatics in general, and polymer informatics in particular, has led to interesting results, however great untapped potential lies in the integration of ML techniques into the multiscale molecular simulation methods for the study of macromolecular systems, specifically in the context of Coarse Grained (CG) simulations. In this Perspective, we aim at presenting the pioneering recent research efforts in this direction and discussing how these new ML-based techniques can contribute to critical aspects of the development of multiscale molecular simulation methods for bulk complex chemical systems, especially polymers. Prerequisites for the implementation of such ML-integrated methods and open challenges that need to be met toward the development of general systematic ML-based coarse graining schemes for polymers are discussed.
Collapse
Affiliation(s)
- Eleonora Ricci
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
- Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
| | - Niki Vergadou
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
| |
Collapse
|
11
|
Dong Q, Gong X, Yuan K, Jiang Y, Zhang L, Li W. Inverse Design of Complex Block Copolymers for Exotic Self-Assembled Structures Based on Bayesian Optimization. ACS Macro Lett 2023; 12:401-407. [PMID: 36888723 DOI: 10.1021/acsmacrolett.3c00020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/10/2023]
Abstract
Variable chain topologies of multiblock copolymers provide great opportunities for the formation of numerous self-assembled nanostructures with promising potential applications. However, the consequent large parameter space poses new challenges for searching the stable parameter region of desired novel structures. In this Letter, by combining Bayesian optimization (BO), fast Fourier transform-assisted 3D convolutional neural network (FFT-3DCNN), and self-consistent field theory (SCFT), we develop a data-driven and fully automated inverse design framework to search for the desired novel structures self-assembled by ABC-type multiblock copolymers. Stable phase regions of three exotic target structures are efficiently identified in high-dimensional parameter space. Our work advances the new research paradigm of inverse design in the field of block copolymers.
Collapse
Affiliation(s)
- Qingshu Dong
- State Key Laboratory of Molecular Engineering of Polymers, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Xiangrui Gong
- School of Chemistry, Center of Soft Matter Physics and its Applications, Beihang University, Beijing 100191, China
| | - Kangrui Yuan
- State Key Laboratory of Molecular Engineering of Polymers, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| | - Ying Jiang
- School of Chemistry, Center of Soft Matter Physics and its Applications, Beihang University, Beijing 100191, China
| | - Liangshun Zhang
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Weihua Li
- State Key Laboratory of Molecular Engineering of Polymers, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200433, China
| |
Collapse
|
12
|
Yu M, Shi Y, Jia Q, Wang Q, Luo ZH, Yan F, Zhou YN. Ring Repeating Unit: An Upgraded Structure Representation of Linear Condensation Polymers for Property Prediction. J Chem Inf Model 2023; 63:1177-1187. [PMID: 36651860 DOI: 10.1021/acs.jcim.2c01389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
Unique structure representation of polymers plays a crucial role in developing models for polymer property prediction and polymer design by data-centric approaches. Currently, monomer and repeating unit (RU) approximations are widely used to represent polymer structures for generating feature descriptors in the modeling of quantitative structure-property relationships (QSPR). However, such conventional structure representations may not uniquely approximate heterochain polymers due to the diversity of monomer combinations and the potential multi-RUs. In this study, the so-called ring repeating unit (RRU) method that can uniquely represent polymers with a broad range of structure diversity is proposed for the first time. As a proof of concept, an RRU-based QSPR model was developed to predict the associated glass transition temperature (Tg) of polyimides (PIs) with deterministic values. Comprehensive model validations including external, internal, and Y-random validations were performed. Also, an RU-based QSPR model developed based on the same large database of 1321 PIs provides nonunique prediction results, which further prove the necessity of RRU-based structure representation. Promising results obtained by the application of the RRU-based model confirm that the as-developed RRU method provides an effective representation that accurately captures the sequence of repeat units and thus realizes reliable polymer property prediction by data-driven approaches.
Collapse
Affiliation(s)
- Mengxian Yu
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin300457, P. R. China
| | - Yajuan Shi
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai200240, P. R. China
| | - Qingzhu Jia
- School of Marine and Environmental Science, Tianjin University of Science and Technology, Tianjin300457, P. R. China
| | - Qiang Wang
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin300457, P. R. China
| | - Zheng-Hong Luo
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai200240, P. R. China
| | - Fangyou Yan
- School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin300457, P. R. China
| | - Yin-Ning Zhou
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai200240, P. R. China
| |
Collapse
|
13
|
Abstract
The application of machine learning to the materials domain has traditionally struggled with two major challenges: a lack of large, curated data sets and the need to understand the physics behind the machine-learning prediction. The former problem is particularly acute in the polymers domain. Here we aim to simultaneously tackle these challenges through the incorporation of scientific knowledge, thus, providing improved predictions for smaller data sets, both under interpolation and extrapolation, and a degree of explainability. We focus on imperfect theories, as they are often readily available and easier to interpret. Using a system of a polymer in different solvent qualities, we explore numerous methods for incorporating theory into machine learning using different machine-learning models, including Gaussian process regression. Ultimately, we find that encoding the functional form of the theory performs best followed by an encoding of the numeric values of the theory.
Collapse
Affiliation(s)
- Debra J Audus
- Materials Science and Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| | - Austin McDannald
- Materials Measurement Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| | - Brian DeCost
- Materials Measurement Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| |
Collapse
|
14
|
Aldeghi M, Coley CW. A graph representation of molecular ensembles for polymer property prediction. Chem Sci 2022; 13:10486-10498. [PMID: 36277616 PMCID: PMC9473492 DOI: 10.1039/d2sc02839e] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open
Abstract
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles.
Collapse
Affiliation(s)
- Matteo Aldeghi
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
Collapse
|
15
|
Data-driven approaches for structure-property relationships in polymer science for prediction and understanding. Polym J 2022. [DOI: 10.1038/s41428-022-00648-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
16
|
Nguyen D, Tao L, Li Y. Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design. Front Chem 2022; 9:820417. [PMID: 35141207 PMCID: PMC8819075 DOI: 10.3389/fchem.2021.820417] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Accepted: 12/31/2021] [Indexed: 12/21/2022] Open
Abstract
In recent years, the synthesis of monomer sequence-defined polymers has expanded into broad-spectrum applications in biomedical, chemical, and materials science fields. Pursuing the characterization and inverse design of these polymer systems requires our fundamental understanding not only at the individual monomer level, but also considering the chain scales, such as polymer configuration, self-assembly, and phase separation. However, our accessibility to this field is still rudimentary due to the limitations of traditional design approaches, the complexity of chemical space along with the burdened cost and time issues that prevent us from unveiling the underlying monomer sequence-structure-property relationships. Fortunately, thanks to the recent advancements in molecular dynamics simulations and machine learning (ML) algorithms, the bottlenecks in the tasks of establishing the structure-function correlation of the polymer chains can be overcome. In this review, we will discuss the applications of the integration between ML techniques and coarse-grained molecular dynamics (CGMD) simulations to solve the current issues in polymer science at the chain level. In particular, we focus on the case studies in three important topics-polymeric configuration characterization, feed-forward property prediction, and inverse design-in which CGMD simulations are leveraged to generate training datasets to develop ML-based surrogate models for specific polymer systems and designs. By doing so, this computational hybridization allows us to well establish the monomer sequence-functional behavior relationship of the polymers as well as guide us toward the best polymer chain candidates for the inverse design in undiscovered chemical space with reasonable computational cost and time. Even though there are still limitations and challenges ahead in this field, we finally conclude that this CGMD/ML integration is very promising, not only in the attempt of bridging the monomeric and macroscopic characterizations of polymer materials, but also enabling further tailored designs for sequence-specific polymers with superior properties in many practical applications.
Collapse
Affiliation(s)
- Danh Nguyen
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
| | - Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
| | - Ying Li
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Polymer Program, Institute of Materials Science, University of Connecticut, Mansfield, CT, United States
| |
Collapse
|
17
|
Müller M. Selection of Advances in Theory and Simulation during the First Decade of ACS Macro Letters. ACS Macro Lett 2021; 10:1629-1635. [PMID: 35549151 DOI: 10.1021/acsmacrolett.1c00750] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Marcus Müller
- Institute for Theoretical Physics, Georg-August-University, 37077 Göttingen, Germany
| |
Collapse
|