1
|
Dong C, Li D, Liu J. Glass Transition Temperature Prediction of Polymers via Graph Reinforcement Learning. LANGMUIR : THE ACS JOURNAL OF SURFACES AND COLLOIDS 2024; 40:18568-18580. [PMID: 39166275 DOI: 10.1021/acs.langmuir.4c01906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/22/2024]
Abstract
An expansive array of graph-based models has been utilized for accurate prediction of the structure-property relation of polymers. However, these approaches notably underutilize unsupervised structural information. Concentrating on the domain of heterocyclic polymers, particularly polyimides, this study delves into the glass transition temperature (Tg) prediction, aiming to fully exploit the potential within both the global and local structures of molecules. To achieve this, a graph reinforcement learning framework termed Molecular Structural Regularized Graph Convolutional Network with Reinforcement Learning (MSRGCN-RL) is proposed. Experimental results highlight the crucial role of both global and local structural regularization in precise Tg prediction. Concurrently, optimization of MSRGCN training through RL proves essential. This research leads the way in integrating Graph Neural Networks (GNNs) with reinforcement learning methodologies for the property prediction of polymers.
Collapse
Affiliation(s)
- Caibo Dong
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
| | - Dazi Li
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
| | - Jun Liu
- Key Laboratory of Beijing City on Preparation and Processing of Novel Polymer Materials, Beijing University of Chemical Technology, Beijing 100029, China
- State Key Laboratory of Organic-Inorganic Composites, College of Materials Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
| |
Collapse
|
2
|
Li SC, Wu H, Menon A, Spiekermann KA, Li YP, Green WH. When Do Quantum Mechanical Descriptors Help Graph Neural Networks to Predict Chemical Properties? J Am Chem Soc 2024; 146:23103-23120. [PMID: 39106041 DOI: 10.1021/jacs.4c04670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/07/2024]
Abstract
Deep graph neural networks are extensively utilized to predict chemical reactivity and molecular properties. However, because of the complexity of chemical space, such models often have difficulty extrapolating beyond the chemistry contained in the training set. Augmenting the model with quantum mechanical (QM) descriptors is anticipated to improve its generalizability. However, obtaining QM descriptors often requires CPU-intensive computational chemistry calculations. To identify when QM descriptors help graph neural networks predict chemical properties, we conduct a systematic investigation of the impact of atom, bond, and molecular QM descriptors on the performance of directed message passing neural networks (D-MPNNs) for predicting 16 molecular properties. The analysis surveys computational and experimental targets, as well as classification and regression tasks, and varied data set sizes from several hundred to hundreds of thousands of data points. Our results indicate that QM descriptors are mostly beneficial for D-MPNN performance on small data sets, provided that the descriptors correlate well with the targets and can be readily computed with high accuracy. Otherwise, using QM descriptors can add cost without benefit or even introduce unwanted noise that can degrade model performance. Strategic integration of QM descriptors with D-MPNN unlocks potential for physics-informed, data-efficient modeling with some interpretability that can streamline de novo drug and material designs. To facilitate the use of QM descriptors in machine learning workflows for chemistry, we provide a set of guidelines regarding when and how to best leverage QM descriptors, a high-throughput workflow to compute them, and an enhancement to Chemprop, a widely adopted open-source D-MPNN implementation for chemical property prediction.
Collapse
Affiliation(s)
- Shih-Cheng Li
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
| | - Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
Collapse
|
3
|
Lyu N, Bergold P, Soley MB, Wang C, Batista VS. Holographic Gaussian Boson Sampling with Matrix Product States on 3D cQED Processors. J Chem Theory Comput 2024; 20:6402-6413. [PMID: 38968605 DOI: 10.1021/acs.jctc.4c00384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/07/2024]
Abstract
We introduce quantum circuits for simulations of multimode state vectors on 3D circuit quantum electrodynamics (cQED) processors using matrix product state representations. The circuits are demonstrated as applied to simulations of molecular docking based on holographic Gaussian boson sampling (GBS), as illustrated for the binding of a thiol-containing aryl sulfonamide ligand to the tumor necrosis factor-α converting enzyme receptor. We show that cQED devices with a modest number of modes could be employed to simulate multimode systems by repurposing working modes through measurement and reinitialization. We anticipate that a wide range of GBS applications could be implemented on compact 3D cQED processors analogously using the holographic approach. Simulations on qubit-based quantum computers could be implemented analogously using circuits that represent continuous variables in terms of truncated expansions of Fock states.
Collapse
Affiliation(s)
- Ningyi Lyu
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
| | - Paul Bergold
- Department of Mathematics, University of Surrey, Guildford GU2 7XH, U.K
| | - Micheline B Soley
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
- Yale Quantum Institute, Yale University, New Haven, Connecticut 06511, United States
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Chen Wang
- Department of Physics, University of Massachusetts-Amherst, Amherst, Massachusetts 01003, United States
| | - Victor S Batista
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
- Yale Quantum Institute, Yale University, New Haven, Connecticut 06511, United States
| |
Collapse
|
4
|
Wang S, Yue H, Yuan X. Accelerating Polymer Discovery with Uncertainty-Guided PGCNN: Explainable AI for Predicting Properties and Mechanistic Insights. J Chem Inf Model 2024; 64:5500-5509. [PMID: 38953249 DOI: 10.1021/acs.jcim.4c00555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/03/2024]
Abstract
Deep learning holds great potential for expediting the discovery of new polymers from the vast chemical space. However, accurately predicting polymer properties for practical applications based on their monomer composition has long been a challenge. The main obstacles include insufficient data, ineffective representation encoding, and lack of explainability. To address these issues, we propose an interpretable model called the Polymer Graph Convolutional Neural Network (PGCNN) that can accurately predict various polymer properties. This model is trained using the RadonPy data set and validated using experimental data samples. By integrating evidential deep learning with the model, we can quantify the uncertainty of predictions and enable sample-efficient training through uncertainty-guided active learning. Additionally, we demonstrate that the global attention of the graph embedding can aid in discovering underlying physical principles by identifying important functional groups within polymers and associating them with specific material attributes. Lastly, we explore the high-throughput screening capability of our model by rapidly identifying thousands of promising candidates with low and high thermal conductivity from a pool of one million hypothetical polymers. In summary, our research not only advances our mechanistic understanding of polymers using explainable AI but also paves the way for data-driven trustworthy discovery of polymer materials.
Collapse
Affiliation(s)
- Shuyu Wang
- Department of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
| | - Hongxing Yue
- Department of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
| | - Xiaoming Yuan
- Xiaoming Yuan - Department of Computer Science and Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
| |
Collapse
|
5
|
Choi S, Lee J, Seo J, Han SW, Lee SH, Seo JH, Seok J. Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules. Sci Data 2024; 11:371. [PMID: 38605036 PMCID: PMC11009387 DOI: 10.1038/s41597-024-03212-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Accepted: 04/02/2024] [Indexed: 04/13/2024] Open
Abstract
The simplified molecular-input line-entry system (SMILES) has been utilized in a variety of artificial intelligence analyses owing to its capability of representing chemical structures using line notation. However, its ease of representation is limited, which has led to the proposal of BigSMILES as an alternative method suitable for the representation of macromolecules. Nevertheless, research on BigSMILES remains limited due to its preprocessing requirements. Thus, this study proposes a conversion workflow of BigSMILES, focusing on its automated generation from SMILES representations of homopolymers. BigSMILES representations for 4,927,181 records are provided, thereby enabling its immediate use for various research and development applications. Our study presents detailed descriptions on a validation process to ensure the accuracy, interchangeability, and robustness of the conversion. Additionally, a systematic overview of utilized codes and functions that emphasizes their relevance in the context of BigSMILES generation are produced. This advancement is anticipated to significantly aid researchers and facilitate further studies in BigSMILES representation, including potential applications in deep learning and further extension to complex structures such as copolymers.
Collapse
Affiliation(s)
- Sunho Choi
- School of Electrical Engineering, Korea University, Seoul, South Korea
| | - Joonbum Lee
- Department of Materials Science and Engineering, Korea University, Seoul, South Korea
| | - Jangwon Seo
- School of Electrical Engineering, Korea University, Seoul, South Korea
| | - Sung Won Han
- School of Industrial Management Engineering, Korea University, Seoul, South Korea
| | - Sang Hyun Lee
- School of Electrical Engineering, Korea University, Seoul, South Korea
| | - Ji-Hun Seo
- Department of Materials Science and Engineering, Korea University, Seoul, South Korea
| | - Junhee Seok
- School of Electrical Engineering, Korea University, Seoul, South Korea.
| |
Collapse
|
6
|
Zhang X, Sheng Y, Liu X, Yang J, Goddard Iii WA, Ye C, Zhang W. Polymer-Unit Graph: Advancing Interpretability in Graph Neural Network Machine Learning for Organic Polymer Semiconductor Materials. J Chem Theory Comput 2024; 20:2908-2920. [PMID: 38551455 DOI: 10.1021/acs.jctc.3c01385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/10/2024]
Abstract
The graph representation of complex materials plays a crucial role in the field of inorganic and organic materials investigations for developing data-centric materials science, such as those using graph neural networks (GNNs). However, the currently prevalent GNN models are primarily employed for investigating periodic crystals and organic small molecule data, yet they still encounter challenges in terms of interpretability and computational efficiency when applied to polymer monomers and organic macromolecules data. There is still a lack of graph representation of organic polymers and macromolecules specifically tailored for GNN models to explore the structural characteristics. The Polymer-unit Graph, a novel coarse-grained graph representation method introduced in study, is dedicated to expressing and analyzing polymers and macromolecules. By incorporating the Polymer-unit Graph into the GNN models and analyzing the organic semiconductor (OSC) materials database, it becomes possible to uncover intricate structure-property relationships involving branched-chain engineering, fluoridation substitution, and donor-acceptor combination effects on the elementary structure of OSC polymers. Furthermore, the Polymer-unit Graph enables visualizing the relationship between target properties and polymer units while reducing training time by an impressive 98% and minimizing molecular graph representation models. In conclusion, the Polymer-unit Graph successfully integrates the concept of Polymer-unit into the field of GNNs, enabling more accurate analysis and understanding of organic polymers and macromolecules.
Collapse
Affiliation(s)
- Xinyue Zhang
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
| | - Ye Sheng
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
| | - Xiumin Liu
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Key Laboratory of Soft Chemistry and Functional Materials of MOE, School of Chemistry and Chemical Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
| | - Jiong Yang
- Materials Genome Institute, Shanghai University, Shanghai 200444, PR China
| | - William A Goddard Iii
- Materials and Process Simulation Center (MSC), California Institute of Technology, Pasadena, California 91125, United States
| | - Caichao Ye
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, PR China
| | - Wenqing Zhang
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
| |
Collapse
|
7
|
Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS APPLIED BIO MATERIALS 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Collapse
Affiliation(s)
- Roshan A Patel
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
| | - Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
| |
Collapse
|
8
|
Shi J, Walsh D, Zou W, Rebello NJ, Deagen ME, Fransen KA, Gao X, Olsen BD, Audus DJ. Calculating Pairwise Similarity of Polymer Ensembles via Earth Mover's Distance. ACS POLYMERS AU 2024; 4:66-76. [PMID: 38371731 PMCID: PMC10870752 DOI: 10.1021/acspolymersau.3c00029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 02/20/2024]
Abstract
Synthetic polymers, in contrast to small molecules and deterministic biomacromolecules, are typically ensembles composed of polymer chains with varying numbers, lengths, sequences, chemistry, and topologies. While numerous approaches exist for measuring pairwise similarity among small molecules and sequence-defined biomacromolecules, accurately determining the pairwise similarity between two polymer ensembles remains challenging. This work proposes the earth mover's distance (EMD) metric to calculate the pairwise similarity score between two polymer ensembles. EMD offers a greater resolution of chemical differences between polymer ensembles than the averaging method and provides a quantitative numeric value representing the pairwise similarity between polymer ensembles in alignment with chemical intuition. The EMD approach for assessing polymer similarity enhances the development of accurate chemical search algorithms within polymer databases and can improve machine learning techniques for polymer design, optimization, and property prediction.
Collapse
Affiliation(s)
- Jiale Shi
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Dylan Walsh
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Weizhong Zou
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Nathan J. Rebello
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Michael E. Deagen
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Katharina A. Fransen
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Xian Gao
- Department
of Chemical and Biomolecular Engineering, University of Notre Dame, Notre
Dame, Indiana 46556, United States
| | - Bradley D. Olsen
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Debra J. Audus
- Materials
Science and Engineering Division, National
Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| |
Collapse
|
9
|
Qiu H, Liu L, Qiu X, Dai X, Ji X, Sun ZY. PolyNC: a natural and chemical language model for the prediction of unified polymer properties. Chem Sci 2024; 15:534-544. [PMID: 38179518 PMCID: PMC10763023 DOI: 10.1039/d3sc05079c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2023] [Accepted: 12/04/2023] [Indexed: 01/06/2024] Open
Abstract
Language models exhibit a profound aptitude for addressing multimodal and multidomain challenges, a competency that eludes the majority of off-the-shelf machine learning models. Consequently, language models hold great potential for comprehending the intricate interplay between material compositions and diverse properties, thereby accelerating material design, particularly in the realm of polymers. While past limitations in polymer data hindered the use of data-intensive language models, the growing availability of standardized polymer data and effective data augmentation techniques now opens doors to previously uncharted territories. Here, we present a revolutionary model to enable rapid and precise prediction of Polymer properties via the power of Natural language and Chemical language (PolyNC). To showcase the efficacy of PolyNC, we have meticulously curated a labeled prompt-structure-property corpus encompassing 22 970 polymer data points on a series of essential polymer properties. Through the use of natural language prompts, PolyNC gains a comprehensive understanding of polymer properties, while employing chemical language (SMILES) to describe polymer structures. In a unified text-to-text manner, PolyNC consistently demonstrates exceptional performance on both regression tasks (such as property prediction) and the classification task (polymer classification). Simultaneous and interactive multitask learning enables PolyNC to holistically grasp the structure-property relationships of polymers. Through a combination of experiments and characterizations, the generalization ability of PolyNC has been demonstrated, with attention analysis further indicating that PolyNC effectively learns structural information about polymers from multimodal inputs. This work provides compelling evidence of the potential for deploying end-to-end language models in polymer research, representing a significant advancement in the AI community's dedicated pursuit of advancing polymer science.
Collapse
Affiliation(s)
- Haoke Qiu
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
| | - Lunyang Liu
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
| | - Xuepeng Qiu
- School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
- CAS Key Laboratory of High-Performance Synthetic Rubber and its Composite Materials, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
| | - Xuemin Dai
- CAS Key Laboratory of High-Performance Synthetic Rubber and its Composite Materials, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
| | - Xiangling Ji
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
| | - Zhao-Yan Sun
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
| |
Collapse
|
10
|
Day EC, Chittari SS, Bogen MP, Knight AS. Navigating the Expansive Landscapes of Soft Materials: A User Guide for High-Throughput Workflows. ACS POLYMERS AU 2023; 3:406-427. [PMID: 38107416 PMCID: PMC10722570 DOI: 10.1021/acspolymersau.3c00025] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Revised: 11/02/2023] [Accepted: 11/07/2023] [Indexed: 12/19/2023]
Abstract
Synthetic polymers are highly customizable with tailored structures and functionality, yet this versatility generates challenges in the design of advanced materials due to the size and complexity of the design space. Thus, exploration and optimization of polymer properties using combinatorial libraries has become increasingly common, which requires careful selection of synthetic strategies, characterization techniques, and rapid processing workflows to obtain fundamental principles from these large data sets. Herein, we provide guidelines for strategic design of macromolecule libraries and workflows to efficiently navigate these high-dimensional design spaces. We describe synthetic methods for multiple library sizes and structures as well as characterization methods to rapidly generate data sets, including tools that can be adapted from biological workflows. We further highlight relevant insights from statistics and machine learning to aid in data featurization, representation, and analysis. This Perspective acts as a "user guide" for researchers interested in leveraging high-throughput screening toward the design of multifunctional polymers and predictive modeling of structure-property relationships in soft materials.
Collapse
Affiliation(s)
| | | | - Matthew P. Bogen
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Abigail S. Knight
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| |
Collapse
|
11
|
Ting JM, Tamayo-Mendoza T, Petersen SR, Van Reet J, Ahmed UA, Snell NJ, Fisher JD, Stern M, Oviedo F. Frontiers in nonviral delivery of small molecule and genetic drugs, driven by polymer chemistry and machine learning for materials informatics. Chem Commun (Camb) 2023; 59:14197-14209. [PMID: 37955165 DOI: 10.1039/d3cc04705a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2023]
Abstract
Materials informatics (MI) has immense potential to accelerate the pace of innovation and new product development in biotechnology. Close collaborations between skilled physical and life scientists with data scientists are being established in pursuit of leveraging MI tools in automation and artificial intelligence (AI) to predict material properties in vitro and in vivo. However, the scarcity of large, standardized, and labeled materials data for connecting structure-function relationships represents one of the largest hurdles to overcome. In this Highlight, focus is brought to emerging developments in polymer-based therapeutic delivery platforms, where teams generate large experimental datasets around specific therapeutics and successfully establish a design-to-deployment cycle of specialized nanocarriers. Three select collaborations demonstrate how custom-built polymers protect and deliver small molecules, nucleic acids, and proteins, representing ideal use-cases for machine learning to understand how molecular-level interactions impact drug stabilization and release. We conclude with our perspectives on how MI innovations in automation efficiencies and digitalization of data-coupled with fundamental insight and creativity from the polymer science community-can accelerate translation of more gene therapies into lifesaving medicines.
Collapse
|
12
|
Sanchez Medina E, Kunchapu S, Sundmacher K. Gibbs-Helmholtz Graph Neural Network for the Prediction of Activity Coefficients of Polymer Solutions at Infinite Dilution. J Phys Chem A 2023; 127:9863-9873. [PMID: 37943172 PMCID: PMC10683018 DOI: 10.1021/acs.jpca.3c05892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Revised: 10/18/2023] [Accepted: 10/25/2023] [Indexed: 11/10/2023]
Abstract
Machine learning models have gained prominence for predicting pure-component properties, yet their application to mixture property prediction remains relatively limited. However, the significance of mixtures in our daily lives is undeniable, particularly in industries such as polymer processing. This study presents a modification of the Gibbs-Helmholtz graph neural network (GH-GNN) model for predicting weight-based activity coefficients at infinite dilution (Ωij∞) in polymer solutions. We evaluate various polymer representations ranging from monomer, repeating unit, periodic unit, and oligomer and observe that, in data-scarce scenarios of polymer-solvent mixtures, polymer representation specifics have a reduced impact compared to data-rich environments. Leveraging transfer learning, we harness richer activity coefficient data from small-size systems, enhancing model accuracy and reducing prediction variability. The modified GH-GNN model achieves remarkable prediction results in mixture interpolation and solvent extrapolation tasks having an overall mean absolute error of 0.15, showcasing the potential of graph-neural-network-based models for property prediction of polymer solutions. Comparative analysis with the established models UNIFAC-ZM and Entropic-FV suggests a promising avenue for future research on the use of data-driven models for the prediction of the thermodynamic properties of polymer solutions.
Collapse
Affiliation(s)
- Edgar
Ivan Sanchez Medina
- Chair
for Process Systems Engineering, Otto-von-Guericke
University, Universitätsplatz 2, Magdeburg 39106, Germany
| | - Sreekanth Kunchapu
- Chair
for Process Systems Engineering, Otto-von-Guericke
University, Universitätsplatz 2, Magdeburg 39106, Germany
| | - Kai Sundmacher
- Chair
for Process Systems Engineering, Otto-von-Guericke
University, Universitätsplatz 2, Magdeburg 39106, Germany
- Process
Systems Engineering, Max Planck Institute
for Dynamics of Complex Technical Systems, Sandtorstraße 1, Magdeburg 39106, Germany
| |
Collapse
|
13
|
Hu J, Li Z, Lin J, Zhang L. Prediction and Interpretability of Glass Transition Temperature of Homopolymers by Data-Augmented Graph Convolutional Neural Networks. ACS APPLIED MATERIALS & INTERFACES 2023; 15:54006-54017. [PMID: 37934171 DOI: 10.1021/acsami.3c13698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2023]
Abstract
Establishing the structure-property relationship by machine learning (ML) models is extremely valuable for accelerating the molecular design of polymers. However, existing ML models for the polymers are subject to scarcity issues of training data and fewer variations of graph structures of molecules. In addition, limited works have explored the interpretability of ML models to infer the latent knowledge in the field of polymer science that could inspire ML-assisted molecular design. In this contribution, we integrate graph convolutional neural networks (GCNs) with data augmentation strategy to predict the glass transition temperature Tg of polymers. It is demonstrated that the data-augmented GCN model outperforms the conventional models and achieves a higher accuracy for the prediction of Tg despite a small amount of training data. Furthermore, taking advantage of molecular graph representations, the data-augmented GCN model has the capability to infer the importance of atoms or substructures from the understanding of Tg, which generally agrees with the experimental findings in the field of polymer science. The inferred knowledge of the GCN model is used to advise on the design of functional polymers with specific Tg. The data-augmented GCN model possesses prominent superiorities in the establishment of structure-property relationship and also provides an efficient way for accelerating the rational design of polymer molecules.
Collapse
Affiliation(s)
- Junyang Hu
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Zean Li
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Jiaping Lin
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Liangshun Zhang
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| |
Collapse
|
14
|
Rebello NJ, Lin TS, Nazeer H, Olsen BD. BigSMARTS: A Topologically Aware Query Language and Substructure Search Algorithm for Polymer Chemical Structures. J Chem Inf Model 2023; 63:6555-6568. [PMID: 37874026 DOI: 10.1021/acs.jcim.3c00978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2023]
Abstract
Molecular search is important in chemistry, biology, and informatics for identifying molecular structures within large data sets, improving knowledge discovery and innovation, and making chemical data FAIR (findable, accessible, interoperable, reusable). Search algorithms for polymers are significantly less developed than those for small molecules because polymer search relies on searching by polymer name, which can be challenging because polymer naming is overly broad (i.e., polyethylene), complicated for complex chemical structures, and often does not correspond to official IUPAC conventions. Chemical structure search in polymers is limited to substructures, such as monomers, without awareness of connectivity or topology. This work introduces a novel query language and graph traversal search algorithm for polymers that provides the first search method able to fully capture all of the chemical structures present in polymers. The BigSMARTS query language, an extension of the small-molecule SMARTS language, allows users to write queries that localize monomer and functional group searches to different parts of the polymer, like the middle block of a triblock, the side chain of a graft, and the backbone of a repeat unit. The substructure search algorithm is based on the traversal of graph representations of the generating functions for the stochastic graphs of polymers. Operationally, the algorithm first identifies cycles representing the monomers and then the end groups and finally performs a depth-first search to match entire subgraphs. To validate the algorithm, hundreds of queries were searched against hundreds of target chemistries and topologies from the literature, with approximately 440,000 query-target pairs. This tool provides a detailed algorithm that can be implemented in search engines to provide search results with full matching of the monomer connectivity and polymer topology.
Collapse
Affiliation(s)
- Nathan J Rebello
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Tzyy-Shyang Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Heeba Nazeer
- Department of Computer Science, Wellesley College, 106 Central Street, Wellesley, Massachusetts 02481, United States
| | - Bradley D Olsen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
Collapse
|
15
|
AlFaraj Y, Mohapatra S, Shieh P, Husted KEL, Ivanoff DG, Lloyd EM, Cooper JC, Dai Y, Singhal AP, Moore JS, Sottos NR, Gomez-Bombarelli R, Johnson JA. A Model Ensemble Approach Enables Data-Driven Property Prediction for Chemically Deconstructable Thermosets in the Low-Data Regime. ACS CENTRAL SCIENCE 2023; 9:1810-1819. [PMID: 37780353 PMCID: PMC10540282 DOI: 10.1021/acscentsci.3c00502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Indexed: 10/03/2023]
Abstract
Thermosets present sustainability challenges that could potentially be addressed through the design of deconstructable variants with tunable properties; however, the combinatorial space of possible thermoset molecular building blocks (e.g., monomers, cross-linkers, and additives) and manufacturing conditions is vast, and predictive knowledge for how combinations of these molecular components translate to bulk thermoset properties is lacking. Data science could overcome these problems, but computational methods are difficult to apply to multicomponent, amorphous, statistical copolymer materials for which little data exist. Here, leveraging a data set with 101 examples, we introduce a closed-loop experimental, machine learning (ML), and virtual screening strategy to enable predictions of the glass transition temperature (Tg) of polydicyclopentadiene (pDCPD) thermosets containing cleavable bifunctional silyl ether (BSE) comonomers and/or cross-linkers with varied compositions and loadings. Molecular features and formulation variables are used as model inputs, and uncertainty is quantified through model ensembling, which together with heavy regularization helps to avoid overfitting and ultimately achieves predictions within <15 °C for thermosets with compositionally diverse BSEs. This work offers a path to predicting the properties of thermosets based on their molecular building blocks, which may accelerate the discovery of promising plastics, rubbers, and composites with improved functionality and controlled deconstructability.
Collapse
Affiliation(s)
- Yasmeen
S. AlFaraj
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States of America
| | - Somesh Mohapatra
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Peyton Shieh
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States of America
| | - Keith E. L. Husted
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States of America
| | - Douglass G. Ivanoff
- Department
of Materials Science and Engineering, University
of Illinois at Urbana—Champaign, Urbana, Illinois 61801, United States of America
- The
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana—Champaign, Urbana, Illinois 61801, United States
of America
| | - Evan M. Lloyd
- The
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana—Champaign, Urbana, Illinois 61801, United States
of America
- Department
of Chemistry, University of Illinois at
Urbana—Champaign, Urbana, Illinois 61801, United States of America
| | - Julian C. Cooper
- The
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana—Champaign, Urbana, Illinois 61801, United States
of America
- Department
of Chemistry, University of Illinois at
Urbana—Champaign, Urbana, Illinois 61801, United States of America
| | - Yutong Dai
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States of America
| | - Avni P. Singhal
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Jeffrey S. Moore
- Department
of Materials Science and Engineering, University
of Illinois at Urbana—Champaign, Urbana, Illinois 61801, United States of America
- The
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana—Champaign, Urbana, Illinois 61801, United States
of America
| | - Nancy R. Sottos
- Department
of Materials Science and Engineering, University
of Illinois at Urbana—Champaign, Urbana, Illinois 61801, United States of America
- The
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana—Champaign, Urbana, Illinois 61801, United States
of America
| | - Rafael Gomez-Bombarelli
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Jeremiah A. Johnson
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States of America
| |
Collapse
|
16
|
Song Y, Chang S, Tian J, Pan W, Feng L, Ji H. A Comprehensive Comparative Analysis of Deep Learning Based Feature Representations for Molecular Taste Prediction. Foods 2023; 12:3386. [PMID: 37761095 PMCID: PMC10529232 DOI: 10.3390/foods12183386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 08/30/2023] [Accepted: 09/01/2023] [Indexed: 09/29/2023] Open
Abstract
Taste determination in small molecules is critical in food chemistry but traditional experimental methods can be time-consuming. Consequently, computational techniques have emerged as valuable tools for this task. In this study, we explore taste prediction using various molecular feature representations and assess the performance of different machine learning algorithms on a dataset comprising 2601 molecules. The results reveal that GNN-based models outperform other approaches in taste prediction. Moreover, consensus models that combine diverse molecular representations demonstrate improved performance. Among these, the molecular fingerprints + GNN consensus model emerges as the top performer, highlighting the complementary strengths of GNNs and molecular fingerprints. These findings have significant implications for food chemistry research and related fields. By leveraging these computational approaches, taste prediction can be expedited, leading to advancements in understanding the relationship between molecular structure and taste perception in various food components and related compounds.
Collapse
Affiliation(s)
- Yu Song
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China;
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Sihao Chang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Jing Tian
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Weihua Pan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Lu Feng
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China;
| | - Hongchao Ji
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| |
Collapse
|
17
|
McDonald SM, Augustine EK, Lanners Q, Rudin C, Catherine Brinson L, Becker ML. Applied machine learning as a driver for polymeric biomaterials design. Nat Commun 2023; 14:4838. [PMID: 37563117 PMCID: PMC10415291 DOI: 10.1038/s41467-023-40459-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Accepted: 07/24/2023] [Indexed: 08/12/2023] Open
Abstract
Polymers are ubiquitous to almost every aspect of modern society and their use in medical products is similarly pervasive. Despite this, the diversity in commercial polymers used in medicine is stunningly low. Considerable time and resources have been extended over the years towards the development of new polymeric biomaterials which address unmet needs left by the current generation of medical-grade polymers. Machine learning (ML) presents an unprecedented opportunity in this field to bypass the need for trial-and-error synthesis, thus reducing the time and resources invested into new discoveries critical for advancing medical treatments. Current efforts pioneering applied ML in polymer design have employed combinatorial and high throughput experimental design to address data availability concerns. However, the lack of available and standardized characterization of parameters relevant to medicine, including degradation time and biocompatibility, represents a nearly insurmountable obstacle to ML-aided design of biomaterials. Herein, we identify a gap at the intersection of applied ML and biomedical polymer design, highlight current works at this junction more broadly and provide an outlook on challenges and future directions.
Collapse
Affiliation(s)
| | - Emily K Augustine
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
| | - Quinn Lanners
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Cynthia Rudin
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - L Catherine Brinson
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
| | - Matthew L Becker
- Department of Chemistry, Duke University, Durham, NC, USA.
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA.
| |
Collapse
|
18
|
Malhotra P, Biswas S, Sharma GD. Directed Message Passing Neural Network for Predicting Power Conversion Efficiency in Organic Solar Cells. ACS APPLIED MATERIALS & INTERFACES 2023; 15:37741-37747. [PMID: 37490851 DOI: 10.1021/acsami.3c08068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/27/2023]
Abstract
Organic solar cells (OSCs) have emerged as a promising technology for renewable energy generation, and researchers are constantly exploring ways to improve their efficiency. For prediction of photovoltaic properties in OSCs, many machine learning models have been used in the past. All the models are used with fixed molecular descriptors and molecular fingerprints as input for power conversion efficiency (PCE) prediction. Recently, the graph neural network (GNN), which can model graph structures of the molecule, has received increasing attention as a method that could potentially overcome the limitations of fixed descriptors by learning the task-specific representations using graph convolutions. In this study, we have used the directed message passing neural network (D-MPNN), an emerging type of GNN for predicting PCE of organic solar cells, and the results are compared for the same train and test set with fixed descriptors and fingerprints. The excellent performance demonstrated by the D-MPNN model in this investigation highlights its potential for predicting PCE, surpassing the limitations of conventional fixed descriptors.
Collapse
Affiliation(s)
- Prateek Malhotra
- Department of Physics, The LNM Institute of Information Technology, Jamdoli, Jaipur, Rajasthan 302031, India
| | - Subhayan Biswas
- Department of Physics, The LNM Institute of Information Technology, Jamdoli, Jaipur, Rajasthan 302031, India
| | - Ganesh D Sharma
- Department of Physics, The LNM Institute of Information Technology, Jamdoli, Jaipur, Rajasthan 302031, India
| |
Collapse
|
19
|
Kuenneth C, Ramprasad R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat Commun 2023; 14:4099. [PMID: 37433807 DOI: 10.1038/s41467-023-39868-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2022] [Accepted: 06/28/2023] [Indexed: 07/13/2023] Open
Abstract
Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.
Collapse
Affiliation(s)
- Christopher Kuenneth
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Faculty of Engineering Science, University of Bayreuth, 95447, Bayreuth, Germany
| | - Rampi Ramprasad
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
| |
Collapse
|
20
|
Lo S, Seifrid M, Gaudin T, Aspuru-Guzik A. Augmenting Polymer Datasets by Iterative Rearrangement. J Chem Inf Model 2023. [PMID: 37390494 DOI: 10.1021/acs.jcim.3c00144] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/02/2023]
Abstract
One of the biggest obstacles to successful polymer property prediction is an effective representation that accurately captures the sequence of repeat units in a polymer. Motivated by the success of data augmentation in computer vision and natural language processing, we explore augmenting polymer data by iteratively rearranging the molecular representation while preserving the correct connectivity, revealing additional substructural information that is not present in a single representation. We evaluate the effects of this technique on the performance of machine learning models trained on three polymer datasets and compare them to common molecular representations. Data augmentation does not yield significant improvements in machine learning property prediction performance compared to equivalent (non-augmented) representations. In datasets where the target property is primarily influenced by the polymer sequence rather than experimental parameters, this data augmentation technique provides molecular embedding with more information to improve property prediction accuracy.
Collapse
Affiliation(s)
- Stanley Lo
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
| | - Martin Seifrid
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
| | - Théophile Gaudin
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
- IBM Research Zürich, Rüschlikon, Zürich 8803, Switzerland
| | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College St., Toronto, Ontario M5S 3E5, Canada
- Department of Materials Science and Engineering, University of Toronto, 184 College St., Toronto, Ontario M5S 3E4, Canada
- CIFAR Artificial Intelligence Research Chair, Vector Institute, Toronto, Ontario M5S 1M1, Canada
- Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario M5S 1M1, Canada
| |
Collapse
|
21
|
Park NH, Manica M, Born J, Hedrick JL, Erdmann T, Zubarev DY, Adell-Mill N, Arrechea PL. Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language. Nat Commun 2023; 14:3686. [PMID: 37344485 PMCID: PMC10284867 DOI: 10.1038/s41467-023-39396-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 06/12/2023] [Indexed: 06/23/2023] Open
Abstract
Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity making it difficult to accommodate a broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. Here we show that a domain specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization-although we provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, we show how the CMDL tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.
Collapse
Affiliation(s)
| | - Matteo Manica
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
| | - Jannis Born
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
| | - James L Hedrick
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
| | - Tim Erdmann
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
| | | | - Nil Adell-Mill
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Arctoris, 120E Olympic Avenue, Abingdon, OX14 4SA, Oxfordshire, UK
| | | |
Collapse
|
22
|
Ryu JY, Elala E, Rhee JKK. Quantum Graph Neural Network Models for Materials Search. MATERIALS (BASEL, SWITZERLAND) 2023; 16:4300. [PMID: 37374486 DOI: 10.3390/ma16124300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Revised: 06/03/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023]
Abstract
Inspired by classical graph neural networks, we discuss a novel quantum graph neural network (QGNN) model to predict the chemical and physical properties of molecules and materials. QGNNs were investigated to predict the energy gap between the highest occupied and lowest unoccupied molecular orbitals of small organic molecules. The models utilize the equivariantly diagonalizable unitary quantum graph circuit (EDU-QGC) framework to allow discrete link features and minimize quantum circuit embedding. The results show QGNNs can achieve lower test loss compared to classical models if a similar number of trainable variables are used, and converge faster in training. This paper also provides a review of classical graph neural network models for materials research and various QGNNs.
Collapse
Affiliation(s)
- Ju-Young Ryu
- School of Electrical Engineering & ITRC of Quantum Computing for AI, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Qunova Computing, Incorporated, 193 Munji-ro, Yuseong-gu, Daejeon 34051, Republic of Korea
| | - Eyuel Elala
- School of Electrical Engineering & ITRC of Quantum Computing for AI, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Qunova Computing, Incorporated, 193 Munji-ro, Yuseong-gu, Daejeon 34051, Republic of Korea
| | - June-Koo Kevin Rhee
- School of Electrical Engineering & ITRC of Quantum Computing for AI, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Qunova Computing, Incorporated, 193 Munji-ro, Yuseong-gu, Daejeon 34051, Republic of Korea
| |
Collapse
|
23
|
Hickman RJ, Bannigan P, Bao Z, Aspuru-Guzik A, Allen C. Self-driving laboratories: A paradigm shift in nanomedicine development. MATTER 2023; 6:1071-1081. [PMID: 37020832 PMCID: PMC9993483 DOI: 10.1016/j.matt.2023.02.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Nanomedicines have transformed promising therapeutic agents into clinically approved medicines with optimal safety and efficacy profiles. This is exemplified by the mRNA vaccines against COVID-19, which were made possible by lipid nanoparticle technology. Despite the success of nanomedicines to date, their design remains far from trivial, in part due to the complexity associated with their preclinical development. Herein, we propose a nanomedicine materials acceleration platform (NanoMAP) to streamline the preclinical development of these formulations. NanoMAP combines high-throughput experimentation with state-of-the-art advances in artificial intelligence (including active learning and few-shot learning) as well as a web-based application for data sharing. The deployment of NanoMAP requires interdisciplinary collaboration between leading figures in drug delivery and artificial intelligence to enable this data-driven design approach. The proposed approach will not only expedite the development of next-generation nanomedicines but also encourage participation of the pharmaceutical science community in a large data curation initiative.
Collapse
Affiliation(s)
- Riley J Hickman
- Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
| | - Pauric Bannigan
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
| | - Zeqing Bao
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
| | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Toronto, ON M5S 1M1, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, ON M5S 3E5, Canada
- Department of Materials Science & Engineering, University of Toronto, Toronto, ON M5S 3E4, Canada
- CIFAR Artificial Intelligence Research Chair, Vector Institute, Toronto, ON M5S 1M1, Canada
| | - Christine Allen
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
| |
Collapse
|
24
|
Gurnani R, Kuenneth C, Toland A, Ramprasad R. Polymer Informatics at Scale with Multitask Graph Neural Networks. CHEMISTRY OF MATERIALS : A PUBLICATION OF THE AMERICAN CHEMICAL SOCIETY 2023; 35:1560-1567. [PMID: 36873627 PMCID: PMC9979603 DOI: 10.1021/acs.chemmater.2c02991] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/29/2022] [Revised: 02/03/2023] [Indexed: 06/18/2023]
Abstract
Artificial intelligence-based methods are becoming increasingly effective at screening libraries of polymers down to a selection that is manageable for experimental inquiry. The vast majority of presently adopted approaches for polymer screening rely on handcrafted chemostructural features extracted from polymer repeat units-a burdensome task as polymer libraries, which approximate the polymer chemical search space, progressively grow over time. Here, we demonstrate that directly "machine learning" important features from a polymer repeat unit is a cheap and viable alternative to extracting expensive features by hand. Our approach-based on graph neural networks, multitask learning, and other advanced deep learning techniques-speeds up feature extraction by 1-2 orders of magnitude relative to presently adopted handcrafted methods without compromising model accuracy for a variety of polymer property prediction tasks. We anticipate that our approach, which unlocks the screening of truly massive polymer libraries at scale, will enable more sophisticated and large scale screening technologies in the field of polymer informatics.
Collapse
|
25
|
Bradford G, Lopez J, Ruza J, Stolberg MA, Osterude R, Johnson JA, Gomez-Bombarelli R, Shao-Horn Y. Chemistry-Informed Machine Learning for Polymer Electrolyte Discovery. ACS CENTRAL SCIENCE 2023; 9:206-216. [PMID: 36844492 PMCID: PMC9951296 DOI: 10.1021/acscentsci.2c01123] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Indexed: 06/18/2023]
Abstract
Solid polymer electrolytes (SPEs) have the potential to improve lithium-ion batteries by enhancing safety and enabling higher energy densities. However, SPEs suffer from significantly lower ionic conductivity than liquid and solid ceramic electrolytes, limiting their adoption in functional batteries. To facilitate more rapid discovery of high ionic conductivity SPEs, we developed a chemistry-informed machine learning model that accurately predicts ionic conductivity of SPEs. The model was trained on SPE ionic conductivity data from hundreds of experimental publications. Our chemistry-informed model encodes the Arrhenius equation, which describes temperature activated processes, into the readout layer of a state-of-the-art message passing neural network and has significantly improved accuracy over models that do not encode temperature dependence. Chemically informed readout layers are compatible with deep learning for other property prediction tasks and are especially useful where limited training data are available. Using the trained model, ionic conductivity values were predicted for several thousand candidate SPE formulations, allowing us to identify promising candidate SPEs. We also generated predictions for several different anions in poly(ethylene oxide) and poly(trimethylene carbonate), demonstrating the utility of our model in identifying descriptors for SPE ionic conductivity.
Collapse
Affiliation(s)
- Gabriel Bradford
- Department
of Mechanical Engineering, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
| | - Jeffrey Lopez
- Research
Laboratory of Electronics, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
| | - Jurgis Ruza
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
| | - Michael A. Stolberg
- Department
of Chemistry, Massachusetts Institute of
Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
| | - Richard Osterude
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
| | - Jeremiah A. Johnson
- Department
of Chemistry, Massachusetts Institute of
Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
| | - Rafael Gomez-Bombarelli
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
| | - Yang Shao-Horn
- Department
of Mechanical Engineering, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
- Department
of Materials Science and Engineering, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts02139, United States
| |
Collapse
|
26
|
Martin TB, Audus DJ. Emerging Trends in Machine Learning: A Polymer Perspective. ACS POLYMERS AU 2023. [DOI: 10.1021/acspolymersau.2c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Affiliation(s)
- Tyler B. Martin
- National Institute of Standards and Technology, Gaithersburg, Maryland20899, United States
| | - Debra J. Audus
- National Institute of Standards and Technology, Gaithersburg, Maryland20899, United States
| |
Collapse
|
27
|
Bhat V, Sornberger P, Pokuri BSS, Duke R, Ganapathysubramanian B, Risko C. Electronic, redox, and optical property prediction of organic π-conjugated molecules through a hierarchy of machine learning approaches. Chem Sci 2022; 14:203-213. [PMID: 36605753 PMCID: PMC9769113 DOI: 10.1039/d2sc04676h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022] Open
Abstract
Accelerating the development of π-conjugated molecules for applications such as energy generation and storage, catalysis, sensing, pharmaceuticals, and (semi)conducting technologies requires rapid and accurate evaluation of the electronic, redox, or optical properties. While high-throughput computational screening has proven to be a tremendous aid in this regard, machine learning (ML) and other data-driven methods can further enable orders of magnitude reduction in time while at the same time providing dramatic increases in the chemical space that is explored. However, the lack of benchmark datasets containing the electronic, redox, and optical properties that characterize the diverse, known chemical space of organic π-conjugated molecules limits ML model development. Here, we present a curated dataset containing 25k molecules with density functional theory (DFT) and time-dependent DFT (TDDFT) evaluated properties that include frontier molecular orbitals, ionization energies, relaxation energies, and low-lying optical excitation energies. Using the dataset, we train a hierarchy of ML models, ranging from classical models such as ridge regression to sophisticated graph neural networks, with molecular SMILES representation as input. We observe that graph neural networks augmented with contextual information allow for significantly better predictions across a wide array of properties. Our best-performing models also provide an uncertainty quantification for the predictions. To democratize access to the data and trained models, an interactive web platform has been developed and deployed.
Collapse
Affiliation(s)
- Vinayak Bhat
- Department of Chemistry and Center for Applied Energy Research, University of Kentucky Lexington Kentucky 40506 USA
| | - Parker Sornberger
- Department of Chemistry and Center for Applied Energy Research, University of Kentucky Lexington Kentucky 40506 USA
| | - Balaji Sesha Sarath Pokuri
- Department of Mechanical Engineering and Translational AI Center, Iowa State University Ames Iowa 50010 USA
| | - Rebekah Duke
- Department of Chemistry and Center for Applied Energy Research, University of Kentucky Lexington Kentucky 40506 USA
| | | | - Chad Risko
- Department of Chemistry and Center for Applied Energy Research, University of Kentucky Lexington Kentucky 40506 USA
| |
Collapse
|
28
|
Antoniuk ER, Li P, Kailkhura B, Hiszpanski AM. Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions. J Chem Inf Model 2022; 62:5435-5445. [PMID: 36315033 DOI: 10.1021/acs.jcim.2c00875] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Accurately predicting new polymers' properties with machine learning models apriori to synthesis has potential to significantly accelerate new polymers' discovery and development. However, accurately and efficiently capturing polymers' complex, periodic structures in machine learning models remains a grand challenge for the polymer cheminformatics community. Specifically, there has yet to be an ideal solution for the problems of how to capture the periodicity of polymers, as well as how to optimally develop polymer descriptors without requiring human-based feature design. In this work, we tackle these problems by utilizing a periodic polymer graph representation that accounts for polymers' periodicity and coupling it with a message-passing neural network that leverages the power of graph deep learning to automatically learn chemically relevant polymer descriptors. Remarkably, this approach achieves state-of-the-art performance on 8 out of 10 distinct polymer property prediction tasks. These results highlight the advancement in predictive capability that is possible through learning descriptors that are specifically optimized for capturing the unique chemical structure of polymers.
Collapse
Affiliation(s)
- Evan R Antoniuk
- Materials Science Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California94550-5507, United States
| | - Peggy Li
- Global Security Computing Applications Division, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California94550-5507, United States
| | - Bhavya Kailkhura
- Machine Intelligence Group/Center for Applied Scientific Computing, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California94550-5507, United States
| | - Anna M Hiszpanski
- Materials Science Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California94550-5507, United States
| |
Collapse
|