1
Gormley AJ. Machine learning in drug delivery. J Control Release 2024; 373:23-30. [PMID: 38909704 PMCID: PMC11384327 DOI: 10.1016/j.jconrel.2024.06.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 06/17/2024] [Accepted: 06/19/2024] [Indexed: 06/25/2024]
Abstract
For decades, drug delivery scientists have been performing trial-and-error experimentation to manually sample parameter spaces and optimize release profiles through rational design. To enable this approach, scientists spend much of their career learning nuanced drug-material interactions that drive system behavior. In relatively simple systems, rational design criteria allow us to fine-tune release profiles and enable efficacious therapies. However, as materials and drugs become increasingly sophisticated and their interactions have non-linear and compounding effects, the field is suffering from the curse of dimensionality, which prevents us from comprehending complex structure-function relationships. In the past, we have embraced this complexity by implementing high-throughput screens to increase the probability of finding ideal compositions. However, this brute-force method was inefficient and led many to abandon these fishing expeditions. Fortunately, methods in data science including artificial intelligence/machine learning (AI/ML) are providing ideal analytical tools to model this complex data and ascertain quantitative structure-function relationships. In this Oration, I speak to the potential value of data science in drug delivery with particular focus on polymeric delivery systems. Here, I do not suggest that AI/ML will simply replace mechanistic understanding of complex systems. Rather, I propose that AI/ML should be yet another useful tool in the lab to navigate complex parameter spaces. The recent hype around AI/ML is breathtaking and potentially overinflated, but these methods are poised to revolutionize how we perform science. Therefore, I encourage readers to consider adopting these skills and applying data science methods to their own problems. If done successfully, I believe we will all realize a paradigm shift in our approach to drug delivery.
Affiliation(s)
- Adam J Gormley
  - Associate Professor, Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States.
2
Rebello NJ, Arora A, Mochigase H, Lin TS, Shi J, Audus DJ, Muckley ES, Osmani A, Olsen BD. The Block Copolymer Phase Behavior Database. J Chem Inf Model 2024; 64:6464-6476. [PMID: 39126359 DOI: 10.1021/acs.jcim.4c00242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2024]
Abstract
The Block Copolymer Database (BCDB) is a platform that allows users to search, submit, visualize, benchmark, and download experimental phase measurements and their associated characterization information for di- and multiblock copolymers. To the best of our knowledge, there is no widely accepted data model for publishing experimental and simulation data on block copolymer self-assembly. The proposed data schema, which carries traceable information, can accommodate any number of blocks. At the time of publication, it contains over 5400 total block copolymer melt phase measurements that were mined from the literature and manually curated, together with simulation data points of the phase diagram generated from self-consistent field theory, which can rapidly be augmented. This database can be accessed via the Community Resource for Innovation in Polymer Technology (CRIPT) web application and the Materials Data Facility. The chemical structure of the polymer is encoded in BigSMILES, an extension of the Simplified Molecular-Input Line-Entry System (SMILES) into the macromolecular domain, and the user can search repeat units and functional groups using the SMARTS search syntax (SMILES Arbitrary Target Specification). The user can also query characterization and phase information using Structured Query Language (SQL) and download custom sets of block copolymer data to train machine learning models. Finally, a protocol is presented in which GPT-4, an AI-powered large language model, can be used to rapidly screen and identify block copolymer papers from the literature using only the abstract text and determine whether they have BCDB data, allowing the database to grow as the number of published papers on the World Wide Web increases. The F1 score for this model is 0.74. This platform is an important step in making polymer data more accessible to the broader community.
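For readers less familiar with the metric quoted in this abstract, the F1 score is the harmonic mean of precision and recall. The following is a minimal Python sketch of that definition. The function name and the screening counts are hypothetical and chosen only for illustration; the abstract reports only the final score, not the underlying counts.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score from true-positive, false-positive and false-negative counts."""
    # Precision: fraction of papers flagged as relevant that really are relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of truly relevant papers that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (not from the paper): 37 correct hits,
# 13 false alarms, 13 missed papers.
example = f1_score(tp=37, fp=13, fn=13)
print(round(example, 2))
```

With these illustrative counts, precision and recall are both 37/50, so the harmonic mean is also 0.74.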
Affiliation(s)
- Nathan J Rebello
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Akash Arora
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Hidenobu Mochigase
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Tzyy-Shyang Lin
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Jiale Shi
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Debra J Audus
  - Materials Science and Engineering Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Eric S Muckley
  - Citrine Informatics, Redwood City, California 94063-2483, United States
- Ardiana Osmani
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Bradley D Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
3
Tamura R, Nagata K, Sodeyama K, Nakamura K, Tokuhira T, Shibata S, Hammura K, Sugisawa H, Kawamura M, Tsurimoto T, Naito M, Demura M, Nakanishi T. Machine learning prediction of the mechanical properties of injection-molded polypropylene through X-ray diffraction analysis. Science and Technology of Advanced Materials 2024; 25:2388016. [PMID: 39156883 PMCID: PMC11328794 DOI: 10.1080/14686996.2024.2388016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 08/20/2024]
Abstract
Predicting the mechanical properties of polymer materials using machine learning is essential for the design of the next generation of polymers. However, the strong relationship between the higher-order structure of polymers and their mechanical properties hinders mechanical property predictions based on their primary structures. To incorporate information on higher-order structures into the prediction model, X-ray diffraction (XRD) can be used. This study proposes a strategy to generate appropriate descriptors from the XRD analysis of injection-molded polypropylene samples, which were prepared under almost the same injection molding conditions. To this end, first, Bayesian spectral deconvolution is used to automatically create high-dimensional descriptors. Second, informative descriptors are selected to achieve highly accurate predictions by implementing a black-box optimization method using an Ising machine. This approach was applied to custom-built polymer datasets containing data on homopolypropylene and derived composite polymers with the addition of elastomers. Results show that reasonable accuracy of predictions for seven mechanical properties can be achieved using only XRD.
Affiliation(s)
- Ryo Tamura
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Center for Basic Research on Materials, National Institute for Materials Science, Ibaraki, Japan
- Kenji Nagata
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Center for Basic Research on Materials, National Institute for Materials Science, Ibaraki, Japan
- Keitaro Sodeyama
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Center for Basic Research on Materials, National Institute for Materials Science, Ibaraki, Japan
- Kensaku Nakamura
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Process Technology Laboratory, R&D Center, Mitsui Chemicals, Inc., Chiba, Japan
- Toshiki Tokuhira
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Process Technology Laboratory, R&D Center, Mitsui Chemicals, Inc., Chiba, Japan
- Satoshi Shibata
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Essential Chemicals Research Laboratory, Sumitomo Chemical Co., Chiba, Japan
- Kazuki Hammura
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Platform Laboratory for Science & Technology, Asahi KASEI Corporation, Shizuoka, Japan
- Hiroki Sugisawa
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Science & Innovation Center, Mitsubishi Chemical Corporation, Kanagawa, Japan
- Masaya Kawamura
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Science & Innovation Center, Mitsubishi Chemical Corporation, Kanagawa, Japan
- Teruki Tsurimoto
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Science & Innovation Center, Mitsubishi Chemical Corporation, Kanagawa, Japan
- Masanobu Naito
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Research Center for Macromolecules and Biomaterials, National Institute for Materials Science, Ibaraki, Japan
- Masahiko Demura
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Center for Basic Research on Materials, National Institute for Materials Science, Ibaraki, Japan
  - Research Network and Facility Services Division, National Institute for Materials Science, Ibaraki, Japan
- Takashi Nakanishi
  - Materials Open Platform for Chemistry, National Institute for Materials Science, Ibaraki, Japan
  - Research Center for Materials Nanoarchitectonics (MANA), National Institute for Materials Science, Ibaraki, Japan
4
Wu T, Zhou M, Zou J, Chen Q, Qian F, Kurths J, Liu R, Tang Y. AI-guided few-shot inverse design of HDP-mimicking polymers against drug-resistant bacteria. Nat Commun 2024; 15:6288. [PMID: 39060236 PMCID: PMC11282099 DOI: 10.1038/s41467-024-50533-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 07/11/2024] [Indexed: 07/28/2024] Open
Abstract
Host defense peptide (HDP)-mimicking polymers are promising therapeutic alternatives to antibiotics and have large-scale untapped potential. Artificial intelligence (AI) exhibits promising performance on large-scale chemical-content design; however, existing AI methods face difficulties with the scarcity of data in each family of HDP-mimicking polymers (<10²), much smaller than public polymer datasets (>10⁵), and with multiple constraints on properties and structures when exploring high-dimensional polymer space. Herein, we develop a universal AI-guided few-shot inverse design framework by designing multi-modal representations to enrich polymer information for predictions and creating a graph grammar distillation for chemical space restriction to improve the efficiency of multi-constrained polymer generation with reinforcement learning. Using HDP-mimicking β-amino acid polymers as an example, we successfully simulate predictions of over 10⁵ polymers and identify 83 optimal polymers. Furthermore, we synthesize an optimal polymer DM0.8iPen0.2 and find that this polymer exhibits broad-spectrum and potent antibacterial activity against multiple clinically isolated antibiotic-resistant pathogens, validating the effectiveness of the AI-guided design strategy.
Affiliation(s)
- Tianyu Wu
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Min Zhou
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Jingcheng Zou
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Qi Chen
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Feng Qian
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Jürgen Kurths
  - Potsdam Institute for Climate Impact Research (PIK), Potsdam, 14473, Germany
  - Institut für Physik, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
  - The Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, 200433, China
- Runhui Liu
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Yang Tang
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
5
Huang Y, Zhong S, Gan L, Chen Y. Development of Machine Learning Models for Ion-Selective Electrode Cation Sensor Design. ACS ES&T Engineering 2024; 4:1702-1711. [PMID: 39021402 PMCID: PMC11250033 DOI: 10.1021/acsestengg.4c00087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 03/15/2024] [Accepted: 03/15/2024] [Indexed: 07/20/2024]
Abstract
Polyvinyl chloride (PVC) membrane-based ion-selective electrode (ISE) sensors are common tools for water assessments, but their development relies on time-consuming and costly experimental investigations. To address this challenge, this study combines machine learning (ML), Morgan fingerprint, and Bayesian optimization technologies with experimental results to develop high-performance PVC-based ISE cation sensors. By using 1745 data sets collected from 20 years of literature, appropriate ML models are trained to enable accurate prediction and a deep understanding of the relationship between ISE components and sensor performance (R² = 0.75). Rapid ionophore screening is achieved using the Morgan fingerprint based on atomic groups derived from ML model interpretation. Bayesian optimization is then applied to identify optimal combinations of ISE materials with the potential to deliver desirable ISE sensor performance. Na⁺, Mg²⁺, and Al³⁺ sensors fabricated from Bayesian optimization results exhibit excellent Nernst slopes with less than 8.2% deviation from the ideal value and superb detection limits at the 10⁻⁷ M level based on experimental validation results. This approach can potentially transform sensor development into a more time-efficient, cost-effective, and rational design process, guided by ML-based techniques.
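The "ideal value" of the Nernst slope mentioned in this abstract follows from the Nernst equation: for an ion of charge z at temperature T, the slope is (R T ln 10) / (z F) volts per decade of activity. The sketch below computes that ideal slope for the three cations named in the abstract and the percent deviation of a measured slope from it. The function names and the 25 °C default are assumptions made for illustration, and no measured values from the paper are reproduced here.

```python
import math

R = 8.314462618   # molar gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def ideal_nernst_slope_mv(charge: int, temp_k: float = 298.15) -> float:
    """Ideal Nernst slope in mV per decade for an ion of the given charge."""
    return 1000.0 * R * temp_k * math.log(10) / (charge * F)


def percent_deviation(measured_mv: float, charge: int, temp_k: float = 298.15) -> float:
    """Absolute deviation of a measured slope from the ideal slope, in percent."""
    ideal = ideal_nernst_slope_mv(charge, temp_k)
    return 100.0 * abs(measured_mv - ideal) / ideal


# Ideal slopes at 25 degrees C (assumed temperature) for the cations in the abstract.
for name, z in [("Na+", 1), ("Mg2+", 2), ("Al3+", 3)]:
    print(f"{name}: {ideal_nernst_slope_mv(z):.2f} mV/decade")
```

Because the ideal slope scales as 1/z, the same absolute error in millivolts is a larger relative deviation for Mg²⁺ and Al³⁺ than for Na⁺.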
Affiliation(s)
- Yuankai Huang
  - School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Shifa Zhong
  - School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
  - Department of Environmental Science, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, China
- Lan Gan
  - School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Yongsheng Chen
  - School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
6
Kehrein J, Bunker A, Luxenhofer R. POxload: Machine Learning Estimates Drug Loadings of Polymeric Micelles. Mol Pharm 2024; 21:3356-3374. [PMID: 38805643 PMCID: PMC11394009 DOI: 10.1021/acs.molpharmaceut.4c00086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2024]
Abstract
Block copolymers, composed of poly(2-oxazoline)s and poly(2-oxazine)s, can serve as drug delivery systems; they form micelles that carry poorly water-soluble drugs. Many recent studies have investigated the effects of structural changes of the polymer and the hydrophobic cargo on drug loading. In this work, we combine these data to establish an extended formulation database. Different molecular properties and fingerprints are tested for their applicability to serve as formulation-specific mixture descriptors. A variety of classification and regression models are built for different descriptor subsets and thresholds of loading efficiency and loading capacity, with the best models achieving overall good statistics for both cross- and external validation (balanced accuracies of 0.8). Subsequently, important features are dissected for interpretation, and the DrugBank is screened for potential therapeutic use cases where these polymers could be used to develop novel formulations of hydrophobic drugs. The most promising models are provided as an open-source software tool for other researchers to test the applicability of these delivery systems for potential new drug candidates.
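Balanced accuracy, the statistic quoted in this abstract, is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The sketch below illustrates the definition for a binary classifier. The function name and the confusion-matrix counts are hypothetical and are not taken from the POxload study.

```python
def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    if (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("both classes must be present in the evaluation set")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical, imbalanced example: few "high loading" formulations (positives).
tp, fn, tn, fp = 5, 5, 88, 2
plain_accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"plain accuracy:    {plain_accuracy:.3f}")
print(f"balanced accuracy: {balanced_accuracy(tp, tn, fp, fn):.3f}")
```

In this made-up case, plain accuracy looks high because negatives dominate, while balanced accuracy exposes that half of the positives were missed.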
Affiliation(s)
- Josef Kehrein
  - Soft Matter Chemistry, Department of Chemistry, Faculty of Science, University of Helsinki, A. I. Virtasen aukio 1, 00014 Helsinki, Finland
  - Drug Research Program, Division of Pharmaceutical Biosciences Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, 00014 Helsinki, Finland
- Alex Bunker
  - Drug Research Program, Division of Pharmaceutical Biosciences Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, 00014 Helsinki, Finland
- Robert Luxenhofer
  - Soft Matter Chemistry, Department of Chemistry, Faculty of Science, University of Helsinki, A. I. Virtasen aukio 1, 00014 Helsinki, Finland
7
Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024; 64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]
Abstract
By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and utilize this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing endeavors in adapting these techniques to the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitates a comprehensive summarization of existing works. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data is diverse and complex, we structure our discussion based on chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress of adapting transformer architectures, and future directions.
Affiliation(s)
- Kha-Dinh Luong
  - Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
- Ambuj Singh
  - Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
8
Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024; 45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]
Abstract
Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized; a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to the creative and empirical nature of the domain of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules be synthesized through fewer empirical attempts, rather than a larger library, to realize an active candidate. On this front, predictive endeavors using machine learning (ML) models built on available data acquire timely significance. Prediction of molecular properties and reaction outcomes remains one of the burgeoning applications of ML in chemical science. Among several methods of encoding molecular samples for ML models, the ones that employ language-like representations are gaining steady popularity. Such representations would additionally help adopt well-developed natural language processing (NLP) models for chemical applications. Given this advantageous background, herein we describe several successful chemical applications of NLP focusing on molecular property and reaction outcome predictions. From relatively simple recurrent neural networks (RNNs) to complex models like transformers, different network architectures have been leveraged for tasks such as de novo drug design, catalyst generation, and forward and retro-synthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time- and cost-effective manner. While we showcase an optimistic outlook of CLMs, attention is also placed on the persisting challenges in the reaction domain, which will hopefully be addressed by advanced algorithms tailored to chemical language and by increased availability of high-quality datasets.
Affiliation(s)
- Manajit Das
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Ankit Ghosh
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Raghavan B Sunoj
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
  - Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai, India
9
Choi S, Lee J, Seo J, Han SW, Lee SH, Seo JH, Seok J. Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules. Sci Data 2024; 11:371. [PMID: 38605036 PMCID: PMC11009387 DOI: 10.1038/s41597-024-03212-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 04/02/2024] [Indexed: 04/13/2024] Open
Abstract
The simplified molecular-input line-entry system (SMILES) has been utilized in a variety of artificial intelligence analyses owing to its capability of representing chemical structures using line notation. However, its ease of representation is limited, which has led to the proposal of BigSMILES as an alternative method suitable for the representation of macromolecules. Nevertheless, research on BigSMILES remains limited due to its preprocessing requirements. Thus, this study proposes a conversion workflow for BigSMILES, focusing on its automated generation from SMILES representations of homopolymers. BigSMILES representations for 4,927,181 records are provided, thereby enabling their immediate use for various research and development applications. Our study presents detailed descriptions of a validation process to ensure the accuracy, interchangeability, and robustness of the conversion. Additionally, a systematic overview of the codes and functions used, emphasizing their relevance in the context of BigSMILES generation, is provided. This advancement is anticipated to significantly aid researchers and facilitate further studies in BigSMILES representation, including potential applications in deep learning and further extension to complex structures such as copolymers.
Affiliation(s)
- Sunho Choi
  - School of Electrical Engineering, Korea University, Seoul, South Korea
- Joonbum Lee
  - Department of Materials Science and Engineering, Korea University, Seoul, South Korea
- Jangwon Seo
  - School of Electrical Engineering, Korea University, Seoul, South Korea
- Sung Won Han
  - School of Industrial Management Engineering, Korea University, Seoul, South Korea
- Sang Hyun Lee
  - School of Electrical Engineering, Korea University, Seoul, South Korea
- Ji-Hun Seo
  - Department of Materials Science and Engineering, Korea University, Seoul, South Korea
- Junhee Seok
  - School of Electrical Engineering, Korea University, Seoul, South Korea
10
Jintoku H, Futaba DN. Machine Learning-Assisted Exploration and Identification of Aqueous Dispersants in the Vast Diversity of Organic Chemicals. ACS Applied Materials & Interfaces 2024; 16:11800-11808. [PMID: 38390722 DOI: 10.1021/acsami.3c18612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2024]
Abstract
Dispersion represents a central processing method in the organization of nanomaterials; however, the strong interparticle interaction represents a significant obstacle to fabricating homogeneous and stable dispersions. While dispersants can greatly assist in overcoming this obstacle, the appropriate type depends on factors such as the nanomaterial, solvent, and experimental conditions, and there is no general guide to assist in the selection from the vast number of possibilities. We report a strategy and successful demonstration of the machine-learning-based "Dispersant Explorer", which surveys and identifies suitable dispersants from open databases. Through the combined use of experimental and molecular descriptors derived from SMILES databases, the model showed exceptional predictive accuracy in surveying ∼1000 chemical compounds and identifying those that could be applied as dispersants. Furthermore, fabrication of transparent conducting films using the predicted and previously unknown dispersant exhibited the highest sheet resistance and transmittance compared with those of other reported undoped films. This result highlights that, in addition to opening new avenues for novel dispersant discovery, machine learning has the potential to elucidate the chemical structures essential for optimal dispersion performance to assist in the advancement of the complex topic of nanomaterial processing.
Affiliation(s)
- Hirokuni Jintoku
  - Nano Carbon Device Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba Central 5, 1-1-1 Higashi, Tsukuba 305-8565, Ibaraki, Japan
- Don N Futaba
  - Nano Carbon Device Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba Central 5, 1-1-1 Higashi, Tsukuba 305-8565, Ibaraki, Japan
11
Davel CM, Bernat T, Wagner JR, Shirts MR. Parameterization of General Organic Polymers within the Open Force Field Framework. J Chem Inf Model 2024; 64:1290-1305. [PMID: 38303159 PMCID: PMC11090695 DOI: 10.1021/acs.jcim.3c01691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Polymer and chemically modified biopolymer systems present unique challenges to traditional molecular simulation preparation workflows. First, typical polymer and biomolecular input formats, such as Protein Data Bank (PDB) files, lack adequate chemical information needed for the parameterization of new chemistries. Second, polymers are typically too large for accurate partial charge generation methods. In this work, we employ direct chemical perception through the Open Force Field toolkit to create a flexible polymer simulation workflow for organic polymers, encompassing everything from biopolymers to soft materials. We propose and test a new input specification for monomer information that can, along with a 3D conformational geometry, parametrize and simulate most soft-material systems within the same workflow used for smaller ligands. The monomer format encompasses a subset of the SMIRKS substructure query language to uniquely identify chemical information and repeating charges in underspecified systems through matching atomic connectivity. This workflow is combined with several different approaches for automatic partial-charge generation for larger systems. As an initial proof of concept, a variety of diverse polymeric systems were parametrized with the Open Force Field toolkit, including functionalized proteins, DNA, homopolymers, cross-linked systems, and sugars. Additionally, shape properties and radial distribution functions were computed from molecular dynamics simulations of poly(ethylene glycol), polyacrylamide, and poly(N-isopropylacrylamide) homopolymers in aqueous solution and compared to previous simulation results in order to demonstrate a start-to-finish workflow for simulation and property prediction. We expect that these tools will greatly expedite the day-to-day computational research of soft-matter simulations and create a robust atomic-scale polymer specification in conjunction with existing polymer structural notations.
Affiliation(s)
- Connor M Davel
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Timotej Bernat
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Jeffrey R Wagner
  - The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
- Michael R Shirts
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States

12
Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS Appl Bio Mater 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Affiliation(s)
- Roshan A Patel
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
- Michael A Webb
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States

13
Shi J, Walsh D, Zou W, Rebello NJ, Deagen ME, Fransen KA, Gao X, Olsen BD, Audus DJ. Calculating Pairwise Similarity of Polymer Ensembles via Earth Mover's Distance. ACS Polym Au 2024; 4:66-76. [PMID: 38371731 PMCID: PMC10870752 DOI: 10.1021/acspolymersau.3c00029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 02/20/2024]
Abstract
Synthetic polymers, in contrast to small molecules and deterministic biomacromolecules, are typically ensembles composed of polymer chains with varying numbers, lengths, sequences, chemistry, and topologies. While numerous approaches exist for measuring pairwise similarity among small molecules and sequence-defined biomacromolecules, accurately determining the pairwise similarity between two polymer ensembles remains challenging. This work proposes the earth mover's distance (EMD) metric to calculate the pairwise similarity score between two polymer ensembles. EMD offers a greater resolution of chemical differences between polymer ensembles than the averaging method and provides a quantitative numeric value representing the pairwise similarity between polymer ensembles in alignment with chemical intuition. The EMD approach for assessing polymer similarity enhances the development of accurate chemical search algorithms within polymer databases and can improve machine learning techniques for polymer design, optimization, and property prediction.
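As a rough illustration of the idea in the abstract above, the following sketch computes an earth mover's distance between two polymer ensembles in the simplest one-dimensional case. It is not the authors' implementation: the paper uses a chemical distance between ensemble components, whereas this sketch assumes the ground distance is simply the difference in chain length (number of repeat units). The function name `emd_1d` and the example ensembles are hypothetical.

```python
from itertools import accumulate


def emd_1d(weights_a, weights_b):
    """Earth mover's distance between two histograms on the same unit-spaced bins.

    Assumption (not from the paper): the ground distance between bins i and j
    is |i - j|, e.g. a difference in chain length. In this one-dimensional
    case the EMD equals the summed absolute difference of the two CDFs.
    """
    if len(weights_a) != len(weights_b):
        raise ValueError("histograms must share the same bins")
    total_a, total_b = sum(weights_a), sum(weights_b)
    # Normalise so that each ensemble carries a total mass of 1.
    pa = [w / total_a for w in weights_a]
    pb = [w / total_b for w in weights_b]
    cdf_a = list(accumulate(pa))
    cdf_b = list(accumulate(pb))
    return sum(abs(x - y) for x, y in zip(cdf_a, cdf_b))


# Hypothetical ensembles: mole fractions of chains with 1..4 repeat units.
short_chains = [0.5, 0.5, 0.0, 0.0]
long_chains = [0.0, 0.0, 0.5, 0.5]
print(emd_1d(short_chains, short_chains))  # identical ensembles
print(emd_1d(short_chains, long_chains))   # every chain shifted by two units

# Two ensembles with the same average chain length but different dispersity.
narrow = [0.0, 1.0, 0.0]
broad = [0.5, 0.0, 0.5]
print(emd_1d(narrow, broad))
```

The last comparison hints at why a distribution-level metric can resolve differences that an averaging method misses: `narrow` and `broad` have the same mean chain length, yet their distance is nonzero.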
Affiliation(s)
- Jiale Shi
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Dylan Walsh
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Weizhong Zou
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Nathan J. Rebello
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael E. Deagen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Katharina A. Fransen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xian Gao
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Bradley D. Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Debra J. Audus
  - Materials Science and Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States

14
Tiwari SP, Shi W, Budhathoki S, Baker J, Sekizkardes AK, Zhu L, Kusuma VA, Hopkinson DP, Steckel JA. Creation of Polymer Datasets with Targeted Backbones for Screening of High-Performance Membranes for Gas Separation. J Chem Inf Model 2024; 64:638-652. [PMID: 38294781 DOI: 10.1021/acs.jcim.3c01232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
A simple approach was developed to computationally construct a polymer dataset by combining simplified molecular-input line-entry system (SMILES) strings of a targeted polymer backbone and a variety of molecular fragments. This method was used to create 14 polymer datasets by combining seven polymer backbones and molecules from two large molecular datasets (MOSES and QM9). Polymer backbones that were studied include four polydimethylsiloxane (PDMS) based backbones, poly(ethylene oxide) (PEO), poly(allyl glycidyl ether) (PAGE), and polyphosphazene (PPZ). The generated polymer datasets can be used for various cheminformatics tasks, including high-throughput screening for gas permeability and selectivity. This study utilized machine learning (ML) models to screen the polymers for CO2/CH4 and CO2/N2 gas separation using membranes. Several polymers of interest were identified. The results highlight that employing an ML model fitted to polymer selectivities leads to higher accuracy in predicting polymer selectivity compared to using the ratio of predicted permeabilities.
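The dataset construction described in the abstract above (combining a backbone SMILES string with many fragment SMILES strings) can be sketched with plain string substitution. This is a minimal sketch under stated assumptions, not the authors' code: the backbone templates, the `{R}` placeholder convention, and the fragment list are all hypothetical, and no chemical validity check is performed (a cheminformatics toolkit would be needed for that).

```python
from itertools import product

# Hypothetical backbone templates: "{R}" marks where a fragment is attached and
# "*" marks the repeat-unit end points. These are illustrative only, not the
# backbones or attachment rules used in the paper.
BACKBONES = {
    "PEO-like": "*CC({R})O*",
    "PDMS-like": "*[Si](C)({R})O*",
}

# Hypothetical fragments, written so that the first atom is the attachment atom.
FRAGMENTS = ["C", "CCO", "c1ccccc1"]


def build_polymer_dataset(backbones, fragments):
    """Return (backbone name, fragment, polymer SMILES) for every combination."""
    return [
        (name, frag, template.replace("{R}", frag))
        for (name, template), frag in product(backbones.items(), fragments)
    ]


dataset = build_polymer_dataset(BACKBONES, FRAGMENTS)
for row in dataset:
    print(row)
```

With two backbones and three fragments the sketch yields six candidate repeat units; in the paper, seven backbones were combined with two large molecular datasets (MOSES and QM9) to give 14 polymer datasets.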
Affiliation(s)
- Surya Prakash Tiwari
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
  - NETL Support Contractor, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
- Wei Shi
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
- Samir Budhathoki
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
  - NETL Support Contractor, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
- James Baker
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
  - NETL Support Contractor, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
- Ali K Sekizkardes
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
  - NETL Support Contractor, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
- Lingxiang Zhu
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
  - NETL Support Contractor, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
- Victor A Kusuma
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
  - NETL Support Contractor, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
- David P Hopkinson
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States
- Janice A Steckel
  - National Energy Technology Laboratory, 626 Cochran Mill Road, Pittsburgh, Pennsylvania 15236, United States

15
Qiu H, Liu L, Qiu X, Dai X, Ji X, Sun ZY. PolyNC: a natural and chemical language model for the prediction of unified polymer properties. Chem Sci 2024; 15:534-544. [PMID: 38179518 PMCID: PMC10763023 DOI: 10.1039/d3sc05079c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 12/04/2023] [Indexed: 01/06/2024] Open
Abstract
Language models exhibit a profound aptitude for addressing multimodal and multidomain challenges, a competency that eludes the majority of off-the-shelf machine learning models. Consequently, language models hold great potential for comprehending the intricate interplay between material compositions and diverse properties, thereby accelerating material design, particularly in the realm of polymers. While past limitations in polymer data hindered the use of data-intensive language models, the growing availability of standardized polymer data and effective data augmentation techniques now opens doors to previously uncharted territories. Here, we present a revolutionary model to enable rapid and precise prediction of Polymer properties via the power of Natural language and Chemical language (PolyNC). To showcase the efficacy of PolyNC, we have meticulously curated a labeled prompt-structure-property corpus encompassing 22 970 polymer data points on a series of essential polymer properties. Through the use of natural language prompts, PolyNC gains a comprehensive understanding of polymer properties, while employing chemical language (SMILES) to describe polymer structures. In a unified text-to-text manner, PolyNC consistently demonstrates exceptional performance on both regression tasks (such as property prediction) and the classification task (polymer classification). Simultaneous and interactive multitask learning enables PolyNC to holistically grasp the structure-property relationships of polymers. Through a combination of experiments and characterizations, the generalization ability of PolyNC has been demonstrated, with attention analysis further indicating that PolyNC effectively learns structural information about polymers from multimodal inputs. 
This work provides compelling evidence of the potential for deploying end-to-end language models in polymer research, representing a significant advancement in the AI community's dedicated pursuit of advancing polymer science.
Affiliation(s)
- Haoke Qiu
  - State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
  - School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
- Lunyang Liu
  - State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- Xuepeng Qiu
  - School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
  - CAS Key Laboratory of High-Performance Synthetic Rubber and its Composite Materials, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- Xuemin Dai
  - CAS Key Laboratory of High-Performance Synthetic Rubber and its Composite Materials, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- Xiangling Ji
  - State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
  - School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
- Zhao-Yan Sun
  - State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
  - School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China

16
Day EC, Chittari SS, Bogen MP, Knight AS. Navigating the Expansive Landscapes of Soft Materials: A User Guide for High-Throughput Workflows. ACS Polym Au 2023; 3:406-427. [PMID: 38107416 PMCID: PMC10722570 DOI: 10.1021/acspolymersau.3c00025] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/15/2023] [Revised: 11/02/2023] [Accepted: 11/07/2023] [Indexed: 12/19/2023]
Abstract
Synthetic polymers are highly customizable with tailored structures and functionality, yet this versatility generates challenges in the design of advanced materials due to the size and complexity of the design space. Thus, exploration and optimization of polymer properties using combinatorial libraries has become increasingly common, which requires careful selection of synthetic strategies, characterization techniques, and rapid processing workflows to obtain fundamental principles from these large data sets. Herein, we provide guidelines for strategic design of macromolecule libraries and workflows to efficiently navigate these high-dimensional design spaces. We describe synthetic methods for multiple library sizes and structures as well as characterization methods to rapidly generate data sets, including tools that can be adapted from biological workflows. We further highlight relevant insights from statistics and machine learning to aid in data featurization, representation, and analysis. This Perspective acts as a "user guide" for researchers interested in leveraging high-throughput screening toward the design of multifunctional polymers and predictive modeling of structure-property relationships in soft materials.
Affiliation(s)
- Matthew P. Bogen
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Abigail S. Knight
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States

17
Hu J, Li Z, Lin J, Zhang L. Prediction and Interpretability of Glass Transition Temperature of Homopolymers by Data-Augmented Graph Convolutional Neural Networks. ACS Appl Mater Interfaces 2023; 15:54006-54017. [PMID: 37934171 DOI: 10.1021/acsami.3c13698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
Establishing the structure-property relationship by machine learning (ML) models is extremely valuable for accelerating the molecular design of polymers. However, existing ML models for the polymers are subject to scarcity issues of training data and fewer variations of graph structures of molecules. In addition, limited works have explored the interpretability of ML models to infer the latent knowledge in the field of polymer science that could inspire ML-assisted molecular design. In this contribution, we integrate graph convolutional neural networks (GCNs) with data augmentation strategy to predict the glass transition temperature Tg of polymers. It is demonstrated that the data-augmented GCN model outperforms the conventional models and achieves a higher accuracy for the prediction of Tg despite a small amount of training data. Furthermore, taking advantage of molecular graph representations, the data-augmented GCN model has the capability to infer the importance of atoms or substructures from the understanding of Tg, which generally agrees with the experimental findings in the field of polymer science. The inferred knowledge of the GCN model is used to advise on the design of functional polymers with specific Tg. The data-augmented GCN model possesses prominent superiorities in the establishment of structure-property relationship and also provides an efficient way for accelerating the rational design of polymer molecules.
Affiliation(s)
- Junyang Hu
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zean Li
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

18
Wilson AN, St John PC, Marin DH, Hoyt CB, Rognerud EG, Nimlos MR, Cywar RM, Rorrer NA, Shebek KM, Broadbelt LJ, Beckham GT, Crowley MF. PolyID: Artificial Intelligence for Discovering Performance-Advantaged and Sustainable Polymers. Macromolecules 2023; 56:8547-8557. [PMID: 38024155 PMCID: PMC10653284 DOI: 10.1021/acs.macromol.3c00994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 09/30/2023] [Indexed: 12/01/2023]
Abstract
A necessary transformation for a sustainable economy is the transition from fossil-derived plastics to polymers derived from biomass and waste resources. While renewable feedstocks can enhance material performance through unique chemical moieties, probing the vast material design space by experiment alone is not practically feasible. Here, we develop a machine-learning-based tool, PolyID, to reduce the design space of renewable feedstocks to enable efficient discovery of performance-advantaged, biobased polymers. PolyID is a multioutput, graph neural network specifically designed to increase accuracy and to enable quantitative structure-property relationship (QSPR) analysis for polymers. It includes a novel domain-of-validity method that was developed and applied to demonstrate how gaps in training data can be filled to improve accuracy. The model was benchmarked with both a 20% held-out subset of the original training data and 22 experimentally synthesized polymers. A mean absolute error for the glass transition temperatures of 19.8 and 26.4 °C was achieved for the test and experimental data sets, respectively. Predictions were made on polymers composed of monomers from four databases that contain biologically accessible small molecules: MetaCyc, MINEs, KEGG, and BiGG. From 1.4 × 10⁶ accessible biobased polymers, we identified five poly(ethylene terephthalate) (PET) analogues with predicted improvements to thermal and transport performance. Experimental validation for one of the PET analogues demonstrated a glass transition temperature between 85 and 112 °C, which is higher than PET and within the predicted range of the PolyID tool. In addition to accurate predictions, we show how the model's predictions are explainable through analysis of individual bond importance for a biobased nylon. Overall, PolyID can aid the biobased polymer practitioner to navigate the vast number of renewable polymers to discover sustainable materials with enhanced performance.
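The benchmarking protocol mentioned in the abstract above (a 20% held-out subset scored by mean absolute error) can be sketched as follows. This is a generic illustration, not PolyID's code: the helper names, the random seed, and the glass transition values are hypothetical placeholders rather than data from the paper.

```python
import random


def holdout_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out a fraction as a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mean_absolute_error(y_true, y_pred):
    """Average of |measurement - prediction| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical (measured Tg, predicted Tg) pairs in degrees Celsius.
records = [
    (100.0, 90.0), (50.0, 75.0), (-20.0, -5.0), (150.0, 160.0), (80.0, 60.0),
    (10.0, 30.0), (120.0, 118.0), (65.0, 40.0), (-40.0, -55.0), (200.0, 170.0),
]

train, test = holdout_split(records, test_fraction=0.2)
measured = [m for m, _ in test]
predicted = [p for _, p in test]
print(len(train), len(test))
print(mean_absolute_error(measured, predicted))
```

In a real workflow the model would be fitted on `train` only and the predictions for `test` generated afterwards; here the predictions are fixed placeholders so that the metric itself is easy to follow.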
Affiliation(s)
- A. Nolan Wilson
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Peter C. St John
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Daniela H. Marin
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Caroline B. Hoyt
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Erik G. Rognerud
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Mark R. Nimlos
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Robin M. Cywar
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Nicholas A. Rorrer
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Kevin M. Shebek
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
  - Department of Chemical and Biological Engineering and Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
  - Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois 60208, United States
- Linda J. Broadbelt
  - Department of Chemical and Biological Engineering and Center for Synthetic Biology, Northwestern University, Evanston, Illinois 60208, United States
- Gregg T. Beckham
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States
- Michael F. Crowley
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, Colorado 80401, United States

19
Rebello NJ, Lin TS, Nazeer H, Olsen BD. BigSMARTS: A Topologically Aware Query Language and Substructure Search Algorithm for Polymer Chemical Structures. J Chem Inf Model 2023; 63:6555-6568. [PMID: 37874026 DOI: 10.1021/acs.jcim.3c00978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Molecular search is important in chemistry, biology, and informatics for identifying molecular structures within large data sets, improving knowledge discovery and innovation, and making chemical data FAIR (findable, accessible, interoperable, reusable). Search algorithms for polymers are significantly less developed than those for small molecules because polymer search relies on searching by polymer name, which can be challenging because polymer naming is overly broad (i.e., polyethylene), complicated for complex chemical structures, and often does not correspond to official IUPAC conventions. Chemical structure search in polymers is limited to substructures, such as monomers, without awareness of connectivity or topology. This work introduces a novel query language and graph traversal search algorithm for polymers that provides the first search method able to fully capture all of the chemical structures present in polymers. The BigSMARTS query language, an extension of the small-molecule SMARTS language, allows users to write queries that localize monomer and functional group searches to different parts of the polymer, like the middle block of a triblock, the side chain of a graft, and the backbone of a repeat unit. The substructure search algorithm is based on the traversal of graph representations of the generating functions for the stochastic graphs of polymers. Operationally, the algorithm first identifies cycles representing the monomers and then the end groups and finally performs a depth-first search to match entire subgraphs. To validate the algorithm, hundreds of queries were searched against hundreds of target chemistries and topologies from the literature, with approximately 440,000 query-target pairs. This tool provides a detailed algorithm that can be implemented in search engines to provide search results with full matching of the monomer connectivity and polymer topology.
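The final step named in the abstract above, a depth-first search that matches entire subgraphs, can be illustrated with a generic backtracking matcher on small labeled graphs. This is only a sketch of that one idea under simplifying assumptions: it is not the BigSMARTS algorithm, it ignores bond orders, and it does not model the stochastic-graph generating functions, monomer cycles, or end groups the paper describes. The function name and the graph encoding are hypothetical.

```python
def find_substructure(query_atoms, query_bonds, target_atoms, target_bonds):
    """Return one query-index -> target-index mapping, or None if no match.

    Atoms are element labels; bonds are (i, j) index pairs. Bond orders are
    ignored in this simplified sketch.
    """
    def adjacency(n_atoms, bonds):
        adj = [set() for _ in range(n_atoms)]
        for i, j in bonds:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    q_adj = adjacency(len(query_atoms), query_bonds)
    t_adj = adjacency(len(target_atoms), target_bonds)

    def dfs(q, mapping, used):
        # All query atoms placed: a full match has been found.
        if q == len(query_atoms):
            return dict(mapping)
        for t, label in enumerate(target_atoms):
            if t in used or label != query_atoms[q]:
                continue
            # Every already-mapped neighbour of q must be bonded to t.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                used.add(t)
                found = dfs(q + 1, mapping, used)
                if found is not None:
                    return found
                # Backtrack and try the next candidate target atom.
                del mapping[q]
                used.remove(t)
        return None

    return dfs(0, {}, set())


# Hypothetical target: a C-C-O fragment; query: a C-O bond.
target_atoms, target_bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
print(find_substructure(["C", "O"], [(0, 1)], target_atoms, target_bonds))
print(find_substructure(["N"], [], target_atoms, target_bonds))
```

The first query has to backtrack once: the first carbon is tried and rejected because it is not bonded to the oxygen, after which the second carbon matches.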
Affiliation(s)
- Nathan J Rebello
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Tzyy-Shyang Lin
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Heeba Nazeer
  - Department of Computer Science, Wellesley College, 106 Central Street, Wellesley, Massachusetts 02481, United States
- Bradley D Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States

20
Ohno M, Hayashi Y, Zhang Q, Kaneko Y, Yoshida R. SMiPoly: Generation of a Synthesizable Polymer Virtual Library Using Rule-Based Polymerization Reactions. J Chem Inf Model 2023; 63:5539-5548. [PMID: 37604495 PMCID: PMC10498440 DOI: 10.1021/acs.jcim.3c00329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Indexed: 08/23/2023]
Abstract
Recent advances in machine learning have led to the rapid adoption of various computational methods for de novo molecular design in polymer research, including high-throughput virtual screening and inverse molecular design. In such workflows, molecular generators play an essential role in creation or sequential modification of candidate polymer structures. Machine learning-assisted molecular design has made great technical progress over the past few years. However, the difficulty of identifying synthetic routes to such designed polymers remains unresolved. To address this technical limitation, we present Small Molecules into Polymers (SMiPoly), a Python library for virtual polymer generation that implements 22 chemical rules for commonly applied polymerization reactions. For given small organic molecules to form a candidate monomer set, the SMiPoly generator conducts possible polymerization reactions to generate an exhaustive list of potentially synthesizable polymers. In this study, using 1083 readily available monomers, we generated 169,347 unique polymers forming seven different molecular types: polyolefin, polyester, polyether, polyamide, polyimide, polyurethane, and polyoxazolidone. By comparing the distribution of the virtually created polymers with approximately 16,000 real polymers synthesized so far, it was found that the coverage and novelty of the SMiPoly-generated polymers can reach 48 and 53%, respectively. Incorporating the SMiPoly library into a molecular design workflow will accelerate the process of de novo polymer synthesis by shortening the step to select synthesizable candidate polymers.
Affiliation(s)
- Mitsuru Ohno
  - Daicel Corporation, Kita-ku, 530-0011 Osaka, Japan
- Yoshihiro Hayashi
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
  - The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan
- Qi Zhang
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
- Yu Kaneko
  - Daicel Corporation, Kita-ku, 530-0011 Osaka, Japan
- Ryo Yoshida
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
  - The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan
  - National Institute for Materials Science, 305-0047 Ibaraki, Japan

21
McDonald SM, Augustine EK, Lanners Q, Rudin C, Brinson LC, Becker ML. Applied machine learning as a driver for polymeric biomaterials design. Nat Commun 2023; 14:4838. [PMID: 37563117 PMCID: PMC10415291 DOI: 10.1038/s41467-023-40459-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 07/24/2023] [Indexed: 08/12/2023] Open
Abstract
Polymers are ubiquitous to almost every aspect of modern society and their use in medical products is similarly pervasive. Despite this, the diversity in commercial polymers used in medicine is stunningly low. Considerable time and resources have been extended over the years towards the development of new polymeric biomaterials which address unmet needs left by the current generation of medical-grade polymers. Machine learning (ML) presents an unprecedented opportunity in this field to bypass the need for trial-and-error synthesis, thus reducing the time and resources invested into new discoveries critical for advancing medical treatments. Current efforts pioneering applied ML in polymer design have employed combinatorial and high throughput experimental design to address data availability concerns. However, the lack of available and standardized characterization of parameters relevant to medicine, including degradation time and biocompatibility, represents a nearly insurmountable obstacle to ML-aided design of biomaterials. Herein, we identify a gap at the intersection of applied ML and biomedical polymer design, highlight current works at this junction more broadly and provide an outlook on challenges and future directions.
Affiliation(s)
- Emily K Augustine
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
- Quinn Lanners
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Cynthia Rudin
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- L Catherine Brinson
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
- Matthew L Becker
  - Department of Chemistry, Duke University, Durham, NC, USA.
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA.

22
Kim S, Schroeder CM, Jackson NE. Open Macromolecular Genome: Generative Design of Synthetically Accessible Polymers. ACS Polym Au 2023; 3:318-330. [PMID: 37576712 PMCID: PMC10416319 DOI: 10.1021/acspolymersau.3c00003] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/25/2023] [Revised: 03/13/2023] [Accepted: 03/14/2023] [Indexed: 03/31/2023]
Abstract
A grand challenge in polymer science lies in the predictive design of new polymeric materials with targeted functionality. However, de novo design of functional polymers is challenging due to the vast chemical space and an incomplete understanding of structure-property relations. Recent advances in deep generative modeling have facilitated the efficient exploration of molecular design space, but data sparsity in polymer science is a major obstacle hindering progress. In this work, we introduce a vast polymer database known as the Open Macromolecular Genome (OMG), which contains synthesizable polymer chemistries compatible with known polymerization reactions and commercially available reactants selected for synthetic feasibility. The OMG is used in concert with a synthetically aware generative model known as Molecule Chef to identify property-optimized constitutional repeating units, constituent reactants, and reaction pathways of polymers, thereby advancing polymer design into the realm of synthetic relevance. As a proof-of-principle demonstration, we show that polymers with targeted octanol-water solubilities are readily generated together with monomer reactant building blocks and associated polymerization reactions. Suggested reactants are further integrated with Reaxys polymerization data to provide hypothetical reaction conditions (e.g., temperature, catalysts, and solvents). Broadly, the OMG is a polymer design approach capable of enabling data-intensive generative models for synthetic polymer design. Overall, this work represents a significant advance, enabling the property targeted design of synthetic polymers subject to practical synthetic constraints.
Affiliation(s)
- Seonghwan Kim
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Charles M. Schroeder
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Nicholas E. Jackson
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
|
23
|
Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z, Chandak P, Liu S, Van Katwyk P, Deac A, Anandkumar A, Bergen K, Gomes CP, Ho S, Kohli P, Lasenby J, Leskovec J, Liu TY, Manrai A, Marks D, Ramsundar B, Song L, Sun J, Tang J, Veličković P, Welling M, Zhang L, Coley CW, Bengio Y, Zitnik M. Scientific discovery in the age of artificial intelligence. Nature 2023; 620:47-60. [PMID: 37532811 DOI: 10.1038/s41586-023-06221-2] [Citation(s) in RCA: 113] [Impact Index Per Article: 113.0] [Received: 03/30/2022] [Accepted: 05/16/2023] [Indexed: 08/04/2023]
Abstract
Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.
Affiliation(s)
- Hanchen Wang
  - Department of Engineering, University of Cambridge, Cambridge, UK
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
  - Department of Research and Early Development, Genentech Inc, South San Francisco, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Tianfan Fu
  - Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Yuanqi Du
  - Department of Computer Science, Cornell University, Ithaca, NY, USA
- Wenhao Gao
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kexin Huang
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Ziming Liu
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Payal Chandak
  - Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, USA
- Shengchao Liu
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Peter Van Katwyk
  - Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA
  - Data Science Institute, Brown University, Providence, RI, USA
- Andreea Deac
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Anima Anandkumar
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
  - NVIDIA, Santa Clara, CA, USA
- Karianne Bergen
  - Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA
  - Data Science Institute, Brown University, Providence, RI, USA
- Carla P Gomes
  - Department of Computer Science, Cornell University, Ithaca, NY, USA
- Shirley Ho
  - Center for Computational Astrophysics, Flatiron Institute, New York, NY, USA
  - Department of Astrophysical Sciences, Princeton University, Princeton, NJ, USA
  - Department of Physics, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Physics and Center for Data Science, New York University, New York, NY, USA
- Joan Lasenby
  - Department of Engineering, University of Cambridge, Cambridge, UK
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Arjun Manrai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Debora Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Le Song
  - BioMap, Beijing, China
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Jimeng Sun
  - University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Jian Tang
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - HEC Montréal, Montreal, Quebec, Canada
  - CIFAR AI Chair, Toronto, Ontario, Canada
- Petar Veličković
  - Google DeepMind, London, UK
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Max Welling
  - University of Amsterdam, Amsterdam, Netherlands
  - Microsoft Research Amsterdam, Amsterdam, Netherlands
- Linfeng Zhang
  - DP Technology, Beijing, China
  - AI for Science Institute, Beijing, China
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yoshua Bengio
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
|
24
|
Shi J, Albreiki F, Colón YJ, Srivastava S, Whitmer JK. Transfer Learning Facilitates the Prediction of Polymer-Surface Adhesion Strength. J Chem Theory Comput 2023; 19:4631-4640. [PMID: 37068204 DOI: 10.1021/acs.jctc.2c01314] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/19/2023]
Abstract
Machine learning (ML) accelerates the exploration of material properties and their links to the structure of the underlying molecules. In previous work [Shi et al. ACS Applied Materials & Interfaces 2022, 14, 37161-37169.], ML models were applied to predict the adhesive free energy of polymer-surface interactions with high accuracy from the knowledge of the sequence data, demonstrating successes in inverse-design of polymer sequence for known surface compositions. While the method was shown to be successful in designing polymers for a known surface, extensive data sets were needed for each specific surface in order to train the surrogate models. Ideally, one should be able to infer information about similar surfaces without having to regenerate a full complement of adhesion data for each new case. In the current work, we demonstrate a transfer learning (TL) technique using a deep neural network to improve the accuracy of ML models trained on small data sets by pretraining on a larger database from a related system and fine-tuning the weights of all layers with a small amount of additional data. The shared knowledge from the pretrained model facilitates the prediction accuracy significantly on small data sets. We also explore the limits of database size on accuracy and the optimal tuning of network architecture and parameters for our learning tasks. While applied to a relatively simple coarse-grained (CG) polymer model, the general lessons of this study apply to detailed modeling studies and the broader problems of inverse materials design.
Affiliation(s)
- Jiale Shi
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Fahed Albreiki
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
- Yamil J Colón
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Samanvaya Srivastava
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
  - California NanoSystems Institute, Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
  - Institute for Carbon Management, University of California, Los Angeles, Los Angeles, California 90095, United States
  - Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jonathan K Whitmer
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
|
25
|
Lo S, Seifrid M, Gaudin T, Aspuru-Guzik A. Augmenting Polymer Datasets by Iterative Rearrangement. J Chem Inf Model 2023. [PMID: 37390494 DOI: 10.1021/acs.jcim.3c00144] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/02/2023]
Abstract
One of the biggest obstacles to successful polymer property prediction is an effective representation that accurately captures the sequence of repeat units in a polymer. Motivated by the success of data augmentation in computer vision and natural language processing, we explore augmenting polymer data by iteratively rearranging the molecular representation while preserving the correct connectivity, revealing additional substructural information that is not present in a single representation. We evaluate the effects of this technique on the performance of machine learning models trained on three polymer datasets and compare them to common molecular representations. Data augmentation does not yield significant improvements in machine learning property prediction performance compared to equivalent (non-augmented) representations. In datasets where the target property is primarily influenced by the polymer sequence rather than experimental parameters, this data augmentation technique provides molecular embedding with more information to improve property prediction accuracy.
Affiliation(s)
- Stanley Lo
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Martin Seifrid
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Théophile Gaudin
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
  - IBM Research Zürich, Rüschlikon, Zürich 8803, Switzerland
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College St., Toronto, Ontario M5S 3E5, Canada
  - Department of Materials Science and Engineering, University of Toronto, 184 College St., Toronto, Ontario M5S 3E4, Canada
  - CIFAR Artificial Intelligence Research Chair, Vector Institute, Toronto, Ontario M5S 1M1, Canada
  - Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario M5S 1M1, Canada
|
26
|
Park NH, Manica M, Born J, Hedrick JL, Erdmann T, Zubarev DY, Adell-Mill N, Arrechea PL. Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language. Nat Commun 2023; 14:3686. [PMID: 37344485 PMCID: PMC10284867 DOI: 10.1038/s41467-023-39396-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/21/2022] [Accepted: 06/12/2023] [Indexed: 06/23/2023] Open
Abstract
Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity making it difficult to accommodate a broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. Here we show that a domain specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization, although we provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, we show how the CMDL tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.
Affiliation(s)
- Matteo Manica
  - IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Jannis Born
  - IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
- James L Hedrick
  - IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Tim Erdmann
  - IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Nil Adell-Mill
  - IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
  - Arctoris, 120E Olympic Avenue, Abingdon, OX14 4SA, Oxfordshire, UK
|
27
|
Martin TB, Audus DJ. Emerging Trends in Machine Learning: A Polymer Perspective. ACS POLYMERS AU 2023; 3:239-258. [PMID: 37334191 PMCID: PMC10273415 DOI: 10.1021/acspolymersau.2c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/20/2022] [Accepted: 12/21/2022] [Indexed: 01/19/2023]
Abstract
In the last five years, there has been tremendous growth in machine learning and artificial intelligence as applied to polymer science. Here, we highlight the unique challenges presented by polymers and how the field is addressing them. We focus on emerging trends with an emphasis on topics that have received less attention in the review literature. Finally, we provide an outlook for the field, outline important growth areas in machine learning and artificial intelligence for polymer science and discuss important advances from the greater material science community.
Affiliation(s)
- Tyler B. Martin
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Debra J. Audus
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
|
28
|
Ucak UV, Ashyrmamatov I, Lee J. Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization. J Cheminform 2023; 15:55. [PMID: 37248531 PMCID: PMC10228139 DOI: 10.1186/s13321-023-00725-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/16/2023] [Accepted: 05/14/2023] [Indexed: 05/31/2023] Open
Abstract
Tokenization is an important preprocessing step in natural language processing that may have a significant influence on prediction quality. This research showed that the traditional SMILES tokenization has a certain limitation that results in tokens failing to reflect the true nature of molecules. To address this issue, we developed the atom-in-SMILES tokenization scheme that eliminates ambiguities in the generic nature of SMILES tokens. Our results in multiple chemical translation and molecular property prediction tasks demonstrate that proper tokenization has a significant impact on prediction quality. In terms of prediction accuracy and token degeneration, atom-in-SMILES is a more effective method for generating higher-quality SMILES sequences from AI-based chemical models than other tokenization and representation schemes. We investigated the degrees of token degeneration of various schemes and analyzed their adverse effects on prediction quality. Additionally, token-level repetitions were quantified, and generated examples were incorporated for qualitative examination. We believe that the atom-in-SMILES tokenization has great potential to be adopted by broad related scientific communities, as it provides chemically accurate, tailor-made tokens for molecular property prediction, chemical translation, and molecular generative models.
Affiliation(s)
- Umit V Ucak
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea
- Juyong Lee
  - Research Institute of Pharmaceutical Science, Seoul National University, Seoul, Republic of Korea
|
29
|
Yan T, Balzer AH, Herbert KM, Epps TH, Korley LTJ. Circularity in polymers: addressing performance and sustainability challenges using dynamic covalent chemistries. Chem Sci 2023; 14:5243-5265. [PMID: 37234906 PMCID: PMC10208058 DOI: 10.1039/d3sc00551h] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 01/31/2023] [Accepted: 04/14/2023] [Indexed: 05/28/2023] Open
Abstract
The circularity of current and future polymeric materials is a major focus of fundamental and applied research, as undesirable end-of-life outcomes and waste accumulation are global problems that impact our society. The recycling or repurposing of thermoplastics and thermosets is an attractive solution to these issues, yet both options are encumbered by poor property retention upon reuse, along with heterogeneities in common waste streams that limit property optimization. Dynamic covalent chemistry, when applied to polymeric materials, enables the targeted design of reversible bonds that can be tailored to specific reprocessing conditions to help address conventional recycling challenges. In this review, we highlight the key features of several dynamic covalent chemistries that can promote closed-loop recyclability and we discuss recent synthetic progress towards incorporating these chemistries into new polymers and existing commodity plastics. Next, we outline how dynamic covalent bonds and polymer network structure influence thermomechanical properties related to application and recyclability, with a focus on predictive physical models that describe network rearrangement. Finally, we examine the potential economic and environmental impacts of dynamic covalent polymeric materials in closed-loop processing using elements derived from techno-economic analysis and life-cycle assessment, including minimum selling prices and greenhouse gas emissions. Throughout each section, we discuss interdisciplinary obstacles that hinder the widespread adoption of dynamic polymers and present opportunities and new directions toward the realization of circularity in polymeric materials.
Affiliation(s)
- Tianwei Yan
  - Department of Chemical & Biomolecular Engineering, University of Delaware, Newark, Delaware 19716, USA
  - Center for Plastics Innovation (CPI), University of Delaware, Newark, Delaware 19716, USA
- Alex H Balzer
  - Department of Chemical & Biomolecular Engineering, University of Delaware, Newark, Delaware 19716, USA
  - Center for Plastics Innovation (CPI), University of Delaware, Newark, Delaware 19716, USA
- Katie M Herbert
  - Center for Plastics Innovation (CPI), University of Delaware, Newark, Delaware 19716, USA
- Thomas H Epps
  - Department of Chemical & Biomolecular Engineering, University of Delaware, Newark, Delaware 19716, USA
  - Center for Plastics Innovation (CPI), University of Delaware, Newark, Delaware 19716, USA
  - Department of Materials Science and Engineering, University of Delaware, Newark, Delaware 19716, USA
  - Center for Research in Soft matter and Polymers (CRiSP), University of Delaware, Newark, Delaware 19716, USA
- LaShanda T J Korley
  - Department of Chemical & Biomolecular Engineering, University of Delaware, Newark, Delaware 19716, USA
  - Center for Plastics Innovation (CPI), University of Delaware, Newark, Delaware 19716, USA
  - Department of Materials Science and Engineering, University of Delaware, Newark, Delaware 19716, USA
  - Center for Research in Soft matter and Polymers (CRiSP), University of Delaware, Newark, Delaware 19716, USA
|
30
|
Guha R, Velegol D. Harnessing Shannon entropy-based descriptors in machine learning models to enhance the prediction accuracy of molecular properties. J Cheminform 2023; 15:54. [PMID: 37211605 DOI: 10.1186/s13321-023-00712-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/07/2022] [Accepted: 03/18/2023] [Indexed: 05/23/2023] Open
Abstract
Accurate prediction of molecular properties is essential in the screening and development of drug molecules and other functional materials. Traditionally, property-specific molecular descriptors are used in machine learning models. This in turn requires the identification and development of target or problem-specific descriptors. Additionally, an increase in the prediction accuracy of the model is not always feasible from the standpoint of targeted descriptor usage. We explored the accuracy and generalizability issues using a framework of Shannon entropies, based on SMILES, SMARTS and/or InChiKey strings of respective molecules. Using various public databases of molecules, we showed that the accuracy of the prediction of machine learning models could be significantly enhanced simply by using Shannon entropy-based descriptors evaluated directly from SMILES. Analogous to partial pressures and total pressure of gases in a mixture, we used atom-wise fractional Shannon entropy in combination with total Shannon entropy from respective tokens of the string representation to model the molecule efficiently. The proposed descriptor was competitive in performance with standard descriptors such as Morgan fingerprints and SHED in regression models. Additionally, we found that either a hybrid descriptor set containing the Shannon entropy-based descriptors or an optimized, ensemble architecture of multilayer perceptrons and graph neural networks using the Shannon entropies was synergistic to improve the prediction accuracy. This simple approach of coupling the Shannon entropy framework to other standard descriptors and/or using it in ensemble models could find applications in boosting the performance of molecular property predictions in chemistry and material science.
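As an illustrative sketch only (not the authors' implementation), the Shannon entropy of a SMILES string can be computed from the relative frequencies of its tokens. The example assumes a simplified character-level tokenization, so multi-character atoms such as Cl or Br are split into two tokens, and the function names `shannon_entropy` and `token_entropy_terms` are hypothetical. The per-token terms shown here are simply the summands of the total entropy and may differ from the fractional entropies defined in the paper.

```python
import math
from collections import Counter


def shannon_entropy(smiles: str) -> float:
    """Total Shannon entropy (in bits) of the characters in a SMILES string."""
    if not smiles:
        return 0.0
    counts = Counter(smiles)  # assumption: each character is one token
    total = len(smiles)
    # H = sum_i p_i * log2(1 / p_i), with p_i = count_i / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def token_entropy_terms(smiles: str) -> dict:
    """Per-token summands p_i * log2(1 / p_i); they add up to the total entropy."""
    total = len(smiles)
    counts = Counter(smiles)
    return {tok: (n / total) * math.log2(total / n) for tok, n in counts.items()}


if __name__ == "__main__":
    # Ethanol, written as the SMILES string "CCO"
    print(round(shannon_entropy("CCO"), 4))  # prints 0.9183
    print(token_entropy_terms("CCO"))
```

A string made of a single repeated token has zero entropy, while a string with two equally frequent tokens has an entropy of exactly one bit, which gives a quick sanity check on any tokenizer swapped in for the character-level assumption.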
Affiliation(s)
- Rajarshi Guha
  - Intel Corporation, 2501 NE Century Blvd, Hillsboro, OR, 97124, USA
- Darrell Velegol
  - Department of Chemical Engineering, Pennsylvania State University, University Park, PA, 16802, USA
|
31
|
Anstine D, Isayev O. Generative Models as an Emerging Paradigm in the Chemical Sciences. J Am Chem Soc 2023; 145:8736-8750. [PMID: 37052978 PMCID: PMC10141264 DOI: 10.1021/jacs.2c13467] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Received: 12/17/2022] [Indexed: 04/14/2023]
Abstract
Traditional computational approaches to design chemical species are limited by the need to compute properties for a vast number of candidates, e.g., by discriminative modeling. Therefore, inverse design methods aim to start from the desired property and optimize a corresponding chemical structure. From a machine learning viewpoint, the inverse design problem can be addressed through so-called generative modeling. Mathematically, discriminative models are defined by learning the probability distribution function of properties given the molecular or material structure. In contrast, a generative model seeks to exploit the joint probability of a chemical species with target characteristics. The overarching idea of generative modeling is to implement a system that produces novel compounds that are expected to have a desired set of chemical features, effectively sidestepping issues found in the forward design process. In this contribution, we overview and critically analyze popular generative algorithms like generative adversarial networks, variational autoencoders, flow, and diffusion models. We highlight key differences between each of the models, provide insights into recent success stories, and discuss outstanding challenges for realizing generative modeling discovered solutions in chemical applications.
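The distinction drawn in this abstract can be written compactly. The notation is an assumption introduced here for illustration and does not come from the paper: x denotes a chemical structure, y a property, and y* a target property value.

```latex
\begin{align*}
\text{discriminative:} \quad & p(y \mid x) \\
\text{generative:} \quad & p(x, y) = p(x \mid y)\, p(y) \\
\text{inverse design:} \quad & x \sim p(x \mid y = y^{*}) = \frac{p(x, y^{*})}{p(y^{*})}
\end{align*}
```

Under this reading, forward design evaluates p(y | x) for many candidate structures, whereas inverse design samples structures directly from the conditional distribution at the target value.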
Affiliation(s)
- Dylan M. Anstine
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Olexandr Isayev
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|
32
|
Meyer T, Ramirez C, Tamasi MJ, Gormley AJ. A User's Guide to Machine Learning for Polymeric Biomaterials. ACS POLYMERS AU 2023; 3:141-157. [PMID: 37065715 PMCID: PMC10103193 DOI: 10.1021/acspolymersau.2c00037] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/05/2022] [Revised: 10/27/2022] [Accepted: 10/27/2022] [Indexed: 11/18/2022]
Abstract
The development of novel biomaterials is a challenging process, complicated by a design space with high dimensionality. Requirements for performance in the complex biological environment lead to difficult a priori rational design choices and time-consuming empirical trial-and-error experimentation. Modern data science practices, especially artificial intelligence (AI)/machine learning (ML), offer the promise to help accelerate the identification and testing of next-generation biomaterials. However, it can be a daunting task for biomaterial scientists unfamiliar with modern ML techniques to begin incorporating these useful tools into their development pipeline. This Perspective lays the foundation for a basic understanding of ML while providing a step-by-step guide to new users on how to begin implementing these techniques. A tutorial Python script has been developed that walks users through the application of an ML pipeline using data from a real biomaterial design challenge based on our group's research. This tutorial provides an opportunity for readers to see and experiment with ML and its syntax in Python. The Google Colab notebook can be easily accessed and copied from the following URL: www.gormleylab.com/MLcolab.
Affiliation(s)
- Travis A. Meyer
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
- Cesar Ramirez
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
- Matthew J. Tamasi
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
- Adam J. Gormley
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
|
33
|
Zhang H, Sundaresan S, Webb MA. Molecular Dynamics Investigation of Nanoscale Hydrophobicity of Polymer Surfaces: What Makes Water Wet? J Phys Chem B 2023. [PMID: 37043668 DOI: 10.1021/acs.jpcb.3c00616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2023]
Abstract
The wettability of a polymer surface, related to its hydrophobicity or tendency to repel water, can be crucial for determining its utility, such as for a coating or a purification membrane. While wettability is commonly associated with the macroscopic measurement of a contact angle between surface, water, and air, the molecular physics that underlie these macroscopic observations are not fully known, and anticipating the relative behavior of different polymers is challenging. To address this gap in molecular-level understanding, we use molecular dynamics simulations to investigate and contrast interactions of water with six chemically distinct polymers: polytetrafluoroethylene, polyethylene, polyvinyl chloride, poly(methyl methacrylate), Nylon-66, and poly(vinyl alcohol). We show that several prospective quantitative metrics for hydrophobicity agree well with experimental contact angles. Moreover, the behavior of water in proximity to these polymer surfaces can be distinguished with analysis of interfacial water dynamics, extent of hydrogen bonding, and molecular orientation, even when macroscopic measures of hydrophobicity are similar. The predominant factor dictating wettability is found to be the extent of hydrogen bonding between polymer and water, but the precise manifestation of hydrogen bonding and its impact on surface water structure varies. In the absence of hydrogen bonding, other molecular interactions and polymer mechanics control hydrophobic ordering. These results provide new insights into how polymer chemistry specifically impacts water-polymer interactions and translates to surface hydrophobicity. Such factors may facilitate the design or processing of polymer surfaces to achieve targeted wetting behavior, and presented analyses can be useful in studying the interfacial physics of other systems.
Affiliation(s)
- Hang Zhang
  - Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Sankaran Sundaresan
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Michael A Webb
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
34
Walsh DJ, Zou W, Schneider L, Mello R, Deagen ME, Mysona J, Lin TS, de Pablo JJ, Jensen KF, Audus DJ, Olsen BD. Community Resource for Innovation in Polymer Technology (CRIPT): A Scalable Polymer Material Data Structure. ACS Central Science 2023; 9:330-338. [PMID: 36968543 PMCID: PMC10037456 DOI: 10.1021/acscentsci.3c00011] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Indexed: 06/01/2023]
Abstract
The Community Resource for Innovation in Polymer Technology (CRIPT) data model is designed to address the high complexity in defining a polymer structure and the intricacies involved with characterizing material properties.
Affiliation(s)
- Dylan J. Walsh
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Weizhong Zou
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Ludwig Schneider
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Reid Mello
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Michael E. Deagen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Joshua Mysona
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Tzyy-Shyang Lin
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Juan J. de Pablo
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Klavs F. Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Debra J. Audus
  - Materials Science and Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Bradley D. Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
35
Nguyen T, Bavarian M. Machine learning approach to polymer reaction engineering: Determining monomers reactivity ratios. Polymer 2023. [DOI: 10.1016/j.polymer.2023.125866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
36
Gurnani R, Kuenneth C, Toland A, Ramprasad R. Polymer Informatics at Scale with Multitask Graph Neural Networks. Chemistry of Materials: A Publication of the American Chemical Society 2023; 35:1560-1567. [PMID: 36873627 PMCID: PMC9979603 DOI: 10.1021/acs.chemmater.2c02991] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/29/2022] [Revised: 02/03/2023] [Indexed: 06/18/2023]
Abstract
Artificial intelligence-based methods are becoming increasingly effective at screening libraries of polymers down to a selection that is manageable for experimental inquiry. The vast majority of presently adopted approaches for polymer screening rely on handcrafted chemostructural features extracted from polymer repeat units, a burdensome task as polymer libraries, which approximate the polymer chemical search space, progressively grow over time. Here, we demonstrate that directly "machine learning" important features from a polymer repeat unit is a cheap and viable alternative to extracting expensive features by hand. Our approach, based on graph neural networks, multitask learning, and other advanced deep learning techniques, speeds up feature extraction by 1-2 orders of magnitude relative to presently adopted handcrafted methods without compromising model accuracy for a variety of polymer property prediction tasks. We anticipate that our approach, which unlocks the screening of truly massive polymer libraries at scale, will enable more sophisticated and large scale screening technologies in the field of polymer informatics.
37
Yu M, Shi Y, Jia Q, Wang Q, Luo ZH, Yan F, Zhou YN. Ring Repeating Unit: An Upgraded Structure Representation of Linear Condensation Polymers for Property Prediction. J Chem Inf Model 2023; 63:1177-1187. [PMID: 36651860 DOI: 10.1021/acs.jcim.2c01389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Unique structure representation of polymers plays a crucial role in developing models for polymer property prediction and polymer design by data-centric approaches. Currently, monomer and repeating unit (RU) approximations are widely used to represent polymer structures for generating feature descriptors in the modeling of quantitative structure-property relationships (QSPR). However, such conventional structure representations may not uniquely approximate heterochain polymers due to the diversity of monomer combinations and the potential multi-RUs. In this study, the so-called ring repeating unit (RRU) method that can uniquely represent polymers with a broad range of structure diversity is proposed for the first time. As a proof of concept, an RRU-based QSPR model was developed to predict the associated glass transition temperature (Tg) of polyimides (PIs) with deterministic values. Comprehensive model validations including external, internal, and Y-random validations were performed. Also, an RU-based QSPR model developed based on the same large database of 1321 PIs provides nonunique prediction results, which further prove the necessity of RRU-based structure representation. Promising results obtained by the application of the RRU-based model confirm that the as-developed RRU method provides an effective representation that accurately captures the sequence of repeat units and thus realizes reliable polymer property prediction by data-driven approaches.
Affiliation(s)
- Mengxian Yu
  - School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Yajuan Shi
  - Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Qingzhu Jia
  - School of Marine and Environmental Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Qiang Wang
  - School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Zheng-Hong Luo
  - Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Fangyou Yan
  - School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Yin-Ning Zhou
  - Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
38
Ucak UV, Ashyrmamatov I, Lee J. Reconstruction of lossless molecular representations from fingerprints. J Cheminform 2023; 15:26. [PMID: 36823647 PMCID: PMC9948316 DOI: 10.1186/s13321-023-00693-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/03/2022] [Accepted: 02/04/2023] [Indexed: 02/25/2023] Open
Abstract
The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring the connectivity information lost during fingerprint transformation with high accuracy. Notably, the results reveal that seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints, extended connectivity, topological torsion, atom pairs, and atomic environments can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.
Affiliation(s)
- Umit V. Ucak
  - Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- Islambek Ashyrmamatov
  - Department of Chemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Juyong Lee
  - Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
  - Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
39
Lin TS, Rebello NJ, Lee GH, Morris MA, Olsen BD. Canonicalizing BigSMILES for Polymers with Defined Backbones. ACS Polymers Au 2022; 2:486-500. [PMID: 36561286 PMCID: PMC9761857 DOI: 10.1021/acspolymersau.2c00009] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/25/2022] [Revised: 08/16/2022] [Accepted: 08/17/2022] [Indexed: 11/06/2022]
Abstract
BigSMILES, a line notation for encapsulating the molecular structure of stochastic molecules such as polymers, was recently proposed as a compact and readable solution for writing macromolecules. While BigSMILES strings serve as useful identifiers for reconstructing the molecular connectivity for polymers, in general, BigSMILES allows the same polymer to be codified into multiple equally valid representations. Having a canonicalization scheme that eliminates the multiplicity would be very useful in reducing time-intensive tasks like structural comparison and molecular search into simple string-matching tasks. Motivated by this, in this work, two strategies for deriving canonical representations for linear polymers are proposed. In the first approach, a canonicalization scheme is proposed to standardize the expression of BigSMILES stochastic objects, thereby standardizing the expression of overall BigSMILES strings. In the second approach, an analogy between formal language theory and the molecular ensemble of polymer molecules is drawn. Linear polymers can be converted into regular languages, and the minimal deterministic finite automaton uniquely associated with each prescribed language is used as the basis for constructing the unique text identifier associated with each distinct polymer. Overall, this work presents algorithms to convert linear polymers into unique structure-based text identifiers. The derived identifiers can be readily applied in chemical information systems for polymers and other polymer informatics applications.
Affiliation(s)
- Tzyy-Shyang Lin
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Nathan J. Rebello
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Guang-He Lee
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Melody A. Morris
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Bradley D. Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
40
Recent advances and challenges in experiment-oriented polymer informatics. Polym J 2022. [DOI: 10.1038/s41428-022-00734-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
41
Li YQ, Jiang Y, Wang LQ, Li JF. Data and Machine Learning in Polymer Science. Chinese Journal of Polymer Science 2022. [DOI: 10.1007/s10118-022-2868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
42
Xu H, Ma S, Hou Y, Zhang Q, Wang R, Luo Y, Gao X. Machine Learning-Assisted Identification of Copolymer Microstructures Based on Microscopic Images. ACS Applied Materials & Interfaces 2022; 14:47157-47166. [PMID: 36206079 DOI: 10.1021/acsami.2c15311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
The microstructure of polymer materials is an important bridge between their molecular structure and macroproperties, which is of great significance to be effectively identified. With the increasing refinement of polymer material design, the microstructure of different polymer materials gradually converges, which is difficult to distinguish. In this study, the machine learning method is applied to recognize the microstructure. A highly accurate and interpretable model based on small experimental data sets has been completed by the methods of transfer learning and feature visualization, making the result of the model that can be explained from the perspective of physical chemistry. This work provides an idea for identifying microstructure and will help further promote intelligent polymer research and development.
Affiliation(s)
- Han Xu
  - The State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
- Sainan Ma
  - The State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
  - Ningbo Research Institute, Zhejiang University, Ningbo 315100, China
- Yang Hou
  - The State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
- Qinghua Zhang
  - The State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
- Rui Wang
  - Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, California 94720, United States
- Yingwu Luo
  - The State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
- Xiang Gao
  - The State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
  - Ningbo Research Institute, Zhejiang University, Ningbo 315100, China
43
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM, Lameiro RF, Lemm D, Lo A, Moosavi SM, Nápoles-Duarte JM, Nigam A, Pollice R, Rajan K, Schatzschneider U, Schwaller P, Skreta M, Smit B, Strieth-Kalthoff F, Sun C, Tom G, Falk von Rudorff G, Wang A, White AD, Young A, Yu R, Aspuru-Guzik A. SELFIES and the future of molecular string representations. Patterns (New York, N.Y.) 2022; 3:100588. [PMID: 36277819 PMCID: PMC9583042 DOI: 10.1016/j.patter.2022.100588] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Indexed: 11/06/2022]
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, Smiles, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, Smiles has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded string (Selfies). Selfies has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
Affiliation(s)
- Mario Krenn
  - Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
- Qianxiang Ai
  - Department of Chemistry, Fordham University, The Bronx, NY, USA
- Senja Barthel
  - Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Nessa Carson
  - Syngenta Jealott’s Hill International Research Centre, Bracknell, Berkshire, UK
- Angelo Frei
  - Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, UK
- Nathan C. Frey
  - Massachusetts Institute of Technology, Cambridge, MA, USA
- Pascal Friederich
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Théophile Gaudin
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - IBM Research Europe, Zürich, Switzerland
- Kevin Maik Jablonka
  - Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
- Rafael F. Lameiro
  - Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, São Paulo, Brazil
- Dominik Lemm
  - Faculty of Physics, University of Vienna, Vienna, Austria
- Alston Lo
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Seyed Mohamad Moosavi
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- AkshatKumar Nigam
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Robert Pollice
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Kohulan Rajan
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller Universität Jena, Jena, Germany
- Ulrich Schatzschneider
  - Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Philippe Schwaller
  - IBM Research Europe, Zürich, Switzerland
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Marta Skreta
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Berend Smit
  - Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
- Felix Strieth-Kalthoff
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Chong Sun
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Gary Tom
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Andrew Wang
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
  - Solar Fuels Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Andrew D. White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
- Adamo Young
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Rose Yu
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Alán Aspuru-Guzik
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
  - Department of Materials Science, University of Toronto, Toronto, ON, Canada
  - Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, Canada
44
Andraju N, Curtzwiler GW, Ji Y, Kozliak E, Ranganathan P. Machine-Learning-Based Predictions of Polymer and Postconsumer Recycled Polymer Properties: A Comprehensive Review. ACS Applied Materials & Interfaces 2022; 14:42771-42790. [PMID: 36102317 DOI: 10.1021/acsami.2c08301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
There has been a tremendous increase in demand for virgin and postconsumer recycled (PCR) polymers due to their wide range of chemical and physical characteristics. Despite the numerous potential benefits of using a data-driven approach to polymer design, major hurdles exist in the development of polymer informatics due to the complicated hierarchical polymer structures. In this review, a brief introduction on virgin polymer structure, PCR polymers, compatibilization of polymers to be recycled, and their characterization using sensor array technologies as well as factors affecting the polymer properties are provided. Machine-learning (ML) algorithms are gaining attention as cost-effective scalable solutions to exploit the physical and chemical structures of polymers. The basic steps for applying ML in polymer science such as fingerprinting, algorithms, open-source databases, representations, and polymer design are detailed in this review. Further, a state-of-the-art review of the prediction of various polymer material properties using ML is reviewed. Finally, we discuss open-ended research questions on ML application to PCR polymers as well as potential challenges in the prediction of their properties using artificial intelligence for more efficient and targeted PCR polymer discovery and development.
Affiliation(s)
- Nagababu Andraju
  - School of Electrical Engineering and Computer Science (SEECS), University of North Dakota, Grand Forks, North Dakota 58202, United States
- Greg W Curtzwiler
  - Polymer and Food Protection Consortium, Department of Food Science and Human Nutrition, Iowa State University, Ames, Iowa 50011, United States
- Yun Ji
  - Department of Chemical Engineering, University of North Dakota, Grand Forks, North Dakota 58202, United States
- Evguenii Kozliak
  - Department of Chemistry, University of North Dakota, Grand Forks, North Dakota 58202, United States
- Prakash Ranganathan
  - School of Electrical Engineering and Computer Science (SEECS), University of North Dakota, Grand Forks, North Dakota 58202, United States
45
Aldeghi M, Coley CW. A graph representation of molecular ensembles for polymer property prediction. Chem Sci 2022; 13:10486-10498. [PMID: 36277616 PMCID: PMC9473492 DOI: 10.1039/d2sc02839e] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/20/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open
Abstract
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles.
Affiliation(s)
- Matteo Aldeghi
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
46
Fox T, Bieler M, Haebel P, Ochoa R, Peters S, Weber A. BILN: A Human-Readable Line Notation for Complex Peptides. J Chem Inf Model 2022; 62:3942-3947. [PMID: 35984937 DOI: 10.1021/acs.jcim.2c00703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
We present an easy, human-readable line notation to describe even complex peptides.
Affiliation(s)
- Thomas Fox
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Michael Bieler
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Peter Haebel
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Rodrigo Ochoa
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Stefan Peters
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Alexander Weber
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
47
Shi J, Quevillon MJ, Amorim Valença PH, Whitmer JK. Predicting Adhesive Free Energies of Polymer-Surface Interactions with Machine Learning. ACS Applied Materials & Interfaces 2022; 14:37161-37169. [PMID: 35917495 DOI: 10.1021/acsami.2c08891] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 06/15/2023]
Abstract
Polymer-surface interactions are crucial to many biological processes and industrial applications. Here we propose a machine learning method to connect a model polymer's sequence with its adhesion to decorated surfaces. We simulate the adhesive free energies of 20000 unique coarse-grained one-dimensional polymer sequences interacting with functionalized surfaces and build support vector regression models that demonstrate inexpensive and reliable prediction of the adhesive free energy as a function of sequence. Our work highlights the promising integration of coarse-grained simulation with data-driven machine learning methods for the design of functional polymers and represents an important step toward linking polymer compositions with polymer-surface interactions.
Affiliation(s)
- Jiale Shi
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Michael J Quevillon
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Pedro H Amorim Valença
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Jonathan K Whitmer
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
48
Guo M, Shou W, Makatura L, Erps T, Foshey M, Matusik W. PolyGrammar: Grammar for Digital Polymer Representation and Generation. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2022; 9:e2101864. [PMID: 35678650 PMCID: PMC9376847 DOI: 10.1002/advs.202101864] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/05/2021] [Revised: 12/04/2021] [Indexed: 05/22/2023]
Abstract
Polymers are widely studied materials with diverse properties and applications determined by molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches cannot offer comprehensive design models for polymers because of their inherent scale and structural complexity. Here, a parametric, context-sensitive grammar designed specifically for polymers (PolyGrammar) is proposed. Using the symbolic hypergraph representation and 14 simple production rules, PolyGrammar can represent and generate all valid polyurethane structures. An algorithm is presented to translate any polyurethane structure from the popular Simplified Molecular-Input Line-entry System (SMILES) string format into the PolyGrammar representation. The representative power of PolyGrammar is tested by translating a dataset of over 600 polyurethane samples collected from the literature. Furthermore, it is shown that PolyGrammar can be easily extended to other copolymers and homopolymers. By offering a complete, explicit representation scheme and an explainable generative model with validity guarantees, PolyGrammar takes an essential step toward a more comprehensive and practical system for polymer discovery and exploration. As the first bridge between formal languages and chemistry, PolyGrammar also serves as a critical blueprint to inform the design of similar grammars for other chemistries, including organic and inorganic molecules.
Affiliation(s)
- Minghao Guo
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - CUHK Multimedia Lab, The Chinese University of Hong Kong, Sha Tin, Hong Kong
- Wan Shou
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Liane Makatura
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Timothy Erps
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Michael Foshey
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Wojciech Matusik
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
49
|
Wang Y, Kalscheur J, Ebikade E, Li Q, Vlachos DG. LigninGraphs: lignin structure determination with multiscale graph modeling. J Cheminform 2022; 14:43. [PMID: 35794646 PMCID: PMC9261032 DOI: 10.1186/s13321-022-00627-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/25/2022] [Accepted: 06/12/2022] [Indexed: 11/10/2022] Open
Abstract
Lignin is an aromatic biopolymer found in ubiquitous sources of woody biomass. Designing and optimizing lignin valorization processes requires a fundamental understanding of lignin structures. Experimental characterization techniques, such as 2D-heteronuclear single quantum coherence (HSQC) nuclear magnetic resonance (NMR) spectra, could elucidate the global properties of the polymer molecules. Computer models could extend the resolution of experiments by representing structures at the molecular and atomistic scales. We introduce a graph-based multiscale modeling framework for lignin structure generation and visualization. The framework employs accelerated rejection-free polymerization and hierarchical Metropolis Monte Carlo optimization algorithms. We obtain structure libraries for various lignin feedstocks based on literature and new experimental NMR data for poplar wood, pinewood, and herbaceous lignin. The framework could guide researchers towards feasible lignin structures, efficient space exploration, and future kinetics modeling. Its software implementation in Python, LigninGraphs, is open-source and available on GitHub.
Affiliation(s)
- Yifan Wang
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy St, Newark, DE, 19716, USA
  - Catalysis Center for Energy Innovation, RAPID Manufacturing Institute, and Delaware Energy Institute (DEI), University of Delaware, 221 Academy St, Newark, DE, 19716, USA
- Jake Kalscheur
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy St, Newark, DE, 19716, USA
  - Catalysis Center for Energy Innovation, RAPID Manufacturing Institute, and Delaware Energy Institute (DEI), University of Delaware, 221 Academy St, Newark, DE, 19716, USA
- Elvis Ebikade
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy St, Newark, DE, 19716, USA
  - Catalysis Center for Energy Innovation, RAPID Manufacturing Institute, and Delaware Energy Institute (DEI), University of Delaware, 221 Academy St, Newark, DE, 19716, USA
- Qiang Li
  - Catalysis Center for Energy Innovation, RAPID Manufacturing Institute, and Delaware Energy Institute (DEI), University of Delaware, 221 Academy St, Newark, DE, 19716, USA
- Dionisios G Vlachos
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy St, Newark, DE, 19716, USA
  - Catalysis Center for Energy Innovation, RAPID Manufacturing Institute, and Delaware Energy Institute (DEI), University of Delaware, 221 Academy St, Newark, DE, 19716, USA
|
50
|
Tamasi MJ, Patel RA, Borca CH, Kosuri S, Mugnier H, Upadhya R, Murthy NS, Webb MA, Gormley AJ. Machine Learning on a Robotic Platform for the Design of Polymer-Protein Hybrids. Advanced Materials (Deerfield Beach, Fla.) 2022. [PMID: 35593444 DOI: 10.34770/h938-nn26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/06/2023]
Abstract
Polymer-protein hybrids are intriguing materials that can bolster protein stability in non-native environments, thereby enhancing their utility in diverse medicinal, commercial, and industrial applications. One stabilization strategy involves designing synthetic random copolymers with compositions attuned to the protein surface, but rational design is complicated by the vast chemical and composition space. Here, a strategy is reported to design protein-stabilizing copolymers based on active machine learning, facilitated by automated material synthesis and characterization platforms. The versatility and robustness of the approach are demonstrated by the successful identification of copolymers that preserve, or even enhance, the activity of three chemically distinct enzymes following exposure to thermal denaturing conditions. Although systematic screening results in mixed success, active learning appropriately identifies unique and effective copolymer chemistries for the stabilization of each enzyme. Overall, this work broadens the capabilities to design fit-for-purpose synthetic copolymers that promote or otherwise manipulate protein activity, with extensions toward the design of robust polymer-protein hybrid materials.
Affiliation(s)
- Matthew J Tamasi
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Roshan A Patel
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08544, USA
- Carlos H Borca
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08544, USA
- Shashank Kosuri
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Heloise Mugnier
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Rahul Upadhya
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- N Sanjeeva Murthy
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Michael A Webb
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08544, USA
- Adam J Gormley
  - Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
|