1
Jones MS, Khanna S, Ferguson AL. FlowBack: A Generalized Flow-Matching Approach for Biomolecular Backmapping. J Chem Inf Model 2025. [PMID: 39772562 DOI: 10.1021/acs.jcim.4c02046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
Abstract
Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. These models can considerably accelerate sampling but it remains challenging to accurately and efficiently restore all-atom detail to the coarse-grained trajectory, which can be vital for detailed understanding of molecular mechanisms and calculation of observables contingent on all-atom coordinates. In this work, we introduce FlowBack as a deep generative model employing a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be agnostic to the coarse-grained map and molecular type. A protein-specific model trained on ∼65k structures from the Protein Data Bank achieves state-of-the-art performance on structural metrics compared to previous generative and rules-based approaches in applications to static PDB structures, all-atom simulations of fast-folding proteins, and coarse-grained trajectories generated by a machine-learned force field. A DNA-protein model trained on ∼1.5k DNA-protein complexes achieves excellent reconstruction and generative capabilities on static DNA-protein complexes from the Protein Data Bank as well as on out-of-distribution coarse-grained dynamical simulations of DNA-protein complexation. FlowBack offers an accurate, efficient, and easy-to-use tool to recover all-atom structures from coarse-grained molecular simulations with higher robustness and fewer steric clashes than previous approaches. We make FlowBack freely available to the community as an open source Python package.
Affiliation(s)
- Michael S Jones
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Smayan Khanna
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
2
Nowitzke J, Bista S, Raman S, Dahal N, Stirnemann G, Popa I. Mechanical Unfolding of Network Nodes Drives the Stress Response of Protein-Based Materials. ACS NANO 2024; 18:31031-31043. [PMID: 39487800 DOI: 10.1021/acsnano.4c07352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2024]
Abstract
Biomaterials synthesized from cross-linked folded proteins have untapped potential for biocompatible, resilient, and responsive implementations, but face challenges due to costly molecular refinement and limited understanding of their mechanical response. Under a stress vector, these materials combine the gel-like response of cross-linked networks with the mechanical unfolding and extension of proteins from well-defined 3D structures to unstructured polypeptides. Yet the nanoscale dynamics governing their viscoelastic response remains poorly understood. This lack of understanding is further exacerbated by the fact that the mechanical stability of protein domains depends not only on their structure, but also on the direction of the force vector. To this end, here we propose a coarse-grained network model based on the physical characteristics of polyproteins and combine it with the mechanical unfolding response of protein domains, obtained from single molecule measurements and steered molecular dynamics simulations, to explain the macroscopic response of protein-based materials to a stress vector. We find that domains are about 10-fold more stable when force is applied along their end-to-end coordinate than along the other tethering geometries that are possible inside the biomaterial. As such, the macroscopic response of protein-based materials is mainly driven by the unfolding of the node-domains and rearrangement of these nodes inside the material. The predictions from our models are then confirmed experimentally using force-clamp rheometry. This model is a critical step toward developing protein-based materials with predictable response and that can enable applications for shape memory and energy storage and dissipation.
Affiliation(s)
- Joel Nowitzke
  - Department of Physics, University of Wisconsin-Milwaukee, 3135 N Maryland Avenue, Milwaukee, Wisconsin 53211, United States
- Sanam Bista
  - Department of Physics, University of Wisconsin-Milwaukee, 3135 N Maryland Avenue, Milwaukee, Wisconsin 53211, United States
- Sadia Raman
  - Department of Physics, University of Wisconsin-Milwaukee, 3135 N Maryland Avenue, Milwaukee, Wisconsin 53211, United States
- Narayan Dahal
  - Department of Physics, University of Wisconsin-Milwaukee, 3135 N Maryland Avenue, Milwaukee, Wisconsin 53211, United States
- Guillaume Stirnemann
  - PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne University, CNRS, Paris 75005, France
- Ionel Popa
  - Department of Physics, University of Wisconsin-Milwaukee, 3135 N Maryland Avenue, Milwaukee, Wisconsin 53211, United States
3
Lowry GV, Giraldo JP, Steinmetz NF, Avellan A, Demirer GS, Ristroph KD, Wang GJ, Hendren CO, Alabi CA, Caparco A, da Silva W, González-Gamboa I, Grieger KD, Jeon SJ, Khodakovskaya MV, Kohay H, Kumar V, Muthuramalingam R, Poffenbarger H, Santra S, Tilton RD, White JC. Towards realizing nano-enabled precision delivery in plants. NATURE NANOTECHNOLOGY 2024; 19:1255-1269. [PMID: 38844663 DOI: 10.1038/s41565-024-01667-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/22/2023] [Accepted: 03/27/2024] [Indexed: 09/18/2024]
Abstract
Nanocarriers (NCs) that can precisely deliver active agents, nutrients and genetic materials into plants will make crop agriculture more resilient to climate change and sustainable. As a research field, nano-agriculture is still developing, with significant scientific and societal barriers to overcome. In this Review, we argue that lessons can be learned from mammalian nanomedicine. In particular, it may be possible to enhance efficiency and efficacy by improving our understanding of how NC properties affect their interactions with plant surfaces and biomolecules, and their ability to carry and deliver cargo to specific locations. New tools are required to rapidly assess NC-plant interactions and to explore and verify the range of viable targeting approaches in plants. Elucidating these interactions can lead to the creation of computer-generated in silico models (digital twins) to predict the impact of different NC and plant properties, biological responses, and environmental conditions on the efficiency and efficacy of nanotechnology approaches. Finally, we highlight the need for nano-agriculture researchers and social scientists to converge in order to develop sustainable, safe and socially acceptable NCs.
Affiliation(s)
- Gregory V Lowry
  - Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Juan Pablo Giraldo
  - Botany and Plant Sciences, University of California, Riverside, Riverside, CA, USA
- Nicole F Steinmetz
  - Department of NanoEngineering, University of California San Diego, San Diego, CA, USA
  - Department of Bioengineering, University of California San Diego, San Diego, CA, USA
  - Department of Radiology, University of California San Diego, San Diego, CA, USA
  - Center for Nano-ImmunoEngineering, University of California San Diego, San Diego, CA, USA
  - Shu and K.C. Chien and Peter Farrell Collaboratory, University of California San Diego, San Diego, CA, USA
  - Center for Engineering in Cancer, Institute of Engineering in Medicine, University of California San Diego, San Diego, CA, USA
  - Moores Cancer Center, University of California San Diego, San Diego, CA, USA
  - Institute for Materials Discovery and Design, University of California San Diego, San Diego, CA, USA
- Gozde S Demirer
  - Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Kurt D Ristroph
  - Agricultural and Biological Engineering, Purdue University, West Lafayette, IN, USA
- Gerald J Wang
  - Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Christine O Hendren
  - Geological and Environmental Sciences, Appalachian State University, Boone, NC, USA
- Adam Caparco
  - Department of NanoEngineering, University of California San Diego, San Diego, CA, USA
- Khara D Grieger
  - Applied Ecology, North Carolina State University, Raleigh, NC, USA
- Su-Ji Jeon
  - Botany and Plant Sciences, University of California, Riverside, Riverside, CA, USA
- Hagay Kohay
  - Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Vivek Kumar
  - Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Swadeshmukul Santra
  - Department of Chemistry and Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, USA
- Robert D Tilton
  - Chemical Engineering and Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Jason C White
  - The Connecticut Agricultural Research Station, New Haven, CT, USA
4
Piskorz T, Perez-Chirinos L, Qiao B, Sasselli IR. Tips and Tricks in the Modeling of Supramolecular Peptide Assemblies. ACS OMEGA 2024; 9:31254-31273. [PMID: 39072142 PMCID: PMC11270692 DOI: 10.1021/acsomega.4c02628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 06/17/2024] [Accepted: 06/19/2024] [Indexed: 07/30/2024]
Abstract
Supramolecular peptide assemblies (SPAs) hold promise as materials for nanotechnology and biomedicine. Although their investigation often entails adapting experimental techniques from their protein counterparts, SPAs are fundamentally distinct from proteins, posing unique challenges for their study. Computational methods have emerged as indispensable tools for gaining deeper insights into SPA structures at the molecular level, surpassing the limitations of experimental techniques, and as screening tools to reduce the experimental search space. However, computational studies have grappled with issues stemming from the absence of standardized procedures and relevant crystal structures. Fundamental disparities between SPAs and protein simulations, such as the absence of experimentally validated initial structures and the importance of the simulation size, number of molecules, and concentration, have compounded these challenges. Understanding the roles of various parameters and the capabilities of different models and simulation setups remains an ongoing endeavor. In this review, we aim to provide readers with guidance on the parameters to consider when conducting SPA simulations, elucidating their potential impact on outcomes and validity.
Affiliation(s)
- Laura Perez-Chirinos
  - Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramón 182, 20014 Donostia-San Sebastián, Spain
- Baofu Qiao
  - Department of Natural Sciences, Baruch College, City University of New York, New York, New York 10010, United States
- Ivan R. Sasselli
  - Centro de Física de Materiales (CFM), CSIC-UPV/EHU, Paseo Manuel de Lardizabal 5, 20018 San Sebastián, Spain
5
Salter LC, Wojciechowski JP, McLean B, Charchar P, Barnes PRF, Creamer A, Doutch J, Barriga HMG, Holme MN, Yarovsky I, Stevens MM. 3,4-Ethylenedioxythiophene Hydrogels: Relating Structure and Charge Transport in Supramolecular Gels. CHEMISTRY OF MATERIALS : A PUBLICATION OF THE AMERICAN CHEMICAL SOCIETY 2024; 36:3092-3106. [PMID: 38617802 PMCID: PMC11007859 DOI: 10.1021/acs.chemmater.3c01360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 03/05/2024] [Accepted: 03/05/2024] [Indexed: 04/16/2024]
Abstract
Ionic charge transport is a ubiquitous language of communication in biological systems. As such, bioengineering is in constant need of innovative, soft, and biocompatible materials that facilitate ionic conduction. Low molecular weight gelators (LMWGs) are complex self-assembled materials that have received increasing attention in recent years. Beyond their biocompatible, self-healing, and stimuli responsive facets, LMWGs can be viewed as a "solid" electrolyte solution. In this work, we investigate 3,4-ethylenedioxythiophene (EDOT) as a capping group for a small peptide library, which we use as a system to understand the relationship between modes of assembly and charge transport in supramolecular gels. Through a combination of techniques including small-angle neutron scattering (SANS), NMR-based Van't Hoff analysis, atomic force microscopy (AFM), rheology, four-point probe, and electrochemical impedance spectroscopy (EIS), we found that modifications to the peptide sequence result in distinct assembly pathways, thermodynamic parameters, mechanical properties, and ionic conductivities. Four-point probe conductivity measurements and electrochemical impedance spectroscopy suggest that ionic conductivity is approximately doubled by programmable gel assemblies with hollow cylinder morphologies relative to gels containing solid fibers or a control electrolyte. More broadly, it is hoped this work will serve as a platform for those working on charge transport of aqueous soft materials in general.
Affiliation(s)
- Luke C. B. Salter
  - Department of Materials and Department of Bioengineering, Institute of Biomedical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
- Jonathan P. Wojciechowski
  - Department of Materials and Department of Bioengineering, Institute of Biomedical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
- Ben McLean
  - School of Engineering, RMIT University, Melbourne, Victoria 3001, Australia
  - ARC Research Hub for Australian Steel Innovation, https://www.rmit.edu.au/research/centres-collaborations/multi-partner-collaborations/arc-research-hub-aus-steel-manufacturing
- Patrick Charchar
  - School of Engineering, RMIT University, Melbourne, Victoria 3001, Australia
- Piers R. F. Barnes
  - Department of Physics, Imperial College London, London SW7 2AZ, United Kingdom
- Adam Creamer
  - Department of Materials and Department of Bioengineering, Institute of Biomedical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
- James Doutch
  - ISIS Muon and Neutron Source, Rutherford Appleton Laboratory, Harwell Campus, Oxfordshire OX11 0QX, United Kingdom
- Hanna M. G. Barriga
  - Department of Medical Biochemistry and Biophysics, Karolinska Institute, 171 77 Stockholm, Sweden
- Margaret N. Holme
  - Department of Medical Biochemistry and Biophysics, Karolinska Institute, 171 77 Stockholm, Sweden
- Irene Yarovsky
  - School of Engineering, RMIT University, Melbourne, Victoria 3001, Australia
- Molly M. Stevens
  - Department of Materials and Department of Bioengineering, Institute of Biomedical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
  - Department of Medical Biochemistry and Biophysics, Karolinska Institute, 171 77 Stockholm, Sweden
  - Department of Physiology, Anatomy and Genetics, Department of Engineering Science, and Kavli Institute for Nanoscience Discovery, University of Oxford, OX1 3QU, Oxford, United Kingdom
6
Wu Z, Zhou T. Structural Coarse-Graining via Multiobjective Optimization with Differentiable Simulation. J Chem Theory Comput 2024; 20:2605-2617. [PMID: 38483262 DOI: 10.1021/acs.jctc.3c01348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
In the realm of multiscale molecular simulations, structure-based coarse-graining is a prominent approach for creating efficient coarse-grained (CG) representations of soft matter systems, such as polymers. This involves optimizing CG interactions by matching static correlation functions of the corresponding degrees of freedom in all-atom (AA) models. Here, we present a versatile method, namely, differentiable coarse-graining (DiffCG), which combines multiobjective optimization and differentiable simulation. The DiffCG approach is capable of constructing robust CG models by iteratively optimizing the effective potentials to simultaneously match multiple target properties. We demonstrate our approach by concurrently optimizing bonded and nonbonded potentials of a CG model of polystyrene (PS) melts. The resulting CG-PS model effectively reproduces both the structural characteristics, such as the equilibrium probability distribution of microscopic degrees of freedom and the thermodynamic pressure of the AA counterpart. More importantly, leveraging the multiobjective optimization capability, we develop a precise and efficient CG model for PS melts that is transferable across a wide range of temperatures, i.e., from 400 to 600 K. It is achieved via optimizing a pairwise potential with nonlinear temperature dependence in the CG model to simultaneously match target data from AA-MD simulations at multiple thermodynamic states. The temperature transferable CG-PS model demonstrates its ability to accurately predict the radial distribution functions and density at different temperatures, including those that are not included in the target thermodynamic states. Our work opens up a promising route for developing accurate and transferable CG models of complex soft-matter systems through multiobjective optimization with differentiable simulation.
Affiliation(s)
- Zhenghao Wu
  - Department of Chemistry, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, P. R. China
- Tianhang Zhou
  - College of Carbon Neutrality Future Technology, State Key Laboratory of Heavy Oil Processing, China University of Petroleum (Beijing), Beijing 102249, P. R. China
7
Wang Y, Stebe KJ, de la Fuente-Nunez C, Radhakrishnan R. Computational Design of Peptides for Biomaterials Applications. ACS APPLIED BIO MATERIALS 2024; 7:617-625. [PMID: 36971822 PMCID: PMC11190638 DOI: 10.1021/acsabm.2c01023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
Abstract
Computer-aided molecular design and protein engineering emerge as promising and active subjects in bioengineering and biotechnological applications. On one hand, due to the advancing computing power in the past decade, modeling toolkits and force fields have been put to use for accurate multiscale modeling of biomolecules including lipid, protein, carbohydrate, and nucleic acids. On the other hand, machine learning emerges as a revolutionary data analysis tool that promises to leverage physicochemical properties and structural information obtained from modeling in order to build quantitative protein structure-function relationships. We review recent computational works that utilize state-of-the-art computational methods to engineer peptides and proteins for various emerging biomedical, antimicrobial, and antifreeze applications. We also discuss challenges and possible future directions toward developing a roadmap for efficient biomolecular design and engineering.
Affiliation(s)
- Yiming Wang
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Kathleen J Stebe
  - Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Cesar de la Fuente-Nunez
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Machine Biology Group, Department of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Ravi Radhakrishnan
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
8
Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS APPLIED BIO MATERIALS 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Affiliation(s)
- Roshan A Patel
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
- Michael A Webb
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
9
Zhang DT, Baldauf L, Roet S, Lervik A, van Erp TS. Highly parallelizable path sampling with minimal rejections using asynchronous replica exchange and infinite swaps. Proc Natl Acad Sci U S A 2024; 121:e2318731121. [PMID: 38315841 PMCID: PMC10873605 DOI: 10.1073/pnas.2318731121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 01/09/2024] [Indexed: 02/07/2024]
Abstract
Capturing rare yet pivotal events poses a significant challenge for molecular simulations. Path sampling provides a unique approach to tackle this issue without altering the potential energy landscape or dynamics, enabling recovery of both thermodynamic and kinetic information. However, despite its exponential acceleration compared to standard molecular dynamics, generating numerous trajectories can still require a long time. By harnessing our recent algorithmic innovations (particularly subtrajectory moves with high acceptance, coupled with asynchronous replica exchange featuring infinite swaps), we establish a highly parallelizable and rapidly converging path sampling protocol, compatible with diverse high-performance computing architectures. We demonstrate our approach on the liquid-vapor phase transition in superheated water, the unfolding of the chignolin protein, and water dissociation. The latter, performed at the ab initio level, achieves comparable statistical accuracy within days, in contrast to a previous study requiring over a year.
Affiliation(s)
- Daniel T. Zhang
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Lukas Baldauf
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Sander Roet
  - Department of Chemistry, Utrecht University, Utrecht 3584 CH, Netherlands
- Anders Lervik
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Titus S. van Erp
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim N-7491, Norway
10
Min J, Rong X, Zhang J, Su R, Wang Y, Qi W. Computational Design of Peptide Assemblies. J Chem Theory Comput 2024; 20:532-550. [PMID: 38206800 DOI: 10.1021/acs.jctc.3c01054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]
Abstract
With the ongoing development of peptide self-assembling materials, there is growing interest in exploring novel functional peptide sequences. From short peptides to long polypeptides, as the functionality increases, the sequence space is also expanding exponentially. Consequently, attempting to explore all functional sequences comprehensively through experience and experiments alone has become impractical. By utilizing computational methods, especially artificial intelligence enhanced molecular dynamics (MD) simulation and de novo peptide design, there has been a significant expansion in the exploration of sequence space. Through these methods, a variety of supramolecular functional materials, including fibers, two-dimensional arrays, nanocages, etc., have been designed by meticulously controlling the inter- and intramolecular interactions. In this review, we first provide a brief overview of the current main computational methods and then focus on the computational design methods for various self-assembled peptide materials. Additionally, we introduce some representative protein self-assemblies to offer guidance for the design of self-assembling peptides.
Affiliation(s)
- Jiwei Min
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
- Xi Rong
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
- Jiaxing Zhang
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
- Rongxin Su
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
  - Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, P. R. China
  - Tianjin Key Laboratory of Membrane Science and Desalination Technology, Tianjin 300072, P. R. China
- Yuefei Wang
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
  - Tianjin Key Laboratory of Membrane Science and Desalination Technology, Tianjin 300072, P. R. China
- Wei Qi
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, P. R. China
  - Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, P. R. China
  - Tianjin Key Laboratory of Membrane Science and Desalination Technology, Tianjin 300072, P. R. China
11
Sasselli IR, Coluzza I. Assessment of the MARTINI 3 Performance for Short Peptide Self-Assembly. J Chem Theory Comput 2024; 20:224-238. [PMID: 38113378 PMCID: PMC10782451 DOI: 10.1021/acs.jctc.3c01015] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/15/2023] [Revised: 11/30/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023]
Abstract
The coarse-grained MARTINI force field, initially developed for membranes, has proven to be an exceptional tool for investigating supramolecular peptide assemblies. Over the years, the force field underwent refinements to enhance accuracy, enabling, for example, the reproduction of protein-ligand interactions and constant pH behavior. However, these protein-focused improvements seem to have compromised its ability to model short peptide self-assembly. In this study, we assess the performance of MARTINI 3 in reproducing peptide self-assembly using the well-established diphenylalanine (FF) as our test case. Unlike its success in version 2.1, FF does not even exhibit aggregation in version 3. By systematically exploring parameters for the aromatic side chains and charged backbone beads, we established a parameter set that effectively reproduces tube formation. Remarkably, these parameter adjustments also replicate the self-assembly of other di- and tripeptides and coassemblies. Furthermore, our analysis uncovers pivotal insights for enhancing the performance of MARTINI in modeling short peptide self-assembly. Specifically, we identify issues stemming from overestimated hydrophilicity arising from charged termini and disruptions in π-stacking interactions due to insufficient planarity in aromatic groups and a discrepancy in intermolecular distances between this and backbone-backbone interactions. This investigation demonstrates that strategic modifications can harness the advancements offered by MARTINI 3 for the realm of short peptide self-assembly.
Affiliation(s)
- Ivan R. Sasselli
  - Centro de Física de Materiales (CFM), CSIC-UPV/EHU, Paseo Manuel de Lardizabal 5, 20018 San Sebastián, Spain
  - Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramón 182, 20014 Donostia-San Sebastián, Spain
- Ivan Coluzza
  - Ikerbasque, Basque Foundation for Science, Plaza de Euskadi 5, 48009 Bilbao, Spain
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, UPV/EHU Science Park, 48940 Leioa, Spain
12
An Y, Webb MA, Jacobs WM. Active learning of the thermodynamics-dynamics trade-off in protein condensates. SCIENCE ADVANCES 2024; 10:eadj2448. [PMID: 38181073 PMCID: PMC10775998 DOI: 10.1126/sciadv.adj2448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 12/04/2023] [Indexed: 01/07/2024]
Abstract
Phase-separated biomolecular condensates exhibit a wide range of dynamic properties, which depend on the sequences of the constituent proteins and RNAs. However, it is unclear to what extent condensate dynamics can be tuned without also changing the thermodynamic properties that govern phase separation. Using coarse-grained simulations of intrinsically disordered proteins, we show that the dynamics and thermodynamics of homopolymer condensates are strongly correlated, with increased condensate stability being coincident with low mobilities and high viscosities. We then apply an "active learning" strategy to identify heteropolymer sequences that break this correlation. This data-driven approach and accompanying analysis reveal how heterogeneous amino acid compositions and nonuniform sequence patterning map to a range of independently tunable dynamic and thermodynamic properties of biomolecular condensates. Our results highlight key molecular determinants governing the physical properties of biomolecular condensates and establish design rules for the development of stimuli-responsive biomaterials.
Affiliation(s)
- Yaxin An
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Michael A. Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
| | - William M. Jacobs
- Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| |
|
13
|
Airas J, Ding X, Zhang B. Transferable Implicit Solvation via Contrastive Learning of Graph Neural Networks. ACS CENTRAL SCIENCE 2023; 9:2286-2297. [PMID: 38161379 PMCID: PMC10755853 DOI: 10.1021/acscentsci.3c01160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/26/2023] [Accepted: 10/31/2023] [Indexed: 01/03/2024]
Abstract
Implicit solvent models are essential for molecular dynamics simulations of biomolecules, striking a balance between computational efficiency and biological realism. Efforts are underway to develop accurate and transferable implicit solvent models and coarse-grained (CG) force fields in general, guided by a bottom-up approach that matches the CG energy function with the potential of mean force (PMF) defined by the finer system. However, practical challenges arise due to the lack of analytical expressions for the PMF and algorithmic limitations in parameterizing CG force fields. To address these challenges, a machine learning-based approach is proposed, utilizing graph neural networks (GNNs) to represent the solvation free energy and potential contrasting for parameter optimization. We demonstrate the effectiveness of the approach by deriving a transferable GNN implicit solvent model using 600,000 atomistic configurations of six proteins obtained from explicit solvent simulations. The GNN model provides solvation free energy estimations much more accurately than state-of-the-art implicit solvent models, reproducing configurational distributions of explicit solvent simulations. We also demonstrate the reasonable transferability of the GNN model outside of the training data. Our study offers valuable insights for deriving systematically improvable implicit solvent models and CG force fields from a bottom-up perspective.
Affiliation(s)
- Justin Airas
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
| | - Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
| | - Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
| |
|
14
|
Tang Y, Kim JY, Ip CKM, Bahmani A, Chen Q, Rosenberger MG, Esser-Kahn AP, Ferguson AL. Data-driven discovery of innate immunomodulators via machine learning-guided high throughput screening. Chem Sci 2023; 14:12747-12766. [PMID: 38020385 PMCID: PMC10646978 DOI: 10.1039/d3sc03613h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 10/18/2023] [Indexed: 12/01/2023] Open
Abstract
The innate immune response is vital for the success of prophylactic vaccines and immunotherapies. Control of signaling in innate immune pathways can improve prophylactic vaccines by inhibiting unfavorable systemic inflammation and immunotherapies by enhancing immune stimulation. In this work, we developed a machine learning-enabled active learning pipeline to guide in vitro experimental screening and discovery of small molecule immunomodulators that improve immune responses by altering the signaling activity of innate immune responses stimulated by traditional pattern recognition receptor agonists. Molecules were tested by in vitro high throughput screening (HTS) where we measured modulation of the nuclear factor κ-light-chain-enhancer of activated B-cells (NF-κB) and the interferon regulatory factors (IRF) pathways. These data were used to train data-driven predictive models linking molecular structure to modulation of the NF-κB and IRF responses using deep representational learning, Gaussian process regression, and Bayesian optimization. By interleaving successive rounds of model training and in vitro HTS, we performed an active learning-guided traversal of a 139 998 molecule library. After sampling only ∼2% of the library, we discovered viable molecules with unprecedented immunomodulatory capacity, including those capable of suppressing NF-κB activity by up to 15-fold, elevating NF-κB activity by up to 5-fold, and elevating IRF activity by up to 6-fold. We extracted chemical design rules identifying particular chemical fragments as principal drivers of specific immunomodulation behaviors. We validated the immunomodulatory effect of a subset of our top candidates by measuring cytokine release profiles. Of these, one molecule induced a 3-fold enhancement in IFN-β production when delivered with a cyclic di-nucleotide stimulator of interferon genes (STING) agonist. 
In sum, our machine learning-enabled screening approach presents an efficient immunomodulator discovery pipeline that has furnished a library of novel small molecules with a strong capacity to enhance or suppress innate immune signaling pathways to shape and improve prophylactic vaccination and immunotherapies.
Affiliation(s)
- Yifeng Tang
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
| | - Jeremiah Y Kim
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
| | - Carman K M Ip
- Cellular Screening Center, University of Chicago, Chicago, IL 60637, USA
| | - Azadeh Bahmani
- Cellular Screening Center, University of Chicago, Chicago, IL 60637, USA
| | - Qing Chen
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
| | - Matthew G Rosenberger
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
| | - Aaron P Esser-Kahn
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
| | - Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
| |
|
15
|
Jones MS, Shmilovich K, Ferguson AL. DiAMoNDBack: Diffusion-Denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces. J Chem Theory Comput 2023; 19:7908-7923. [PMID: 37906711 DOI: 10.1021/acs.jctc.3c00840] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/02/2023]
Abstract
Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side-chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side-chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side-chain clashes, and the diversity of the generated side-chain configurational states. We make the DiAMoNDBack model publicly available as a free and open-source Python package.
Affiliation(s)
- Michael S Jones
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| |
|
16
|
Wang J, Liu Z, Zhao S, Xu T, Wang H, Li SZ, Li W. Deep Learning Empowers the Discovery of Self-Assembling Peptides with Over 10 Trillion Sequences. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2301544. [PMID: 37749875 PMCID: PMC10625107 DOI: 10.1002/advs.202301544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 08/03/2023] [Indexed: 09/27/2023]
Abstract
Self-assembling of peptides is essential for a variety of biological and medical applications. However, it is challenging to investigate the self-assembling properties of peptides within the complete sequence space due to the enormous sequence quantities. Here, it is demonstrated that a transformer-based deep learning model is effective in predicting the aggregation propensity (AP) of peptide systems, even for decapeptide and mixed-pentapeptide systems with over 10 trillion sequence quantities. Based on the predicted AP values, not only the aggregation laws for designing self-assembling peptides are derived, but the transferability relation among the APs of pentapeptides, decapeptides, and mixed pentapeptides is also revealed, leading to discoveries of self-assembling peptides by concatenating or mixing, as consolidated by experiments. This deep learning approach enables speedy, accurate, and thorough search and design of self-assembling peptides within the complete sequence space of oligopeptides, advancing peptide science by inspiring new biological and medical applications.
Affiliation(s)
- Jiaqi Wang
- Research Center for Industries of the Future, Westlake University, Hangzhou 310030, China
- School of Engineering, Westlake University, Hangzhou 310030, China
| | - Zihan Liu
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310030, China
| | - Shuang Zhao
- Research Center for Industries of the Future, Westlake University, Hangzhou 310030, China
- School of Engineering, Westlake University, Hangzhou 310030, China
| | - Tengyan Xu
- Department of Chemistry, School of Science, Westlake University, Hangzhou 310030, China
- Institute of Natural Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
| | - Huaimin Wang
- Department of Chemistry, School of Science, Westlake University, Hangzhou 310030, China
- Institute of Natural Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
| | - Stan Z. Li
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310030, China
| | - Wenbin Li
- Research Center for Industries of the Future, Westlake University, Hangzhou 310030, China
- School of Engineering, Westlake University, Hangzhou 310030, China
| |
|
17
|
Airas J, Ding X, Zhang B. Transferable Coarse Graining via Contrastive Learning of Graph Neural Networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.08.556923. [PMID: 37745447 PMCID: PMC10515757 DOI: 10.1101/2023.09.08.556923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Coarse-grained (CG) force fields are essential for molecular dynamics simulations of biomolecules, striking a balance between computational efficiency and biological realism. These simulations employ simplified models grouping atoms into interaction sites, enabling the study of complex biomolecular systems over biologically relevant timescales. Efforts are underway to develop accurate and transferable CG force fields, guided by a bottom-up approach that matches the CG energy function with the potential of mean force (PMF) defined by the finer system. However, practical challenges arise due to many-body effects, lack of analytical expressions for the PMF, and limitations in parameterizing CG force fields. To address these challenges, a machine learning-based approach is proposed, utilizing graph neural networks (GNNs) to represent CG force fields and potential contrasting for parameterization from atomistic simulation data. We demonstrate the effectiveness of the approach by deriving a transferable GNN implicit solvent model using 600,000 atomistic configurations of six proteins obtained from explicit solvent simulations. The GNN model provides solvation free energy estimations much more accurately than state-of-the-art implicit solvent models, reproducing configurational distributions of explicit solvent simulations. We also demonstrate the reasonable transferability of the GNN model outside the training data. Our study offers valuable insights for building accurate coarse-grained models bottom-up.
Affiliation(s)
- Justin Airas
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
|
18
|
Yu T, Boob AG, Singh N, Su Y, Zhao H. In vitro continuous protein evolution empowered by machine learning and automation. Cell Syst 2023; 14:633-644. [PMID: 37224814 DOI: 10.1016/j.cels.2023.04.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/04/2022] [Revised: 11/19/2022] [Accepted: 04/20/2023] [Indexed: 05/26/2023]
Abstract
Directed evolution has become one of the most successful and powerful tools for protein engineering. However, the efforts required for designing, constructing, and screening a large library of variants can be laborious, time-consuming, and costly. With the recent advent of machine learning (ML) in the directed evolution of proteins, researchers can now evaluate variants in silico and guide a more efficient directed evolution campaign. Furthermore, recent advancements in laboratory automation have enabled the rapid execution of long, complex experiments for high-throughput data acquisition in both industrial and academic settings, thus providing the means to collect a large quantity of data required to develop ML models for protein engineering. In this perspective, we propose a closed-loop in vitro continuous protein evolution framework that leverages the best of both worlds, ML and automation, and provide a brief overview of the recent developments in the field.
Affiliation(s)
- Tianhao Yu
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, Urbana, IL, USA; NSF Molecule Maker Lab Institute, Urbana, IL, USA
| | - Aashutosh Girish Boob
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, Urbana, IL, USA; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Nilmani Singh
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Yufeng Su
- NSF Molecule Maker Lab Institute, Urbana, IL, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, Urbana, IL, USA; NSF Molecule Maker Lab Institute, Urbana, IL, USA; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| |
|
19
|
Sori L, Pizzi A, Bergamaschi G, Gori A, Gautieri A, Demitri N, Soncini M, Metrangolo P. Computation meets experiment: identification of highly efficient fibrillating peptides. CrystEngComm 2023; 25:4503-4510. [PMID: 38014394 PMCID: PMC10424810 DOI: 10.1039/d3ce00495c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 07/03/2023] [Indexed: 11/29/2023]
Abstract
Self-assembling peptides are of huge interest for biological, medical and nanotechnological applications. The enormous chemical variety that is available from the 20 amino acids offers potentially unlimited peptide sequences, but it is currently an issue to predict their supramolecular behavior in a reliable and cheap way. Herein we report a computational method to screen and forecast the aqueous self-assembly propensity of amyloidogenic pentapeptides. This method was found also as an interesting tool to predict peptide crystallinity, which may be of interest for the development of peptide based drugs.
Affiliation(s)
- Lorenzo Sori
- Laboratory of Supramolecular and BioNano Materials (SupraBioNanoLab), Department of Chemistry, Materials, and Chemical Engineering "Giulio Natta", Politecnico di Milano, Via Luigi Mancinelli 7, 20131 Milan, Italy
| | - Andrea Pizzi
- Laboratory of Supramolecular and BioNano Materials (SupraBioNanoLab), Department of Chemistry, Materials, and Chemical Engineering "Giulio Natta", Politecnico di Milano, Via Luigi Mancinelli 7, 20131 Milan, Italy
| | - Greta Bergamaschi
- Istituto di Scienze e Tecnologie Chimiche - National Research Council of Italy (SCITEC-CNR), 20131 Milan, Italy
| | - Alessandro Gori
- Istituto di Scienze e Tecnologie Chimiche - National Research Council of Italy (SCITEC-CNR), 20131 Milan, Italy
| | - Alfonso Gautieri
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20131 Milan, Italy
| | - Nicola Demitri
- Elettra - Sincrotrone Trieste, S.S. 14 Km 163.5 in Area Science Park, 34149 Basovizza - Trieste, Italy
| | - Monica Soncini
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20131 Milan, Italy
| | - Pierangelo Metrangolo
- Laboratory of Supramolecular and BioNano Materials (SupraBioNanoLab), Department of Chemistry, Materials, and Chemical Engineering "Giulio Natta", Politecnico di Milano, Via Luigi Mancinelli 7, 20131 Milan, Italy
| |
|
20
|
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues, such as data diversity, imputation, noise, imbalance, and high-dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Zailiang Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Ekaterina Merkurjev
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Lu Ke
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Long Chen
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
21
|
Kazmirchuk TDD, Bradbury-Jost C, Withey TA, Gessese T, Azad T, Samanfar B, Dehne F, Golshani A. Peptides of a Feather: How Computation Is Taking Peptide Therapeutics under Its Wing. Genes (Basel) 2023; 14:1194. [PMID: 37372372 PMCID: PMC10298604 DOI: 10.3390/genes14061194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 05/24/2023] [Accepted: 05/26/2023] [Indexed: 06/29/2023] Open
Abstract
Leveraging computation in the development of peptide therapeutics has garnered increasing recognition as a valuable tool to generate novel therapeutics for disease-related targets. To this end, computation has transformed the field of peptide design through identifying novel therapeutics that exhibit enhanced pharmacokinetic properties and reduced toxicity. The process of in-silico peptide design involves the application of molecular docking, molecular dynamics simulations, and machine learning algorithms. Three primary approaches for peptide therapeutic design including structural-based, protein mimicry, and short motif design have been predominantly adopted. Despite the ongoing progress made in this field, there are still significant challenges pertaining to peptide design including: enhancing the accuracy of computational methods; improving the success rate of preclinical and clinical trials; and developing better strategies to predict pharmacokinetics and toxicity. In this review, we discuss past and present research pertaining to the design and development of in-silico peptide therapeutics in addition to highlighting the potential of computation and artificial intelligence in the future of disease therapeutics.
Affiliation(s)
- Thomas David Daniel Kazmirchuk
- Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
| | - Calvin Bradbury-Jost
- Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
| | - Taylor Ann Withey
- Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
| | - Tadesse Gessese
- Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
| | - Taha Azad
- Department of Microbiology and Infectious Diseases, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CHUS), Sherbrooke, QC J1H 5N4, Canada
| | - Bahram Samanfar
- Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
- Agriculture and Agri-Food Canada, Ottawa Research and Development Centre (ORDC), Ottawa, ON K1A 0C6, Canada
| | - Frank Dehne
- School of Computer Science, Carleton University, Ottawa, ON K1S 5B6, Canada
| | - Ashkan Golshani
- Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
| |
|
22
|
Meyer T, Ramirez C, Tamasi MJ, Gormley AJ. A User's Guide to Machine Learning for Polymeric Biomaterials. ACS POLYMERS AU 2023; 3:141-157. [PMID: 37065715 PMCID: PMC10103193 DOI: 10.1021/acspolymersau.2c00037] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/05/2022] [Revised: 10/27/2022] [Accepted: 10/27/2022] [Indexed: 11/18/2022]
Abstract
The development of novel biomaterials is a challenging process, complicated by a design space with high dimensionality. Requirements for performance in the complex biological environment lead to difficult a priori rational design choices and time-consuming empirical trial-and-error experimentation. Modern data science practices, especially artificial intelligence (AI)/machine learning (ML), offer the promise to help accelerate the identification and testing of next-generation biomaterials. However, it can be a daunting task for biomaterial scientists unfamiliar with modern ML techniques to begin incorporating these useful tools into their development pipeline. This Perspective lays the foundation for a basic understanding of ML while providing a step-by-step guide to new users on how to begin implementing these techniques. A tutorial Python script has been developed walking users through the application of an ML pipeline using data from a real biomaterial design challenge based on group's research. This tutorial provides an opportunity for readers to see and experiment with ML and its syntax in Python. The Google Colab notebook can be easily accessed and copied from the following URL: www.gormleylab.com/MLcolab.
Affiliation(s)
- Travis A. Meyer
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
| | - Cesar Ramirez
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
| | - Matthew J. Tamasi
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
| | - Adam J. Gormley
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
| |
|
23
|
Ricci E, Vergadou N. Integrating Machine Learning in the Coarse-Grained Molecular Simulation of Polymers. J Phys Chem B 2023; 127:2302-2322. [PMID: 36888553 DOI: 10.1021/acs.jpcb.2c06354] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 03/09/2023]
Abstract
Machine learning (ML) is having an increasing impact on the physical sciences, engineering, and technology and its integration into molecular simulation frameworks holds great potential to expand their scope of applicability to complex materials and facilitate fundamental knowledge and reliable property predictions, contributing to the development of efficient materials design routes. The application of ML in materials informatics in general, and polymer informatics in particular, has led to interesting results, however great untapped potential lies in the integration of ML techniques into the multiscale molecular simulation methods for the study of macromolecular systems, specifically in the context of Coarse Grained (CG) simulations. In this Perspective, we aim at presenting the pioneering recent research efforts in this direction and discussing how these new ML-based techniques can contribute to critical aspects of the development of multiscale molecular simulation methods for bulk complex chemical systems, especially polymers. Prerequisites for the implementation of such ML-integrated methods and open challenges that need to be met toward the development of general systematic ML-based coarse graining schemes for polymers are discussed.
Affiliation(s)
- Eleonora Ricci
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
- Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
| | - Niki Vergadou
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
| |
|
24
|
Gavrilov AA, Potemkin II. Copolymers with Nonblocky Sequences as Novel Materials with Finely Tuned Properties. J Phys Chem B 2023; 127:1479-1489. [PMID: 36790352 DOI: 10.1021/acs.jpcb.2c07689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
The copolymer sequence can be considered as a new tool to shape the resulting system properties on demand. This perspective is devoted to copolymers with "partially segregated" (or nonblocky) sequences. Such copolymers include gradient copolymers and copolymers with random sequences as well as copolymers with precisely controlled sequences. We overview recent developments in the synthesis of these systems as well as new findings regarding their properties, in particular, self-assembly in solutions and in melts. An emphasis is put on how the microscopic behavior of polymer chains is influenced by the chain sequences. In addition to that, a novel class of approaches allowing one to efficiently tackle the problem of copolymer chain sequence design, namely data-driven methods (artificial intelligence and machine learning), is discussed.
Affiliation(s)
- Alexey A Gavrilov
- Physics Department, Lomonosov Moscow State University, Moscow 119991, Russian Federation
- Semenov Federal Research Center for Chemical Physics, Moscow 119991, Russian Federation
| | - Igor I Potemkin
- Physics Department, Lomonosov Moscow State University, Moscow 119991, Russian Federation
| |
|
25
|
Qi X, Hu Y, Wang R, Yang Y, Zhao Y. Recent Advance of Machine Learning in Selecting New Materials. Acta Chimica Sinica 2023. [DOI: 10.6023/a22110446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
|
26
|
Khan P, Kaushik R, Jayaraj A. Approaches and Perspective of Coarse-Grained Modeling and Simulation for Polymer-Nanoparticle Hybrid Systems. ACS Omega 2022; 7:47567-47586. [PMID: 36591142 PMCID: PMC9798744 DOI: 10.1021/acsomega.2c06248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]
Abstract
Molecular modeling and simulations have emerged as effective and indispensable tools to characterize polymeric systems. They provide fundamental and essential insights to design a product of the required properties and to improve the understanding of a phenomenon at the molecular level for a particular system. The polymer-nanoparticle hybrids are materials with outstanding properties and correspondingly large applications whose study has benefited from this new paradigm. However, despite the significant expansion of modern day computational powers, investigation of the long time and large length scale phenomenon in polymeric and polymer-nanoparticle systems is still a challenging task to complete through all-atom molecular dynamics (AA-MD) simulations. To circumvent this problem, a variety of coarse-grained (CG) models have been proposed, ranging from the generic CG models for qualitative properties predictions to more realistic chemically specific CG models for quantitative properties predictions. These CG models have already delivered some success stories in the study of several spatial and temporal evolutions of many processes. Some of these studies were beyond the feasibility of traditional atomistic resolution models due to either the size or the time constraints. This review captures the different types of popular CG approaches that are utilized in the investigation of the microscopic behavior of polymer-nanoparticle hybrid systems. The rationale of this article is to furnish an overview of the popular CG approaches and their applications, to review several important and most recent developments, and to delineate the perspectives on future directions in the field.
Affiliation(s)
- Parvez Khan
- Department of Chemical Engineering, Aligarh Muslim University, Aligarh 202002, India
| | - Rahul Kaushik
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
| | - Abhilash Jayaraj
- Department of Chemistry, Wesleyan University, Middletown, Connecticut 06459, United States
| |
|
27
|
Ferguson AL, Tovar JD. Evolution of π-Peptide Self-Assembly: From Understanding to Prediction and Control. Langmuir: The ACS Journal of Surfaces and Colloids 2022; 38:15463-15475. [PMID: 36475709 DOI: 10.1021/acs.langmuir.2c02399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Supramolecular materials derived from the self-assembly of engineered molecules continue to garner tremendous scientific and technological interest. Recent innovations include the realization of nano- and mesoscale particles (0D), rods and fibrils (1D), sheets (2D), and even extended lattices (3D). Our research groups have focused attention over the past 15 years on one particular class of supramolecular materials derived from oligopeptides with embedded π-electron units, where the oligopeptides can be viewed as substituents or side chains to direct the assembly of the central π-electron cores. Upon assembly, the π-systems are driven into close cofacial architectures that facilitate a variety of energy migration processes within the nanomaterial volume, including exciton transport, voltage transmission, and photoinduced electron transfer. Like many practitioners of supramolecular materials science, many of our initial molecular designs were designed with substantial inspiration from biologically occurring self-assembly coupled with input from chemical intuition and molecular modeling and simulation. In this feature article, we summarize our current understanding of the π-peptide self-assembly process as documented through our body of publications in this area. We address fundamental spectroscopic and computational tools used to extract information regarding the internal structures and energetics of the π-peptide assemblies, and we address the current state of the art in terms of recent applications of data science tools in conjunction with high-throughput computational screening and experimental assays to guide the efficient traversal of the π-peptide molecular design space. 
The abstract image details our integrated program of chemical synthesis, spectroscopic and functional characterization, multiscale simulation, and machine learning which has advanced the understanding and control of the assembly of synthetic π-conjugated peptides into supramolecular nanostructures with energy and biomedical applications.
Affiliation(s)
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - John D Tovar
- Department of Chemistry, Johns Hopkins University, 3400 N. Charles Street, Baltimore, Maryland 21218, United States
| |
|
28
|
Shmilovich K, Stieffenhofer M, Charron NE, Hoffmann M. Temporally Coherent Backmapping of Molecular Trajectories From Coarse-Grained to Atomistic Resolution. J Phys Chem A 2022; 126:9124-9139. [PMID: 36417670 PMCID: PMC9743211 DOI: 10.1021/acs.jpca.2c07716] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/24/2022]
Abstract
Coarse-graining offers a means to extend the achievable time and length scales of molecular dynamics simulations beyond what is practically possible in the atomistic regime. Sampling molecular configurations of interest can be done efficiently using coarse-grained simulations, from which meaningful physicochemical information can be inferred if the corresponding all-atom configurations are reconstructed. However, this procedure of backmapping to reintroduce the lost atomistic detail into coarse-grain structures has proven a challenging task due to the many feasible atomistic configurations that can be associated with one coarse-grain structure. Existing backmapping methods are strictly frame-based, relying on either heuristics to replace coarse-grain particles with atomic fragments and subsequent relaxation or parametrized models to propose atomic coordinates separately and independently for each coarse-grain structure. These approaches neglect information from previous trajectory frames that is critical to ensuring temporal coherence of the backmapped trajectory, while also offering information potentially helpful to producing higher-fidelity atomic reconstructions. In this work, we present a deep learning-enabled data-driven approach for temporally coherent backmapping that explicitly incorporates information from preceding trajectory structures. Our method trains a conditional variational autoencoder to nondeterministically reconstruct atomistic detail conditioned on both the target coarse-grain configuration and the previously reconstructed atomistic configuration. We demonstrate our backmapping approach on two exemplar biomolecular systems: alanine dipeptide and the miniprotein chignolin. We show that our backmapped trajectories accurately recover the structural, thermodynamic, and kinetic properties of the atomistic trajectory data.
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | | | - Nicholas E. Charron
- Weiss School of Natural Sciences, Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
| | - Moritz Hoffmann
- Fachbereich Mathematik und Informatik, Freie Universität Berlin, Berlin 14195, Germany
| |
|
29
|
Abstract
Coarse-grained models have proven helpful for simulating complex systems over long time scales to provide molecular insights into various processes. Methodologies for systematic parametrization of the underlying energy function or force field that describes the interactions among different components of the system are of great interest for ensuring simulation accuracy. We present a new method, potential contrasting, to enable efficient learning of force fields that can accurately reproduce the conformational distribution produced with all-atom simulations. Potential contrasting generalizes the noise contrastive estimation method with umbrella sampling to better learn the complex energy landscape of molecular systems. When applied to the Trp-cage protein, we found that the technique produces force fields that thoroughly capture the thermodynamics of the folding process despite the use of only α-carbons in the coarse-grained model. We further showed that potential contrasting could be applied over large data sets that combine the conformational ensembles of many proteins to improve force field transferability. We anticipate potential contrasting as a powerful tool for building general-purpose coarse-grained force fields.
Affiliation(s)
- Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
30
|
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. Reports on Progress in Physics. Physical Society (Great Britain) 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Affiliation(s)
- Giulia Magi Meconi
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
| | - Ivan R Sasselli
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
| | | | - Jose N Onuchic
- Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
| | - Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
- Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
| |
|
31
|
Tamasi MJ, Patel RA, Borca CH, Kosuri S, Mugnier H, Upadhya R, Murthy NS, Webb MA, Gormley AJ. Machine Learning on a Robotic Platform for the Design of Polymer-Protein Hybrids. Advanced Materials (Deerfield Beach, Fla.) 2022; 34:e2201809. [PMID: 35593444 PMCID: PMC9339531 DOI: 10.1002/adma.202201809] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 02/24/2022] [Revised: 04/26/2022] [Indexed: 06/04/2023]
Abstract
Polymer-protein hybrids are intriguing materials that can bolster protein stability in non-native environments, thereby enhancing their utility in diverse medicinal, commercial, and industrial applications. One stabilization strategy involves designing synthetic random copolymers with compositions attuned to the protein surface, but rational design is complicated by the vast chemical and composition space. Here, a strategy is reported to design protein-stabilizing copolymers based on active machine learning, facilitated by automated material synthesis and characterization platforms. The versatility and robustness of the approach is demonstrated by the successful identification of copolymers that preserve, or even enhance, the activity of three chemically distinct enzymes following exposure to thermal denaturing conditions. Although systematic screening results in mixed success, active learning appropriately identifies unique and effective copolymer chemistries for the stabilization of each enzyme. Overall, this work broadens the capabilities to design fit-for-purpose synthetic copolymers that promote or otherwise manipulate protein activity, with extensions toward the design of robust polymer-protein hybrid materials.
Affiliation(s)
- Matthew J Tamasi
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Roshan A Patel
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08544, USA
| | - Carlos H Borca
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08544, USA
| | - Shashank Kosuri
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Heloise Mugnier
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Rahul Upadhya
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| | - N Sanjeeva Murthy
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08544, USA
| | - Adam J Gormley
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| |
|
32
|
Kohn EM, Shirley DJ, Hinds NM, Fry HC, Caputo GA. Peptide‐assisted supramolecular polymerization of the anionic porphyrin meso‐tetra(4‐sulfonatophenyl)porphine. Pept Sci (Hoboken) 2022. [DOI: 10.1002/pep2.24288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Eric M. Kohn
- Department of Chemistry & Biochemistry Rowan University Glassboro New Jersey USA
- Bantivoglio Honors College Rowan University Glassboro New Jersey USA
- Department of Chemistry University of Wisconsin Madison Wisconsin USA
| | - David J. Shirley
- Department of Chemistry & Biochemistry Rowan University Glassboro New Jersey USA
- Division of Chemical Biology and Medicinal Chemistry Eshelman School of Pharmacy, University of North Carolina Chapel Hill North Carolina USA
| | - Nicole M. Hinds
- Department of Chemistry & Biochemistry Rowan University Glassboro New Jersey USA
| | - H. Christopher Fry
- Argonne National Laboratory Center for Nanoscale Materials Lemont Illinois USA
| | - Gregory A. Caputo
- Department of Chemistry & Biochemistry Rowan University Glassboro New Jersey USA
| |
|
33
|
Quach CD, Gilmer JB, Pert D, Mason-Hogans A, Iacovella CR, Cummings PT, McCabe C. High-throughput screening of tribological properties of monolayer films using molecular dynamics and machine learning. J Chem Phys 2022; 156:154902. [PMID: 35459321 DOI: 10.1063/5.0080838] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022] Open
Abstract
Monolayer films have shown promise as a lubricating layer to reduce friction and wear of mechanical devices with separations on the nanoscale. These films have a vast design space with many tunable properties that can affect their tribological effectiveness. For example, terminal group chemistry, film composition, and backbone chemistry can all lead to films with significantly different tribological properties. This design space, however, is very difficult to explore without a combinatorial approach and an automatable, reproducible, and extensible workflow to screen for promising candidate films. Using the Molecular Simulation Design Framework (MoSDeF), a combinatorial screening study was performed to explore 9747 unique monolayer films (116 964 total simulations) and a machine learning (ML) model using a random forest regressor, an ensemble learning technique, to explore the role of terminal group chemistry and its effect on tribological effectiveness. The most promising films were found to contain small terminal groups such as cyano and ethylene. The ML model was subsequently applied to screen terminal group candidates identified from the ChEMBL small molecule library. Approximately 193 131 unique film candidates were screened with approximately a five order of magnitude speed-up in analysis compared to simulation alone. The ML model was thus able to be used as a predictive tool to greatly speed up the initial screening of promising candidate films for future simulation studies, suggesting that computational screening in combination with ML can greatly increase the throughput in combinatorial approaches to generate in silico data and then train ML models in a controlled, self-consistent fashion.
Affiliation(s)
- Co D Quach
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
| | - Justin B Gilmer
- Interdisciplinary Materials Science, Vanderbilt University, Nashville, Tennessee 37235, USA
| | - Daniel Pert
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
| | - Akanke Mason-Hogans
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
| | - Christopher R Iacovella
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
| | - Peter T Cummings
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
| | - Clare McCabe
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, USA
| |
|
34
|
Mohr B, Shmilovich K, Kleinwächter IS, Schneider D, Ferguson AL, Bereau T. Data-driven discovery of cardiolipin-selective small molecules by computational active learning. Chem Sci 2022; 13:4498-4511. [PMID: 35656132 PMCID: PMC9019913 DOI: 10.1039/d2sc00116k] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/06/2022] [Accepted: 02/24/2022] [Indexed: 12/23/2022] Open
Abstract
Subtle variations in the lipid composition of mitochondrial membranes can have a profound impact on mitochondrial function. The inner mitochondrial membrane contains the phospholipid cardiolipin, which has been demonstrated to act as a biomarker for a number of diverse pathologies. Small molecule dyes capable of selectively partitioning into cardiolipin membranes enable visualization and quantification of the cardiolipin content. Here we present a data-driven approach that combines a deep learning-enabled active learning workflow with coarse-grained molecular dynamics simulations and alchemical free energy calculations to discover small organic compounds able to selectively permeate cardiolipin-containing membranes. By employing transferable coarse-grained models we efficiently navigate the all-atom design space corresponding to small organic molecules with molecular weight less than ≈500 Da. After direct simulation of only 0.42% of our coarse-grained search space we identify molecules with considerably increased levels of cardiolipin selectivity compared to a widely used cardiolipin probe 10-N-nonyl acridine orange. Our accumulated simulation data enables us to derive interpretable design rules linking coarse-grained structure to cardiolipin selectivity. The findings are corroborated by fluorescence anisotropy measurements of two compounds conforming to our defined design rules. Our findings highlight the potential of coarse-grained representations and multiscale modelling for materials discovery and design.
Affiliation(s)
- Bernadette Mohr
- Van't Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam Amsterdam 1098 XH The Netherlands
| | - Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago Chicago Illinois 60637 USA
| | - Isabel S Kleinwächter
- Department of Chemistry - Biochemistry, Johannes Gutenberg University Mainz 55128 Mainz Germany
| | - Dirk Schneider
- Department of Chemistry - Biochemistry, Johannes Gutenberg University Mainz 55128 Mainz Germany
| | - Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago Chicago Illinois 60637 USA
| | - Tristan Bereau
- Van't Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam Amsterdam 1098 XH The Netherlands
- Max Planck Institute for Polymer Research 55128 Mainz Germany
| |
|
35
|
Abstract
Optimal design of polymers is a challenging task due to their enormous chemical and configurational space. Recent advances in computations, machine learning, and increasing trends in data and software availability can potentially address this problem and accelerate the molecular-scale design of polymers. Here, the central problem of polymer design is reviewed, and the general ideas of data-driven methods and their working principles in the context of polymer design are discussed. This Review provides a historical perspective and a summary of current trends and outlines future scopes of data-driven methods for polymer research. A few representative case studies on the use of such data-driven methods for discovering new polymers with exceptional properties are presented. Moreover, attempts are made to highlight how data-driven strategies aid in establishing new correlations and advancing the fundamental understanding of polymers. This Review posits that the combination of machine learning, rapid computational characterization of polymers, and availability of large open-sourced homogeneous data will transform polymer research and development over the coming decades. It is hoped that this Review will serve as a useful reference to researchers who wish to develop and deploy data-driven methods for polymer research and education.
Affiliation(s)
- Tarak K. Patra
- Department of Chemical Engineering, Center for Atomistic Modeling and Materials Design and Center for Carbon Capture Utilization and Storage, Indian Institute of Technology Madras, Chennai, TN 600036, India
| |
|
36
|
Gianti E, Percec S. Machine Learning at the Interface of Polymer Science and Biology: How Far Can We Go? Biomacromolecules 2022; 23:576-591. [PMID: 35133143 DOI: 10.1021/acs.biomac.1c01436] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
This Perspective outlines recent progress and future directions for using machine learning (ML), a data-driven method, to address critical questions in the design, synthesis, processing, and characterization of biomacromolecules. The achievement of these tasks requires the navigation of vast and complex chemical and biological spaces, difficult to accomplish with reasonable speed. Using modern algorithms and supercomputers, quantum physics methods are able to examine systems containing a few hundred interacting species and determine the probability of finding them in a particular region of phase space, thereby anticipating their properties. Likewise, modern approaches in chemistry and biomolecular simulation, supported by high performance computing, have culminated in producing data sets of escalating size and intrinsically high complexity. Hence, using ML to extract relevant information from these fields is of paramount importance to advance our understanding of chemical and biomolecular systems. At the heart of ML approaches lie statistical algorithms, which by evaluating a portion of a given data set, identify, learn, and manipulate the underlying rules that govern the whole data set. The assembly of a quality model to represent the data followed by the predictions and elimination of error sources are the key steps in ML. In addition to a growing infrastructure of ML tools to address complex problems, an increasing number of aspects related to our understanding of the fundamental properties of biomacromolecules are exposed to ML. These fields, including those residing at the interface of polymer science and biology (i.e., structure determination, de novo design, folding, and dynamics), strive to adopt and take advantage of the transformative power offered by approaches in the ML domain, which clearly has the potential of accelerating research in the field of biomacromolecules.
Affiliation(s)
- Eleonora Gianti
- Institute for Computational Molecular Science (ICMS), Temple University, Philadelphia, Pennsylvania 19122, United States
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
| | - Simona Percec
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
| |
|
37
|
Ferguson AL, Brown KA. Data-Driven Design and Autonomous Experimentation in Soft and Biological Materials Engineering. Annu Rev Chem Biomol Eng 2022; 13:25-44. [PMID: 35236085 DOI: 10.1146/annurev-chembioeng-092120-020803] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/09/2022]
Abstract
This article reviews recent developments in the applications of machine learning, data-driven modeling, transfer learning, and autonomous experimentation for the discovery, design, and optimization of soft and biological materials. The design and engineering of molecules and molecular systems have long been a preoccupation of chemical and biomolecular engineers using a variety of computational and experimental techniques. Increasingly, researchers have looked to emerging and established tools in artificial intelligence and machine learning to integrate with established approaches in chemical science to realize powerful, efficient, and in some cases autonomous platforms for molecular discovery, materials engineering, and process optimization. This review summarizes the basic principles underpinning these techniques and highlights recent successful example applications in autonomous materials discovery, transfer learning, and multi-fidelity active learning. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 13 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, USA
| | - Keith A Brown
- Mechanical Engineering, Boston University, Boston, Massachusetts 02215, USA
| |
|
38
|
Nguyen D, Tao L, Li Y. Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design. Front Chem 2022; 9:820417. [PMID: 35141207 PMCID: PMC8819075 DOI: 10.3389/fchem.2021.820417] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/24/2021] [Accepted: 12/31/2021] [Indexed: 12/21/2022] Open
Abstract
In recent years, the synthesis of monomer sequence-defined polymers has expanded into broad-spectrum applications in biomedical, chemical, and materials science fields. Pursuing the characterization and inverse design of these polymer systems requires our fundamental understanding not only at the individual monomer level, but also considering the chain scales, such as polymer configuration, self-assembly, and phase separation. However, our accessibility to this field is still rudimentary due to the limitations of traditional design approaches, the complexity of chemical space along with the burdened cost and time issues that prevent us from unveiling the underlying monomer sequence-structure-property relationships. Fortunately, thanks to the recent advancements in molecular dynamics simulations and machine learning (ML) algorithms, the bottlenecks in the tasks of establishing the structure-function correlation of the polymer chains can be overcome. In this review, we will discuss the applications of the integration between ML techniques and coarse-grained molecular dynamics (CGMD) simulations to solve the current issues in polymer science at the chain level. In particular, we focus on the case studies in three important topics (polymeric configuration characterization, feed-forward property prediction, and inverse design) in which CGMD simulations are leveraged to generate training datasets to develop ML-based surrogate models for specific polymer systems and designs. By doing so, this computational hybridization allows us to well establish the monomer sequence-functional behavior relationship of the polymers as well as guide us toward the best polymer chain candidates for the inverse design in undiscovered chemical space with reasonable computational cost and time. Even though there are still limitations and challenges ahead in this field, we finally conclude that this CGMD/ML integration is very promising, not only in the attempt of bridging the monomeric and macroscopic characterizations of polymer materials, but also enabling further tailored designs for sequence-specific polymers with superior properties in many practical applications.
Affiliation(s)
- Danh Nguyen
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Lei Tao
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Ying Li
- Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Polymer Program, Institute of Materials Science, University of Connecticut, Mansfield, CT, United States
|
39
|
Iovanac NC, MacKnight R, Savoie BM. Actively Searching: Inverse Design of Novel Molecules with Simultaneously Optimized Properties. J Phys Chem A 2022; 126:333-340. [PMID: 34985908 DOI: 10.1021/acs.jpca.1c08191] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022]
Abstract
Combining quantum chemistry characterizations with generative machine learning models has the potential to accelerate molecular discovery. In this paradigm, quantum chemistry acts as a relatively cost-effective oracle for evaluating the properties of particular molecules, while generative models provide a means of sampling chemical space based on learned structure-function relationships. For practical applications, multiple potentially orthogonal properties must be optimized in tandem during a discovery workflow. This carries additional difficulties associated with the specificity of the targets and the ability for the model to reconcile all properties simultaneously. Here, we demonstrate an active learning approach to improve the performance of multi-target generative chemical models. We first demonstrate the effectiveness of a set of baseline models trained on single property prediction tasks in generating novel compounds (i.e., not present in the training data) with various property targets, including both interpolative and extrapolative generation scenarios. For property ranges where accurate targeting proves difficult, the novel compounds suggested by the model are characterized using quantum chemistry and the new molecules closest to expressing the desired properties are fed back into the generative model for additional training. This gradually improves the generative models' understanding of targeted areas of chemical space and shifts the distribution of the generated compounds toward the targeted values. We then demonstrate the effectiveness of this active learning approach in generating compounds with multiple chemical constraints, including vertical ionization potential, electron affinity, and dipole moment targets, and validate the results at the ωB97X-D3/def2-TZVP level. This method requires no modifications to extant generative approaches, but rather utilizes their inherent generative and predictive aspects for self-refinement, and can be applied to situations where any number of properties with varying degrees of correlation must be optimized simultaneously.
Affiliation(s)
- Nicolae C Iovanac
- Charles D. Davidson School of Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, Indiana 47906, United States
- Robert MacKnight
- Charles D. Davidson School of Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, Indiana 47906, United States
- Brett M Savoie
- Charles D. Davidson School of Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, Indiana 47906, United States
|
40
|
Kelkar AS, Dallin BC, Van Lehn RC. Identifying nonadditive contributions to the hydrophobicity of chemically heterogeneous surfaces via dual-loop active learning. J Chem Phys 2022; 156:024701. [PMID: 35032988 DOI: 10.1063/5.0072385] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/25/2023]
Abstract
Hydrophobic interactions drive numerous biological and synthetic processes. The materials used in these processes often possess chemically heterogeneous surfaces that are characterized by diverse chemical groups positioned in close proximity at the nanoscale; examples include functionalized nanomaterials and biomolecules, such as proteins and peptides. Nonadditive contributions to the hydrophobicity of such surfaces depend on the chemical identities and spatial patterns of polar and nonpolar groups in ways that remain poorly understood. Here, we develop a dual-loop active learning framework that combines a fast reduced-accuracy method (a convolutional neural network) with a slow higher-accuracy method (molecular dynamics simulations with enhanced sampling) to efficiently predict the hydration free energy, a thermodynamic descriptor of hydrophobicity, for nearly 200 000 chemically heterogeneous self-assembled monolayers (SAMs). Analysis of this dataset reveals that SAMs with distinct polar groups exhibit substantial variations in hydrophobicity as a function of their composition and patterning, but the clustering of nonpolar groups is a common signature of highly hydrophobic patterns. Further molecular dynamics analysis relates such clustering to the perturbation of interfacial water structure. These results provide new insight into the influence of chemical heterogeneity on hydrophobicity via quantitative analysis of a large set of surfaces, enabled by the active learning approach.
Affiliation(s)
- Atharva S Kelkar
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, Wisconsin 53706, USA
- Bradley C Dallin
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, Wisconsin 53706, USA
- Reid C Van Lehn
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, Wisconsin 53706, USA
|
41
|
Sivaraman G, Jackson NE. Coarse-Grained Density Functional Theory Predictions via Deep Kernel Learning. J Chem Theory Comput 2022; 18:1129-1141. [PMID: 35020388 DOI: 10.1021/acs.jctc.1c01001] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Scalable electronic predictions are critical for soft materials design. Recently, the Electronic Coarse-Graining (ECG) method was introduced to renormalize all-atom quantum chemical (QC) predictions to coarse-grained (CG) resolutions using deep neural networks (DNNs). While DNNs can learn complex representations that prove challenging for kernel-based methods, they are susceptible to overfitting and the overconfidence of uncertainty estimations. Here, we develop ECG within a GPU-accelerated Deep Kernel Learning (DKL) framework to enable CG QC predictions using range-separated hybrid density functional theory (DFT), obtaining a 10^7 speedup relative to naive all-atom QC. By treating the predicted electronic properties as random Gaussian Processes, DKL incorporates CG mapping degeneracy by learning the distribution of electronic energies as a function of CG configuration. DKL-ECG accurately reproduces molecular orbital energies from range-separated DFT while facilitating efficient training via active learning using the uncertainties provided by DKL. We show that while active learning algorithms enable efficient sampling of a more diverse configurational space relative to random sampling, all explored query methods exhibit comparable performance for the examined system. We attribute this result to the significant overlap of the feature space and output property distributions across multiple temperatures.
Affiliation(s)
- Ganesh Sivaraman
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Nicholas E Jackson
- Department of Chemistry, University of Illinois at Urbana-Champaign, 505 South Mathews Avenue, Urbana, Illinois 61801, United States
|
42
|
Dhamankar S, Webb MA. Chemically specific coarse-graining of polymers: Methods and prospects. J Polym Sci 2021. [DOI: 10.1002/pol.20210555] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/14/2022]
Affiliation(s)
- Satyen Dhamankar
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey, USA
- Michael A. Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey, USA
|
43
|
Sheehan F, Sementa D, Jain A, Kumar M, Tayarani-Najjaran M, Kroiss D, Ulijn RV. Peptide-Based Supramolecular Systems Chemistry. Chem Rev 2021; 121:13869-13914. [PMID: 34519481 DOI: 10.1021/acs.chemrev.1c00089] [Citation(s) in RCA: 156] [Impact Index Per Article: 39.0] [Indexed: 12/11/2022]
Abstract
Peptide-based supramolecular systems chemistry seeks to mimic the ability of life forms to use conserved sets of building blocks and chemical reactions to achieve a bewildering array of functions. Design principles for short peptide-based nanomaterials with properties such as self-assembly, recognition, catalysis, and actuation are increasingly available. Building on these principles, peptide-based supramolecular systems chemistry is starting to address the far greater challenge of systems-level design to access complex functions that emerge when multiple reactions and interactions are coordinated and integrated. We discuss key features relevant to systems-level design, including regulating supramolecular order and disorder, development of active and adaptive systems by considering kinetic and thermodynamic design aspects, and combinatorial dynamic covalent and noncovalent interactions. We also discuss how structural and dynamic design concepts, including preorganization and induced fit, are critical to the ability to develop adaptive materials with tunable photonic, electronic, and catalytic properties. Finally, we highlight examples where multiple features are combined, resulting in chemical systems and materials that display adaptive properties that cannot be achieved without this level of integration.
Affiliation(s)
- Fahmeed Sheehan
- Advanced Science Research Center (ASRC) at the Graduate Center, City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States; Department of Chemistry, Hunter College, City University of New York, 695 Park Avenue, New York, New York 10065, United States; Ph.D. Program in Chemistry, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- Deborah Sementa
- Advanced Science Research Center (ASRC) at the Graduate Center, City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States
- Ankit Jain
- Advanced Science Research Center (ASRC) at the Graduate Center, City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States
- Mohit Kumar
- Advanced Science Research Center (ASRC) at the Graduate Center, City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States; Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 10-12, Barcelona 08028, Spain
- Mona Tayarani-Najjaran
- Advanced Science Research Center (ASRC) at the Graduate Center, City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States; Department of Chemistry, Hunter College, City University of New York, 695 Park Avenue, New York, New York 10065, United States; Ph.D. Program in Chemistry, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- Daniela Kroiss
- Advanced Science Research Center (ASRC) at the Graduate Center, City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States; Department of Chemistry, Hunter College, City University of New York, 695 Park Avenue, New York, New York 10065, United States; Ph.D. Program in Biochemistry, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- Rein V Ulijn
- Advanced Science Research Center (ASRC) at the Graduate Center, City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States; Department of Chemistry, Hunter College, City University of New York, 695 Park Avenue, New York, New York 10065, United States; Ph.D. Program in Chemistry, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States; Ph.D. Program in Biochemistry, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
|
44
|
Statt A, Kleeblatt DC, Reinhart WF. Unsupervised learning of sequence-specific aggregation behavior for a model copolymer. Soft Matter 2021; 17:7697-7707. [PMID: 34350929 DOI: 10.1039/d1sm01012c] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 06/13/2023]
Abstract
We apply a recently developed unsupervised machine learning scheme for local environments [Reinhart, Comput. Mater. Sci., 2021, 196, 110511] to characterize large-scale, disordered aggregates formed by sequence-defined macromolecules. This method provides new insight into the structure of these disordered, dilute aggregates, which has proven difficult to understand using collective variables manually derived from expert knowledge [Statt et al., J. Chem. Phys., 2020, 152, 075101]. In contrast to such conventional order parameters, we are able to classify the global aggregate structure directly using descriptions of the local environments. The resulting characterization provides a deeper understanding of the range of possible self-assembled structures and their relationships to each other. We also provide detailed analysis of the effects of finite system size, stochasticity, and kinetics of these aggregates based on the learned collective variables. Interestingly, we find that the spatiotemporal evolution of systems in the learned latent space is smooth and continuous, despite being derived from only a single snapshot from each of about 1000 monomer sequences. These results demonstrate the insight which can be gained by applying unsupervised machine learning to soft matter systems, especially when suitable order parameters are not known.
Affiliation(s)
- Antonia Statt
- Materials Science and Engineering, Grainger College of Engineering, University of Illinois, Urbana-Champaign, IL 61801, USA
|
45
|
Gormley AJ, Webb MA. Machine learning in combinatorial polymer chemistry. Nat Rev Mater 2021; 6:642-644. [PMID: 34394961 PMCID: PMC8356908 DOI: 10.1038/s41578-021-00282-3] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Indexed: 05/05/2023]
Abstract
The design of new functional polymers depends on the successful navigation of their structure-function landscapes. Advances in combinatorial polymer chemistry and machine learning provide exciting opportunities for the engineering of fit-for-purpose polymeric materials.
Affiliation(s)
- Adam J. Gormley
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Michael A. Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
|
46
|
Kuenneth C, Schertzer W, Ramprasad R. Copolymer Informatics with Multitask Deep Neural Networks. Macromolecules 2021. [DOI: 10.1021/acs.macromol.1c00728] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/19/2022]
Affiliation(s)
- Christopher Kuenneth
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- William Schertzer
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Rampi Ramprasad
- School of Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
|
47
|
Alessandri R, Grünewald F, Marrink SJ. The Martini Model in Materials Science. Adv Mater 2021; 33:e2008635. [PMID: 33956373 PMCID: PMC11468591 DOI: 10.1002/adma.202008635] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 12/22/2020] [Revised: 03/15/2021] [Indexed: 06/12/2023]
Abstract
The Martini model, a coarse-grained force field initially developed with biomolecular simulations in mind, has found an increasing number of applications in the field of soft materials science. The model's underlying building block principle does not pose restrictions on its application beyond biomolecular systems. Here, the main applications to date of the Martini model in materials science are highlighted, and a perspective for the future developments in this field is given, particularly in light of recent developments such as the new version of the model, Martini 3.
Affiliation(s)
- Riccardo Alessandri
- Zernike Institute for Advanced Materials and Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, Groningen 9747 AG, The Netherlands
- Present address: Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Fabian Grünewald
- Zernike Institute for Advanced Materials and Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, Groningen 9747 AG, The Netherlands
- Siewert J. Marrink
- Zernike Institute for Advanced Materials and Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, Groningen 9747 AG, The Netherlands
|
48
|
Ferguson AL, Hachmann J, Miller TF, Pfaendtner J. The Journal of Physical Chemistry A/ B/ C Virtual Special Issue on Machine Learning in Physical Chemistry. J Phys Chem A 2021; 124:9113-9118. [PMID: 33147969 DOI: 10.1021/acs.jpca.0c09205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
49
|
Zhao S, Cai T, Zhang L, Li W, Lin J. Autonomous Construction of Phase Diagrams of Block Copolymers by Theory-Assisted Active Machine Learning. ACS Macro Lett 2021; 10:598-602. [PMID: 35570770 DOI: 10.1021/acsmacrolett.1c00133] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Abstract
Equilibrium phase diagrams serve as blueprints for rational design of nanostructured materials of block copolymers, but their construction is time-consuming and requires profound expertise. Herein, by virtue of the knowledge of self-consistent field theory (SCFT), the active-learning method is developed to autonomously construct the phase diagrams of block copolymers. Without human intervention, the SCFT-assisted active-learning method can rapidly search the undetected phases and efficiently reproduce the complicated phase diagrams of diblock copolymers and multiblock terpolymers via decreasing the number of sampling points to about 20%. It is clearly demonstrated that the combined uncertainty sampling/random selection scheme in the active-learning method shows the outperformance in spite of a small amount of initial data set. This work highlights the promising integration of theoretical modeling with machine learning and represents a crucial step toward rational design of nanostructured materials.
Affiliation(s)
- Shuochen Zhao
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Tianyun Cai
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
- State Key Laboratory of Molecular Engineering of Polymers, Key Laboratory of Computational Physical Sciences, Department of Macromolecular Science, Fudan University, Shanghai 200438, China
- Jiaping Lin
- Shanghai Key Laboratory of Advanced Polymeric Materials, Key Laboratory for Ultrafine Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
|
50
|
van Teijlingen A, Tuttle T. Beyond Tripeptides Two-Step Active Machine Learning for Very Large Data sets. J Chem Theory Comput 2021; 17:3221-3232. [PMID: 33904712 PMCID: PMC8278388 DOI: 10.1021/acs.jctc.1c00159] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/23/2022]
Abstract
Self-assembling peptide nanostructures have been shown to be of great importance in nature and have presented many promising applications, for example, in medicine as drug-delivery vehicles, biosensors, and antivirals. Because such peptides are very promising candidates for the growing field of bottom-up manufacture of functional nanomaterials, previous work (Frederix et al., 2011 and 2015) has screened all possible amino acid combinations for di- and tripeptides in search of such materials. However, the enormous complexity and variety of linear combinations of the 20 amino acids make exhaustive simulation of all combinations of tetrapeptides and above infeasible. Therefore, we have developed an active machine-learning method (also known as "iterative learning" and "evolutionary search method") which leverages a lower-resolution data set encompassing the whole search space and a just-in-time high-resolution data set which further analyzes those target peptides selected by the lower-resolution model. This model uses newly generated data upon each iteration to improve both lower- and higher-resolution models in the search for ideal candidates. Curation of the lower-resolution data set is explored as a method to control the selected candidates, based on criteria such as log P. A major aim of this method is to produce the best results in the least computationally demanding way. This model has been developed to be broadly applicable to other search spaces with minor changes to the algorithm, allowing its use in other areas of research.
Affiliation(s)
- Tell Tuttle
- Department of Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow G1 1XL, U.K
|