1
Changiarath A, Arya A, Xenidis VA, Padeken J, Stelzl LS. Sequence determinants of protein phase separation and recognition by protein phase-separated condensates through molecular dynamics and active learning. Faraday Discuss 2024. [PMID: 39319382 DOI: 10.1039/d4fd00099d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
Elucidating how protein sequence determines the properties of disordered proteins and their phase-separated condensates is a great challenge in computational chemistry, biology, and biophysics. Quantitative molecular dynamics simulations and derived free energy values can in principle capture how a sequence encodes the chemical and biological properties of a protein. These calculations are, however, computationally demanding, even after reducing the representation by coarse-graining; exploring the large spaces of potentially relevant sequences remains a formidable task. We employ an "active learning" scheme introduced by Yang et al. (bioRxiv, 2022, https://doi.org/10.1101/2022.08.05.502972) to reduce the number of labelled examples needed from simulations, where a neural network-based model suggests the most useful examples for the next training cycle. Applying this Bayesian optimisation framework, we determine properties of protein sequences with coarse-grained molecular dynamics, which enables the network to establish sequence-property relationships for disordered proteins and their self-interactions and their interactions in phase-separated condensates. We show how iterative training with second virial coefficients derived from the simulations of disordered protein sequences leads to a rapid improvement in predicting peptide self-interactions. We employ this Bayesian approach to efficiently search for new sequences that bind to condensates of the disordered C-terminal domain (CTD) of RNA Polymerase II, by simulating molecular recognition of peptides to phase-separated condensates in coarse-grained molecular dynamics. By searching for protein sequences which prefer to self-interact rather than interact with another protein sequence we are able to shape the morphology of protein condensates and design multiphasic protein condensates.
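For orientation, the second virial coefficient used above as a training label is conventionally defined from the potential of mean force between two chains. The abstract does not state which estimator the authors use, so the expression below is the standard textbook definition, with $w(r)$ (the potential of mean force at centre-of-mass separation $r$) introduced here as an assumed symbol:

```latex
B_{22} = -2\pi \int_{0}^{\infty} \left[ \exp\!\left(-\frac{w(r)}{k_{\mathrm{B}} T}\right) - 1 \right] r^{2} \, \mathrm{d}r
```

Under this definition a negative $B_{22}$ indicates net attraction between the two chains and a positive value indicates net repulsion, which is why it can serve as a scalar measure of peptide self-interaction.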
Affiliation(s)
- Arya Changiarath
  - Institute of Physics, Johannes Gutenberg University (JGU) Mainz, Germany
- Aayush Arya
  - Institute of Physics, Johannes Gutenberg University (JGU) Mainz, Germany
- Jan Padeken
  - Institute of Molecular Biology (IMB) Mainz, Germany
- Lukas S Stelzl
  - Institute of Molecular Biology (IMB) Mainz, Germany
  - Institute of Molecular Physiology, Johannes Gutenberg University (JGU) Mainz, Germany
  - KOMET1, Institute of Physics, Johannes Gutenberg University (JGU) Mainz, Germany
2
Loeffler HH, Wan S, Klähn M, Bhati AP, Coveney PV. Optimal Molecular Design: Generative Active Learning Combining REINVENT with Precise Binding Free Energy Ranking Simulations. J Chem Theory Comput 2024; 20. [PMID: 39225482 PMCID: PMC11428133 DOI: 10.1021/acs.jctc.4c00576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/08/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
Active learning (AL) is a specific instance of sequential experimental design and uses machine learning to intelligently choose the next data point or batch of molecular structures to be evaluated. In this sense, it closely mimics the iterative design-make-test-analysis cycle of laboratory experiments to find optimized compounds for a given design task. Here, we describe an AL protocol which combines generative molecular AI, using REINVENT, and physics-based absolute binding free energy molecular dynamics simulation, using ESMACS, to discover new ligands for two different target proteins, 3CLpro and TNKS2. We have deployed our generative active learning (GAL) protocol on Frontier, the world's only exa-scale machine. We show that the protocol can find higher-scoring molecules compared to the baseline, a surrogate ML docking model for 3CLpro and compounds with experimentally determined binding affinities for TNKS2. The ligands found are also chemically diverse and occupy a different chemical space than the baseline. We vary the batch sizes that are put forward for free energy assessment in each GAL cycle to assess their impact on the efficiency of the GAL protocol and recommend optimal values for different scenarios. Overall, we demonstrate a powerful capability of the combination of physics-based and AI methods which yields effective chemical space sampling at an unprecedented scale and is of immediate and direct relevance to modern, data-driven drug discovery.
Affiliation(s)
- Hannes H. Loeffler
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Marco Klähn
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  - Advanced Research Computing Centre, University College London, London WC1H 0AJ, U.K.
  - Institute for Informatics, Faculty of Science, University of Amsterdam, Amsterdam 1098XH, The Netherlands
3
Abranches DO, Maginn EJ, Colón YJ. Stochastic machine learning via sigma profiles to build a digital chemical space. Proc Natl Acad Sci U S A 2024; 121:e2404676121. [PMID: 39042681 PMCID: PMC11295021 DOI: 10.1073/pnas.2404676121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 06/16/2024] [Indexed: 07/25/2024]
Abstract
This work establishes a different paradigm on digital molecular spaces and their efficient navigation by exploiting sigma profiles. To do so, the remarkable capability of Gaussian processes (GPs), a type of stochastic machine learning model, to correlate and predict physicochemical properties from sigma profiles is demonstrated, outperforming state-of-the-art neural networks previously published. The amount of chemical information encoded in sigma profiles eases the learning burden of machine learning models, permitting the training of GPs on small datasets which, due to their negligible computational cost and ease of implementation, are ideal models to be combined with optimization tools such as gradient search or Bayesian optimization (BO). Gradient search is used to efficiently navigate the sigma profile digital space, quickly converging to local extrema of target physicochemical properties. While this requires the availability of pretrained GP models on existing datasets, such limitations are eliminated with the implementation of BO, which can find global extrema with a limited number of iterations. A remarkable example of this is that of BO toward boiling temperature optimization. Holding no knowledge of chemistry except for the sigma profile and boiling temperature of carbon monoxide (the worst possible initial guess), BO finds the global maximum of the available boiling temperature dataset (over 1,000 molecules encompassing more than 40 families of organic and inorganic compounds) in just 15 iterations (i.e., 15 property measurements), cementing sigma profiles as a powerful digital chemical space for molecular optimization and discovery, particularly when little to no experimental data is initially available.
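The Bayesian optimization loop described in this abstract (start from a single measured molecule, then choose one new measurement per iteration from a fixed dataset) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-dimensional toy descriptors stand in for sigma profiles, the toy property stands in for boiling temperature, and the RBF kernel, the upper-confidence-bound acquisition, and the names `gp_posterior` and `bayesian_optimise` are all hypothetical choices made here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_query, length_scale=0.2, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    # Solve once for both K^-1 y (column 0) and K^-1 k_qt^T (remaining columns).
    solved = np.linalg.solve(k_tt, np.column_stack([y_train, k_qt.T]))
    mean = k_qt @ solved[:, 0]
    var = 1.0 - np.einsum("ij,ji->i", k_qt, solved[:, 1:])
    return mean, np.sqrt(np.clip(var, 0.0, None))


def bayesian_optimise(features, measure, start_index, n_iter=15, kappa=2.0):
    """Maximise a property over a finite pool, one measurement per iteration."""
    sampled = [start_index]
    values = [measure(start_index)]
    for _ in range(min(n_iter, len(features) - 1)):
        y = np.array(values)
        scale = y.std() or 1.0  # avoid dividing by zero with a single sample
        mean, std = gp_posterior(features[sampled], (y - y.mean()) / scale, features)
        score = mean + kappa * std  # upper confidence bound (assumed acquisition)
        score[sampled] = -np.inf    # never re-measure a candidate
        pick = int(np.argmax(score))
        sampled.append(pick)
        values.append(measure(pick))
    best = int(np.argmax(values))
    return sampled[best], values[best], sampled


# Toy pool (assumption): 50 one-dimensional descriptors with a made-up property.
pool = np.linspace(0.0, 1.0, 50)[:, None]
toy_property = -(pool[:, 0] - 0.7) ** 2
best_index, best_value, history = bayesian_optimise(
    pool, lambda i: float(toy_property[i]), start_index=0
)
```

Starting from index 0 mirrors the idea of a deliberately poor initial guess. Each call to `measure` corresponds to one property measurement, so the budget of 15 iterations matches the count quoted in the abstract, although nothing here reproduces the reported result.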
Affiliation(s)
- Dinis O. Abranches
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN 46556
- Edward J. Maginn
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN 46556
- Yamil J. Colón
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN 46556
4
Kopeć M, Borek-Dorosz A, Jarczewska K, Barańska M, Abramczyk H. The role of cardiolipin and cytochrome c in mitochondrial metabolism of cancer cells determined by Raman imaging: in vitro study on the brain glioblastoma U-87 MG cell line. Analyst 2024; 149:2697-2708. [PMID: 38506099 DOI: 10.1039/d4an00015c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
In this paper, we present Raman imaging as a non-invasive approach for studying changes in mitochondrial metabolism caused by cardiolipin-cytochrome c interactions. We investigated the effect of mitochondrial dysregulation on cardiolipin (CL) and cytochrome c (Cyt c) interactions for a brain cancer cell line (U-87 MG). Mitochondrial metabolism was monitored by checking the intensities of the Raman bands at 750 cm⁻¹, 1126 cm⁻¹, 1310 cm⁻¹, 1337 cm⁻¹, 1444 cm⁻¹ and 1584 cm⁻¹. The presented results indicate that under pathological conditions, the content and redox status of Cyt c in mitochondria can be used as a Raman marker to characterize changes in cellular metabolism. This work provides evidence that cardiolipin-cytochrome c interactions are crucial for mitochondrial energy homeostasis by controlling the redox status of Cyt c in the electron transport chain, switching from disabling Cyt c reduction and enabling peroxidase activity. This paper provides experimental support for the hypothesis of how cardiolipin-cytochrome c interactions regulate electron transfer in the respiratory chain, apoptosis and mROS production in mitochondria.
Affiliation(s)
- Monika Kopeć
  - Lodz University of Technology, Institute of Applied Radiation Chemistry, Laboratory of Laser Molecular Spectroscopy, Wroblewskiego 15, 93-590 Lodz, Poland
  - Jagiellonian University, Faculty of Chemistry, Gronostajowa 2, 30-387 Krakow, Poland
- Karolina Jarczewska
  - Lodz University of Technology, Institute of Applied Radiation Chemistry, Laboratory of Laser Molecular Spectroscopy, Wroblewskiego 15, 93-590 Lodz, Poland
- Małgorzata Barańska
  - Jagiellonian University, Faculty of Chemistry, Gronostajowa 2, 30-387 Krakow, Poland
- Halina Abramczyk
  - Lodz University of Technology, Institute of Applied Radiation Chemistry, Laboratory of Laser Molecular Spectroscopy, Wroblewskiego 15, 93-590 Lodz, Poland
5
Wang Z, Chen A, Tao K, Han Y, Li J. MatGPT: A Vane of Materials Informatics from Past, Present, to Future. Advanced Materials (Deerfield Beach, Fla.) 2024; 36:e2306733. [PMID: 37813548 DOI: 10.1002/adma.202306733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 09/05/2023] [Indexed: 10/17/2023]
Abstract
Combining materials science, artificial intelligence (AI), physical chemistry, and other disciplines, materials informatics is continuously accelerating the vigorous development of new materials. The emergence of "GPT (Generative Pre-trained Transformer) AI" shows that the scientific research field has entered the era of intelligent civilization with "data" as the basic factor and "algorithm + computing power" as the core productivity. The continuous innovation of AI will impact the cognitive laws and scientific methods, and reconstruct the knowledge and wisdom system. This prompts further reflection on materials informatics. Here, a comprehensive discussion of AI models and materials infrastructures is provided, and the advances in the discovery and design of new materials are reviewed. With the rise of new research paradigms triggered by "AI for Science", the vane of materials informatics, "MatGPT", is proposed, and the technical path planning is carried out from the aspects of data, descriptors, generative models, pretraining models, directed design models, collaborative training, experimental robots, as well as the efforts and preparations needed to develop a new generation of materials informatics. Finally, the challenges and constraints faced by materials informatics are discussed, in order to achieve a more digital, intelligent, and automated construction of materials informatics with the joint efforts of more interdisciplinary scientists.
Affiliation(s)
- Zhilong Wang
  - National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
- An Chen
  - National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
- Kehao Tao
  - National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanqiang Han
  - National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jinjin Li
  - National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
6
de Raffele D, Ilie IM. Unlocking novel therapies: cyclic peptide design for amyloidogenic targets through synergies of experiments, simulations, and machine learning. Chem Commun (Camb) 2024; 60:632-645. [PMID: 38131333 DOI: 10.1039/d3cc04630c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Existing therapies for neurodegenerative diseases like Parkinson's and Alzheimer's address only their symptoms and do not prevent disease onset. Common therapeutic agents, such as small molecules and antibodies, struggle with insufficient selectivity, stability and bioavailability, leading to poor performance in clinical trials. Peptide-based therapeutics are emerging as promising candidates, with successful applications for cardiovascular diseases and cancers due to their high bioavailability, good efficacy and specificity. In particular, cyclic peptides have long in vivo stability, while maintaining a robust antibody-like binding affinity. However, the de novo design of cyclic peptides is challenging due to the lack of long-lived druggable pockets of the target polypeptide, absence of exhaustive conformational distributions of the target and/or the binder, unknown binding site, methodological limitations, associated constraints (failed trials, time, money) and the vast combinatorial sequence space. Hence, efficient alignment and cooperation between disciplines, and synergies between experiments and simulations complemented by popular techniques like machine learning, can significantly speed up the therapeutic cyclic-peptide development for neurodegenerative diseases. We review the latest advancements in cyclic peptide design against amyloidogenic targets from a computational perspective in light of recent advancements and potential of machine learning to optimize the design process. We discuss the difficulties encountered when designing novel peptide-based inhibitors and we propose new strategies incorporating experiments, simulations and machine learning to design cyclic peptides to inhibit the toxic propagation of amyloidogenic polypeptides. Importantly, these strategies extend beyond the mere design of cyclic peptides and serve as a template for the de novo generation of (bio)materials with programmable properties.
Affiliation(s)
- Daria de Raffele
  - University of Amsterdam, van 't Hoff Institute for Molecular Sciences, Science Park 904, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands
  - Amsterdam Center for Multiscale Modeling (ACMM), University of Amsterdam, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands
- Ioana M Ilie
  - University of Amsterdam, van 't Hoff Institute for Molecular Sciences, Science Park 904, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands
  - Amsterdam Center for Multiscale Modeling (ACMM), University of Amsterdam, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands
7
Tang Y, Kim JY, Ip CKM, Bahmani A, Chen Q, Rosenberger MG, Esser-Kahn AP, Ferguson AL. Data-driven discovery of innate immunomodulators via machine learning-guided high throughput screening. Chem Sci 2023; 14:12747-12766. [PMID: 38020385 PMCID: PMC10646978 DOI: 10.1039/d3sc03613h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 10/18/2023] [Indexed: 12/01/2023]
Abstract
The innate immune response is vital for the success of prophylactic vaccines and immunotherapies. Control of signaling in innate immune pathways can improve prophylactic vaccines by inhibiting unfavorable systemic inflammation and immunotherapies by enhancing immune stimulation. In this work, we developed a machine learning-enabled active learning pipeline to guide in vitro experimental screening and discovery of small molecule immunomodulators that improve immune responses by altering the signaling activity of innate immune responses stimulated by traditional pattern recognition receptor agonists. Molecules were tested by in vitro high throughput screening (HTS) where we measured modulation of the nuclear factor κ-light-chain-enhancer of activated B-cells (NF-κB) and the interferon regulatory factors (IRF) pathways. These data were used to train data-driven predictive models linking molecular structure to modulation of the NF-κB and IRF responses using deep representational learning, Gaussian process regression, and Bayesian optimization. By interleaving successive rounds of model training and in vitro HTS, we performed an active learning-guided traversal of a 139 998 molecule library. After sampling only ∼2% of the library, we discovered viable molecules with unprecedented immunomodulatory capacity, including those capable of suppressing NF-κB activity by up to 15-fold, elevating NF-κB activity by up to 5-fold, and elevating IRF activity by up to 6-fold. We extracted chemical design rules identifying particular chemical fragments as principal drivers of specific immunomodulation behaviors. We validated the immunomodulatory effect of a subset of our top candidates by measuring cytokine release profiles. Of these, one molecule induced a 3-fold enhancement in IFN-β production when delivered with a cyclic di-nucleotide stimulator of interferon genes (STING) agonist. 
In sum, our machine learning-enabled screening approach presents an efficient immunomodulator discovery pipeline that has furnished a library of novel small molecules with a strong capacity to enhance or suppress innate immune signaling pathways to shape and improve prophylactic vaccination and immunotherapies.
Affiliation(s)
- Yifeng Tang
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Jeremiah Y Kim
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Carman K M Ip
  - Cellular Screening Center, University of Chicago, Chicago, IL 60637, USA
- Azadeh Bahmani
  - Cellular Screening Center, University of Chicago, Chicago, IL 60637, USA
- Qing Chen
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Matthew G Rosenberger
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Aaron P Esser-Kahn
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
8
Jones MS, Shmilovich K, Ferguson AL. DiAMoNDBack: Diffusion-Denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces. J Chem Theory Comput 2023; 19:7908-7923. [PMID: 37906711 DOI: 10.1021/acs.jctc.3c00840] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/02/2023]
Abstract
Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side-chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side-chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side-chain clashes, and the diversity of the generated side-chain configurational states. We make the DiAMoNDBack model publicly available as a free and open-source Python package.
Affiliation(s)
- Michael S Jones
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Kirill Shmilovich
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
9
Moore JH, Margreitter C, Janet JP, Engkvist O, de Groot BL, Gapsys V. Automated relative binding free energy calculations from SMILES to ΔΔG. Commun Chem 2023; 6:82. [PMID: 37106032 PMCID: PMC10140266 DOI: 10.1038/s42004-023-00859-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/30/2022] [Accepted: 03/22/2023] [Indexed: 04/29/2023]
Abstract
In drug discovery, computational methods are a key part of making informed design decisions and prioritising experiments. In particular, optimizing compound affinity is a central concern during the early stages of development. In the last 10 years, alchemical free energy (FE) calculations have transformed our ability to incorporate accurate in silico potency predictions in design decisions, and represent the 'gold standard' for augmenting experiment-driven drug discovery. However, relative FE calculations are complex to set up, require significant expert intervention to prepare the calculation and analyse the results or are provided only as closed-source software, not allowing for fine-grained control over the underlying settings. In this work, we introduce an end-to-end relative FE workflow based on the non-equilibrium switching approach that facilitates calculation of binding free energies starting from SMILES strings. The workflow is implemented using fully modular steps, allowing various components to be exchanged depending on licence availability. We further investigate the dependence of the calculated free energy accuracy on the initial ligand pose generated by various docking algorithms. We show that both commercial and open-source docking engines can be used to generate poses that lead to good correlation of free energies with experimental reference data.
Affiliation(s)
- J Harry Moore
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Bert L de Groot
  - Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077, Göttingen, Germany
- Vytautas Gapsys
  - Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077, Göttingen, Germany
10
Ramesh PS, Patra TK. Polymer sequence design via molecular simulation-based active learning. Soft Matter 2023; 19:282-294. [PMID: 36519427 DOI: 10.1039/d2sm01193j] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/17/2023]
Abstract
Molecular-scale interactions and chemical structures offer an enormous opportunity to tune material properties. However, designing materials from their molecular scale is a grand challenge owing to the practical limitations in exploring astronomically large design spaces using traditional experimental or computational methods. Advancements in data science and machine learning have produced a host of tools and techniques that can address this problem and facilitate the efficient exploration of large search spaces. In this work, a blended approach integrating physics-based methods, machine learning techniques and uncertainty quantification is implemented to effectively screen a macromolecular sequence space and design target structures. Here, we survey and assess the efficacy of data-driven methods within the framework of active learning for a challenging design problem, viz., sequence optimization of a copolymer. We report the impact of surrogate models, kernels, and initial conditions on the convergence of the active learning method for the sequence design problem. This work establishes optimal strategies and hyperparameters for efficient inverse design of polymer sequences via active learning.
Affiliation(s)
- Praneeth S Ramesh
  - Department of Chemical Engineering, Center for Atomistic Modeling and Materials Design and Center for Carbon Capture Utilization and Storage, Indian Institute of Technology Madras, Chennai, TN 600036, India
- Tarak K Patra
  - Department of Chemical Engineering, Center for Atomistic Modeling and Materials Design and Center for Carbon Capture Utilization and Storage, Indian Institute of Technology Madras, Chennai, TN 600036, India
11
Jiao S, Katz LE, Shell MS. Inverse Design of Pore Wall Chemistry To Control Solute Transport and Selectivity. ACS Central Science 2022; 8:1609-1617. [PMID: 36589891 PMCID: PMC9801506 DOI: 10.1021/acscentsci.2c01011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/26/2022] [Indexed: 05/08/2023]
Abstract
Next-generation membranes for purification and reuse of highly contaminated water require materials with precisely tuned functionality to address key challenges, including the removal of small, charge-neutral solutes. Bioinspired multifunctional membrane surfaces enhance transport properties, but the combinatorically large chemical space is difficult to navigate through trial and error. Here, we demonstrate a computational inverse design approach to efficiently identify promising materials and elucidate design rules. We develop a combined evolutionary optimization, machine learning, and molecular simulation workflow to spatially design chemical functional group patterning in a model nanopore that enhances transport of water relative to solutes. The genetic optimization discovers nonintuitive functionalization strategies that hinder the transport of solutes through the pore, simply by patterning hydrophobic methyl and hydrophilic hydroxyl functional groups. Examining these patterns, we demonstrate that they exploit an unexpected diffusive solute hopping mechanism. This inverse design procedure and the identification of novel molecular mechanisms for pore chemical heterogeneity to impact solute selectivity demonstrate new routes to the design of membrane materials with novel functionalities. More broadly, this work illustrates how chemical design is a powerful strategy to modulate water-mediated surface-solute interactions in complex, soft material systems that are relevant to diverse technologies.
Affiliation(s)
- Sally Jiao
  - Department of Chemical Engineering, University of California, Santa Barbara, California 93106, United States
- Lynn E. Katz
  - Department of Civil, Architectural and Environmental Engineering, University of Texas at Austin, Austin, Texas 78712, United States
- M. Scott Shell
  - Department of Chemical Engineering, University of California, Santa Barbara, California 93106, United States
12
Shmilovich K, Stieffenhofer M, Charron NE, Hoffmann M. Temporally Coherent Backmapping of Molecular Trajectories From Coarse-Grained to Atomistic Resolution. J Phys Chem A 2022; 126:9124-9139. [PMID: 36417670 PMCID: PMC9743211 DOI: 10.1021/acs.jpca.2c07716] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/24/2022]
Abstract
Coarse-graining offers a means to extend the achievable time and length scales of molecular dynamics simulations beyond what is practically possible in the atomistic regime. Sampling molecular configurations of interest can be done efficiently using coarse-grained simulations, from which meaningful physicochemical information can be inferred if the corresponding all-atom configurations are reconstructed. However, this procedure of backmapping to reintroduce the lost atomistic detail into coarse-grain structures has proven a challenging task due to the many feasible atomistic configurations that can be associated with one coarse-grain structure. Existing backmapping methods are strictly frame-based, relying on either heuristics to replace coarse-grain particles with atomic fragments and subsequent relaxation or parametrized models to propose atomic coordinates separately and independently for each coarse-grain structure. These approaches neglect information from previous trajectory frames that is critical to ensuring temporal coherence of the backmapped trajectory, while also offering information potentially helpful to producing higher-fidelity atomic reconstructions. In this work, we present a deep learning-enabled data-driven approach for temporally coherent backmapping that explicitly incorporates information from preceding trajectory structures. Our method trains a conditional variational autoencoder to nondeterministically reconstruct atomistic detail conditioned on both the target coarse-grain configuration and the previously reconstructed atomistic configuration. We demonstrate our backmapping approach on two exemplar biomolecular systems: alanine dipeptide and the miniprotein chignolin. We show that our backmapped trajectories accurately recover the structural, thermodynamic, and kinetic properties of the atomistic trajectory data.
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Nicholas E. Charron
- Weiss School of Natural Sciences, Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States; Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
- Moritz Hoffmann
- Fachbereich Mathematik und Informatik, Freie Universität Berlin, Berlin 14195, Germany
|
13
|
Khalak Y, Tresadern G, Hahn DF, de Groot BL, Gapsys V. Chemical Space Exploration with Active Learning and Alchemical Free Energies. J Chem Theory Comput 2022; 18:6259-6270. [PMID: 36148968 PMCID: PMC9558370 DOI: 10.1021/acs.jctc.2c00752] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 07/21/2022] [Indexed: 11/30/2022]
Abstract
Drug discovery can be thought of as a search for a needle in a haystack: searching through a large chemical space for the most active compounds. Computational techniques can narrow the search space for experimental follow up, but even they become unaffordable when evaluating large numbers of molecules. Therefore, machine learning (ML) strategies are being developed as computationally cheaper complementary techniques for navigating and triaging large chemical libraries. Here, we explore how an active learning protocol can be combined with first-principles based alchemical free energy calculations to identify high affinity phosphodiesterase 2 (PDE2) inhibitors. We first calibrate the procedure using a set of experimentally characterized PDE2 binders. The optimized protocol is then used prospectively on a large chemical library to navigate toward potent inhibitors. In the active learning cycle, at every iteration a small fraction of compounds is probed by alchemical calculations and the obtained affinities are used to train ML models. With successive rounds, high affinity binders are identified by explicitly evaluating only a small subset of compounds in a large chemical library, thus providing an efficient protocol that robustly identifies a large fraction of true positives.
Affiliation(s)
- Yuriy Khalak
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
- Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium
- David F. Hahn
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium
- Bert L. de Groot
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
- Vytautas Gapsys
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
|
14
|
Kleinwächter I, Mohr B, Joppe A, Hellmann N, Bereau T, Osiewacz HD, Schneider D. CLiB - a novel cardiolipin-binder isolated via data-driven and in vitro screening. RSC Chem Biol 2022; 3:941-954. [PMID: 35866160 PMCID: PMC9257654 DOI: 10.1039/d2cb00125j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Accepted: 06/01/2022] [Indexed: 11/21/2022] Open
Abstract
Cardiolipin, the mitochondria marker lipid, is crucially involved in stabilizing the inner mitochondrial membrane and is vital for the activity of mitochondrial proteins and protein complexes. Directly targeting cardiolipin by a chemical-biology approach, and thereby altering the cellular concentration of "available" cardiolipin, eventually allows the dependence of cellular processes on cardiolipin availability to be studied systematically. In the present study, physics-based coarse-grained free energy calculations allowed us to identify the physical and chemical properties indicative of cardiolipin selectivity and to apply these to screen a compound database for putative cardiolipin-binders. The membrane binding properties of the 22 most promising molecules identified in the in silico approach were screened in vitro using model membrane systems, finally resulting in the identification of a single molecule, CLiB (CardioLipin-Binder). CLiB clearly affects respiration of cardiolipin-containing intact bacterial cells as well as of isolated mitochondria. Thus, the structure and function of mitochondrial membranes and membrane proteins might be (indirectly) targeted and controlled by CLiB for basic research and, potentially, also for therapeutic purposes.
Affiliation(s)
- Isabel Kleinwächter
- Department of Chemistry, Biochemistry, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 17, 55128 Mainz, Germany
- Bernadette Mohr
- Van 't Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
- Aljoscha Joppe
- Institute for Molecular Biosciences, J. W. Goethe University, Frankfurt am Main, Germany
- Nadja Hellmann
- Department of Chemistry, Biochemistry, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 17, 55128 Mainz, Germany
- Tristan Bereau
- Van 't Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
- Heinz D Osiewacz
- Institute for Molecular Biosciences, J. W. Goethe University, Frankfurt am Main, Germany
- Dirk Schneider
- Department of Chemistry, Biochemistry, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 17, 55128 Mainz, Germany
- Institute of Molecular Physiology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 17, 55128 Mainz, Germany
|
15
|
Forero‐Martinez NC, Lin K, Kremer K, Andrienko D. Virtual Screening for Organic Solar Cells and Light Emitting Diodes. Adv Sci (Weinh) 2022; 9:e2200825. [PMID: 35460204 PMCID: PMC9259727 DOI: 10.1002/advs.202200825] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/10/2022] [Revised: 03/14/2022] [Indexed: 06/14/2023]
Abstract
The field of organic semiconductors is multifaceted and the potentially suitable molecular compounds are very diverse. Representative examples include discotic liquid crystals, dye-sensitized solar cells, conjugated polymers, and graphene-based low-dimensional materials. This huge variety not only represents enormous challenges for synthesis but also for theory, which aims at a comprehensive understanding and structuring of the plethora of possible compounds. Eventually computational methods should point to new, better materials, which have not yet been synthesized. In this perspective, it is shown that the answer to this question rests upon the delicate balance between computational efficiency and accuracy of the methods used in the virtual screening. To illustrate the fundamentals of virtual screening, chemical design of non-fullerene acceptors, thermally activated delayed fluorescence emitters, and nanographenes are discussed.
Affiliation(s)
- Kun‐Han Lin
- Max Planck Institute for Polymer Research, Ackermannweg 10, Mainz 55128, Germany
- Kurt Kremer
- Max Planck Institute for Polymer Research, Ackermannweg 10, Mainz 55128, Germany
- Denis Andrienko
- Max Planck Institute for Polymer Research, Ackermannweg 10, Mainz 55128, Germany
|
16
|
Aldeghi M, Coley CW. A focus on simulation and machine learning as complementary tools for chemical space navigation. Chem Sci 2022; 13:8221-8223. [PMID: 35919730 PMCID: PMC9297700 DOI: 10.1039/d2sc90130g] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022] Open
Abstract
Computer-aided molecular design benefits from the integration of two complementary approaches: machine learning and first-principles simulation. Mohr et al. (B. Mohr, K. Shmilovich, I. S. Kleinwächter, D. Schneider, A. L. Ferguson and T. Bereau, Chem. Sci., 2022, 13, 4498–4511, https://pubs.rsc.org/en/content/articlelanding/2022/sc/d2sc00116k) demonstrated the discovery of a cardiolipin-selective molecule via the combination of coarse-grained molecular dynamics, alchemical free energy calculations, Bayesian optimization and interpretable regression to reveal design principles. Machine learning and simulation synergistically contribute to the discovery of novel cardiolipin-selective molecules.
Affiliation(s)
- Matteo Aldeghi
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|