1
|
Mehdi S, Tiwary P. Thermodynamics-inspired explanations of artificial intelligence. Nat Commun 2024; 15:7859. [PMID: 39251574 PMCID: PMC11385982 DOI: 10.1038/s41467-024-51970-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 08/20/2024] [Indexed: 09/11/2024] Open
Abstract
In recent years, predictive machine learning models have gained prominence across various scientific domains. However, their black-box nature necessitates establishing trust in them before accepting their predictions as accurate. One promising strategy involves employing explanation techniques that elucidate the rationale behind a model's predictions in a way that humans can understand. However, assessing the degree of human interpretability of these explanations is a nontrivial challenge. In this work, we introduce interpretation entropy as a universal solution for evaluating the human interpretability of any linear model. Using this concept and drawing inspiration from classical thermodynamics, we present Thermodynamics-inspired Explainable Representations of AI and other black-box Paradigms, a method for generating optimally human-interpretable explanations in a model-agnostic manner. We demonstrate the wide-ranging applicability of this method by explaining predictions from various black-box model architectures across diverse domains, including molecular simulations, text, and image classification.
Collapse
Affiliation(s)
- Shams Mehdi
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, 20742, USA
| | - Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, 20742, USA.
- University of Maryland Institute for Health Computing, Bethesda, Maryland, 20852, USA.
| |
Collapse
|
2
|
Green AJ, Truong L, Thunga P, Leong C, Hancock M, Tanguay RL, Reif DM. Deep autoencoder-based behavioral pattern recognition outperforms standard statistical methods in high-dimensional zebrafish studies. PLoS Comput Biol 2024; 20:e1012423. [PMID: 39255309 PMCID: PMC11414989 DOI: 10.1371/journal.pcbi.1012423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Revised: 09/20/2024] [Accepted: 08/15/2024] [Indexed: 09/12/2024] Open
Abstract
Zebrafish have become an essential model organism in screening for developmental neurotoxic chemicals and their molecular targets. The success of zebrafish as a screening model is partially due to their physical characteristics including their relatively simple nervous system, rapid development, experimental tractability, and genetic diversity combined with technical advantages that allow for the generation of large amounts of high-dimensional behavioral data. These data are complex and require advanced machine learning and statistical techniques to comprehensively analyze and capture spatiotemporal responses. To accomplish this goal, we have trained semi-supervised deep autoencoders using behavior data from unexposed larval zebrafish to extract quintessential "normal" behavior. Following training, our network was evaluated using data from larvae shown to have significant changes in behavior (using a traditional statistical framework) following exposure to toxicants that include nanomaterials, aromatics, per- and polyfluoroalkyl substances (PFAS), and other environmental contaminants. Further, our model identified new chemicals (Perfluoro-n-octadecanoic acid, 8-Chloroperfluorooctylphosphonic acid, and Nonafluoropentanamide) as capable of inducing abnormal behavior at multiple chemical-concentrations pairs not captured using distance moved alone. Leveraging this deep learning model will allow for better characterization of the different exposure-induced behavioral phenotypes, facilitate improved genetic and neurobehavioral analysis in mechanistic determination studies and provide a robust framework for analyzing complex behaviors found in higher-order model systems.
Collapse
Affiliation(s)
- Adrian J. Green
- Bioinformatics Research Center, Department of Biological Sciences, NC State University, Raleigh, North Carolina, United States of America
- Sciome LLC, Research Triangle Park, North Carolina, United States of America
| | - Lisa Truong
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
| | - Preethi Thunga
- Bioinformatics Research Center, Department of Biological Sciences, NC State University, Raleigh, North Carolina, United States of America
| | - Connor Leong
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
| | - Melody Hancock
- Bioinformatics Research Center, Department of Biological Sciences, NC State University, Raleigh, North Carolina, United States of America
| | - Robyn L. Tanguay
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
| | - David M. Reif
- Bioinformatics Research Center, Department of Biological Sciences, NC State University, Raleigh, North Carolina, United States of America
- Predictive Toxicology Branch, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| |
Collapse
|
3
|
Zhang J, Zhang O, Bonati L, Hou T. Combining Transition Path Sampling with Data-Driven Collective Variables through a Reactivity-Biased Shooting Algorithm. J Chem Theory Comput 2024; 20:4523-4532. [PMID: 38801759 DOI: 10.1021/acs.jctc.4c00423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]
Abstract
Rare event sampling is a central problem in modern computational chemistry research. Among the existing methods, transition path sampling (TPS) can generate unbiased representations of reaction processes. However, its efficiency depends on the ability to generate reactive trial paths, which in turn depends on the quality of the shooting algorithm used. We propose a new algorithm based on the shooting success rate, i.e., reactivity, measured as a function of a reduced set of collective variables (CVs). These variables are extracted with a machine learning approach directly from TPS simulations, using a multitask objective function. Iteratively, this workflow significantly improves the shooting efficiency without any prior knowledge of the process. In addition, the optimized CVs can be used with biased enhanced sampling methodologies to accurately reconstruct the free energy profiles. We tested the method on three different systems: a two-dimensional toy model, conformational transitions of alanine dipeptide, and hydrolysis of acetyl chloride in bulk water. In the latter, we integrated our workflow with an active learning scheme to learn a reactive machine learning-based potential, which allowed us to study the mechanism and free energy profile with an ab initio-like accuracy.
Collapse
Affiliation(s)
- Jintu Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Atomistic Simulations, Italian Institute of Technology, Genova 16152, Italy
| | - Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Luigi Bonati
- Atomistic Simulations, Italian Institute of Technology, Genova 16152, Italy
| | - TingJun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
| |
Collapse
|
4
|
Keller BG, Bolhuis PG. Dynamical Reweighting for Biased Rare Event Simulations. Annu Rev Phys Chem 2024; 75:137-162. [PMID: 38941527 DOI: 10.1146/annurev-physchem-083122-124538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/30/2024]
Abstract
Dynamical reweighting techniques aim to recover the correct molecular dynamics from a simulation at a modified potential energy surface. They are important for unbiasing enhanced sampling simulations of molecular rare events. Here, we review the theoretical frameworks of dynamical reweighting for modified potentials. Based on an overview of kinetic models with increasing level of detail, we discuss techniques to reweight two-state dynamics, multistate dynamics, and path integrals. We explore the natural link to transition path sampling and how the effect of nonequilibrium forces can be reweighted. We end by providing an outlook on how dynamical reweighting integrates with techniques for optimizing collective variables and with modern potential energy surfaces.
Collapse
Affiliation(s)
- Bettina G Keller
- Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Berlin, Germany;
| | - Peter G Bolhuis
- Van 't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam, The Netherlands
| |
Collapse
|
5
|
Lelièvre T, Pigeon T, Stoltz G, Zhang W. Analyzing Multimodal Probability Measures with Autoencoders. J Phys Chem B 2024; 128:2607-2631. [PMID: 38466759 DOI: 10.1021/acs.jpcb.3c07075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/13/2024]
Abstract
Finding collective variables to describe some important coarse-grained information on physical systems, in particular metastable states, remains a key issue in molecular dynamics. Recently, machine learning techniques have been intensively used to complement and possibly bypass expert knowledge in order to construct collective variables. Our focus here is on neural network approaches based on autoencoders. We study some relevant mathematical properties of the loss function considered for training autoencoders and provide physical interpretations based on conditional variances and minimum energy paths. We also consider various extensions in order to better describe physical systems, by incorporating more information on transition states at saddle points, and/or allowing for multiple decoders in order to describe several transition paths. Our results are illustrated on toy two-dimensional systems and on alanine dipeptide.
Collapse
Affiliation(s)
- Tony Lelièvre
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
| | - Thomas Pigeon
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- IFP Energies Nouvelles, Rond-Point de l'Echangeur de Solaize, BP 3, 69360 Solaize, France
| | - Gabriel Stoltz
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
| | - Wei Zhang
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
- Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
| |
Collapse
|
6
|
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Collapse
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Hengwei Bian
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
Collapse
|
7
|
Nicolle A, Deng S, Ihme M, Kuzhagaliyeva N, Ibrahim EA, Farooq A. Mixtures Recomposition by Neural Nets: A Multidisciplinary Overview. J Chem Inf Model 2024; 64:597-620. [PMID: 38284618 DOI: 10.1021/acs.jcim.3c01633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2024]
Abstract
Artificial Neural Networks (ANNs) are transforming how we understand chemical mixtures, providing an expressive view of the chemical space and multiscale processes. Their hybridization with physical knowledge can bridge the gap between predictivity and understanding of the underlying processes. This overview explores recent progress in ANNs, particularly their potential in the 'recomposition' of chemical mixtures. Graph-based representations reveal patterns among mixture components, and deep learning models excel in capturing complexity and symmetries when compared to traditional Quantitative Structure-Property Relationship models. Key components, such as Hamiltonian networks and convolution operations, play a central role in representing multiscale mixtures. The integration of ANNs with Chemical Reaction Networks and Physics-Informed Neural Networks for inverse chemical kinetic problems is also examined. The combination of sensors with ANNs shows promise in optical and biomimetic applications. A common ground is identified in the context of statistical physics, where ANN-based methods iteratively adapt their models by blending their initial states with training data. The concept of mixture recomposition unveils a reciprocal inspiration between ANNs and reactive mixtures, highlighting learning behaviors influenced by the training environment.
Collapse
Affiliation(s)
- Andre Nicolle
- Aramco Fuel Research Center, Rueil-Malmaison 92852, France
| | - Sili Deng
- Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States
| | - Matthias Ihme
- Stanford University, Stanford 94305, California, United States
| | | | - Emad Al Ibrahim
- King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Aamir Farooq
- King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| |
Collapse
|
8
|
Herringer NSM, Dasetty S, Gandhi D, Lee J, Ferguson AL. Permutationally Invariant Networks for Enhanced Sampling (PINES): Discovery of Multimolecular and Solvent-Inclusive Collective Variables. J Chem Theory Comput 2024; 20:178-198. [PMID: 38150421 DOI: 10.1021/acs.jctc.3c00923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2023]
Abstract
The typically rugged nature of molecular free-energy landscapes can frustrate efficient sampling of the thermodynamically relevant phase space due to the presence of high free-energy barriers. Enhanced sampling techniques can improve phase space exploration by accelerating sampling along particular collective variables (CVs). A number of techniques exist for the data-driven discovery of CVs parametrizing the important large-scale motions of the system. A challenge to CV discovery is learning CVs invariant to the symmetries of the molecular system, frequently rigid translation, rigid rotation, and permutational relabeling of identical particles. Of these, permutational invariance has proved a persistent challenge in frustrating the data-driven discovery of multimolecular CVs in systems of self-assembling particles and solvent-inclusive CVs for solvated systems. In this work, we integrate permutation invariant vector (PIV) featurizations with autoencoding neural networks to learn nonlinear CVs invariant to translation, rotation, and permutation and perform interleaved rounds of CV discovery and enhanced sampling to iteratively expand the sampling of configurational phase space and obtain converged CVs and free-energy landscapes. We demonstrate the permutationally invariant network for enhanced sampling (PINES) approach in applications to the self-assembly of a 13-atom argon cluster, association/dissociation of a NaCl ion pair in water, and hydrophobic collapse of a C45H92 n-pentatetracontane polymer chain. We make the approach freely available as a new module within the PLUMED2 enhanced sampling libraries.
Collapse
Affiliation(s)
| | - Siva Dasetty
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Diya Gandhi
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Junhee Lee
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| |
Collapse
|
9
|
Green AJ, Truong L, Thunga P, Leong C, Hancock M, Tanguay RL, Reif DM. Deep autoencoder-based behavioral pattern recognition outperforms standard statistical methods in high-dimensional zebrafish studies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.13.557544. [PMID: 37745446 PMCID: PMC10515950 DOI: 10.1101/2023.09.13.557544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
Zebrafish have become an essential tool in screening for developmental neurotoxic chemicals and their molecular targets. The success of zebrafish as a screening model is partially due to their physical characteristics including their relatively simple nervous system, rapid development, experimental tractability, and genetic diversity combined with technical advantages that allow for the generation of large amounts of high-dimensional behavioral data. These data are complex and require advanced machine learning and statistical techniques to comprehensively analyze and capture spatiotemporal responses. To accomplish this goal, we have trained semi-supervised deep autoencoders using behavior data from unexposed larval zebrafish to extract quintessential "normal" behavior. Following training, our network was evaluated using data from larvae shown to have significant changes in behavior (using a traditional statistical framework) following exposure to toxicants that include nanomaterials, aromatics, per- and polyfluoroalkyl substances (PFAS), and other environmental contaminants. Further, our model identified new chemicals (Perfluoro-n-octadecanoic acid, 8-Chloroperfluorooctylphosphonic acid, and Nonafluoropentanamide) as capable of inducing abnormal behavior at multiple chemical-concentrations pairs not captured using distance moved alone. Leveraging this deep learning model will allow for better characterization of the different exposure-induced behavioral phenotypes, facilitate improved genetic and neurobehavioral analysis in mechanistic determination studies and provide a robust framework for analyzing complex behaviors found in higher-order model systems.
Collapse
Affiliation(s)
- Adrian J Green
- Department of Biological Sciences, NC State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, NC State University, Raleigh, North Carolina, United States of America
| | - Lisa Truong
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
| | - Preethi Thunga
- Department of Statistics, NC State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, NC State University, Raleigh, North Carolina, United States of America
| | - Connor Leong
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
| | - Melody Hancock
- Department of Biological Sciences, NC State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, NC State University, Raleigh, North Carolina, United States of America
| | - Robyn L Tanguay
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
| | - David M Reif
- Department of Biological Sciences, NC State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, NC State University, Raleigh, North Carolina, United States of America
| |
Collapse
|
10
|
Mouaffac L, Palacio-Rodriguez K, Pietrucci F. Optimal Reaction Coordinates and Kinetic Rates from the Projected Dynamics of Transition Paths. J Chem Theory Comput 2023; 19:5701-5711. [PMID: 37550088 DOI: 10.1021/acs.jctc.3c00158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/09/2023]
Abstract
Finding optimal reaction coordinates and predicting accurate kinetic rates for activated processes are two of the foremost challenges of molecular simulations. We introduce an algorithm that tackles the two problems at once: starting from a limited number of reactive molecular dynamics trajectories (transition paths), we automatically generate with a Monte Carlo approach a sequence of different reaction coordinates that progressively reduce the kinetic rate of their projected effective dynamics. Based on a variational principle, the minimal rate accurately approximates the exact one, and it corresponds to the optimal reaction coordinate. After benchmarking the method on an analytic double-well system, we apply it to complex atomistic systems: the interaction of carbon nanoparticles of different sizes in water.
Collapse
Affiliation(s)
- Line Mouaffac
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, F-75005 Paris, France
| | - Karen Palacio-Rodriguez
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, F-75005 Paris, France
| | - Fabio Pietrucci
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, F-75005 Paris, France
| |
Collapse
|
11
|
Chen H, Roux B, Chipot C. Discovering Reaction Pathways, Slow Variables, and Committor Probabilities with Machine Learning. J Chem Theory Comput 2023; 19:4414-4426. [PMID: 37224455 PMCID: PMC11372462 DOI: 10.1021/acs.jctc.3c00028] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
A significant challenge faced by atomistic simulations is the difficulty, and often impossibility, to sample the transitions between metastable states of the free-energy landscape associated with slow molecular processes. Importance-sampling schemes represent an appealing option to accelerate the underlying dynamics by smoothing out the relevant free-energy barriers, but require the definition of suitable reaction-coordinate (RC) models expressed in terms of compact low-dimensional sets of collective variables (CVs). While most computational studies of slow molecular processes have traditionally relied on educated guesses based on human intuition to reduce the dimensionality of the problem at hand, a variety of machine-learning (ML) algorithms have recently emerged as powerful alternatives to discover meaningful CVs capable of capturing the dynamics of the slowest degrees of freedom. Considering a simple paradigmatic situation in which the long-time dynamics is dominated by the transition between two known metastable states, we compare two variational data-driven ML methods based on Siamese neural networks aimed at discovering a meaningful RC model─the slowest decorrelating CV of the molecular process, and the committor probability to first reach one of the two metastable states. One method is the state-free reversible variational approach for Markov processes networks (VAMPnets), or SRVs─the other, inspired by the transition path theory framework, is the variational committor-based neural networks, or VCNs. The relationship and the ability of these methodologies to discover the relevant descriptors of the slow molecular process of interest are illustrated with a series of simple model systems. We also show that both strategies are amenable to importance-sampling schemes through an appropriate reweighting algorithm that approximates the kinetic properties of the transition.
Collapse
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
| | - Benoît Roux
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
| | - Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| |
Collapse
|
12
|
Dutta P, Sengupta N. Efficient Interrogation of the Kinetic Barriers Demarcating Catalytic States of a Tyrosine Kinase with Optimal Physical Descriptors and Mixture Models. Chemphyschem 2023; 24:e202200595. [PMID: 36394126 DOI: 10.1002/cphc.202200595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2022] [Revised: 11/16/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022]
Abstract
Computer simulations are increasingly used to access thermo-kinetic information underlying structural transformation of protein kinases. Such information are necessary to probe their roles in disease progression and interactions with drug targets. However, the investigations are frequently challenged by forbiddingly high computational expense, and by the lack of standard protocols for the design of low dimensional physical descriptors that encode system features important for transitions. Here, we consider the demarcating characteristics of the different states of Abelson tyrosine kinase associated with distinct catalytic activity to construct a set of physically meaningful, orthogonal collective variables that preserve the slow modes of the system. Independent sampling of each metastable state is followed by the estimation of global partition function along the appropriate physical descriptors using the modified Expectation Maximized Molecular Dynamics method. The resultant free energy barriers are in excellent agreement with experimentally known rate-limiting dynamics and activation energy computed with conventional enhanced sampling methods. We discuss possible directions for further development and applications.
Collapse
Affiliation(s)
- Pallab Dutta
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur, 741246, India
| | - Neelanjana Sengupta
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur, 741246, India
| |
Collapse
|
13
|
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB DISCOVERY 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023] Open
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
Collapse
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
| | - Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL61801, USA
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL60637, USA
| |
Collapse
|
14
|
Wu S, Li H, Ma A. A Rigorous Method for Identifying a One-Dimensional Reaction Coordinate in Complex Molecules. J Chem Theory Comput 2022; 18:2836-2844. [PMID: 35427129 DOI: 10.1021/acs.jctc.2c00132] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Understanding the mechanism of functional protein dynamics is critical to understanding protein functions. Reaction coordinate (RC) is a central topic in protein dynamics, and the grail is to find the one-dimensional RC (1D-RC) that can fully determine the value of a committor (i.e., the reaction probability in configuration space) for any protein configuration. We present a new method that, for the first time, uses a fundamental mechanical operator, the generalized work functional, to identify the rigorous 1D-RC in complex molecules. For a prototypical biomolecular isomerization reaction, the 1D-RC identified by the current method can determine the committor with an accuracy far exceeding what was achieved by previous methods. This method only requires modest computational cost and can be readily applied to large molecules. Most importantly, the generalized work functional is the physical determinant of the collectivity in functional protein dynamics and provides a tentative roadmap that connects the structure of a protein to its function.
Collapse
Affiliation(s)
- Shanshan Wu
- Richard Loan and Hill Department of Biomedical Engineering, The University of Illinois at Chicago, 851 South Morgan Street, Chicago, Illinois 60607, United States
| | - Huiyu Li
- Richard Loan and Hill Department of Biomedical Engineering, The University of Illinois at Chicago, 851 South Morgan Street, Chicago, Illinois 60607, United States
| | - Ao Ma
- Richard Loan and Hill Department of Biomedical Engineering, The University of Illinois at Chicago, 851 South Morgan Street, Chicago, Illinois 60607, United States
| |
Collapse
|
15
|
Kikutsuji T, Mori Y, Okazaki KI, Mori T, Kim K, Matubayasi N. Explaining reaction coordinates of alanine dipeptide isomerization obtained from deep neural networks using Explainable Artificial Intelligence (XAI). J Chem Phys 2022; 156:154108. [DOI: 10.1063/5.0087310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
A method for obtaining appropriate reaction coordinates is required to identify transition states distinguishing product and reactant in complex molecular systems. Recently, abundant research has been devoted to obtaining reaction coordinates using artificial neural networks from deep learning literature, where many collective variables are typically utilized in the input layer. However, it is difficult to explain the details of which collective variables contribute to the predicted reaction coordinates owing to the complexity of the nonlinear functions in deep neural networks. To overcome this limitation, we used Explainable Artificial Intelligence (XAI) methods of the Local Interpretable Model-agnostic Explanation (LIME) and the game theory-based framework known as Shapley Additive exPlanations (SHAP). We demonstrated that XAI enables us to obtain the degree of contribution of each collective variable to reaction coordinates that is determined by nonlinear regressions with deep learning for the committor of the alanine dipeptide isomerization in vacuum. In particular, both LIME and SHAP provide important features to the predicted reaction coordinates, which are characterized by appropriate dihedral angles consistent with those previously reported from the committor test analysis. The present study offers an AI-aided framework to explain the appropriate reaction coordinates, which acquires considerable significance when the number of degrees of freedom increases.
Collapse
Affiliation(s)
| | | | - Kei-ichi Okazaki
- Department of Theoretical and Computational Molecular Science, Institute for Molecular Science, Japan
| | - Toshifumi Mori
- Kyushu University Institute for Materials Chemistry and Engineering, Japan
| | - Kang Kim
- Graduate School of Engineering Science, Osaka University - Toyonaka Campus, Japan
| | - Nobuyuki Matubayasi
- Division of Chemical Engineering, Graduate School of Engineering Science, Osaka University, Japan
| |
Collapse
|
16
|
Paul TK, Taraphder S. Nonlinear Reaction Coordinate of an Enzyme Catalyzed Proton Transfer Reaction. J Phys Chem B 2022; 126:1413-1425. [PMID: 35138854 DOI: 10.1021/acs.jpcb.1c08760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
We present an in-depth study on the theoretical calculation of an optimum reaction coordinate as a linear or nonlinear combination of important collective variables (CVs) sampled from an ensemble of reactive transition paths for an intramolecular proton transfer reaction catalyzed by the enzyme human carbonic anhydrase (HCA) II. The linear models are optimized by likelihood maximization for a given number of CVs. The nonlinear models are based on an artificial neural network with the same number of CVs and optimized by minimizing the root-mean-square error in comparison to a training set of committor estimators generated for the given transition. The nonlinear reaction coordinate thus obtained yields the free energy of activation and rate constant as 9.46 kcal mol-1 and 1.25 × 106 s-1, respectively. These estimates are found to be in quantitative agreement with the known experimental results. We have also used an extended autoencoder model to show that a similar analysis can be carried out using a single CV only. The resultant free energies and kinetics of the reaction slightly overestimate the experimental data. The implications of these results are discussed using a detailed microkinetic scheme of the proton transfer reaction catalyzed by HCA II.
Collapse
Affiliation(s)
- Tanmoy Kumar Paul
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Srabani Taraphder
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| |
Collapse
|
17
|
Belkacemi Z, Gkeka P, Lelièvre T, Stoltz G. Chasing Collective Variables Using Autoencoders and Biased Trajectories. J Chem Theory Comput 2021; 18:59-78. [PMID: 34965117 DOI: 10.1021/acs.jctc.1c00415] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Free energy biasing methods have proven to be powerful tools to accelerate the simulation of important conformational changes of molecules by modifying the sampling measure. However, most of these methods rely on the prior knowledge of low-dimensional slow degrees of freedom, i.e., collective variables (CVs). Alternatively, such CVs can be identified using machine learning (ML) and dimensionality reduction algorithms. In this context, approaches where the CVs are learned in an iterative way using adaptive biasing have been proposed: at each iteration, the learned CV is used to perform free energy adaptive biasing to generate new data and learn a new CV. In this paper, we introduce a new iterative method involving CV learning with autoencoders: Free Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE). Our method includes a reweighting scheme to ensure that the learning model optimizes the same loss at each iteration and achieves CV convergence. Using the alanine dipeptide system and the solvated chignolin mini-protein system as examples, we present results of our algorithm using the extended adaptive biasing force as the free energy adaptive biasing method.
Collapse
Affiliation(s)
- Zineb Belkacemi
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France.,Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
| | - Paraskevi Gkeka
- Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
| | - Tony Lelièvre
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France.,MATHERIALS Team-Project, Inria, 75589 Paris, France
| | - Gabriel Stoltz
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France.,MATHERIALS Team-Project, Inria, 75589 Paris, France
| |
Collapse
|