1
Nikidis E, Kyriakopoulos N, Tohid R, Kachrimanis K, Kioseoglou J. Harnessing machine learning for efficient large-scale interatomic potential for sildenafil and pharmaceuticals containing H, C, N, O, and S. Nanoscale 2024; 16:18014-18026. [PMID: 39252581 DOI: 10.1039/d4nr00929k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
In this study, a cutting-edge approach to producing accurate and computationally efficient interatomic potentials using machine learning algorithms is presented. Specifically, the study focuses on the application of Allegro, a novel machine learning algorithm, running on high-performance GPUs for training potentials. The choice of training parameters plays a pivotal role in the quality of the potential functions. To enable this methodology, the "Solvated Protein Fragments" dataset, containing nearly 2.7 million Density Functional Theory (DFT) calculations for many-body intermolecular interactions involving protein fragments and water molecules, encompassing H, C, N, O, and S elements, is considered as the training dataset. The project optimizes computational efficiency by reducing the initial dataset size according to the intended application. To assess the efficacy of the approach, sildenafil citrate, iso-sildenafil, aspirin, ibuprofen, mebendazole and urea, representing all five relevant elements, serve as the test bed. The results of the Allegro-trained potentials demonstrate outstanding performance, benefiting from the combination of an appropriate training dataset and parameter selection. This yields notably enhanced computational efficiency, aided by GPU acceleration, when compared to the computationally intensive DFT method. Validation of the produced interatomic potentials is achieved through Allegro's own evaluation mechanism, yielding exceptional accuracy. Further verification is carried out through LAMMPS molecular dynamics simulations. Structural optimization by energy minimization and NPT Molecular Dynamics simulations are performed for each potential, assessing relaxation processes and energy reduction. Additional structures, including urea, ammonia, uracil, oxalic acid, and acetic acid, are tested, highlighting the potential's versatility in describing systems containing the aforementioned elements. Visualization of the results confirms the scientific accuracy of each structure's relaxation. The findings of this study demonstrate strong scaling and the potential for applications in pharmaceutical research, allowing the exploration of larger molecular structures not previously amenable to computational analysis at this level of accuracy. The success of the machine learning approach underscores its potential to revolutionize computational solid-state physics.
Affiliation(s)
- E Nikidis
  - Physics Department, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
  - Center for Interdisciplinary Research & Innovation, Aristotle University of Thessaloniki, Greece
- N Kyriakopoulos
  - Physics Department, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
  - Center for Interdisciplinary Research & Innovation, Aristotle University of Thessaloniki, Greece
- R Tohid
  - Center of Computation and Technology, Louisiana State University, 70803 Baton Rouge, USA
- K Kachrimanis
  - Center for Interdisciplinary Research & Innovation, Aristotle University of Thessaloniki, Greece
  - Pharmaceutical Technology Department, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- J Kioseoglou
  - Physics Department, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
  - Center for Interdisciplinary Research & Innovation, Aristotle University of Thessaloniki, Greece
2
Ruzmetov T, Hung TI, Jonnalagedda SP, Chen SH, Fasihianifard P, Guo Z, Bhanu B, Chang CEA. Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning. bioRxiv 2024:2024.05.05.592587. [PMID: 38979147 PMCID: PMC11230202 DOI: 10.1101/2024.05.05.592587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Proteins are inherently dynamic, and their conformational ensembles are functionally important in biology. Large-scale motions may govern protein structure-function relationship, and numerous transient but stable conformations of intrinsically disordered proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand regulations and disease-related aggregations of IDPs is challenging both experimentally and computationally. In this paper we first introduced an unsupervised deep learning-based model, termed Internal Coordinate Net (ICoN), which learns the physical principles of conformational changes from molecular dynamics (MD) simulation data. Second, we selected interpolating data points in the learned latent space that rapidly identify novel synthetic conformations with sophisticated and large-scale sidechains and backbone arrangements. Third, with the highly dynamic amyloid-β 1-42 (Aβ42) monomer, our deep learning model provided a comprehensive sampling of Aβ42's conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that can be used to rationalize experimental findings. Additionally, the method can identify novel conformations with important interactions in atomistic details that are not included in the training data. New synthetic conformations showed distinct sidechain rearrangements that are probed by our EPR and amino acid substitution studies. This approach is highly transferable and can be used for any available data for training. The work also demonstrated the ability for deep learning to utilize learned natural atomistic motions in protein conformation sampling.
3
Herringer NSM, Dasetty S, Gandhi D, Lee J, Ferguson AL. Permutationally Invariant Networks for Enhanced Sampling (PINES): Discovery of Multimolecular and Solvent-Inclusive Collective Variables. J Chem Theory Comput 2024; 20:178-198. [PMID: 38150421 DOI: 10.1021/acs.jctc.3c00923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
The typically rugged nature of molecular free-energy landscapes can frustrate efficient sampling of the thermodynamically relevant phase space due to the presence of high free-energy barriers. Enhanced sampling techniques can improve phase space exploration by accelerating sampling along particular collective variables (CVs). A number of techniques exist for the data-driven discovery of CVs parametrizing the important large-scale motions of the system. A challenge to CV discovery is learning CVs invariant to the symmetries of the molecular system, frequently rigid translation, rigid rotation, and permutational relabeling of identical particles. Of these, permutational invariance has proved a persistent challenge in frustrating the data-driven discovery of multimolecular CVs in systems of self-assembling particles and solvent-inclusive CVs for solvated systems. In this work, we integrate permutation invariant vector (PIV) featurizations with autoencoding neural networks to learn nonlinear CVs invariant to translation, rotation, and permutation and perform interleaved rounds of CV discovery and enhanced sampling to iteratively expand the sampling of configurational phase space and obtain converged CVs and free-energy landscapes. We demonstrate the permutationally invariant network for enhanced sampling (PINES) approach in applications to the self-assembly of a 13-atom argon cluster, association/dissociation of a NaCl ion pair in water, and hydrophobic collapse of a C45H92 n-pentatetracontane polymer chain. We make the approach freely available as a new module within the PLUMED2 enhanced sampling libraries.
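The invariance property described in this abstract can be illustrated with a minimal sketch. It assumes a single particle species and uses raw sorted pair distances as the feature vector; the published PIV featurization may differ in details (for example, switching functions and per-species blocks), and the function names here are illustrative rather than taken from PLUMED.

```python
import itertools
import math
import random


def sorted_pair_distances(coords):
    """Return all pairwise distances, sorted ascending.

    Distances are unchanged by rigid translation and rotation; sorting
    removes the dependence on how identical particles are labeled.
    """
    dists = [math.dist(a, b) for a, b in itertools.combinations(coords, 2)]
    return sorted(dists)


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


rng = random.Random(0)
# Hypothetical 13-particle configuration (random points, not a real cluster).
cluster = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(13)]

# Relabel, rotate, and translate the same configuration.
shuffled = cluster[:]
rng.shuffle(shuffled)
moved = [rotate_z(p, 0.7) for p in shuffled]
moved = [(x + 2.0, y - 1.0, z + 0.5) for x, y, z in moved]

reference = sorted_pair_distances(cluster)
transformed = sorted_pair_distances(moved)
max_diff = max(abs(a - b) for a, b in zip(reference, transformed))
print(f"{len(reference)} features, max difference {max_diff:.2e}")
```

Because the feature vector is identical (up to floating-point error) for the original and the transformed configuration, any network that takes it as input inherits the same invariances.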
Affiliation(s)
- Siva Dasetty
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Diya Gandhi
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Junhee Lee
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
4
Liu Z. Accelerating Kinetics with Time-Reversal Path Sampling. Molecules 2023; 28:8147. [PMID: 38138635 PMCID: PMC10745403 DOI: 10.3390/molecules28248147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 12/07/2023] [Accepted: 12/13/2023] [Indexed: 12/24/2023]
Abstract
In comparison to numerous enhanced sampling methods for equilibrium thermodynamics, accelerating simulations for kinetics and nonequilibrium statistics are relatively rare and less effective. Here, we derive a time-reversal path sampling (tRPS) method based on time reversibility to accelerate simulations for determining the transition rates between free-energy basins. It converts the difficult uphill path sampling into an easy downhill problem. This method is easy to implement, i.e., forward and backward shooting simulations with opposite initial velocities are conducted from random initial conformations within a transition-state region until they reach the basin minima, which are then assembled to give the distribution of transition paths efficiently. The effects of tRPS are demonstrated using a comparison with direct simulations of protein folding and unfolding, where tRPS is shown to give results consistent with direct simulations and increase the efficiency by up to five orders of magnitude. This approach is generally applicable to stochastic processes with microscopic reversibility, regardless of whether the variables are continuous or discrete.
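The shooting step described in this abstract can be sketched on a toy model. The example below is an illustration under stated assumptions, not the paper's implementation: it uses a 1D double-well potential U(x) = (x^2 - 1)^2, a simple Euler-type Langevin integrator, and hypothetical parameter values. It covers only the forward/backward shooting and path assembly, not the estimation of transition rates.

```python
import math
import random


def force(x):
    """F = -dU/dx for the toy potential U(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x * x - 1.0)


def shoot(x0, v0, rng, dt=0.01, gamma=1.0, kT=0.2, max_steps=200_000):
    """Integrate Langevin dynamics (unit mass) until a basin minimum is reached."""
    x, v = x0, v0
    path = [x]
    sigma = math.sqrt(2.0 * gamma * kT * dt)  # noise amplitude per step
    for _ in range(max_steps):
        v += (force(x) - gamma * v) * dt + sigma * rng.gauss(0.0, 1.0)
        x += v * dt
        path.append(x)
        if abs(x) >= 1.0:  # minima of the toy potential sit at x = -1 and x = +1
            return path
    raise RuntimeError("no basin reached within max_steps")


def sample_path(rng, ts_halfwidth=0.1, kT=0.2):
    """Shoot forward and backward from a random point near the barrier top."""
    x0 = rng.uniform(-ts_halfwidth, ts_halfwidth)
    v0 = rng.gauss(0.0, math.sqrt(kT))  # Maxwell-Boltzmann velocity, unit mass
    forward = shoot(x0, +v0, rng)
    backward = shoot(x0, -v0, rng)
    # Time-reverse the backward segment and join it to the forward one.
    return backward[::-1] + forward[1:]


def is_transition(path):
    """A path is reactive if its two ends lie in different basins."""
    return (path[0] < 0.0) != (path[-1] < 0.0)


rng = random.Random(0)
paths = [sample_path(rng) for _ in range(200)]
n_reactive = sum(is_transition(p) for p in paths)
print(f"{n_reactive} of {len(paths)} assembled paths connect the two basins")
```

Both segments run downhill from the barrier region, which is why each shot terminates quickly compared with waiting for a spontaneous barrier crossing from a basin.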
Affiliation(s)
- Zhirong Liu
  - Beijing National Laboratory for Molecular Sciences (BNLMS), College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
5
Patil K, Wang Y, Chen Z, Suresh K, Radhakrishnan R. Activating mutations drive human MEK1 kinase using a gear-shifting mechanism. Biochem J 2023; 480:1733-1751. [PMID: 37869794 PMCID: PMC10872882 DOI: 10.1042/bcj20230281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/30/2023] [Accepted: 10/20/2023] [Indexed: 10/24/2023]
Abstract
There is an unmet need to classify cancer-promoting kinase mutations in a mechanistically cognizant way. The challenge is to understand how mutations stabilize different kinase configurations to alter function, and how this influences pathogenic potential of the kinase and its responses to therapeutic inhibitors. This goal is made more challenging by the complexity of the mutational landscape of diseases, and is further compounded by the conformational plasticity of each variant where multiple conformations coexist. We focus here on the human MEK1 kinase, a vital component of the RAS/MAPK pathway in which mutations cause cancers and developmental disorders called RASopathies. We sought to explore how these mutations alter the human MEK1 kinase at atomic resolution by utilizing enhanced sampling simulations and free energy calculations. We computationally mapped the different conformational stabilities of individual mutated systems by delineating the free energy landscapes, and showed how this relates directly to experimentally quantified developmental transformation potentials of the mutations. We conclude that mutations leverage variations in the hydrogen bonding network associated with the conformational plasticity to progressively stabilize the active-like conformational state of the kinase while destabilizing the inactive-like state. The mutations alter residue-level internal molecular correlations by differentially prioritizing different conformational states, delineating the various modes of MEK1 activation reminiscent of a gear-shifting mechanism. We define the molecular basis of conversion of this kinase from its inactive to its active state, connecting structure, dynamics, and function by delineating the energy landscape and conformational plasticity, thus augmenting our understanding of MEK1 regulation.
Affiliation(s)
- Keshav Patil
  - Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, U.S.A
- Yiming Wang
  - Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, U.S.A
- Zhangtao Chen
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, U.S.A
- Krishna Suresh
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, U.S.A
- Ravi Radhakrishnan
  - Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, U.S.A
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, U.S.A
6
Shi J, Albreiki F, Colón YJ, Srivastava S, Whitmer JK. Transfer Learning Facilitates the Prediction of Polymer-Surface Adhesion Strength. J Chem Theory Comput 2023; 19:4631-4640. [PMID: 37068204 DOI: 10.1021/acs.jctc.2c01314] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/19/2023]
Abstract
Machine learning (ML) accelerates the exploration of material properties and their links to the structure of the underlying molecules. In previous work [Shi et al. ACS Applied Materials & Interfaces 2022, 14, 37161-37169.], ML models were applied to predict the adhesive free energy of polymer-surface interactions with high accuracy from the knowledge of the sequence data, demonstrating successes in inverse-design of polymer sequence for known surface compositions. While the method was shown to be successful in designing polymers for a known surface, extensive data sets were needed for each specific surface in order to train the surrogate models. Ideally, one should be able to infer information about similar surfaces without having to regenerate a full complement of adhesion data for each new case. In the current work, we demonstrate a transfer learning (TL) technique using a deep neural network to improve the accuracy of ML models trained on small data sets by pretraining on a larger database from a related system and fine-tuning the weights of all layers with a small amount of additional data. The shared knowledge from the pretrained model facilitates the prediction accuracy significantly on small data sets. We also explore the limits of database size on accuracy and the optimal tuning of network architecture and parameters for our learning tasks. While applied to a relatively simple coarse-grained (CG) polymer model, the general lessons of this study apply to detailed modeling studies and the broader problems of inverse materials design.
Affiliation(s)
- Jiale Shi
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Fahed Albreiki
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
- Yamil J Colón
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Samanvaya Srivastava
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
  - California NanoSystems Institute, Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
  - Institute for Carbon Management, University of California, Los Angeles, Los Angeles, California 90095, United States
  - Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jonathan K Whitmer
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
7
Naleem N, Abreu CRA, Warmuz K, Tong M, Kirmizialtin S, Tuckerman ME. An exploration of machine learning models for the determination of reaction coordinates associated with conformational transitions. J Chem Phys 2023; 159:034102. [PMID: 37458344 DOI: 10.1063/5.0147597] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/23/2023] [Accepted: 06/23/2023] [Indexed: 07/20/2023]
Abstract
Determining collective variables (CVs) for conformational transitions is crucial to understanding their dynamics and targeting them in enhanced sampling simulations. Often, CVs are proposed based on intuition or prior knowledge of a system. However, the problem of systematically determining a proper reaction coordinate (RC) for a specific process in terms of a set of putative CVs can be achieved using committor analysis (CA). Identifying essential degrees of freedom that govern such transitions using CA remains elusive because of the high dimensionality of the conformational space. Various schemes exist to leverage the power of machine learning (ML) to extract an RC from CA. Here, we extend these studies and compare the ability of 17 different ML schemes to identify accurate RCs associated with conformational transitions. We tested these methods on an alanine dipeptide in vacuum and on a sarcosine dipeptoid in an implicit solvent. Our comparison revealed that the light gradient boosting machine method outperforms other methods. In order to extract key features from the models, we employed Shapley Additive exPlanations analysis and compared its interpretation with the "feature importance" approach. For the alanine dipeptide, our methodology identifies ϕ and θ dihedrals as essential degrees of freedom in the C7ax to C7eq transition. For the sarcosine dipeptoid system, the dihedrals ψ and ω are the most important for the cisαD to transαD transition. We further argue that analysis of the full dynamical pathway, and not just endpoint states, is essential for identifying key degrees of freedom governing transitions.
Affiliation(s)
- Nawavi Naleem
  - Chemistry Program, Science Division, New York University, Abu Dhabi, UAE
- Charlles R A Abreu
  - Chemical Engineering Department, Escola de Química, Universidade Federal do Rio de Janeiro, 21941-909 Rio de Janeiro, RJ, Brazil
- Krzysztof Warmuz
  - Computer Science Program, Science Division, New York University, Abu Dhabi, UAE
- Muchen Tong
  - Department of Chemistry, New York University (NYU), New York, New York 10003, USA
- Serdal Kirmizialtin
  - Chemistry Program, Science Division, New York University, Abu Dhabi, UAE
  - Department of Chemistry, New York University (NYU), New York, New York 10003, USA
  - Center for Smart Engineering Materials, New York University, Abu Dhabi, UAE
- Mark E Tuckerman
  - Department of Chemistry, New York University (NYU), New York, New York 10003, USA
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Rd. North, Shanghai 200062, China
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, USA
8
Xiao S, Song Z, Tian H, Tao P. Assessments of Variational Autoencoder in Protein Conformation Exploration. Journal of Computational Biophysics and Chemistry 2023; 22:489-501. [PMID: 38826699 PMCID: PMC11138204 DOI: 10.1142/s2737416523500217] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/04/2024]
Abstract
Molecular dynamics (MD) simulations have been extensively used to study protein dynamics and subsequently functions. However, MD simulations are often insufficient to explore adequate conformational space for protein functions within reachable timescales. Accordingly, many enhanced sampling methods, including variational autoencoder (VAE) based methods, have been developed to address this issue. The purpose of this study is to evaluate the feasibility of using VAE to assist in the exploration of protein conformational landscapes. Using three modeling systems, we showed that VAE could capture high-level hidden information which distinguishes protein conformations. These models could also be used to generate new physically plausible protein conformations for direct sampling in favorable conformational spaces. We also found that VAE worked better in interpolation than extrapolation and increasing latent space dimension could lead to a trade-off between performances and complexities.
Affiliation(s)
- Sian Xiao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Zilin Song
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Hao Tian
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
9
Chen S, Kalanat N, Xie Y, Li S, Zwart JA, Sadler JM, Appling AP, Oliver SK, Read JS, Jia X. Physics-guided machine learning from simulated data with different physical parameters. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-023-01864-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
10
Tian H, Jiang X, Xiao S, La Force H, Larson EC, Tao P. LAST: Latent Space-Assisted Adaptive Sampling for Protein Trajectories. J Chem Inf Model 2023; 63:67-75. [PMID: 36472885 PMCID: PMC9904845 DOI: 10.1021/acs.jcim.2c01213] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
Molecular dynamics (MD) simulation is widely used to study protein conformations and dynamics. However, conventional simulation suffers from being trapped in some local energy minima that are hard to escape. Thus, most of the computational time is spent sampling in the already visited regions. This leads to an inefficient sampling process and further hinders the exploration of protein movements in affordable simulation time. The advancement of deep learning provides new opportunities for protein sampling. Variational autoencoders are a class of deep learning models to learn a low-dimensional representation (referred to as the latent space) that can capture the key features of the input data. Based on this characteristic, we proposed a new adaptive sampling method, latent space-assisted adaptive sampling for protein trajectories (LAST), to accelerate the exploration of protein conformational space. This method comprises cycles of (i) variational autoencoder training, (ii) seed structure selection on the latent space, and (iii) conformational sampling through additional MD simulations. The proposed approach is validated through the sampling of four structures of two protein systems: two metastable states of Escherichia coli adenosine kinase (ADK) and two native states of Vivid (VVD). In all four conformations, seed structures were shown to lie on the boundary of conformation distributions. Moreover, large conformational changes were observed in a shorter simulation time when compared with structural dissimilarity sampling (SDS) and conventional MD (cMD) simulations in both systems. In metastable ADK simulations, LAST explored two transition paths toward two stable states, while SDS explored only one and cMD neither. In VVD light state simulations, LAST was three times faster than cMD simulation with a similar conformational space. Overall, LAST is comparable to SDS and is a promising tool in adaptive sampling. The LAST method is publicly available at https://github.com/smu-tao-group/LAST to facilitate related research.
Affiliation(s)
- Hao Tian
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75206, United States
- Xi Jiang
  - Department of Statistical Science, Southern Methodist University, Dallas, Texas 75206, United States
- Sian Xiao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75206, United States
- Hunter La Force
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75206, United States
- Eric C Larson
  - Department of Computer Science, Southern Methodist University, Dallas, Texas 75206, United States
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75206, United States
11
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB Discovery 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023]
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
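The distinction this abstract draws between high-variance modes and slow modes can be made concrete with a toy calculation. The sketch below is a simplification under stated assumptions: it uses two independent synthetic AR(1) signals with hypothetical parameters, and it ranks the coordinates directly by variance and by time-lagged autocorrelation instead of training an autoencoder or a time-lagged model.

```python
import random
import statistics


def ar1(n, phi, scale, rng):
    """Generate an AR(1) series with stationary standard deviation `scale`."""
    noise = scale * (1.0 - phi * phi) ** 0.5
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + noise * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def autocorr(series, lag):
    """Normalized autocorrelation of `series` at the given lag."""
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series[:-lag], series[lag:]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den


rng = random.Random(0)
n_frames, lag = 20_000, 10

# Slow but small-amplitude coordinate versus fast but large-amplitude one.
slow = ar1(n_frames, phi=0.99, scale=0.3, rng=rng)
fast = ar1(n_frames, phi=0.0, scale=3.0, rng=rng)

var_slow, var_fast = statistics.pvariance(slow), statistics.pvariance(fast)
ac_slow, ac_fast = autocorr(slow, lag), autocorr(fast, lag)

print(f"variance:        slow={var_slow:.3f}  fast={var_fast:.3f}")
print(f"autocorrelation: slow={ac_slow:.3f}  fast={ac_fast:.3f}")
```

A criterion based on variance alone would pick the fast coordinate, while a criterion based on time-lagged correlation picks the slow one, which is the kind of degree of freedom a CV for free-energy calculations is meant to capture.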
Affiliation(s)
- Haochuan Chen
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Christophe Chipot
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
  - Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
12
Baima J, Goryaeva AM, Swinburne TD, Maillet JB, Nastar M, Marinica MC. Capabilities and limits of autoencoders for extracting collective variables in atomistic materials science. Phys Chem Chem Phys 2022; 24:23152-23163. [PMID: 36128869 DOI: 10.1039/d2cp01917e] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
Free energy calculations in materials science are routinely hindered by the need to provide reaction coordinates that can meaningfully partition atomic configuration space, a prerequisite for most enhanced sampling approaches. Recent studies on molecular systems have highlighted the possibility of constructing appropriate collective variables directly from atomic motions through deep learning techniques. Here we extend this class of approaches to condensed matter problems, for which we encode the finite temperature collective variable by an iterative procedure starting from 0 K features of the energy landscape i.e. activation events or migration mechanisms given by a minimum - saddle point - minimum sequence. We employ the autoencoder neural networks in order to build a scalar collective variable for use with the adaptive biasing force method. Particular attention is given to design choices required for application to crystalline systems with defects, including the filtering of thermal motions which otherwise dominate the autoencoder input. The machine-learning workflow is tested on body-centered cubic iron and its common defects, such as small vacancy or self-interstitial clusters and screw dislocations. For localized defects, excellent collective variables as well as derivatives, necessary for free energy sampling, are systematically obtained. However, the approach has a limited accuracy when dealing with reaction coordinates that include atomic displacements of a magnitude comparable to thermal motions, e.g. the ones produced by the long-range elastic field of dislocations. We then combine the extraction of collective variables by autoencoders with an adaptive biasing force free energy method based on Bayesian inference. Using a vacancy migration as an example, we demonstrate the performance of coupling these two approaches for simultaneous discovery of reaction coordinates and free energy sampling in systems with localized defects.
Collapse
Affiliation(s)
- Jacopo Baima
- Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
| | - Alexandra M Goryaeva
- Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
| | - Thomas D Swinburne
- Aix-Marseille Université, CNRS, CINaM UMR 7325, Campus de Luminy, 13288 Marseille, France
| | | | - Maylise Nastar
- Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
| | - Mihai-Cosmin Marinica
- Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
| |
|
13
|
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022] Open
Abstract
Understanding the kinetic and thermodynamic profiles of biomolecules is necessary to understand their functional roles, which has a major impact on mechanism-driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of high-dimensional spatiotemporal data generated from molecular dynamics simulation requires identification of a few low-dimensional variables which can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria such as distances and dihedral angles to abstract ones such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones. Further, I will put forward some cases where machine learning-based collective variables were used to describe simple systems which in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations.
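The abstract notes that collective variables are used to build reduced representations of free energy surfaces. As a minimal sketch of that idea (not taken from the review), the snippet below estimates a one-dimensional profile F(s) = -k_B T ln P(s) by histogramming unbiased samples of a scalar collective variable. The function name `free_energy_profile`, the bin count, the temperature, and the bimodal toy data are all illustrative assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def free_energy_profile(cv_samples, temperature=300.0, n_bins=40):
    """Estimate F(s) = -k_B T ln P(s) from unbiased samples of a scalar CV.

    Assumes the samples are not all identical (non-zero bin width).
    """
    lo, hi = min(cv_samples), max(cv_samples)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in cv_samples:
        # Clamp the right edge so that s == hi falls into the last bin.
        index = min(int((s - lo) / width), n_bins - 1)
        counts[index] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    total = len(cv_samples)
    # Empty bins carry no estimate; mark them with infinite free energy.
    free_energy = [
        -K_B * temperature * math.log(c / (total * width)) if c > 0 else math.inf
        for c in counts
    ]
    f_min = min(free_energy)  # shift so the global minimum sits at zero
    return centers, [f - f_min for f in free_energy]

# Hypothetical toy data: a bimodal CV standing in for two metastable basins.
rng = random.Random(0)
samples = [rng.gauss(-1.0, 0.3) for _ in range(5000)]
samples += [rng.gauss(1.0, 0.3) for _ in range(5000)]
centers, profile = free_energy_profile(samples)
```

With the toy data, the two basins near s = -1 and s = 1 appear as minima separated by a barrier around s = 0. A histogram estimate like this is only meaningful where the trajectory has actually sampled, which is one reason enhanced sampling along the collective variable is usually needed for real systems.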
Affiliation(s)
- Soumendranath Bhakat
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA, +1 30549 32620
| |
|
14
|
Monroe JI, Shen VK. Systematic Control of Collective Variables Learned from Variational Autoencoders. J Chem Phys 2022; 157:094116. [DOI: 10.1063/5.0105120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Variational autoencoders (VAEs) are rapidly gaining popularity within molecular simulation for discovering low-dimensional, or latent, representations, which are critical for both analyzing and accelerating simulations. However, it remains unclear how the information a VAE learns is connected to its probabilistic structure, and, in turn, its loss function. Previous studies have focused on feature engineering, ad hoc modifications to loss functions, or adjustment of the prior to enforce desirable latent space properties. By applying effectively arbitrarily flexible priors via normalizing flows, we focus instead on how adjusting the structure of the decoding model impacts the learned latent coordinate. We systematically adjust the power and flexibility of the decoding distribution, observing that this has a significant impact on the structure of the latent space as measured by a suite of metrics developed in this work. By also varying weights on separate terms within each VAE loss function, we show that the level of detail encoded can be further tuned. This provides practical guidance for utilizing VAEs to extract varying resolutions of low-dimensional information from molecular dynamics and Monte Carlo simulations.
|
15
|
Li Y, Gong H. Identifying a Feasible Transition Pathway between Two Conformational States for a Protein. J Chem Theory Comput 2022; 18:4529-4543. [PMID: 35723447 DOI: 10.1021/acs.jctc.2c00390] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
Proteins usually need to transit between different conformational states to fulfill their biological functions. In the mechanistic study of such transition processes by molecular dynamics simulations, identification of the minimum free energy path (MFEP) can substantially reduce the sampling space, thus enabling rigorous thermodynamic evaluation of the process. Conventionally, the MFEP is derived by iterative local optimization from an initial path, which is typically generated by simple brute force techniques like the targeted molecular dynamics (tMD). Therefore, the quality of the initial path determines the successfulness of MFEP estimation. In this work, we propose a method to improve derivation of the initial path. Through iterative relaxation-biasing simulations in a bidirectional manner, this method can construct a feasible transition pathway connecting two known states for a protein. Evaluation on small, fast-folding proteins against long equilibrium trajectories supports the good sampling efficiency of our method. When applied to larger proteins including the catalytic domain of human c-Src kinase as well as the converter domain of myosin VI, the paths generated by our method deviate significantly from those computed with the generic tMD approach. More importantly, free energy profiles and intermediate states obtained from our paths exhibit remarkable improvements over those from tMD paths with respect to both physical rationality and consistency with a priori knowledge.
Affiliation(s)
- Yao Li
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China.,Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
| |
|
16
|
Monroe JI, Shen VK. Learning Efficient, Collective Monte Carlo Moves with Variational Autoencoders. J Chem Theory Comput 2022; 18:3622-3636. [PMID: 35613327 PMCID: PMC11210279 DOI: 10.1021/acs.jctc.2c00110] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Discovering meaningful collective variables for enhancing sampling, via applied biasing potentials or tailored MC move sets, remains a major challenge within molecular simulation. While recent studies identifying collective variables with variational autoencoders (VAEs) have focused on the encoding and latent space discovered by a VAE, the impact of the decoding and its ability to act as a generative model remains unexplored. We demonstrate how VAEs may be used to learn (on-the-fly and with minimal human intervention) highly efficient, collective Monte Carlo moves that accelerate sampling along the learned collective variable. In contrast to many machine learning-based efforts to bias sampling and generate novel configurations, our methods result in exact sampling in the ensemble of interest and do not require reweighting. In fact, we show that the acceptance rates of our moves approach unity for a perfect VAE model. While this is never observed in practice, VAE-based Monte Carlo moves still enhance sampling of new configurations. We demonstrate, however, that the form of the encoding and decoding distributions, in particular the extent to which the decoder reflects the underlying physics, greatly impacts the performance of the trained VAE.
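The claim that acceptance rates approach unity for a perfect generative model can be illustrated with a much simpler stand-in than the authors' VAE-based moves: independence Metropolis-Hastings, where each proposal is drawn from a fixed distribution q and accepted with probability min(1, p(x')q(x) / (p(x)q(x'))). The sketch below is an assumption-laden toy, not the method of the paper: the target is a standard normal, the "model" is a zero-mean normal of adjustable width, and all function names are hypothetical.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    """Log-density of a normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def independence_mh(log_target, sample_proposal, log_proposal, x0, n_steps, rng):
    """Run independence Metropolis-Hastings; return the chain and the acceptance rate."""
    x, accepted, chain = x0, 0, []
    for _ in range(n_steps):
        x_new = sample_proposal(rng)
        # log of p(x') q(x) / (p(x) q(x')); the proposal ignores the current state.
        log_ratio = (log_target(x_new) - log_target(x)) + (log_proposal(x) - log_proposal(x_new))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x, accepted = x_new, accepted + 1
        chain.append(x)
    return chain, accepted / n_steps

def acceptance_for(proposal_std, n_steps=2000, seed=0):
    """Acceptance rate for a standard-normal target and a N(0, proposal_std) proposal."""
    rng = random.Random(seed)
    log_p = lambda x: log_normal_pdf(x, 0.0, 1.0)           # toy target density
    log_q = lambda x: log_normal_pdf(x, 0.0, proposal_std)  # toy stand-in for the learned model
    draw = lambda r: r.gauss(0.0, proposal_std)
    _, rate = independence_mh(log_p, draw, log_q, 0.0, n_steps, rng)
    return rate

perfect_rate = acceptance_for(1.0)    # proposal identical to the target
imperfect_rate = acceptance_for(3.0)  # proposal much broader than the target
```

When the proposal equals the target, the acceptance ratio is identically one, so every move is accepted. A mismatched proposal lowers the acceptance rate while the chain still samples the target exactly, which mirrors the abstract's point that no reweighting is needed.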
Affiliation(s)
- Jacob I Monroe
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8320, United States
| | - Vincent K Shen
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8320, United States
| |
|
17
|
|
18
|
Ketkaew R, Creazzo F, Luber S. Machine Learning-Assisted Discovery of Hidden States in Expanded Free Energy Space. J Phys Chem Lett 2022; 13:1797-1805. [PMID: 35171614 DOI: 10.1021/acs.jpclett.1c04004] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Collective variables (CVs) are crucial parameters in enhanced sampling calculations and strongly impact the quality of the obtained free energy surface. However, many existing CVs are unique to and dependent on the system they are constructed with, making the developed CV non-transferable to other systems. Herein, we develop a non-instructor-led deep autoencoder neural network (DAENN) for discovering general-purpose CVs. The DAENN is used to train a model by learning molecular representations upon unbiased trajectories that contain only the reactant conformers. The prior knowledge of nonconstraint reactants coupled with the here-introduced topology variable and loss-like penalty function are only required to make the biasing method able to expand its configurational (phase) space to unexplored energy basins. Our developed autoencoder is efficient and relatively inexpensive to use in terms of a priori knowledge, enabling one to automatically search for hidden CVs of the reaction of interest.
Affiliation(s)
- Rangsiman Ketkaew
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Fabrizio Creazzo
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Sandra Luber
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| |
|
19
|
Wang D, Wang Y, Chang J, Zhang L, Wang H, E W. Efficient sampling of high-dimensional free energy landscapes using adaptive reinforced dynamics. NATURE COMPUTATIONAL SCIENCE 2022; 2:20-29. [PMID: 38177702 DOI: 10.1038/s43588-021-00173-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/04/2021] [Accepted: 11/15/2021] [Indexed: 01/06/2024]
Abstract
Enhanced sampling methods such as metadynamics and umbrella sampling have become essential tools for exploring the configuration space of molecules and materials. At the same time, they have long faced a number of issues such as the inefficiency when dealing with a large number of collective variables (CVs) or systems with high free energy barriers. Here we show that, with clustering and adaptive tuning techniques, the reinforced dynamics (RiD) scheme can be used to efficiently explore the configuration space and free energy landscapes with a large number of CVs or systems with high free energy barriers. We illustrate this by studying various representative and challenging examples. First we demonstrate the efficiency of adaptive RiD compared with other methods and construct the nine-dimensional (9D) free energy landscape of a peptoid trimer, which has energy barriers of more than 8 kcal mol⁻¹. We then study the folding of the protein chignolin using 18 CVs. In this case, both the folding and unfolding rates are observed to be 4.30 μs⁻¹. Finally, we propose a protein structure refinement protocol based on RiD. This protocol allows us to efficiently employ more than 100 CVs for exploring the landscape of protein structures and it gives rise to an overall improvement of 14.6 units over the initial global distance test-high accuracy (GDT-HA) score.
Affiliation(s)
- Dongdong Wang
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
- DP Technology, Beijing, People's Republic of China
| | - Yanze Wang
- DP Technology, Beijing, People's Republic of China
- College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
| | - Junhan Chang
- DP Technology, Beijing, People's Republic of China
- College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
| | - Linfeng Zhang
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA.
- DP Technology, Beijing, People's Republic of China.
| | - Han Wang
- Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, People's Republic of China.
| | - Weinan E
- School of Mathematical Sciences, Peking University, Beijing, People's Republic of China
- Department of Mathematics and Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
- Beijing Institute of Big Data Research, Beijing, People's Republic of China
| |
|
20
|
Beyerle ER, Guenza MG. Identifying the leading dynamics of ubiquitin: A comparison between the tICA and the LE4PD slow fluctuations in amino acids' position. J Chem Phys 2021; 155:244108. [PMID: 34972386 DOI: 10.1063/5.0059688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022] Open
Abstract
Molecular Dynamics (MD) simulations of proteins implicitly contain the information connecting the atomistic molecular structure and proteins' biologically relevant motion, where large-scale fluctuations are deemed to guide folding and function. In the complex multiscale processes described by MD trajectories, it is difficult to identify, separate, and study those large-scale fluctuations. This problem can be formulated as the need to identify a small number of collective variables that guide the slow kinetic processes. The most promising method among the ones used to study the slow leading processes in proteins' dynamics is the time-structure based on time-lagged independent component analysis (tICA), which identifies the dominant components in a noisy signal. Recently, we developed an anisotropic Langevin approach for the dynamics of proteins, called the anisotropic Langevin Equation for Protein Dynamics or LE4PD-XYZ. This approach partitions the protein's MD dynamics into mostly uncorrelated, wavelength-dependent, diffusive modes. It associates with each mode a free-energy map, where one measures the spatial extension and the time evolution of the mode-dependent, slow dynamical fluctuations. Here, we compare the tICA modes' predictions with the collective LE4PD-XYZ modes. We observe that the two methods consistently identify the nature and extension of the slowest fluctuation processes. The tICA separates the leading processes in a smaller number of slow modes than the LE4PD does. The LE4PD provides time-dependent information at short times and a formal connection to the physics of the kinetic processes that are missing in the pure statistical analysis of tICA.
Affiliation(s)
- E R Beyerle
- Institute for Fundamental Science and Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, USA
| | - M G Guenza
- Institute for Fundamental Science and Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, USA
| |
|
21
|
Tian H, Jiang X, Trozzi F, Xiao S, Larson EC, Tao P. Explore Protein Conformational Space With Variational Autoencoder. Front Mol Biosci 2021; 8:781635. [PMID: 34869602 PMCID: PMC8633506 DOI: 10.3389/fmolb.2021.781635] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/23/2021] [Accepted: 10/28/2021] [Indexed: 12/02/2022] Open
Abstract
Molecular dynamics (MD) simulations have been actively used in the study of protein structure and function. However, extensive sampling in the protein conformational space requires large computational resources and takes a prohibitive amount of time. In this study, we demonstrated that variational autoencoders (VAEs), a type of deep learning model, can be employed to explore the conformational space of a protein through MD simulations. VAEs are shown to be superior to autoencoders (AEs) through a benchmark study, with low deviation between the training and decoded conformations. Moreover, we show that the learned latent space in the VAE can be used to generate unsampled protein conformations. Additional simulations starting from these generated conformations accelerated the sampling process and explored hidden spaces in the conformational landscape.
Affiliation(s)
- Hao Tian
- Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
| | - Xi Jiang
- Department of Statistical Science, Southern Methodist University, Dallas, TX, United States
| | - Francesco Trozzi
- Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
| | - Sian Xiao
- Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
| | - Eric C. Larson
- Department of Computer Science, Southern Methodist University, Dallas, TX, United States
| | - Peng Tao
- Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
| |
|
22
|
Chen M. Collective variable-based enhanced sampling and machine learning. THE EUROPEAN PHYSICAL JOURNAL. B 2021; 94:211. [PMID: 34697536 PMCID: PMC8527828 DOI: 10.1140/epjb/s10051-021-00220-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/02/2021] [Accepted: 10/03/2021] [Indexed: 05/14/2023]
Abstract
Collective variable-based enhanced sampling methods have been widely used to study thermodynamic properties of complex systems. Efficiency and accuracy of these enhanced sampling methods are affected by two factors: constructing appropriate collective variables for enhanced sampling and generating accurate free energy surfaces. Recently, many machine learning techniques have been developed to improve the quality of collective variables and the accuracy of free energy surfaces. Although machine learning has achieved great successes in improving enhanced sampling methods, there are still many challenges and open questions. In this perspective, we shall review recent developments on integrating machine learning techniques and collective variable-based enhanced sampling approaches. We also discuss challenges and future research directions including generating kinetic information, exploring high-dimensional free energy surfaces, and efficiently sampling all-atom configurations.
Affiliation(s)
- Ming Chen
- Department of Chemistry, Purdue University, West Lafayette, IN 47907 USA
| |
|
23
|
Moritsugu K. Multiscale Enhanced Sampling Using Machine Learning. Life (Basel) 2021; 11:life11101076. [PMID: 34685447 PMCID: PMC8540671 DOI: 10.3390/life11101076] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/24/2021] [Revised: 10/06/2021] [Accepted: 10/08/2021] [Indexed: 01/18/2023] Open
Abstract
Multiscale enhanced sampling (MSES) allows for an enhanced sampling of all-atom protein structures by coupling with the accelerated dynamics of the associated coarse-grained (CG) model. In this paper, we propose an MSES extension to replace the CG model with the dynamics on the reduced subspace generated by a machine learning approach, the variational autoencoder (VAE). The molecular dynamic (MD) trajectories of the ribose-binding protein (RBP) in both the closed and open forms were used as the input by extracting the inter-residue distances as the structural features in order to train the VAE model, allowing the encoded latent layer to characterize the difference in the structural dynamics of the closed and open forms. The interpolated data characterizing the RBP structural change in between the closed and open forms were thus efficiently generated in the low-dimensional latent space of the VAE, which was then decoded into the time-series data of the inter-residue distances and was useful for driving the structural sampling at an atomistic resolution via the MSES scheme. The free energy surfaces on the latent space demonstrated the refinement of the generated data that had a single basin into the simulated data containing two closed and open basins, thus illustrating the usefulness of the MD simulation together with the molecular mechanics force field in recovering the correct structural ensemble.
Affiliation(s)
- Kei Moritsugu
- Graduate School of Medical Life Science, Yokohama City University, Yokohama 230-0045, Japan
| |
|
24
|
Fas BA, Maiani E, Sora V, Kumar M, Mashkoor M, Lambrughi M, Tiberti M, Papaleo E. The conformational and mutational landscape of the ubiquitin-like marker for autophagosome formation in cancer. Autophagy 2021; 17:2818-2841. [PMID: 33302793 PMCID: PMC8525936 DOI: 10.1080/15548627.2020.1847443] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/23/2019] [Revised: 10/28/2020] [Accepted: 11/03/2020] [Indexed: 02/06/2023] Open
Abstract
Macroautophagy/autophagy is a cellular process to recycle damaged cellular components, and its modulation can be exploited for disease treatments. A key autophagy player is the ubiquitin-like protein MAP1LC3B/LC3B. Mutations and changes in MAP1LC3B expression occur in cancer samples. However, the investigation of the effects of these mutations on MAP1LC3B protein structure is still missing. Despite many LC3B structures that have been solved, a comprehensive study, including dynamics, has not yet been undertaken. To address this knowledge gap, we assessed nine physical models for biomolecular simulations for their capabilities to describe the structural ensemble of MAP1LC3B. With the resulting MAP1LC3B structural ensembles, we characterized the impact of 26 missense mutations from pan-cancer studies with different approaches, and we experimentally validated our prediction for six variants using cellular assays. Our findings shed light on damaging or neutral mutations in MAP1LC3B, providing an atlas of its modifications in cancer. In particular, P32Q mutation was found detrimental for protein stability with a propensity to aggregation. 
In a broader context, our framework can be applied to assess the pathogenicity of protein mutations or to prioritize variants for experimental studies, making it possible to comprehensively account for the different aspects of protein structure and function that mutational events alter. Abbreviations: ATG: autophagy-related; Cα: alpha carbon; CG: coarse-grained; CHARMM: Chemistry at Harvard macromolecular mechanics; CONAN: contact analysis; FUNDC1: FUN14 domain containing 1; FYCO1: FYVE and coiled-coil domain containing 1; GABARAP: GABA type A receptor-associated protein; GROMACS: Groningen machine for chemical simulations; HP: hydrophobic pocket; LIR: LC3 interacting region; MAP1LC3B/LC3B: microtubule associated protein 1 light chain 3 B; MD: molecular dynamics; OPTN: optineurin; OSF: open software foundation; PE: phosphatidylethanolamine; PLEKHM1: pleckstrin homology domain-containing family M 1; PSN: protein structure network; PTM: post-translational modification; SA: structural alphabet; SLiM: short linear motif; SQSTM1/p62: sequestosome 1; WT: wild-type.
Affiliation(s)
- Burcu Aykac Fas
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Emiliano Maiani
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Valentina Sora
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Mukesh Kumar
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Maliha Mashkoor
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Matteo Lambrughi
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Matteo Tiberti
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Elena Papaleo
- Computational Biology Laboratory, Danish Cancer Society Research Center, Copenhagen, Denmark
- Translational Disease Systems Biology, Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Protein Research University of Copenhagen, Copenhagen, Denmark
| |
|
25
|
Bandyopadhyay S, Mondal J. A deep autoencoder framework for discovery of metastable ensembles in biomacromolecules. J Chem Phys 2021; 155:114106. [PMID: 34551528 DOI: 10.1063/5.0059965] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/01/2023] Open
Abstract
Biomacromolecules manifest dynamic conformational fluctuation and involve mutual interconversion among metastable states. A robust mapping of their conformational landscape often requires the low-dimensional projection of the conformational ensemble along optimized collective variables (CVs). However, the traditional choice for the CV is often limited by user-intuition and prior knowledge about the system, and this lacks a rigorous assessment of their optimality over other candidate CVs. To address this issue, we propose an approach in which we first choose the possible combinations of inter-residue Cα-distances within a given macromolecule as a set of input CVs. Subsequently, we derive a non-linear combination of latent space embedded CVs via auto-encoding the unbiased molecular dynamics simulation trajectories within the framework of the feed-forward neural network. We demonstrate the ability of the derived latent space variables in elucidating the conformational landscape in four hierarchically complex systems. The latent space CVs identify key metastable states of a bead-in-a-spring polymer. The combination of the adopted dimensional reduction technique with a Markov state model, built on the derived latent space, reveals multiple spatially and kinetically well-resolved metastable conformations for GB1 β-hairpin. A quantitative comparison based on the variational approach-based scoring of the auto-encoder-derived latent space CVs with the ones obtained via independent component analysis (principal component analysis or time-structured independent component analysis) confirms the optimality of the former. As a practical application, the auto-encoder-derived CVs were found to predict the reinforced folding of a Trp-cage mini-protein in aqueous osmolyte solution. Finally, the protocol was able to decipher the conformational heterogeneities involved in a complex metalloenzyme, namely, cytochrome P450.
Affiliation(s)
- Satyabrata Bandyopadhyay
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India
| | - Jagannath Mondal
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India
| |
|
26
|
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms of dimensionality reduction, density estimation, and clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used-or can be used-to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
- International School for Advanced Studies (SISSA) 34014 Trieste, Italy
| | - Brooke E. Husic
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
| | - Alex Rodriguez
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
| | - Cecilia Clementi
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
| | - Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
| | - Alessandro Laio
- International School for Advanced Studies (SISSA) 34014 Trieste, Italy
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
| |
|
27
|
Träger S, Tamò G, Aydin D, Fonti G, Audagnotto M, Dal Peraro M. CLoNe: automated clustering based on local density neighborhoods for application to biomolecular structural ensembles. Bioinformatics 2021; 37:921-928. [PMID: 32821900 PMCID: PMC8128458 DOI: 10.1093/bioinformatics/btaa742] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/08/2020] [Revised: 07/14/2020] [Accepted: 08/18/2020] [Indexed: 11/14/2022] Open
Abstract
Motivation: Proteins are intrinsically dynamic entities. Flexibility sampling methods, such as molecular dynamics or those arising from integrative modeling strategies, are now commonplace and enable the study of molecular conformational landscapes in many contexts. Resulting structural ensembles increase in size as technological and algorithmic advancements take place, making their analysis increasingly demanding. In this regard, cluster analysis remains a go-to approach for their classification. However, many state-of-the-art algorithms are restricted to specific cluster properties. Combined with tedious parameter fine-tuning, cluster analysis of protein structural ensembles suffers from the lack of a generally applicable and easy to use clustering scheme. Results: We present CLoNe, an original Python-based clustering scheme that builds on the Density Peaks algorithm of Rodriguez and Laio. CLoNe relies on a probabilistic analysis of local density distributions derived from nearest neighbors to find relevant clusters regardless of cluster shape, size, distribution and amount. We show its capabilities on many toy datasets with properties otherwise dividing state-of-the-art approaches, and show that it improves on the original algorithm in key aspects. Applied to structural ensembles, CLoNe was able to extract meaningful conformations from membrane binding events and ligand-binding pocket opening as well as identify dominant dimerization motifs or inter-domain organization. CLoNe additionally saves clusters as individual trajectories for further analysis and provides scripts for automated use with molecular visualization software. Availability and implementation: www.epfl.ch/labs/lbm/resources, github.com/LBM-EPFL/CLoNe. Supplementary information: Supplementary data are available at Bioinformatics online.
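For orientation, the Density Peaks idea that CLoNe builds on can be sketched in a few lines. The snippet below is a bare-bones illustration of the original Rodriguez and Laio procedure, not of CLoNe itself. It assumes a Gaussian-kernel density, a user-chosen cutoff `d_c`, and a fixed number of clusters. These are the very manual choices that CLoNe's probabilistic analysis is described as avoiding. The function name `density_peaks` and the toy blobs are hypothetical.

```python
import math
import random

def density_peaks(points, d_c, n_clusters):
    """Cluster points with a basic Density Peaks procedure (Gaussian-kernel densities)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: a smooth count of neighbours within roughly d_c.
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)  # densest first
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the global density maximum
            continue
        # Nearest point among those ranked denser than i.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centres combine a high density with a large distance to any denser point.
    centers = sorted(order, key=lambda i: rho[i] * delta[i], reverse=True)[:n_clusters]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    for i in order:  # denser points are labelled before the points that inherit from them
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Hypothetical toy data: two well-separated 2D blobs.
rng = random.Random(1)
blob_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(30)]
blob_b = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(30)]
labels, centers = density_peaks(blob_a + blob_b, d_c=0.5, n_clusters=2)
```

Each non-centre point inherits the label of its nearest denser neighbour, so clusters of arbitrary shape can be followed along density gradients. The full pairwise distance matrix makes this sketch quadratic in the number of points, which is acceptable for a toy example but not for large structural ensembles.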
Affiliation(s)
- Sylvain Träger
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1025, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Giorgio Tamò
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1025, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Deniz Aydin
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1025, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Giulia Fonti
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1025, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Martina Audagnotto
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1025, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Matteo Dal Peraro
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1025, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
28
Computational methods for exploring protein conformations. Biochem Soc Trans 2021; 48:1707-1724. [PMID: 32756904 PMCID: PMC7458412 DOI: 10.1042/bst20200193] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 06/02/2020] [Revised: 07/07/2020] [Accepted: 07/09/2020] [Indexed: 12/13/2022]
Abstract
Proteins are dynamic molecules that can transition between a potentially wide range of structures comprising their conformational ensemble. The nature of these conformations and their relative probabilities are described by a high-dimensional free energy landscape. While computer simulation techniques such as molecular dynamics allow the metastable conformational states and the transitions between them, and thus free energy landscapes, to be characterised, the barriers between states can be high, precluding efficient sampling without substantial computational resources. Over the past decades, a dizzying array of methods has emerged for enhancing conformational sampling, and for projecting the free energy landscape onto a reduced set of dimensions that allow conformational states to be distinguished, known as collective variables (CVs), along which sampling may be directed. Here, a brief description of what biomolecular simulation entails is followed by a more detailed exposition of the nature of CVs and methods for determining them and, lastly, an overview of the many approaches for enhancing conformational sampling, most of which rely upon CVs, including new advances in both CV determination and conformational sampling due to machine learning.
29
Ward MD, Zimmerman MI, Meller A, Chung M, Swamidass SJ, Bowman GR. Deep learning the structural determinants of protein biochemical properties by comparing structural ensembles with DiffNets. Nat Commun 2021; 12:3023. [PMID: 34021153 PMCID: PMC8140102 DOI: 10.1038/s41467-021-23246-1] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 09/29/2020] [Accepted: 04/16/2021] [Indexed: 12/05/2022]
Abstract
Understanding the structural determinants of a protein's biochemical properties, such as activity and stability, is a major challenge in biology and medicine. Comparing computer simulations of protein variants with different biochemical properties is an increasingly powerful means to drive progress. However, success often hinges on dimensionality reduction algorithms for simplifying the complex ensemble of structures each variant adopts. Unfortunately, common algorithms rely on potentially misleading assumptions about what structural features are important, such as emphasizing larger geometric changes over smaller ones. Here we present DiffNets, self-supervised autoencoders that avoid such assumptions, and automatically identify the relevant features, by requiring that the low-dimensional representations they learn are sufficient to predict the biochemical differences between protein variants. For example, DiffNets automatically identify subtle structural signatures that predict the relative stabilities of β-lactamase variants and duty ratios of myosin isoforms. DiffNets should also be applicable to understanding other perturbations, such as ligand binding.
Affiliation(s)
- Michael D Ward
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
- Maxwell I Zimmerman
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
- Artur Meller
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
- Moses Chung
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
- S J Swamidass
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Gregory R Bowman
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for the Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO, USA
30
Machine learning in protein structure prediction. Curr Opin Chem Biol 2021; 65:1-8. [PMID: 34015749 DOI: 10.1016/j.cbpa.2021.04.005] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 03/31/2021] [Accepted: 04/10/2021] [Indexed: 12/31/2022]
Abstract
Prediction of protein structure from sequence has been intensely studied for many decades, owing to the problem's importance and its uniquely well-defined physical and computational bases. While progress has historically ebbed and flowed, the past two years saw dramatic advances driven by the increasing "neuralization" of structure prediction pipelines, whereby computations previously based on energy models and sampling procedures are replaced by neural networks. The extraction of physical contacts from the evolutionary record; the distillation of sequence-structure patterns from known structures; the incorporation of templates from homologs in the Protein Databank; and the refinement of coarsely predicted structures into finely resolved ones have all been reformulated using neural networks. Cumulatively, this transformation has resulted in algorithms that can now predict single protein domains with a median accuracy of 2.1 Å, setting the stage for a foundational reconfiguration of the role of biomolecular modeling within the life sciences.
31
Hoseini P, Zhao L, Shehu A. Generative deep learning for macromolecular structure and dynamics. Curr Opin Struct Biol 2020; 67:170-177. [PMID: 33338762 DOI: 10.1016/j.sbi.2020.11.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/29/2020] [Revised: 11/16/2020] [Accepted: 11/23/2020] [Indexed: 01/06/2023]
Abstract
Much scientific enquiry across disciplines is founded upon a mechanistic treatment of dynamic systems that ties form to function. A highly visible instance of this is in molecular biology, where characterizing macromolecular structure and dynamics is central to a detailed, molecular-level understanding of biological processes in the living cell. The current computational paradigm utilizes optimization as the generative process for modeling both structure and structural dynamics. Computational biology researchers are now attempting to wield generative models employing deep neural networks as an alternative computational paradigm. In this review, we summarize such efforts. We highlight progress and shortcomings. More importantly, we expose challenges that macromolecular structure poses to deep generative models and take this opportunity to introduce the structural biology community to several recent advances in the deep learning community that promise a way forward.
Affiliation(s)
- Pourya Hoseini
- Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA; Center for Advancing Human-Machine Partnerships, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
- Liang Zhao
- Department of Computer Science, Emory University, 201 Dowman Dr, Atlanta, GA 30322, USA; Center for Advancing Human-Machine Partnerships, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA; Center for Advancing Human-Machine Partnerships, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
32
Bernetti M, Bertazzo M, Masetti M. Data-Driven Molecular Dynamics: A Multifaceted Challenge. Pharmaceuticals (Basel) 2020; 13:E253. [PMID: 32961909 PMCID: PMC7557855 DOI: 10.3390/ph13090253] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/25/2020] [Revised: 09/14/2020] [Accepted: 09/16/2020] [Indexed: 12/18/2022]
Abstract
The big data concept is currently revolutionizing several fields of science including drug discovery and development. While opening up new perspectives for better drug design and related strategies, big data analysis strongly challenges our current ability to manage and exploit an extraordinarily large and possibly diverse amount of information. The recent renewal of machine learning (ML)-based algorithms is key in providing the proper framework for addressing this issue. In this respect, the impact on the exploitation of molecular dynamics (MD) simulations, which have recently reached mainstream status in computational drug discovery, can be remarkable. Here, we review the recent progress in the use of ML methods coupled to biomolecular simulations with potentially relevant implications for drug design. Specifically, we show how different ML-based strategies can be applied to the outcome of MD simulations for gaining knowledge and enhancing sampling. Finally, we discuss how intrinsic limitations of MD in accurately modeling biomolecular systems can be alleviated by including information coming from experimental data.
Affiliation(s)
- Mattia Bernetti
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), via Bonomea 265, I-34136 Trieste, Italy
- Martina Bertazzo
- Computational Sciences, Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy
- Matteo Masetti
- Department of Pharmacy and Biotechnology, Alma Mater Studiorum - Università di Bologna, via Belmeloro 6, I-40126 Bologna, Italy
33
Gkeka P, Stoltz G, Barati Farimani A, Belkacemi Z, Ceriotti M, Chodera JD, Dinner AR, Ferguson AL, Maillet JB, Minoux H, Peter C, Pietrucci F, Silveira A, Tkatchenko A, Trstanova Z, Wiewiora R, Lelièvre T. Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems. J Chem Theory Comput 2020; 16:4757-4775. [PMID: 32559068 PMCID: PMC8312194 DOI: 10.1021/acs.jctc.0c00355] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Indexed: 12/21/2022]
Abstract
Machine learning encompasses tools and algorithms that are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning offers promises of extracting valuable information from the enormous amounts of data generated by simulation of complex systems. We provide here a review of our current understanding of goals, benefits, and limitations of machine learning techniques for computational studies on atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.
Affiliation(s)
- Paraskevi Gkeka
- Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
- Gabriel Stoltz
- CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Matherials Project-Team, Inria Paris, 75012 Paris, France
- Zineb Belkacemi
- Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
- CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Michele Ceriotti
- Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Aaron R Dinner
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Hervé Minoux
- Integrated Drug Discovery, Sanofi R&D, 94403 Vitry-sur-Seine, France
- Fabio Pietrucci
- UMR CNRS 7590, MNHN, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, Sorbonne Université, 75005 Paris, France
- Ana Silveira
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Zofia Trstanova
- School of Mathematics, The University of Edinburgh, Edinburgh EH9 3FD, U.K.
- Rafal Wiewiora
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tony Lelièvre
- CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Matherials Project-Team, Inria Paris, 75012 Paris, France
34
Zhang J, Gong H. Frontier Expansion Sampling: A Method to Accelerate Conformational Search by Identifying Novel Seed Structures for Restart. J Chem Theory Comput 2020; 16:4813-4821. [PMID: 32585102 DOI: 10.1021/acs.jctc.0c00064] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]
Abstract
Traditional molecular dynamics (MD) simulations have difficulty tracking slow molecular motions, at least partially because sampling is wasted in already sampled regions. Here, we propose a new enhanced sampling method, frontier expansion sampling (FEXS), to improve the sampling efficiency of molecular simulations by iteratively selecting seed structures diversely distributed at the "frontier" of an already sampled region to initiate new simulations. Unlike other enhanced sampling methods, FEXS identifies the "frontier" seeds by integrating the Gaussian mixture model and the convex hull algorithm, which effectively improves the structural variation among the selected seeds and thus the descendant simulations. Validation in three protein systems, including the folding of chignolin, the open-to-closed transition of maltodextrin binding protein, and the internal conformational change of bovine pancreatic trypsin inhibitor, confirmed the effectiveness of this method in enhancing the sampling of conventional MD simulations to observe large-scale protein conformational changes. When compared with other enhanced sampling methods such as structural dissimilarity sampling (SDS), FEXS reached at least the same level of sampling efficiency but was capable of providing complementary information in the three tested protein systems.
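The convex hull step mentioned in the abstract can be illustrated with a simplified sketch. It assumes frames have already been projected onto two coordinates and takes the hull vertices of the sampled region as candidate "frontier" seeds. The Gaussian mixture model step that FEXS is said to integrate is omitted, and the function names are hypothetical, so this should not be read as the published method.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def frontier_seeds(frames_2d):
    """Indices of frames lying on the convex hull of the sampled 2D region."""
    hull = set(convex_hull(frames_2d))
    return [i for i, p in enumerate(frames_2d) if p in hull]
```

Each frame is expected as a tuple of two numbers. Restarting simulations from the returned indices pushes sampling outward from the boundary of the explored region, which is the intuition the abstract describes.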
Affiliation(s)
- Juanrong Zhang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
35
Shamsi Z, Chan M, Shukla D. TLmutation: Predicting the Effects of Mutations Using Transfer Learning. J Phys Chem B 2020; 124:3845-3854. [PMID: 32308006 DOI: 10.1021/acs.jpcb.0c00197] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/05/2023]
Abstract
A recurring challenge in bioinformatics is predicting the phenotypic consequence of amino acid variation in proteins. With the recent advancements in sequencing techniques, sufficient genomic data has become available to train models that predict the evolutionary statistical energies, but there is still inadequate experimental data to directly predict functional effects. One approach to overcome this data scarcity is to apply transfer learning and train more models with available data sets. In this study, we propose a set of transfer learning algorithms we call TLmutation, which implements a supervised transfer learning algorithm that transfers knowledge from survival data of a protein to a particular function of that protein. This is followed by an unsupervised transfer learning algorithm that extends the knowledge to a homologous protein. We explore the application of our algorithms in three cases. First, we test the supervised transfer on 17 previously published deep mutagenesis data sets to complete and refine missing data points. We further investigate these data sets to identify which mutations build better predictors of variant functions. In the second case, we apply the algorithm to predict higher-order mutations solely from single point mutagenesis data. Finally, we apply the unsupervised transfer learning algorithm to predict mutational effects of homologous proteins from experimental data sets. We show the benefit of our transfer learning algorithms to utilize informative deep mutational data and provide new insights into protein variant functions. As these algorithms are generalized to transfer knowledge between Markov random field models, we expect them to be applicable to other disciplines.
Affiliation(s)
- Zahra Shamsi
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Matthew Chan
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States; NIH Center for Macromolecular Modeling and Bioinformatics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
36
Sherman ZM, Howard MP, Lindquist BA, Jadrich RB, Truskett TM. Inverse methods for design of soft materials. J Chem Phys 2020; 152:140902. [DOI: 10.1063/1.5145177] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Indexed: 02/02/2023]
Affiliation(s)
- Zachary M. Sherman
- McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas 78712, USA
- Michael P. Howard
- McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas 78712, USA
- Beth A. Lindquist
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Ryan B. Jadrich
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Thomas M. Truskett
- McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, Texas 78712, USA
- Department of Physics, University of Texas at Austin, Austin, Texas 78712, USA
37
Sidky H, Chen W, Ferguson AL. Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation. Mol Phys 2020. [DOI: 10.1080/00268976.2020.1737742] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/13/2022]
Affiliation(s)
- Hythem Sidky
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
- Wei Chen
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Andrew L. Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
38
Abstract
Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.
Affiliation(s)
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany; Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Alexandre Tkatchenko
- Physics and Materials Science Research Unit, University of Luxembourg, 1511 Luxembourg, Luxembourg
- Klaus-Robert Müller
- Department of Computer Science, Technical University Berlin, 10587 Berlin, Germany; Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany; Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, South Korea
- Cecilia Clementi
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA; Department of Physics, Rice University, Houston, Texas 77005, USA
39
Armacost KA, Riniker S, Cournia Z. Novel Directions in Free Energy Methods and Applications. J Chem Inf Model 2020; 60:1-5. [DOI: 10.1021/acs.jcim.9b01174] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 01/04/2023]
Affiliation(s)
- Kira A. Armacost
- Computational and Structural Chemistry, MRL, Merck & Co., Inc., West Point, Pennsylvania 19486, United States
- Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Zoe Cournia
- Biomedical Research Foundation Academy of Athens, Soranou Ephessiou 4, 11527 Athens, Greece
40
Whitfield TW, Ragland DA, Zeldovich KB, Schiffer CA. Characterizing Protein-Ligand Binding Using Atomistic Simulation and Machine Learning: Application to Drug Resistance in HIV-1 Protease. J Chem Theory Comput 2020; 16:1284-1299. [PMID: 31877249 DOI: 10.1021/acs.jctc.9b00781] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/24/2023]
Abstract
Over the past several decades, atomistic simulations of biomolecules, whether carried out using molecular dynamics or Monte Carlo techniques, have provided detailed insights into their function. Comparing the results of such simulations for a few closely related systems has guided our understanding of the mechanisms by which changes such as ligand binding or mutation can alter the function. The general problem of detecting and interpreting such mechanisms from simulations of many related systems, however, remains a challenge. This problem is addressed here by applying supervised and unsupervised machine learning techniques to a variety of thermodynamic observables extracted from molecular dynamics simulations of different systems. As an important test case, these methods are applied to understand the evasion by human immunodeficiency virus type-1 (HIV-1) protease of darunavir, a potent inhibitor to which resistance can develop via the simultaneous mutation of multiple amino acids. Complex mutational patterns have been observed among resistant strains, presenting a challenge to developing a mechanistic picture of resistance in the protease. In order to dissect these patterns and gain mechanistic insight into the role of specific mutations, molecular dynamics simulations were carried out on a collection of HIV-1 protease variants, chosen to include highly resistant strains and susceptible controls, in complex with darunavir. Using a machine learning approach that takes advantage of the hierarchical nature in the relationships among the sequence, structure, and function, an integrative analysis of these trajectories reveals key details of the resistance mechanism, including changes in the protein structure, hydrogen bonding, and protein-ligand contacts.
Affiliation(s)
- Troy W Whitfield
- Department of Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States; Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
- Debra A Ragland
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
- Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
- Celia A Schiffer
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
41
Lemke T, Berg A, Jain A, Peter C. EncoderMap(II): Visualizing Important Molecular Motions with Improved Generation of Protein Conformations. J Chem Inf Model 2019; 59:4550-4560. [PMID: 31647645 DOI: 10.1021/acs.jcim.9b00675] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/17/2022]
Abstract
Dimensionality reduction can be used to project high-dimensional molecular data into a simplified, low-dimensional map. One feature of our recently introduced dimensionality reduction technique EncoderMap, which relies on the combination of an autoencoder with multidimensional scaling, is its ability to do the reverse. It is able to generate conformations for any selected points in the low-dimensional map. This transfers the simplified, low-dimensional map back into the high-dimensional conformational space. Although the output is again high-dimensional, certain aspects of the simplification are preserved. The generated conformations only mirror the most dominant conformational differences that determine the positions of conformational states in the low-dimensional map. This allows depicting such differences and, in consequence, visualizing molecular motions, and gives a unique perspective on high-dimensional conformational data. In our previous work, protein conformations described in backbone dihedral angle space were used as the input for EncoderMap, and conformations were also generated in this space. For large proteins, however, the generation of conformations is inaccurate with this approach due to the local character of backbone dihedral angles. Here, we present an improved variant of EncoderMap which is able to generate large protein conformations that are accurate in both short-range and long-range order. This is achieved by differentiable reconstruction of Cartesian coordinates from the generated dihedrals, which allows adding a contribution to the cost function that monitors the accuracy of all pairwise distances between the Cα-atoms of the generated conformations. The improved capabilities to generate conformations of large, even multidomain, proteins are demonstrated for two examples: diubiquitin and a part of the Ssa1 Hsp70 yeast chaperone. We show that the improved variant of EncoderMap can nicely visualize motions of protein domains relative to each other but is also able to highlight important conformational changes within the individual domains.
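The pairwise Cα-distance term described in the abstract can be sketched as follows. This is only an illustration of the idea: the mean absolute deviation used here is an assumption, since the abstract does not give the exact functional form. The function names are hypothetical, and the differentiable reconstruction of Cartesian coordinates from dihedrals is not shown.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """All unique pairwise distances between 3D points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def ca_distance_cost(generated, reference):
    """Mean absolute deviation between two sets of pairwise C-alpha distances.

    The absolute-deviation form is an illustrative choice, not taken from the paper.
    """
    d_gen = pairwise_distances(generated)
    d_ref = pairwise_distances(reference)
    if len(d_gen) != len(d_ref):
        raise ValueError("both conformations need the same number of atoms")
    return sum(abs(g - r) for g, r in zip(d_gen, d_ref)) / len(d_gen)
```

Because the cost depends only on internal distances, it is unchanged by rigid translation or rotation of the generated structure. It penalises long-range errors that a purely dihedral-based loss would miss, which is the motivation stated in the abstract.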
Affiliation(s)
- Tobias Lemke
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
- Andrej Berg
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
- Alok Jain
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany; Department of Biotechnology, National Institute of Pharmaceutical Education and Research Ahmedabad, Gandhinagar, Gujarat 382355, India
- Christine Peter
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
42
Chen W, Sidky H, Ferguson AL. Capabilities and limitations of time-lagged autoencoders for slow mode discovery in dynamical systems. J Chem Phys 2019. [DOI: 10.1063/1.5112048] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/20/2022]
Affiliation(s)
- Wei Chen
- Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801, USA
- Hythem Sidky
- Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, USA
- Andrew L. Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, USA
|
43
|
Provasi D. Ligand-Binding Calculations with Metadynamics. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2019; 2022:233-253. [PMID: 31396906 DOI: 10.1007/978-1-4939-9608-7_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
Abstract
All-atom molecular dynamics simulations can capture the dynamic degrees of freedom that characterize molecular recognition, the knowledge of which constitutes the cornerstone of rational approaches to drug design and optimization. In particular, enhanced sampling algorithms, such as metadynamics, are powerful tools to dramatically reduce the computational cost required for a mechanistic description of the binding process. Here, we describe the essential details characterizing these simulation strategies, focusing on the critical step of identifying suitable reaction coordinates, as well as on the different analysis algorithms to estimate binding affinity and residence times. We conclude with a survey of published applications that provides explicit examples of successful simulations for several targets.
Affiliation(s)
- Davide Provasi
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
|
44
|
Past-future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics. Nat Commun 2019; 10:3573. [PMID: 31395868 PMCID: PMC6687748 DOI: 10.1038/s41467-019-11405-4] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 02/26/2019] [Accepted: 07/10/2019] [Indexed: 02/06/2023] Open
Abstract
The ability to rapidly learn from high-dimensional data to make reliable bets about the future is crucial in many contexts. This could be a fly avoiding predators, or the retina processing gigabytes of data to guide human actions. In this work we draw parallels between these and the efficient sampling of biomolecules with hundreds of thousands of atoms. For this we use the Predictive Information Bottleneck framework used for the first two problems, and re-formulate it for the sampling of biomolecules, especially when plagued with rare events. Our method uses a deep neural network to learn the minimally complex yet most predictive aspects of a given biomolecular trajectory. This information is used to perform iteratively biased simulations that enhance the sampling and directly obtain associated thermodynamic and kinetic information. We demonstrate the method on two test-pieces, studying processes slower than milliseconds, calculating free energies, kinetics and critical mutations. Efficient sampling of rare events in all-atom molecular dynamics simulations remains a challenge. Here, the authors adapt the Predictive Information Bottleneck framework to sample biomolecular structure and dynamics through iterative rounds of biased simulations and deep learning.
|
45
|
Tribello GA, Gasparotto P. Using Dimensionality Reduction to Analyze Protein Trajectories. Front Mol Biosci 2019; 6:46. [PMID: 31275943 PMCID: PMC6593086 DOI: 10.3389/fmolb.2019.00046] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 03/15/2019] [Accepted: 05/31/2019] [Indexed: 11/24/2022] Open
Abstract
In recent years the analysis of molecular dynamics trajectories using dimensionality reduction algorithms has become commonplace. These algorithms seek to find a low-dimensional representation of a trajectory that is, according to a well-defined criterion, optimal. A number of different strategies for generating projections of trajectories have been proposed but little has been done to systematically compare how these various approaches fare when it comes to analysing trajectories for biomolecules in explicit solvent. In the following paper, we have thus analyzed a molecular dynamics trajectory of the C-terminal fragment of the immunoglobulin binding domain B1 of protein G of Streptococcus modeled in explicit solvent using a range of different dimensionality reduction algorithms. We have then tried to systematically compare the projections generated using each of these algorithms by using a clustering algorithm to find the positions and extents of the basins in the high-dimensional energy landscape. We find that no algorithm outshines all the other in terms of the quality of the projection it generates. Instead, all the algorithms do a reasonable job when it comes to building a projection that separates some of the configurations that lie in different basins. Having said that, however, all the algorithms struggle to project the basins because they all have a large intrinsic dimensionality.
Affiliation(s)
- Gareth A Tribello
- Atomistic Simulation Centre, School of Mathematics and Physics, Queen's University Belfast, Belfast, United Kingdom
- Piero Gasparotto
- Department of Physics and Astronomy, Thomas Young Centre, University College London, London, United Kingdom
|
46
|
Chen W, Sidky H, Ferguson AL. Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets. J Chem Phys 2019; 150:214114. [PMID: 31176319 DOI: 10.1063/1.5092521] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Indexed: 12/14/2022] Open
Abstract
The success of enhanced sampling molecular simulations that accelerate along collective variables (CVs) is predicated on the availability of variables coincident with the slow collective motions governing the long-time conformational dynamics of a system. It is challenging to intuit these slow CVs for all but the simplest molecular systems, and their data-driven discovery directly from molecular simulation trajectories has been a central focus of the molecular simulation community to both unveil the important physical mechanisms and drive enhanced sampling. In this work, we introduce state-free reversible VAMPnets (SRV) as a deep learning architecture that learns nonlinear CV approximants to the leading slow eigenfunctions of the spectral decomposition of the transfer operator that evolves equilibrium-scaled probability distributions through time. Orthogonality of the learned CVs is naturally imposed within network training without added regularization. The CVs are inherently explicit and differentiable functions of the input coordinates making them well-suited to use in enhanced sampling calculations. We demonstrate the utility of SRVs in capturing parsimonious nonlinear representations of complex system dynamics in applications to 1D and 2D toy systems where the true eigenfunctions are exactly calculable and to molecular dynamics simulations of alanine dipeptide and the WW domain protein.
Affiliation(s)
- Wei Chen
- Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801, USA
- Hythem Sidky
- Institute for Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, USA
- Andrew L Ferguson
- Institute for Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, USA
|
47
|
Nagel D, Weber A, Lickert B, Stock G. Dynamical coring of Markov state models. J Chem Phys 2019; 150:094111. [DOI: 10.1063/1.5081767] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 01/29/2023] Open
Affiliation(s)
- Daniel Nagel
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Anna Weber
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Benjamin Lickert
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
|
48
|
Wayment-Steele HK, Pande VS. Note: Variational encoding of protein dynamics benefits from maximizing latent autocorrelation. J Chem Phys 2019; 149:216101. [PMID: 30525733 DOI: 10.1063/1.5043303] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
As deep Variational Auto-Encoder (VAE) frameworks become more widely used for modeling biomolecular simulation data, we emphasize the capability of the VAE architecture to concurrently maximize the time scale of the latent space while inferring a reduced coordinate, which assists in finding slow processes as according to the variational approach to conformational dynamics. We provide evidence that the VDE framework [Hernández et al., Phys. Rev. E 97, 062412 (2018)], which uses this autocorrelation loss along with a time-lagged reconstruction loss, obtains a variationally optimized latent coordinate in comparison with related loss functions. We thus recommend leveraging the autocorrelation of the latent space while training neural network models of biomolecular simulation data to better represent slow processes.
Affiliation(s)
- Vijay S Pande
- Department of Bioengineering, Stanford University, Stanford, California 94305, USA
|
49
|
Schöberl M, Zabaras N, Koutsourelakis PS. Predictive collective variable discovery with deep Bayesian models. J Chem Phys 2019; 150:024109. [DOI: 10.1063/1.5058063] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Markus Schöberl
- Center for Informatics and Computational Science, University of Notre Dame, 311 Cushing Hall, Notre Dame, Indiana 46556, USA
- Continuum Mechanics Group, Technical University of Munich, Boltzmannstraße 15, 85748 Garching, Germany
- Nicholas Zabaras
- Center for Informatics and Computational Science, University of Notre Dame, 311 Cushing Hall, Notre Dame, Indiana 46556, USA
|
50
|
Lemke T, Peter C. EncoderMap: Dimensionality Reduction and Generation of Molecule Conformations. J Chem Theory Comput 2019; 15:1209-1215. [DOI: 10.1021/acs.jctc.8b00975] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 01/22/2023]
Affiliation(s)
- Tobias Lemke
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Germany
- Christine Peter
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Germany
|