1. Chatterjee S, Ray D. Acceleration with Interpretability: A Surrogate Model-Based Collective Variable for Enhanced Sampling. J Chem Theory Comput 2025. [PMID: 39905595 DOI: 10.1021/acs.jctc.4c01603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2025]
Abstract
Most enhanced sampling methods facilitate the exploration of molecular free energy landscapes by applying a bias potential along a reduced dimensional collective variable (CV) space. The success of these methods depends on the ability of the CVs to follow the relevant slow modes of the system. Intuitive CVs, such as distances or contacts, often prove inadequate, particularly in biological systems involving many coupled degrees of freedom. Machine learning algorithms, especially neural networks (NN), can automate the process of CV discovery by combining a large number of molecular descriptors and often outperform intuitive CVs in sampling efficiency. However, their lack of interpretability and high cost of evaluation during trajectory propagation make NN-CVs difficult to apply to large biomolecular processes. Here, we introduce a surrogate model approach using lasso regression to express the output of a neural network as a linear combination of an automatically chosen subset of the input descriptors. We demonstrate successful applications of our surrogate model CVs in the enhanced sampling simulation of the conformational landscape of alanine dipeptide and chignolin mini-protein. In addition to providing mechanistic insights due to their explainable nature, the surrogate model CVs showed a negligible loss in efficiency and accuracy, compared to the NN-CVs, in reconstructing the underlying free energy surface. Moreover, due to their simplified functional forms, these CVs are better at extrapolating to unseen regions of the conformational space, e.g., saddle points. Surrogate model CVs are also less expensive to evaluate compared to their NN counterparts, making them suitable for enhanced sampling simulation of large and complex biomolecular processes.
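The surrogate idea described in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' implementation: `toy_nn_cv` is a hypothetical stand-in for a trained neural-network CV, the descriptors are random numbers rather than molecular features, and the lasso is solved with a plain cyclic coordinate-descent loop. The function names and the value of `alpha` are chosen only for this sketch.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, n_sweeps=100):
    """Cyclic coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of descriptor j with the residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def toy_nn_cv(descriptors):
    """Hypothetical stand-in for a trained NN-CV (assumption, not from the paper)."""
    return math.tanh(2.0 * descriptors[0] - 1.5 * descriptors[3])


def center_columns(X):
    """Subtract the column means so no intercept term is needed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def build_surrogate(n_samples=500, n_descriptors=6, alpha=0.1, seed=0):
    """Fit a sparse linear surrogate to the toy NN-CV on synthetic descriptors."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_descriptors)] for _ in range(n_samples)]
    y = [toy_nn_cv(row) for row in X]
    mean_y = sum(y) / len(y)
    weights = lasso_fit(center_columns(X), [v - mean_y for v in y], alpha)
    selected = [j for j, wj in enumerate(weights) if wj != 0.0]
    return weights, selected


if __name__ == "__main__":
    weights, selected = build_surrogate()
    print("descriptors kept by the lasso:", selected)
    print("surrogate weights:", [round(wj, 3) for wj in weights])
```

The L1 penalty drives the weights of descriptors that the toy CV does not use to exactly zero, so the surviving terms form a short, readable linear combination. In practice a library solver would likely replace the hand-written loop.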
Affiliation(s)
- Sompriya Chatterjee
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, United States
  - Materials Science Institute, University of Oregon, Eugene, Oregon 97403, United States
- Dhiman Ray
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, United States
  - Materials Science Institute, University of Oregon, Eugene, Oregon 97403, United States
2. Liu Z, Grigas AT, Sumner J, Knab E, Davis CM, O'Hern CS. Identifying the minimal sets of distance restraints for FRET-assisted protein structural modeling. Protein Sci 2024; 33:e5219. [PMID: 39548730 PMCID: PMC11568256 DOI: 10.1002/pro.5219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/26/2024] [Indexed: 11/18/2024]
Abstract
Proteins naturally occur in crowded cellular environments and interact with other proteins, nucleic acids, and organelles. Since most previous experimental protein structure determination techniques require that proteins occur in idealized, non-physiological environments, the effects of realistic cellular environments on protein structure are largely unexplored. Recently, Förster resonance energy transfer (FRET) has been shown to be an effective experimental method for investigating protein structure in vivo. Inter-residue distances measured in vivo can be incorporated as restraints in molecular dynamics (MD) simulations to model protein structural dynamics in vivo. Since most FRET studies only obtain inter-residue separations for a small number of amino acid pairs, it is important to determine the minimum number of restraints in the MD simulations that are required to achieve a given root-mean-square deviation (RMSD) from the experimental structural ensemble. Further, what is the optimal method for selecting these inter-residue restraints? Here, we implement several methods for selecting the most important FRET pairs and determine the number of pairs $N_r$ that are needed to induce conformational changes in proteins between two experimentally determined structures. We find that enforcing only a small fraction of restraints, $N_r/N \lesssim 0.08$, where $N$ is the number of amino acids, can induce the conformational changes. These results establish the efficacy of FRET-assisted MD simulations for atomic scale structural modeling of proteins in vivo.
Affiliation(s)
- Zhuoyi Liu
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Alex T. Grigas
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Jacob Sumner
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Edward Knab
  - Department of Chemistry, Yale University, New Haven, Connecticut, USA
- Corey S. O'Hern
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
  - Department of Physics, Yale University, New Haven, Connecticut, USA
  - Department of Applied Physics, Yale University, New Haven, Connecticut, USA
3. Javed R, Kapakayala AB, Nair NN. Buckets Instead of Umbrellas for Enhanced Sampling and Free Energy Calculations. J Chem Theory Comput 2024; 20:8450-8460. [PMID: 39344058 DOI: 10.1021/acs.jctc.4c00776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Umbrella sampling has been a workhorse for free energy calculations in molecular simulations for several decades. In conventional umbrella sampling, restraining bias potentials are strategically applied along one or several collective variables. Major drawbacks associated with this method are the requirement of a large number of bias windows and the poor sampling of the transverse coordinates. In this work, we propose an alternate formalism that departs from the traditional umbrella sampling to mitigate these issues, where we replace umbrella-type restraining bias potentials with bucket-type wall potentials. This modification permits one to formulate an efficient computational strategy leveraging wall potentials and metadynamics sampling. This new method, called "bucket sampling", can significantly reduce the computational cost of obtaining converged high-dimensional free energy surfaces. Extensions of the proposed method with temperature acceleration and replica exchange solute tempering are also demonstrated.
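The contrast between the two bias types named in this abstract can be illustrated with a short sketch. The abstract does not give the functional form of the wall potential, so the flat-bottomed harmonic walls below are an assumption made for illustration, and `umbrella_bias`, `bucket_bias`, and all parameter values are hypothetical names and numbers rather than the authors' definitions.

```python
def umbrella_bias(s, center, k):
    """Harmonic restraint that pulls the CV value s toward a window center."""
    return 0.5 * k * (s - center) ** 2


def bucket_bias(s, lower, upper, k):
    """Assumed wall form: zero inside [lower, upper], harmonic outside it."""
    if s < lower:
        return 0.5 * k * (s - lower) ** 2
    if s > upper:
        return 0.5 * k * (s - upper) ** 2
    return 0.0


if __name__ == "__main__":
    k = 100.0  # arbitrary force constant for the illustration
    for s in (-0.1, 0.2, 0.5, 0.8, 1.2):
        u = umbrella_bias(s, center=0.5, k=k)
        b = bucket_bias(s, lower=0.0, upper=1.0, k=k)
        print(f"s={s:5.2f}  umbrella={u:7.3f}  bucket={b:7.3f}")
```

Under this assumed form, the system feels no restraint inside the bucket, so another method such as metadynamics can drive sampling there, whereas the umbrella restraint acts everywhere except at the window center.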
Affiliation(s)
- Ramsha Javed
  - Department of Chemistry, Indian Institute of Technology Kanpur, Kanpur 208016, India
- Anji Babu Kapakayala
  - Department of Chemistry, Indian Institute of Technology Kanpur, Kanpur 208016, India
- Nisanth N Nair
  - Department of Chemistry, Indian Institute of Technology Kanpur, Kanpur 208016, India
4. Bajpai S, Petkov BK, Tong M, Abreu CRA, Nair NN, Tuckerman ME. An interoperable implementation of collective-variable based enhanced sampling methods in extended phase space within the OpenMM package. J Comput Chem 2023; 44:2166-2183. [PMID: 37464902 DOI: 10.1002/jcc.27182] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Revised: 05/30/2023] [Accepted: 06/06/2023] [Indexed: 07/20/2023]
Abstract
Collective variable (CV)-based enhanced sampling techniques are widely used today for accelerating barrier-crossing events in molecular simulations. A class of these methods, which includes temperature accelerated molecular dynamics (TAMD)/driven-adiabatic free energy dynamics (d-AFED), unified free energy dynamics (UFED), and temperature accelerated sliced sampling (TASS), uses an extended variable formalism to achieve quick exploration of conformational space. These techniques are powerful, as they enhance the sampling of a large number of CVs simultaneously compared to other techniques. Extended variables are kept at a much higher temperature than the physical temperature by ensuring adiabatic separation between the extended and physical subsystems and employing rigorous thermostatting. In this work, we present a computational platform to perform extended phase space enhanced sampling simulations using the open-source molecular dynamics engine OpenMM. The implementation allows users to have interoperability of sampling techniques, as well as employ state-of-the-art thermostats and multiple time-stepping. This work also presents protocols for determining the critical parameters and procedures for reconstructing high-dimensional free energy surfaces. As a demonstration, we present simulation results on the high dimensional conformational landscapes of the alanine tripeptide in vacuo, tetra-N-methylglycine (tetra-sarcosine) peptoid in implicit solvent, and the Trp-cage mini protein in explicit water.
Affiliation(s)
- Shitanshu Bajpai
  - Department of Chemistry, Indian Institute of Technology Kanpur (IITK), Kanpur, India
- Brian K Petkov
  - Department of Chemistry, New York University (NYU), New York, New York, USA
- Muchen Tong
  - Department of Chemistry, New York University (NYU), New York, New York, USA
- Charlles R A Abreu
  - Chemical Engineering Department, Escola de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Nisanth N Nair
  - Department of Chemistry, Indian Institute of Technology Kanpur (IITK), Kanpur, India
- Mark E Tuckerman
  - Department of Chemistry, New York University (NYU), New York, New York, USA
  - Courant Institute of Mathematical Sciences, New York University (NYU), New York, New York, USA
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China
  - Simons Center for Computational Physical Chemistry, New York University, New York, New York, USA
5. Conflitti P, Raniolo S, Limongelli V. Perspectives on Ligand/Protein Binding Kinetics Simulations: Force Fields, Machine Learning, Sampling, and User-Friendliness. J Chem Theory Comput 2023; 19:6047-6061. [PMID: 37656199 PMCID: PMC10536999 DOI: 10.1021/acs.jctc.3c00641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/14/2023] [Indexed: 09/02/2023]
Abstract
Computational techniques applied to drug discovery have gained considerable popularity for their ability to filter potentially active drugs from inactive ones, reducing the time scale and costs of preclinical investigations. The main focus of these studies has historically been the search for compounds endowed with high affinity for a specific molecular target to ensure the formation of stable and long-lasting complexes. Recent evidence has also correlated the in vivo drug efficacy with its binding kinetics, thus opening new fascinating scenarios for ligand/protein binding kinetic simulations in drug discovery. The present article examines the state of the art in the field, providing a brief summary of the most popular and advanced ligand/protein binding kinetics techniques and evaluating their current limitations and the potential solutions to reach more accurate kinetic models. Particular emphasis is put on the need for a paradigm change in the present methodologies toward ligand and protein parametrization, the force field problem, characterization of the transition states, the sampling issue, and algorithms' performance, user-friendliness, and data openness.
Affiliation(s)
- Paolo Conflitti
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
- Stefano Raniolo
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
- Vittorio Limongelli
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
  - Department of Pharmacy, University of Naples “Federico II”, 80131 Naples, Italy
6. Mendels D, Byléhn F, Sirk TW, de Pablo JJ. Systematic modification of functionality in disordered elastic networks through free energy surface tailoring. Sci Adv 2023; 9:eadf7541. [PMID: 37285442 DOI: 10.1126/sciadv.adf7541] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/14/2022] [Accepted: 05/01/2023] [Indexed: 06/09/2023]
Abstract
A combined machine learning-physics-based approach is explored for molecular and materials engineering. Specifically, collective variables, akin to those used in enhanced sampled simulations, are constructed using a machine learning model trained on data gathered from a single system. Through the constructed collective variables, it becomes possible to identify critical molecular interactions in the considered system, the modulation of which enables a systematic tailoring of the system's free energy landscape. To explore the efficacy of the proposed approach, we use it to engineer allosteric regulation and uniaxial strain fluctuations in a complex disordered elastic network. Its successful application in these two cases provides insights regarding how functionality is governed in systems characterized by extensive connectivity and points to its potential for design of complex molecular systems.
Affiliation(s)
- Dan Mendels
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 S. Ellis Avenue, Chicago, IL 60637, USA
- Fabian Byléhn
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 S. Ellis Avenue, Chicago, IL 60637, USA
- Timothy W Sirk
  - Polymers Branch, U.S. CCDC Army Research Laboratory, Aberdeen Proving Ground, MD 21005, USA
- Juan J de Pablo
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 S. Ellis Avenue, Chicago, IL 60637, USA
7. Yang W, Templeton C, Rosenberger D, Bittracher A, Nüske F, Noé F, Clementi C. Slicing and Dicing: Optimal Coarse-Grained Representation to Preserve Molecular Kinetics. ACS Cent Sci 2023; 9:186-196. [PMID: 36844497 PMCID: PMC9951291 DOI: 10.1021/acscentsci.2c01200] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/10/2022] [Indexed: 05/05/2023]
Abstract
The aim of molecular coarse-graining approaches is to recover relevant physical properties of the molecular system via a lower-resolution model that can be more efficiently simulated. Ideally, the lower resolution still accounts for the degrees of freedom necessary to recover the correct physical behavior. The selection of these degrees of freedom has often relied on the scientist's chemical and physical intuition. In this article, we make the argument that, in soft matter contexts, desirable coarse-grained models accurately reproduce the long-time dynamics of a system by correctly capturing the rare-event transitions. We propose a bottom-up coarse-graining scheme that correctly preserves the relevant slow degrees of freedom, and we test this idea for three systems of increasing complexity. We show that, in contrast to this method, existing coarse-graining schemes, such as those from information theory or structure-based approaches, are not able to recapitulate the slow time scales of the system.
Affiliation(s)
- Wangfei Yang
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Graduate Program in Systems, Synthetic and Physical Biology, Rice University, Houston, Texas 77005, United States
- Clark Templeton
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- David Rosenberger
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Andreas Bittracher
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Feliks Nüske
  - Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
- Frank Noé
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Cecilia Clementi
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Rice University, Houston, Texas 77005, United States