1
Shimono Y, Hakamada M, Mabuchi M. NPEX: Never give up protein exploration with deep reinforcement learning. J Mol Graph Model 2024; 131:108802. [PMID: 38838617 DOI: 10.1016/j.jmgm.2024.108802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/05/2024] [Accepted: 05/24/2024] [Indexed: 06/07/2024]
Abstract
Elucidating unknown structures of proteins, such as metastable states, is critical in designing therapeutic agents. Protein structure exploration has been performed using advanced computational methods, especially molecular dynamics and Markov chain Monte Carlo simulations, which require untenably long calculation times and prior structural knowledge. Here, we developed an innovative method for protein structure determination called never give up protein exploration (NPEX) with deep reinforcement learning. The NPEX method leverages the soft actor-critic algorithm and the intrinsic reward system, effectively adding a bias potential without the need for prior knowledge. To demonstrate the method's effectiveness, we applied it to four models: a double well, a triple well, the alanine dipeptide, and the tryptophan cage. Compared with Markov chain Monte Carlo simulations, NPEX had markedly greater sampling efficiency. The significantly enhanced computational efficiency and lack of prior domain knowledge requirements of the NPEX method will revolutionize protein structure exploration.
Affiliation(s)
- Yuta Shimono
  - Graduate School of Energy Science, Kyoto University, Yoshidahonmachi, Sakyo-ku, Kyoto, 606-8501, Japan
- Masataka Hakamada
  - Graduate School of Energy Science, Kyoto University, Yoshidahonmachi, Sakyo-ku, Kyoto, 606-8501, Japan
- Mamoru Mabuchi
  - Graduate School of Energy Science, Kyoto University, Yoshidahonmachi, Sakyo-ku, Kyoto, 606-8501, Japan
2
Blumer O, Reuveni S, Hirshberg B. Short-Time Infrequent Metadynamics for Improved Kinetics Inference. J Chem Theory Comput 2024; 20:3484-3491. [PMID: 38668722 PMCID: PMC11099961 DOI: 10.1021/acs.jctc.4c00170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 04/02/2024] [Accepted: 04/02/2024] [Indexed: 05/15/2024]
Abstract
Infrequent Metadynamics is a popular method to obtain the rates of long time-scale processes from accelerated simulations. The inference procedure is based on rescaling the first-passage times of the Metadynamics trajectories using a bias-dependent acceleration factor. While useful in many cases, it is limited to Poisson kinetics, and a reliable estimation of the unbiased rate requires slow bias deposition and prior knowledge of efficient collective variables. Here, we propose an improved inference scheme, which is based on two key observations: (1) the time-independent rate of Poisson processes can be estimated using short trajectories only. (2) Short trajectories experience minimal bias, and their rescaled first-passage times follow the unbiased distribution even for relatively high deposition rates and suboptimal collective variables. Therefore, by basing the inference procedure on short time scales, we obtain an improved trade-off between speedup and accuracy at no additional computational cost, especially when employing suboptimal collective variables. We demonstrate the improved inference scheme for a model system and two molecular systems.
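The rescaling step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard infrequent-metadynamics convention in which each simulated time step is weighted by exp(βV), with V the bias felt at that step. The function names, the kJ/mol units, and the censored-rate estimator are illustrative assumptions: the estimator is the textbook maximum-likelihood rate for exponentially distributed times observed only up to a cutoff, shown here to make the "short trajectories only" idea concrete, and is not claimed to be the authors' inference scheme.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K); units are an assumption


def rescaled_time(bias_values, dt, temperature):
    """Rescaled first-passage time: sum of dt * exp(beta * V_i) over the steps
    simulated before the transition, where V_i is the bias at step i."""
    beta = 1.0 / (K_B * temperature)
    return sum(dt * math.exp(beta * v) for v in bias_values)


def acceleration_factor(bias_values, dt, temperature):
    """Ratio of rescaled to simulated time, i.e. the time average of exp(beta * V)."""
    simulated = dt * len(bias_values)
    return rescaled_time(bias_values, dt, temperature) / simulated


def censored_rate(first_passage_times, t_cut):
    """Maximum-likelihood rate of a Poisson process when every trajectory is
    observed only up to t_cut (hypothetical helper, standard censored estimator)."""
    events = sum(1 for t in first_passage_times if t <= t_cut)
    exposure = sum(min(t, t_cut) for t in first_passage_times)
    return events / exposure
```

With zero bias the acceleration factor reduces to 1, so the rescaled and simulated times coincide. In `censored_rate`, trajectories that have not transitioned by `t_cut` contribute only their observed time to the denominator.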
Affiliation(s)
- Ofir Blumer
  - School of Chemistry, Tel Aviv University, Tel Aviv 6997801, Israel
- Shlomi Reuveni
  - School of Chemistry, Tel Aviv University, Tel Aviv 6997801, Israel
  - The Center for Computational Molecular and Materials Science, Tel Aviv University, Tel Aviv 6997801, Israel
  - The Center for Physics and Chemistry of Living Systems, Tel Aviv University, Tel Aviv 6997801, Israel
- Barak Hirshberg
  - School of Chemistry, Tel Aviv University, Tel Aviv 6997801, Israel
  - The Center for Computational Molecular and Materials Science, Tel Aviv University, Tel Aviv 6997801, Israel
  - The Center for Physics and Chemistry of Living Systems, Tel Aviv University, Tel Aviv 6997801, Israel
3
Ellaway JIJ, Anyango S, Nair S, Zaki HA, Nadzirin N, Powell HR, Gutmanas A, Varadi M, Velankar S. Identifying protein conformational states in the Protein Data Bank: Toward unlocking the potential of integrative dynamics studies. Structural Dynamics (Melville, N.Y.) 2024; 11:034701. [PMID: 38774441 PMCID: PMC11106648 DOI: 10.1063/4.0000251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 05/08/2024] [Indexed: 05/24/2024]
Abstract
Studying protein dynamics and conformational heterogeneity is crucial for understanding biomolecular systems and treating disease. Despite the deposition of over 215 000 macromolecular structures in the Protein Data Bank and the advent of AI-based structure prediction tools such as AlphaFold2, RoseTTAFold, and ESMFold, static representations are typically produced, which fail to fully capture macromolecular motion. Here, we discuss the importance of integrating experimental structures with computational clustering to explore the conformational landscapes that manifest protein function. We describe the method developed by the Protein Data Bank in Europe - Knowledge Base to identify distinct conformational states, demonstrate the resource's primary use cases, through examples, and discuss the need for further efforts to annotate protein conformations with functional information. Such initiatives will be crucial in unlocking the potential of protein dynamics data, expediting drug discovery research, and deepening our understanding of macromolecular mechanisms.
Affiliation(s)
- Joseph I. J. Ellaway
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Stephen Anyango
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Sreenath Nair
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Hossam A. Zaki
  - The Warren Alpert Medical School of Brown University, Providence, Rhode Island 02903, USA
- Nurul Nadzirin
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Harold R. Powell
  - Imperial College London, Department of Life Sciences, London, United Kingdom
- Aleksandras Gutmanas
  - WaveBreak Therapeutics Ltd., Clarendon House, Clarendon Road, Cambridge, United Kingdom
- Mihaly Varadi
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Sameer Velankar
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
4
Xie P, Car R, E W. Ab initio generalized Langevin equation. Proc Natl Acad Sci U S A 2024; 121:e2308668121. [PMID: 38551836 PMCID: PMC10998567 DOI: 10.1073/pnas.2308668121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 02/22/2024] [Indexed: 04/08/2024] Open
Abstract
We introduce a machine learning-based approach called ab initio generalized Langevin equation (AIGLE) to model the dynamics of slow collective variables (CVs) in materials and molecules. In this scheme, the parameters are learned from atomistic simulations based on ab initio quantum mechanical models. Force field, memory kernel, and noise generator are constructed in the context of the Mori-Zwanzig formalism, under the constraint of the fluctuation-dissipation theorem. Combined with deep potential molecular dynamics and electronic density functional theory, this approach opens the way to multiscale modeling in a variety of situations. Here, we demonstrate this capability with a study of two mesoscale processes in crystalline lead titanate, namely the field-driven dynamics of a planar ferroelectric domain wall, and the dynamics of an extensive lattice of coarse-grained electric dipoles. In the first case, AIGLE extends the reach of ab initio simulations to a regime of noise-driven motions not accessible to molecular dynamics. In the second case, AIGLE deals with an extensive set of CVs by adopting a local approximation for the memory kernel and retaining only short-range noise correlations. The scheme is computationally more efficient than molecular dynamics by several orders of magnitude and mimics the microscopic dynamics at low frequencies where it reproduces accurately the dominant far-infrared absorption frequency.
Affiliation(s)
- Pinchen Xie
  - Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08544
- Roberto Car
  - Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08544
  - Department of Chemistry and Princeton Materials Institute, Princeton University, Princeton, NJ 08544
  - Department of Physics, Princeton University, Princeton, NJ 08544
- Weinan E
  - AI for Science Institute, Beijing 100080, China
  - Center for Machine Learning Research and School of Mathematical Sciences, Peking University, Beijing 100084, China
5
Kleiman DE, Nadeem H, Shukla D. Adaptive Sampling Methods for Molecular Dynamics in the Era of Machine Learning. J Phys Chem B 2023; 127:10669-10681. [PMID: 38081185 DOI: 10.1021/acs.jpcb.3c04843] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/22/2023]
Abstract
Molecular dynamics (MD) simulations are fundamental computational tools for the study of proteins and their free energy landscapes. However, sampling protein conformational changes through MD simulations is challenging due to the relatively long time scales of these processes. Many enhanced sampling approaches have emerged to tackle this problem, including biased sampling and path-sampling methods. In this Perspective, we focus on adaptive sampling algorithms. These techniques differ from other approaches because the thermodynamic ensemble is preserved and the sampling is enhanced solely by restarting MD trajectories at particularly chosen seeds rather than introducing biasing forces. We begin our treatment with an overview of theoretically transparent methods, where we discuss principles and guidelines for adaptive sampling. Then, we present a brief summary of select methods that have been applied to realistic systems in the past. Finally, we discuss recent advances in adaptive sampling methodology powered by deep learning techniques, as well as their shortcomings.
Affiliation(s)
- Diego E Kleiman
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hassan Nadeem
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Diwakar Shukla
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
6
Liu Z. Accelerating Kinetics with Time-Reversal Path Sampling. Molecules 2023; 28:8147. [PMID: 38138635 PMCID: PMC10745403 DOI: 10.3390/molecules28248147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 12/07/2023] [Accepted: 12/13/2023] [Indexed: 12/24/2023] Open
Abstract
In comparison to numerous enhanced sampling methods for equilibrium thermodynamics, accelerating simulations for kinetics and nonequilibrium statistics are relatively rare and less effective. Here, we derive a time-reversal path sampling (tRPS) method based on time reversibility to accelerate simulations for determining the transition rates between free-energy basins. It converts the difficult uphill path sampling into an easy downhill problem. This method is easy to implement, i.e., forward and backward shooting simulations with opposite initial velocities are conducted from random initial conformations within a transition-state region until they reach the basin minima, which are then assembled to give the distribution of transition paths efficiently. The effects of tRPS are demonstrated using a comparison with direct simulations of protein folding and unfolding, where tRPS is shown to give results consistent with direct simulations and increase the efficiency by up to five orders of magnitude. This approach is generally applicable to stochastic processes with microscopic reversibility, regardless of whether the variables are continuous or discrete.
Affiliation(s)
- Zhirong Liu
  - Beijing National Laboratory for Molecular Sciences (BNLMS), College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
7
Bajpai S, Petkov BK, Tong M, Abreu CRA, Nair NN, Tuckerman ME. An interoperable implementation of collective-variable based enhanced sampling methods in extended phase space within the OpenMM package. J Comput Chem 2023; 44:2166-2183. [PMID: 37464902 DOI: 10.1002/jcc.27182] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Revised: 05/30/2023] [Accepted: 06/06/2023] [Indexed: 07/20/2023]
Abstract
Collective variable (CV)-based enhanced sampling techniques are widely used today for accelerating barrier-crossing events in molecular simulations. A class of these methods, which includes temperature accelerated molecular dynamics (TAMD)/driven-adiabatic free energy dynamics (d-AFED), unified free energy dynamics (UFED), and temperature accelerated sliced sampling (TASS), uses an extended variable formalism to achieve quick exploration of conformational space. These techniques are powerful, as they enhance the sampling of a large number of CVs simultaneously compared to other techniques. Extended variables are kept at a much higher temperature than the physical temperature by ensuring adiabatic separation between the extended and physical subsystems and employing rigorous thermostatting. In this work, we present a computational platform to perform extended phase space enhanced sampling simulations using the open-source molecular dynamics engine OpenMM. The implementation allows users to have interoperability of sampling techniques, as well as employ state-of-the-art thermostats and multiple time-stepping. This work also presents protocols for determining the critical parameters and procedures for reconstructing high-dimensional free energy surfaces. As a demonstration, we present simulation results on the high dimensional conformational landscapes of the alanine tripeptide in vacuo, tetra-N-methylglycine (tetra-sarcosine) peptoid in implicit solvent, and the Trp-cage mini protein in explicit water.
Affiliation(s)
- Shitanshu Bajpai
  - Department of Chemistry, Indian Institute of Technology Kanpur (IITK), Kanpur, India
- Brian K Petkov
  - Department of Chemistry, New York University (NYU), New York, New York, USA
- Muchen Tong
  - Department of Chemistry, New York University (NYU), New York, New York, USA
- Charlles R A Abreu
  - Chemical Engineering Department, Escola de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Nisanth N Nair
  - Department of Chemistry, Indian Institute of Technology Kanpur (IITK), Kanpur, India
- Mark E Tuckerman
  - Department of Chemistry, New York University (NYU), New York, New York, USA
  - Courant Institute of Mathematical Sciences, New York University (NYU), New York, New York, USA
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China
  - Simons Center for Computational Physical Chemistry, New York University, New York, New York, USA
8
Abstract
A survey of protein databases indicates that the majority of enzymes exist in oligomeric forms, with about half of those found in the UniProt database being homodimeric. Understanding why many enzymes are in their dimeric form is imperative. Recent developments in experimental and computational techniques have allowed for a deeper comprehension of the cooperative interactions between the subunits of dimeric enzymes. This review aims to succinctly summarize these recent advancements by providing an overview of experimental and theoretical methods, as well as an understanding of cooperativity in substrate binding and the molecular mechanisms of cooperative catalysis within homodimeric enzymes. Focus is set upon the beneficial effects of dimerization and cooperative catalysis. These advancements not only provide essential case studies and theoretical support for comprehending dimeric enzyme catalysis but also serve as a foundation for designing highly efficient catalysts, such as dimeric organic catalysts. Moreover, these developments have significant implications for drug design, as exemplified by Paxlovid, which was designed for the homodimeric main protease of SARS-CoV-2.
Affiliation(s)
- Ke-Wei Chen
  - Lab of Computional Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Tian-Yu Sun
  - Shenzhen Bay Laboratory, Shenzhen 518132, China
- Yun-Dong Wu
  - Lab of Computional Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - Shenzhen Bay Laboratory, Shenzhen 518132, China
9
Zhang Z, Liu Q, Lee CK, Hsieh CY, Chen E. An equivariant generative framework for molecular graph-structure Co-design. Chem Sci 2023; 14:8380-8392. [PMID: 37564414 PMCID: PMC10411624 DOI: 10.1039/d3sc02538a] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/19/2023] [Accepted: 07/05/2023] [Indexed: 08/12/2023] Open
Abstract
Designing molecules with desirable physiochemical properties and functionalities is a long-standing challenge in chemistry, material science, and drug discovery. Recently, machine learning-based generative models have emerged as promising approaches for de novo molecule design. However, further refinement of methodology is highly desired as most existing methods lack unified modeling of 2D topology and 3D geometry information and fail to effectively learn the structure-property relationship for molecule design. Here we present MolCode, a roto-translation equivariant generative framework for molecular graph-structure Co-design. In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure. Extensive experimental results show that MolCode outperforms previous methods on a series of challenging tasks including de novo molecule design, targeted molecule discovery, and structure-based drug design. Particularly, MolCode not only consistently generates valid (99.95% validity) and diverse (98.75% uniqueness) molecular graphs/structures with desirable properties, but also generates drug-like molecules with high affinity to target proteins (61.8% high affinity ratio), which demonstrates MolCode's potential applications in material design and drug discovery. Our extensive investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design, and provide new insights into machine learning-based molecule representation and generation.
Affiliation(s)
- Zaixi Zhang
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei, Anhui 230026, China
  - State Key Laboratory of Cognitive Intelligence, Hefei, Anhui 230088, China
- Qi Liu
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei, Anhui 230026, China
  - State Key Laboratory of Cognitive Intelligence, Hefei, Anhui 230088, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Enhong Chen
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei, Anhui 230026, China
  - State Key Laboratory of Cognitive Intelligence, Hefei, Anhui 230088, China
10
Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z, Chandak P, Liu S, Van Katwyk P, Deac A, Anandkumar A, Bergen K, Gomes CP, Ho S, Kohli P, Lasenby J, Leskovec J, Liu TY, Manrai A, Marks D, Ramsundar B, Song L, Sun J, Tang J, Veličković P, Welling M, Zhang L, Coley CW, Bengio Y, Zitnik M. Scientific discovery in the age of artificial intelligence. Nature 2023; 620:47-60. [PMID: 37532811 DOI: 10.1038/s41586-023-06221-2] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Received: 03/30/2022] [Accepted: 05/16/2023] [Indexed: 08/04/2023]
Abstract
Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.
Affiliation(s)
- Hanchen Wang
  - Department of Engineering, University of Cambridge, Cambridge, UK
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
  - Department of Research and Early Development, Genentech Inc, South San Francisco, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Tianfan Fu
  - Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Yuanqi Du
  - Department of Computer Science, Cornell University, Ithaca, NY, USA
- Wenhao Gao
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kexin Huang
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Ziming Liu
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Payal Chandak
  - Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, USA
- Shengchao Liu
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Peter Van Katwyk
  - Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA
  - Data Science Institute, Brown University, Providence, RI, USA
- Andreea Deac
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Anima Anandkumar
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
  - NVIDIA, Santa Clara, CA, USA
- Karianne Bergen
  - Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA
  - Data Science Institute, Brown University, Providence, RI, USA
- Carla P Gomes
  - Department of Computer Science, Cornell University, Ithaca, NY, USA
- Shirley Ho
  - Center for Computational Astrophysics, Flatiron Institute, New York, NY, USA
  - Department of Astrophysical Sciences, Princeton University, Princeton, NJ, USA
  - Department of Physics, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Physics and Center for Data Science, New York University, New York, NY, USA
- Joan Lasenby
  - Department of Engineering, University of Cambridge, Cambridge, UK
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Arjun Manrai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Debora Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Le Song
  - BioMap, Beijing, China
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Jimeng Sun
  - University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Jian Tang
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - HEC Montréal, Montreal, Quebec, Canada
  - CIFAR AI Chair, Toronto, Ontario, Canada
- Petar Veličković
  - Google DeepMind, London, UK
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Max Welling
  - University of Amsterdam, Amsterdam, Netherlands
  - Microsoft Research Amsterdam, Amsterdam, Netherlands
- Linfeng Zhang
  - DP Technology, Beijing, China
  - AI for Science Institute, Beijing, China
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yoshua Bengio
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
11
Direct generation of protein conformational ensembles via machine learning. Nat Commun 2023; 14:774. [PMID: 36774359 PMCID: PMC9922302 DOI: 10.1038/s41467-023-36443-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 06/18/2022] [Accepted: 02/01/2023] [Indexed: 02/13/2023] Open
Abstract
Dynamics and conformational sampling are essential for linking protein structure to biological function. While challenging to probe experimentally, computer simulations are widely used to describe protein dynamics, but at significant computational costs that continue to limit the systems that can be studied. Here, we demonstrate that machine learning can be trained with simulation data to directly generate physically realistic conformational ensembles of proteins without the need for any sampling and at negligible computational cost. As a proof-of-principle we train a generative adversarial network based on a transformer architecture with self-attention on coarse-grained simulations of intrinsically disordered peptides. The resulting model, idpGAN, can predict sequence-dependent coarse-grained ensembles for sequences that are not present in the training set demonstrating that transferability can be achieved beyond the limited training data. We also retrain idpGAN on atomistic simulation data to show that the approach can be extended in principle to higher-resolution conformational ensemble generation.
12
Kleiman DE, Shukla D. Multiagent Reinforcement Learning-Based Adaptive Sampling for Conformational Dynamics of Proteins. J Chem Theory Comput 2022; 18:5422-5434. [PMID: 36044642 DOI: 10.1021/acs.jctc.2c00683] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 11/29/2022]
Abstract
Machine learning is increasingly applied to improve the efficiency and accuracy of molecular dynamics (MD) simulations. Although the growth of distributed computer clusters has allowed researchers to obtain higher amounts of data, unbiased MD simulations have difficulty sampling rare states, even under massively parallel adaptive sampling schemes. To address this issue, several algorithms inspired by reinforcement learning (RL) have arisen to promote exploration of the slow collective variables (CVs) of complex systems. Nonetheless, most of these algorithms are not well-suited to leverage the information gained by simultaneously sampling a system from different initial states (e.g., a protein in different conformations associated with distinct functional states). To fill this gap, we propose two algorithms inspired by multiagent RL that extend the functionality of closely related techniques (REAP and TSLC) to situations where the sampling can be accelerated by learning from different regions of the energy landscape through coordinated agents. Essentially, the algorithms work by remembering which agent discovered each conformation and sharing this information with others at the action-space discretization step. A stakes function is introduced to modulate how different agents sense rewards from discovered states of the system. The consequences are three-fold: (i) agents learn to prioritize CVs using only relevant data, (ii) redundant exploration is reduced, and (iii) agents that obtain higher stakes are assigned more actions. We compare our algorithm with other adaptive sampling techniques (least counts, REAP, TSLC, and AdaptiveBandit) to show and rationalize the gain in performance.
Affiliation(s)
- Diego E Kleiman
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Diwakar Shukla
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
13
Zhu Z, Deng Z, Wang Q, Wang Y, Zhang D, Xu R, Guo L, Wen H. Simulation and Machine Learning Methods for Ion-Channel Structure Determination, Mechanistic Studies and Drug Design. Front Pharmacol 2022; 13:939555. [PMID: 35837274 PMCID: PMC9275593 DOI: 10.3389/fphar.2022.939555] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/09/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022] Open
Abstract
Ion channels are expressed in almost all living cells, controlling the in-and-out communications, making them ideal drug targets, especially for central nervous system diseases. However, owing to their dynamic nature and the presence of a membrane environment, ion channels remain difficult targets for the past decades. Recent advancement in cryo-electron microscopy and computational methods has shed light on this issue. An explosion in high-resolution ion channel structures paved way for structure-based rational drug design and the state-of-the-art simulation and machine learning techniques dramatically improved the efficiency and effectiveness of computer-aided drug design. Here we present an overview of how simulation and machine learning-based methods fundamentally changed the ion channel-related drug design at different levels, as well as the emerging trends in the field.
Affiliation(s)
- Zhengdan Zhu
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - Beijing Institute of Big Data Research, Beijing, China
- Zhenfeng Deng
  - DP Technology, Beijing, China
  - School of Pharmaceutical Sciences, Peking University, Beijing, China
- Duo Zhang
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - DP Technology, Beijing, China
- Ruihan Xu
  - DP Technology, Beijing, China
  - National Engineering Research Center of Visual Technology, Peking University, Beijing, China
- Han Wen
  - DP Technology, Beijing, China
14
Wang Y, Zhang C, Tang K, Wang X. En route for molecular dynamics simulation of a living cell. Fundamental Research 2022. [DOI: 10.1016/j.fmre.2022.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022] Open
15
Tuckerman ME. The curse of dimensionality loses its power. Nature Computational Science 2022; 2:6-7. [PMID: 38177704 DOI: 10.1038/s43588-021-00182-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/06/2024]
Affiliation(s)
- Mark E Tuckerman
  - Department of Chemistry, New York University (NYU), New York City, NY, USA
  - Courant Institute of Mathematical Sciences, New York University (NYU), New York City, NY, USA
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China