1
Unke OT, Stöhr M, Ganscha S, Unterthiner T, Maennel H, Kashubin S, Ahlin D, Gastegger M, Medrano Sandonas L, Berryman JT, Tkatchenko A, Müller KR. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. Sci Adv 2024; 10:eadn4397. [PMID: 38579003 DOI: 10.1126/sciadv.adn4397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Accepted: 02/29/2024] [Indexed: 04/07/2024]
Abstract
The GEMS method enables molecular dynamics simulations of large heterogeneous systems at ab initio quality.
Affiliation(s)
- Oliver T Unke
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Martin Stöhr
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Stefan Ganscha
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Thomas Unterthiner
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Hartmut Maennel
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Sergii Kashubin
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Daniel Ahlin
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Joshua T Berryman
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany

2
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
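As an illustration of the path-like CVs mentioned above, the sketch below computes the widely used path collective variables s (progress along a reference path) and z (distance from it) for a path discretized into ordered nodes. It assumes squared Euclidean distances in CV space and a user-chosen smoothing parameter `lam`; it is a generic textbook form rather than a method from this Perspective, and `path_cv` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def path_cv(x: Sequence[float], nodes: Sequence[Sequence[float]], lam: float) -> Tuple[float, float]:
    """Return (s, z) for point x relative to a path given as an ordered list of nodes.

    s runs from 0 (first node) to 1 (last node); z grows as x moves away from the path.
    """
    if len(nodes) < 2:
        raise ValueError("a path needs at least two nodes")
    # Squared Euclidean distance from x to every node (an assumed metric; RMSD is also common).
    d2 = [sum((xi - ni) ** 2 for xi, ni in zip(x, node)) for node in nodes]
    # Shift by the smallest distance so the largest weight is exp(0) = 1, avoiding underflow.
    dmin = min(d2)
    weights = [math.exp(-lam * (d - dmin)) for d in d2]
    total = sum(weights)
    # Weighted average of the node index, normalized to [0, 1].
    s = sum(i * w for i, w in enumerate(weights)) / (total * (len(nodes) - 1))
    # z = -(1/lam) * ln(sum_i exp(-lam * d_i)), rewritten with the same shift.
    z = dmin - math.log(total) / lam
    return s, z
```

For example, `path_cv((1.0,), [(0.0,), (1.0,), (2.0,)], lam=50.0)` places the point at the middle node of a three-node path, so s is 0.5.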
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Hengwei Bian
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China

3
Kehrein J, Sotriffer C. Molecular Dynamics Simulations for Rationalizing Polymer Bioconjugation Strategies: Challenges, Recent Developments, and Future Opportunities. ACS Biomater Sci Eng 2024; 10:51-74. [PMID: 37466304 DOI: 10.1021/acsbiomaterials.3c00636] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/20/2023]
Abstract
The covalent modification of proteins with polymers is a well-established method for improving the pharmacokinetic properties of therapeutically valuable biologics. The conjugated polymer chains of the resulting hybrids are highly flexible macromolecular structures. Because the dynamics of such systems remain largely elusive to the established experimental techniques of protein structure elucidation, molecular dynamics simulations have proven to be a valuable tool for studying such conjugates at the atomistic level, thereby complementing experimental studies. With a focus on new developments, this review aims to give researchers in the polymer bioconjugation field a concise, up-to-date overview of such approaches. After introducing the basic principles of molecular dynamics simulations, as well as methods for and potential pitfalls in modeling bioconjugates, the review illustrates how these computational techniques have contributed to the understanding of bioconjugates and bioconjugation strategies in the recent past and how they may lead to a more rational design of novel bioconjugates in the future.
Affiliation(s)
- Josef Kehrein
- Institute of Pharmacy and Food Chemistry, University of Würzburg, Würzburg 97074, Germany
- Christoph Sotriffer
- Institute of Pharmacy and Food Chemistry, University of Würzburg, Würzburg 97074, Germany

4
Fu H, Chipot C, Shao X, Cai W. Standard Binding Free-Energy Calculations: How Far Are We from Automation? J Phys Chem B 2023; 127:10459-10468. [PMID: 37824848 DOI: 10.1021/acs.jpcb.3c04370] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/14/2023]
Abstract
Recent success stories suggest that in silico protein-ligand binding free-energy calculations are approaching chemical accuracy. However, their widespread application remains limited by the extensive human intervention they require, which poses challenges for newcomers. It is therefore critical to develop automated workflows for estimating protein-ligand binding affinities with minimal human involvement. Key human efforts include setting up and tuning enhanced-sampling or alchemical-transformation algorithms as a preamble to computational binding free-energy estimations. Additionally, preparing input files, bookkeeping, and postprocessing are nontrivial tasks. In this Perspective, we discuss recent progress in automating standard binding free-energy calculations, featuring the development of adaptive or parameter-free algorithms, standardization of binding free-energy calculation workflows, and the implementation of user-friendly software. We also assess the current state of automated standard binding free-energy calculations and evaluate the limitations of existing methods. Last, we outline the requirements for future algorithms and workflows to facilitate automated free-energy calculations for diverse protein-ligand complexes.
Affiliation(s)
- Haohao Fu
- State Key Laboratory of Medicinal Chemical Biology, Tianjin Key Laboratory of Biosensing and Molecular Recognition, Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Christophe Chipot
- Laboratoire International Associé CNRS and University of Illinois at Urbana-Champaign, UMR no. 7019, Université de Lorraine, BP 70239, F-54506 Vandoeuvre-lès-Nancy, France
- Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801, United States
- Department of Chemistry, The University of Chicago, 5735 South Ellis Avenue, Chicago, Illinois 60637, United States
- Department of Chemistry, The University of Hawai'i at Mānoa, 2545 McCarthy Mall, Honolulu, Hawaii 96822, United States
- Xueguang Shao
- State Key Laboratory of Medicinal Chemical Biology, Tianjin Key Laboratory of Biosensing and Molecular Recognition, Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- State Key Laboratory of Medicinal Chemical Biology, Tianjin Key Laboratory of Biosensing and Molecular Recognition, Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China

5
Mustali J, Yasuda I, Hirano Y, Yasuoka K, Gautieri A, Arai N. Unsupervised deep learning for molecular dynamics simulations: a novel analysis of protein-ligand interactions in SARS-CoV-2 Mpro. RSC Adv 2023; 13:34249-34261. [PMID: 38019981 PMCID: PMC10663885 DOI: 10.1039/d3ra06375e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 11/06/2023] [Indexed: 12/01/2023]
Abstract
Molecular dynamics (MD) simulations, which are central to drug discovery, offer detailed insights into protein-ligand interactions. However, analyzing large MD datasets remains a challenge. Current machine-learning solutions are predominantly supervised and have data labelling and standardisation issues. In this study, we adopted an unsupervised deep-learning framework, previously benchmarked for rigid proteins, to study the more flexible SARS-CoV-2 main protease (Mpro). We ran MD simulations of Mpro with various ligands and refined the data by focusing on binding-site residues and time frames in stable protein conformations. The optimal descriptor chosen was the distance between the residues and the center of the binding pocket. Using this approach, a local dynamic ensemble was generated and fed into our neural network to compute Wasserstein distances across system pairs, revealing ligand-induced conformational differences in Mpro. Dimensionality reduction yielded an embedding map that correlated ligand-induced dynamics and binding affinity. Notably, the high-affinity compounds showed pronounced effects on the protein's conformations. We also identified the key residues that contributed to these differences. Our findings emphasize the potential of combining unsupervised deep learning with MD simulations to extract valuable information and accelerate drug discovery.
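The comparison described above can be illustrated without a neural network. The sketch below is a simplified stand-in, not the authors' framework: for each binding-site residue it computes the residue-to-pocket-center distance in every frame, then the one-dimensional Wasserstein-1 distance between two systems' distance distributions, which for equal-size samples reduces to the mean absolute difference of the sorted values. The input layout (a dict from residue label to per-frame coordinates), the fixed pocket center shared by both systems, and the helper names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W1 distance between two equal-size 1D empirical distributions."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch assumes equal, non-zero sample counts")
    # For equal-weight samples in 1D, the optimal transport plan matches sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def per_residue_w1(
    traj_a: Dict[str, List[Vec3]],
    traj_b: Dict[str, List[Vec3]],
    center: Vec3,
) -> Dict[str, float]:
    """Hypothetical helper: each trajectory maps a residue label to its position per frame."""
    scores = {}
    for res in sorted(traj_a.keys() & traj_b.keys()):
        # Descriptor: distance between the residue and the binding-pocket center.
        da = [math.dist(p, center) for p in traj_a[res]]
        db = [math.dist(p, center) for p in traj_b[res]]
        scores[res] = wasserstein_1d(da, db)
    return scores
```

In this simplified picture, residues with the largest scores are the candidates for the "key residues" that distinguish two ligand-bound systems.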
Affiliation(s)
- Jessica Mustali
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy
- Ikki Yasuda
- Department of Mechanical Engineering, Keio University, Japan
- Kenji Yasuoka
- Department of Mechanical Engineering, Keio University, Japan
- Alfonso Gautieri
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy
- Noriyoshi Arai
- Department of Mechanical Engineering, Keio University, Japan

6
3D Conformational Generative Models for Biological Structures Using Graph Information-Embedded Relative Coordinates. Molecules 2022; 28:molecules28010321. [PMID: 36615515 PMCID: PMC9823299 DOI: 10.3390/molecules28010321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/13/2022] [Accepted: 12/24/2022] [Indexed: 01/04/2023]
Abstract
Developing molecular generative models that directly generate 3D conformations has recently become an active research area. Here, an autoencoder-based generative model was proposed for molecular conformation generation. A unique feature of our method is the graph-information-embedded relative coordinate (GIE-RC), proposed as a novel, translation- and rotation-invariant way of encoding molecular three-dimensional structure. Compared with commonly used Cartesian and internal coordinates, GIE-RC is less sensitive to errors when latent variables are decoded to 3D coordinates. With this encoding, a complex 3D generation task can be turned into a graph node feature generation problem. Examples show that the GIE-RC-based autoencoder can be used for both ligand and peptide conformation generation. Additionally, the model was used as an efficient conformation sampling method to augment the conformation data needed to construct neural-network-based force fields.
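The abstract does not define GIE-RC itself. To illustrate only the invariance property it mentions, the sketch below expresses an atom's position in a local frame built from three reference atoms (for example, bonded neighbors taken from the molecular graph); such coordinates are unchanged by any rigid translation or rotation of the whole molecule. `local_coords` and the choice of reference atoms are hypothetical and are not the authors' encoding.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _unit(u: Vec) -> Vec:
    n = math.sqrt(_dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)


def local_coords(x: Vec, a: Vec, b: Vec, c: Vec) -> Vec:
    """Coordinates of x in a right-handed frame anchored at a and oriented by b and c."""
    e1 = _unit(_sub(b, a))                        # first axis points from a to b
    v = _sub(c, a)
    proj = tuple(_dot(v, e1) * e for e in e1)     # component of a->c along e1
    e2 = _unit(_sub(v, proj))                     # Gram-Schmidt: remove that component
    e3 = _cross(e1, e2)                           # completes the orthonormal frame
    r = _sub(x, a)
    return (_dot(r, e1), _dot(r, e2), _dot(r, e3))
```

Because every axis is built from differences of atom positions, translating the molecule cancels out, and rotating it rotates the frame along with the atom. The frame only becomes undefined when a, b, and c are collinear.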

7
Kleiman DE, Shukla D. Multiagent Reinforcement Learning-Based Adaptive Sampling for Conformational Dynamics of Proteins. J Chem Theory Comput 2022; 18:5422-5434. [PMID: 36044642 DOI: 10.1021/acs.jctc.2c00683] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 11/29/2022]
Abstract
Machine learning is increasingly applied to improve the efficiency and accuracy of molecular dynamics (MD) simulations. Although the growth of distributed computer clusters has allowed researchers to obtain higher amounts of data, unbiased MD simulations have difficulty sampling rare states, even under massively parallel adaptive sampling schemes. To address this issue, several algorithms inspired by reinforcement learning (RL) have arisen to promote exploration of the slow collective variables (CVs) of complex systems. Nonetheless, most of these algorithms are not well-suited to leverage the information gained by simultaneously sampling a system from different initial states (e.g., a protein in different conformations associated with distinct functional states). To fill this gap, we propose two algorithms inspired by multiagent RL that extend the functionality of closely related techniques (REAP and TSLC) to situations where the sampling can be accelerated by learning from different regions of the energy landscape through coordinated agents. Essentially, the algorithms work by remembering which agent discovered each conformation and sharing this information with others at the action-space discretization step. A stakes function is introduced to modulate how different agents sense rewards from discovered states of the system. The consequences are three-fold: (i) agents learn to prioritize CVs using only relevant data, (ii) redundant exploration is reduced, and (iii) agents that obtain higher stakes are assigned more actions. We compare our algorithm with other adaptive sampling techniques (least counts, REAP, TSLC, and AdaptiveBandit) to show and rationalize the gain in performance.
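To make the reward idea concrete, the sketch below scores sampled states in the style of single-agent REAP, as we understand it: a state's reward is the weighted sum of its standardized distances from the mean of each CV, and the highest-scoring states are chosen as restart points for the next round. Here the weights are held fixed, whereas REAP adapts them between rounds. The multiagent bookkeeping and the stakes function of the paper are omitted, and all function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def reap_style_rewards(cv_values: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Score each state by its weighted, standardized distance from the mean of each CV."""
    n_cv = len(weights)
    cols = [[state[i] for state in cv_values] for i in range(n_cv)]
    means = [statistics.fmean(col) for col in cols]
    # Population standard deviation; fall back to 1.0 when a CV has zero spread.
    stds = [statistics.pstdev(col) or 1.0 for col in cols]
    return [
        sum(w * abs(state[i] - means[i]) / stds[i] for i, w in enumerate(weights))
        for state in cv_values
    ]


def pick_restart_states(cv_values: Sequence[Sequence[float]], weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-reward states, to seed the next round of simulations."""
    rewards = reap_style_rewards(cv_values, weights)
    order = sorted(range(len(cv_values)), key=lambda j: rewards[j], reverse=True)
    return order[:k]
```

Large weights on a CV push restarts toward the edges of the sampled distribution along that CV, which is the exploration pressure the abstract refers to.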
Affiliation(s)
- Diego E Kleiman
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Diwakar Shukla
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States

8
Jones D, Allen JE, Yang Y, Drew Bennett WF, Gokhale M, Moshiri N, Rosing TS. Accelerators for Classical Molecular Dynamics Simulations of Biomolecules. J Chem Theory Comput 2022; 18:4047-4069. [PMID: 35710099 PMCID: PMC9281402 DOI: 10.1021/acs.jctc.1c01214] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 12/14/2022]
Abstract
Atomistic molecular dynamics (MD) simulations give researchers the ability to model biomolecular structures such as proteins, and their interactions with drug-like small molecules, at greater spatiotemporal resolution than experimental methods allow. MD simulations are notoriously expensive computational endeavors that have traditionally required massive investment in specialized hardware to reach biologically relevant spatiotemporal scales. Our goal is to summarize the fundamental algorithms employed in the literature and then highlight the challenges that have affected accelerator implementations in practice. We consider three broad categories of accelerators: Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and Application-Specific Integrated Circuits (ASICs). These categories are compared to facilitate discussion of their relative trade-offs and to give context for the current state of the art. We conclude by providing insights into the potential of emerging hardware platforms and algorithms for MD.
Affiliation(s)
- Derek Jones
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Jonathan E. Allen
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Yue Yang
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- William F. Drew Bennett
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Maya Gokhale
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Niema Moshiri
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
- Tajana S. Rosing
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States

9
Li Y, Gong H. Identifying a Feasible Transition Pathway between Two Conformational States for a Protein. J Chem Theory Comput 2022; 18:4529-4543. [PMID: 35723447 DOI: 10.1021/acs.jctc.2c00390] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
Proteins usually need to transition between different conformational states to fulfill their biological functions. In mechanistic studies of such transitions by molecular dynamics simulations, identifying the minimum free energy path (MFEP) can substantially reduce the sampling space, thus enabling rigorous thermodynamic evaluation of the process. Conventionally, the MFEP is derived by iterative local optimization from an initial path, which is typically generated by simple brute-force techniques such as targeted molecular dynamics (tMD). The quality of the initial path therefore determines the success of MFEP estimation. In this work, we propose a method to improve derivation of the initial path. Through iterative relaxation-biasing simulations run in both directions, this method can construct a feasible transition pathway connecting two known states of a protein. Evaluation on small, fast-folding proteins against long equilibrium trajectories indicates good sampling efficiency for our method. When applied to larger proteins, including the catalytic domain of human c-Src kinase and the converter domain of myosin VI, the paths generated by our method deviate significantly from those computed with the generic tMD approach. More importantly, free energy profiles and intermediate states obtained from our paths show marked improvements over those from tMD paths in both physical rationality and consistency with a priori knowledge.
Affiliation(s)
- Yao Li
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China

10
Xia D, Chen J, Fu Z, Xu T, Wang Z, Liu W, Xie HB, Peijnenburg WJGM. Potential Application of Machine-Learning-Based Quantum Chemical Methods in Environmental Chemistry. Environ Sci Technol 2022; 56:2115-2123. [PMID: 35084191 DOI: 10.1021/acs.est.1c05970] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 06/14/2023]
Abstract
Understanding the behavior and toxicology of chemical pollutants is an important topic in environmental sciences. Quantum chemical methodologies have served as useful tools for probing the behavior and toxicology of chemical pollutants over recent decades. In recent years, machine learning (ML) techniques have brought revolutionary developments to quantum chemistry, which may benefit investigations of the environmental behavior and toxicology of chemical pollutants. However, ML-based quantum chemical methods (ML-QCMs) have so far been used only scarcely in environmental chemical studies. To promote applications of these promising methods, this Perspective summarizes recent progress in ML-QCMs and focuses on their potential applications in environmental chemical studies that could hardly be achieved by conventional quantum chemical methods. Potential applications and challenges of ML-QCMs in predicting degradation networks of chemical pollutants, searching for global minima of atmospheric nanoclusters, discovering heterogeneous or photochemical transformation pathways of pollutants, and predicting environmentally relevant end points with wave functions as descriptors are introduced and discussed.
Affiliation(s)
- Deming Xia
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhiqiang Fu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tong Xu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhongyu Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Wenjia Liu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Hong-Bin Xie
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Willie J G M Peijnenburg
- Institute of Environmental Sciences (CML), Leiden University, Leiden 2300 RA, The Netherlands
- Centre for Safety of Substances and Products, National Institute of Public Health and the Environment (RIVM), Bilthoven 3720 BA, The Netherlands

11
Huang Y, Xia Y, Yang L, Wei J, Yang YI, Gao YQ. SPONGE: A GPU-Accelerated Molecular Dynamics Package with Enhanced Sampling and AI-Driven Algorithms. Chin J Chem 2022. [DOI: 10.1002/cjoc.202100456] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
Affiliation(s)
- Yu-Peng Huang
- College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
- Biomedical Pioneering Innovation Center, Peking University, Beijing 100871, China
- Yijie Xia
- College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
- Biomedical Pioneering Innovation Center, Peking University, Beijing 100871, China
- Lijiang Yang
- College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
- Biomedical Pioneering Innovation Center, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics, Peking University, Beijing 100871, China
- Jiachen Wei
- State Key Laboratory of Nonlinear Mechanics and Beijing Key Laboratory of Engineered Construction and Mechanobiology, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China
- Shenzhen Bay Laboratory, Gaoke Innovation Center, Guangqiao Road, Guangming District, Shenzhen, Guangdong 518132, China
- Yi Isaac Yang
- Shenzhen Bay Laboratory, Gaoke Innovation Center, Guangqiao Road, Guangming District, Shenzhen, Guangdong 518132, China
- Yi Qin Gao
- College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
- Biomedical Pioneering Innovation Center, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics, Peking University, Beijing 100871, China
- Shenzhen Bay Laboratory, Gaoke Innovation Center, Guangqiao Road, Guangming District, Shenzhen, Guangdong 518132, China

12
Bal KM. Reweighted Jarzynski Sampling: Acceleration of Rare Events and Free Energy Calculation with a Bias Potential Learned from Nonequilibrium Work. J Chem Theory Comput 2021; 17:6766-6774. [PMID: 34714088 DOI: 10.1021/acs.jctc.1c00574] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
We introduce a simple enhanced-sampling approach for the calculation of free energy differences and barriers along a one-dimensional reaction coordinate. First, a small number of short nonequilibrium simulations are carried out along the reaction coordinate, and the Jarzynski equality is used to learn an approximate free energy surface from the nonequilibrium work distribution. This free energy estimate is represented compactly as an artificial neural network and used as an external bias potential to accelerate rare events in a subsequent molecular dynamics simulation. The final free energy estimate is then obtained by reweighting the equilibrium probability distribution of the reaction coordinate sampled under the influence of the external bias. We apply our reweighted Jarzynski sampling recipe to four processes of varying scale and complexity, spanning a chemical reaction in the gas phase, pair association in solution, and droplet nucleation in supersaturated vapor. In all cases, we find reweighted Jarzynski sampling to be a very efficient strategy, resulting in rapid convergence of the free energy to high precision.
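The first step of the recipe rests on the Jarzynski equality, ΔF = -kT ln⟨exp(-W/kT)⟩, where W is the nonequilibrium work of one pull and the average runs over repeated pulls. The sketch below evaluates only that estimator for a single pair of end points, using a log-sum-exp shift for numerical stability; the neural-network bias, the biased MD run, and the reweighting step of the paper are not reproduced, and `jarzynski_free_energy` is a hypothetical name.

```python
import math
from typing import Sequence


def jarzynski_free_energy(work: Sequence[float], kT: float) -> float:
    """Estimate Delta F = -kT * ln < exp(-W / kT) > from nonequilibrium work samples.

    `work` and `kT` must use the same energy units.
    """
    if not work:
        raise ValueError("need at least one work value")
    exponents = [-w / kT for w in work]
    # Log-sum-exp: factor out the largest exponent so exp() neither overflows nor underflows.
    m = max(exponents)
    log_mean = m + math.log(sum(math.exp(x - m) for x in exponents) / len(exponents))
    return -kT * log_mean
```

With few pulls the estimate tends to be biased upward, because the exponential average is dominated by rare low-work trajectories; this fits the abstract's description of the first-stage result as an approximate free energy surface. By Jensen's inequality the estimate never exceeds the mean work.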
Affiliation(s)
- Kristof M Bal
- Department of Chemistry and NANOlab Center of Excellence, University of Antwerp, Universiteitsplein 1, 2610 Antwerp, Belgium

13
Das A, Rose DC, Garrahan JP, Limmer DT. Reinforcement learning of rare diffusive dynamics. J Chem Phys 2021; 155:134105. [PMID: 34624994 DOI: 10.1063/5.0057323] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/14/2022]
Abstract
We present a method to probe rare molecular dynamics trajectories directly using reinforcement learning. We consider trajectories that are conditioned to transition between regions of configuration space in finite time, such as those relevant in the study of reactive events, and trajectories exhibiting rare fluctuations of time-integrated quantities in the long time limit, such as those relevant in the calculation of large deviation functions. In both cases, reinforcement learning techniques are used to optimize an added force that minimizes the Kullback-Leibler divergence between the conditioned trajectory ensemble and a driven one. Under the optimized added force, the system evolves the rare fluctuation as a typical one, affording a variational estimate of its likelihood in the original trajectory ensemble. Low variance gradients employing value functions are proposed to increase the convergence of the optimal force. The method we develop employing these gradients leads to efficient and accurate estimates of both the optimal force and the likelihood of the rare event for a variety of model systems.
Affiliation(s)
- Avishek Das
- Department of Chemistry, University of California, Berkeley, California 94609, USA
- Dominic C Rose
- School of Physics and Astronomy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- Juan P Garrahan
- School of Physics and Astronomy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- David T Limmer
- Department of Chemistry, University of California, Berkeley, California 94609, USA
14
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
15
Unke O, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A, Müller KR. Machine Learning Force Fields. Chem Rev 2021; 121:10142-10186. [PMID: 33705118 PMCID: PMC8391964 DOI: 10.1021/acs.chemrev.0c01111] [Citation(s) in RCA: 404] [Impact Index Per Article: 134.7] [Received: 10/12/2020] [Indexed: 12/27/2022]
Abstract
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim of narrowing the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
Affiliation(s)
- Oliver T. Unke
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Huziel E. Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Kristof T. Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BIFOLD (Berlin Institute for the Foundations of Learning and Data), Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Google Research, Brain Team, Berlin, Germany
16
Schwalbe-Koda D, Tan AR, Gómez-Bombarelli R. Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks. Nat Commun 2021; 12:5104. [PMID: 34429418 PMCID: PMC8384857 DOI: 10.1038/s41467-021-25342-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/28/2021] [Accepted: 08/02/2021] [Indexed: 12/14/2022]
Abstract
Neural network (NN) interatomic potentials provide fast prediction of potential energy surfaces, closely matching the accuracy of the electronic structure methods used to produce the training data. However, NN predictions are only reliable within well-learned training domains, and show volatile behavior when extrapolating. Uncertainty quantification methods can flag atomic configurations for which prediction confidence is low, but arriving at such uncertain regions requires expensive sampling of the NN phase space, often using atomistic simulations. Here, we exploit automatic differentiation to drive atomistic systems towards high-likelihood, high-uncertainty configurations without the need for molecular dynamics simulations. By performing adversarial attacks on an uncertainty metric, informative geometries that expand the training domain of NNs are sampled. When combined with an active learning loop, this approach bootstraps and improves NN potentials while decreasing the number of calls to the ground truth method. This efficiency is demonstrated on sampling of kinetic barriers, collective variables in molecules, and supramolecular chemistry in zeolite-molecule interactions, and can be extended to any NN potential architecture and materials system.
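The core idea, gradient ascent on an uncertainty measure weighted by a Boltzmann-like factor so that the search favors configurations that are both uncertain and energetically plausible, can be illustrated with a toy one-dimensional sketch. Everything below is a hypothetical stand-in rather than the authors' code: four harmonic "models" replace a trained neural-network ensemble, the spread of predicted energies replaces the force uncertainty, the Boltzmann factor is left unnormalized, and central finite differences replace automatic differentiation.

```python
import math

# Hypothetical toy ensemble: model k predicts E_k(x) = a_k * (x - c_k) ** 2.
ENSEMBLE = [(1.0, 0.0), (1.2, 0.1), (0.8, -0.1), (1.0, 0.3)]


def mean_and_variance(values):
    """Return the mean and (population) variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def adversarial_objective(x, beta):
    """Ensemble disagreement weighted by an unnormalized Boltzmann factor."""
    energies = [a * (x - c) ** 2 for a, c in ENSEMBLE]
    mean_e, var_e = mean_and_variance(energies)
    return var_e * math.exp(-beta * mean_e)


def adversarial_attack(x0, beta=1.0, step=0.5, n_steps=200, h=1e-5):
    """Move x uphill on the objective by plain gradient ascent.

    Central finite differences approximate the gradient in this sketch;
    the published approach uses automatic differentiation instead.
    """
    x = x0
    for _ in range(n_steps):
        grad = (adversarial_objective(x + h, beta)
                - adversarial_objective(x - h, beta)) / (2 * h)
        x += step * grad
    return x
```

Starting from x = 0, where the four toy models nearly agree, the ascent moves toward a configuration where they disagree while the predicted energy stays moderate; in an active-learning loop such a point would be labeled with the reference method and added to the training set.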
Affiliation(s)
- Daniel Schwalbe-Koda
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aik Rui Tan
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Rafael Gómez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
17
Bolnykh V, Rossetti G, Rothlisberger U, Carloni P. Expanding the boundaries of ligand–target modeling by exascale calculations. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1535] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022]
Affiliation(s)
- Viacheslav Bolnykh
- Laboratory of Computational Chemistry and Biochemistry École Polytechnique Fédérale de Lausanne Lausanne Switzerland
- Computational Biomedicine, Institute of Neuroscience and Medicine (INM‐9)/Institute for Advanced Simulations (IAS‐5) Forschungszentrum Jülich Jülich Germany
- Giulia Rossetti
- Computational Biomedicine, Institute of Neuroscience and Medicine (INM‐9)/Institute for Advanced Simulations (IAS‐5) Forschungszentrum Jülich Jülich Germany
- Jülich Supercomputing Centre (JSC) Forschungszentrum Jülich Jülich Germany
- Department of Hematology, Oncology, Hemostaseology and Stem Cell Transplantation University Hospital Aachen RWTH Aachen University Aachen Germany
- Ursula Rothlisberger
- Laboratory of Computational Chemistry and Biochemistry École Polytechnique Fédérale de Lausanne Lausanne Switzerland
- Paolo Carloni
- Institute for Neuroscience and Medicine and Institute for Advanced Simulations (IAS‐5/INM‐9) “Computational Biomedicine” Forschungszentrum Jülich Jülich Germany
- JARA‐Institute INM‐11 “Molecular Neuroscience and Neuroimaging” Forschungszentrum Jülich Jülich Germany
18
Computational methods for exploring protein conformations. Biochem Soc Trans 2021; 48:1707-1724. [PMID: 32756904 PMCID: PMC7458412 DOI: 10.1042/bst20200193] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 06/02/2020] [Revised: 07/07/2020] [Accepted: 07/09/2020] [Indexed: 12/13/2022]
Abstract
Proteins are dynamic molecules that can transition between a potentially wide range of structures comprising their conformational ensemble. The nature of these conformations and their relative probabilities are described by a high-dimensional free energy landscape. While computer simulation techniques such as molecular dynamics simulations allow the metastable conformational states and the transitions between them, and thus free energy landscapes, to be characterised, the barriers between states can be high, precluding efficient sampling without substantial computational resources. Over the past decades, a dizzying array of methods has emerged for enhancing conformational sampling and for projecting the free energy landscape onto a reduced set of dimensions, known as collective variables (CVs), that allow conformational states to be distinguished and along which sampling may be directed. Here, a brief description of what biomolecular simulation entails is followed by a more detailed exposition of the nature of CVs and methods for determining these, and, lastly, an overview of the myriad different approaches for enhancing conformational sampling, most of which rely upon CVs, including new advances in both CV determination and conformational sampling due to machine learning.
19
Schlick T, Portillo-Ledesma S. Biomolecular modeling thrives in the age of technology. NATURE COMPUTATIONAL SCIENCE 2021; 1:321-331. [PMID: 34423314 PMCID: PMC8378674 DOI: 10.1038/s43588-021-00060-9] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 11/24/2020] [Accepted: 03/22/2021] [Indexed: 12/12/2022]
Abstract
The biomolecular modeling field has flourished since its early days in the 1970s due to the rapid adaptation and tailoring of state-of-the-art technology. The resulting dramatic increase in size and timespan of biomolecular simulations has outpaced Moore's law. Here, we discuss the role of knowledge-based versus physics-based methods and hardware versus software advances in propelling the field forward. This rapid adaptation and outreach suggests a bright future for modeling, where theory, experimentation and simulation define three pillars needed to address future scientific and biomedical challenges.
Affiliation(s)
- Tamar Schlick
- Department of Chemistry, New York University, New York, NY, USA
- Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- New York University–East China Normal University Center for Computational Chemistry at New York University Shanghai, Shanghai, China
20
Lee EMY, Ludwig T, Yu B, Singh AR, Gygi F, Nørskov JK, de Pablo JJ. Neural Network Sampling of the Free Energy Landscape for Nitrogen Dissociation on Ruthenium. J Phys Chem Lett 2021; 12:2954-2962. [PMID: 33729797 DOI: 10.1021/acs.jpclett.1c00195] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/12/2023]
Abstract
In heterogeneous catalysis, free energy profiles of reactions govern the mechanisms, rates, and equilibria. Energetics are conventionally computed using the harmonic approximation (HA), which requires determination of critical states a priori. Here, we use neural networks to efficiently sample and directly calculate the free energy surface (FES) of a prototypical heterogeneous catalysis reaction, the dissociation of molecular nitrogen on ruthenium, at density-functional-theory-level accuracy. We find that the vibrational entropy of surface atoms, often neglected in HA for transition metal catalysts, contributes significantly to the reaction barrier. The minimum free energy path for dissociation reveals an "on-top" adsorbed molecular state prior to the transition state. While a previously reported flat-lying molecular metastable state can be identified in the potential energy surface, it is absent in the FES at relevant reaction temperatures. These findings demonstrate the importance of identifying critical points self-consistently on the FES for reactions that involve considerable entropic effects.
Affiliation(s)
- Elizabeth M Y Lee
- Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
- Thomas Ludwig
- SUNCAT Center for Interface Science and Catalysis, Department of Chemical Engineering, Stanford University, Stanford, California 94305, United States
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Boyuan Yu
- Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
- Aayush R Singh
- SUNCAT Center for Interface Science and Catalysis, Department of Chemical Engineering, Stanford University, Stanford, California 94305, United States
- François Gygi
- Department of Computer Science, University of California, Davis, California 95616, United States
- Jens K Nørskov
- SUNCAT Center for Interface Science and Catalysis, Department of Chemical Engineering, Stanford University, Stanford, California 94305, United States
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Physics, Technical University of Denmark, Lyngby 2800, Denmark
- Juan J de Pablo
- Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
- Argonne National Laboratory, 9700 Cass Avenue, Lemont, Illinois 60439, United States
21
Zhang J, Lei YK, Zhang Z, Han X, Li M, Yang L, Yang YI, Gao YQ. Deep reinforcement learning of transition states. Phys Chem Chem Phys 2021; 23:6888-6895. [PMID: 33729229 DOI: 10.1039/d0cp06184k] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/21/2022]
Abstract
Combining reinforcement learning (RL) and molecular dynamics (MD) simulations, we propose a machine-learning approach, called RL‡, to automatically unravel chemical reaction mechanisms. In RL‡, locating the transition state of a chemical reaction is formulated as a game, and two functions are optimized, one for value estimation and the other for policy making, to iteratively improve our chance of winning this game. Both functions can be approximated by deep neural networks. By virtue of RL‡, one can directly interpret the reaction mechanism according to the value function. Meanwhile, the policy function allows efficient sampling of the transition path ensemble, which can be further used to analyze reaction dynamics and kinetics. Through multiple experiments, we show that RL‡ can be trained tabula rasa hence allowing us to reveal chemical reaction mechanisms with minimal subjective biases.
Affiliation(s)
- Jun Zhang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China.
22
Abstract
Markov chain Monte Carlo methods are a powerful tool for sampling equilibrium configurations in complex systems. One problem these methods often face is slow convergence over large energy barriers. In this work, we propose a novel method that speeds up convergence in systems composed of many metastable states. The method uses generative neural networks to connect metastable regions directly, proposing new configurations in the Markov chain, and optimizes the acceptance probability of large jumps between modes in configuration space. We provide a comprehensive theory as well as a training scheme for the network and demonstrate the method on example systems.
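The acceptance rule behind such large jumps is the Metropolis-Hastings criterion for a state-independent proposal. In this minimal sketch, a fixed two-component Gaussian mixture stands in for the trained generative network (the training that optimizes acceptance is not shown), the target is a hypothetical double-well potential U(x) = (x² - 1)², and global jumps are mixed with ordinary local random-walk moves. All names and parameter values are illustrative assumptions.

```python
import math
import random

MODES, SIGMA = (-1.0, 1.0), 0.3  # hypothetical stand-in for a generative model


def log_target(x, beta=4.0):
    """Unnormalized log-density of the double well U(x) = (x**2 - 1)**2."""
    return -beta * (x * x - 1.0) ** 2


def proposal_sample(rng):
    """Draw from an equal mixture of Gaussians centred on the two wells."""
    return rng.gauss(rng.choice(MODES), SIGMA)


def proposal_log_density(x):
    """Log-density of the mixture proposal, computed with log-sum-exp."""
    logs = [-0.5 * ((x - m) / SIGMA) ** 2 for m in MODES]
    top = max(logs)
    norm = len(MODES) * SIGMA * math.sqrt(2.0 * math.pi)
    return top + math.log(sum(math.exp(l - top) for l in logs)) - math.log(norm)


def mode_jump_mcmc(n_steps, p_jump=0.1, local_step=0.1, seed=0):
    """Metropolis-Hastings chain mixing local moves with global jumps."""
    rng = random.Random(seed)
    x, samples, accepted_jumps = 1.0, [], 0
    for _ in range(n_steps):
        is_jump = rng.random() < p_jump
        if is_jump:
            # Independence proposal: the ratio q(x) / q(y) enters the acceptance.
            y = proposal_sample(rng)
            log_a = (log_target(y) - log_target(x)
                     + proposal_log_density(x) - proposal_log_density(y))
        else:
            # Symmetric random-walk move: the proposal densities cancel.
            y = x + rng.gauss(0.0, local_step)
            log_a = log_target(y) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_a)):
            x = y
            if is_jump:
                accepted_jumps += 1
        samples.append(x)
    return samples, accepted_jumps
```

Because the jump proposal does not depend on the current state, the ratio q(x)/q(y) must appear in the acceptance probability; dropping it would bias the chain toward regions the proposal over-represents. Choosing between the two move types with a fixed probability keeps detailed balance, since each kernel satisfies it on its own.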
Affiliation(s)
- Luigi Sbailò
- Freie Universität Berlin, Department of Mathematics and Computer Science, Arnimallee 6, 14195 Berlin, Germany
- Manuel Dibak
- Freie Universität Berlin, Department of Mathematics and Computer Science, Arnimallee 6, 14195 Berlin, Germany
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, Arnimallee 6, 14195 Berlin, Germany
23
Ramanathan A, Ma H, Parvatikar A, Chennubhotla SC. Artificial intelligence techniques for integrative structural biology of intrinsically disordered proteins. Curr Opin Struct Biol 2021; 66:216-224. [PMID: 33421906 DOI: 10.1016/j.sbi.2020.12.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 09/30/2020] [Revised: 12/01/2020] [Accepted: 12/03/2020] [Indexed: 12/16/2022]
Abstract
We outline recent developments in artificial intelligence (AI) and machine learning (ML) techniques for integrative structural biology of intrinsically disordered protein (IDP) ensembles. IDPs challenge the traditional protein structure-function paradigm by adapting their conformations in response to specific binding partners, which allows them to mediate diverse and often complex cellular functions such as biological signaling, self-organization, and compartmentalization. Obtaining mechanistic insights into their function can therefore be challenging for traditional structure determination techniques. Often, scientists have to rely on piecemeal evidence drawn from diverse experimental techniques to characterize their functional mechanisms. Multiscale simulations can help bridge critical knowledge gaps about IDP structure-function relationships; however, these techniques also face challenges in resolving emergent phenomena within IDP conformational ensembles. We posit that scalable statistical inference techniques can effectively integrate information gleaned from multiple experimental techniques as well as from simulations, thus providing access to atomistic details of these emergent phenomena.
Affiliation(s)
- Arvind Ramanathan
- Data Science & Learning Division, Argonne National Laboratory, Lemont, IL 60439, United States; Consortium for Advanced Science and Engineering (CASE), University of Chicago, Hyde Park, IL, United States.
- Heng Ma
- Data Science & Learning Division, Argonne National Laboratory, Lemont, IL 60439, United States
- Akash Parvatikar
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, United States
- S Chakra Chennubhotla
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, United States
24
25
Zhang J, Lei YK, Yang YI, Gao YQ. Deep learning for variational multiscale molecular modeling. J Chem Phys 2020; 153:174115. [DOI: 10.1063/5.0026836] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]
Affiliation(s)
- Jun Zhang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Yao-Kun Lei
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Yi Isaac Yang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Yi Qin Gao
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Beijing Advanced Innovation Center for Genomics, Peking University, 100871 Beijing, China
- Biomedical Pioneering Innovation Center, Peking University, 100871 Beijing, China
26
Kanwar G, Albergo MS, Boyda D, Cranmer K, Hackett DC, Racanière S, Rezende DJ, Shanahan PE. Equivariant Flow-Based Sampling for Lattice Gauge Theory. PHYSICAL REVIEW LETTERS 2020; 125:121601. [PMID: 33016765 DOI: 10.1103/physrevlett.125.121601] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 04/01/2020] [Revised: 08/14/2020] [Accepted: 08/24/2020] [Indexed: 05/24/2023]
Abstract
We define a class of machine-learned flow-based sampling algorithms for lattice gauge theories that are gauge invariant by construction. We demonstrate the application of this framework to U(1) gauge theory in two spacetime dimensions, and find that, at small bare coupling, the approach is orders of magnitude more efficient at sampling topological quantities than more traditional sampling procedures such as hybrid Monte Carlo and heat bath.
Affiliation(s)
- Gurtej Kanwar
- Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Michael S Albergo
- Center for Cosmology and Particle Physics, New York University, New York, New York 10003, USA
- Denis Boyda
- Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Kyle Cranmer
- Center for Cosmology and Particle Physics, New York University, New York, New York 10003, USA
- Daniel C Hackett
- Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Sébastien Racanière
- DeepMind Technologies Limited, 5 New Street Square, London EC4A 3TW, United Kingdom
- Phiala E Shanahan
- Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
27
Zhang J, Lei YK, Zhang Z, Chang J, Li M, Han X, Yang L, Yang YI, Gao YQ. A Perspective on Deep Learning for Molecular Modeling and Simulations. J Phys Chem B 2020. [DOI: 10.1021/acs.jpcb.0c04473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
28
Zhang J, Lei YK, Zhang Z, Chang J, Li M, Han X, Yang L, Yang YI, Gao YQ. A Perspective on Deep Learning for Molecular Modeling and Simulations. J Phys Chem A 2020; 124:6745-6763. [PMID: 32786668 DOI: 10.1021/acs.jpca.0c04473] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/14/2022]
Abstract
Deep learning is transforming many areas in science, and it has great potential in modeling molecular systems. However, unlike the mature deployment of deep learning in computer vision and natural language processing, its development in molecular modeling and simulations is still at an early stage, largely because the inductive biases of molecules are completely different from those of images or texts. Building on these differences, we first review the limitations of traditional deep learning models from the perspective of molecular physics and summarize relevant technical advances at the interface between molecular modeling and deep learning. We do not focus merely on ever more complex neural network models; instead, we introduce various useful concepts and ideas brought by modern deep learning. We hope that translating these ideas into molecular modeling will create new opportunities. For this purpose, we summarize several representative applications, ranging from supervised to unsupervised and reinforcement learning, and discuss their connections with emerging trends in deep learning. Finally, we give an outlook on promising directions that may help address the existing issues in the current framework of deep molecular modeling.
Affiliation(s)
- Jun Zhang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Yao-Kun Lei
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Zhen Zhang
- Department of Physics, Tangshan Normal University, 063000 Tangshan, China
- Junhan Chang
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Maodong Li
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Xu Han
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Lijiang Yang
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Yi Isaac Yang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Yi Qin Gao
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Beijing Advanced Innovation Center for Genomics, Peking University, 100871 Beijing, China
- Biomedical Pioneering Innovation Center, Peking University, 100871 Beijing, China
29
Saravanan KM, Zhang H, Zhang H, Xi W, Wei Y. On the Conformational Dynamics of β-Amyloid Forming Peptides: A Computational Perspective. Front Bioeng Biotechnol 2020; 8:532. [PMID: 32656188 PMCID: PMC7325929 DOI: 10.3389/fbioe.2020.00532] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/15/2019] [Accepted: 05/04/2020] [Indexed: 12/12/2022]
Abstract
Understanding the conformational dynamics of proteins and peptides involved in important functions is still a difficult task in computational structural biology. Because such conformational transitions in β-amyloid (Aβ) forming peptides play a crucial role in many neurological disorders, researchers from different scientific fields have been trying to address issues related to the folding of Aβ forming peptides together. Many theoretical models have been proposed in the recent years for studying Aβ peptides using mathematical, physicochemical, and molecular dynamics simulation, and machine learning approaches. In this article, we have comprehensively reviewed the developmental advances in the theoretical models for Aβ peptide folding and interactions, particularly in the context of neurological disorders. Furthermore, we have extensively reviewed the advances in molecular dynamics simulation as a tool used for studying the conversions between polymorphic amyloid forms and applications of using machine learning approaches in predicting Aβ peptides and aggregation-prone regions in proteins. We have also provided details on the theoretical advances in the study of Aβ peptides, which would enhance our understanding of these peptides at the molecular level and eventually lead to the development of targeted therapies for certain acute neurological disorders such as Alzheimer's disease in the future.
Affiliation(s)
- Wenhui Xi
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
30
Abstract
Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.
Affiliation(s)
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Alexandre Tkatchenko
- Physics and Materials Science Research Unit, University of Luxembourg, 1511 Luxembourg, Luxembourg
- Klaus-Robert Müller
- Department of Computer Science, Technical University Berlin, 10587 Berlin, Germany
- Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
- Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, South Korea
- Cecilia Clementi
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Physics, Rice University, Houston, Texas 77005, USA
31
Affiliation(s)
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Department of Physics, Freie Universität Berlin, Berlin, Germany
- Edina Rosta
- Department of Chemistry, King's College London, London, England