1. Wu Z, Zhou T. Structural Coarse-Graining via Multiobjective Optimization with Differentiable Simulation. J Chem Theory Comput 2024; 20:2605-2617. [PMID: 38483262 DOI: 10.1021/acs.jctc.3c01348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
In the realm of multiscale molecular simulations, structure-based coarse-graining is a prominent approach for creating efficient coarse-grained (CG) representations of soft matter systems, such as polymers. This involves optimizing CG interactions by matching static correlation functions of the corresponding degrees of freedom in all-atom (AA) models. Here, we present a versatile method, namely, differentiable coarse-graining (DiffCG), which combines multiobjective optimization and differentiable simulation. The DiffCG approach is capable of constructing robust CG models by iteratively optimizing the effective potentials to simultaneously match multiple target properties. We demonstrate our approach by concurrently optimizing bonded and nonbonded potentials of a CG model of polystyrene (PS) melts. The resulting CG-PS model effectively reproduces both the structural characteristics, such as the equilibrium probability distribution of microscopic degrees of freedom, and the thermodynamic pressure of the AA counterpart. More importantly, leveraging the multiobjective optimization capability, we develop a precise and efficient CG model for PS melts that is transferable across a wide range of temperatures, i.e., from 400 to 600 K. This is achieved by optimizing a pairwise potential with nonlinear temperature dependence in the CG model to simultaneously match target data from AA-MD simulations at multiple thermodynamic states. The temperature-transferable CG-PS model accurately predicts the radial distribution functions and density at different temperatures, including those that are not included in the target thermodynamic states. Our work opens up a promising route for developing accurate and transferable CG models of complex soft-matter systems through multiobjective optimization with differentiable simulation.
Affiliation(s)
- Zhenghao Wu
  - Department of Chemistry, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, P. R. China
- Tianhang Zhou
  - College of Carbon Neutrality Future Technology, State Key Laboratory of Heavy Oil Processing, China University of Petroleum (Beijing), Beijing 102249, P. R. China

2. Navarro C, Majewski M, De Fabritiis G. Top-Down Machine Learning of Coarse-Grained Protein Force Fields. J Chem Theory Comput 2023; 19:7518-7526. [PMID: 37874270 PMCID: PMC10777392 DOI: 10.1021/acs.jctc.3c00638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Indexed: 10/25/2023]
Abstract
Developing accurate and efficient coarse-grained representations of proteins is crucial for understanding their folding, function, and interactions over extended time scales. Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential through differentiable trajectory reweighting. Remarkably, this method requires only the native conformation of proteins, eliminating the need for labeled data derived from extensive simulations or memory-intensive end-to-end differentiable simulations. Once trained, the model can be employed to run parallel molecular dynamics simulations and sample folding events for proteins both within and beyond the training distribution, showcasing its extrapolation capabilities. By applying Markov state models, native-like conformations of the simulated proteins can be predicted from the coarse-grained simulations. Owing to its theoretical transferability and ability to use solely experimental static structures as training data, we anticipate that this approach will prove advantageous for developing new protein force fields and further advancing the study of protein dynamics, folding, and interactions.
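
As a rough illustration of the reweighting idea named in this abstract, the sketch below uses the standard thermodynamic perturbation formula, in which samples drawn under a reference potential are reweighted to estimate averages under a modified potential. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the harmonic potentials, the function names, and the Kish effective-sample-size diagnostic are all illustrative choices, and the abstract does not specify the exact weighting scheme used in the paper. In a differentiable setting these weights would additionally be differentiated with respect to the potential's parameters, which is not shown here.

```python
import numpy as np

def reweighting_weights(u_new, u_ref, kT):
    """Normalized weights w_i proportional to exp(-(U_new(x_i) - U_ref(x_i)) / kT)."""
    log_w = -(u_new - u_ref) / kT
    log_w -= log_w.max()          # shift for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

def effective_sample_size(w):
    """Kish effective sample size; equals len(w) for uniform weights."""
    return 1.0 / np.sum(w ** 2)

def reweighted_average(observable, w):
    """Estimate of <observable> under the new potential."""
    return np.sum(w * observable)

# Toy example (hypothetical): samples from U_ref = x^2 / 2 at kT = 1,
# reweighted to U_new = x^2, whose exact <x^2> is 0.5.
rng = np.random.default_rng(0)
kT = 1.0
x = rng.normal(0.0, 1.0, size=200_000)
u_ref = 0.5 * x ** 2
u_new = 1.0 * x ** 2

w = reweighting_weights(u_new, u_ref, kT)
mean_x2 = reweighted_average(x ** 2, w)
ess = effective_sample_size(w)
```

A low effective sample size signals that the reference trajectory no longer overlaps well with the modified potential, which is the usual cue for running a fresh simulation.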
Affiliation(s)
- Carles Navarro
  - Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Ltd., Devonshire House 582, Middlesex HA7 1JS, United Kingdom
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain

3. Kandathil SM, Lau AM, Jones DT. Machine learning methods for predicting protein structure from single sequences. Curr Opin Struct Biol 2023; 81:102627. [PMID: 37320955 DOI: 10.1016/j.sbi.2023.102627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 05/17/2023] [Accepted: 05/17/2023] [Indexed: 06/17/2023]
Abstract
Recent breakthroughs in protein structure prediction have increasingly relied on the use of deep neural networks. These recent methods are notable in that they produce 3-D atomic coordinates as a direct output of the networks, a feature which presents many advantages. Although most techniques of this type make use of multiple sequence alignments as their primary input, a new wave of methods has attempted to use just single sequences as the input. We discuss the make-up and operating principles of these models, and highlight new developments in these areas, as well as areas for future development.
Affiliation(s)
- Shaun M Kandathil
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Andy M Lau
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- David T Jones
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom

4. Ding Y, Yu K, Huang J. Data science techniques in biomolecular force field development. Curr Opin Struct Biol 2023; 78:102502. [PMID: 36462448 DOI: 10.1016/j.sbi.2022.102502] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/03/2022] [Revised: 10/18/2022] [Accepted: 10/25/2022] [Indexed: 12/03/2022]
Abstract
Recent advances in data science are impacting the development of classical force fields. Here we review some ideas and techniques from data science that have been used in force field development, including database construction, atom typing, and machine learning potentials. We highlight how new tools such as active learning and automatic differentiation are facilitating the generation of target data and the direct fitting with macroscopic observables. Philosophical changes on how force field models should be built and used are also discussed. It's inspiring that more accurate biomolecular force fields can be developed with the aid of data science techniques.
Affiliation(s)
- Ye Ding
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, 310024, China; Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, 310024, China
- Kuang Yu
  - Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, 518055, China
- Jing Huang
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, 310024, China; Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, 310024, China

5. Wang W, Wu Z, Dietschreit JCB, Gómez-Bombarelli R. Learning pair potentials using differentiable simulations. J Chem Phys 2023; 158:044113. [PMID: 36725529 DOI: 10.1063/5.0126475] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/05/2023] Open
Abstract
Learning pair interactions from experimental or simulation data is of great interest for molecular simulations. We propose a general stochastic method for learning pair interactions from data using differentiable simulations (DiffSim). DiffSim defines a loss function based on structural observables, such as the radial distribution function, through molecular dynamics (MD) simulations. The interaction potentials are then learned directly by stochastic gradient descent, using backpropagation to calculate the gradient of the structural loss metric with respect to the interaction potential through the MD simulation. This gradient-based method is flexible and can be configured to simulate and optimize multiple systems simultaneously. For example, it is possible to simultaneously learn potentials for different temperatures or for different compositions. We demonstrate the approach by recovering simple pair potentials, such as Lennard-Jones systems, from radial distribution functions. We find that DiffSim can be used to probe a wider functional space of pair potentials compared with traditional methods like iterative Boltzmann inversion. We show that our methods can be used to simultaneously fit potentials for simulations at different compositions and temperatures to improve the transferability of the learned potentials.
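
For context on the traditional baseline this abstract compares against, the sketch below shows one iterative Boltzmann inversion (IBI) update, U_new(r) = U(r) + kT ln(g(r) / g_target(r)). It is not DiffSim and involves no differentiable simulation. As a loudly stated assumption, the molecular dynamics run is replaced by the dilute-limit relation g(r) = exp(-U(r)/kT), which is exact only at zero density, so the toy problem converges in a single step; the function names and Lennard-Jones parameters are illustrative.

```python
import numpy as np

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def dilute_rdf(u, kT):
    """Stand-in for a simulation (assumption): g(r) = exp(-U(r)/kT)."""
    return np.exp(-u / kT)

def ibi_update(u, g_current, g_target, kT, eps=1e-12):
    """One IBI step: U_new(r) = U(r) + kT * ln(g_current(r) / g_target(r))."""
    return u + kT * np.log((g_current + eps) / (g_target + eps))

kT = 1.0
r = np.linspace(0.9, 3.0, 200)
u_target = lennard_jones(r)
g_target = dilute_rdf(u_target, kT)

u = np.zeros_like(r)              # initial guess: ideal gas, U = 0
for _ in range(3):
    g = dilute_rdf(u, kT)         # in practice: run MD with the current U
    u = ibi_update(u, g, g_target, kT)

max_error = np.max(np.abs(u - u_target))
```

In a dense liquid each update only partially corrects the potential, so the loop would call a real simulation and typically run for many iterations.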
Affiliation(s)
- Wujie Wang
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, USA
- Zhenghao Wu
  - Eduard-Zintl-Institut für Anorganische und Physikalische Chemie, Technische Universität Darmstadt, Alarich-Weiss-Str. 8, 64287 Darmstadt, Germany
- Johannes C B Dietschreit
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, USA
- Rafael Gómez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, USA

6. End-to-end differentiable blind tip reconstruction for noisy atomic force microscopy images. Sci Rep 2023; 13:129. [PMID: 36599879 DOI: 10.1038/s41598-022-27057-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Accepted: 12/23/2022] [Indexed: 01/06/2023] Open
Abstract
Observing the structural dynamics of biomolecules is vital to deepening our understanding of biomolecular functions. High-speed (HS) atomic force microscopy (AFM) is a powerful method to measure biomolecular behavior at near-physiological conditions. In AFM, measured image profiles on a molecular surface are distorted by the tip shape through the interactions between the tip and the molecule. Once the tip shape is known, AFM images can be approximately deconvolved to reconstruct the surface geometry of the sample molecule. Thus, knowing the correct tip shape is an important issue in AFM image analysis. The blind tip reconstruction (BTR) method developed by Villarrubia (J Res Natl Inst Stand Technol 102:425, 1997) is an algorithm that estimates the tip shape only from AFM images using mathematical morphology operators. While BTR works perfectly for noise-free AFM images, the algorithm is susceptible to noise. To overcome this issue, we here propose an alternative BTR method, called end-to-end differentiable BTR, based on a modern machine learning approach. In the method, we introduce a loss function including a regularization term to prevent overfitting to noise, and the tip shape is optimized with automatic differentiation and backpropagation as implemented in deep learning frameworks. Using noisy pseudo-AFM images of the myosin V motor domain as test cases, we show that our end-to-end differentiable BTR is robust against noise in AFM images. The method can also detect a double-tip shape and deconvolve doubled molecular images. Finally, application to real HS-AFM data of myosin V walking on an actin filament shows that the method can reconstruct the accurate surface geometry of actomyosin consistent with the structural model. Our method serves as a general post-processing step for reconstructing hidden molecular surfaces from any AFM images. Code is available at https://github.com/matsunagalab/differentiable_BTR.
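
The mathematical morphology operators mentioned in this abstract can be sketched in one dimension as follows. This is only an illustrative forward model, not the paper's differentiable BTR: it assumes 1-D height profiles, a hypothetical parabolic tip whose apex has height 0, and one common sign convention for grayscale dilation (image formation) and erosion (surface estimation with a known tip). No tip optimization, loss function, or regularization is shown.

```python
import numpy as np

def dilate(surface, tip):
    """Grayscale dilation: image[x] = max_u (surface[x - u] + tip[u])."""
    surface = np.asarray(surface, dtype=float)
    half = len(tip) // 2                      # tip length assumed odd
    padded = np.pad(surface, half, constant_values=-np.inf)
    image = np.full(len(surface), -np.inf)
    for j, p in enumerate(tip):
        u = j - half
        shifted = padded[half - u : half - u + len(surface)]  # surface[x - u]
        image = np.maximum(image, shifted + p)
    return image

def erode(image, tip):
    """Grayscale erosion: estimate[x] = min_u (image[x + u] - tip[u])."""
    image = np.asarray(image, dtype=float)
    half = len(tip) // 2
    padded = np.pad(image, half, constant_values=np.inf)
    estimate = np.full(len(image), np.inf)
    for j, p in enumerate(tip):
        u = j - half
        shifted = padded[half + u : half + u + len(image)]    # image[x + u]
        estimate = np.minimum(estimate, shifted - p)
    return estimate

# Hypothetical parabolic tip (apex height 0) and a single sharp spike.
offsets = np.arange(-3, 4)
tip = -0.5 * offsets.astype(float) ** 2
surface = np.zeros(21)
surface[10] = 5.0

image = dilate(surface, tip)      # spike appears broadened by the tip
estimate = erode(image, tip)      # bounded: surface <= estimate <= image
```

With this convention the eroded estimate always lies between the true surface and the measured image, which is why the result is described as an approximate deconvolution rather than an exact one.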

7.
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota–human protein–protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Yonglan Liu
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States

8. Rudden LSP, Hijazi M, Barth P. Deep learning approaches for conformational flexibility and switching properties in protein design. Front Mol Biosci 2022; 9:928534. [PMID: 36032687 PMCID: PMC9399439 DOI: 10.3389/fmolb.2022.928534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 07/15/2022] [Indexed: 11/30/2022] Open
Abstract
Following the hugely successful application of deep learning methods to protein structure prediction, an increasing number of design methods seek to leverage generative models to design proteins with improved functionality over native proteins or novel structure and function. The inherent flexibility of proteins, from side-chain motion to larger conformational reshuffling, poses a challenge to design methods, where the ideal approach must consider both the spatial and temporal evolution of proteins in the context of their functional capacity. In this review, we highlight existing methods for protein design before discussing how methods at the forefront of deep learning-based design accommodate flexibility and where the field could evolve in the future.
Affiliation(s)
- Patrick Barth
  - *Correspondence: Lucas S. P. Rudden; Patrick Barth