1
Pelaez RP, Simeon G, Galvelis R, Mirarchi A, Eastman P, Doerr S, Thölke P, Markland TE, De Fabritiis G. TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations. J Chem Theory Comput 2024; 20:4076-4087. [PMID: 38743033 DOI: 10.1021/acs.jctc.4c00253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency: the computation of energy and forces for TensorNet models is accelerated by 2× to 10× over previous, nonoptimized iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
Affiliation(s)
- Raul P Pelaez
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Guillem Simeon
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Antonio Mirarchi
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Peter Eastman
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Stefan Doerr
  - Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Thomas E Markland
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
2
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. Advanced Materials (Deerfield Beach, Fla.) 2024:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equation and the increasing computational cost with the size of the molecular system. In response, there has been a surge of interest in leveraging artificial intelligence (AI) and machine learning (ML) techniques for in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. Starting from models trained entirely on numerical data, a journey is set forth toward the ideal model incorporating or learning the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Sergio Pablo-García
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Shi Xuan Leong
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Ella Miray Rajaonson
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Luca Thiede
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Gary Tom
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Andrew Wang
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Davide Avagliano
  - Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
  - Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
  - Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
3
Pelaez RP, Simeon G, Galvelis R, Mirarchi A, Eastman P, Doerr S, Thölke P, Markland TE, De Fabritiis G. TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations. arXiv 2024:arXiv:2402.17660v3. [PMID: 38463504 PMCID: PMC10925388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency: the computation of energy and forces for TensorNet models is accelerated by 2× to 10× over previous, non-optimized iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
Affiliation(s)
- Raul P Pelaez
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Guillem Simeon
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Antonio Mirarchi
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Peter Eastman
  - Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Stefan Doerr
  - Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Thomas E Markland
  - Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
4
Duignan TT. The Potential of Neural Network Potentials. ACS Physical Chemistry Au 2024; 4:232-241. [PMID: 38800721 PMCID: PMC11117678 DOI: 10.1021/acsphyschemau.4c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/04/2024] [Accepted: 03/05/2024] [Indexed: 05/29/2024]
Abstract
In the next half-century, physical chemistry will likely undergo a profound transformation, driven predominantly by the combination of recent advances in quantum chemistry and machine learning (ML). Specifically, equivariant neural network potentials (NNPs) are a breakthrough new tool that is already enabling us to simulate systems at the molecular scale with unprecedented accuracy and speed, relying on nothing but fundamental physical laws. The continued development of this approach will realize Paul Dirac's 80-year-old vision of using quantum mechanics to unify physics with chemistry and will provide invaluable tools for understanding materials science, biology, earth sciences, and beyond. The era of highly accurate and efficient first-principles molecular simulations will provide a wealth of training data that can be used to build automated computational methodologies, using tools such as diffusion models, for the design and optimization of systems at the molecular scale. Large language models (LLMs) will also evolve into increasingly indispensable tools for literature review, coding, idea generation, and scientific writing.
5
Doga H, Raubenolt B, Cumbo F, Joshi J, DiFilippo FP, Qin J, Blankenberg D, Shehab O. A Perspective on Protein Structure Prediction Using Quantum Computers. J Chem Theory Comput 2024; 20:3359-3378. [PMID: 38703105 PMCID: PMC11099973 DOI: 10.1021/acs.jctc.4c00067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 04/19/2024] [Accepted: 04/22/2024] [Indexed: 05/06/2024]
Abstract
Despite the recent advancements by deep learning methods such as AlphaFold2, in silico protein structure prediction remains a challenging problem in biomedical research. With the rapid evolution of quantum computing, it is natural to ask whether quantum computers can offer some meaningful benefits for approaching this problem. Yet, identifying specific problem instances amenable to quantum advantage and estimating the quantum resources required are equally challenging tasks. Here, we share our perspective on how to create a framework for systematically selecting protein structure prediction problems that are amenable to quantum advantage, and estimate quantum resources for such problems on a utility-scale quantum computer. As a proof of concept, we validate our problem selection framework by accurately predicting the structure of a catalytic loop of the Zika Virus NS3 Helicase on quantum hardware.
Affiliation(s)
- Hakan Doga
  - IBM Quantum, Almaden Research Center, San Jose, California 95120, United States
- Bryan Raubenolt
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Fabio Cumbo
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Jayadev Joshi
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Frank P. DiFilippo
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Jun Qin
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Daniel Blankenberg
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Omar Shehab
  - IBM Quantum, IBM Thomas J Watson Research Center, Yorktown Heights, New York 10598, United States
6
Frasnetti E, Magni A, Castelli M, Serapian SA, Moroni E, Colombo G. Structures, dynamics, complexes, and functions: From classic computation to artificial intelligence. Curr Opin Struct Biol 2024; 87:102835. [PMID: 38744148 DOI: 10.1016/j.sbi.2024.102835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/14/2024] [Accepted: 04/22/2024] [Indexed: 05/16/2024]
Abstract
Computational approaches can provide highly detailed insight into the molecular recognition processes that underlie drug binding, the assembly of protein complexes, and the regulation of biological functional processes. Classical simulation methods can bridge a wide range of length- and time-scales typically involved in such processes. Lately, automated learning and artificial intelligence methods have shown the potential to expand the reach of physics-based approaches, ushering in the possibility to model and even design complex protein architectures. The synergy between atomistic simulations and AI methods is an emerging frontier with a huge potential for advances in structural biology. Herein, we explore various examples and frameworks for these approaches, providing select instances and applications that illustrate their impact on fundamental biomolecular problems.
Affiliation(s)
- Elena Frasnetti
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Andrea Magni
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Matteo Castelli
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Stefano A Serapian
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Giorgio Colombo
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
7
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. PLoS Comput Biol 2024; 20:e1012144. [PMID: 38781245 PMCID: PMC11152266 DOI: 10.1371/journal.pcbi.1012144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 06/05/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
8
Walter LJ, Quoika PK, Zacharias M. Structure-Based Protein Assembly Simulations Including Various Binding Sites and Conformations. J Chem Inf Model 2024; 64:3465-3476. [PMID: 38602938 DOI: 10.1021/acs.jcim.4c00212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
Many biological functions are mediated by large complexes formed by multiple proteins and other cellular macromolecules. Recent progress in experimental structure determination, as well as in integrative modeling and protein structure prediction using deep learning approaches, has resulted in a rapid increase in the number of solved multiprotein assemblies. However, the assembly process of large complexes from their components is much less well-studied. We introduce a rapid computational structure-based (SB) model, GoCa, that allows one to follow the assembly process of large multiprotein complexes based on a known native structure. Beyond existing SB Gō-type models, it distinguishes between intra- and intersubunit interactions, allowing us to include coupled folding and binding. It accounts automatically for the permutation of identical subunits in a complex and allows the definition of multiple minima (native) structures in the case of proteins that undergo global transitions during assembly. The model is successfully tested on several multiprotein complexes. The source code of the GoCa program including a tutorial is publicly available on GitHub: https://github.com/ZachariasLab/GoCa. We also provide a web source that allows users to quickly generate the necessary input files for a GoCa simulation: https://goca.t38webservices.nat.tum.de.
Affiliation(s)
- Luis J Walter
  - Center for Functional Protein Assemblies, Technical University of Munich, Ernst-Otto-Fischer-Str. 8, Garching 85748, Germany
- Patrick K Quoika
  - Center for Functional Protein Assemblies, Technical University of Munich, Ernst-Otto-Fischer-Str. 8, Garching 85748, Germany
- Martin Zacharias
  - Center for Functional Protein Assemblies, Technical University of Munich, Ernst-Otto-Fischer-Str. 8, Garching 85748, Germany
9
Nie Y, Zheng Z, Li C, Zhan H, Kou L, Gu Y, Lü C. Resolving the dynamic properties of entangled linear polymers in non-equilibrium coarse grain simulation with a priori scaling factors. Nanoscale 2024. [PMID: 38494916 DOI: 10.1039/d3nr06185j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
The molecular weight of polymers can influence the material properties, but simulating polymers at experimentally relevant molecular weights can be a heavy burden for property prediction with full-atomic simulations. The traditional bottom-up coarse grain (CG) simulation can reduce the computational cost. However, the dynamic properties predicted by the CG simulation can deviate from the full-atomic simulation result. Usually, in CG simulations, the diffusion is faster and the viscosity and modulus are much lower. The fast dynamics in CG are usually corrected by a posteriori scaling of time or temperature, or by potential modifications, which usually have poor transferability to other non-fitted physical properties because of a lack of fundamental physics. In this work, a priori scaling factors were calculated from the loss of degrees of freedom and implemented in the iterative Boltzmann inversion. According to the simulation results on 3 different CG levels at different temperatures and loading rates, such a priori scaling factors can help in reproducing some dynamic properties of polycaprolactone in CG simulation more accurately, such as heat capacity, Young's modulus, and viscosity, while maintaining the accuracy in the structural distribution prediction. The transferability of entropy-enthalpy compensation and a dissipative particle dynamics thermostat is also presented for comparison. The proposed method reveals the huge potential for developing customized CG thermostats and offers a simple way to rebuild multiphysics CG models for polymers with good transferability.
Affiliation(s)
- Yihan Nie
  - College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Zhuoqun Zheng
  - School of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Chengkai Li
  - School of Materials Science and Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
- Haifei Zhan
  - College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
  - School of Mechanical, Medical and Process Engineering, Queensland University of Technology (QUT), Brisbane QLD 4001, Australia
  - Center for Materials Science, Queensland University of Technology (QUT), Brisbane QLD 4001, Australia
- Liangzhi Kou
  - School of Mechanical, Medical and Process Engineering, Queensland University of Technology (QUT), Brisbane QLD 4001, Australia
  - Center for Materials Science, Queensland University of Technology (QUT), Brisbane QLD 4001, Australia
- Yuantong Gu
  - School of Mechanical, Medical and Process Engineering, Queensland University of Technology (QUT), Brisbane QLD 4001, Australia
  - Center for Materials Science, Queensland University of Technology (QUT), Brisbane QLD 4001, Australia
- Chaofeng Lü
  - Faculty of Mechanical Engineering & Mechanics, Ningbo University, Ningbo 315211, China
  - College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
10
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. bioRxiv: The Preprint Server for Biology 2024:2024.02.08.579522. [PMID: 38370653 PMCID: PMC10871340 DOI: 10.1101/2024.02.08.579522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
11
McCafferty CL, Klumpe S, Amaro RE, Kukulski W, Collinson L, Engel BD. Integrating cellular electron microscopy with multimodal data to explore biology across space and time. Cell 2024; 187:563-584. [PMID: 38306982 DOI: 10.1016/j.cell.2024.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 01/03/2024] [Accepted: 01/03/2024] [Indexed: 02/04/2024]
Abstract
Biology spans a continuum of length and time scales. Individual experimental methods only glimpse discrete pieces of this spectrum but can be combined to construct a more holistic view. In this Review, we detail the latest advancements in volume electron microscopy (vEM) and cryo-electron tomography (cryo-ET), which together can visualize biological complexity across scales from the organization of cells in large tissues to the molecular details inside native cellular environments. In addition, we discuss emerging methodologies for integrating three-dimensional electron microscopy (3DEM) imaging with multimodal data, including fluorescence microscopy, mass spectrometry, single-particle analysis, and AI-based structure prediction. This multifaceted approach fills gaps in the biological continuum, providing functional context, spatial organization, molecular identity, and native interactions. We conclude with a perspective on incorporating diverse data into computational simulations that further bridge and extend length scales while integrating the dimension of time.
Affiliation(s)
- Sven Klumpe
  - Research Group CryoEM Technology, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany.
- Rommie E Amaro
  - Department of Molecular Biology, University of California, San Diego, La Jolla, CA 92093, USA.
- Wanda Kukulski
  - Institute of Biochemistry and Molecular Medicine, University of Bern, Bühlstrasse 28, 3012 Bern, Switzerland.
- Lucy Collinson
  - Electron Microscopy Science Technology Platform, Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK.
- Benjamin D Engel
  - Biozentrum, University of Basel, Spitalstrasse 41, 4056 Basel, Switzerland.
12
Bačić Toplek F, Scalone E, Stegani B, Paissoni C, Capelli R, Camilloni C. Multi-eGO: Model Improvements toward the Study of Complex Self-Assembly Processes. J Chem Theory Comput 2024; 20:459-468. [PMID: 38153340 PMCID: PMC10782439 DOI: 10.1021/acs.jctc.3c01182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 12/29/2023]
Abstract
Structure-based models have been instrumental in simulating protein folding and suggesting hypotheses about the mechanisms involved. Nowadays, at least for fast-folding proteins, folding can be simulated in explicit solvent using classical molecular dynamics. However, other self-assembly processes, such as protein aggregation, are still far from being accessible. Recently, we proposed that a hybrid multistate structure-based model, multi-eGO, could help to bridge the gap toward the simulation of out-of-equilibrium, concentration-dependent self-assembly processes. Here, we further improve the model and show how multi-eGO can effectively and accurately learn the conformational ensemble of the amyloid β42 intrinsically disordered peptide, reproduce the well-established folding mechanism of the B1 immunoglobulin-binding domain of streptococcal protein G, and reproduce the aggregation as a function of the concentration of the transthyretin 105-115 amyloidogenic peptide. We envision that by learning from the dynamics of a few minima, multi-eGO can become a platform for simulating processes inaccessible to other simulation techniques.
Collapse
Affiliation(s)
- Fran Bačić Toplek
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
| | - Emanuele Scalone
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Department of Chemistry, Dartmouth College, Hanover, New Hampshire 03755, United States
| | - Bruno Stegani
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
| | - Cristina Paissoni
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
| | - Riccardo Capelli
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
| | - Carlo Camilloni
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
| |
|
13
|
Airas J, Ding X, Zhang B. Transferable Implicit Solvation via Contrastive Learning of Graph Neural Networks. ACS Cent Sci 2023; 9:2286-2297. [PMID: 38161379 PMCID: PMC10755853 DOI: 10.1021/acscentsci.3c01160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/26/2023] [Accepted: 10/31/2023] [Indexed: 01/03/2024]
Abstract
Implicit solvent models are essential for molecular dynamics simulations of biomolecules, striking a balance between computational efficiency and biological realism. Efforts are underway to develop accurate and transferable implicit solvent models and coarse-grained (CG) force fields in general, guided by a bottom-up approach that matches the CG energy function with the potential of mean force (PMF) defined by the finer system. However, practical challenges arise due to the lack of analytical expressions for the PMF and algorithmic limitations in parameterizing CG force fields. To address these challenges, a machine learning-based approach is proposed, utilizing graph neural networks (GNNs) to represent the solvation free energy and potential contrasting for parameter optimization. We demonstrate the effectiveness of the approach by deriving a transferable GNN implicit solvent model using 600,000 atomistic configurations of six proteins obtained from explicit solvent simulations. The GNN model provides solvation free energy estimations much more accurately than state-of-the-art implicit solvent models, reproducing configurational distributions of explicit solvent simulations. We also demonstrate the reasonable transferability of the GNN model outside of the training data. Our study offers valuable insights for deriving systematically improvable implicit solvent models and CG force fields from a bottom-up perspective.
Affiliation(s)
- Justin Airas
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
| | - Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
| | - Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
| |
|
14
|
Jones MS, Shmilovich K, Ferguson AL. DiAMoNDBack: Diffusion-Denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces. J Chem Theory Comput 2023; 19:7908-7923. [PMID: 37906711 DOI: 10.1021/acs.jctc.3c00840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side-chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side-chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side-chain clashes, and the diversity of the generated side-chain configurational states. We make the DiAMoNDBack model publicly available as a free and open-source Python package.
Affiliation(s)
- Michael S Jones
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| |
|
15
|
Navarro C, Majewski M, De Fabritiis G. Top-Down Machine Learning of Coarse-Grained Protein Force Fields. J Chem Theory Comput 2023; 19:7518-7526. [PMID: 37874270 PMCID: PMC10777392 DOI: 10.1021/acs.jctc.3c00638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Indexed: 10/25/2023]
Abstract
Developing accurate and efficient coarse-grained representations of proteins is crucial for understanding their folding, function, and interactions over extended time scales. Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential through differentiable trajectory reweighting. Remarkably, this method requires only the native conformation of proteins, eliminating the need for labeled data derived from extensive simulations or memory-intensive end-to-end differentiable simulations. Once trained, the model can be employed to run parallel molecular dynamics simulations and sample folding events for proteins both within and beyond the training distribution, showcasing its extrapolation capabilities. By applying Markov state models, native-like conformations of the simulated proteins can be predicted from the coarse-grained simulations. Owing to its theoretical transferability and ability to use solely experimental static structures as training data, we anticipate that this approach will prove advantageous for developing new protein force fields and further advancing the study of protein dynamics, folding, and interactions.
Affiliation(s)
- Carles Navarro
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
| | | | - Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Ltd., Devonshire House 582, Middlesex HA7 1JS, United Kingdom
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
| |
|
16
|
Borges-Araújo L, Patmanidis I, Singh AP, Santos LHS, Sieradzan AK, Vanni S, Czaplewski C, Pantano S, Shinoda W, Monticelli L, Liwo A, Marrink SJ, Souza PCT. Pragmatic Coarse-Graining of Proteins: Models and Applications. J Chem Theory Comput 2023; 19:7112-7135. [PMID: 37788237 DOI: 10.1021/acs.jctc.3c00733] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 10/05/2023]
Abstract
The molecular details involved in the folding, dynamics, organization, and interaction of proteins with other molecules are often difficult to assess by experimental techniques. Consequently, computational models play an ever-increasing role in the field. However, biological processes involving large-scale protein assemblies or long time scale dynamics are still computationally expensive to study in atomistic detail. For these applications, employing coarse-grained (CG) modeling approaches has become a key strategy. In this Review, we provide an overview of what we call pragmatic CG protein models, which are strategies combining, at least in part, a physics-based implementation and a top-down experimental approach to their parametrization. In particular, we focus on CG models in which most protein residues are represented by at least two beads, allowing these models to retain some degree of chemical specificity. A description of the main modern pragmatic protein CG models is provided, including a review of the most recent applications and an outlook on future perspectives in the field.
Affiliation(s)
- Luís Borges-Araújo
- Molecular Microbiology and Structural Biochemistry (MMSB, UMR 5086), CNRS, University of Lyon, 7 Passage du Vercors, 69007 Lyon, France
| | - Ilias Patmanidis
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Groningen Biomolecular Sciences and Biotechnology Institute and Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
| | - Akhil P Singh
- Department of Biology, University of Fribourg, Chemin du Musée 10, Fribourg CH-1700, Switzerland
| | - Lucianna H S Santos
- Biomolecular Simulations Group, Institut Pasteur de Montevideo, Montevideo 11400, Uruguay
| | - Adam K Sieradzan
- Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
| | - Stefano Vanni
- Department of Biology, University of Fribourg, Chemin du Musée 10, Fribourg CH-1700, Switzerland
- Institut de Pharmacologie Moléculaire et Cellulaire, Université Côte d'Azur, Inserm, CNRS, 06560 Valbonne, France
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
| | - Sergio Pantano
- Biomolecular Simulations Group, Institut Pasteur de Montevideo, Montevideo 11400, Uruguay
| | - Wataru Shinoda
- Research Institute for Interdisciplinary Science, Okayama University, 3-1-1 Tsushima-naka, Kita, Okayama 700-8530, Japan
| | - Luca Monticelli
- Molecular Microbiology and Structural Biochemistry (MMSB, UMR 5086), CNRS, University of Lyon, 7 Passage du Vercors, 69007 Lyon, France
| | - Adam Liwo
- Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
| | - Siewert J Marrink
- Groningen Biomolecular Sciences and Biotechnology Institute and Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
| | - Paulo C T Souza
- Molecular Microbiology and Structural Biochemistry (MMSB, UMR 5086), CNRS, University of Lyon, 7 Passage du Vercors, 69007 Lyon, France
| |
|