1
Navarro C, Majewski M, De Fabritiis G. Top-Down Machine Learning of Coarse-Grained Protein Force Fields. J Chem Theory Comput 2023; 19:7518-7526. [PMID: 37874270 PMCID: PMC10777392 DOI: 10.1021/acs.jctc.3c00638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Indexed: 10/25/2023]
Abstract
Developing accurate and efficient coarse-grained representations of proteins is crucial for understanding their folding, function, and interactions over extended time scales. Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential through differentiable trajectory reweighting. Remarkably, this method requires only the native conformation of proteins, eliminating the need for labeled data derived from extensive simulations or memory-intensive end-to-end differentiable simulations. Once trained, the model can be employed to run parallel molecular dynamics simulations and sample folding events for proteins both within and beyond the training distribution, showcasing its extrapolation capabilities. By applying Markov state models, native-like conformations of the simulated proteins can be predicted from the coarse-grained simulations. Owing to its theoretical transferability and ability to use solely experimental static structures as training data, we anticipate that this approach will prove advantageous for developing new protein force fields and further advancing the study of protein dynamics, folding, and interactions.
Affiliation(s)
- Carles Navarro
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Ltd., Devonshire House 582, Middlesex HA7 1JS, United Kingdom
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
2
Habeck M. Bayesian methods in integrative structure modeling. Biol Chem 2023; 404:741-754. [PMID: 37505205 DOI: 10.1515/hsz-2023-0145] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/27/2023] [Accepted: 07/07/2023] [Indexed: 07/29/2023]
Abstract
There is a growing interest in characterizing the structure and dynamics of large biomolecular assemblies and their interactions within the cellular environment. A diverse array of experimental techniques allows us to study biomolecular systems on a variety of length and time scales. These techniques range from imaging with light, X-rays or electrons, to spectroscopic methods, cross-linking mass spectrometry and functional genomics approaches, and are complemented by AI-assisted protein structure prediction methods. A challenge is to integrate all of these data into a model of the system and its functional dynamics. This review focuses on Bayesian approaches to integrative structure modeling. We sketch the principles of Bayesian inference, highlight recent applications to integrative modeling and conclude with a discussion of current challenges and future perspectives.
Affiliation(s)
- Michael Habeck
- Microscopic Image Analysis Group, Jena University Hospital, D-07743 Jena, Germany
- Max Planck Institute for Multidisciplinary Sciences, D-37077 Göttingen, Germany
3
Giulini M, Rigoli M, Mattiotti G, Menichetti R, Tarenzi T, Fiorentini R, Potestio R. From System Modeling to System Analysis: The Impact of Resolution Level and Resolution Distribution in the Computer-Aided Investigation of Biomolecules. Front Mol Biosci 2021; 8:676976. [PMID: 34164432 PMCID: PMC8215203 DOI: 10.3389/fmolb.2021.676976] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/06/2021] [Accepted: 05/06/2021] [Indexed: 12/18/2022] Open
Abstract
The ever-increasing computer power, together with the improved accuracy of atomistic force fields, enables researchers to investigate biological systems at the molecular level with remarkable detail. However, the relevant length and time scales of many processes of interest are still hardly within reach even for state-of-the-art hardware, thus leaving important questions often unanswered. The computer-aided investigation of many biological physics problems thus largely benefits from the use of coarse-grained models, that is, simplified representations of a molecule at a level of resolution that is lower than atomistic. A plethora of coarse-grained models have been developed, which differ most notably in their granularity; this latter aspect determines one of the crucial open issues in the field, i.e. the identification of an optimal degree of coarsening, which enables the greatest simplification at the expense of the smallest information loss. In this review, we present the problem of coarse-grained modeling in biophysics from the viewpoint of system representation and information content. In particular, we discuss two distinct yet complementary aspects of protein modeling: on the one hand, the relationship between the resolution of a model and its capacity to accurately reproduce the properties of interest; on the other hand, the possibility of employing a lower resolution description of a detailed model to extract simple, useful, and intelligible information from the latter.
Affiliation(s)
- Marco Giulini
- Physics Department, University of Trento, Trento, Italy; INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
- Marta Rigoli
- Physics Department, University of Trento, Trento, Italy; INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
- Giovanni Mattiotti
- Physics Department, University of Trento, Trento, Italy; INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
- Roberto Menichetti
- Physics Department, University of Trento, Trento, Italy; INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
- Thomas Tarenzi
- Physics Department, University of Trento, Trento, Italy; INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
- Raffaele Fiorentini
- Physics Department, University of Trento, Trento, Italy; INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
- Raffaello Potestio
- Physics Department, University of Trento, Trento, Italy; INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
4
Wu Z, Zhang Y, Zhang JZ, Xia K, Xia F. Determining Optimal Coarse-Grained Representation for Biomolecules Using Internal Cluster Validation Indexes. J Comput Chem 2019; 41:14-20. [PMID: 31568566 DOI: 10.1002/jcc.26070] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/09/2019] [Revised: 08/15/2019] [Accepted: 08/27/2019] [Indexed: 12/30/2022]
Abstract
The development of ultracoarse-grained models for large biomolecules requires determining the optimal number of coarse-grained (CG) sites to represent the targets. In this work, we propose to use statistical internal cluster validation indexes to determine the optimal number of CG sites, which are optimized with the essential dynamics coarse-graining method. The calculated curves of the Calinski-Harabasz and Silhouette Coefficient indexes exhibit extrema corresponding to similar CG numbers. The calculated ratios of the optimal CG numbers to the residue numbers of fine-grained models are in the range from 4 to 2. A comparison of the stability of the index results indicates that the Calinski-Harabasz index is the better choice for determining the optimal CG representation in coarse-graining. © 2019 Wiley Periodicals, Inc.
Affiliation(s)
- Zhenliang Wu
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
- Yuwei Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
- John Zenghui Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore; School of Biological Sciences, Nanyang Technological University, 637371, Singapore
- Fei Xia
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
5
Zhang Y, Xia K, Cao Z, Gräter F, Xia F. A new method for the construction of coarse-grained models of large biomolecules from low-resolution cryo-electron microscopy data. Phys Chem Chem Phys 2019; 21:9720-9727. [PMID: 31025999 DOI: 10.1039/c9cp01370a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/20/2023]
Abstract
The rapid development of cryo-electron microscopy (cryo-EM) has led to the generation of a significant amount of low-resolution electron density data of biomolecules. However, the atomistic details of huge biomolecules usually cannot be obtained because it is very difficult to construct all-atom models for MD simulations. Thus, it is still a challenge to make use of the rich low-resolution cryo-EM data for computer simulation and functional study. In this study, we proposed a new method called Convolutional and K-means Coarse-Graining (CK-CG) for the efficient coarse-graining of large biological systems. Using the CK-CG method, we could directly map the cryo-EM data into coarse-grained (CG) beads. Furthermore, the CG beads were parameterized with an empirical harmonic potential to construct a new CG model. We subjected the CK-CG models of the fibrillar protein assemblies F-actin and collagen to external forces in pulling dynamic simulations to assess their mechanical response. The agreement in estimated tensile stiffness between the CG models and experiments demonstrates the validity of the CK-CG method. Thus, our method provides a practical strategy for the direct construction of a structural model from low-resolution data for biological function studies.
Affiliation(s)
- Yuwei Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China.
6
Viswanath S, Sali A. Optimizing model representation for integrative structure determination of macromolecular assemblies. Proc Natl Acad Sci U S A 2019; 116:540-545. [PMID: 30587581 PMCID: PMC6329962 DOI: 10.1073/pnas.1814649116] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022] Open
Abstract
Integrative structure determination of macromolecular assemblies requires specifying the representation of the modeled structure, a scoring function for ranking alternative models based on diverse types of data, and a sampling method for generating these models. Structures are often represented at atomic resolution, although ad hoc simplified representations based on generic guidelines and/or trial and error are also used. In contrast, we introduce here the concept of optimizing representation. To illustrate this concept, the optimal representation is selected from a set of candidate representations based on an objective criterion that depends on varying amounts of information available for different parts of the structure. Specifically, an optimal representation is defined as the highest-resolution representation for which sampling is exhaustive at a precision commensurate with the precision of the representation. Thus, the method does not require an input structure and is applicable to any input information. We consider a space of representations in which a representation is a set of nonoverlapping, variable-length segments (i.e., coarse-grained beads) for each component protein sequence. We also implement a method for efficiently finding an optimal representation in our open-source Integrative Modeling Platform (IMP) software (https://integrativemodeling.org/). The approach is illustrated by application to three complexes of two subunits and a large assembly of 10 subunits. The optimized representation facilitates exhaustive sampling and thus can produce a more accurate model and a more accurate estimate of its uncertainty for larger structures than were possible previously.
Affiliation(s)
- Shruthi Viswanath
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143
- Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143
- California Institute of Quantitative Biosciences, University of California, San Francisco, CA 94143
7
Sorzano COS, Jiménez A, Mota J, Vilas JL, Maluenda D, Martínez M, Ramírez-Aportela E, Majtner T, Segura J, Sánchez-García R, Rancel Y, del Caño L, Conesa P, Melero R, Jonic S, Vargas J, Cazals F, Freyberg Z, Krieger J, Bahar I, Marabini R, Carazo JM. Survey of the analysis of continuous conformational variability of biological macromolecules by electron microscopy. Acta Crystallogr F Struct Biol Commun 2019; 75:19-32. [PMID: 30605122 PMCID: PMC6317454 DOI: 10.1107/s2053230x18015108] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 05/30/2018] [Accepted: 10/26/2018] [Indexed: 11/10/2022] Open
Abstract
Single-particle analysis by electron microscopy is a well-established technique for analyzing the three-dimensional structures of biological macromolecules. Besides its ability to produce high-resolution structures, it also provides insights into the dynamic behavior of the structures by elucidating their conformational variability. Here, the different image-processing methods currently available to study continuous conformational changes are reviewed.
Affiliation(s)
- A. Jiménez
- National Center of Biotechnology (CSIC), Spain
- J. Mota
- National Center of Biotechnology (CSIC), Spain
- J. L. Vilas
- National Center of Biotechnology (CSIC), Spain
- D. Maluenda
- National Center of Biotechnology (CSIC), Spain
- M. Martínez
- National Center of Biotechnology (CSIC), Spain
- T. Majtner
- National Center of Biotechnology (CSIC), Spain
- J. Segura
- National Center of Biotechnology (CSIC), Spain
- Y. Rancel
- National Center of Biotechnology (CSIC), Spain
- L. del Caño
- National Center of Biotechnology (CSIC), Spain
- P. Conesa
- National Center of Biotechnology (CSIC), Spain
- R. Melero
- National Center of Biotechnology (CSIC), Spain
- S. Jonic
- Sorbonne Université, UMR CNRS 7590, Muséum National d’Histoire Naturelle, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- F. Cazals
- Inria Sophia Antipolis – Méditerranée, France
8
Advances in coarse-grained modeling of macromolecular complexes. Curr Opin Struct Biol 2018; 52:119-126. [PMID: 30508766 DOI: 10.1016/j.sbi.2018.11.005] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Received: 08/13/2018] [Revised: 11/05/2018] [Accepted: 11/17/2018] [Indexed: 01/12/2023]
Abstract
Recent progress in coarse-grained (CG) molecular modeling and simulation has facilitated an influx of computational studies on biological macromolecules and their complexes. Given the large separation of length-scales and time-scales that dictate macromolecular biophysics, CG modeling and simulation are well-suited to bridge the microscopic and mesoscopic or macroscopic details observed from all-atom molecular simulations and experiments, respectively. In this review, we first summarize recent innovations in the development of CG models, which broadly include structure-based, knowledge-based, and dynamics-based approaches. We then discuss recent applications of different classes of CG models to explore various macromolecular complexes. Finally, we conclude with an outlook for the future in this ever-growing field of biomolecular modeling.