1
Kidder KM, Noid WG. Analysis of mapping atomic models to coarse-grained resolution. J Chem Phys 2024; 161:134113. [PMID: 39365018 DOI: 10.1063/5.0220989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 09/10/2024] [Indexed: 10/05/2024]
Abstract
Low-resolution coarse-grained (CG) models provide significant computational and conceptual advantages for simulating soft materials. However, the properties of CG models depend quite sensitively upon the mapping, M, that maps each atomic configuration, r, to a CG configuration, R. In particular, M determines how the configurational information of the atomic model is partitioned between the mapped ensemble of CG configurations and the lost ensemble of atomic configurations that map to each R. In this work, we investigate how the mapping partitions the atomic configuration space into CG and intra-site components. We demonstrate that the corresponding coordinate transformation introduces a nontrivial Jacobian factor. This Jacobian factor defines a labeling entropy that corresponds to the uncertainty in the atoms that are associated with each CG site. Consequently, the labeling entropy effectively transfers configurational information from the lost ensemble into the mapped ensemble. Moreover, our analysis highlights the possibility of resonant mappings that separate the atomic potential into CG and intra-site contributions. We numerically illustrate these considerations with a Gaussian network model for the equilibrium fluctuations of actin. We demonstrate that the spectral quality, Q, provides a simple metric for identifying high quality representations for actin. Conversely, we find that neither maximizing nor minimizing the information content of the mapped ensemble results in high quality representations. However, if one accounts for the labeling uncertainty, Q(M) correlates quite well with the adjusted configurational information loss, Î_map(M), that results from the mapping.
Affiliation(s)
- Katherine M Kidder
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- W G Noid
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

2
Giulini M, Fiorentini R, Tubiana L, Potestio R, Menichetti R. EXCOGITO, an Extensible Coarse-Graining Toolbox for the Investigation of Biomolecules by Means of Low-Resolution Representations. J Chem Inf Model 2024; 64:4912-4927. [PMID: 38860513 DOI: 10.1021/acs.jcim.4c00490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]
Abstract
Bottom-up coarse-grained (CG) models have proved essential to complement, and sometimes even replace, all-atom representations of soft matter systems and biological macromolecules. The development of low-resolution models starts from the reduction of the degrees of freedom employed, that is, the definition of a mapping between a system's high-resolution description and its simplified counterpart. Even in the absence of an explicit parametrization and simulation of a CG model, observing the atomistic system in simpler terms can be informative: this idea is leveraged by the mapping entropy, a measure of the information loss inherent to the process of coarsening. Mapping entropy lies at the heart of the extensible coarse-graining toolbox, EXCOGITO, which was developed to perform a number of operations and analyses on molecular systems that revolve around the properties of mappings. EXCOGITO can process an all-atom trajectory to compute the mapping entropy, identify the mapping that minimizes it, and establish quantitative relations between a low-resolution representation and the geometrical, structural, and energetic features of the system. Here, the software, which is available free of charge under an open-source license, is presented and showcased to introduce potential users to its capabilities and usage.
Affiliation(s)
- Marco Giulini
- Physics Department, University of Trento, Via Sommarive, 14, Trento I-38123, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento I-38123, Italy
- Raffaele Fiorentini
- Physics Department, University of Trento, Via Sommarive, 14, Trento I-38123, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento I-38123, Italy
- Luca Tubiana
- Physics Department, University of Trento, Via Sommarive, 14, Trento I-38123, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento I-38123, Italy
- Raffaello Potestio
- Physics Department, University of Trento, Via Sommarive, 14, Trento I-38123, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento I-38123, Italy
- Roberto Menichetti
- Physics Department, University of Trento, Via Sommarive, 14, Trento I-38123, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento I-38123, Italy

3
Kidder KM, Shell MS, Noid WG. Surveying the energy landscape of coarse-grained mappings. J Chem Phys 2024; 160:054105. [PMID: 38310476 DOI: 10.1063/5.0182524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 12/28/2023] [Indexed: 02/05/2024]
Abstract
Simulations of soft materials often adopt low-resolution coarse-grained (CG) models. However, the CG representation is not unique and its impact upon simulated properties is poorly understood. In this work, we investigate the space of CG representations for ubiquitin, which is a typical globular protein with 72 amino acids. We employ Monte Carlo methods to ergodically sample this space and to characterize its landscape. By adopting the Gaussian network model as an analytically tractable atomistic model for equilibrium fluctuations, we exactly assess the intrinsic quality of each CG representation without introducing any approximations in sampling configurations or in modeling interactions. We focus on two metrics, the spectral quality and the information content, that quantify the extent to which the CG representation preserves low-frequency, large-amplitude motions and configurational information, respectively. The spectral quality and information content are weakly correlated among high-resolution representations but become strongly anticorrelated among low-resolution representations. Representations with maximal spectral quality appear consistent with physical intuition, while low-resolution representations with maximal information content do not. Interestingly, quenching studies indicate that the energy landscape of mapping space is very smooth and highly connected. Moreover, our study suggests a critical resolution below which a "phase transition" qualitatively distinguishes good and bad representations.
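This study uses the Gaussian network model (GNM) as its atomistic model. In the standard GNM, nodes that lie within a cutoff distance of each other are joined by identical harmonic springs. The equilibrium fluctuation covariance is then proportional to the pseudo-inverse of the Kirchhoff (connectivity) matrix Γ. The minimal sketch below builds Γ and the corresponding harmonic energy. It assumes a hypothetical 7.0 Å cutoff and a unit spring constant, and the helper names are illustrative. Neither the paper's parameters nor its spectral-quality and information-content metrics are reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def kirchhoff_matrix(coords: Sequence[Vec3], cutoff: float = 7.0) -> List[List[float]]:
    """Build the GNM Kirchhoff matrix Gamma (hypothetical helper).

    Off-diagonal: -1 if nodes i and j lie within `cutoff`, else 0.
    Diagonal: the number of contacts of node i, so every row sums to zero.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def gnm_energy(gamma: List[List[float]], displacements: Sequence[Vec3], spring: float = 1.0) -> float:
    """Harmonic GNM energy U = (spring / 2) * sum_ij Gamma_ij (dR_i . dR_j)."""
    n = len(displacements)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(displacements[i], displacements[j]))
            total += gamma[i][j] * dot
    return 0.5 * spring * total


# Hypothetical 4-node chain with 3.8 Å spacing (a typical C-alpha separation).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(coords, cutoff=7.0)
for row in gamma:
    print(row)

# A rigid translation costs no energy (a zero mode of Gamma); moving one end node does.
print(gnm_energy(gamma, [(1.0, 0.0, 0.0)] * 4))
print(gnm_energy(gamma, [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]))
```

Because every row of Γ sums to zero, a uniform translation is always a zero mode. Analyses of low-frequency motions therefore work with the nonzero part of the spectrum.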
Affiliation(s)
- Katherine M Kidder
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- M Scott Shell
- Department of Chemical Engineering, University of California, Santa Barbara, California 93106, USA
- W G Noid
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

4
Maier JC, Wang CI, Jackson NE. Distilling coarse-grained representations of molecular electronic structure with continuously gated message passing. J Chem Phys 2024; 160:024109. [PMID: 38193551 DOI: 10.1063/5.0179253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 12/14/2023] [Indexed: 01/10/2024]
Abstract
Bottom-up methods for coarse-grained (CG) molecular modeling are critically needed to establish rigorous links between atomistic reference data and reduced molecular representations. For a target molecule, the ideal reduced CG representation is a function of both the conformational ensemble of the system and the target physical observable(s) to be reproduced at the CG resolution. However, algorithms are lacking for selecting CG representations of molecules from which complex properties, including molecular electronic structure, can be accurately modeled. We introduce continuously gated message passing (CGMP), a graph neural network (GNN) method for atomically decomposing molecular electronic structure sampled over conformational ensembles. CGMP integrates 3D-invariant GNNs and a novel gated message passing system to continuously reduce the atomic degrees of freedom accessible for electronic predictions, resulting in a one-shot importance ranking of atoms contributing to a target molecular property. Moreover, CGMP provides the first approach by which to quantify the degeneracy of "good" CG representations conditioned on specific prediction targets, facilitating the development of more transferable CG representations. We further show how CGMP can be used to highlight multiatom correlations, illuminating a path to developing CG electronic Hamiltonians in terms of interpretable collective variables for arbitrarily complex molecules.
Affiliation(s)
- J Charlie Maier
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Chun-I Wang
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Nicholas E Jackson
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA

5
Giulini M, Honorato RV, Rivera JL, Bonvin AMJJ. ARCTIC-3D: automatic retrieval and clustering of interfaces in complexes from 3D structural information. Commun Biol 2024; 7:49. [PMID: 38184711 PMCID: PMC10771469 DOI: 10.1038/s42003-023-05718-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 12/18/2023] [Indexed: 01/08/2024]
Abstract
The formation of a stable complex between proteins lies at the core of a wide variety of biological processes and has been the focus of countless experiments. The huge amount of information contained in the protein structural interactome in the Protein Data Bank can now be used to characterise and classify the existing biological interfaces. We here introduce ARCTIC-3D, a fast and user-friendly data mining and clustering software to retrieve data and rationalise the interface information associated with the protein input data. We demonstrate its use through various examples, ranging from showing the increased interaction complexity of eukaryotic proteins, 20% of which on average have more than 3 different interfaces compared to only 10% for prokaryotes, to associating different functions with different interfaces. In the context of modelling biomolecular assemblies, we introduce the concept of "recognition entropy", related to the number of possible interfaces of the components of a protein-protein complex, which we demonstrate to correlate with the modelling difficulty in classical docking approaches. The identified interface clusters can also be used to generate various combinations of interface-specific restraints for integrative modelling. The ARCTIC-3D software is freely available at github.com/haddocking/arctic3d and can be accessed as a web-service at wenmr.science.uu.nl/arctic3d.
Affiliation(s)
- Marco Giulini
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Rodrigo V Honorato
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Jesús L Rivera
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Alexandre M J J Bonvin
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands

6
Borges-Araújo L, Patmanidis I, Singh AP, Santos LHS, Sieradzan AK, Vanni S, Czaplewski C, Pantano S, Shinoda W, Monticelli L, Liwo A, Marrink SJ, Souza PCT. Pragmatic Coarse-Graining of Proteins: Models and Applications. J Chem Theory Comput 2023; 19:7112-7135. [PMID: 37788237 DOI: 10.1021/acs.jctc.3c00733] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 10/05/2023]
Abstract
The molecular details involved in the folding, dynamics, organization, and interaction of proteins with other molecules are often difficult to assess by experimental techniques. Consequently, computational models play an ever-increasing role in the field. However, biological processes involving large-scale protein assemblies or long time scale dynamics are still computationally expensive to study in atomistic detail. For these applications, employing coarse-grained (CG) modeling approaches has become a key strategy. In this Review, we provide an overview of what we call pragmatic CG protein models, which are strategies combining, at least in part, a physics-based implementation and a top-down experimental approach to their parametrization. In particular, we focus on CG models in which most protein residues are represented by at least two beads, allowing these models to retain some degree of chemical specificity. A description of the main modern pragmatic protein CG models is provided, including a review of the most recent applications and an outlook on future perspectives in the field.
Affiliation(s)
- Luís Borges-Araújo
- Molecular Microbiology and Structural Biochemistry (MMSB, UMR 5086), CNRS, University of Lyon, 7 Passage du Vercors, 69007 Lyon, France
- Ilias Patmanidis
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Groningen Biomolecular Sciences and Biotechnology Institute and Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
- Akhil P Singh
- Department of Biology, University of Fribourg, Chemin du Musée 10, Fribourg CH-1700, Switzerland
- Lucianna H S Santos
- Biomolecular Simulations Group, Institut Pasteur de Montevideo, Montevideo 11400, Uruguay
- Adam K Sieradzan
- Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Stefano Vanni
- Department of Biology, University of Fribourg, Chemin du Musée 10, Fribourg CH-1700, Switzerland
- Institut de Pharmacologie Moléculaire et Cellulaire, Université Côte d'Azur, Inserm, CNRS, 06560 Valbonne, France
- Cezary Czaplewski
- Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Sergio Pantano
- Biomolecular Simulations Group, Institut Pasteur de Montevideo, Montevideo 11400, Uruguay
- Wataru Shinoda
- Research Institute for Interdisciplinary Science, Okayama University, 3-1-1 Tsushima-naka, Kita, Okayama 700-8530, Japan
- Luca Monticelli
- Molecular Microbiology and Structural Biochemistry (MMSB, UMR 5086), CNRS, University of Lyon, 7 Passage du Vercors, 69007 Lyon, France
- Adam Liwo
- Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Siewert J Marrink
- Groningen Biomolecular Sciences and Biotechnology Institute and Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
- Paulo C T Souza
- Molecular Microbiology and Structural Biochemistry (MMSB, UMR 5086), CNRS, University of Lyon, 7 Passage du Vercors, 69007 Lyon, France

7
Yang W, Templeton C, Rosenberger D, Bittracher A, Nüske F, Noé F, Clementi C. Slicing and Dicing: Optimal Coarse-Grained Representation to Preserve Molecular Kinetics. ACS Cent Sci 2023; 9:186-196. [PMID: 36844497 PMCID: PMC9951291 DOI: 10.1021/acscentsci.2c01200] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/10/2022] [Indexed: 05/05/2023]
Abstract
The aim of molecular coarse-graining approaches is to recover relevant physical properties of the molecular system via a lower-resolution model that can be more efficiently simulated. Ideally, the lower resolution still accounts for the degrees of freedom necessary to recover the correct physical behavior. The selection of these degrees of freedom has often relied on the scientist's chemical and physical intuition. In this article, we argue that, in soft matter contexts, desirable coarse-grained models accurately reproduce the long-time dynamics of a system by correctly capturing the rare-event transitions. We propose a bottom-up coarse-graining scheme that correctly preserves the relevant slow degrees of freedom, and we test this idea for three systems of increasing complexity. We show that, in contrast to this method, existing coarse-graining schemes, such as those from information theory or structure-based approaches, are not able to recapitulate the slow time scales of the system.
Affiliation(s)
- Wangfei Yang
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Graduate Program in Systems, Synthetic and Physical Biology, Rice University, Houston, Texas 77005, United States
- Clark Templeton
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- David Rosenberger
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Andreas Bittracher
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Feliks Nüske
- Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
- Frank Noé
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Cecilia Clementi
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Department of Physics, Rice University, Houston, Texas 77005, United States

8
Jin J, Pak AJ, Durumeric AEP, Loose TD, Voth GA. Bottom-up Coarse-Graining: Principles and Perspectives. J Chem Theory Comput 2022; 18:5759-5791. [PMID: 36070494 PMCID: PMC9558379 DOI: 10.1021/acs.jctc.2c00643] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 06/20/2022] [Indexed: 01/14/2023]
Abstract
Large-scale computational molecular models provide scientists with a means to investigate the effect of microscopic details on emergent mesoscopic behavior. Elucidating the relationship between variations on the molecular scale and macroscopic observable properties facilitates an understanding of the molecular interactions driving the properties of real-world materials and complex systems (e.g., those found in biology, chemistry, and materials science). As a result, discovering an explicit, systematic connection between microscopic nature and emergent mesoscopic behavior is a fundamental goal for this type of investigation. The molecular forces critical to driving the behavior of complex heterogeneous systems are often unclear. More problematically, simulations of representative model systems are often prohibitively expensive from both spatial and temporal perspectives, impeding straightforward investigations of possible hypotheses characterizing molecular behavior. While reducing the resolution of a study, such as moving from an atomistic simulation to one at the resolution of large coarse-grained (CG) groups of atoms, can partially ameliorate the cost of individual simulations, the relationship between the proposed microscopic details and this intermediate resolution is nontrivial and presents new obstacles to study. Small portions of these complex systems can be realistically simulated. Alone, these smaller simulations likely do not provide insight into collectively emergent behavior. However, by proposing that the driving forces in both smaller and larger systems (containing many related copies of the smaller system) have an explicit connection, systematic bottom-up CG techniques can be used to transfer CG hypotheses discovered using a smaller-scale system to a larger system of primary interest.
The proposed connection between different CG systems is prescribed by (i) the CG representation (mapping) and (ii) the functional form and parameters used to represent the CG energetics, which approximate potentials of mean force (PMFs). As a result, the design of CG methods that facilitate a variety of physically relevant representations, approximations, and force fields is critical to moving the frontier of systematic CG forward. Crucially, the proposed connection between the system used for parametrization and the system of interest is orthogonal to the optimization used to approximate the potential of mean force present in all systematic CG methods. The empirical efficacy of machine learning techniques on a variety of tasks provides strong motivation to consider these approaches for approximating the PMF and analyzing these approximations.
Affiliation(s)
- Jaehyeok Jin
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Alexander J. Pak
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Aleksander E. P. Durumeric
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Timothy D. Loose
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Gregory A. Voth
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States

9
Holtzman R, Giulini M, Potestio R. Making sense of complex systems through resolution, relevance, and mapping entropy. Phys Rev E 2022; 106:044101. [PMID: 36397524 DOI: 10.1103/physreve.106.044101] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/02/2022] [Accepted: 08/16/2022] [Indexed: 06/16/2023]
Abstract
Complex systems are characterized by a tight, nontrivial interplay of their constituents, which gives rise to a multiscale spectrum of emergent properties. In this scenario, it is practically and conceptually difficult to identify those degrees of freedom that chiefly determine the behavior of the system and separate them from less prominent players. Here, we tackle this problem using three measures of statistical information: resolution, relevance, and mapping entropy. We address the links among them, starting from the established relation between resolution and relevance and further developing novel connections between resolution and mapping entropy; by these means we can identify, in a quantitative manner, the number and selection of degrees of freedom of the system that preserve the largest information content about the generative process that underlies an empirical dataset. The method, which is implemented in freely available software, is fully general, as shown through its application to three very diverse systems, namely, a toy model of independent binary spins, a coarse-grained representation of the financial stock market, and a fully atomistic simulation of a protein.
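The abstract does not define resolution and relevance. The sketch below uses the definitions common in the resolution/relevance literature, which may differ in normalization or log base from the paper's. For M samples projected onto a chosen subset of degrees of freedom, let k_s count how often state s occurs, and let m_k count how many distinct states occur exactly k times. Then resolution is H[s] = -Σ_s (k_s/M) ln(k_s/M) and relevance is H[K] = -Σ_k (k m_k/M) ln(k m_k/M). The binary-spin samples and the helper name are hypothetical, and mapping entropy itself is not computed.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def resolution_relevance(samples: Sequence[Tuple[int, ...]], keep: Sequence[int]) -> Tuple[float, float]:
    """Resolution H[s] and relevance H[K] (in nats) of samples projected onto `keep`.

    Hypothetical helper using the common literature definitions:
      k_s = number of times projected state s occurs among the M samples
      m_k = number of distinct states that occur exactly k times
    """
    M = len(samples)
    # Project each sample onto the retained degrees of freedom and count states.
    k_s = Counter(tuple(sample[i] for i in keep) for sample in samples)
    # Count how many states share each frequency k.
    m_k = Counter(k_s.values())
    resolution = -sum((k / M) * math.log(k / M) for k in k_s.values())
    relevance = -sum((k * m / M) * math.log(k * m / M) for k, m in m_k.items())
    return resolution, relevance


# Hypothetical two-spin samples; compare single spins with the full pair.
samples = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 0), (1, 1)]
for keep in ([0], [1], [0, 1]):
    print(keep, resolution_relevance(samples, keep))
```

Relevance is lower than resolution whenever several states share the same count, as happens for the full pair above. This is why the two measures can rank candidate selections of degrees of freedom differently.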
Affiliation(s)
- Roi Holtzman
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
- Marco Giulini
- Physics Department, University of Trento, via Sommarive, 14 I-38123 Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, I-38123 Trento, Italy
- Raffaello Potestio
- Physics Department, University of Trento, via Sommarive, 14 I-38123 Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, I-38123 Trento, Italy