1
Singh B, Mondal A, Gaalswyk K, MacCallum JL, Perez A. MELD-Adapt: On-the-Fly Belief Updating in Integrative Molecular Dynamics. J Chem Theory Comput 2024; 20:9230-9242. [PMID: 39356805 DOI: 10.1021/acs.jctc.4c00690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2024]
Abstract
Integrative structural biology synergizes experimental data with computational methods to elucidate the structures and interactions within biomolecules, a task that becomes critical in the absence of high-resolution structural data. A challenging step for integrating the data is knowing the expected accuracy or belief in the dataset. We previously showed that the Modeling Employing Limited Data (MELD) approach succeeds at predicting structures and finding the best interpretation of the data when the initial belief is equal to or slightly lower than the real value. However, the initial belief might be unknown to the user, as it depends on both the technique and the system of study. Here we introduce MELD-Adapt, designed to dynamically evaluate and infer the reliability of input data while at the same time finding the best interpretation of the data and the structures compatible with it. We demonstrate the utility of this method across different systems, particularly emphasizing its capability to correct initial assumptions and identify the correct fraction of data to produce reliable structural models. The approach is tested with two benchmark sets: the folding of 12 proteins with coarse physical insights and the binding of peptides with varying affinities to the extraterminal domain using chemical shift perturbation data. We find that subtle differences in data structure (e.g., locally clustered or globally distributed), starting belief, and force field preferences can have an impact on the predictions, limiting the possibility of a transferable protocol across all systems and data types. Nonetheless, we find a wide range of initial setup conditions that will lead to successful sampling and identification of native states, leading to a robust pipeline. Furthermore, disagreements about how much data is enforced and satisfied rapidly serve to identify incorrect setup conditions.
Affiliation(s)
- Bhumika Singh
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611-7011, United States
- Arup Mondal
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611-7011, United States
- Kari Gaalswyk
  - Department of Chemistry, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Justin L MacCallum
  - Department of Chemistry, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Alberto Perez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611-7011, United States
2
Chen L, Mondal A, Perez A, Miranda-Quintana RA. Protein Retrieval via Integrative Molecular Ensembles (PRIME) through Extended Similarity Indices. J Chem Theory Comput 2024; 20:6303-6315. [PMID: 38978294 DOI: 10.1021/acs.jctc.4c00362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Molecular dynamics (MD) simulations are ideally suited to describe conformational ensembles of biomolecules such as proteins and nucleic acids. Microsecond-long simulations are now routine, facilitated by the emergence of graphics processing units. Clustering, which groups objects based on structural similarity, is typically used to process ensembles, yielding distinct states, their populations, and representative structures. A popular pipeline uses hierarchical clustering to group structures and selects each cluster's centroid as its representative. Here, we propose to improve on this approach by developing a module, Protein Retrieval via Integrative Molecular Ensembles (PRIME), that consists of tools to improve the prediction of the representative in the most populated cluster using extended continuous similarity. PRIME is integrated with our Molecular Dynamics Analysis with N-ary Clustering Ensembles (MDANCE) package and can be used as a postprocessing tool for arbitrary clustering algorithms, compatible with several MD suites. PRIME predictions produced structures that, when aligned to the experimental structure, were better superposed (lower RMSD). A further benefit of PRIME is its linear scaling, in contrast to the O(N^2) scaling traditionally associated with comparisons of elements in a set.
Affiliation(s)
- Lexin Chen
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Arup Mondal
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Alberto Perez
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Ramón Alain Miranda-Quintana
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
3
Caparotta M, Perez A. Advancing Molecular Dynamics: Toward Standardization, Integration, and Data Accessibility in Structural Biology. J Phys Chem B 2024; 128:2219-2227. [PMID: 38418288 DOI: 10.1021/acs.jpcb.3c04823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2024]
Abstract
Molecular dynamics (MD) simulations have become a valuable tool in structural biology, offering insights into complex biological systems that are difficult to obtain through experimental techniques alone. The lack of available data sets and structures in most published computational work has limited other researchers' use of these models. In recent years, the emergence of online sharing platforms and MD database initiatives has favored the deposition of ensembles and structures to accompany publications, encouraging reuse of the data sets. However, the lack of uniformity in metadata collection, formats, and the choice of deposited data limits the impact of these data sets and their use by communities that are not necessarily experts in MD. This Perspective highlights the need for standardization and better resource sharing for processing and interpreting MD simulation results, akin to efforts in other areas of structural biology. As the field moves forward, we will see an increase in popularity and benefits of MD-based integrative approaches combining experimental data and simulations through probabilistic reasoning, but these too are limited by the lack of uniformity in experimental data availability and by modeling choices that are not trivial to decipher from papers. Other fields have addressed similar challenges comprehensively by establishing task forces, with different degrees of success. The large scope and number of communities needed to represent the breadth of MD simulation types complicate a parallel approach that would fit all. Thus, each group typically decides what data to upload, and in which format, on servers like Zenodo. Uploading data with FAIR (findable, accessible, interoperable, reusable) principles in mind, including optimal metadata collection, will make the data more accessible and actionable by the community. Such a wealth of simulation data will foster method development and infrastructure advancements, thus propelling the field forward.
Affiliation(s)
- Marcelo Caparotta
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Alberto Perez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
4
Mondal A, Lenz S, MacCallum JL, Perez A. Hybrid computational methods combining experimental information with molecular dynamics. Curr Opin Struct Biol 2023; 81:102609. [PMID: 37224642 DOI: 10.1016/j.sbi.2023.102609] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/05/2023] [Revised: 04/12/2023] [Accepted: 04/23/2023] [Indexed: 05/26/2023]
Abstract
A goal of structural biology is to understand how macromolecules carry out their biological roles by identifying their metastable states, mechanisms of action, pathways leading to conformational changes, and the thermodynamic and kinetic relationships between those states. Integrative modeling brings structural insights into systems where traditional structure determination approaches cannot help. We focus on the synergies and challenges of integrative modeling combining experimental data with molecular dynamics simulations.
Affiliation(s)
- Arup Mondal
  - Quantum Theory Project, Department of Chemistry, University of Florida, Gainesville, Florida, United States. https://twitter.com/@amondal_chem
- Stefan Lenz
  - Department of Chemistry, University of Calgary, 2500 University Drive, Calgary, Alberta, Canada
- Justin L MacCallum
  - Department of Chemistry, University of Calgary, 2500 University Drive, Calgary, Alberta, Canada. https://twitter.com/@jlmaccal
- Alberto Perez
  - Quantum Theory Project, Department of Chemistry, University of Florida, Gainesville, Florida, United States
5
Chang L, Mondal A, MacCallum JL, Perez A. CryoFold 2.0: Cryo-EM Structure Determination with MELD. J Phys Chem A 2023; 127:3906-3913. [PMID: 37084537 DOI: 10.1021/acs.jpca.3c01731] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 04/23/2023]
Abstract
Cryo-electron microscopy data are becoming more prevalent and accessible at higher resolution levels, leading to the development of new computational tools to determine the atomic structure of macromolecules. However, while existing tools adapted from X-ray crystallography are suitable for the highest-resolution maps, new tools are needed for lower-resolution levels and to account for map heterogeneity. In this article, we introduce CryoFold 2.0, an integrative physics-based approach that combines Bayesian inference and the ability to handle multiple data sources with the molecular dynamics flexible fitting (MDFF) approach to determine the structures of macromolecules by using cryo-EM data. CryoFold 2.0 is incorporated into the MELD (modeling employing limited data) plugin, resulting in a pipeline that is more computationally efficient and accurate than running MELD or MDFF alone. The approach requires fewer computational resources and shorter simulation times than the original CryoFold, and it minimizes manual intervention. We demonstrate the effectiveness of the approach on eight different systems, highlighting its various benefits.
Affiliation(s)
- Liwei Chang
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Arup Mondal
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Justin L MacCallum
  - Department of Chemistry, University of Calgary, Calgary, AB T2N 1N4, Canada
- Alberto Perez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
6
Lubecka EA, Liwo A. A coarse-grained approach to NMR-data-assisted modeling of protein structures. J Comput Chem 2022; 43:2047-2059. [PMID: 36134668 DOI: 10.1002/jcc.27003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Revised: 08/03/2022] [Accepted: 09/05/2022] [Indexed: 11/06/2022]
Abstract
The ESCASA algorithm for analytical estimation of proton positions from coarse-grained geometry developed in our recent work has been implemented in modeling protein structures with the highly coarse-grained UNRES model of polypeptide chains (two sites per residue) and nuclear magnetic resonance (NMR) data. A penalty function with the shape of intersecting gorges was applied to treat ambiguous distance restraints, which automatically selects consistent restraints. Hamiltonian replica exchange molecular dynamics was used to carry out the conformational search. The method was tested with both unambiguous and ambiguous restraints producing good-quality models with GDT_TS from 7.4 units higher to 14.4 units lower than those obtained with the CYANA or MELD software for protein-structure determination from NMR data at the all-atom resolution. The method can thus be applied in modeling the structures of flexible proteins, for which extensive conformational search enabled by coarse-graining is more important than high modeling accuracy.
Affiliation(s)
- Emilia A Lubecka
  - Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland
- Adam Liwo
  - Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland