1. Tiemann JKS, Szczuka M, Bouarroudj L, Oussaren M, Garcia S, Howard RJ, Delemotte L, Lindahl E, Baaden M, Lindorff-Larsen K, Chavent M, Poulain P. MDverse, shedding light on the dark matter of molecular dynamics simulations. eLife 2024; 12:RP90061. [PMID: 39212001 PMCID: PMC11364437 DOI: 10.7554/elife.90061] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 09/04/2024]
Abstract
The rise of open science and the absence of a global dedicated data repository for molecular dynamics (MD) simulations has led to the accumulation of MD files in generalist data repositories, constituting the dark matter of MD - data that is technically accessible, but neither indexed, curated, or easily searchable. Leveraging an original search strategy, we found and indexed about 250,000 files and 2000 datasets from Zenodo, Figshare and Open Science Framework. With a focus on files produced by the Gromacs MD software, we illustrate the potential offered by the mining of publicly available MD data. We identified systems with specific molecular composition and were able to characterize essential parameters of MD simulation such as temperature and simulation length, and could identify model resolution, such as all-atom and coarse-grain. Based on this analysis, we inferred metadata to propose a search engine prototype to explore the MD data. To continue in this direction, we call on the community to pursue the effort of sharing MD data, and to report and standardize metadata to reuse this valuable matter.
Affiliation(s)
- Johanna KS Tiemann
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Magdalena Szczuka
  - Institut de Pharmacologie et Biologie Structurale, CNRS, Université de Toulouse, Toulouse, France
- Lisa Bouarroudj
  - Université Paris Cité, CNRS, Institut Jacques Monod, Paris, France
- Rebecca J Howard
  - Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Lucie Delemotte
  - Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- Erik Lindahl
  - Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
  - Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- Marc Baaden
  - Laboratoire de Biochimie Théorique, CNRS, Université Paris Cité, Paris, France
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matthieu Chavent
  - Institut de Pharmacologie et Biologie Structurale, CNRS, Université de Toulouse, Toulouse, France
- Pierre Poulain
  - Université Paris Cité, CNRS, Institut Jacques Monod, Paris, France

2. Biriukov D, Vácha R. Pathways to a Shiny Future: Building the Foundation for Computational Physical Chemistry and Biophysics in 2050. ACS Physical Chemistry Au 2024; 4:302-313. [PMID: 39069976 PMCID: PMC11274290 DOI: 10.1021/acsphyschemau.4c00003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 03/15/2024] [Accepted: 03/18/2024] [Indexed: 07/30/2024]
Abstract
In the last quarter-century, the field of molecular dynamics (MD) has undergone a remarkable transformation, propelled by substantial enhancements in software, hardware, and underlying methodologies. In this Perspective, we contemplate the future trajectory of MD simulations and their possible look at the year 2050. We spotlight the pivotal role of artificial intelligence (AI) in shaping the future of MD and the broader field of computational physical chemistry. We outline critical strategies and initiatives that are essential for the seamless integration of such technologies. Our discussion delves into topics like multiscale modeling, adept management of ever-increasing data deluge, the establishment of centralized simulation databases, and the autonomous refinement, cross-validation, and self-expansion of these repositories. The successful implementation of these advancements requires scientific transparency, a cautiously optimistic approach to interpreting AI-driven simulations and their analysis, and a mindset that prioritizes knowledge-motivated research alongside AI-enhanced big data exploration. While history reminds us that the trajectory of technological progress can be unpredictable, this Perspective offers guidance on preparedness and proactive measures, aiming to steer future advancements in the most beneficial and successful direction.
Affiliation(s)
- Denys Biriukov
  - CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- Robert Vácha
  - CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
  - Department of Condensed Matter Physics, Faculty of Science, Masaryk University, Kotlářská 267/2, 611 37 Brno, Czech Republic

3. Tiemann JKS, Szczuka M, Bouarroudj L, Oussaren M, Garcia S, Howard RJ, Delemotte L, Lindahl E, Baaden M, Lindorff-Larsen K, Chavent M, Poulain P. MDverse: Shedding Light on the Dark Matter of Molecular Dynamics Simulations. bioRxiv 2024:2023.05.02.538537. [PMID: 37205542 PMCID: PMC10187166 DOI: 10.1101/2023.05.02.538537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]

4. Antila HS, Kav B, Miettinen MS, Martinez-Seara H, Jungwirth P, Ollila OHS. Emerging Era of Biomolecular Membrane Simulations: Automated Physically-Justified Force Field Development and Quality-Evaluated Databanks. J Phys Chem B 2022. [DOI: 10.1021/acs.jpcb.2c01954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Affiliation(s)
- Hanne S. Antila
  - Department of Biomaterials, Max Planck Institute of Colloids and Interfaces, 14424 Potsdam, Germany
- Batuhan Kav
  - Institute of Biological Information Processing, Structural Biochemistry (IBI-7), Forschungszentrum Jülich, Wilhelm-Johnen-Str., 52425 Jülich, Germany
- Markus S. Miettinen
  - Computational Biology Unit, Department of Informatics, University of Bergen, 5008 Bergen, Norway
  - Department of Chemistry, University of Bergen, 5020 Bergen, Norway
- Hector Martinez-Seara
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo nam. 2, 16000 Prague 6, Czech Republic
- Pavel Jungwirth
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo nam. 2, 16000 Prague 6, Czech Republic
- O. H. Samuli Ollila
  - Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland

5. Martinez X, Baaden M. UnityMol prototype for FAIR sharing of molecular-visualization experiences: from pictures in the cloud to collaborative virtual reality exploration in immersive 3D environments. Acta Crystallogr D Struct Biol 2021; 77:746-754. [PMID: 34076589 PMCID: PMC8171070 DOI: 10.1107/s2059798321002941] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/29/2020] [Accepted: 03/19/2021] [Indexed: 11/14/2022]
Abstract
Motivated by the current COVID-19 pandemic, which has spurred a substantial flow of structural data, the use of molecular-visualization experiences to make these data sets accessible to a broad audience is described. Using a variety of technology vectors related to the cloud, 3D and virtual reality gear, how to share curated visualizations of structural biology, modeling and/or bioinformatics data sets for interactive and collaborative exploration is examined. FAIR is discussed as an overarching principle for sharing such visualizations. Four initial example scenes related to recent COVID-19 structural data are provided, together with a ready-to-use (and share) implementation in the UnityMol software.
Affiliation(s)
- Xavier Martinez
  - CNRS, Université de Paris, UPR 9080, Laboratoire de Biochimie Théorique, 13 Rue Pierre et Marie Curie, 75005 Paris, France
  - Institut de Biologie Physico-Chimique–Fondation Edmond de Rothschild, PSL Research University, Paris, France
- Marc Baaden
  - CNRS, Université de Paris, UPR 9080, Laboratoire de Biochimie Théorique, 13 Rue Pierre et Marie Curie, 75005 Paris, France
  - Institut de Biologie Physico-Chimique–Fondation Edmond de Rothschild, PSL Research University, Paris, France

6. Antila HS, Ferreira TM, Ollila OHS, Miettinen MS. Using Open Data to Rapidly Benchmark Biomolecular Simulations: Phospholipid Conformational Dynamics. J Chem Inf Model 2021; 61:938-949. [PMID: 33496579 PMCID: PMC7903423 DOI: 10.1021/acs.jcim.0c01299] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/09/2020] [Indexed: 01/08/2023]
Abstract
Molecular dynamics (MD) simulations are widely used to monitor time-resolved motions of biomacromolecules, although it often remains unknown how closely the conformational dynamics correspond to those occurring in real life. Here, we used a large set of open-access MD trajectories of phosphatidylcholine (PC) lipid bilayers to benchmark the conformational dynamics in several contemporary MD models (force fields) against nuclear magnetic resonance (NMR) data available in the literature: effective correlation times and spin-lattice relaxation rates. We found none of the tested MD models to fully reproduce the conformational dynamics. That said, the dynamics in CHARMM36 and Slipids are more realistic than in the Amber Lipid14, OPLS-based MacRog, and GROMOS-based Berger force fields, whose sampling of the glycerol backbone conformations is too slow. The performance of CHARMM36 persists when cholesterol is added to the bilayer, and when the hydration level is reduced. However, for conformational dynamics of the PC headgroup, both with and without cholesterol, Slipids provides the most realistic description because CHARMM36 overestimates the relative weight of ∼1 ns processes in the headgroup dynamics. We stress that not a single new simulation was run for the present work. This demonstrates the worth of open-access MD trajectory databanks for the indispensable step of any serious MD study: benchmarking the available force fields. We believe this proof of principle will inspire other novel applications of MD trajectory databanks and thus aid in developing biomolecular MD simulations into a true computational microscope, not only for lipid membranes but for all biomacromolecular systems.
Affiliation(s)
- Hanne S. Antila
  - Department of Theory and Bio-Systems, Max Planck Institute of Colloids and Interfaces, 14424 Potsdam, Germany
- Tiago M. Ferreira
  - NMR Group - Institute for Physics, Martin-Luther University Halle-Wittenberg, 06120 Halle (Saale), Germany
- Markus S. Miettinen
  - Department of Theory and Bio-Systems, Max Planck Institute of Colloids and Interfaces, 14424 Potsdam, Germany

7. Abriata LA, Dal Peraro M. State-of-the-art web services for de novo protein structure prediction. Brief Bioinform 2020; 22:5870389. [PMID: 34020540 DOI: 10.1093/bib/bbaa139] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/25/2020] [Revised: 06/04/2020] [Accepted: 06/05/2020] [Indexed: 02/06/2023]
Abstract
Residue coevolution estimations coupled to machine learning methods are revolutionizing the ability of protein structure prediction approaches to model proteins that lack clear homologous templates in the Protein Data Bank (PDB). This has been patent in the last round of the Critical Assessment of Structure Prediction (CASP), which presented several very good models for the hardest targets. Unfortunately, literature reporting on these advances often lacks digests tailored to lay end users; moreover, some of the top-ranking predictors do not provide webservers that can be used by nonexperts. How can then end users benefit from these advances and correctly interpret the predicted models? Here we review the web resources that biologists can use today to take advantage of these state-of-the-art methods in their research, including not only the best de novo modeling servers but also datasets of models precomputed by experts for structurally uncharacterized protein families. We highlight their features, advantages and pitfalls for predicting structures of proteins without clear templates. We present a broad number of applications that span from driving forward biochemical investigations that lack experimental structures to actually assisting experimental structure determination in X-ray diffraction, cryo-EM and other forms of integrative modeling. We also discuss issues that must be considered by users yet still require further developments, such as global and residue-wise model quality estimates and sources of residue coevolution other than monomeric tertiary structure.
Affiliation(s)
- Luciano A Abriata
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Matteo Dal Peraro
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland