1. Landrum GA, Braun J, Katzberger P, Lehner MT, Riniker S. lwreg: A Lightweight System for Chemical Registration and Data Storage. J Chem Inf Model 2024; 64:6247-6252. [PMID: 39114929 DOI: 10.1021/acs.jcim.4c01133] [Indexed: 08/27/2024]
Abstract
Here, we present lwreg, a lightweight, yet flexible chemical registration system supporting the capture of both two-dimensional molecular structures (topologies) and three-dimensional conformers. lwreg is open source, with a simple Python API, and is designed to be easily integrated into computational workflows. In addition to lwreg itself, we also introduce a straightforward schema for storing experimental data and metadata in the registration database. This direct connection between compound structural information and data generated using those structures creates a powerful tool for data analysis and experimental reproducibility. The software is available at and installable directly from https://github.com/rinikerlab/lightweight-registration.
Affiliation(s)
- Gregory A Landrum: Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Jessica Braun: Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Paul Katzberger: Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Marc T Lehner: Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Sereina Riniker: Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
2. Lehner MT, Katzberger P, Maeder N, Landrum GA, Riniker S. DASH properties: Estimating atomic and molecular properties from a dynamic attention-based substructure hierarchy. J Chem Phys 2024; 161:074103. [PMID: 39145551 DOI: 10.1063/5.0218154] [Received: 05/09/2024] [Accepted: 08/01/2024] [Indexed: 08/16/2024]
Abstract
Recently, we presented a method to assign atomic partial charges based on the DASH (dynamic attention-based substructure hierarchy) tree with high efficiency and quantum mechanical (QM)-like accuracy. In addition, the approach can be considered "rule based" (the rules are derived from the attention values of a graph neural network), and thus each assignment is fully explainable by visualizing the underlying molecular substructures. In this work, we demonstrate that these hierarchically sorted substructures capture the key features of the local environment of an atom and allow us to predict different atomic properties with high accuracy without building a new DASH tree for each property. The fast prediction of atomic properties in molecules with the DASH tree can, for example, be used as an efficient way to generate feature vectors for machine learning without the need for expensive QM calculations. The final DASH tree with the different atomic properties, as well as the complete dataset with wave functions, is made freely available.
Affiliation(s)
- Marc T Lehner: Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Paul Katzberger: Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Niels Maeder: Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Gregory A Landrum: Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Sereina Riniker: Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
3. Behara PK, Jang H, Horton JT, Gokey T, Dotson DL, Boothroyd S, Bayly CI, Cole DJ, Wang LP, Mobley DL. Benchmarking Quantum Mechanical Levels of Theory for Valence Parametrization in Force Fields. J Phys Chem B 2024; 128:7888-7902. [PMID: 39087913 PMCID: PMC11331531 DOI: 10.1021/acs.jpcb.4c03167] [Received: 05/13/2024] [Revised: 07/09/2024] [Accepted: 07/15/2024] [Indexed: 08/02/2024]
Abstract
A wide range of density functional methods and basis sets are available to derive the electronic structure and properties of molecules. Quantum mechanical calculations are too computationally intensive for routine simulation of molecules in the condensed phase, prompting the development of computationally efficient force fields based on quantum mechanical data. Parametrizing general force fields, which cover a vast chemical space, necessitates the generation of sizable quantum mechanical data sets with optimized geometries and torsion scans. To achieve this efficiently, choosing a quantum mechanical method that balances computational cost and accuracy is crucial. In this study, we seek to assess the accuracy of quantum mechanical theory for specific properties such as conformer energies and torsion energetics. To comprehensively evaluate various methods, we focus on a representative set of 59 diverse small molecules, comparing approximately 25 combinations of functional and basis sets against the reference level coupled cluster calculations at the complete basis set limit.
Affiliation(s)
- Pavan Kumar Behara: Center for Neurotherapeutics, University of California, Irvine, California 92697, United States
- Hyesu Jang: Chemistry Department, University of California at Davis, Davis, California 95616, United States; OpenEye Scientific Software, Santa Fe, New Mexico 87508, United States
- Joshua T. Horton: School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
- Trevor Gokey: Department of Chemistry, University of California, Irvine, California 92697, United States
- David L. Dotson: The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States; Datryllic LLC, Phoenix, Arizona 85003, United States
- Daniel J. Cole: School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
- Lee-Ping Wang: Chemistry Department, University of California at Davis, Davis, California 95616, United States
- David L. Mobley: Center for Neurotherapeutics, University of California, Irvine, California 92697, United States; Department of Chemistry, University of California, Irvine, California 92697, United States
4. Katzberger P, Riniker S. A general graph neural network based implicit solvation model for organic molecules in water. Chem Sci 2024; 15:10794-10802. [PMID: 39027274 PMCID: PMC11253111 DOI: 10.1039/d4sc02432j] [Received: 04/12/2024] [Accepted: 05/24/2024] [Indexed: 07/20/2024]
Abstract
The dynamical behavior of small molecules in their environment can be studied with classical molecular dynamics (MD) simulations to gain deeper insight at the atomic level and thus complement and rationalize the interpretation of experimental findings. Such approaches are of great value in various areas of research, e.g., in the development of new therapeutics. The accurate description of solvation effects in such simulations is key and has consequently been an active field of research since the introduction of MD. So far, the most accurate approaches involve computationally expensive explicit solvent simulations, while widely applied models using an implicit solvent description suffer from reduced accuracy. Recently, machine learning (ML) approaches that provide a probabilistic representation of solvation effects have been proposed as potential alternatives. However, the associated computational costs and minimal or absent transferability render them unusable in practice. Here, we report the first example of a transferable ML-based implicit solvent model trained on a diverse set of 3 000 000 molecular structures that can be applied to organic small molecules for simulations in water. Extensive testing against reference calculations demonstrated that the model delivers accuracy on par with explicit solvent simulations while providing an up to 18-fold increase in sampling rate.
Affiliation(s)
- Paul Katzberger: Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Sereina Riniker: Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
5. Davel CM, Bernat T, Wagner JR, Shirts MR. Parameterization of General Organic Polymers within the Open Force Field Framework. J Chem Inf Model 2024; 64:1290-1305. [PMID: 38303159 PMCID: PMC11090695 DOI: 10.1021/acs.jcim.3c01691] [Indexed: 02/03/2024]
Abstract
Polymer and chemically modified biopolymer systems present unique challenges to traditional molecular simulation preparation workflows. First, typical polymer and biomolecular input formats, such as Protein Data Bank (PDB) files, lack adequate chemical information needed for the parameterization of new chemistries. Second, polymers are typically too large for accurate partial charge generation methods. In this work, we employ direct chemical perception through the Open Force Field toolkit to create a flexible polymer simulation workflow for organic polymers, encompassing everything from biopolymers to soft materials. We propose and test a new input specification for monomer information that can, along with a 3D conformational geometry, parametrize and simulate most soft-material systems within the same workflow used for smaller ligands. The monomer format encompasses a subset of the SMIRKS substructure query language to uniquely identify chemical information and repeating charges in underspecified systems through matching atomic connectivity. This workflow is combined with several different approaches for automatic partial-charge generation for larger systems. As an initial proof of concept, a variety of diverse polymeric systems were parametrized with the Open Force Field toolkit, including functionalized proteins, DNA, homopolymers, cross-linked systems, and sugars. Additionally, shape properties and radial distribution functions were computed from molecular dynamics simulations of poly(ethylene glycol), polyacrylamide, and poly(N-isopropylacrylamide) homopolymers in aqueous solution and compared to previous simulation results in order to demonstrate a start-to-finish workflow for simulation and property prediction. We expect that these tools will greatly expedite the day-to-day computational research of soft-matter simulations and create a robust atomic-scale polymer specification in conjunction with existing polymer structural notations.
Affiliation(s)
- Connor M Davel: Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Timotej Bernat: Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Jeffrey R Wagner: The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
- Michael R Shirts: Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States