1
Chen L, Medrano Sandonas L, Traber P, Dianat A, Tverdokhleb N, Hurevich M, Yitzchaik S, Gutierrez R, Croy A, Cuniberti G. MORE-Q, a dataset for molecular olfactorial receptor engineering by quantum mechanics. Sci Data 2025; 12:324. [PMID: 39987132 PMCID: PMC11846975 DOI: 10.1038/s41597-025-04616-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Accepted: 02/11/2025] [Indexed: 02/24/2025] Open
Abstract
We introduce the MORE-Q dataset, a quantum-mechanical (QM) dataset encompassing the structural and electronic data of non-covalent molecular sensors formed by combining 18 mucin-derived olfactorial receptors with 102 body odor volatilome (BOV) molecules. To have a better understanding of their intra- and inter-molecular interactions, we have performed accurate QM calculations in different stages of the sensor design and, accordingly, MORE-Q splits into three subsets: i) MORE-Q-G1: QM data of 18 receptors and 102 BOV molecules, ii) MORE-Q-G2: QM data of 23,838 BOV-receptor configurations, and iii) MORE-Q-G3: QM data of 1,836 BOV-receptor-graphene systems. Each subset involves geometries optimized using GFN2-xTB with D4 dispersion correction and up to 39 physicochemical properties, including global and local properties as well as binding features, all computed at the tightly converged PBE+D3 level of theory. By addressing BOV-receptor-graphene systems from a QM perspective, MORE-Q can serve as a benchmark dataset for state-of-the-art machine learning methods developed to predict binding features. This, in turn, can provide valuable insights for developing the next-generation mucin-derived olfactory receptor sensing devices.
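As a quick consistency check on the counts quoted in this abstract, the sketch below enumerates every receptor-BOV pair. The labels `R01`, `BOV001`, and so on are hypothetical placeholders, not identifiers from MORE-Q, and reading the 1,836 figure as one graphene-supported system per pair is an inference from the arithmetic rather than something the abstract states.

```python
from itertools import product

# Counts quoted in the abstract.
N_RECEPTORS = 18
N_BOV = 102
N_G2_CONFIGS = 23_838
N_G3_SYSTEMS = 1_836

# Hypothetical labels, used only to enumerate the pairs.
receptors = [f"R{i:02d}" for i in range(1, N_RECEPTORS + 1)]
bov_molecules = [f"BOV{i:03d}" for i in range(1, N_BOV + 1)]

# Every receptor combined with every BOV molecule.
pairs = list(product(receptors, bov_molecules))
avg_configs_per_pair = N_G2_CONFIGS / len(pairs)

print(len(pairs))                     # 1836, equal to the G3 system count
print(f"{avg_configs_per_pair:.2f}")  # 12.98 G2 configurations per pair on average
```

Under these assumptions, the G2 subset holds just under 13 configurations per receptor-BOV pair on average; how the configurations are actually distributed across pairs is not given in the abstract.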
Affiliation(s)
- Li Chen
  - Institute for Materials Science and Max Bergmann Center for Biomaterials, TUD Dresden University of Technology, 01062, Dresden, Germany
- Leonardo Medrano Sandonas
  - Institute for Materials Science and Max Bergmann Center for Biomaterials, TUD Dresden University of Technology, 01062, Dresden, Germany
- Philipp Traber
  - Institute of Physical Chemistry, Friedrich Schiller University Jena, 07737, Jena, Germany
- Arezoo Dianat
  - Institute for Materials Science and Max Bergmann Center for Biomaterials, TUD Dresden University of Technology, 01062, Dresden, Germany
- Nina Tverdokhleb
  - Institute for Materials Science and Max Bergmann Center for Biomaterials, TUD Dresden University of Technology, 01062, Dresden, Germany
- Mattan Hurevich
  - Institute of Chemistry and Center of Nanotechnology, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Shlomo Yitzchaik
  - Institute of Chemistry and Center of Nanotechnology, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Rafael Gutierrez
  - Institute for Materials Science and Max Bergmann Center for Biomaterials, TUD Dresden University of Technology, 01062, Dresden, Germany
- Alexander Croy
  - Institute of Physical Chemistry, Friedrich Schiller University Jena, 07737, Jena, Germany
- Gianaurelio Cuniberti
  - Institute for Materials Science and Max Bergmann Center for Biomaterials, TUD Dresden University of Technology, 01062, Dresden, Germany
  - Dresden Center for Computational Materials Science (DCMS), TUD Dresden University of Technology, 01062, Dresden, Germany
2
Chen J, Gao Q, Huang M, Yu K. Application of modern artificial intelligence techniques in the development of organic molecular force fields. Phys Chem Chem Phys 2025; 27:2294-2319. [PMID: 39820957 DOI: 10.1039/d4cp02989e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
The molecular force field (FF) determines the accuracy of molecular dynamics (MD) and is one of the major bottlenecks that limits the application of MD in molecular design. Recently, artificial intelligence (AI) techniques, such as machine-learning potentials (MLPs), have been rapidly reshaping the landscape of MD. Meanwhile, organic molecular systems feature unique characteristics, and require more careful treatment in model construction, optimization, and validation. While an accurate and generic organic molecular force field is still missing, significant progress has been made with the facilitation of AI, warranting a promising future. In this review, we provide an overview of the various types of AI techniques used in molecular FF development and discuss both the advantages and weaknesses of these methodologies. We show how AI methods provide unprecedented capabilities in many tasks such as potential fitting, atom typification, and automatic optimization. Meanwhile, it is also worth noting that more efforts are needed to improve the transferability of the model, develop a more comprehensive database, and establish more standardized validation procedures. With these discussions, we hope to inspire more efforts to solve the existing problems, eventually leading to the birth of next-generation generic organic FFs.
Affiliation(s)
- Junmin Chen
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
  - Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Qian Gao
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Miaofei Huang
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Kuang Yu
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
  - Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
3
Chen M, Jiang X, Zhang L, Chen X, Wen Y, Gu Z, Li X, Zheng M. The emergence of machine learning force fields in drug design. Med Res Rev 2024; 44:1147-1182. [PMID: 38173298 DOI: 10.1002/med.22008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
In the field of molecular simulation for drug design, traditional molecular mechanics force fields and quantum chemical theories have been instrumental but limited in terms of scalability and computational efficiency. To overcome these limitations, machine learning force fields (MLFFs) have emerged as a powerful tool capable of balancing accuracy with efficiency. MLFFs rely on the relationship between molecular structures and potential energy, bypassing the need for a preconceived notion of interaction representations. Their accuracy depends on the machine learning models used, and the quality and volume of training data sets. With recent advances in equivariant neural networks and high-quality datasets, MLFFs have significantly improved their performance. This review explores MLFFs, emphasizing their potential in drug design. It elucidates MLFF principles, provides development and validation guidelines, and highlights successful MLFF implementations. It also addresses potential challenges in developing and applying MLFFs. The review concludes by illuminating the path ahead for MLFFs, outlining the challenges to be overcome and the opportunities to be harnessed. This inspires researchers to embrace MLFFs in their investigations as a new tool to perform molecular simulations in drug design.
Affiliation(s)
- Mingan Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
  - Lingang Laboratory, Shanghai, China
- Xinyu Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Lehan Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Xiaoxu Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Yiming Wen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Zhiyong Gu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
4
Kumar YB, Kumar N, John L, Mahanta HJ, Vaikundamani S, Nagamani S, Sastry GM, Sastry GN. Analyzing the cation-aromatic interactions in proteins: Cation-aromatic database V2.0. Proteins 2024; 92:179-191. [PMID: 37789571 DOI: 10.1002/prot.26600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 08/17/2023] [Accepted: 09/07/2023] [Indexed: 10/05/2023]
Abstract
The cation-aromatic database (CAD) is a comprehensive repository of cation-aromatic motifs found in experimentally determined protein structures, first reported in 2007 [Proteins, 2007, 67, 1179]. The present article is an update of CAD that contains information on approximately 27.26 million cation-aromatic motifs. CAD uses three distance parameters (r, d1, and d2) to determine the position of the cation relative to the centroid of the aromatic residue and classifies the motifs as cation-π or cation-σ interactions. As of June 2023, 193 936 protein structures were retrieved from the Protein Data Bank, and this resulted in the identification of 27 255 817 cation-aromatic motifs. Among these motifs, spherical motifs constituted 94.09%, while cylindrical motifs made up the remaining 5.91%. When considering the interaction of metal ions with aromatic residues, 965 564 motifs are identified. Remarkably, 82.08% of these motifs involved the binding of metal ions to the amino acid HIS. Moreover, the analysis of binding preferences between cations and aromatic residues revealed that the HIS-HIS, PHE-ARG, and TRP-ARG pairs exhibited a preferential geometry. The motif pair HIS-HIS was the most prevalent, accounting for 19.87% of the total, followed by TYR-LYS at 10.17%. Conversely, the motif pair TRP-HIS had the lowest occurrence, representing only 4.20% of the total. The data generated help in revealing the characteristics and biological functions of cation-aromatic interactions in biological molecules. The updated version of CAD (Cation-Aromatic Database V2.0) can be accessed at https://acds.neist.res.in/cadv2.
Affiliation(s)
- Y Bhargav Kumar
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Nandan Kumar
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Lijo John
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- S Vaikundamani
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Selvaraman Nagamani
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- G Narahari Sastry
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
5
Czernek J, Brus J. Reliable Dimerization Energies for Modeling of Supramolecular Junctions. Int J Mol Sci 2024; 25:602. [PMID: 38203773 PMCID: PMC10778993 DOI: 10.3390/ijms25010602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 12/23/2023] [Accepted: 12/25/2023] [Indexed: 01/12/2024] Open
Abstract
Accurate estimates of intermolecular interaction energy, ΔE, are crucial for modeling the properties of organic electronic materials and many other systems. For a diverse set of 50 dimers comprising up to 50 atoms (Set50-50, with 7 of its members being models of single-stacking junctions), benchmark ΔE data were compiled. They were obtained by the focal-point strategy, which involves computations using the canonical variant of the coupled cluster theory with singles, doubles, and perturbative triples [CCSD(T)] performed while applying a large basis set, along with extrapolations of the respective energy components to the complete basis set (CBS) limit. The resulting ΔE data were used to gauge the performance for the Set50-50 of several density-functional theory (DFT)-based approaches, and of one of the localized variants of the CCSD(T) method. This evaluation revealed that (1) the proposed "silver standard" approach, which employs the localized CCSD(T) method and CBS extrapolations, can be expected to provide accuracy better than two kJ/mol for absolute values of ΔE, and (2) from among the DFT techniques, computationally by far the cheapest approach (termed "ωB97X-3c/vDZP" by its authors) performed remarkably well. These findings are directly applicable in cost-effective yet reliable searches of the potential energy surfaces of noncovalent complexes.
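The focal-point idea mentioned in this abstract can be written compactly. The block below is a sketch of one commonly used form, not the authors' exact protocol: the abstract does not say which energy components are extrapolated, which basis sets are used, or whether a counterpoise correction is applied, so the basis label $X$ and the three-term split are assumptions.

```latex
% Supermolecular interaction energy of a dimer AB
\Delta E = E_{AB} - E_{A} - E_{B}

% One common focal-point estimate of the CCSD(T)/CBS interaction energy:
% Hartree-Fock and MP2 correlation contributions extrapolated to the CBS limit,
% plus a higher-order correction evaluated in a finite basis X
\Delta E^{\mathrm{CCSD(T)}}_{\mathrm{CBS}}
  \approx \Delta E^{\mathrm{HF}}_{\mathrm{CBS}}
  + \Delta E^{\mathrm{MP2\,corr}}_{\mathrm{CBS}}
  + \left( \Delta E^{\mathrm{CCSD(T)}}_{X} - \Delta E^{\mathrm{MP2}}_{X} \right)
```

The bracketed term is usually assumed to converge with basis-set size faster than the individual energies do, which is why it can be evaluated in a smaller basis than the extrapolated components.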
Affiliation(s)
- Jiří Czernek
  - Institute of Macromolecular Chemistry, Czech Academy of Sciences, Heyrovsky Square 2, 16200 Prague, Czech Republic
6
Ochieng SA, Patkowski K. Accurate three-body noncovalent interactions: the insights from energy decomposition. Phys Chem Chem Phys 2023; 25:28621-28637. [PMID: 37874287 DOI: 10.1039/d3cp03938b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
An impressive collection of accurate two-body interaction energies for small complexes has been assembled into benchmark databases and used to improve the performance of multiple density functional, semiempirical, and machine learning methods. Similar benchmark data on nonadditive three-body energies in molecular trimers are comparatively scarce, and the existing ones are practically limited to homotrimers. In this work, we present a benchmark dataset of 20 equilibrium noncovalent interaction energies for a small but diverse selection of 10 heteromolecular trimers. The new 3BHET dataset presents complexes that combine different interactions including π-π, anion-π, cation-π, and various motifs of hydrogen and halogen bonding in each trimer. A detailed symmetry-adapted perturbation theory (SAPT)-based energy decomposition of the two- and three-body interaction energies shows that 3BHET consists of electrostatics- and dispersion-dominated complexes. The nonadditive three-body contribution is dominated by induction, but its influence on the overall bonding type in the complex (as exemplified by its position on the ternary diagram) is quite small. We also tested the extended SAPT (XSAPT) approach which is capable of including some nonadditive interactions in clusters of any size. The resulting three-body dispersion term (obtained from the many-body dispersion formalism) is mostly in good agreement with the supermolecular CCSD(T)-MP2 values and the nonadditive induction term is similar to the three-body SAPT(DFT) data, but the overall three-body XSAPT energies are not very accurate as they are missing the first-order exchange terms.
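For readers unfamiliar with the term, the nonadditive three-body energy of a trimer ABC is what remains of the total interaction energy after all pairwise interaction energies are subtracted. The sketch below illustrates that standard supermolecular definition. The function name and every number are made up for illustration and are not taken from 3BHET, and basis-set superposition corrections are ignored.

```python
from math import isclose


def nonadditive_three_body(e_trimer, e_dimers, e_monomers):
    """Return E_ABC - (E_AB + E_AC + E_BC) + (E_A + E_B + E_C)."""
    return e_trimer - sum(e_dimers) + sum(e_monomers)


# Made-up total energies in arbitrary units (illustration only).
e_a, e_b, e_c = -1.0, -2.0, -3.0
e_ab, e_ac, e_bc = -3.1, -4.2, -5.3
e_abc = -6.7

# Pairwise (two-body) interaction energies of each dimer.
two_body = [e_ab - e_a - e_b, e_ac - e_a - e_c, e_bc - e_b - e_c]

# Total interaction energy of the trimer relative to isolated monomers.
total_interaction = e_abc - (e_a + e_b + e_c)

three_body = nonadditive_three_body(e_abc, [e_ab, e_ac, e_bc], [e_a, e_b, e_c])

# The total splits into the pairwise sum plus the three-body remainder.
assert isclose(total_interaction, sum(two_body) + three_body, abs_tol=1e-9)
print(round(three_body, 6))
```

With these invented numbers the pairwise terms sum to about -0.6 and the total interaction is about -0.7, so the three-body remainder is about -0.1.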
Affiliation(s)
- Sharon A Ochieng
  - Department of Chemistry and Biochemistry, Auburn University, Auburn, Alabama 36849, USA
- Konrad Patkowski
  - Department of Chemistry and Biochemistry, Auburn University, Auburn, Alabama 36849, USA
7
Triestram L, Falcioni F, Popelier PLA. Interacting Quantum Atoms and Multipolar Electrostatic Study of XH···π Interactions. ACS Omega 2023; 8:34844-34851. [PMID: 37779962 PMCID: PMC10535255 DOI: 10.1021/acsomega.3c04149] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/12/2023] [Accepted: 08/31/2023] [Indexed: 10/03/2023]
Abstract
The interaction energies of nine XH···π (X = C, N, and O) benzene-containing van der Waals complexes were analyzed, at the atomic and fragment levels, using QTAIM multipolar electrostatics and the energy partitioning method interacting quantum atoms/fragment (IQA/IQF). These descriptors were paired with the relative energy gradient method, which solidifies the connection between quantum mechanical properties and chemical interpretation. This combination provides a precise understanding, both qualitative and quantitative, of the nature of these interactions, which are ubiquitous in biochemical systems. The formation of the OH···π and NH···π systems is electrostatically driven, with the Qzz component of the quadrupole moment of the benzene carbons interacting with the charges of X and H in XH. Unexpectedly, X-H (X = O, N) plays an intramonomeric role: its electrostatic energy helps the formation of the complex, whereas its covalent energy thwarts it. However, the CH···π interaction is governed by exchange-correlation energies, thereby establishing a covalent character, as opposed to the literature's designation as a noncovalent interaction. Moreover, dispersion energy is relevant, statically and in absolute terms, but less relevant compared to other energy components in terms of the formation of the complex. Multipolar electrostatics are similar across all systems.
Affiliation(s)
- Lena Triestram
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, Great Britain
- Fabio Falcioni
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, Great Britain
- Paul L. A. Popelier
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, Great Britain
8
Spronk SA, Glick ZL, Metcalf DP, Sherrill CD, Cheney DL. A quantum chemical interaction energy dataset for accurately modeling protein-ligand interactions. Sci Data 2023; 10:619. [PMID: 37699937 PMCID: PMC10497680 DOI: 10.1038/s41597-023-02443-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/02/2023] [Accepted: 08/03/2023] [Indexed: 09/14/2023] Open
Abstract
Fast and accurate calculation of intermolecular interaction energies is desirable for understanding many chemical and biological processes, including the binding of small molecules to proteins. The Splinter ["Symmetry-adapted perturbation theory (SAPT0) protein-ligand interaction"] dataset has been created to facilitate the development and improvement of methods for performing such calculations. Molecular fragments representing commonly found substructures in proteins and small-molecule ligands were paired into >9000 unique dimers, assembled into numerous configurations using an approach designed to adequately cover the breadth of the dimers' potential energy surfaces while enhancing sampling in favorable regions. ~1.5 million configurations of these dimers were randomly generated, and a structurally diverse subset of these were minimized to obtain an additional ~80 thousand local and global minima. For all >1.6 million configurations, SAPT0 calculations were performed with two basis sets to complete the dataset. It is expected that Splinter will be a useful benchmark dataset for training and testing various methods for the calculation of intermolecular interaction energies.
Affiliation(s)
- Steven A Spronk
  - Molecular Structure and Design, Bristol Myers Squibb Company, P. O. Box 5400, Princeton, NJ, 08543, USA
- Zachary L Glick
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA
- Derek P Metcalf
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA
- C David Sherrill
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA
- Daniel L Cheney
  - Molecular Structure and Design, Bristol Myers Squibb Company, P. O. Box 5400, Princeton, NJ, 08543, USA
9
Villot C, Lao KU. Electronic structure theory on modeling short-range noncovalent interactions between amino acids. J Chem Phys 2023; 158:094301. [PMID: 36889981 DOI: 10.1063/5.0138032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023] Open
Abstract
While short-range noncovalent interactions (NCIs) are proving to be of importance in many chemical and biological systems, these atypical bindings happen within the so-called van der Waals envelope and pose an enormous challenge for current computational methods. We introduce SNCIAA, a database of 723 benchmark interaction energies of short-range noncovalent interactions between neutral/charged amino acids originating from protein x-ray crystal structures at the "gold standard" coupled-cluster with singles, doubles, and perturbative triples/complete basis set [CCSD(T)/CBS] level of theory with a mean absolute binding uncertainty less than 0.1 kcal/mol. Subsequently, a systematic assessment of commonly used computational methods, such as the second-order Møller-Plesset theory (MP2), density functional theory (DFT), symmetry-adapted perturbation theory (SAPT), composite electronic-structure methods, semiempirical approaches, and the physical-based potentials with machine learning (IPML) on SNCIAA is carried out. It is shown that the inclusion of dispersion corrections is essential even though these dimers are dominated by electrostatics, such as hydrogen bonds and salt bridges. Overall, MP2, ωB97M-V, and B3LYP+D4 turned out to be the most reliable methods for the description of short-range NCIs even in strongly attractive/repulsive complexes. SAPT is also recommended in describing short-range NCIs only if the δMP2 correction has been included. The good performance of IPML for dimers at close-equilibrium and long-range conditions is not transferable to the short-range. We expect that SNCIAA will assist the development/improvement/validation of computational methods, such as DFT, force-fields, and ML models, in describing NCIs across entire potential energy surfaces (short-, intermediate-, and long-range NCIs) on the same footing.
Affiliation(s)
- Corentin Villot
  - Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia 23284, USA
- Ka Un Lao
  - Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia 23284, USA
10
Summers TJ, Hemmati R, Miller JE, Agbaglo DA, Cheng Q, DeYonker NJ. Evaluating the active site-substrate interplay between x-ray crystal structure and molecular dynamics in chorismate mutase. J Chem Phys 2023; 158:065101. [PMID: 36792523 DOI: 10.1063/5.0127106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023] Open
Abstract
Designing realistic quantum mechanical (QM) models of enzymes is dependent on reliably discerning and modeling residues, solvents, and cofactors important in crafting the active site microenvironment. Interatomic van der Waals contacts have previously demonstrated usefulness toward designing QM-models, but their measured values (and subsequent residue importance rankings) are expected to be influenceable by subtle changes in protein structure. Using chorismate mutase as a case study, this work examines the differences in ligand-residue interatomic contacts between an x-ray crystal structure and structures from a molecular dynamics simulation. Select structures are further analyzed using symmetry adapted perturbation theory to compute ab initio ligand-residue interaction energies. The findings of this study show that ligand-residue interatomic contacts measured for an x-ray crystal structure are not predictive of active site contacts from a sampling of molecular dynamics frames. In addition, the variability in interatomic contacts among structures is not correlated with variability in interaction energies. However, the results spotlight using interaction energies to characterize and rank residue importance in future computational enzymology workflows.
Affiliation(s)
- Thomas J Summers
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Reza Hemmati
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Justin E Miller
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Donatus A Agbaglo
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Qianyi Cheng
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
- Nathan J DeYonker
  - Department of Chemistry, The University of Memphis, 213 Smith Chemistry Building, Memphis, Tennessee 38152-3550, USA
11
Kříž K, Schmidt L, Andersson AT, Walz MM, van der Spoel D. An Imbalance in the Force: The Need for Standardized Benchmarks for Molecular Simulation. J Chem Inf Model 2023; 63:412-431. [PMID: 36630710 PMCID: PMC9875315 DOI: 10.1021/acs.jcim.2c01127] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/06/2022] [Indexed: 01/12/2023]
Abstract
Force fields (FFs) for molecular simulation have been under development for more than half a century. As with any predictive model, rigorous testing and comparisons of models critically depend on the availability of standardized data sets and benchmarks. While such benchmarks are rather common in the field of quantum chemistry, this is not the case for empirical FFs. That is, few benchmarks are reused to evaluate FFs, and development teams rather use their own training and test sets. Here we present an overview of currently available tests and benchmarks for computational chemistry, focusing on organic compounds, including halogens and common ions, as FFs for these are the most common ones. We argue that many of the benchmark data sets from quantum chemistry can in fact be reused for evaluating FFs, but new gas phase data is still needed for compounds containing phosphorus and sulfur in different valence states. In addition, more nonequilibrium interaction energies and forces, as well as molecular properties such as electrostatic potentials around compounds, would be beneficial. For the condensed phases there is a large body of experimental data available, and tools to utilize these data in an automated fashion are under development. If FF developers, as well as researchers in artificial intelligence, would adopt a number of these data sets, it would become easier to compare the relative strengths and weaknesses of different models and to, eventually, restore the balance in the force.
Affiliation(s)
- Kristian Kříž
  - Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- Lisa Schmidt
  - Faculty of Biosciences, University of Heidelberg, Heidelberg 69117, Germany
- Alfred T. Andersson
  - Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- Marie-Madeleine Walz
  - Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- David van der Spoel
  - Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
12
Nagy PR, Gyevi-Nagy L, Lőrincz BD, Kállay M. Pursuing the basis set limit of CCSD(T) non-covalent interaction energies for medium-sized complexes: case study on the S66 compilation. Mol Phys 2022. [DOI: 10.1080/00268976.2022.2109526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Affiliation(s)
- Péter R. Nagy
- Faculty of Chemical Technology and Biotechnology, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
- ELKH-BME Quantum Chemistry Research Group, Budapest, Hungary
| | - László Gyevi-Nagy
- Faculty of Chemical Technology and Biotechnology, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
- ELKH-BME Quantum Chemistry Research Group, Budapest, Hungary
| | - Balázs D. Lőrincz
- Faculty of Chemical Technology and Biotechnology, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
- ELKH-BME Quantum Chemistry Research Group, Budapest, Hungary
| | - Mihály Kállay
- Faculty of Chemical Technology and Biotechnology, Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
- ELKH-BME Quantum Chemistry Research Group, Budapest, Hungary
| |
|
13
|
Sparrow ZM, Ernst BG, Quady TK, DiStasio RA. Uniting Nonempirical and Empirical Density Functional Approximation Strategies Using Constraint-Based Regularization. J Phys Chem Lett 2022; 13:6896-6904. [PMID: 35863751 DOI: 10.1021/acs.jpclett.2c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
In this work, we present a general framework that unites the two primary strategies for constructing density functional approximations (DFAs): nonempirical (NE) constraint satisfaction and empirical (E) data-driven optimization. The proposed method employs B-splines, bell-shaped spline functions with compact support, to construct each inhomogeneity correction factor (ICF). This choice offers several distinct advantages over traditional polynomial expansions by enabling explicit enforcement of linear and nonlinear constraints as well as ICF smoothness using Tikhonov and penalized B-splines (P-splines) regularization. As proof-of-concept, we use the so-called CASE (constrained and smoothed empirical) framework to construct a constraint-satisfying and data-driven global hybrid that exhibits enhanced performance across a diverse set of chemical properties. We argue that the CASE approach can be used to generate DFAs that maintain the physical rigor and transferability of NE-DFAs while leveraging high-quality quantum-mechanical data to remove the arbitrariness of ansatz selection and improve performance.
Affiliation(s)
- Zachary M Sparrow
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
| | - Brian G Ernst
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
| | - Trine K Quady
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
| | - Robert A DiStasio
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
| |
|
14
|
Prasad VK, Otero-de-la-Roza A, DiLabio GA. Small-Basis Set Density-Functional Theory Methods Corrected with Atom-Centered Potentials. J Chem Theory Comput 2022; 18:2913-2930. [PMID: 35412817 DOI: 10.1021/acs.jctc.2c00036] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Density functional theory (DFT) is currently the most popular method for modeling noncovalent interactions and thermochemistry. The accurate calculation of noncovalent interaction energies, reaction energies, and barrier heights requires choosing an appropriate functional and, typically, a relatively large basis set. Deficiencies of the density-functional approximation and the use of a limited basis set are the leading sources of error in the calculation of noncovalent and thermochemical properties in molecular systems. In this article, we present three new DFT methods based on the BLYP, M06-2X, and CAM-B3LYP functionals in combination with the 6-31G* basis set and corrected with atom-centered potentials (ACPs). ACPs are one-electron potentials that have the same form as effective-core potentials, except they do not replace any electrons. The ACPs developed in this work are used to generate energy corrections to the underlying DFT/basis-set method such that the errors in predicted chemical properties are minimized while maintaining the low computational cost of the parent methods. ACPs were developed for the elements H, B, C, N, O, F, Si, P, S, and Cl. The ACP parameters were determined using an extensive training set of 118655 data points, mostly of complete basis set coupled-cluster level quality. The target molecular properties for the ACP-corrected methods include noncovalent interaction energies, molecular conformational energies, reaction energies, barrier heights, and bond separation energies. The ACPs were tested first on the training set and then on a validation set of 42567 additional data points. We show that the ACP-corrected methods can predict the target molecular properties with accuracy close to complete basis set wavefunction theory methods, but at a computational cost of double-ζ DFT methods. 
This makes the new BLYP/6-31G*-ACP, M06-2X/6-31G*-ACP, and CAM-B3LYP/6-31G*-ACP methods uniquely suited to the calculation of noncovalent, thermochemical, and kinetic properties in large molecular systems.
Affiliation(s)
- Viki Kumar Prasad
- Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada
| | - Alberto Otero-de-la-Roza
- Departamento de Química Física y Analítica, Facultad de Química, Universidad de Oviedo, MALTA Consolider Team, Oviedo E-33006, Spain
| | - Gino A DiLabio
- Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada
| |
|
15
|
Prasad VK, Otero-de-la-Roza A, DiLabio GA. Fast and Accurate Quantum Mechanical Modeling of Large Molecular Systems Using Small Basis Set Hartree-Fock Methods Corrected with Atom-Centered Potentials. J Chem Theory Comput 2022; 18:2208-2232. [PMID: 35313106 DOI: 10.1021/acs.jctc.1c01128] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022]
Abstract
There has been significant interest in developing fast and accurate quantum mechanical methods for modeling large molecular systems. In this work, by utilizing a machine learning regression technique, we have developed new low-cost quantum mechanical approaches to model large molecular systems. The developed approaches rely on using one-electron Gaussian-type functions called atom-centered potentials (ACPs) to correct for the basis set incompleteness and the lack of correlation effects in the underlying minimal or small basis set Hartree-Fock (HF) methods. In particular, ACPs are proposed for ten elements common in organic and bioorganic chemistry (H, B, C, N, O, F, Si, P, S, and Cl) and four different base methods: two minimal basis sets (MINIs and MINIX) plus a double-ζ basis set (6-31G*) in combination with dispersion-corrected HF (HF-D3/MINIs, HF-D3/MINIX, HF-D3/6-31G*) and the HF-3c method. The new ACPs are trained on a very large set (73 832 data points) of noncovalent properties (interaction and conformational energies) and validated additionally on a set of 32 048 data points. All reference data are of complete basis set coupled-cluster quality, mostly CCSD(T)/CBS. The proposed ACP-corrected methods are shown to give errors in the tenths of a kcal/mol range for noncovalent interaction energies and up to 2 kcal/mol for molecular conformational energies. More importantly, the average errors are similar in the training and validation sets, confirming the robustness and applicability of these methods outside the boundaries of the training set. In addition, the performance of the new ACP-corrected methods is similar to complete basis set density functional theory (DFT) but at a cost that is orders of magnitude lower, and the proposed ACPs can be used in any computational chemistry program that supports effective-core potentials without modification. 
It is also shown that ACPs improve the description of covalent and noncovalent bond geometries of the underlying methods and that the improvement brought about by the application of the ACPs is directly related to the number of atoms to which they are applied, allowing the treatment of systems containing some atoms for which ACPs are not available. Overall, the ACP-corrected methods proposed in this work constitute an alternative accurate, economical, and reliable quantum mechanical approach to describe the geometries, interaction energies, and conformational energies of systems with hundreds to thousands of atoms.
Affiliation(s)
- Viki Kumar Prasad
- Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
| | - Alberto Otero-de-la-Roza
- MALTA Consolider Team, Departamento de Química Física y Analítica, Facultad de Química, Universidad de Oviedo, E-33006 Oviedo, Spain
| | - Gino A DiLabio
- Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
| |
|
16
|
Santra G, Semidalas E, Mehta N, Karton A, Martin JML. S66x8 noncovalent interactions revisited: new benchmark and performance of composite localized coupled-cluster methods. Phys Chem Chem Phys 2022; 24:25555-25570. [DOI: 10.1039/d2cp03938a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
The S66x8 noncovalent interactions benchmark has been re-evaluated at the “sterling silver” level. Against this, a selection of computationally more economical alternatives has been assayed, ranging from localized CC to double hybrids and SAPT(DFT).
Affiliation(s)
- Golokesh Santra
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, 7610001 Reḥovot, Israel
| | - Emmanouil Semidalas
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, 7610001 Reḥovot, Israel
| | - Nisha Mehta
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, 7610001 Reḥovot, Israel
| | - Amir Karton
- School of Molecular Sciences, The University of Western Australia, Perth, WA 6009, Australia
- School of Science and Technology, University of New England, Armidale, NSW 2351, Australia
| | - Jan M. L. Martin
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, 7610001 Reḥovot, Israel
| |
|
17
|
Řezáč J. Non-Covalent Interactions Atlas Benchmark Data Sets 5: London Dispersion in an Extended Chemical Space. Phys Chem Chem Phys 2022; 24:14780-14793. [DOI: 10.1039/d2cp01602h] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
The Non-Covalent Interactions Atlas (www.nciatlas.org) has been extended with two data sets of benchmark interaction energies in complexes dominated by London dispersion. The D1200 data set of equilibrium geometries provides...
|