1
Plett C, Grimme S, Hansen A. Toward Reliable Conformational Energies of Amino Acids and Dipeptides─The DipCONFS Benchmark and DipCONL Datasets. J Chem Theory Comput 2024. [PMID: 39259679 DOI: 10.1021/acs.jctc.4c00801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2024]
Abstract
Simulating peptides and proteins is becoming increasingly important, leading to a growing need for efficient computational methods. These are typically semiempirical quantum mechanical (SQM) methods, force fields (FFs), or machine-learned interatomic potentials (MLIPs), all of which require a large amount of accurate data for robust training and evaluation. To assess potential reference methods and complement the available data, we introduce two sets, DipCONFL and DipCONFS, which cover large parts of the conformational space of 17 amino acids and their 289 possible dipeptides in aqueous solution. The conformers were selected from the exhaustive PeptideCS dataset by Andris et al. [J. Phys. Chem. B 2022, 126, 5949-5958]. The structures, originally generated with GFN2-xTB, were reoptimized using the accurate r2SCAN-3c density functional theory (DFT) composite method including the implicit CPCM water solvation model. The DipCONFS benchmark set contains 918 conformers and is one of the largest sets with highly accurate coupled cluster conformational energies so far. It is employed to evaluate various DFT and wave function theory (WFT) methods, especially regarding whether they are accurate enough to be used as reliable reference methods for larger datasets intended for training and testing more approximate SQM, FF, and MLIP methods. The results reveal that the originally provided BP86-D3(BJ)/DGauss-DZVP conformational energies are not sufficiently accurate. Among the DFT methods tested as an alternative reference level, the revDSD-PBEP86-D4 double hybrid performs best with a mean absolute deviation (MAD) of 0.2 kcal mol⁻¹ compared with the PNO-LCCSD(T)-F12b reference. The very efficient r2SCAN-3c composite method also shows excellent results, with an MAD of 0.3 kcal mol⁻¹, similar to the best-tested hybrid ωB97M-D4.
With these findings, we compiled the large DipCONFL set, which includes over 29,000 realistic conformers in solution with reasonably accurate r2SCAN-3c reference conformational energies, gradients, and further properties potentially relevant for training MLIP methods. This set, also in comparison to DipCONFS, is used to assess the performance of various SQM, FF, and MLIP methods robustly and can complement training sets for those.
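The MAD values quoted in this abstract compare relative conformational energies from a candidate method against a reference method. The following is a minimal sketch of that kind of comparison, not the authors' evaluation script. The function name, the choice of the reference-lowest conformer as the common zero, and the sample energies are all illustrative assumptions.

```python
import numpy as np


def conformational_mad(reference, candidate):
    """Mean absolute deviation of relative conformational energies.

    Assumption of this sketch: both methods are referenced to the conformer
    that is lowest in energy at the reference level, and that conformer is
    excluded from the average because its deviation is zero by construction.
    """
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape or reference.ndim != 1:
        raise ValueError("expected two 1D arrays of equal length")
    if reference.size < 2:
        raise ValueError("need at least two conformers")

    anchor = int(np.argmin(reference))        # reference-lowest conformer
    rel_ref = reference - reference[anchor]   # relative reference energies
    rel_cand = candidate - candidate[anchor]  # relative candidate energies

    mask = np.arange(reference.size) != anchor
    return float(np.mean(np.abs(rel_cand[mask] - rel_ref[mask])))


if __name__ == "__main__":
    # Hypothetical energies in kcal/mol for four conformers of one molecule.
    reference_kcal = [0.0, 1.2, 2.5, 0.8]
    candidate_kcal = [0.1, 1.0, 2.9, 0.7]
    mad = conformational_mad(reference_kcal, candidate_kcal)
    print(f"MAD = {mad:.2f} kcal/mol")
```

A constant offset between the two methods cancels in this measure, which is why only relative energies enter the statistic.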
Affiliation(s)
- Christoph Plett
  - Mulliken Center for Theoretical Chemistry, Clausius-Institut für Physikalische und Theoretische Chemie, Universität Bonn, Beringstraße 4, 53115 Bonn, Germany
- Stefan Grimme
  - Mulliken Center for Theoretical Chemistry, Clausius-Institut für Physikalische und Theoretische Chemie, Universität Bonn, Beringstraße 4, 53115 Bonn, Germany
- Andreas Hansen
  - Mulliken Center for Theoretical Chemistry, Clausius-Institut für Physikalische und Theoretische Chemie, Universität Bonn, Beringstraße 4, 53115 Bonn, Germany
2
Tehrani A, Richer M, Heidar-Zadeh F. CuGBasis: High-performance CUDA/Python library for efficient computation of quantum chemistry density-based descriptors for larger systems. J Chem Phys 2024; 161:072501. [PMID: 39158048 DOI: 10.1063/5.0216781] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/01/2024] [Accepted: 06/17/2024] [Indexed: 08/20/2024] Open
Abstract
CuGBasis is a free and open-source CUDA®/Python library for efficient computation of scalar, vector, and matrix quantities crucial for the post-processing of electronic structure calculations. CuGBasis integrates high-performance Graphical Processing Unit (GPU) computing with the ease and flexibility of Python programming, making it compatible with a vast ecosystem of libraries. We showcase its utility as a Python library and demonstrate its seamless interoperability with existing Python software to gain chemical insight from quantum chemistry calculations. Leveraging GPU-accelerated code, cuGBasis exhibits remarkable performance, making it highly applicable to larger systems or large databases. Our benchmarks reveal a 100-fold performance gain compared to alternative software packages, including serial/multi-threaded Central Processing Unit and GPU implementations. This paper outlines various features and computational strategies that lead to cuGBasis's enhanced performance, guiding developers of GPU-accelerated code.
Affiliation(s)
- Alireza Tehrani
  - Department of Chemistry, Queen's University, Kingston, Ontario K7L-3N6, Canada
- Michelle Richer
  - Department of Chemistry, Queen's University, Kingston, Ontario K7L-3N6, Canada
- Farnaz Heidar-Zadeh
  - Department of Chemistry, Queen's University, Kingston, Ontario K7L-3N6, Canada
3
Takaba K, Friedman AJ, Cavender CE, Behara PK, Pulido I, Henry MM, MacDermott-Opeskin H, Iacovella CR, Nagle AM, Payne AM, Shirts MR, Mobley DL, Chodera JD, Wang Y. Machine-learned molecular mechanics force fields from large-scale quantum chemical data. Chem Sci 2024; 15:12861-12878. [PMID: 39148808 PMCID: PMC11322960 DOI: 10.1039/d4sc00690a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 06/17/2024] [Indexed: 08/17/2024] Open
Abstract
The development of reliable and extensible molecular mechanics (MM) force fields (fast, empirical models characterizing the potential energy surface of molecular systems) is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework using graph neural networks to overcome the limitations of traditional rule-based methods. Trained in a single GPU-day to fit a large and diverse quantum chemical dataset of over 1.1 M energy and force calculations, espaloma-0.3 reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids. Moreover, this force field maintains the quantum chemical energy-minimized geometries of small molecules and preserves the condensed phase properties of peptides and folded proteins, self-consistently parametrizing proteins and ligands to produce stable simulations leading to highly accurate predictions of binding free energies. This methodology demonstrates significant promise as a path forward for systematically building more accurate force fields that are easily extensible to new chemical domains of interest.
Affiliation(s)
- Kenichiro Takaba
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
  - Pharmaceuticals Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation Shizuoka 410-2321 Japan
- Anika J Friedman
  - Department of Chemical and Biological Engineering, University of Colorado Boulder Boulder CO 80309 USA
- Chapin E Cavender
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego 9500 Gilman Drive La Jolla CA 92093 USA
- Pavan Kumar Behara
  - Center for Neurotherapeutics, Department of Pathology and Laboratory Medicine, University of California Irvine CA 92697 USA
- Iván Pulido
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Michael M Henry
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Christopher R Iacovella
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Arnav M Nagle
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
  - Department of Bioengineering, University of California, Berkeley Berkeley CA 94720 USA
- Alexander Matthew Payne
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
  - Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center New York 10065 USA
- Michael R Shirts
  - Department of Chemical and Biological Engineering, University of Colorado Boulder Boulder CO 80309 USA
- David L Mobley
  - Department of Pharmaceutical Sciences, University of California Irvine California 92697 USA
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Yuanqing Wang
  - Simons Center for Computational Physical Chemistry and Center for Data Science, New York University New York NY 10004 USA
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
4
Slattery SA, Yon JC, Valeev EF. Revisiting Artifacts of Kohn-Sham Density Functionals for Biosimulation. J Chem Theory Comput 2024; 20:6652-6660. [PMID: 39083031 PMCID: PMC11325537 DOI: 10.1021/acs.jctc.4c00712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
We revisit the problem of unphysical charge density delocalization/fractionalization induced by the self-interaction error of common approximate Kohn-Sham (KS) density functional theory functionals on simulation of small to medium-sized proteins in a vacuum. Aside from producing unphysical electron densities and total energies, the vanishing of the HOMO-LUMO gap associated with the unphysical charge delocalization leads to an unphysical low-energy spectrum and catastrophic failure of most popular solvers for the KS self-consistent field (SCF) problem. We apply a robust quasi-Newton SCF solver [ Phys. Chem. Chem. Phys. 2024, 26, 6557] to obtain solutions for some of these difficult cases. The anatomy of the charge delocalization is revealed by the natural deformation orbitals obtained from the density matrix difference between the Hartree-Fock and KS solutions; the charge delocalization not only can occur between charged fragments (such as in zwitterionic polypeptides) but also involves neutral fragments. The vanishing-gap phenomenon and troublesome SCF convergence are both attributed to the unphysical KS Fock operator eigenspectra of molecular fragments (e.g., amino acids or their side chains). Analysis of amino acid pairs suggests that the unphysical charge delocalization can be partially ameliorated by the use of some range-separated hybrid functionals but not by semilocal or standard hybrid functionals. Last, we demonstrate that solutions without the unphysical charge delocalization can be located even for semilocal KS functionals highly prone to such defects, but such solutions have non-Aufbau character and are unstable with respect to mixing of the non-overlapping "frontier" orbitals. Caution should be exercised when unexpectedly small (or vanishing) HOMO-LUMO gaps and atypical SCF convergence patterns (e.g., oscillatory) are observed in KS DFT simulations in any context (bio or otherwise).
Affiliation(s)
- Samuel A Slattery
  - Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
- Jaden C Yon
  - Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
- Edward F Valeev
  - Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
5
Grambow CA, Weir H, Cunningham CN, Biancalani T, Chuang KV. CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning. Sci Data 2024; 11:859. [PMID: 39122750 PMCID: PMC11316032 DOI: 10.1038/s41597-024-03698-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024] Open
Abstract
Computational and machine learning approaches to model the conformational landscape of macrocyclic peptides have the potential to enable rational design and optimization. However, accurate, fast, and scalable methods for modeling macrocycle geometries remain elusive. Recent deep learning approaches have significantly accelerated protein structure prediction and the generation of small-molecule conformational ensembles, yet similar progress has not been made for macrocyclic peptides due to their unique properties. Here, we introduce CREMP, a resource generated for the rapid development and evaluation of machine learning models for macrocyclic peptides. CREMP contains 36,198 unique macrocyclic peptides and their high-quality structural ensembles generated using the Conformer-Rotamer Ensemble Sampling Tool (CREST). Altogether, this new dataset contains nearly 31.3 million unique macrocycle geometries, each annotated with energies derived from semi-empirical extended tight-binding (xTB) DFT calculations. Additionally, we include 3,258 macrocycles with reported passive permeability data to couple conformational ensembles to experiment. We anticipate that this dataset will enable the development of machine learning models that can improve peptide design and optimization for novel therapeutics.
Affiliation(s)
- Colin A Grambow
  - Prescient Design, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Hayley Weir
  - Prescient Design, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Christian N Cunningham
  - Department of Peptide Therapeutics, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Tommaso Biancalani
  - Biology Research | Development, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
- Kangway V Chuang
  - Prescient Design, Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
6
Poole D, Williams-Young DB, Jiang A, Glick ZL, Sherrill CD. A modular, composite framework for the utilization of reduced-scaling Coulomb and exchange construction algorithms: Design and implementation. J Chem Phys 2024; 161:052503. [PMID: 39092936 DOI: 10.1063/5.0216760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 07/08/2024] [Indexed: 08/04/2024] Open
Abstract
Multiple algorithms exist for calculating Coulomb (J) or exchange (K) contributions to Fock-like matrices, and it is beneficial to develop a framework that allows the seamless integration and combination of different J and K construction algorithms. In Psi4, we have implemented the "CompositeJK" formalism for this purpose. CompositeJK allows for the combination of any J and K construction algorithms for any quantum chemistry method formulated in terms of J-like or K-like matrices (including, but not limited to, Hartree-Fock and density functional theory) in a highly modular and intuitive fashion, which is simple to utilize for both developers and users. Using the CompositeJK framework, Psi4 was interfaced to the sn-LinK implementation in the GauXC library, adding the first instance of noncommercial graphics processing unit (GPU) support for the construction of Fock matrix elements to Psi4. On systems with hundreds of atoms, the interface to the CPU sn-LinK implementation displays a higher performance than all the alternative JK construction methods available in Psi4, with speedups of up to 2.8× compared to existing Psi4 JK implementations. The GPU sn-LinK implementation raises the observed speedups to up to 7.0×.
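The idea of pairing an arbitrary J builder with an arbitrary K builder can be illustrated with a small NumPy sketch. This is not Psi4 or GauXC code, and every function name here is hypothetical. The sketch assumes a dense four-index integral tensor `eri[p, q, r, s] = (pq|rs)`, which is only practical for tiny systems.

```python
import numpy as np


def build_j_einsum(eri, density):
    """Coulomb matrix: J_pq = sum_rs (pq|rs) D_rs."""
    return np.einsum("pqrs,rs->pq", eri, density)


def build_j_loops(eri, density):
    """Same Coulomb matrix from explicit loops, standing in for a second algorithm."""
    n = density.shape[0]
    j = np.zeros((n, n))
    for p in range(n):
        for q in range(n):
            j[p, q] = np.sum(eri[p, q] * density)
    return j


def build_k_einsum(eri, density):
    """Exchange matrix: K_pq = sum_rs (pr|qs) D_rs."""
    return np.einsum("prqs,rs->pq", eri, density)


def composite_jk(j_builder, k_builder):
    """Combine any J builder with any K builder behind one call."""
    def build(eri, density):
        return j_builder(eri, density), k_builder(eri, density)
    return build


def fock_closed_shell(h_core, j, k):
    """Closed-shell Hartree-Fock Fock matrix for D = C_occ C_occ^T: F = h + 2J - K."""
    return h_core + 2.0 * j - k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 3
    eri = rng.normal(size=(n, n, n, n))   # random stand-in for real integrals
    density = rng.normal(size=(n, n))
    h_core = np.zeros((n, n))

    jk = composite_jk(build_j_loops, build_k_einsum)
    j, k = jk(eri, density)
    print(fock_closed_shell(h_core, j, k).shape)
```

Because the caller only sees the combined builder, swapping `build_j_loops` for `build_j_einsum` changes the algorithm without touching the code that assembles the Fock matrix.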
Affiliation(s)
- David Poole
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
- David B Williams-Young
  - Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Andy Jiang
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
- Zachary L Glick
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
- C David Sherrill
  - Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
7
Xu R, Jiang Z, Yang Q, Bloino J, Biczysko M. Harmonic and anharmonic vibrational computations for biomolecular building blocks: Benchmarking DFT and basis sets by theoretical and experimental IR spectrum of glycine conformers. J Comput Chem 2024; 45:1846-1869. [PMID: 38682874 DOI: 10.1002/jcc.27377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 04/01/2024] [Accepted: 04/02/2024] [Indexed: 05/01/2024]
Abstract
Advanced vibrational spectroscopic experiments have reached a level of sophistication that can only be matched by numerical simulations in order to provide an unequivocal analysis, a crucial step to understand the structure-function relationship of biomolecules. While density functional theory (DFT) has become the standard method when targeting medium-size or larger systems, the problems with its reliability and accuracy are well known and have been abundantly documented. To establish a reliable computational protocol, especially when accuracy is critical, a tailored benchmark is usually required. This is generally done over a short list of known candidates, with the basis set often fixed a priori. In this work, we present a systematic study of the performance of DFT-based hybrid and double-hybrid functionals in the prediction of vibrational energies and infrared intensities at the harmonic level and beyond, considering anharmonic effects through vibrational perturbation theory at the second order. The study is performed for the six lowest-energy glycine conformers, utilizing available "state-of-the-art" accurate theoretical and experimental data as reference. Focusing on the most intense fundamental vibrations in the mid-infrared range of glycine conformers, the role of the basis sets is also investigated considering the balance between computational cost and accuracy. Targeting larger systems, a broad range of hybrid schemes with different computational costs is also tested.
Affiliation(s)
- Ruiqin Xu
  - Department of Physics, College of Sciences, Shanghai University, Shanghai, China
- Qin Yang
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Science, Prague, Czechia
- Julien Bloino
  - Classe di Scienze, Scuola Normale Superiore, Pisa, Italy
- Malgorzata Biczysko
  - Department of Physics, College of Sciences, Shanghai University, Shanghai, China
8
Lee HJ, Liu SW, Sulyok-Eiler M, Harmat V, Farkas V, Bánóczi Z, El Khabchi M, Shawn Fan HJ, Hirao K, Song JW. Neighbor effect on conformational spaces of alanine residue in azapeptides. Heliyon 2024; 10:e33159. [PMID: 39021983 PMCID: PMC11253059 DOI: 10.1016/j.heliyon.2024.e33159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 06/10/2024] [Accepted: 06/14/2024] [Indexed: 07/20/2024] Open
Abstract
The conformational properties of the alanine (Ala) residue have been investigated to understand protein folding and develop force fields. In this work, we examined the neighbor effect on the conformational spaces of the Ala residue using model azapeptides, Ac-Ala-azaGly-NHMe (3, AaG), and Ac-azaGly-Ala-NHMe (4, aGA1). Ramachandran energy maps were generated by scanning the (φ, ψ) dihedral angles of the Ala residues in models with fixed dihedral angles (φ = ±90°, ψ = ±0° or ±180°) of the azaGly residue using the LCgau-BOP and LCgau-BOP + LRD functionals in the gas and water phases. The integral-equation-formalism polarizable continuum model (IEF-PCM) and the solvation model based on density (SMD) were employed to mimic the solvation effect. The most favorable conformation of the Ala residue in the azapeptide models is found to be the polyproline II (βP), inverse γ-turn (γ'), β-sheet (βS), right-handed helix (αR), or left-handed helix (αL), depending on the conformation of the neighboring azaGly residue in isolated form. Calculations with the solvation models show that the Ala residue favors the βP, δR, and αR conformations regardless of its position in azapeptides 3 and 4 in water. Azapeptide 5, Ac-azaGly-Ala-NH2 (aGA2), was synthesized to evaluate the theoretical results. The X-ray structure showed that the azaGly residue adopts the polyproline II (βP) structure and the Ala residue adopts the right-handed helical (αR) structure in aGA2. The conformational preferences of aGA2 and the dimer structure of aGA2 based on the X-ray structure were examined to assess the performance of DFT functionals. In addition, the local minima of azapeptide 6, Ac-Phe-azaGly-NH2 (FaG), were compared with the previous experimental results. SMD/LCgau-BOP + LRD methods agreed well with the reported experimental results. The results suggest the importance of weak dispersion interactions, neighbor effect, and solvent influence in the conformational preferences of the Ala residue in model azapeptides.
Affiliation(s)
- Ho-Jin Lee
  - Division of Natural and Mathematics Sciences, LeMoyne-Owen College, Memphis, TN, 38126, USA
  - Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA
- Shi-Wei Liu
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, PR China
- Máté Sulyok-Eiler
  - Laboratory of Structural Biology and Chemistry, Institute of Chemistry, Eötvös Loránd University, Budapest, Hungary
  - Hevesy György PhD School of Chemistry, Eötvös Loránd University, Budapest, Hungary
- Veronika Harmat
  - Laboratory of Structural Biology and Chemistry, Institute of Chemistry, Eötvös Loránd University, Budapest, Hungary
  - HUN-REN - ELTE Protein Modeling Research Group, Budapest, Hungary
- Viktor Farkas
  - Laboratory of Structural Biology and Chemistry, Institute of Chemistry, Eötvös Loránd University, Budapest, Hungary
  - HUN-REN - ELTE Protein Modeling Research Group, Budapest, Hungary
- Zoltán Bánóczi
  - Department of Organic Chemistry, Institute of Chemistry, ELTE Eötvös Loránd University, 1117, Budapest, Hungary
  - HUN-REN-ELTE Research Group of Peptide Chemistry, 1117, Budapest, Hungary
- Mouna El Khabchi
  - LIMAS, Faculty of Sciences Dhar El Mahraz, University Sidi Mohamed Ben Abdallah, Fez, Morocco
- Hua-Jun Shawn Fan
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, PR China
- Kimihiko Hirao
  - Fukui Institute for Fundamental Chemistry, Kyoto University, Takano, Nishihiraki-cho 34-4, Sakyo-ku, Kyoto, 606-8103, Japan
- Jong-Won Song
  - Department of Chemistry Education, Daegu University, Daegudae-ro 201, Gyeongsan-si, Gyeongsangbuk-do, 38453, Republic of Korea
9
Osifová Z, Kalvoda T, Galgonek J, Culka M, Vondrášek J, Bouř P, Bednárová L, Andrushchenko V, Dračínský M, Rulíšek L. What are the minimal folding seeds in proteins? Experimental and theoretical assessment of secondary structure propensities of small peptide fragments. Chem Sci 2024; 15:594-608. [PMID: 38179543 PMCID: PMC10763034 DOI: 10.1039/d3sc04960d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 11/22/2023] [Indexed: 01/06/2024] Open
Abstract
Certain peptide sequences, some of them as short as amino acid triplets, are significantly overpopulated in specific secondary structure motifs in folded protein structures. For example, 74% of the EAM triplet is found in α-helices, and only 3% occurs in the extended parts of proteins (typically β-sheets). In contrast, other triplets (such as VIV and IYI) appear almost exclusively in extended parts (79% and 69%, respectively). In order to determine whether such preferences are structurally encoded in a particular peptide fragment or appear only at the level of a complex protein structure, NMR, VCD, and ECD experiments were carried out on selected tripeptides: EAM (denoted as pro-'α-helical' in proteins), KAM (α), ALA (α), DIC (α), EKF (α), IYI (pro-β-sheet or, more generally, pro-extended), and VIV (β), and the reference α-helical CATWEAMEKCK undecapeptide. The experimental data were in very good agreement with extensive quantum mechanical conformational sampling. Altogether, we clearly showed that the pro-helical vs. pro-extended propensities start to emerge already at the level of tripeptides and can be fully developed at longer sequences. We postulate that certain short peptide sequences can be considered minimal "folding seeds". Admittedly, the inherent secondary structure propensity can be overruled by the large intramolecular interaction energies within the folded and compact protein structures. Still, the correlation of experimental and computational data presented herein suggests that the secondary structure propensity should be considered as one of the key factors that may lead to understanding the underlying physico-chemical principles of protein structure and folding from first principles.
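Percentages such as "74% of the EAM triplet is found in α-helices" are occupancy statistics over annotated protein structures. The sketch below shows one way such fractions could be counted; it is not the authors' pipeline. The function name, the rule of classifying a triplet by the label of its central residue, and the toy sequence with its H/E/C labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def triplet_propensities(sequence, labels):
    """Fraction of each triplet's occurrences per secondary structure label.

    Assumption of this sketch: a triplet is assigned the label of its
    central residue (H = helix, E = extended, C = other).
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")

    counts = defaultdict(Counter)
    for i in range(len(sequence) - 2):
        triplet = sequence[i:i + 3]
        counts[triplet][labels[i + 1]] += 1

    return {
        triplet: {label: n / sum(counter.values()) for label, n in counter.items()}
        for triplet, counter in counts.items()
    }


if __name__ == "__main__":
    # Toy input, not real protein data.
    sequence = "EAMKEAMVIV"
    labels = "HHHHCCCEEE"
    for triplet, fractions in triplet_propensities(sequence, labels).items():
        print(triplet, fractions)
```

Over a real structure database the same counting would be run across many chains, with the per-triplet counters accumulated before normalizing.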
Affiliation(s)
- Zuzana Osifová
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
  - Department of Organic Chemistry, Faculty of Science, Charles University Hlavova 2030 Prague 128 00 Czech Republic
- Tadeáš Kalvoda
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
- Jakub Galgonek
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
- Martin Culka
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
- Petr Bouř
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
- Lucie Bednárová
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
- Valery Andrushchenko
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
- Martin Dračínský
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
- Lubomír Rulíšek
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences Flemingovo náměstí 2, 160 00, Praha 6 Czech Republic
10
Domenichini G, Dellago C. Molecular Hessian matrices from a machine learning random forest regression algorithm. J Chem Phys 2023; 159:194111. [PMID: 37982481 DOI: 10.1063/5.0169384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 10/27/2023] [Indexed: 11/21/2023] Open
Abstract
In this article, we present a machine learning model to obtain fast and accurate estimates of the molecular Hessian matrix. In this model, based on a random forest, the second derivatives of the energy with respect to redundant internal coordinates are learned individually. The internal coordinates together with their specific representation guarantee rotational and translational invariance. The model is trained on a subset of the QM7 dataset but is shown to be applicable to larger molecules picked from the QM9 dataset. From the predicted Hessian, it is also possible to obtain reasonable estimates of the vibrational frequencies, normal modes, and zero point energies of the molecules.
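The last step mentioned in this abstract, going from a Hessian to vibrational frequencies and a zero-point energy, follows the standard harmonic analysis: mass-weight the Hessian, diagonalize it, and take square roots of the positive eigenvalues. The sketch below illustrates that step only. It assumes a Cartesian Hessian (the paper learns the Hessian in redundant internal coordinates), works in arbitrary consistent units with ħ = 1, and uses hypothetical function names and a toy one-dimensional diatomic as input.

```python
import numpy as np


def harmonic_frequencies(hessian, masses, tol=1e-10):
    """Angular frequencies and normal modes from a Cartesian Hessian.

    `masses` holds one mass per Cartesian coordinate. Eigenvalues at or
    below `tol` (translations, rotations, numerical noise) are discarded.
    """
    hessian = np.asarray(hessian, dtype=float)
    inv_sqrt_m = 1.0 / np.sqrt(np.asarray(masses, dtype=float))
    mass_weighted = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)

    eigenvalues, modes = np.linalg.eigh(mass_weighted)
    keep = eigenvalues > tol
    return np.sqrt(eigenvalues[keep]), modes[:, keep]


def zero_point_energy(frequencies, hbar=1.0):
    """Harmonic zero-point energy: 0.5 * hbar * sum of angular frequencies."""
    return 0.5 * hbar * float(np.sum(frequencies))


if __name__ == "__main__":
    # Toy 1D diatomic: two masses joined by a spring with force constant k.
    k = 1.0
    masses = [1.0, 3.0]
    hessian = [[k, -k], [-k, k]]

    freqs, _ = harmonic_frequencies(hessian, masses)
    print("frequencies:", freqs)
    print("zero-point energy:", zero_point_energy(freqs))
```

For the diatomic, the single retained frequency equals sqrt(k/μ) with reduced mass μ = m1·m2/(m1 + m2), which gives a quick sanity check on the mass weighting.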
Affiliation(s)
- Giorgio Domenichini
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
- Christoph Dellago
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
11
Mortensen JC, Damjanovic J, Miao J, Hui T, Lin Y. A backbone-dependent rotamer library with high (ϕ, ψ) coverage using metadynamics simulations. Protein Sci 2022; 31:e4491. [PMID: 36327064 PMCID: PMC9679973 DOI: 10.1002/pro.4491] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Revised: 10/26/2022] [Accepted: 10/28/2022] [Indexed: 12/06/2023]
Abstract
Backbone-dependent rotamer libraries are commonly used to assign the side chain dihedral angles of amino acids when modeling protein structures. Most rotamer libraries are created by curating protein crystal structure data and using various methods to extrapolate the existing data to cover all possible backbone conformations. However, these rotamer libraries may not be suitable for modeling the structures of cyclic peptides and other constrained peptides because these molecules frequently sample backbone conformations rarely seen in the crystal structures of linear proteins. To provide backbone-dependent side chain information beyond the α-helix, β-sheet, and PPII regions, we used explicit-solvent metadynamics simulations of model dipeptides to create a new rotamer library that has high coverage in the (ϕ, ψ) space. Furthermore, this approach can be applied to build high-coverage rotamer libraries for noncanonical amino acids. The resulting Metadynamics of Dipeptides for Rotamer Distribution (MEDFORD) rotamer library predicts the side chain conformations of high-resolution protein crystal structures with similar accuracy (~80%) to a state-of-the-art rotamer library. Our ability to test the accuracy of MEDFORD at predicting the side chain dihedral angles of amino acids in noncanonical backbone conformation is restricted by the limited structural data available for cyclic peptides. For the cyclic peptide data that are currently available, MEDFORD and the state-of-the-art rotamer library perform comparably. However, the two rotamer libraries indeed make different rotamer predictions in noncanonical (ϕ, ψ) regions. For noncanonical amino acids, the MEDFORD rotamer library predicts the χ1 values with approximately 75% accuracy.
Affiliation(s)
- Jiayuan Miao
  - Department of Chemistry, Tufts University, Medford, Massachusetts, USA
- Tiffani Hui
  - Department of Chemistry, Tufts University, Medford, Massachusetts, USA
- Yu‐Shan Lin
  - Department of Chemistry, Tufts University, Medford, Massachusetts, USA
|
12
|
Wang Y, Fass J, Kaminow B, Herr JE, Rufa D, Zhang I, Pulido I, Henry M, Bruce Macdonald HE, Takaba K, Chodera JD. End-to-end differentiable construction of molecular mechanics force fields. Chem Sci 2022; 13:12016-12033. [PMID: 36349096 PMCID: PMC9600499 DOI: 10.1039/d2sc02739a] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 05/16/2022] [Accepted: 09/05/2022] [Indexed: 01/07/2023]
Abstract
Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules (atom types) for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process, spanning chemical perception to parameter assignment, is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with fidelity superior to traditional atom or parameter typing schemes. When adapted to simultaneously fit partial charge models, espaloma delivers high-quality partial atomic charges orders of magnitude faster than current best practices with low inaccuracy. When trained on the same quantum chemical small molecule dataset used to parameterize the Open Force Field ("Parsley") openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-à-vis experiments in relative alchemical free energy calculations for a popular benchmark. This approach is implemented in the free and open source package espaloma, available at https://github.com/choderalab/espaloma.
Affiliation(s)
- Yuanqing Wang
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Physiology, Biophysics and System Biology PhD Program, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA; MFA Program in Creative Writing, Division of Humanities and Arts, City College of New York, City University of New York, New York, NY 10031, USA
- Josh Fass
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Benjamin Kaminow
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- John E. Herr
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Dominic Rufa
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Ivy Zhang
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Iván Pulido
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Mike Henry
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Hannah E. Bruce Macdonald
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Kenichiro Takaba
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
- John D. Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
|
13
|
Tuca E, DiLabio G, Otero-de-la-Roza A. Minimal Basis Set Hartree-Fock Corrected with Atom-Centered Potentials for Molecular Crystal Modeling and Crystal Structure Prediction. J Chem Inf Model 2022; 62:4107-4121. [PMID: 35980964 DOI: 10.1021/acs.jcim.2c00656] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Crystal structure prediction (CSP), determining the experimentally observable structure of a molecular crystal from the molecular diagram, is an important challenge with technologically relevant applications in materials manufacturing and drug design. For the purpose of screening the randomly generated candidate crystal structures, CSP protocols require energy ranking methods that are fast and can accurately capture the small energy differences between molecular crystals. In addition, a good ranking method should also produce accurate equilibrium geometries, both intramolecular and intermolecular. In this article, we explore the combination of minimal-basis-set Hartree-Fock (HF) with atom-centered potentials (ACPs) as a method for modeling the structure and energetics of molecular crystals. The ACPs are developed for the H, C, N, and O atoms and fitted to a set of reference data at the B86bPBE-XDM level in order to mitigate basis-set incompleteness and missing correlation. In particular, ACPs are developed in combination with two methods: HF-D3/MINIs and HF-3c. The application of ACPs greatly improves the performance of HF-D3/MINIs for lattice energies, crystal energy differences, energy-volume and energy-strain relations, and crystal geometries. In the case of HF-3c, the improvement in the crystal energy differences is much smaller than in HF-D3/MINIs, but lattice energies and particularly crystal geometries are considerably better when ACPs are used. The resulting methods may be useful for CSP but also for quick calculation of molecular crystal lattice energies and geometries.
Affiliation(s)
- Emilian Tuca
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna V1V 1V7, British Columbia, Canada
- Gino DiLabio
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna V1V 1V7, British Columbia, Canada
- Alberto Otero-de-la-Roza
  - Departamento de Química Física y Analítica and MALTA-Consolider Team, Facultad de Química, Universidad de Oviedo, 33006 Oviedo, Spain
|
14
|
Prasad VK, Otero-de-la-Roza A, DiLabio GA. Small-Basis Set Density-Functional Theory Methods Corrected with Atom-Centered Potentials. J Chem Theory Comput 2022; 18:2913-2930. [PMID: 35412817 DOI: 10.1021/acs.jctc.2c00036] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
Abstract
Density functional theory (DFT) is currently the most popular method for modeling noncovalent interactions and thermochemistry. The accurate calculation of noncovalent interaction energies, reaction energies, and barrier heights requires choosing an appropriate functional and, typically, a relatively large basis set. Deficiencies of the density-functional approximation and the use of a limited basis set are the leading sources of error in the calculation of noncovalent and thermochemical properties in molecular systems. In this article, we present three new DFT methods based on the BLYP, M06-2X, and CAM-B3LYP functionals in combination with the 6-31G* basis set and corrected with atom-centered potentials (ACPs). ACPs are one-electron potentials that have the same form as effective-core potentials, except they do not replace any electrons. The ACPs developed in this work are used to generate energy corrections to the underlying DFT/basis-set method such that the errors in predicted chemical properties are minimized while maintaining the low computational cost of the parent methods. ACPs were developed for the elements H, B, C, N, O, F, Si, P, S, and Cl. The ACP parameters were determined using an extensive training set of 118655 data points, mostly of complete basis set coupled-cluster level quality. The target molecular properties for the ACP-corrected methods include noncovalent interaction energies, molecular conformational energies, reaction energies, barrier heights, and bond separation energies. The ACPs were tested first on the training set and then on a validation set of 42567 additional data points. We show that the ACP-corrected methods can predict the target molecular properties with accuracy close to complete basis set wavefunction theory methods, but at a computational cost of double-ζ DFT methods. 
This makes the new BLYP/6-31G*-ACP, M06-2X/6-31G*-ACP, and CAM-B3LYP/6-31G*-ACP methods uniquely suited to the calculation of noncovalent, thermochemical, and kinetic properties in large molecular systems.
Affiliation(s)
- Viki Kumar Prasad
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada
- Alberto Otero-de-la-Roza
  - Departamento de Química Física y Analítica, Facultad de Química, Universidad de Oviedo, MALTA Consolider Team, Oviedo E-33006, Spain
- Gino A DiLabio
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada
|
15
|
Prasad VK, Otero-de-la-Roza A, DiLabio GA. Fast and Accurate Quantum Mechanical Modeling of Large Molecular Systems Using Small Basis Set Hartree-Fock Methods Corrected with Atom-Centered Potentials. J Chem Theory Comput 2022; 18:2208-2232. [PMID: 35313106 DOI: 10.1021/acs.jctc.1c01128] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/19/2022]
Abstract
There has been significant interest in developing fast and accurate quantum mechanical methods for modeling large molecular systems. In this work, by utilizing a machine learning regression technique, we have developed new low-cost quantum mechanical approaches to model large molecular systems. The developed approaches rely on using one-electron Gaussian-type functions called atom-centered potentials (ACPs) to correct for the basis set incompleteness and the lack of correlation effects in the underlying minimal or small basis set Hartree-Fock (HF) methods. In particular, ACPs are proposed for ten elements common in organic and bioorganic chemistry (H, B, C, N, O, F, Si, P, S, and Cl) and four different base methods: two minimal basis sets (MINIs and MINIX) plus a double-ζ basis set (6-31G*) in combination with dispersion-corrected HF (HF-D3/MINIs, HF-D3/MINIX, HF-D3/6-31G*) and the HF-3c method. The new ACPs are trained on a very large set (73 832 data points) of noncovalent properties (interaction and conformational energies) and validated additionally on a set of 32 048 data points. All reference data are of complete basis set coupled-cluster quality, mostly CCSD(T)/CBS. The proposed ACP-corrected methods are shown to give errors in the tenths of a kcal/mol range for noncovalent interaction energies and up to 2 kcal/mol for molecular conformational energies. More importantly, the average errors are similar in the training and validation sets, confirming the robustness and applicability of these methods outside the boundaries of the training set. In addition, the performance of the new ACP-corrected methods is similar to complete basis set density functional theory (DFT) but at a cost that is orders of magnitude lower, and the proposed ACPs can be used in any computational chemistry program that supports effective-core potentials without modification. 
It is also shown that ACPs improve the description of covalent and noncovalent bond geometries of the underlying methods and that the improvement brought about by the application of the ACPs is directly related to the number of atoms to which they are applied, allowing the treatment of systems containing some atoms for which ACPs are not available. Overall, the ACP-corrected methods proposed in this work constitute an alternative accurate, economical, and reliable quantum mechanical approach to describe the geometries, interaction energies, and conformational energies of systems with hundreds to thousands of atoms.
Affiliation(s)
- Viki Kumar Prasad
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
- Alberto Otero-de-la-Roza
  - MALTA Consolider Team, Departamento de Química Física y Analítica, Facultad de Química, Universidad de Oviedo, E-33006 Oviedo, Spain
- Gino A DiLabio
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
|
16
|
Prasad VK, Pei Z, Edelmann S, Otero-de-la-Roza A, DiLabio GA. BH9, a New Comprehensive Benchmark Data Set for Barrier Heights and Reaction Energies: Assessment of Density Functional Approximations and Basis Set Incompleteness Potentials. J Chem Theory Comput 2021; 18:151-166. [PMID: 34911294 DOI: 10.1021/acs.jctc.1c00694] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 01/13/2023]
Abstract
The calculation of accurate reaction energies and barrier heights is essential in computational studies of reaction mechanisms and thermochemistry. To assess methods regarding their ability to predict these two properties, high-quality benchmark sets are required that comprise a reasonably large and diverse set of organic reactions. Due to the time-consuming nature of both locating transition states and computing accurate reference energies for reactions involving large molecules, previous benchmark sets have been limited in scope, the number of reactions considered, and the size of the reactant and product molecules. Recent advances in coupled-cluster theory, in particular local correlation methods like DLPNO-CCSD(T), now allow the calculation of reaction energies and barrier heights for relatively large systems. In this work, we present a comprehensive and diverse benchmark set of barrier heights and reaction energies based on DLPNO-CCSD(T)/CBS called BH9. BH9 comprises 449 chemical reactions belonging to nine types common in organic chemistry and biochemistry. We examine the accuracy of DLPNO-CCSD(T) vis-a-vis canonical CCSD(T) for a subset of BH9 and conclude that, although there is a penalty in using the DLPNO approximation, the reference data are accurate enough to serve as a benchmark for density functional theory (DFT) methods. We then present two applications of the BH9 set. First, we examine the performance of several density functional approximations commonly used in thermochemical and mechanistic studies. Second, we assess our basis set incompleteness potentials regarding their ability to mitigate basis set incompleteness errors. The number of data points, the diversity of the reactions considered, and the relatively large size of the reactant molecules make BH9 the most comprehensive thermochemical benchmark set to date and a useful tool for the development and assessment of computational methods.
Affiliation(s)
- Viki Kumar Prasad
  - Department of Chemistry, University of British Columbia, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
- Zhipeng Pei
  - Department of Chemistry, University of British Columbia, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
- Simon Edelmann
  - Department of Chemistry, University of British Columbia, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
- Alberto Otero-de-la-Roza
  - Departamento de Química Física y Analítica and MALTA Consolider Team, Facultad de Química, Universidad de Oviedo, 33006 Oviedo, Spain
- Gino A DiLabio
  - Department of Chemistry, University of British Columbia, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
|
17
|
Wang P, Shu C, Ye H, Biczysko M. Structural and Energetic Properties of Amino Acids and Peptides Benchmarked by Accurate Theoretical and Experimental Data. J Phys Chem A 2021; 125:9826-9837. [PMID: 34752094 DOI: 10.1021/acs.jpca.1c06504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]
Abstract
Structural, energetic, and spectroscopic data derived in this work aim at the setup of an "experimentally validated" database for amino acid and polypeptide conformers. First, the "cheap" composite scheme (ChS, CCSD(T)/(CBS+CV)MP2) is tested for evaluation of conformational energies of all eight stable conformers of glycine, by comparing to the more accurate CCSD(T)/CBS+CV computations (Phys. Chem. Chem. Phys. 2013, 15, 10094-10111 and J. Mol. Model. 2020, 26, 129). The recently proposed jun-ChS (J. Chem. Theory Comput. 2020, 16, 988-1006), employing the jun-cc-pVnZ basis set family for CCSD(T) computations and CBS extrapolation, yields conformational energies accurate to 0.2 kJ·mol⁻¹, at reduced computational cost with respect to aug-ChS employing aug-cc-pVnZ basis sets. The jun-ChS composite scheme is further applied to derive conformational energies for three dipeptide analogues: Ac-Gly-NH2, Ac-Ala-NH2, and Gly-Gly. Finally, dipeptide conformational energies and semiexperimental equilibrium rotational constants, along with the CCSD(T)/(CBS+CV)MP2 structural parameters (J. Phys. Chem. Lett. 2014, 5, 534-540), stand as the reference for benchmarking of selected density functional methodologies. The double-hybrid functionals B2-PLYP-D3(BJ) and DSD-PBEP86 perform best for structural and energetic characterization of all dipeptide analogues. Among hybrid functionals, CAM-B3LYP-D3(BJ) and ωB97X-D3(BJ) are promising methods applicable to larger peptide-based systems for which computations with double-hybrid functionals are not feasible.
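The CCSD(T)/(CBS+CV)MP2 label describes an additive composite: a coupled-cluster energy in a triple-ζ basis, plus MP2-level corrections for basis set incompleteness (CBS) and core-valence correlation (CV). The sketch below illustrates that additivity under stated assumptions: it uses the common two-point n⁻³ formula on MP2 correlation energies, omits any separate Hartree-Fock extrapolation, and all function and argument names are hypothetical. It is not the authors' implementation.

```python
HARTREE_TO_KJ_MOL = 2625.4996  # conversion factor, kJ/mol per hartree


def cbs_two_point(e_small, e_large, n_large):
    """Two-point n^-3 extrapolation from cardinal numbers n_large-1 and n_large."""
    n_small = n_large - 1
    return (n_large**3 * e_large - n_small**3 * e_small) / (n_large**3 - n_small**3)


def composite_energy(e_ccsdt_tz, e_mp2corr_tz, e_mp2corr_qz, e_mp2_ae, e_mp2_fc):
    """Additive composite energy in hartree (assumed form, HF-CBS term omitted).

    e_ccsdt_tz   : CCSD(T) total energy, triple-zeta basis
    e_mp2corr_*  : MP2 correlation energies, triple- and quadruple-zeta bases
    e_mp2_ae/fc  : MP2 energies with all electrons / frozen core correlated
    """
    delta_cbs = cbs_two_point(e_mp2corr_tz, e_mp2corr_qz, n_large=4) - e_mp2corr_tz
    delta_cv = e_mp2_ae - e_mp2_fc
    return e_ccsdt_tz + delta_cbs + delta_cv


def relative_energies_kj(energies_hartree):
    """Conformational energies relative to the lowest conformer, in kJ/mol."""
    e_min = min(energies_hartree)
    return [(e - e_min) * HARTREE_TO_KJ_MOL for e in energies_hartree]
```

Because conformational energies are differences between composite energies, errors common to all conformers cancel, which is one reason sub-kJ·mol⁻¹ agreement between schemes is plausible.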
Affiliation(s)
- Ping Wang
  - International Centre for Quantum and Molecular Structures, Physics Department, College of Science, Shanghai University, 99 Shangda Road, Shanghai 200444, China
- Chong Shu
  - International Centre for Quantum and Molecular Structures, Physics Department, College of Science, Shanghai University, 99 Shangda Road, Shanghai 200444, China
- Hexu Ye
  - International Centre for Quantum and Molecular Structures, Physics Department, College of Science, Shanghai University, 99 Shangda Road, Shanghai 200444, China
- Malgorzata Biczysko
  - International Centre for Quantum and Molecular Structures, Physics Department, College of Science, Shanghai University, 99 Shangda Road, Shanghai 200444, China
|
18
|
Sheng M, Silvestrini F, Biczysko M, Puzzarini C. Structural and Vibrational Properties of Amino Acids from Composite Schemes and Double-Hybrid DFT: Hydrogen Bonding in Serine as a Test Case. J Phys Chem A 2021; 125:9099-9114. [PMID: 34623165 DOI: 10.1021/acs.jpca.1c06993] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
The structures, relative stabilities, and vibrational wavenumbers of the two most stable conformers of serine, stabilized by the O-H···N, O-H···O═C and N-H···O-H intramolecular hydrogen bonds, have been evaluated by means of state-of-the-art composite schemes based on coupled-cluster (CC) theory. The so-called "cheap" composite approach (CCSD(T)/(CBS+CV)MP2) allowed determination of accurate equilibrium structures and harmonic vibrational wavenumbers, also pointing out significant corrections beyond the CCSD(T)/cc-pVTZ level. These accurate results stand as a reference for benchmarking selected hybrid and double-hybrid, dispersion-corrected DFT functionals. B2PLYP-D3 and DSDPBEP86 in conjunction with a triple-ζ basis set have been confirmed as effective methodologies for structural and spectroscopic studies of medium-sized flexible biomolecules, also showing intramolecular hydrogen bonding. These best performing double-hybrid functionals have been employed to simulate IR spectra by means of vibrational perturbation theory, also considering hybrid CC/DFT schemes. The best overall agreement with experiment, with mean absolute error of 8 cm⁻¹, has been obtained by combining CCSD(T)/(CBS+CV)MP2 harmonic wavenumbers with B2PLYP-D3/maug-cc-pVTZ anharmonic corrections. Finally, a composite scheme entirely based on CCSD(T) calculations (CCSD(T)/CBS+CV) has been employed for energetics, further confirming that serine II is the most stable conformer, also when zero-point vibrational energy corrections are included.
Affiliation(s)
- Mingzhu Sheng
  - International Centre for Quantum and Molecular Structures, Physics Department, Shanghai University, 99 Shangda Road, Shanghai, 200444, China
- Filippo Silvestrini
  - Department of Chemistry "Giacomo Ciamician", University of Bologna, Via F. Selmi 2, 40126 Bologna, Italy
- Malgorzata Biczysko
  - International Centre for Quantum and Molecular Structures, Physics Department, Shanghai University, 99 Shangda Road, Shanghai, 200444, China
- Cristina Puzzarini
  - Department of Chemistry "Giacomo Ciamician", University of Bologna, Via F. Selmi 2, 40126 Bologna, Italy
|
19
|
Guan X, Leven I, Heidar-Zadeh F, Head-Gordon T. Protein C-GeM: A Coarse-Grained Electron Model for Fast and Accurate Protein Electrostatics Prediction. J Chem Inf Model 2021; 61:4357-4369. [PMID: 34490776 DOI: 10.1021/acs.jcim.1c00388] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 02/08/2023]
Abstract
The electrostatic potential (ESP) is a powerful property for understanding and predicting electrostatic charge distributions that drive interactions between molecules. In this study, we compare various charge partitioning schemes including fitted charges, density-based quantum mechanical (QM) partitioning schemes, charge equilibration methods, and our recently introduced coarse-grained electron model, C-GeM, to describe the ESP for protein systems. When benchmarked against high quality density functional theory calculations of the ESP for tripeptides and the crambin protein, we find that the C-GeM model is of comparable accuracy to ab initio charge partitioning methods, but with orders of magnitude improvement in computational efficiency since it does not require either the electron density or the electrostatic potential as input.
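For charge-based models of the kind compared above, the ESP at a point is the Coulomb sum over the model's charges. The sketch below shows that sum for plain point charges in atomic units. It is a generic illustration, not the C-GeM model itself, and the function name and example geometry are assumptions.

```python
import math


def esp_from_point_charges(charges, positions, point):
    """ESP at `point` from point charges, in atomic units.

    charges   : list of charges in units of e
    positions : list of (x, y, z) tuples in bohr
    point     : (x, y, z) tuple in bohr where the potential is evaluated
    """
    total = 0.0
    for q, pos in zip(charges, positions):
        r = math.dist(pos, point)
        if r == 0.0:
            raise ValueError("evaluation point coincides with a charge")
        total += q / r
    return total


# Hypothetical dipole: +1 and -1 charges separated by 1 bohr along z.
charges = [1.0, -1.0]
positions = [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5)]
print(esp_from_point_charges(charges, positions, (0.0, 0.0, 1.5)))
```

Benchmarking a charge model then amounts to evaluating this sum on a grid of points around the molecule and comparing with the reference quantum mechanical ESP at the same points.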
Affiliation(s)
- Xingyi Guan
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States; Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Itai Leven
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States; Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Farnaz Heidar-Zadeh
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States; Department of Chemistry, Queen's University, Kingston, Ontario K7L 3N6, Canada
- Teresa Head-Gordon
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States; Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States; Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States
|
20
|
Harder, better, faster, stronger: Large-scale QM and QM/MM for predictive modeling in enzymes and proteins. Curr Opin Struct Biol 2021; 72:9-17. [PMID: 34388673 DOI: 10.1016/j.sbi.2021.07.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 05/28/2021] [Revised: 06/25/2021] [Accepted: 07/05/2021] [Indexed: 11/23/2022]
Abstract
Computational prediction of enzyme mechanism and protein function requires accurate physics-based models and suitable sampling. We discuss recent advances in large-scale quantum mechanical (QM) modeling of biochemical systems that have reduced the cost of high-accuracy models. Tradeoffs between sampling and accuracy have motivated modeling with molecular mechanics (MM) in a multiscale QM/MM or iterative approach. Limitations to both conventional density-functional theory and classical MM force fields remain for describing noncovalent interactions in comparison to experiment or wavefunction theory. Because predictions of enzyme action (i.e. electrostatics), free energy barriers, and mechanisms are sensitive to the protocol and embedding method in QM/MM, convergence tests and systematic methods for quantifying QM-level interactions are a needed, active area of development.
|
21
|
Fogalli GB, Line SRP. Estimating the Influence of Physicochemical and Biochemical Property Indexes on Selection for Amino Acids Usage in Eukaryotic Cells. J Mol Evol 2021; 89:257-268. [PMID: 33760966 DOI: 10.1007/s00239-021-10003-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2020] [Accepted: 03/10/2021] [Indexed: 11/26/2022]
Abstract
Proteins can evolve by accumulating changes in amino acid sequences. These changes are mainly caused by missense mutations in their DNA coding sequences. Mutations with neutral or positive effects on fitness can be maintained while deleterious mutations tend to be eliminated by natural selection. Amino acid changes are influenced by the biophysical, chemical, and biological properties of amino acids. There is a multiplicity of amino acid properties that can influence the function and expression of proteins. Amino acid properties can be expressed as numerical indexes, which can help to predict functional and structural aspects of proteins and allow statistical inferences of selection pressure on amino acid usage. The accuracy of these analyses may be compromised by the existence of several numerical indexes that measure the same amino acid property, and the lack of objective parameters to determine the most accurate and biologically relevant index. In the present study, the gradient consistency test was used in order to estimate the magnitude of directional selection imparted by amino acid biochemical and biophysical properties on protein evolution.
Affiliation(s)
- Giovani B Fogalli
  - Department of Biosciences, Piracicaba Dental School, University of Campinas, Campinas, Brazil
- Sergio R P Line
  - Department of Biosciences, Piracicaba Dental School, University of Campinas, Campinas, Brazil
|
22
|
Gutten O, Jurečka P, Aliakbar Tehrani Z, Buděšínský M, Řezáč J, Rulíšek L. Conformational energies and equilibria of cyclic dinucleotides in vacuo and in solution: computational chemistry vs. NMR experiments. Phys Chem Chem Phys 2021; 23:7280-7294. [PMID: 33876088 DOI: 10.1039/d0cp05993e] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/22/2022]
Abstract
Performance of computational methods in modelling cyclic dinucleotides - an important and challenging class of compounds - has been evaluated by two different benchmarks: (1) gas-phase conformational energies and (2) qualitative agreement with NMR observations of the orientation of the χ-dihedral angle in solvent. In gas-phase benchmarks, where CCSD(T) and DLPNO-CCSD(T) methods have been used as the reference, most of the (dispersion corrected) density functional approximations are accurate enough to justify prioritizing computational cost and compatibility with other modelling options as the criterion of choice. NMR experiments of 3'3'-c-di-AMP, 3'3'-c-GAMP, and 3'3'-c-di-GMP show the overall prevalence of the anti-conformation of purine bases, but some population of syn-conformations is observed for guanines. Implicit solvation models combined with quantum-chemical methods struggle to reproduce this behaviour, probably due to a lack of dynamics and explicitly modelled solvent, leading to structures that are too compact. Molecular dynamics simulations overrepresent the syn-conformation of guanine due to the overestimation of an intramolecular hydrogen bond. Our combination of experimental and computational benchmarks provides "error bars" for modelling cyclic dinucleotides in solvent, where such information is generally difficult to obtain, and should help gauge the interpretability of studies dealing with binding of cyclic dinucleotides to important pharmaceutical targets. At the same time, the presented analysis calls for improvement in both implicit solvation models and force-field parameters.
Affiliation(s)
- Ondrej Gutten
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, 166 10, Praha 6, Czech Republic
|
23
|
Culka M, Kalvoda T, Gutten O, Rulíšek L. Mapping Conformational Space of All 8000 Tripeptides by Quantum Chemical Methods: What Strain Is Affordable within Folded Protein Chains? J Phys Chem B 2021; 125:58-69. [PMID: 33393778 DOI: 10.1021/acs.jpcb.0c09251] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/09/2023]
Abstract
To gain more insight into the physicochemical aspects of protein structure from first principles, the conformational space of all 8000 "capped" tripeptides (i.e., N-Ac-X₁X₂X₃-NH-CH₃, where Xᵢ is one of the 20 natural amino acids) was investigated computationally. An enormous dataset (denoted P-CONF_1.6M and containing close to 1 600 000 conformers in total) has been obtained by employing a composite protocol combining density functional theory, semiempirical quantum mechanics (SQM), and state-of-the-art solvation methods with 1000 K molecular dynamics (MD) used to generate initial structures (200 snapshots for each tripeptide). This allowed us to present the first rigorous QM-based glimpse at the vast conformational space spanned by small protein fragments. The same computational procedure was repeated for tripeptide fragments taken from the SCOPe database of three-dimensional protein folds, by restraining them to their geometry in a protein. Such complementary data allowed us to compare the distribution of conformational strain energies of unrestrained tripeptidic fragments "in solvent" with those in existing protein chains. Besides providing a rigorous (ab initio) proof of a few well-known concepts and hypotheses concerning protein structures, such as the distribution of (φ, ψ) angles in Ramachandran plots, we have made several observations that came as a certain surprise: (1) the distribution of conformational energies does not significantly differ between the "unbiased/unrestrained" conformers obtained from MD sampling in solvent and the biased conformers, i.e., those of a given tripeptide obtained from protein structures; (2) a conformational (strain) energy window up to ∼20 to 25 kcal·mol⁻¹ is readily available to tripeptide fragments within the context of a protein chain; (3) overpopulation in certain regions of the Ramachandran plot was observed for the unbiased conformers.
Last but not least, the massive dataset of accurate (DFT-D3//COSMO-RS) conformational (free) energies of ∼1.6 M peptide conformers, P-CONF_1.6M, obtained throughout this work may serve as an excellent dataset for calibrating and benchmarking popular force fields.
Affiliation(s)
- Martin Culka
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, 166 10 Praha 6, Czech Republic
- Tadeáš Kalvoda
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, 166 10 Praha 6, Czech Republic
- Ondrej Gutten
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, 166 10 Praha 6, Czech Republic
- Lubomír Rulíšek
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, 166 10 Praha 6, Czech Republic
|
24
|
Bursch M, Hansen A, Pracht P, Kohn JT, Grimme S. Theoretical study on conformational energies of transition metal complexes. Phys Chem Chem Phys 2021; 23:287-299. [PMID: 33336657 DOI: 10.1039/d0cp04696e] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 01/30/2023]
Abstract
Conformational energies are an important chemical property for which a performance assessment of theoretical methods is mandatory. Existing benchmark sets are often limited to biochemical or main-group-element-containing molecules, while organometallic systems are generally less studied. A key problem herein is to routinely generate conformers for these molecules due to their complexity and manifold of possible coordination patterns. In this study we used our recently published CREST protocol [Pracht et al., Phys. Chem. Chem. Phys., 2020, 22, 7169-7192] to generate conformer ensembles for a variety of 40 challenging transition-metal-containing molecules, which were then used to form a comprehensive conformational energy benchmark set termed TMCONF40. Several low-cost semiempirical, density functional theory (DFT) and force-field methods were compared to high-level DLPNO-CCSD(T1) and double-hybrid DFT reference values. Close attention was paid to the energetic ordering of the conformers in the statistical evaluation. With respect to the double-hybrid references, both tested low-cost composite DFT methods produce high mean Pearson correlation coefficients of r_p,mean = 0.922 (B97-3c//B97-3c) and r_p,mean = 0.890 (PBEh-3c//B97-3c), with mean absolute deviations close to or below 1 kcal mol⁻¹. This good performance also holds for a comparison to DLPNO-CCSD(T1) reference energies for a smaller subset termed TMCONF5. Based on DFT geometries, the GFNn-xTB methods yield reasonable Pearson correlation coefficients of r_p,mean = 0.617 (GFN1-xTB//B97-3c; MAD_mean = 2.15 kcal mol⁻¹) and r_p,mean = 0.567 (GFN2-xTB//B97-3c; MAD_mean = 2.68 kcal mol⁻¹), outperforming the widely used PMx methods on the TMCONF40 test set. Employing the low-cost composite DFT method B97-3c on GFN2-xTB geometries yields a slightly improved correlation of r_p,mean = 0.632 (B97-3c//GFN2-xTB).
Furthermore, for 68% of the investigated complexes at least one low-energy conformer was found that is more stable than the respective crystal structure conformation, which signals the importance of conformational studies. General recommendations for the application of the CREST protocol and DFT methods for transition metal conformational energies are given.
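As a rough illustration of the two statistics quoted in this abstract, the sketch below computes a mean absolute deviation (MAD) and a Pearson correlation coefficient for one conformer ensemble. It is not the authors' evaluation script: the function names, the choice of taking energies relative to the lowest reference conformer, and the four energies are assumptions made for the example.

```python
import math

def relative_to_reference_minimum(energies, ref_energies):
    """Express energies relative to the conformer that is lowest at the reference level (assumed convention)."""
    idx = ref_energies.index(min(ref_energies))
    return [e - energies[idx] for e in energies]

def mean_absolute_deviation(test, ref):
    """Average absolute difference between test and reference relative energies."""
    return sum(abs(t - r) for t, r in zip(test, ref)) / len(ref)

def pearson_r(x, y):
    """Pearson correlation coefficient; measures how well the energetic ordering is reproduced."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical energies in kcal/mol for four conformers of one complex (illustrative values only).
ref_energies = [0.0, 1.2, 2.5, 3.1]    # stands in for a high-level reference
test_energies = [0.3, 1.0, 3.0, 3.5]   # stands in for a low-cost method

rel_ref = relative_to_reference_minimum(ref_energies, ref_energies)
rel_test = relative_to_reference_minimum(test_energies, ref_energies)

mad = mean_absolute_deviation(rel_test, rel_ref)
r_p = pearson_r(rel_test, rel_ref)
print(f"MAD = {mad:.2f} kcal/mol, Pearson r = {r_p:.3f}")
```

Averaging such per-ensemble values over all complexes would give quantities analogous to the mean values reported above, under the stated assumptions.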
Affiliation(s)
- Markus Bursch
- Mulliken Center for Theoretical Chemistry, Universität Bonn, Beringstr. 4, 53115 Bonn, Germany.
|
25
|
Otero-de-la-Roza A, DiLabio GA. Improved Basis-Set Incompleteness Potentials for Accurate Density-Functional Theory Calculations in Large Systems. J Chem Theory Comput 2020; 16:4176-4191. [PMID: 32470304 DOI: 10.1021/acs.jctc.0c00102] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
The accurate calculation of chemical properties using density-functional theory (DFT) requires the use of a nearly complete basis set. In chemical systems involving hundreds to thousands of atoms, the cost of the calculations places practical limitations on the number of basis functions that can be used. Therefore, in most practical applications of DFT to large systems, there exists a basis-set incompleteness error (BSIE). In this article, we present the next iteration of the basis-set incompleteness potentials (BSIPs), one-electron potentials designed to correct for basis-set incompleteness error. The ultimate goal associated with the development of BSIPs is to allow the calculation of molecular properties using DFT with near-complete-basis-set results at a computational cost that is similar to a small basis set calculation. In this work, we develop BSIPs for 10 atoms in the first and second rows (H, B-F, Si-Cl) and 15 common basis sets of the Pople, Dunning, Karlsruhe, and Huzinaga types. Our new BSIPs are constructed to minimize BSIE in the calculation of reaction energies, barrier heights, noncovalent binding energies, and intermolecular distances. The BSIPs were obtained using a training set of 15 944 data points. The fitting approach employed a regularized linear least-squares method with variable selection (the LASSO method), which results in a much better fit to the training data than our previous BSIPs while, at the same time, reducing the computational cost of BSIP development. The proposed BSIPs are tested on various benchmark sets and demonstrate excellent performance in practice. Our new BSIPs are also transferable; i.e., they can be used to correct BSIE in calculations that employ density functionals other than the one used in the BSIP development (B3LYP). Finally, BSIPs can be used in any quantum chemistry program that has implemented effective-core potentials, without changes to the software.
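To make "regularized linear least squares with variable selection" concrete, here is a minimal generic LASSO sketch using cyclic coordinate descent with soft-thresholding. It is not the BSIP fitting code, and the abstract does not say which solver the authors used; the objective scaling (1/(2n))·‖y − Xw‖² + λ‖w‖₁, the function names, and the toy data are all assumptions for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0        # correlation of feature j with the residual that ignores feature j
            col_norm = 0.0   # squared norm of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                col_norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (col_norm / n) if col_norm > 0 else 0.0
    return w

# Toy design matrix with two orthogonal features (illustrative values only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]

w_unregularized = lasso_coordinate_descent(X, y, lam=0.0)
w_lasso = lasso_coordinate_descent(X, y, lam=0.1)
print(w_unregularized, w_lasso)
```

With the penalty switched on, the weakly contributing second coefficient is driven exactly to zero, which is the variable-selection behaviour the abstract refers to.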
Affiliation(s)
- A Otero-de-la-Roza
- Departamento de Química Física y Analítica and MALTA-Consolider Team, Facultad de Química, Universidad de Oviedo, 33006 Oviedo, Spain
- Gino A DiLabio
- Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada; Faculty of Management, University of British Columbia, Okanagan, 1137 Alumni Avenue, Kelowna, British Columbia V1V 1V7, Canada
|
26
|
Chan B. Aqueous-Phase Conformations of Lactose, Maltose, and Sucrose and the Assessment of Low-Cost DFT Methods with the DSCONF Set of Conformers for the Three Disaccharides. J Phys Chem A 2020; 124:582-590. [PMID: 31927999 DOI: 10.1021/acs.jpca.9b10932] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/14/2022]
Abstract
In the present study, we have examined a range of quantum chemistry methods for the calculation of conformers for lactose, maltose, and sucrose. We find that the DSD-PBE-P86/aug'-cc-pVTZ//B3-LYP-D3BJ/6-311+G(2d,p) protocol yields good relative energies in comparison with reference CCSD(T)/CBS//B3-LYP-D3BJ/maug-cc-pVTZ values. We have surveyed a total of ∼550 conformers for the three disaccharides with the chosen DSD-PBE-P86 method in conjunction with continuum aqueous solvation. In each case, the lowest free energy conformer is characterized by hydrogen bond(s) between the two rings. Another finding is that the major contributors to the overall variations in aqueous free energies are the electronic energies and the solvation energies. To facilitate investigations of larger systems, we have compiled the DSCONF set of conformers for the three disaccharides, and we have assessed lower cost methods with this set. We find MS1-D3/6-31+G(2d,p) to be cost-effective and accurate for both geometry optimization and the calculation of relative energies for disaccharides. In addition, we note that MS1-D3 has previously been found to yield good relative energies for the WATER27 set of water clusters. We thus deem this method to be appropriate for the study of saccharide conformations in both gas phase and aqueous solution.
Affiliation(s)
- Bun Chan
- Graduate School of Engineering, Nagasaki University, Bunkyo 1-14, Nagasaki 852-8521, Japan
|
27
|
Wappett DA, Goerigk L. Toward a Quantum-Chemical Benchmark Set for Enzymatically Catalyzed Reactions: Important Steps and Insights. J Phys Chem A 2019; 123:7057-7074. [DOI: 10.1021/acs.jpca.9b05088] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]
Affiliation(s)
- Lars Goerigk
- School of Chemistry, The University of Melbourne, Victoria 3010, Australia
|