1
Duprat F, Ploix JL, Dreyfus G. Can Graph Machines Accurately Estimate 13C NMR Chemical Shifts of Benzenic Compounds? Molecules 2024; 29:3137. [PMID: 38999091 PMCID: PMC11243075 DOI: 10.3390/molecules29133137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 06/27/2024] [Accepted: 06/28/2024] [Indexed: 07/14/2024] [Open Access]
Abstract
In the organic laboratory, recording the 13C nuclear magnetic resonance (NMR) spectrum of a newly synthesized compound remains an essential step in elucidating its structure. Interpreting such a spectrum, which is a set of chemical-shift values, is easier if the chemist has a tool capable of predicting, with sufficient accuracy, the carbon-shift values from the structure they intend to prepare. As there are few open-source methods for accurately estimating this property, we applied our graph-machine approach to build models capable of predicting the chemical shifts of carbons. For this study, we focused on benzene compounds, building an optimized model by training on a database of 10,577 chemical shifts originating from 2026 structures that contain up to ten types of non-carbon atoms, namely H, O, N, S, P, Si, and halogens. It provides a training root-mean-squared relative error (RMSRE) of 0.5%, i.e., a root-mean-squared error (RMSE) of 0.6 ppm, and a mean absolute error (MAE) of 0.4 ppm for estimating these 10,577 chemical shifts. The predictive capability of the graph-machine model is also compared with that of three commercial packages on a dataset of 171 original benzenic structures (1012 chemical shifts). The graph-machine model proves to be very efficient in predicting chemical shifts, with an RMSE of 0.9 ppm, and compares favorably with the RMSEs of 3.4, 1.8, and 1.9 ppm computed with the ChemDraw v. 23.1.1.3, ACD v. 11.01, and MestReNova v. 15.0.1-35756 packages, respectively. Finally, a Docker-based tool is proposed to predict the carbon chemical shifts of benzenic compounds solely from their SMILES codes.
Affiliation(s)
- François Duprat
- Chimie Moléculaire, Macromoléculaire, Matériaux, ESPCI Paris, PSL University, 10 Rue Vauquelin, 75005 Paris, France
- Jean-Luc Ploix
- Chimie Moléculaire, Macromoléculaire, Matériaux, ESPCI Paris, PSL University, 10 Rue Vauquelin, 75005 Paris, France
- Gérard Dreyfus
- Chimie Moléculaire, Macromoléculaire, Matériaux, ESPCI Paris, PSL University, 10 Rue Vauquelin, 75005 Paris, France
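The three error measures quoted in the abstract of entry 1 (RMSRE, RMSE, and MAE) can be illustrated with a minimal sketch. The function name `shift_errors` and the sample values are hypothetical, and the relative error is assumed here to be taken with respect to the observed shift; the paper may define RMSRE differently.

```python
import math


def shift_errors(observed, predicted):
    """Return (RMSE, MAE, RMSRE) for paired chemical shifts given in ppm.

    Assumption: the relative error of each shift is (predicted - observed) / observed.
    RMSE and MAE are in ppm; RMSRE is a dimensionless fraction.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")

    diffs = [p - o for o, p in zip(observed, predicted)]
    n = len(diffs)

    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    rmsre = math.sqrt(sum((d / o) ** 2 for d, o in zip(diffs, observed)) / n)
    return rmse, mae, rmsre


# Hypothetical example values (ppm), not taken from the paper.
observed_ppm = [100.0, 200.0]
predicted_ppm = [101.0, 198.0]
rmse, mae, rmsre = shift_errors(observed_ppm, predicted_ppm)
```

Under this assumed definition, an RMSE of 0.6 ppm on shifts near 125 ppm corresponds to a relative error of roughly 0.5%, which is consistent with the pairing of the two figures in the abstract.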
2
Han C, Zhang D, Xia S, Zhang Y. Accurate Prediction of NMR Chemical Shifts: Integrating DFT Calculations with Three-Dimensional Graph Neural Networks. J Chem Theory Comput 2024; 20:5250-5258. [PMID: 38842505 PMCID: PMC11209944 DOI: 10.1021/acs.jctc.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/25/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024]
Abstract
Computer prediction of NMR chemical shifts plays an increasingly important role in molecular structure assignment and elucidation for organic molecule studies. Density functional theory (DFT) and gauge-including atomic orbital (GIAO) have established a framework to predict NMR chemical shifts but often at a significant computational expense with a limited prediction accuracy. Recent advancements in deep learning methods, especially graph neural networks (GNNs), have shown promise in improving the accuracy of predicting experimental chemical shifts, either by using 2D molecular topological features or 3D conformational representation. This study presents a new 3D GNN model to predict 1H and 13C chemical shifts, CSTShift, that combines atomic features with DFT-calculated shielding tensor descriptors, capturing both isotropic and anisotropic shielding effects. Utilizing the NMRShiftDB2 data set and conducting DFT optimization and GIAO calculations at the B3LYP/6-31G(d) level, we prepared the NMRShiftDB2-DFT data set of high-quality 3D structures and shielding tensors with corresponding experimentally measured 1H and 13C chemical shifts. The developed CSTShift models achieve the state-of-the-art prediction performance on both the NMRShiftDB2-DFT test data set and external CHESHIRE data set. Further case studies on identifying correct structures from two groups of constitutional isomers show its capability for structure assignment and elucidation. The source code and data are accessible at https://yzhang.hpc.nyu.edu/IMA.
Affiliation(s)
- Chao Han
- Department of Chemistry, New York University, New York, New York 10003, United States
- Dongdong Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
3
Sajed T, Sayeeda Z, Lee BL, Berjanskii M, Wang F, Gautam V, Wishart DS. Accurate Prediction of 1H NMR Chemical Shifts of Small Molecules Using Machine Learning. Metabolites 2024; 14:290. [PMID: 38786767 PMCID: PMC11123270 DOI: 10.3390/metabo14050290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 05/11/2024] [Accepted: 05/16/2024] [Indexed: 05/25/2024] [Open Access]
Abstract
NMR is widely considered the gold standard for organic compound structure determination. As such, NMR is routinely used in organic compound identification, drug metabolite characterization, natural product discovery, and the deconvolution of metabolite mixtures in biofluids (metabolomics and exposomics). In many cases, compound identification by NMR is achieved by matching measured NMR spectra to experimentally collected NMR spectral reference libraries. Unfortunately, the number of available experimental NMR reference spectra, especially for metabolomics, medical diagnostics, or drug-related studies, is quite small. This experimental gap could be filled by predicting NMR chemical shifts for known compounds using computational methods such as machine learning (ML). Here, we describe how a deep learning algorithm that is trained on a high-quality, "solvent-aware" experimental dataset can be used to predict 1H chemical shifts more accurately than any other known method. The new program, called PROSPRE (PROton Shift PREdictor) can accurately (mean absolute error of <0.10 ppm) predict 1H chemical shifts in water (at neutral pH), chloroform, dimethyl sulfoxide, and methanol from a user-submitted chemical structure. PROSPRE (pronounced "prosper") has also been used to predict 1H chemical shifts for >600,000 molecules in many popular metabolomic, drug, and natural product databases.
Affiliation(s)
- Tanvir Sajed
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Zinat Sayeeda
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Brian L. Lee
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Mark Berjanskii
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Fei Wang
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Vasuk Gautam
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- David S. Wishart
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada
4
Kuhn S, Kolshorn H, Steinbeck C, Schlörer N. Twenty years of nmrshiftdb2: A case study of an open database for analytical chemistry. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2024; 62:74-83. [PMID: 38112483 DOI: 10.1002/mrc.5418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 11/10/2023] [Accepted: 11/10/2023] [Indexed: 12/21/2023]
Abstract
In October 2003, 20 years ago, the open-source and open-content database NMRshiftDB was announced. Since then, the database, renamed as nmrshiftdb2 later, has been continuously available and is one of the longer-running projects in the field of open data in chemistry. After 20 years, we evaluate the success of the project and present lessons learnt for similar projects.
Affiliation(s)
- Stefan Kuhn
- Institute of Computer Science, University of Tartu, Tartu, Estonia, and School of Computer Science and Informatics, De Montfort University, Leicester, UK
- Heinz Kolshorn
- Department Chemie, Johannes Gutenberg-Universität Mainz, Mainz, Germany
- Christoph Steinbeck
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-Universität Jena, Jena, Germany
- Nils Schlörer
- NMR-Plattform, Friedrich-Schiller-Universität Jena, Jena, Germany
5
Hack J, Jordan M, Schmitt A, Raru M, Zorn HS, Seyfarth A, Eulenberger I, Geitner R. Ilm-NMR-P31: an open-access 31P nuclear magnetic resonance database and data-driven prediction of 31P NMR shifts. J Cheminform 2023; 15:122. [PMID: 38111059 PMCID: PMC10729349 DOI: 10.1186/s13321-023-00792-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 12/07/2023] [Indexed: 12/20/2023] [Open Access]
Abstract
This publication introduces a novel open-access 31P Nuclear Magnetic Resonance (NMR) shift database. With 14,250 entries encompassing 13,730 distinct molecules from 3,648 references, this database offers a comprehensive repository of organic and inorganic compounds. Emphasizing single-phosphorus atom compounds, the database facilitates data mining and machine learning endeavors, particularly in signal prediction and Computer-Assisted Structure Elucidation (CASE) systems. Additionally, the article compares different models for 31P NMR shift prediction, showcasing the database's potential utility. Hierarchically Ordered Spherical Environment (HOSE) code-based models and Graph Neural Networks (GNNs) perform exceptionally well with a mean squared error of 11.9 and 11.4 ppm respectively, achieving accuracy comparable to quantum chemical calculations.
Affiliation(s)
- Jasmin Hack
- Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Weimarer Str. 32, 98693, Ilmenau, Germany
- Moritz Jordan
- Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Weimarer Str. 32, 98693, Ilmenau, Germany
- Alina Schmitt
- Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Weimarer Str. 32, 98693, Ilmenau, Germany
- Melissa Raru
- Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Weimarer Str. 32, 98693, Ilmenau, Germany
- Hannes Sönke Zorn
- Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Weimarer Str. 32, 98693, Ilmenau, Germany
- Alex Seyfarth
- Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Weimarer Str. 32, 98693, Ilmenau, Germany
- Isabel Eulenberger
- Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Weimarer Str. 32, 98693, Ilmenau, Germany
- Robert Geitner
- Institute of Chemistry and Bioengineering, Group of Physical Chemistry/Catalysis, Technical University Ilmenau, Weimarer Str. 32, 98693, Ilmenau, Germany
6
Rull H, Fischer M, Kuhn S. NMR shift prediction from small data quantities. J Cheminform 2023; 15:114. [PMID: 38012793 PMCID: PMC10683292 DOI: 10.1186/s13321-023-00785-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 11/16/2023] [Indexed: 11/29/2023] [Open Access]
Abstract
Prediction of chemical shift in NMR using machine learning methods is typically done with the maximum amount of data available to achieve the best results. In some cases, such large amounts of data are not available, e.g. for heteronuclei. We demonstrate a novel machine learning model that is able to achieve better results than other models for relevant datasets with comparatively low amounts of data. We show this by predicting [Formula: see text] and [Formula: see text] NMR chemical shifts of small molecules in specific solvents.
Affiliation(s)
- Herman Rull
- Department of Computer Science, Tartu University, Narva mnt 18, Tartu, 51009, Tartumaa, Estonia
- Markus Fischer
- Institute for Medical Physics and Biophysics, Leipzig University, Härtelstr. 16-18, 04107, Leipzig, Sachsen, Germany
- Stefan Kuhn
- Department of Computer Science, Tartu University, Narva mnt 18, Tartu, 51009, Tartumaa, Estonia
7
Nuzillard JM. Use of carbon-13 NMR to identify known natural products by querying a nuclear magnetic resonance database: An assessment. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2023; 61:582-588. [PMID: 37583258 DOI: 10.1002/mrc.5386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 07/26/2023] [Accepted: 07/29/2023] [Indexed: 08/17/2023]
Abstract
The quick identification of known organic low molecular weight compounds, also known as structural dereplication, is a highly important task in the chemical profiling of natural resource extracts. To that end, a method that relies on carbon-13 nuclear magnetic resonance (NMR) spectroscopy, elaborated in earlier works of the author's research group, requires the availability of a dedicated database that establishes relationships between chemical structures, biological and chemical taxonomy, and spectroscopy. The construction of such a database, called acd_lotus, was reported earlier, and its usefulness was illustrated by only three examples. This article presents the results of structure searches carried out starting from 58 carbon-13 NMR data sets recorded on compounds selected in the metabolomics section of the biological magnetic resonance bank (BMRB). Two compound retrieval methods were employed. The first one involves searching in the acd_lotus database using commercial software. The second one operates through the freely accessible web interface of the nmrshiftdb2 database, which includes the compounds present in acd_lotus and many others. The two structural dereplication methods have proved to be efficient and can be used together in a complementary way.
8
Borges RM, Ferreira GDA, Campos MM, Teixeira AM, Costa FDN, das Chagas FO, Colonna M. NMR as a tool for compound identification in mixtures. PHYTOCHEMICAL ANALYSIS : PCA 2023. [PMID: 37128872 DOI: 10.1002/pca.3229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/05/2023] [Accepted: 04/07/2023] [Indexed: 05/03/2023]
Abstract
INTRODUCTION: Natural products and metabolomics are intrinsically linked through efforts to analyze complex mixtures for compound annotation. Although most studies that aim for compound identification in mixtures use MS as the main analysis technique, NMR has complementary advantages that are worth exploring for enhanced structural confidence.
OBJECTIVE: This review aimed to showcase a portfolio of the main tools available for compound identification using NMR.
MATERIALS AND METHODS: COLMAR, SMART-NMR, MADByTE, and NMRfilter are presented using examples collected from real samples from the perspective of a natural product chemist. Data are also made available through Zenodo so that readers can test each case presented here.
CONCLUSION: The acquisition of 1H NMR, HSQC, TOCSY, HSQC-TOCSY, and HMBC data for all samples and fractions from a natural products study is strongly suggested. The same is valid for MS analysis to create a bridged analysis between both techniques in a complementary manner. The use of NOAH supersequences has also been suggested and demonstrated to save NMR time.
Affiliation(s)
- Ricardo Moreira Borges
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Gabriela de Assis Ferreira
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Mariana Martins Campos
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Andrew Magno Teixeira
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Fernanda das Neves Costa
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Fernanda Oliveira das Chagas
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Maxwell Colonna
- Departments of Genetics and Biochemistry & Molecular Biology, Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, USA
9
Chhaganlal MN, Underhaug J, Mjøs SA. Evaluation of NMR predictors for accuracy and ability to reveal trends in 1H NMR spectra of fatty acids. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2023; 61:318-332. [PMID: 36759332 DOI: 10.1002/mrc.5336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2023] [Revised: 02/04/2023] [Accepted: 02/07/2023] [Indexed: 06/18/2023]
Abstract
Four different nuclear magnetic resonance (NMR) predictors have been evaluated for their ability to predict 600-MHz 1H spectra of free fatty acids and fatty acid methyl esters of 20 common fatty acids. The predictors were evaluated on two main criteria: (1) their accuracy in direct prediction of the spectra (absolute accuracy) and (2) their ability to reveal trends or predict the change that occurs in the spectra as a result of a change in the fatty acid carbon chain, or by esterification of the free fatty acids to methyl esters (relative accuracy). The absolute accuracy in chemical shift prediction for fatty acids was good compared with previous reports on a broader range of compounds. All four predictors had median prediction errors for chemical shifts of the signals in fatty acid methyl esters well below 0.1 ppm and as low as 0.015 ppm for one of the predictors. However, all predictors also had outliers with errors far above the upper interquartile range. In general, they also failed to reproduce trends of diagnostic value that were observed in the experimental data or to properly predict the result of a minor change in molecular structure. All four predictors depend on experimental data from different origins. This may be a limiting factor for the relative accuracy of the predictors.
Affiliation(s)
- Jarl Underhaug
- Department of Chemistry, University of Bergen, Bergen, Norway
- Svein A Mjøs
- Department of Chemistry, University of Bergen, Bergen, Norway
10
Sherlock: A Free and Open-Source System for the Computer-Assisted Structure Elucidation of Organic Compounds from NMR Data. Molecules 2023; 28:1448. [PMID: 36771127 PMCID: PMC9920390 DOI: 10.3390/molecules28031448] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/23/2022] [Revised: 12/21/2022] [Accepted: 12/22/2022] [Indexed: 02/05/2023] [Open Access]
Abstract
The structure elucidation of small organic molecules (<1500 Dalton) through 1D and 2D nuclear magnetic resonance (NMR) data analysis is a potentially challenging, combinatorial problem. This publication presents Sherlock, a free and open-source Computer-Assisted Structure Elucidation (CASE) software where the user controls the chain of elementary operations through a versatile graphical user interface, including spectral peak picking, addition of automatically or user-defined structure constraints, structure generation, ranking and display of the solutions. A set of forty-five compounds was selected in order to illustrate the new possibilities offered to organic chemists by Sherlock for improving the reliability and traceability of structure elucidation results.
11
Borges RM, Gouveia GJ, das Chagas FO. Advances in Microbial NMR Metabolomics. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2023; 1439:123-147. [PMID: 37843808 DOI: 10.1007/978-3-031-41741-2_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2023]
Abstract
Confidently, nuclear magnetic resonance (NMR) is the most informative technique in analytical chemistry and its use as an analytical platform in metabolomics is well proven. This chapter aims to present NMR as a viable tool for microbial metabolomics discussing its fundamental aspects and applications in metabolomics using some chosen examples.
Affiliation(s)
- Ricardo Moreira Borges
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Gonçalo Jorge Gouveia
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Fernanda Oliveira das Chagas
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
12
Gerrard W, Yiu C, Butts CP. Prediction of 15N chemical shifts by machine learning. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2022; 60:1087-1092. [PMID: 34407565 DOI: 10.1002/mrc.5208] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/29/2021] [Revised: 06/16/2021] [Accepted: 08/13/2021] [Indexed: 06/13/2023]
Abstract
We demonstrate the potential for machine learning systems to predict three-dimensional (3D)-relevant NMR properties beyond traditional 1H- and 13C-based data, with comparable accuracy to density functional theory (DFT) (but orders of magnitude faster). Predictions of DFT-calculated 15N chemical shifts for 3D molecular structures can be achieved using a machine learning system, IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar information Of Nuclei), with a mean absolute error of 6.12 ppm (∼1% of the δ15N chemical shift range) and an error of less than 20 ppm for 95% of the chemical shifts. It provides less accurate raw predictions of experimental chemical shifts, due to the limited size and chemical space diversity of the training dataset used in its creation, coupled with the limitations of the underlying DFT methodology in reproducing experiment.
Affiliation(s)
- Will Gerrard
- Department of Chemistry, University of Bristol, Bristol, UK
- Calvin Yiu
- Department of Chemistry, University of Bristol, Bristol, UK
- Craig P Butts
- Department of Chemistry, University of Bristol, Bristol, UK
13
Jonas E, Kuhn S, Schlörer N. Prediction of chemical shift in NMR: A review. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2022; 60:1021-1031. [PMID: 34787335 DOI: 10.1002/mrc.5234] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 09/07/2021] [Revised: 11/10/2021] [Accepted: 11/11/2021] [Indexed: 06/13/2023]
Abstract
Calculation of solution-state NMR parameters, including chemical shift values and scalar coupling constants, is often a crucial step for unambiguous structure assignment. Data-driven (sometimes called empirical) methods leverage databases of known parameter values to estimate parameters for unknown or novel molecules. This is in contrast to popular ab initio techniques that use detailed quantum computational chemistry calculations to arrive at parameter estimates. Data-driven methods have the potential to be considerably faster than ab initio techniques and have been the subject of renewed interest over the past decade with the rise of high-quality databases of NMR parameters and novel machine learning methods. Here, we review these methods, their strengths and pitfalls, and the databases they are built on.
Affiliation(s)
- Eric Jonas
- Department of Computer Science, University of Chicago, Chicago, Illinois, 60637, USA
- Stefan Kuhn
- Cyber Technology Institute, De Montfort University, Leicester, LE1 9BH, UK
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Nils Schlörer
- NMR Core facility, Department of Chemistry, University of Cologne, Cologne, D-50939, Germany
14
Vulpetti A, Lingel A, Dalvit C, Schiering N, Oberer L, Henry C, Lu Y. Efficient Screening of Target-Specific Selected Compounds in Mixtures by 19F NMR Binding Assay with Predicted 19F NMR Chemical Shifts. ChemMedChem 2022; 17:e202200163. [PMID: 35475323 DOI: 10.1002/cmdc.202200163] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Revised: 04/26/2022] [Indexed: 11/06/2022]
Abstract
Ligand-based 19F NMR screening is a highly effective and well-established hit-finding approach. The high sensitivity to protein binding makes it particularly suitable for fragment screening. Different criteria can be considered for generating fluorinated fragment libraries. One common strategy is to assemble a large, diverse, well-designed and characterized fragment library which is screened in mixtures, generated based on experimental 19F NMR chemical shifts. Here, we introduce a complementary knowledge-based 19F NMR screening approach, named 19Focused screening, enabling the efficient screening of putative active molecules selected by computational hit finding methodologies, in mixtures assembled and on-the-fly deconvoluted based on predicted 19F NMR chemical shifts. In this study, we developed a novel approach, named LEFshift, for 19F NMR chemical shift prediction using rooted topological fluorine torsion fingerprints in combination with a random forest machine learning method. A demonstration of this approach to a real test case is reported.
Affiliation(s)
- Anna Vulpetti
- Novartis Pharma AG, Global Discovery Chemistry, Novartis Campus, 4002, Basel, Switzerland
- Andreas Lingel
- Novartis Institutes for BioMedical Research Basel, Global Discovery Chemistry, Switzerland
- Claudio Dalvit
- Novartis Institutes for BioMedical Research Basel, Protease Platform, Switzerland
- Nikolaus Schiering
- Novartis Institutes for BioMedical Research Basel, Protease Platform, Switzerland
- Lukas Oberer
- Novartis Institutes for BioMedical Research Basel, Global Discovery Chemistry, Switzerland
- Chrystelle Henry
- Novartis Institutes for BioMedical Research Basel, Protein Science, Switzerland
- Yipin Lu
- Novartis Institutes for BioMedical Research Basel, Global Discovery Chemistry, Switzerland
15
Molecular search by NMR spectrum based on evaluation of matching between spectrum and molecule. Sci Rep 2021; 11:20998. [PMID: 34697368 PMCID: PMC8546062 DOI: 10.1038/s41598-021-00488-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/31/2021] [Accepted: 10/13/2021] [Indexed: 11/17/2022] [Open Access]
Abstract
Inferring molecular structures from experimentally measured nuclear magnetic resonance (NMR) spectra is an important task in many chemistry applications. Herein, we present a novel method implementing an automated molecular search by NMR spectrum. Given a query spectrum and a pool of candidate molecules, the matching score of each candidate molecule with respect to the query spectrum is evaluated by introducing a molecule-to-spectrum estimation procedure. The candidate molecule with the highest matching score is selected. This procedure does not require any prior knowledge of the corresponding molecular structure nor laborious manual efforts by chemists. We demonstrate the effectiveness of the proposed method on molecular search using 13C NMR spectra.
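The search procedure described in this abstract (score each candidate against the query spectrum, then keep the highest-scoring one) can be sketched as follows. This is only an illustration under stated assumptions: the scoring function (negative mean absolute deviation between sorted peak lists), the names `matching_score` and `best_candidate`, and the candidate shift values are all hypothetical and are not the authors' molecule-to-spectrum estimation procedure.

```python
def matching_score(query_shifts, predicted_shifts):
    """Hypothetical score: higher is better, 0.0 is a perfect match.

    Assumption: both spectra are plain lists of 13C shifts in ppm, and peaks are
    paired by sorting. Candidates with a different peak count score -infinity.
    """
    if not query_shifts:
        raise ValueError("query spectrum must contain at least one peak")
    if len(query_shifts) != len(predicted_shifts):
        return float("-inf")

    pairs = zip(sorted(query_shifts), sorted(predicted_shifts))
    return -sum(abs(q - p) for q, p in pairs) / len(query_shifts)


def best_candidate(query_shifts, candidates):
    """Return the key of the candidate whose predicted spectrum scores highest.

    `candidates` maps a candidate label to its predicted shift list.
    """
    if not candidates:
        raise ValueError("candidate pool must not be empty")
    return max(candidates, key=lambda name: matching_score(query_shifts, candidates[name]))


# Hypothetical query and candidate pool (ppm values invented for illustration).
query = [128.5, 21.4, 137.8]
pool = {
    "candidate_a": [21.0, 128.0, 138.0],
    "candidate_b": [60.0, 18.0, 170.0],
}
selected = best_candidate(query, pool)
```

In this sketch the predicted spectra are supplied directly; in practice they would come from whatever shift predictor is available, and a more robust score would also handle missing or overlapping peaks.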
16
The Advantage of Automatic Peer-Reviewing of 13C-NMR Reference Data Using the CSEARCH-Protocol. Molecules 2021; 26:3413. [PMID: 34200052 PMCID: PMC8200238 DOI: 10.3390/molecules26113413] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/07/2021] [Revised: 05/25/2021] [Accepted: 05/31/2021] [Indexed: 11/23/2022] [Open Access]
Abstract
A systematic investigation of the experimental 13C-NMR spectra published in Molecules during the period of 1996 to 2015 with respect to their quality using CSEARCH-technology is described. It is shown that the systematic application of the CSEARCH-Robot-Referee during the peer-reviewing process prohibits at least the most trivial assignment errors and wrong structure proposals. In many cases, the correction of the assignments/chemical shift values is possible by manual inspection of the published tables; in certain cases, reprocessing of the original experimental data might help to clarify the situation, showing the urgent need for a public domain repository. A comparison of the significant key numbers derived for Molecules against those of other important journals in the field of natural product chemistry shows a quite similar level of quality for all publishers responsible for the six journals under investigation. From the results of this study, general rules for data handling, data storage, and manuscript preparation can be derived, helping to increase the quality of published NMR-data and making these data available as validated reference material.
17
Egan JM, van Santen JA, Liu DY, Linington RG. Development of an NMR-Based Platform for the Direct Structural Annotation of Complex Natural Products Mixtures. JOURNAL OF NATURAL PRODUCTS 2021; 84:1044-1055. [PMID: 33750122 PMCID: PMC8330833 DOI: 10.1021/acs.jnatprod.0c01076] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 05/07/2023]
Abstract
The development of new "omics" platforms is having a significant impact on the landscape of natural products discovery. However, despite the advantages that such platforms bring to the field, there remains no straightforward method for characterizing the chemical landscape of natural products libraries using two-dimensional nuclear magnetic resonance (2D-NMR) experiments. NMR analysis provides a powerful complement to mass spectrometric approaches, given the universal coverage of NMR experiments. However, the high degree of signal overlap, particularly in one-dimensional NMR spectra, has limited applications of this approach. To address this issue, we have developed a new data analysis platform for complex mixture analysis, termed MADByTE (Metabolomics and Dereplication by Two-Dimensional Experiments). This platform employs a combination of TOCSY and HSQC spectra to identify spin system features within complex mixtures and then matches spin system features between samples to create a chemical similarity network for a given sample set. In this report we describe the design and construction of the MADByTE platform and demonstrate the application of chemical similarity networks for both the dereplication of known compound scaffolds and the prioritization of bioactive metabolites from a bacterial prefractionated extract library.
Affiliation(s)
- Joseph M Egan
- Department of Chemistry, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia V5A 1S6, Canada
- Jeffrey A van Santen
- Department of Chemistry, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia V5A 1S6, Canada
- Dennis Y Liu
- Department of Chemistry, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia V5A 1S6, Canada
- Roger G Linington
- Department of Chemistry, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia V5A 1S6, Canada
|
18
|
Kuhn S, Colreavy-Donnelly S, de Andrade Silva Quaresma LE, de Andrade Silva Quaresma E, Borges RM. Applying NMR compound identification using NMRfilter to match predicted to experimental data. Metabolomics 2020; 16:123. [PMID: 33222074 DOI: 10.1007/s11306-020-01748-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/11/2020] [Accepted: 11/11/2020] [Indexed: 10/22/2022]
Abstract
INTRODUCTION: Metabolomics is the approach of choice to guide the understanding of biological systems and their molecular intricacies, but compound identification is still a bottleneck to be overcome. OBJECTIVE: To assess the use of NMRfilter for confident compound identification based on chemical shift predictions for different datasets. RESULTS: We found comparable results using the leading tool COLMAR and NMRfilter. Then, we successfully tested the use of HMBC to add confidence to the identified compounds. CONCLUSIONS: NMRfilter is currently under development to become a stand-alone interactive software package for high-confidence NMR compound identification, and this communication gathers part of its application capabilities.
Affiliation(s)
- Stefan Kuhn
- School of Computer Science and Informatics, De Montfort University, The Gateway, Leicester, LE1 9BH, UK
- Simon Colreavy-Donnelly
- School of Computer Science and Informatics, De Montfort University, The Gateway, Leicester, LE1 9BH, UK
- Ricardo Moreira Borges
- Walter Mors Institute of Research on Natural Products, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
|
19
|
Cobas C. NMR signal processing, prediction, and structure verification with machine learning techniques. Magnetic Resonance in Chemistry 2020; 58:512-519. [PMID: 31912547 DOI: 10.1002/mrc.4989] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 12/10/2019] [Revised: 01/02/2020] [Accepted: 01/03/2020] [Indexed: 05/25/2023]
Abstract
Machine learning (ML) methods have been present in the field of NMR for decades, but they have experienced tremendous growth in the last few years, especially thanks to the emergence of deep learning (DL) techniques that take advantage of the increased amounts of data and available computing power. These algorithms are successfully employed for classification, regression, clustering, or dimensionality reduction tasks on large data sets and have been intensively applied in different areas of NMR, including metabonomics, clinical diagnosis, and relaxometry. In this article, we concentrate on the various applications of ML/DL in the areas of NMR signal processing and analysis of small molecules, including automatic structure verification and prediction of NMR observables in solution.
Affiliation(s)
- Carlos Cobas
- Mestrelab Research, Santiago de Compostela, Spain
|
20
|
Gerrard W, Bratholm LA, Packer MJ, Mulholland AJ, Glowacki DR, Butts CP. IMPRESSION - prediction of NMR parameters for 3-dimensional chemical structures using machine learning with near quantum chemical accuracy. Chem Sci 2020; 11:508-515. [PMID: 32190270 PMCID: PMC7067266 DOI: 10.1039/c9sc03854j] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 08/02/2019] [Accepted: 11/18/2019] [Indexed: 02/06/2023] Open
Abstract
The IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar information Of Nuclei) machine learning system provides an efficient and accurate method for the prediction of NMR parameters from 3-dimensional molecular structures. Here we demonstrate that machine learning predictions of NMR parameters, trained on quantum chemical computed values, can be as accurate as, but computationally much more efficient (tens of milliseconds per molecular structure) than, quantum chemical calculations (hours/days per molecular structure) starting from the same 3-dimensional structure. Training the machine learning system on quantum chemical predictions, rather than experimental data, circumvents the need for the existence of large, structurally diverse, error-free experimental databases and makes IMPRESSION applicable to solving 3-dimensional problems such as molecular conformation and stereoisomerism.
Affiliation(s)
- Martin J Packer
- Chemistry, R&D Oncology, AstraZeneca, Cambridge CB4 0QA, UK
|