1. Sholokhova AY, Matyushin DD. Ready-to-use Models Built Using a Diverse Set of 266 Aroma Compounds for the Estimation of Gas Chromatographic Retention Indices for the 50%-Cyanopropylphenyl-50%-Dimethylpolysiloxane Stationary Phase. J Sep Sci 2024; 47:e70016. [PMID: 39494751 DOI: 10.1002/jssc.70016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 10/18/2024] [Accepted: 10/22/2024] [Indexed: 11/05/2024]
Abstract
Retention index prediction based on molecular structure is not often used in practice due to low accuracy, the need to use paid software to calculate molecular descriptors (MD), and the narrow applicability domain of many models. In recent years, relatively accurate and versatile deep learning (DL)-based models have emerged. These models are now used in practice as an additional criterion in gas chromatography-mass spectrometry identification. The DB-225ms stationary phase (usually described as 50%-cyanopropylphenyl-50%-dimethylpolysiloxane in available sources) is widely used, but ready-to-use retention index estimation models are not available for it. This study presents such models. The models are linear and use simple constitutional MD and retention indices predicted by DL for the DB-WAX and DB-624 stationary phases as MD (we show that it is their use that allows us to achieve satisfactory accuracy). On a completely unseen hold-out test set, the models achieve a root mean square error of 73.2, a mean absolute error of 45.7, and a median absolute error of 22.0. The models were trained using a retention data set of 266 volatile compounds. All calculations can be performed using the convenient open-source software CHERESHNYA. The final equations are implemented as a spreadsheet and a code snippet and are available online: https://doi.org/10.6084/m9.figshare.26800789.
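The three accuracy measures quoted in this abstract can be computed as in the minimal sketch below. The function name `ri_error_summary` and the retention index values are hypothetical illustrations, not data or code from the study.

```python
import math
import statistics


def ri_error_summary(observed, predicted):
    """Return RMSE, mean absolute error and median absolute error."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    abs_errors = [abs(e) for e in errors]
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "mae": sum(abs_errors) / len(abs_errors),
        "medae": statistics.median(abs_errors),
    }


# Hypothetical retention indices (illustration only, not from the paper).
observed = [1000.0, 1200.0, 1500.0, 1800.0]
predicted = [1010.0, 1180.0, 1500.0, 1900.0]
summary = ri_error_summary(observed, predicted)
print(summary)
```

In this toy data one large miss (100 index units) pulls the RMSE well above the mean absolute error, which in turn sits above the median absolute error. The same ordering appears in the reported values (73.2, 45.7, 22.0), which is what one would expect when a few compounds are predicted much worse than the rest.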
Affiliation(s)
- Anastasia Yu Sholokhova
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russia
- Dmitriy D Matyushin
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russia
2. An automated workflow on data processing (AutoDP) for semiquantitative analysis of urine organic acids with GC-MS to facilitate diagnosis of inborn errors of metabolism. Clin Chim Acta 2023; 540:117230. [PMID: 36682441 DOI: 10.1016/j.cca.2023.117230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 12/29/2022] [Accepted: 01/13/2023] [Indexed: 01/22/2023]
Abstract
Determination of urine organic acids (UOAs) is essential to understand the disease progress of inborn errors of metabolism (IEM) and often relies on GC-MS analysis. However, the efficiency of analytical reports is sometimes restricted by data processing due to labor-intensive work if no proper tool is employed. Herein, we present a simple and rapid workflow with an R-based script for automated data processing (AutoDP) of GC-MS raw files to quantitatively analyze essential UOAs. AutoDP features automatic quality checks, compound identification and confirmation with specific fragment ions, retention time correction from analytical batches, and visualization of abnormal UOAs with age-matched references on chromatograms. Compared with manual processing, AutoDP greatly reduces analytical time and increases the number of identifications. Speeding up data processing is expected to shorten the waiting time for clinical diagnosis, which could greatly benefit clinicians and patients with IEM. In addition, with quantitative results obtained from AutoDP, it would be more feasible to perform retrospective analysis of specific UOAs in IEM and could provide new perspectives for studying IEM.
3. Qu C, Schneider BI, Kearsley AJ, Keyrouz W, Allison TC. Predicting Kováts Retention Indices Using Graph Neural Networks. J Chromatogr A 2021; 1646:462100. [PMID: 33892256 DOI: 10.1016/j.chroma.2021.462100] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/01/2020] [Revised: 03/16/2021] [Accepted: 03/22/2021] [Indexed: 11/16/2022]
Abstract
The Kováts retention index is a dimensionless quantity that characterizes the rate at which a compound is processed through a gas chromatography column. This quantity is independent of many experimental variables and, as such, is considered a near-universal descriptor of retention time on a chromatography column. The Kováts retention indices of a large number of molecules have been determined experimentally. The "NIST 20: GC Method/Retention Index Library" database has collected and, more importantly, curated retention indices of a subset of these compounds resulting in a highly valued reference database. The experimental data in the library form an ideal data set for training machine learning models for the prediction of retention indices of unknown compounds. In this article, we describe the training of a graph neural network model to predict the Kováts retention index for compounds in the NIST library and compare this approach with previous work [1]. We predict the Kováts retention index with a mean unsigned error of 28 index units as compared to 44, the putative best result using a convolutional neural network [1]. The NIST library also incorporates an estimation scheme based on a group contribution approach that achieves a mean unsigned error of 114 compared to the experimental data. Our method uses the same input data source as the group contribution approach, making its application straightforward and convenient to apply to existing libraries. Our results convincingly demonstrate the predictive powers of systematic, data-driven approaches leveraging deep learning methodologies applied to chemical data and for the data in the NIST 20 library outperform previous models.
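For readers unfamiliar with the quantity being predicted, the sketch below computes an isothermal Kováts index from adjusted retention times using the standard textbook definition, which interpolates logarithmically between the two n-alkanes that bracket the analyte. The definition is not stated in this abstract, and the function name and example times are assumptions made for illustration.

```python
import math


def kovats_index(t_x, t_n, t_n1, n):
    """Isothermal Kováts index from adjusted retention times.

    t_x  : adjusted retention time of the analyte
    t_n  : adjusted retention time of the n-alkane with n carbons
    t_n1 : adjusted retention time of the n-alkane with n + 1 carbons
    The analyte must elute between the two alkanes.
    """
    if not (0 < t_n < t_n1 and t_n <= t_x <= t_n1):
        raise ValueError("expected 0 < t_n <= t_x <= t_n1 with t_n < t_n1")
    fraction = (math.log(t_x) - math.log(t_n)) / (math.log(t_n1) - math.log(t_n))
    return 100.0 * (n + fraction)


# Hypothetical times in minutes: the analyte elutes between C8 and C9.
index = kovats_index(t_x=math.sqrt(8.0), t_n=2.0, t_n1=4.0, n=8)
print(round(index, 1))
```

By construction an n-alkane with n carbons has an index of exactly 100n, so the errors of 28, 44 and 114 index units quoted above can be read against a spacing of 100 units between consecutive n-alkanes.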
Affiliation(s)
- Chen Qu
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, USA
- Barry I Schneider
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, USA
- Anthony J Kearsley
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, USA
- Walid Keyrouz
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, USA
- Thomas C Allison
  - National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, USA
4. Tan P, Xu L, Wei XC, Huang HZ, Zhang DK, Zeng CJ, Geng FN, Bao XM, Hua H, Zhao JN. Rapid Screening and Quantitative Analysis of 74 Pesticide Residues in Herb by Retention Index Combined with GC-QQQ-MS/MS. Journal of Analytical Methods in Chemistry 2021; 2021:8816854. [PMID: 33510929 PMCID: PMC7826212 DOI: 10.1155/2021/8816854] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/03/2020] [Accepted: 11/25/2020] [Indexed: 06/12/2023]
Abstract
In this research, a very practical QuEChERS-GC-MS/MS analytical approach for 74 pesticide residues in herb based on retention index was established. This novel analytical approach has two important technical advantages. One advantage is to quickly screen pesticide compounds in herbs without having to use a large number of pesticide standard substances at the beginning of the experiment. The other advantage is to assist in identifying the target pesticide compound accurately. A total of 74 kinds of pesticides were quickly prescreened in all chuanxiong rhizoma samples. The results showed that three kinds of pesticides were screened out in all the samples, including chlorpyrifos, fipronil, and procymidone, and the three pesticides were qualitatively and quantitatively determined. The RSD values for interday and intraday variation were acquired to evaluate the precision of the analytical approach, and the overall interday and intraday variations are not more than 1.97% and 3.82%, respectively. The variations of concentrations of the analyzed three pesticide compounds in sample CX16 are 0.74%-4.15%, indicating that the three pesticides in the sample solutions were stable in 48 h. The spiked recoveries of the three pesticides are 95.22%, 93.03%, and 94.31%, and the RSDs are less than ± 6.0%. The methodological verification results indicated the good reliability and accuracy of the new analytical method. This research work is a new application of retention index, and it will be a valuable tool to assist quickly and accurately in the qualitative and quantitative analysis of multipesticide residues in herbs.
Affiliation(s)
- Peng Tan
  - College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - Sichuan Academy of Traditional Chinese Medicine, State Key Laboratory of Quality Evaluation of Traditional Chinese Medicine, Chengdu 610041, China
- Li Xu
  - College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Xi-Chuan Wei
  - College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Hao-Zhou Huang
  - College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Ding-Kun Zhang
  - College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Chen-Juan Zeng
  - Sichuan Key Laboratory for Medicinal American Cockroach, Sichuan Good Doctor Panxi Pharmaceutical Co., Ltd., Chengdu 610000, China
- Fu-Neng Geng
  - Sichuan Key Laboratory for Medicinal American Cockroach, Sichuan Good Doctor Panxi Pharmaceutical Co., Ltd., Chengdu 610000, China
- Xiao-Ming Bao
  - Shimadzu Enterprise Management (China) Co., Ltd., Chengdu 610023, China
- Hua Hua
  - Sichuan Academy of Traditional Chinese Medicine, State Key Laboratory of Quality Evaluation of Traditional Chinese Medicine, Chengdu 610041, China
- Jun-Ning Zhao
  - Sichuan Academy of Traditional Chinese Medicine, State Key Laboratory of Quality Evaluation of Traditional Chinese Medicine, Chengdu 610041, China
5. Ji H, Deng H, Lu H, Zhang Z. Predicting a Molecular Fingerprint from an Electron Ionization Mass Spectrum with Deep Neural Networks. Anal Chem 2020; 92:8649-8653. [DOI: 10.1021/acs.analchem.0c01450] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 01/23/2023]
Affiliation(s)
- Hongchao Ji
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan 410083, PR China
- Hanzi Deng
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan 410083, PR China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan 410083, PR China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan 410083, PR China
6. Matyushin DD, Karnaeva AE, Buryak AK. Molecular Statistical Modeling for the Identification of Unknown Compounds. Russian Journal of Physical Chemistry A 2020. [DOI: 10.1134/s003602442003022x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
7. Matyushin DD, Sholokhova AY, Buryak AK. A deep convolutional neural network for the estimation of gas chromatographic retention indices. J Chromatogr A 2019; 1607:460395. [DOI: 10.1016/j.chroma.2019.460395] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/10/2019] [Revised: 06/15/2019] [Accepted: 07/22/2019] [Indexed: 10/26/2022]
8. Optimization enhanced genetic algorithm-support vector regression for the prediction of compound retention indices in gas chromatography. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.070] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022]
9. Knorr A, Monge A, Stueber M, Stratmann A, Arndt D, Martin E, Pospisil P. Computer-Assisted Structure Identification (CASI): An Automated Platform for High-Throughput Identification of Small Molecules by Two-Dimensional Gas Chromatography Coupled to Mass Spectrometry. Anal Chem 2013; 85:11216-24. [DOI: 10.1021/ac4011952] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Affiliation(s)
- Arno Knorr
  - Philip Morris International R&D, Philip Morris Products S.A., 2000 Neuchâtel, Switzerland
- Aurelien Monge
  - Philip Morris International R&D, Philip Morris Products S.A., 2000 Neuchâtel, Switzerland
- Markus Stueber
  - Philip Morris International R&D, Philip Morris Research Laboratories GmbH, 51149 Köln, Germany
- André Stratmann
  - Philip Morris International R&D, Philip Morris Research Laboratories GmbH, 51149 Köln, Germany
- Daniel Arndt
  - Philip Morris International R&D, Philip Morris Products S.A., 2000 Neuchâtel, Switzerland
- Elyette Martin
  - Philip Morris International R&D, Philip Morris Products S.A., 2000 Neuchâtel, Switzerland
- Pavel Pospisil
  - Philip Morris International R&D, Philip Morris Products S.A., 2000 Neuchâtel, Switzerland
10. Wachsmuth CJ, Vogl FC, Oefner PJ, Dettmer K. Gas Chromatographic Techniques in Metabolomics. Chromatographic Methods in Metabolomics 2013. [DOI: 10.1039/9781849737272-00087] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/30/2023]
Abstract
High chemical diversity and abundances ranging from trace to millimolar levels still constitute at times insurmountable challenges in the comprehensive analysis of metabolites in biomedical specimens. Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) hyphenated with separation techniques such as liquid chromatography (LC), gas chromatography (GC) and capillary electrophoresis (CE) are the most frequently used techniques for both targeted and discovery‐driven metabolomics. Of the separation techniques, comprehensive two‐dimensional gas chromatography (GC×GC) offers the highest peak resolution and capacity, and in combination with MS lower quantification limits in the submicromolar concentration range are realized. Moreover, electron ionization (EI), the most prominent ionization technique for GC‐MS, is highly reproducible, facilitating the generation of mass spectral libraries for routine metabolite identification. However, GC analysis often requires a derivatization prior to analysis and not all metabolite derivatives are recorded in the libraries available. Consequently, metabolite identification is still a major challenge. To identify unknown metabolite signals, soft ionization techniques in combination with high‐resolution MS are employed to determine the accurate mass of the quasi‐molecular ion. The latter is used to calculate elemental formulae that can be fed into metabolite databases for a putative identification or used for the interpretation of EI spectra.
Affiliation(s)
- Christian J. Wachsmuth
  - Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
- Franziska C. Vogl
  - Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
- Peter J. Oefner
  - Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
- Katja Dettmer
  - Institute of Functional Genomics, University of Regensburg, Josef-Engert-Strasse 9, 93053 Regensburg, Germany
11. Identification of terpenoids from Ephedra combining with accurate mass and in-silico retention indices. Talanta 2013. [DOI: 10.1016/j.talanta.2012.10.018] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
12. Giaginis C, Tsantili-Kakoulidou A. Quantitative Structure–Retention Relationships as Useful Tool to Characterize Chromatographic Systems and Their Potential to Simulate Biological Processes. Chromatographia 2012. [DOI: 10.1007/s10337-012-2374-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
13. Menikarachchi LC, Cawley S, Hill DW, Hall LM, Hall L, Lai S, Wilder J, Grant DF. MolFind: a software package enabling HPLC/MS-based identification of unknown chemical structures. Anal Chem 2012; 84:9388-94. [PMID: 23039714 PMCID: PMC3523192 DOI: 10.1021/ac302048x] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 12/30/2022]
Abstract
In this paper, we present MolFind, a highly multithreaded pipeline type software package for use as an aid in identifying chemical structures in complex biofluids and mixtures. MolFind is specifically designed for high-performance liquid chromatography/mass spectrometry (HPLC/MS) data inputs typical of metabolomics studies where structure identification is the ultimate goal. MolFind enables compound identification by matching HPLC/MS-based experimental data obtained for an unknown compound with computationally derived HPLC/MS values for candidate compounds downloaded from chemical databases such as PubChem. The downloaded "bins" consist of all compounds matching the monoisotopic molecular weight of the unknown. The computational HPLC/MS values predicted include retention index (RI), ECOM(50) (energy required to fragment 50% of a selected precursor ion), drift time, and collision induced dissociation (CID) spectrum. RI, ECOM(50), and drift-time models are used for filtering compounds downloaded from PubChem. The remaining candidates are then ranked based on CID spectra matching. Current RI and ECOM(50) models allow for the removal of about 28% of compounds from PubChem bins. Our estimates suggest that this could be improved to as much as 87% with additional chemical structures included in the computational models. Quantitative structure property relationship-based modeling of drift times showed a better correlation with experimentally determined drift times than did Mobcal cross-sectional areas. In 23 of 35 example cases, filtering PubChem bins with RI and ECOM(50) predictive models resulted in improved ranking of the unknown compounds compared to previous studies using CID spectra matching alone. In 19 of 35 examples, the correct candidate was ranked within the top 20 compounds in bins containing an average of 1635 compounds.
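The filter-then-rank idea described in this abstract can be sketched as follows. This is not MolFind's code or API: the `Candidate` type, the window sizes, and every numeric value are hypothetical, and the CID match score is taken as a precomputed number rather than derived from spectra.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    predicted_ri: float       # predicted retention index
    predicted_ecom50: float   # predicted ECOM50
    cid_match_score: float    # higher means a better CID spectrum match


def filter_and_rank(candidates: List[Candidate], measured_ri: float,
                    measured_ecom50: float, ri_window: float,
                    ecom50_window: float) -> List[Candidate]:
    """Drop candidates outside both prediction windows, then rank the rest."""
    kept = [
        c for c in candidates
        if abs(c.predicted_ri - measured_ri) <= ri_window
        and abs(c.predicted_ecom50 - measured_ecom50) <= ecom50_window
    ]
    return sorted(kept, key=lambda c: c.cid_match_score, reverse=True)


# A hypothetical "bin" of candidates sharing one monoisotopic mass.
mass_bin = [
    Candidate("cand_A", 510.0, 3.1, 0.62),
    Candidate("cand_B", 720.0, 3.0, 0.91),  # fails the RI window
    Candidate("cand_C", 495.0, 5.9, 0.88),  # fails the ECOM50 window
    Candidate("cand_D", 505.0, 3.3, 0.74),
]
ranked = filter_and_rank(mass_bin, measured_ri=500.0, measured_ecom50=3.2,
                         ri_window=30.0, ecom50_window=0.5)
print([c.name for c in ranked])
```

Filtering before ranking is what lets a correct candidate with a moderate spectral score move up the list: here the two highest-scoring candidates are removed because their predicted properties disagree with the measurement.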
Affiliation(s)
- Lochana C. Menikarachchi
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
- Shannon Cawley
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
- Dennis W. Hill
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
- L. Mark Hall
  - Hall Associates Consulting, Quincy, Massachusetts, United States
- Lowell Hall
  - Department of Chemistry, Eastern Nazarene College, Quincy, Massachusetts, United States
- Steven Lai
  - Waters Corporation, Beverly, Massachusetts, United States
- Janine Wilder
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
- David F. Grant
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut, United States
14. Krumsiek J, Suhre K, Evans AM, Mitchell MW, Mohney RP, Milburn MV, Wägele B, Römisch-Margl W, Illig T, Adamski J, Gieger C, Theis FJ, Kastenmüller G. Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information. PLoS Genet 2012; 8:e1003005. [PMID: 23093944 PMCID: PMC3475673 DOI: 10.1371/journal.pgen.1003005] [Citation(s) in RCA: 148] [Impact Index Per Article: 11.4] [Received: 03/30/2012] [Accepted: 08/16/2012] [Indexed: 12/22/2022] [Open access]
Abstract
Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable amount of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these “unknown metabolites” is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype–metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms. 
Genome-wide association studies on metabolomics data have demonstrated that genetic variation in metabolic enzymes and transporters leads to concentration changes in the respective metabolite levels. The conventional goal of these studies is the detection of novel interactions between the genome and the metabolic system, providing valuable insights for both basic research and clinical applications. In this study, we borrow the metabolomics GWAS concept for a novel, entirely different purpose. Metabolite measurements frequently produce signals where a certain substance can be reliably detected in the sample, but it has not yet been elucidated which specific metabolite this signal actually represents. The concept is comparable to a fingerprint: each one is uniquely identifiable, but as long as it is not registered in a database one cannot tell to whom this fingerprint belongs. Obviously, this issue tremendously reduces the usability of metabolomics analyses. The genetic associations of such an “unknown,” however, give us concrete evidence of the metabolic pathway this substance is most probably involved in. Moreover, we complement the approach with a specific measure of correlation between metabolites, providing further evidence of the metabolic processes of the unknown. For a number of cases, this even allows for a concrete identity prediction, which we then experimentally validate in the lab.
Affiliation(s)
- Jan Krumsiek
  - Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Karsten Suhre
  - Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Qatar Foundation, Doha, Qatar
- Anne M. Evans
  - Metabolon, Research Triangle Park, North Carolina, United States of America
- Robert P. Mohney
  - Metabolon, Research Triangle Park, North Carolina, United States of America
- Michael V. Milburn
  - Metabolon, Research Triangle Park, North Carolina, United States of America
- Brigitte Wägele
  - Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Department of Genome-Oriented Bioinformatics, Life and Food Science Center Weihenstephan, Technische Universität München, Freising, Germany
- Werner Römisch-Margl
  - Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Thomas Illig
  - Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany
  - Biobank of the Hanover Medical School, Hanover Medical School, Hanover, Germany
- Jerzy Adamski
  - Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, Neuherberg, Germany
  - Lehrstuhl für Experimentelle Genetik, Technische Universität München, Freising-Weihenstephan, Germany
- Christian Gieger
  - Institute of Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany
- Fabian J. Theis
  - Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Department of Mathematics, Technische Universität München, Garching, Germany
- Gabi Kastenmüller
  - Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
15. Zhang J, Koo I, Wang B, Gao QW, Zheng CH, Zhang X. A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures. J Chromatogr A 2012; 1251:188-193. [PMID: 22771253 DOI: 10.1016/j.chroma.2012.06.036] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 04/09/2012] [Revised: 06/07/2012] [Accepted: 06/13/2012] [Indexed: 11/16/2022]
Abstract
Retention index (RI) is useful for metabolite identification. However, when RI is integrated with mass spectral similarity for metabolite identification, conflicting RI threshold settings are reported in the literature. In this study, a large scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate the accuracy of compound identification using the test dataset. To imitate real identification experiments, the NIST MS main library was employed as the reference library and the test dataset was used as search data. Our study shows that the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R and DWT.D measures, respectively, when the RI and mass spectral similarity are integrated for compound identification. Compared to mass spectrum matching alone, using both RI and mass spectral matching can improve the identification accuracy by 1.7%, 3.5%, and 3.5% for the three mass spectral similarity measures, respectively. It is concluded that the improvement from RI matching for compound identification depends heavily on the mass spectral similarity measure used and on the accuracy of the RI data.
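A minimal sketch of combining an RI threshold with spectral matching is given below. Plain cosine similarity is used as a stand-in for the three measures named in the abstract (it is none of them), and the library entries, spectra and function names are hypothetical. Only the 22 i.u. threshold comes from the text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    dot = sum(intensity * b.get(mz, 0.0) for mz, intensity in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_spectrum, query_ri, library, ri_threshold):
    """Name of the most similar library entry within the RI threshold, or None."""
    within = [e for e in library if abs(e["ri"] - query_ri) <= ri_threshold]
    if not within:
        return None
    best = max(within, key=lambda e: cosine_similarity(query_spectrum, e["spectrum"]))
    return best["name"]


# Two hypothetical isomers with nearly identical spectra but different RI.
library = [
    {"name": "isomer_A", "ri": 1100.0, "spectrum": {43: 100.0, 57: 80.0, 71: 40.0}},
    {"name": "isomer_B", "ri": 1180.0, "spectrum": {43: 100.0, 57: 82.0, 71: 38.0}},
]
query_spectrum = {43: 100.0, 57: 82.0, 71: 38.0}
query_ri = 1105.0

spectrum_only = best_match(query_spectrum, query_ri, library, ri_threshold=float("inf"))
with_ri = best_match(query_spectrum, query_ri, library, ri_threshold=22.0)
print(spectrum_only, with_ri)
```

With no RI constraint the query is assigned to the entry whose spectrum happens to match best. Applying the threshold first removes that entry because its RI is 75 units away, and the answer changes. A threshold that is too tight would instead discard the correct compound whenever the RI data are inaccurate, which is the trade-off the study quantifies.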
Affiliation(s)
- Jun Zhang
  - School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China
- Imhoi Koo
  - Department of Chemistry, University of Louisville, Louisville, KY 40292, USA; Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40292, USA
- Bing Wang
  - School of Electrical Engineering & Information, Anhui University of Technology, Maanshan, Anhui 243002, China
- Qing-Wei Gao
  - School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China
- Chun-Hou Zheng
  - School of Electronic Engineering & Automation, Anhui University, Hefei, Anhui 230601, China
- Xiang Zhang
  - Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
16. Eggink P, Maliepaard C, Tikunov Y, Haanstra J, Bovy A, Visser R. A taste of sweet pepper: Volatile and non-volatile chemical composition of fresh sweet pepper (Capsicum annuum) in relation to sensory evaluation of taste. Food Chem 2012; 132:301-10. [DOI: 10.1016/j.foodchem.2011.10.081] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 06/22/2011] [Revised: 09/08/2011] [Accepted: 10/10/2011] [Indexed: 11/16/2022]
17.
Abstract
One of the central challenges to metabolomics is metabolite identification. Regardless of whether one uses so-called 'targeted' or 'untargeted' metabolomics, eventually all paths lead to the requirement of identifying (and quantifying) certain key metabolites. Indeed, without metabolite identification, the results of any metabolomic analysis are biologically and chemically uninterpretable. Given the chemical diversity of most metabolomes and the character of most metabolomic data, metabolite identification is intrinsically difficult. Consequently a great deal of effort in metabolomics over the past decade has been focused on making metabolite identification better, faster and cheaper. This review describes some of the newly emerging techniques or technologies in metabolomics that are making metabolite identification easier and more robust. In particular, it focuses on advances in metabolite identification that have occurred over the past 2 to 3 years concerning the technologies, methodologies and software as applied to NMR, MS and separation science. The strengths and limitations of some of these approaches are discussed along with some of the important trends in metabolite identification.
18. Stefan SE, Ehsan M, Pearson WL, Aksenov A, Boginski V, Bendiak B, Eyler JR. Differentiation of Closely Related Isomers: Application of Data Mining Techniques in Conjunction with Variable Wavelength Infrared Multiple Photon Dissociation Mass Spectrometry for Identification of Glucose-Containing Disaccharide Ions. Anal Chem 2011; 83:8468-76. [DOI: 10.1021/ac2017103] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 01/31/2023]
Affiliation(s)
- Sarah E. Stefan
  - Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, United States
- Mohammad Ehsan
  - Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, United States
- Wright L. Pearson
  - Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, United States
- Alexander Aksenov
  - Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, United States
- Vladimir Boginski
  - Department of Industrial & Systems Engineering, University of Florida, 1350 North Poquito Road, Shalimar, Florida 32579-1163, United States
- Brad Bendiak
  - Department of Cellular and Developmental Biology and Program in Structural Biology and Biophysics, University of Colorado at Denver and Health Sciences Center, Aurora, Colorado 80045, United States
- John R. Eyler
  - Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, United States
19
|
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open Babel: An open chemical toolbox. J Cheminform 2011; 3:33. [PMID: 21982300 PMCID: PMC3198950 DOI: 10.1186/1758-2946-3-33] [Citation(s) in RCA: 5364] [Impact Index Per Article: 383.1] [Received: 06/27/2011] [Accepted: 10/07/2011] [Indexed: 02/08/2023] Open
Abstract
Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license from http://openbabel.org.
Affiliation(s)
- Noel M O'Boyle
- University of Pittsburgh, Department of Chemistry, 219 Parkman Avenue, Pittsburgh, PA 15217, USA.
|
20
|
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open Babel: An open chemical toolbox. J Cheminform 2011. [PMID: 21982300 DOI: 10.1186/1758-2946-3-33] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
|
21
|
iMatch: a retention index tool for analysis of gas chromatography-mass spectrometry data. J Chromatogr A 2011; 1218:6522-30. [PMID: 21813131 DOI: 10.1016/j.chroma.2011.07.039] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 01/29/2011] [Revised: 06/28/2011] [Accepted: 07/10/2011] [Indexed: 11/20/2022]
Abstract
A method was developed to employ National Institute of Standards and Technology (NIST) 2008 retention index database information for molecular retention matching via constructing a set of empirical distribution functions (DFs) of the absolute retention index deviation to its mean value. The effects of different experimental parameters on the molecules' retention indices were first assessed. The column class, the column type, and the data type have significant effects on the retention index values acquired on capillary columns. However, the normal alkane retention index (I(norm)) with the ramp condition is similar to the linear retention index (I(T)), while the I(norm) with the isothermal condition is similar to the Kováts retention index (I). As for the I(norm) with the complex condition, these data should be treated as an additional group, because the mean I(norm) value of the polar column is significantly different from the I(T). Based on this analysis, nine DFs were generated from the grouped retention index data. The DF information was further implemented into a software program called iMatch. The performance of iMatch was evaluated using experimental data of a mixture of standards and metabolite extract of rat plasma with spiked-in standards. About 19% of the molecules identified by ChromaTOF were filtered out by iMatch from the identification list of electron ionization (EI) mass spectral matching, while all of the spiked-in standards were preserved. The analysis results demonstrate that using the retention index values, via constructing a set of DFs, can improve the spectral matching-based identifications by reducing a significant portion of false-positives.
|
22
|
Hagiwara T, Saito S, Ujiie Y, Imai K, Kakuta M, Kadota K, Terada T, Sumikoshi K, Shimizu K, Nishi T. HPLC Retention time prediction for metabolome analysis. Bioinformation 2010; 5:255-8. [PMID: 21364827 PMCID: PMC3055703 DOI: 10.6026/97320630005255] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 11/17/2010] [Accepted: 11/24/2010] [Indexed: 11/23/2022] Open
Abstract
Liquid Chromatography Time-of-Flight Mass Spectrometry (LC-TOF-MS) is widely used for profiling metabolite compounds. LC-TOF-MS is a chemical analysis technique that combines the physical separation capabilities of high-pressure liquid chromatography (HPLC) with the mass analysis capabilities of Time-of-Flight Mass Spectrometry (TOF-MS) which utilizes the difference in the flight time of ions due to difference in the mass-to-charge ratio. Since metabolite compounds have various chemical characteristics, their precise identification is a crucial problem of metabolomics research. Contemporaneously analyzed reference standards are commonly required for mass spectral matching and retention time matching, but there are far fewer reference standards than there are compounds in the organism. We therefore developed a retention time prediction method for HPLC to improve the accuracy of identification of metabolite compounds. This method uses a combination of Support Vector Regression and Multiple Linear Regression adaptively to the measured retention time. We achieved a strong correlation (correlation coefficient = 0.974) between measured and predicted retention times for our experimental data. We also demonstrated a successful identification of an E. coli metabolite compound that cannot be identified by precise mass alone.
Affiliation(s)
- Takashi Hagiwara
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Seiji Saito
- Genaris, Inc., Joint Research Center 106, 1-1-40 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yoshifumi Ujiie
- Genaris, Inc., Joint Research Center 106, 1-1-40 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kensaku Imai
- Genaris, Inc., Joint Research Center 106, 1-1-40 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Masanori Kakuta
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Koji Kadota
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Tohru Terada
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Kazuya Sumikoshi
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Kentaro Shimizu
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Tatsunari Nishi
- Genaris, Inc., Joint Research Center 106, 1-1-40 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
|