1
Yang Y, Sun S, Yang S, Yang Q, Lu X, Wang X, Yu Q, Huo X, Qian X. Structural annotation of unknown molecules in a miniaturized mass spectrometer based on a transformer enabled fragment tree method. Commun Chem 2024; 7:109. [PMID: 38740942 DOI: 10.1038/s42004-024-01189-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 04/26/2024] [Indexed: 05/16/2024]
Abstract
Structural annotation of small molecules in tandem mass spectrometry has long been a central challenge in mass spectrometry analysis, especially when a miniaturized mass spectrometer is used for on-site testing. Here, we propose the Transformer enabled Fragment Tree (TeFT) method, which combines various types of fragmentation tree models with a deep learning Transformer module. It aims to generate the specific structure of a molecule de novo, solely from its mass spectra. Evaluation on different open-source databases showed that the model performed well: the majority of the molecular structures of the test compounds were successfully recognized. TeFT was also validated on a miniaturized mass spectrometer with low-resolution spectra for 16 flavonoid alcohols, achieving complete structure prediction for 8 of them. Finally, TeFT confirmed the structure of a compound contained in a Chinese medicine product called the Anweiyang capsule. These results indicate that TeFT is suited to annotating fragment peaks with clear fragmentation rules, particularly in on-site mass spectrometry with lower mass resolution.
Affiliation(s)
- Yiming Yang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Shuang Sun
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Shuyuan Yang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Qin Yang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Xinqiong Lu
- CHIN Instrument (Hefei) Co., Ltd., Hefei, 231200, China
- Xiaohao Wang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Quan Yu
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Xinming Huo
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
- Xiang Qian
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China

2
Stettin D, Pohnert G. MSdeCIpher: A Tool to Link Data from Complementary Ionization Techniques in High-Resolution GC-MS to Identify Molecular Ions. Metabolites 2023; 14:10. [PMID: 38248813 PMCID: PMC10820034 DOI: 10.3390/metabo14010010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/09/2023] [Accepted: 12/20/2023] [Indexed: 01/23/2024]
Abstract
Electron ionization (EI) and molecular ion-generating techniques like chemical ionization (CI) are complementary ionization methods in gas chromatography (GC)-mass spectrometry (MS). However, manual curation effort and expert knowledge are required to correctly assign molecular ions to fragment spectra. MSdeCIpher is a software tool that enables the combination of two separate datasets from fragment-rich spectra, like EI-spectra, and soft ionization spectra containing molecular ion candidates. Using high-resolution GC-MS data, it identifies and assigns molecular ions based on retention time matching, user-defined adduct/neutral loss criteria, and sum formula matching. To our knowledge, no other freely available or vendor tool is currently capable of combining fragment-rich and soft ionization datasets in this manner. The tool's performance was evaluated on three test datasets. When molecular ions are present, MSdeCIpher consistently ranks the correct molecular ion for each fragment spectrum in one of the top positions, with average ranks of 1.5, 1, and 1.2 in the three datasets, respectively. MSdeCIpher effectively reduces candidate molecular ions for each fragment spectrum and thus enables the usage of compound identification tools that require molecular masses as input. It paves the way towards rapid annotations in untargeted analysis with high-resolution GC-MS.
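The retention-time and adduct matching described above can be illustrated with a minimal Python sketch. It is not MSdeCIpher's implementation: the data classes, the function `rank_molecular_ions`, the two-entry adduct table and both tolerances are hypothetical, and sum formula matching is left out. A soft-ionization peak is kept only if it co-elutes with the fragment spectrum, each adduct hypothesis is converted to a neutral mass, and candidates lighter than the heaviest fragment are discarded.

```python
from dataclasses import dataclass

# Mass shifts (Da) of two common positive-mode adducts.
# Illustrative subset only, not MSdeCIpher's configuration.
ADDUCT_SHIFTS = {"[M+H]+": 1.007276, "[M+NH4]+": 18.033823}


@dataclass
class SoftPeak:
    rt: float  # retention time in minutes (hypothetical input format)
    mz: float  # observed m/z in the soft-ionization (e.g. CI) run


@dataclass
class Candidate:
    neutral_mass: float
    adduct: str
    rt_diff: float


def rank_molecular_ions(frag_rt, frag_mzs, soft_peaks, rt_tol=0.05, mass_tol=0.005):
    """Rank neutral-mass candidates for one fragment-rich (EI) spectrum."""
    heaviest_fragment = max(frag_mzs)
    candidates = []
    for peak in soft_peaks:
        rt_diff = abs(peak.rt - frag_rt)
        if rt_diff > rt_tol:
            continue  # does not co-elute with the EI spectrum
        for adduct, shift in ADDUCT_SHIFTS.items():
            neutral = peak.mz - shift
            # A molecule cannot be lighter than the fragments derived from it
            if neutral + mass_tol < heaviest_fragment:
                continue
            candidates.append(Candidate(round(neutral, 4), adduct, round(rt_diff, 4)))
    # Closest retention time first (stable sort keeps adduct order on ties)
    return sorted(candidates, key=lambda c: c.rt_diff)


if __name__ == "__main__":
    ranked = rank_molecular_ions(
        10.02,
        [57.07, 71.086, 142.172],
        [SoftPeak(10.03, 143.1794), SoftPeak(12.5, 200.0), SoftPeak(10.00, 160.2055)],
    )
    for cand in ranked:
        print(cand)
```

In this made-up example the peak at 12.5 min is rejected on retention time, the [M+NH4]+ reading of m/z 143.1794 is rejected because it would be lighter than the m/z 142.172 fragment, and the [M+H]+ candidate at about 142.1721 Da ranks first.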
Affiliation(s)
- Daniel Stettin
- Institute for Inorganic and Analytical Chemistry, Bioorganic Analytics, Friedrich Schiller University Jena, 07743 Jena, Germany
- Georg Pohnert
- Institute for Inorganic and Analytical Chemistry, Bioorganic Analytics, Friedrich Schiller University Jena, 07743 Jena, Germany
- Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743 Jena, Germany

3
Li S, Bohman B, Flematti GR, Jayatilaka D. Determining the parent and associated fragment formulae in mass spectrometry via the parent subformula graph. J Cheminform 2023; 15:104. [PMID: 37936244 PMCID: PMC10631010 DOI: 10.1186/s13321-023-00776-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 10/25/2023] [Indexed: 11/09/2023]
Abstract
BACKGROUND Identifying the molecular formula and fragmentation reactions of an unknown compound from its mass spectrum is crucial in areas such as natural product chemistry and metabolomics. We propose a method for identifying the correct candidate formula of an unidentified natural product from its mass spectrum. The method scores the plausibility of parent candidate formulae based on a parent subformula graph (PSG) and two possible metrics relating to the number of edges in the PSG. This method is applicable to both electron-impact mass spectrometry (EI-MS) and tandem mass spectrometry (MS/MS) data. Additionally, this work introduces the two-dimensional fragmentation plot (2DFP) for visualizing PSGs. RESULTS Our results suggest that incorporating information about the edges of the PSG improves performance in correctly identifying parent formulae, compared with the more widely accepted "MS/MS score", on the 2016 Computational Assessment of Small Molecule Identification (CASMI 2016) data set (76.3% vs 58.9% correct formula identification) and the Research Centre for Toxic Compounds in the Environment (RECETOX) data set (66.2% vs 59.4% correct formula identification). When extended to identifying the correct candidate formula from complex EI-MS data of semiochemicals, our method again outperformed the MS/MS score (correct formula among the top 4 candidates in 20/23 vs 7/23 cases) and enabled the rapid identification of both the correct parent ion mass and the correct parent formula with minimal expert intervention. CONCLUSION Our method reliably identifies the correct parent formula even when the mass information is ambiguous. Furthermore, should parent formula identification be successful, the majority of associated fragment formulae can also be correctly identified.
Our method can also identify the parent ion and its associated fragments in EI-MS spectra where the identity of the parent ion is unclear due to low quantities and overlapping compounds. Finally, our method does not inherently require empirical fitting of parameters or statistical learning, meaning it is easy to implement and extend upon. SCIENTIFIC CONTRIBUTION Developed, implemented and tested new metrics for assessing plausibility of candidate molecular formulae obtained from HR-MS data.
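The edge-count idea behind the PSG can be sketched in a few lines of Python. This is a simplified reading, not the authors' metric: here the nodes are the parent candidate plus every fragment candidate that fits inside it, an edge joins any two nodes where one formula is a subformula of the other, and the score is the raw edge count. The function names and the example formulae are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C6H12O6' into element counts."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def is_subformula(small: Counter, big: Counter) -> bool:
    """True if every element count in `small` is <= the count in `big`."""
    return all(big[el] >= n for el, n in small.items())


def psg_edge_count(parent: str, fragment_candidates: list[str]) -> int:
    """Build a simplified parent subformula graph and count its edges."""
    p = parse_formula(parent)
    # Nodes: the parent plus fragment candidates that fit inside it
    nodes = [p] + [
        f for f in map(parse_formula, fragment_candidates) if is_subformula(f, p)
    ]
    edges = 0
    for a, b in combinations(nodes, 2):
        if a != b and (is_subformula(a, b) or is_subformula(b, a)):
            edges += 1
    return edges


if __name__ == "__main__":
    fragments = ["C6H10O5", "C5H8O4", "C4H8O4", "C2H4O2"]
    for parent in ("C6H12O6", "C9H8O4"):
        print(parent, psg_edge_count(parent, fragments))
```

With these fragments the parent candidate C6H12O6 yields a denser graph than C9H8O4 (which cannot contain C6H10O5), so an edge-based score would prefer it.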
Affiliation(s)
- Sean Li
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Australia
- Björn Bohman
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Australia
- Department of Plant Protection Biology, Swedish University of Agricultural Sciences, Box 190, 23422, Lomma, Sweden
- Gavin R Flematti
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Australia
- Dylan Jayatilaka
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Australia

4
Guillevic M, Guillevic A, Vollmer MK, Schlauri P, Hill M, Emmenegger L, Reimann S. Automated fragment formula annotation for electron ionisation, high resolution mass spectrometry: application to atmospheric measurements of halocarbons. J Cheminform 2021; 13:78. [PMID: 34607604 PMCID: PMC8491408 DOI: 10.1186/s13321-021-00544-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/23/2021] [Accepted: 08/21/2021] [Indexed: 11/29/2022]
Abstract
Background Non-target screening consists of searching a sample for all substances present, suspected or unknown, with very little prior knowledge about the sample. This approach was introduced more than a decade ago in the field of water analysis, together with dedicated compound identification tools, but is still very rare for indoor and atmospheric trace gas measurements, despite the clear need for a better understanding of the atmospheric trace gas composition. For a systematic detection of emerging trace gases in the atmosphere, a new and powerful analytical method is gas chromatography (GC) of preconcentrated samples, followed by electron ionisation, high resolution mass spectrometry (EI-HRMS). In this work, we present data analysis tools to enable automated fragment formula annotation for unknown compounds measured by GC-EI-HRMS. Results Based on co-eluting mass/charge fragments, we developed an innovative data analysis method to reliably reconstruct the chemical formulae of the fragments, using efficient combinatorics and graph theory. The method does not require the presence of the molecular ion, which is absent in ~40% of EI spectra. Our method has been trained and validated on >50 halocarbons and hydrocarbons, with 3–20 atoms and molar masses of 30–330 g mol⁻¹, measured with a mass resolution of approx. 3500. For >90% of the compounds, more than 90% of the annotated fragment formulae are correct. Cases of wrong identification can be attributed to the scarcity of detected fragments per compound or the lack of isotopic constraint (no minor isotopocule detected). Conclusions Our method enables the reconstruction of the most probable chemical formulae independently of spectral databases. It therefore demonstrates the suitability of EI-HRMS data for non-target analysis and paves the way for the identification of substances for which no EI mass spectrum is registered in databases. We illustrate the performance of our method for atmospheric trace gases and suggest that it may be well suited for many other types of samples. The LGPL-licensed Python code is released under the name ALPINAC, for ALgorithmic Process for Identification of Non-targeted Atmospheric Compounds. Supplementary Information The online version contains supplementary material available at 10.1186/s13321-021-00544-w.
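The combinatorial step, enumerating elemental compositions whose ion mass matches a measured fragment, can be sketched as a brute-force search. This is not ALPINAC's algorithm, which additionally uses graph theory and isotopic constraints; the element set, the per-element count bounds and the 10 ppm default tolerance are assumptions chosen for small halocarbon-like fragments, and the function name is hypothetical.

```python
from itertools import product

# Monoisotopic masses (Da); Cl is the 35Cl isotope
MONO = {"C": 12.000000, "H": 1.007825, "F": 18.998403, "Cl": 34.968853}
ELECTRON = 0.000549  # EI produces singly charged cations (electron removed)
MAX_COUNTS = {"C": 3, "H": 6, "F": 6, "Cl": 3}  # assumed search bounds


def candidate_formulas(ion_mz, ppm_tol=10.0, max_counts=MAX_COUNTS):
    """Return element-count dicts whose cation mass lies within ppm_tol of ion_mz."""
    tol = ion_mz * ppm_tol * 1e-6
    elements = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[el] + 1) for el in elements)):
        neutral = sum(n * MONO[el] for el, n in zip(elements, counts))
        ion = neutral - ELECTRON
        if abs(ion - ion_mz) <= tol:
            # Keep only elements that are actually present
            hits.append({el: n for el, n in zip(elements, counts) if n})
    return hits


if __name__ == "__main__":
    print(candidate_formulas(68.99466))
    print(candidate_formulas(68.99466, ppm_tol=100.0))
```

At 10 ppm only CF3+ matches m/z 68.99466; widening the tolerance to 100 ppm also admits CH3FCl+, which shows why mass accuracy (and, in the paper, isotopic constraints) are needed to keep the candidate list short.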
Affiliation(s)
- Myriam Guillevic
- Laboratory for Air Pollution/Environmental Technology, Empa, Swiss Federal Laboratories for Materials Science and Technology, Ueberlandstrasse 129, 8600, Dübendorf, Switzerland
- Aurore Guillevic
- Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France
- Martin K Vollmer
- Laboratory for Air Pollution/Environmental Technology, Empa, Swiss Federal Laboratories for Materials Science and Technology, Ueberlandstrasse 129, 8600, Dübendorf, Switzerland
- Paul Schlauri
- Laboratory for Air Pollution/Environmental Technology, Empa, Swiss Federal Laboratories for Materials Science and Technology, Ueberlandstrasse 129, 8600, Dübendorf, Switzerland
- Matthias Hill
- Laboratory for Air Pollution/Environmental Technology, Empa, Swiss Federal Laboratories for Materials Science and Technology, Ueberlandstrasse 129, 8600, Dübendorf, Switzerland
- Lukas Emmenegger
- Laboratory for Air Pollution/Environmental Technology, Empa, Swiss Federal Laboratories for Materials Science and Technology, Ueberlandstrasse 129, 8600, Dübendorf, Switzerland
- Stefan Reimann
- Laboratory for Air Pollution/Environmental Technology, Empa, Swiss Federal Laboratories for Materials Science and Technology, Ueberlandstrasse 129, 8600, Dübendorf, Switzerland

5
Krettler CA, Thallinger GG. A map of mass spectrometry-based in silico fragmentation prediction and compound identification in metabolomics. Brief Bioinform 2021; 22:6184408. [PMID: 33758925 DOI: 10.1093/bib/bbab073] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/26/2020] [Revised: 01/29/2021] [Accepted: 02/12/2021] [Indexed: 12/27/2022]
Abstract
Metabolomics, the comprehensive study of the metabolome, and lipidomics, the large-scale study of pathways and networks of cellular lipids, are major driving forces in enabling personalized medicine. Complicated and error-prone data analysis still remains a bottleneck, however, especially for identifying novel metabolites. Comparing experimental mass spectra to curated databases containing reference spectra has been the gold standard for compound identification, but constructing such databases is a costly and time-consuming task. Many software applications try to circumvent this process by using cutting-edge computational methods, including quantum chemistry and machine learning, to simulate mass spectra through theoretical, so-called in silico, fragmentation of compounds. Other solutions concentrate directly on experimental spectra and try to identify structural properties by investigating recurring patterns and the relationships between them. The considerable progress made in the field allows recent approaches to provide valuable clues that expedite the annotation of experimental mass spectra. This review sheds light on the individual strengths and weaknesses of these tools and attempts to evaluate them, especially in view of lipidomics, considering the complex mixtures found in biological samples as well as inter-instrument variability of mass spectrometers.
Affiliation(s)
- Christoph A Krettler
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria
- Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria
- Gerhard G Thallinger
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria
- Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria

6
Stettin D, Poulin RX, Pohnert G. Metabolomics Benefits from Orbitrap GC-MS-Comparison of Low- and High-Resolution GC-MS. Metabolites 2020; 10:metabo10040143. [PMID: 32260407 PMCID: PMC7254393 DOI: 10.3390/metabo10040143] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/13/2019] [Revised: 03/26/2020] [Accepted: 04/01/2020] [Indexed: 12/12/2022]
Abstract
The development of improved mass spectrometers and supporting computational tools is expected to enable the rapid annotation of whole metabolomes. Essential for the progress is the identification of strengths and weaknesses of novel instrumentation in direct comparison to previous instruments. Orbitrap liquid chromatography (LC)–mass spectrometry (MS) technology is now widely in use, while Orbitrap gas chromatography (GC)–MS introduced in 2015 has remained fairly unexplored in its potential for metabolomics research. This study aims to evaluate the additional knowledge gained in a metabolomics experiment when using the high-resolution Orbitrap GC–MS in comparison to a commonly used unit-mass resolution single-quadrupole GC–MS. Samples from an osmotic stress treatment of a non-model organism, the microalga Skeletonema costatum, were investigated using comparative metabolomics with low- and high-resolution methods. Resulting datasets were compared on a statistical level and on the level of individual compound annotation. Both MS approaches resulted in successful classification of stressed vs. non-stressed microalgae but did so using different sets of significantly dysregulated metabolites. High-resolution data only slightly improved conventional library matching but enabled the correct annotation of an unknown. While computational support that utilizes high-resolution GC–MS data is still underdeveloped, clear benefits in terms of sensitivity, metabolic coverage, and support in structure elucidation of the Orbitrap GC–MS technology for metabolomics studies are shown here.
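The gap between unit-mass and high-resolution GC-MS can be made concrete with a short calculation on three ions that all sit at nominal m/z 28. The ions are a textbook choice, not data from this study, and the sketch uses the simple definition R = m/Δm (mean mass over the mass difference); vendor specifications such as FWHM resolving power at a given m/z are defined differently.

```python
# Monoisotopic masses (Da) of the most abundant isotopes
MONO = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def exact_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of a neutral formula given as element counts."""
    return sum(MONO[el] * n for el, n in formula.items())


def required_resolving_power(f1: dict[str, int], f2: dict[str, int]) -> float:
    """Resolving power m/dm needed to tell two formulas apart (simple definition)."""
    m1, m2 = exact_mass(f1), exact_mass(f2)
    return ((m1 + m2) / 2) / abs(m1 - m2)


CO = {"C": 1, "O": 1}
N2 = {"N": 2}
C2H4 = {"C": 2, "H": 4}

if __name__ == "__main__":
    for name, other in (("N2", N2), ("C2H4", C2H4)):
        print(f"CO vs {name}: R = {required_resolving_power(CO, other):.0f}")
```

Separating CO from N2 needs a resolving power of roughly 2,500 and CO from C2H4 roughly 770, whereas a unit-mass quadrupole reports all three at m/z 28; this is one reason exact masses can support formula assignment and structure elucidation where nominal masses cannot.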

7
Zuber J, Rathsack P, Otto M. Structural Characterization of Acidic Compounds in Pyrolysis Liquids Using Collision-Induced Dissociation and Fourier Transform Ion Cyclotron Resonance Mass Spectrometry. Anal Chem 2018; 90:12655-12662. [PMID: 30280888 DOI: 10.1021/acs.analchem.8b02873] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
In this study, a novel approach to characterize and identify acidic oil compounds by exploiting the fragmentation behavior of their precursor ions is presented. Precursor ions of seven pyrolysis oils, generated from pyrolysis feedstocks of different origins and degrees of coalification, were produced by electrospray ionization in negative ion mode (ESI(-)). After fragmentation of all ions in the ion cloud by collision-induced dissociation (CID), the precursor and product ions were detected by ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS). The ESI(-)-CID data sets were evaluated by applying either a targeted classification or an untargeted clustering approach. With targeted classification, 10% of the ionized precursor ions of the analyzed pyrolysis liquid samples could be classified into one of 11 compound classes using theoretical fragmentation pathways of these classes. In contrast, the untargeted clustering approach did not require theoretical fragmentation pathways, making it the more transferable method. Results from both approaches were verified by analyzing standard compounds of known structure. The analysis and data evaluation methods presented in this work can be used to characterize complex organic mixtures, such as pyrolysis oils, and their compounds in depth on a structural level.
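Targeted classification by theoretical fragmentation pathways amounts to checking whether the expected neutral losses appear between a precursor and its products. The Python sketch below assumes precursor-product pairs are already known (in the study all ions were fragmented together, so that pairing is itself part of the problem), and its two class rules are hypothetical examples (CO2 loss for a deprotonated carboxylic acid, CH3 radical loss for a deprotonated methoxyphenol), not the paper's 11 classes.

```python
# Hypothetical class rules: characteristic neutral losses (Da) from an [M-H]- precursor
CLASS_RULES = {
    "carboxylic acid": [43.989829],  # loss of CO2
    "methoxyphenol": [15.023475],    # loss of a CH3 radical
}


def classify_precursor(precursor_mz, product_mzs, tol=0.002):
    """Return every class whose expected neutral losses are all observed."""
    observed_losses = [precursor_mz - p for p in product_mzs]
    matches = []
    for cls, losses in CLASS_RULES.items():
        if all(any(abs(obs - loss) <= tol for obs in observed_losses) for loss in losses):
            matches.append(cls)
    return matches


if __name__ == "__main__":
    # Deprotonated benzoic acid losing CO2
    print(classify_precursor(121.0295, [77.0397]))
    # Deprotonated guaiacol losing CH3
    print(classify_precursor(123.0452, [108.0217]))
```

Because the rules are explicit, each class must be written down in advance; that requirement is what the study's untargeted clustering approach avoids.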
Affiliation(s)
- Jan Zuber
- Institute of Analytical Chemistry, TU Bergakademie Freiberg, Leipziger Straße 29, 09599 Freiberg, Germany
- Philipp Rathsack
- Institute of Analytical Chemistry, TU Bergakademie Freiberg, Leipziger Straße 29, 09599 Freiberg, Germany
- German Centre for Energy Resources, Reiche Zeche, Fuchsmuehlenweg 9, 09599 Freiberg, Germany
- Matthias Otto
- Institute of Analytical Chemistry, TU Bergakademie Freiberg, Leipziger Straße 29, 09599 Freiberg, Germany

8
Godzien J, Gil de la Fuente A, Otero A, Barbas C. Metabolite Annotation and Identification. Comprehensive Analytical Chemistry 2018. [DOI: 10.1016/bs.coac.2018.07.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]

9
Hufsky F, Böcker S. Mining molecular structure databases: Identification of small molecules based on fragmentation mass spectrometry data. Mass Spectrometry Reviews 2017; 36:624-633. [PMID: 26763615 DOI: 10.1002/mas.21489] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 04/21/2015] [Accepted: 12/18/2015] [Indexed: 06/05/2023]
Abstract
Mass spectrometry (MS) is a key technology for the analysis of small molecules. For the identification and structural elucidation of novel molecules, new approaches beyond straightforward spectral comparison are required. In this review, we cover computational methods that help identify small molecules by analyzing fragmentation MS data. We focus on the four main approaches to mining a database of metabolite structures, namely rule-based fragmentation spectrum prediction, combinatorial fragmentation, competitive fragmentation modeling, and molecular fingerprint prediction. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:624-633, 2017.
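Of the four approaches named in this review, molecular fingerprint prediction lends itself to a compact illustration: a fingerprint predicted from the spectrum is compared with the fingerprints of database structures, and the candidates are ranked by similarity. The Python sketch below uses plain Tanimoto similarity over sets of "on" bit indices; the candidate names and bit sets are hypothetical, and published tools typically use more elaborate, often probabilistic, scoring.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as on-bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_candidates(predicted: set[int], database: dict[str, set[int]]) -> list[str]:
    """Order database structures from most to least similar to the predicted fingerprint."""
    return sorted(database, key=lambda name: tanimoto(predicted, database[name]), reverse=True)


if __name__ == "__main__":
    predicted = {1, 4, 7, 9}  # bits predicted "on" from the spectrum (hypothetical)
    database = {
        "cand_A": {1, 4, 7, 9, 12},
        "cand_B": {1, 2, 3},
        "cand_C": {4, 7, 9},
    }
    for name in rank_candidates(predicted, database):
        print(name, round(tanimoto(predicted, database[name]), 3))
```

Restricting the search to a structure database is what makes this tractable: the spectrum never has to be explained de novo, only matched against known candidates.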
Affiliation(s)
- Franziska Hufsky
- Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, Jena, 07743, Germany
- Bioinformatik für Hochdurchsatzverfahren, Friedrich-Schiller-Universität Jena, Leutragraben 1, Jena, 07743, Germany
- Sebastian Böcker
- Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, Jena, 07743, Germany

10
Gil de la Fuente A, Grace Armitage E, Otero A, Barbas C, Godzien J. Differentiating signals to make biological sense - A guide through databases for MS-based non-targeted metabolomics. Electrophoresis 2017; 38:2242-2256. [PMID: 28426136 DOI: 10.1002/elps.201700070] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 02/14/2017] [Revised: 03/17/2017] [Accepted: 03/17/2017] [Indexed: 12/21/2022]
Abstract
Metabolite identification is one of the most challenging steps in metabolomics studies and one of the greatest bottlenecks in the entire workflow. The success of this step determines the success of the entire study; therefore, the quality of the annotations requires special attention. A variety of tools and resources are available to aid metabolite identification or annotation, offering different and often complementary functionalities. In preparation for this article, almost 50 databases were reviewed, of which 17, chosen for their online ESI-MS functionality, were selected for discussion. The general characteristics and functions of each database are discussed in turn, considering the advantages and limitations of each along with recommendations for optimal use, as derived from experience at the Centre for Metabolomics and Bioanalysis (CEMBIO) in Madrid. These databases were evaluated for their utility in non-targeted metabolomics, including aspects such as identifier assignment, structural assignment and interpretation of results.
Affiliation(s)
- Alberto Gil de la Fuente
- Centre for Metabolomics and Bioanalysis (CEMBIO), Facultad de Farmacia, Universidad CEU San Pablo, Campus Montepríncipe, Boadilla del Monte, Madrid, Spain
- Department of Information Technology, Universidad CEU San Pablo, Campus Montepríncipe, Boadilla del Monte, Madrid, Spain
- Emily Grace Armitage
- Wellcome Centre for Molecular Parasitology, Institute of Infection, Immunity and Inflammation, College of Medical Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Glasgow Polyomics, Wolfson Wohl Cancer Research Centre, College of Medical Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Abraham Otero
- Department of Information Technology, Universidad CEU San Pablo, Campus Montepríncipe, Boadilla del Monte, Madrid, Spain
- Coral Barbas
- Centre for Metabolomics and Bioanalysis (CEMBIO), Facultad de Farmacia, Universidad CEU San Pablo, Campus Montepríncipe, Boadilla del Monte, Madrid, Spain
- Joanna Godzien
- Centre for Metabolomics and Bioanalysis (CEMBIO), Facultad de Farmacia, Universidad CEU San Pablo, Campus Montepríncipe, Boadilla del Monte, Madrid, Spain

11
Yi L, Dong N, Yun Y, Deng B, Ren D, Liu S, Liang Y. Chemometric methods in data processing of mass spectrometry-based metabolomics: A review. Anal Chim Acta 2016; 914:17-34. [PMID: 26965324 DOI: 10.1016/j.aca.2016.02.001] [Citation(s) in RCA: 159] [Impact Index Per Article: 19.9] [Received: 08/25/2015] [Revised: 01/28/2016] [Accepted: 02/01/2016] [Indexed: 01/03/2023]
Abstract
This review focuses on recent and potential advances in chemometric methods for data processing in metabolomics, especially for data generated by mass spectrometric techniques. Metabolomics is gradually being regarded as a valuable and promising biotechnology rather than an ambitious advancement. Herein, we outline significant developments in metabolomics, especially in combination with modern chemical analysis techniques and dedicated statistical and chemometric data analysis strategies. Advanced techniques for the preprocessing of raw data, identification of metabolites, variable selection, and modeling are illustrated. We believe that insights from these developments will help narrow the gap between the original dataset and current biological knowledge. We also discuss the limitations and perspectives of extracting information from high-throughput datasets.
Affiliation(s)
- Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, China
- Naiping Dong
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, 999077, China
- Yonghuan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, China
- Baichuan Deng
- College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Dabing Ren
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, China
- Shao Liu
- Xiangya Hospital, Central South University, Changsha, 410008, China
- Yizeng Liang
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, China

12
Vaniya A, Fiehn O. Using fragmentation trees and mass spectral trees for identifying unknown compounds in metabolomics. Trends Analyt Chem 2015; 69:52-61. [PMID: 26213431 PMCID: PMC4509603 DOI: 10.1016/j.trac.2015.04.002] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Indexed: 01/18/2023]
Abstract
Identification of unknown metabolites is the bottleneck in advancing metabolomics, leaving the interpretation of metabolomics results ambiguous. The chemical diversity of metabolism is vast, making structure identification arduous and time-consuming. Currently, comprehensive analysis of mass spectra in metabolomics is limited to library matching, but tandem mass spectral libraries are small compared with the large number of compounds found in the biosphere, including xenobiotics. Resolving this bottleneck requires richer data acquisition and better computational tools. Multi-stage mass spectrometry (MSn) trees show promise in this regard. Fragmentation trees explore the fragmentation process, generate fragmentation rules and aid in substructure identification, while mass spectral trees delineate the dependencies in multi-stage MS of collision-induced dissociations. This review covers advances over the past 10 years in using these trees as tools for metabolite identification, including the algorithms, software and databases used to build and implement fragmentation trees and mass spectral annotations.
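The core structure of a fragmentation tree, formula-annotated peaks linked by neutral-loss edges, can be sketched with a deliberately simple rule. In the Python sketch below each fragment is attached to the smallest other formula that contains it, and the edge is labeled with the difference formula. This greedy rule is an assumption for illustration only: fragmentation tree computation in the literature is typically a scored optimization over many candidate formulas and losses, and the function names and formulas here are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C6H12O6' into element counts."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def format_formula(counts: Counter) -> str:
    """Write C and H first, then the remaining elements alphabetically."""
    order = [el for el in ("C", "H") if counts[el] > 0]
    order += sorted(el for el in counts if el not in ("C", "H") and counts[el] > 0)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def greedy_fragment_tree(formulas: list[str]) -> list[tuple[str, str, str]]:
    """Return (parent, child, neutral loss) edges using a smallest-container rule."""
    parsed = {f: parse_formula(f) for f in formulas}
    edges = []
    for child in formulas:
        c = parsed[child]
        # Every other formula that contains the child is a possible parent
        containers = [
            p for p in formulas
            if p != child and all(parsed[p][el] >= n for el, n in c.items())
        ]
        if not containers:
            continue  # nothing contains it: treat it as the root (precursor)
        parent = min(containers, key=lambda p: sum(parsed[p].values()))
        loss = parsed[parent] - c  # Counter subtraction drops zero counts
        edges.append((parent, child, format_formula(loss)))
    return edges


if __name__ == "__main__":
    for edge in greedy_fragment_tree(["C6H12O6", "C6H10O5", "C6H8O4", "C4H6O3"]):
        print(edge)
```

For this made-up series the sketch yields a chain of two water losses followed by a C2H2O (ketene) loss, i.e. a tree whose edges carry the neutral losses that fragmentation rules are later mined from.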
Affiliation(s)
- Arpana Vaniya
- University of California Davis, Department of Chemistry, One Shields Avenue, Davis, CA 95616, USA
- University of California Davis, West Coast Metabolomics Center, Genome Center, 451 Health Sciences Drive, Davis, CA 95616, USA
- Oliver Fiehn
- University of California Davis, West Coast Metabolomics Center, Genome Center, 451 Health Sciences Drive, Davis, CA 95616, USA
- King Abdulaziz University, Biochemistry Department, Jeddah, Saudi Arabia

13
Yi L, Dong N, Yun Y, Deng B, Liu S, Zhang Y, Liang Y. WITHDRAWN: Recent advances in chemometric methods for plant metabolomics: A review. Biotechnol Adv 2014:S0734-9750(14)00183-9. [PMID: 25461504 DOI: 10.1016/j.biotechadv.2014.11.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/15/2014] [Revised: 11/17/2014] [Accepted: 11/18/2014] [Indexed: 12/17/2022]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
Collapse
Affiliation(s)
- Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming 650500, China.
| | - Naiping Dong
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong 999077, Hong Kong, China
| | - Yonghuan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Baichuan Deng
- Department of Chemistry, University of Bergen, Bergen N-5007, Norway
| | - Shao Liu
- Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yi Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yizeng Liang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| |
Collapse
|
14
|
Hufsky F, Scheubert K, Böcker S. New kids on the block: novel informatics methods for natural product discovery. Nat Prod Rep 2014; 31:807-17. [DOI: 10.1039/c3np70101h] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
15
|
Hufsky F, Scheubert K, Böcker S. Computational mass spectrometry for small-molecule fragmentation. Trends Analyt Chem 2014. [DOI: 10.1016/j.trac.2013.09.008] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
16
|
Kachala VV, Khemchyan LL, Kashin AS, Orlov NV, Grachev AA, Zalesskiy SS, Ananikov VP. Target-oriented analysis of gaseous, liquid and solid chemical systems by mass spectrometry, nuclear magnetic resonance spectroscopy and electron microscopy. RUSSIAN CHEMICAL REVIEWS 2013. [DOI: 10.1070/rc2013v082n07abeh004413] [Citation(s) in RCA: 178] [Impact Index Per Article: 16.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
17
Molecular Formula Identification with SIRIUS. Metabolites 2013; 3:506-16. [PMID: 24958003 PMCID: PMC3901276 DOI: 10.3390/metabo3020506] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 04/08/2013] [Revised: 06/03/2013] [Accepted: 06/04/2013] [Indexed: 01/06/2023]
Abstract
We present results of the SIRIUS2 submission to the 2012 CASMI contest. Only results for Category 1 (molecular formula identification) were submitted. The SIRIUS method and the parameters used are briefly described, followed by a detailed analysis of the results and a discussion of cases where SIRIUS2 was unable to come up with the correct molecular formula. SIRIUS2 returns consistently high-quality results, with the exception of fragmentation pattern analysis of time-of-flight data. We then discuss possibilities for further improving SIRIUS2 in the future.
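To make the molecular formula identification task concrete, the sketch below shows only the candidate-generation step: it enumerates CHNO formulas whose monoisotopic mass lies within a ppm tolerance of a given neutral mass. It is a plain brute-force enumeration, not the decomposition algorithm or the isotope-pattern and fragmentation-tree scoring used by SIRIUS; the function names, the CHNO-only alphabet, the 5 ppm default and the integer ring-plus-double-bond (RDBE) filter are assumptions made for this illustration.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def format_formula(counts):
    """Hill-order string for a CHNO formula, e.g. {'C': 6, 'H': 12, 'O': 6} -> 'C6H12O6'."""
    parts = []
    for el in "CHNO":
        k = counts.get(el, 0)
        if k > 0:
            parts.append(el + (str(k) if k > 1 else ""))
    return "".join(parts)


def candidate_formulas(neutral_mass, ppm=5.0):
    """Hypothetical helper: brute-force CHNO formulas within `ppm` of a neutral mass.

    Returns (formula, error_ppm) pairs sorted by absolute mass error. Real tools
    use faster decomposition algorithms and score the candidates further.
    """
    tol = neutral_mass * ppm * 1e-6
    hits = []
    for c in range(int(neutral_mass // MONO["C"]) + 1):
        for n in range(int(neutral_mass // MONO["N"]) + 1):
            for o in range(int(neutral_mass // MONO["O"]) + 1):
                # Mass left over for hydrogens once C, N and O are fixed.
                rest = neutral_mass - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
                if rest < -tol:
                    break  # more oxygen only makes the formula heavier
                h = round(rest / MONO["H"])
                mass = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
                if h < 0 or abs(mass - neutral_mass) > tol:
                    continue
                # Assumed filter: neutral even-electron molecules need an
                # integer, non-negative RDBE = C - H/2 + N/2 + 1.
                rdbe = c - h / 2 + n / 2 + 1
                if (h + n) % 2 or rdbe < 0:
                    continue
                err_ppm = (mass - neutral_mass) / neutral_mass * 1e6
                hits.append((format_formula({"C": c, "H": h, "N": n, "O": o}), err_ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

For the monoisotopic mass of glucose (180.0633881022 Da), the returned list includes "C6H12O6"; in practice several formulas usually fall inside the tolerance, which is why further evidence such as isotope patterns and fragmentation spectra is needed to rank them.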
18
Peironcely JE, Rojas-Chertó M, Tas A, Vreeken R, Reijmers T, Coulier L, Hankemeier T. Automated Pipeline for De Novo Metabolite Identification Using Mass-Spectrometry-Based Metabolomics. Anal Chem 2013; 85:3576-83. [DOI: 10.1021/ac303218u] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 01/10/2023]
Affiliation(s)
- Julio E. Peironcely
- TNO Research Group Quality & Safety, P.O. Box 360, NL-3700 AJ Zeist, The Netherlands
- Leiden Academic Center for Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Netherlands Metabolomics Centre, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Miguel Rojas-Chertó
- Leiden Academic Center for Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Netherlands Metabolomics Centre, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Albert Tas
- TNO Research Group Quality & Safety, P.O. Box 360, NL-3700 AJ Zeist, The Netherlands
- Rob Vreeken
- Leiden Academic Center for Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Netherlands Metabolomics Centre, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Theo Reijmers
- Leiden Academic Center for Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Netherlands Metabolomics Centre, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Leon Coulier
- TNO Research Group Quality & Safety, P.O. Box 360, NL-3700 AJ Zeist, The Netherlands
- Netherlands Metabolomics Centre, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Thomas Hankemeier
- Leiden Academic Center for Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Netherlands Metabolomics Centre, Einsteinweg 55, 2333 CC Leiden, The Netherlands
19
Rauf I, Rasche F, Nicolas F, Böcker S. Finding maximum colorful subtrees in practice. J Comput Biol 2013; 20:311-21. [PMID: 23509858 DOI: 10.1089/cmb.2012.0083] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
In metabolomics and other fields dealing with small compounds, mass spectrometry is applied as a sensitive high-throughput technique. Recently, fragmentation trees have been proposed to automatically analyze the fragmentation mass spectra recorded by such instruments. Computationally, this leads to the problem of finding a maximum-weight subtree in an edge-weighted and vertex-colored graph, such that every color appears at most once in the solution. We introduce new heuristics and an exact algorithm for this Maximum Colorful Subtree problem and evaluate them against existing algorithms on real-world and artificial datasets. Our tree completion heuristic consistently scores better than other heuristics, while the integer programming-based algorithm produces optimal trees with modest running times. Our fast and accurate heuristic can help determine molecular formulas based on fragmentation trees. On the other hand, optimal trees from the integer linear program are useful if structure is relevant, for example, for tree alignments.
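The Maximum Colorful Subtree problem described in this abstract can be illustrated with a tiny exact solver. This sketch is neither the paper's tree completion heuristic nor its integer linear program; it is a memoized exhaustive search over the set of vertices already in the tree, which is only practical for toy graphs because the number of such sets grows exponentially. The function name, the edge-list input format and the toy graph are hypothetical.

```python
from functools import lru_cache


def max_colorful_subtree(edges, color, root):
    """Exact Maximum Colorful Subtree by memoized exhaustive search (toy sizes only).

    edges: (u, v, weight) triples of a directed fragmentation graph.
    color: vertex -> color, e.g. the peak that a candidate formula explains.
    Returns (best_weight, chosen_edges) for a subtree rooted at `root` in which
    every color appears at most once.
    """
    children = {}
    for u, v, w in edges:
        children.setdefault(u, []).append((v, w))

    @lru_cache(maxsize=None)
    def best(tree):
        # The colors in use are determined by the vertex set, so the best way to
        # extend the tree depends only on `tree`, which makes memoization valid.
        used = {color[x] for x in tree}
        result = (0, ())  # stopping here (adding nothing) is always allowed
        for u in tree:
            for v, w in children.get(u, []):
                if color[v] in used:  # also excludes vertices already in the tree
                    continue
                gain, chosen = best(tree | {v})
                if w + gain > result[0]:
                    result = (w + gain, ((u, v),) + chosen)
        return result

    weight, chosen = best(frozenset([root]))
    return weight, list(chosen)


if __name__ == "__main__":
    # Toy graph: "a1" and "a2" are alternative explanations of the same peak.
    color = {"M": 0, "a1": 1, "a2": 1, "b": 2}
    edges = [("M", "a1", 2), ("M", "a2", 3), ("a1", "b", 4), ("a2", "b", 0.5), ("M", "b", 1)]
    print(max_colorful_subtree(edges, color, "M"))
```

In the toy graph, "a1" and "a2" share a color, so at most one of them can enter the tree. Greedily taking the heavier edge to "a2" (weight 3) leads to a total of 4, whereas the optimum goes through "a1" and then "b" for a total weight of 6, which shows why greedy choices alone can miss the best colorful subtree.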
Affiliation(s)
- Imran Rauf
- Department of Computer Science, National University of Computer and Emerging Sciences, Karachi, Pakistan.
20
Scheubert K, Hufsky F, Böcker S. Computational mass spectrometry for small molecules. J Cheminform 2013; 5:12. [PMID: 23453222 PMCID: PMC3648359 DOI: 10.1186/1758-2946-5-12] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Received: 11/23/2012] [Accepted: 02/01/2013] [Indexed: 12/29/2022]
Abstract
The identification of small molecules from mass spectrometry (MS) data remains a major challenge in the interpretation of MS data. This review covers the computational aspects of identifying small molecules, from the identification of a compound searching a reference spectral library, to the structural elucidation of unknowns. In detail, we describe the basic principles and pitfalls of searching mass spectral reference libraries. Determining the molecular formula of the compound can serve as a basis for subsequent structural elucidation; consequently, we cover different methods for molecular formula identification, focussing on isotope pattern analysis. We then discuss automated methods to deal with mass spectra of compounds that are not present in spectral libraries, and provide an insight into de novo analysis of fragmentation spectra using fragmentation trees. In addition, this review briefly covers the reconstruction of metabolic networks using MS data. Finally, we list available software for different steps of the analysis pipeline.
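Searching a reference spectral library, the first step covered by this review, is commonly scored with a cosine (normalized dot product) similarity between peak lists. The sketch below uses greedy peak matching within a fixed m/z tolerance and ranks library entries by that score; the 0.01 tolerance, the use of raw (unweighted) intensities and the function names are assumptions for illustration, and the review does not prescribe this particular scoring scheme.

```python
import math


def cosine_score(query, reference, tol=0.01):
    """Greedy peak-matching cosine similarity between two centroided spectra.

    Spectra are lists of (mz, intensity). Two peaks may match if their m/z values
    differ by at most `tol`; each peak is used at most once, and pairs with the
    largest intensity product are matched first.
    """
    pairs = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= tol:
                pairs.append((int_q * int_r, i, j))
    pairs.sort(reverse=True)

    used_q, used_r, dot = set(), set(), 0.0
    for product, i, j in pairs:
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            dot += product

    # Normalize by the norms of all peaks, so unmatched peaks lower the score.
    norm_q = math.sqrt(sum(intensity ** 2 for _, intensity in query))
    norm_r = math.sqrt(sum(intensity ** 2 for _, intensity in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def search_library(query, library, tol=0.01, top=3):
    """Rank library entries (name -> spectrum) by cosine score against the query."""
    scored = [(cosine_score(query, spectrum, tol), name) for name, spectrum in library.items()]
    scored.sort(reverse=True)
    return scored[:top]
```

A query whose peaks all fall within the tolerance of a library spectrum with the same relative intensities scores close to 1.0, while a spectrum with no matching peaks scores 0.0. Practical tools often also weight intensities and m/z values before computing the score.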
Affiliation(s)
- Kerstin Scheubert
- Chair of Bioinformatics, Friedrich Schiller University, Ernst-Abbe-Platz 2, Jena, Germany.
21
Fukushima A, Kusano M. Recent progress in the development of metabolome databases for plant systems biology. Front Plant Sci 2013; 4:73. [PMID: 23577015 PMCID: PMC3616245 DOI: 10.3389/fpls.2013.00073] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 12/19/2012] [Accepted: 03/15/2013] [Indexed: 05/19/2023]
Abstract
Metabolomics has grown greatly as a functional genomics tool, and has become an invaluable diagnostic tool for biochemical phenotyping of biological systems. Over the past decades, a number of databases involving information related to mass spectra, compound names and structures, statistical/mathematical models and metabolic pathways, and metabolite profile data have been developed. Such databases complement each other and support efficient growth in this area, although the data resources remain scattered across the World Wide Web. Here, we review available metabolome databases and summarize the present status of development of related tools, particularly focusing on the plant metabolome. Data sharing discussed here will pave the way for the robust interpretation of metabolomic data and advances in plant systems biology.
Affiliation(s)
- Atsushi Fukushima
- RIKEN Plant Science Center, Yokohama, Kanagawa, Japan
- *Correspondence: Atsushi Fukushima, RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan. e-mail:
- Miyako Kusano
- RIKEN Plant Science Center, Yokohama, Kanagawa, Japan
- Department of Genome System Sciences, Graduate School of Nanobioscience, Kihara Institute for Biological Research, Yokohama, Kanagawa, Japan
22
Abstract
Matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) imaging mass spectrometry (IMS) applied directly to microbes on agar-based medium captures global information about microbial molecules, allowing for direct correlation of chemotypes to phenotypes. This tool was developed to investigate metabolic exchange factors of intraspecies, interspecies, and polymicrobial interactions. Based on our experience with the thousands of images we have generated in the laboratory, we present five steps of microbial IMS: culturing, matrix application, dehydration of the sample, data acquisition, and data analysis/interpretation. We also address the common challenges encountered during sample preparation, matrix selection and application, and sample adherence to the MALDI target plate. With the practical guidelines described herein, microbial IMS use can be extended to bio-based agricultural, biofuel, diagnostic, and therapeutic discovery applications.