1. Ziaikin E, Tello E, Peterson DG, Niv MY. BitterMasS: Predicting Bitterness from Mass Spectra. J Agric Food Chem 2024; 72:10537-10547. [PMID: 38685906 PMCID: PMC11082931 DOI: 10.1021/acs.jafc.3c09767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/02/2024]
Abstract
Bitter compounds are common in nature and among drugs. Previously, machine learning tools were developed to predict bitterness from chemical structure. However, known structures are estimated to represent only 5-10% of the metabolome, and the rest remain unassigned or "dark". We present BitterMasS, a Random Forest classifier that was trained on 5414 experimental mass spectra of bitter and nonbitter compounds, achieving precision = 0.83 and recall = 0.90 for an internal test set. Next, the model was tested against spectra newly extracted from the literature for 106 bitter and nonbitter compounds and against additional spectra measured for 26 compounds. For these external test cases, BitterMasS exhibited 67% precision and 93% recall for the first and 58% accuracy and 99% recall for the second. The spectrum-bitterness prediction strategy was more effective than the spectrum-structure-bitterness prediction strategy and covered more compounds. These encouraging results suggest that BitterMasS can be used to predict bitter compounds in the metabolome without the need for structural assignment of individual molecules. This may enable identification of bitter compounds from metabolomics analyses, comparison of potential bitterness levels obtained by different treatments of samples, and monitoring of bitterness changes over time.
Affiliation(s)
- Evgenii Ziaikin
  - Food Science and Nutrition, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Institute of Biochemistry, Food and Nutrition, The Hebrew University of Jerusalem, 76100 Rehovot, Israel
- Edisson Tello
  - Department of Food Science and Technology, College of Food, Agriculture, and Environmental Sciences, The Ohio State University, Columbus, Ohio 43210, United States
- Devin G. Peterson
  - Department of Food Science and Technology, College of Food, Agriculture, and Environmental Sciences, The Ohio State University, Columbus, Ohio 43210, United States
- Masha Y. Niv
  - Food Science and Nutrition, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Institute of Biochemistry, Food and Nutrition, The Hebrew University of Jerusalem, 76100 Rehovot, Israel
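The precision and recall values quoted in the BitterMasS abstract follow from confusion-matrix counts, and a classifier of this kind needs each spectrum as a fixed-length feature vector. The sketch below illustrates both steps under stated assumptions: the 1 Da bin width, the 1000 Da upper limit, the helper names `bin_spectrum` and `precision_recall`, and the toy labels are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple


def bin_spectrum(peaks: Iterable[Tuple[float, float]],
                 mz_max: float = 1000.0,
                 bin_width: float = 1.0) -> List[float]:
    """Convert (m/z, intensity) peaks into a fixed-length vector.

    Intensities falling in the same bin are summed, then the vector is
    scaled so the largest bin equals 1. Bin width and range are assumed.
    """
    n_bins = int(mz_max / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    top = max(vec)
    return [v / top for v in vec] if top > 0 else vec


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 1 means 'bitter'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy spectrum: two peaks share the 119 Da bin and are summed.
features = bin_spectrum([(91.05, 40.0), (119.05, 100.0), (119.4, 20.0)])

# Toy labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High recall with lower precision, as reported for the external test sets, corresponds to few false negatives but a larger share of false positives among the predicted bitter compounds.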
2. Perez de Souza L, Fernie AR. Computational methods for processing and interpreting mass spectrometry-based metabolomics. Essays Biochem 2024; 68:5-13. [PMID: 37999335 PMCID: PMC11065554 DOI: 10.1042/ebc20230019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/10/2023] [Accepted: 11/15/2023] [Indexed: 11/25/2023]
Abstract
Metabolomics has emerged as an indispensable tool for exploring complex biological questions, providing the ability to investigate a substantial portion of the metabolome. However, the vast complexity and structural diversity intrinsic to metabolites pose a great challenge for data analysis and interpretation. Liquid chromatography mass spectrometry (LC-MS) stands out as a versatile technique offering extensive metabolite coverage. In this mini-review, we address some of the hurdles posed by the complex nature of LC-MS data, providing a brief overview of computational tools designed to help tackle these challenges. Our focus centers on two major steps that are essential to most metabolomics investigations: the translation of raw data into quantifiable features, and the extraction of structural insights from mass spectra to facilitate metabolite identification. By exploring current computational solutions, we aim to provide a critical overview of the capabilities and constraints of mass spectrometry-based metabolomics, while introducing some of the most recent trends in data processing and analysis within the field.
Affiliation(s)
- Leonardo Perez de Souza
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Alisdair R Fernie
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
  - Center for Plant Systems Biology and Biotechnology, 4000 Plovdiv, Bulgaria
3. Chandran M, S S, Abhirami, Chandran A, Jaleel A, Plakkal Ayyappan J. Defining atherosclerotic plaque biology by mass spectrometry-based omics approaches. Mol Omics 2023; 19:6-26. [PMID: 36426765 DOI: 10.1039/d2mo00260d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Atherosclerosis is the principal cause of vascular diseases and one of the leading causes of death worldwide. Even though several insights into its natural course, risk factors and interventions have been identified, it is still an ongoing global pandemic. Since the structure and biochemical composition of the plaques show high heterogeneity, a comprehensive understanding of the intraplaque composition, its microenvironment, and the mechanisms of the progression and instability across different vascular beds at their progression stages is crucial for better risk stratification and treatment modalities. Even though several cell-based studies, animal studies, and extensive multicentric population studies have been conducted concerning cardiovascular diseases for assessing the risk factors and plaque biology, the studies on human clinical samples are very limited. Novel approaches utilize samples from percutaneous coronary interventions, which could provide more access to clinical samples at different stages of the disease without complex invasive resections. As an emerging technological platform in disease discovery research, mass spectrometry-based omics technologies offer capabilities for a comprehensive understanding of the mechanisms linked to several vascular diseases. Here, we discuss the cellular and molecular processes of atherosclerosis, different mass spectrometry-based omics approaches, and the studies mostly done on clinical samples of atheroma plaque using mass spectrometry-based proteomics, metabolomics and lipidomics approaches.
Affiliation(s)
- Mahesh Chandran
  - Translational Nanomedicine and Lifestyle Disease Research Laboratory, Department of Biochemistry, University of Kerala, Thiruvananthapuram 695034, Kerala, India
  - Department of Biotechnology, University of Kerala, Thiruvananthapuram 695034, Kerala, India
  - Mass Spectrometry and Proteomics Core Facility, Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, 695012, India
- Sudhina S
  - Translational Nanomedicine and Lifestyle Disease Research Laboratory, Department of Biochemistry, University of Kerala, Thiruvananthapuram 695034, Kerala, India
- Abhirami
  - Translational Nanomedicine and Lifestyle Disease Research Laboratory, Department of Biochemistry, University of Kerala, Thiruvananthapuram 695034, Kerala, India
- Akash Chandran
  - Department of Nanoscience and Nanotechnology, University of Kerala, Kariavattom, Thiruvananthapuram-695581, Kerala, India
- Abdul Jaleel
  - Mass Spectrometry and Proteomics Core Facility, Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, 695012, India
- Janeesh Plakkal Ayyappan
  - Translational Nanomedicine and Lifestyle Disease Research Laboratory, Department of Biochemistry, University of Kerala, Thiruvananthapuram 695034, Kerala, India
  - Department of Biotechnology, University of Kerala, Thiruvananthapuram 695034, Kerala, India
  - Department of Nanoscience and Nanotechnology, University of Kerala, Kariavattom, Thiruvananthapuram-695581, Kerala, India
  - Centre for Advanced Cancer Research, Department of Biochemistry, University of Kerala, Thiruvananthapuram 695034, Kerala, India
4. Heathcote D, Robertson PA, Butler AA, Ridley C, Lomas J, Buffett MM, Bell M, Vallance C. Electron-induced dissociation dynamics studied using covariance-map imaging. Faraday Discuss 2022; 238:682-699. [PMID: 35781475 DOI: 10.1039/d2fd00033d] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/09/2023]
Abstract
Recently, covariance analysis has found significant use in the field of chemical reaction dynamics. When coupled with data from product time-of-flight mass spectrometry and/or multi-mass velocity-map imaging, it allows us to uncover correlations between two or more ions formed from the same parent molecule. While the approach has parallels with coincidence measurements, covariance analysis allows experiments to be performed at much higher count rates than traditional coincidence methods. We report results from electron-molecule crossed-beam experiments, in which covariance analysis is used to elucidate the dissociation dynamics of multiply-charged ions formed by electron ionisation over the energy range from 50 to 300 eV. The approach is able to isolate signal contributions from multiply charged ions even against a very large 'background' of signal arising from dissociation of singly-charged parent ions. Covariance between the product time-of-flight spectra identifies pairs of fragments arising from the same parent ions, while covariances between the velocity-map images ('recoil-frame covariances') reveal the relative velocity distributions of the ion pairs. We show that recoil-frame covariance analysis can be used to distinguish between multiple plausible dissociation mechanisms, including multi-step processes, and that the approach becomes particularly powerful when investigating the fragmentation dynamics of larger molecules with a higher number of possible fragmentation pathways.
Affiliation(s)
- David Heathcote
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- Patrick A Robertson
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- Alexander A Butler
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- Cian Ridley
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- James Lomas
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- Madeline M Buffett
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- Megan Bell
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- Claire Vallance
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
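The covariance between two time-of-flight channels, as used in the Heathcote et al. abstract, can be written as cov(X, Y) = ⟨XY⟩ − ⟨X⟩⟨Y⟩, with the averages taken over acquisition cycles (shots). The following is a minimal sketch of that formula on toy data; the function name `covariance_map` and the shot-by-shot counts are hypothetical and do not reproduce the authors' analysis pipeline.

```python
from typing import List, Sequence


def covariance_map(shots: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return cov[i][j] = <x_i * x_j> - <x_i><x_j>, averaged over shots.

    Each shot is a list of ion counts, one entry per time-of-flight bin.
    """
    n_shots = len(shots)
    n_bins = len(shots[0])
    means = [sum(s[i] for s in shots) / n_shots for i in range(n_bins)]
    cov = [[0.0] * n_bins for _ in range(n_bins)]
    for i in range(n_bins):
        for j in range(n_bins):
            mean_product = sum(s[i] * s[j] for s in shots) / n_shots
            cov[i][j] = mean_product - means[i] * means[j]
    return cov


# Toy data: bins 0 and 1 always fire together (fragments of one parent ion),
# while bin 2 fires independently of them.
shots = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
cov = covariance_map(shots)
print(cov[0][1], cov[0][2])
```

A positive off-diagonal entry marks a pair of channels whose signals rise and fall together from shot to shot, which is how fragment pairs from the same parent ion stand out against uncorrelated background.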
5. Kostyukevich Y, Sosnin S, Osipenko S, Kovaleva O, Rumiantseva L, Kireev A, Zherebker A, Fedorov M, Nikolaev EN. PyFragMS-A Web Tool for the Investigation of the Collision-Induced Fragmentation Pathways. ACS Omega 2022; 7:9710-9719. [PMID: 35350354 PMCID: PMC8945079 DOI: 10.1021/acsomega.1c07272] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/24/2021] [Accepted: 02/28/2022] [Indexed: 05/13/2023]
Abstract
Dissociation induced by the accumulation of internal energy via collisions of ions with neutral molecules is one of the most important fragmentation techniques in mass spectrometry (MS), and the identification of small singly charged molecules is based mainly on the consideration of the fragmentation spectrum. Many research studies have been dedicated to the creation of databases of experimentally measured tandem mass spectrometry (MS/MS) spectra (such as MzCloud, Metlin, etc.) and to developing software for predicting MS/MS fragments in silico from the molecular structure (such as MetFrag, CFM-ID, CSI:FingerID, etc.). However, the fragmentation mechanisms and pathways are still not fully understood. One of the limiting obstacles is that protomers (positive ions protonated at different sites) produce different fragmentation spectra, and these spectra overlap when different protomers are present. Here, we propose to use a combination of two powerful approaches: computing fragmentation trees that carry information on all consecutive fragmentations and consideration of the MS/MS data of isotopically labeled compounds. We have created PyFragMS, a web tool consisting of a database of annotated MS/MS spectra of isotopically labeled molecules (after H/D and/or 16O/18O exchange) and a collection of instruments for computing fragmentation trees for an arbitrary molecule. Using PyFragMS, we investigated how the site of protonation influences the fragmentation pathway for small molecules. Also, PyFragMS offers capabilities for performing database search when MS/MS data of the isotopically labeled compounds are taken into account.
6. Tian Z, Liu F, Li D, Fernie AR, Chen W. Strategies for structure elucidation of small molecules based on LC–MS/MS data from complex biological samples. Comput Struct Biotechnol J 2022; 20:5085-5097. [PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/18/2022] [Revised: 09/03/2022] [Accepted: 09/03/2022] [Indexed: 11/06/2022]
Abstract
LC–MS/MS is a major analytical platform for metabolomics, which has become a recent hotspot in the research fields of life and environmental sciences. However, structure elucidation of small molecules based on LC–MS/MS data remains a major challenge in the chemical and biological interpretation of untargeted metabolomics datasets. In recent years, several strategies for structure elucidation using LC–MS/MS data from complex biological samples have been proposed. These strategies can be broadly categorized into two types: one based on structure annotation of mass spectra and the other on retention time prediction. These strategies have helped many scientists conduct research in metabolite-related fields and are indispensable for the development of future tools. Here, we summarize the characteristics of the current tools and strategies for structure elucidation of small molecules based on LC–MS/MS data, and further discuss the directions and perspectives for improving the power of these tools and strategies.
7. Shrivastava AD, Swainston N, Samanta S, Roberts I, Wright Muelas M, Kell DB. MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules from Their Mass Spectra. Biomolecules 2021; 11:1793. [PMID: 34944436 PMCID: PMC8699281 DOI: 10.3390/biom11121793] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/24/2021] [Revised: 11/14/2021] [Accepted: 11/27/2021] [Indexed: 12/15/2022]
Abstract
The 'inverse problem' of mass spectrometric molecular identification ('given a mass spectrum, calculate/predict the 2D structure of the molecule whence it came') is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem ('calculate a small molecule's likely fragmentation and hence at least some of its mass spectrum from its structure alone') is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the 'translation' a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the 'true' molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are 'similar' to the top hit. 
In addition to using the 'top hits' directly, we can produce a rank order of these by 'round-tripping' candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500 Da or lower, including those in the last CASMI challenge (for which the results are known), getting 49/93 (53%) precisely correct. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. It seems to act as a Las Vegas algorithm, in that it either gives the correct answer or simply states that it cannot find one. The ability to create and to 'learn' millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.
Affiliation(s)
- Aditya Divyakant Shrivastava
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Department of Computer Science and Engineering, Nirma University, Ahmedabad 382481, India
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Ivayla Roberts
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Marina Wright Muelas
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kongens Lyngby, Denmark
8. Schreckenbach SA, Anderson JSM, Koopman J, Grimme S, Simpson MJ, Jobst KJ. Predicting the Mass Spectra of Environmental Pollutants Using Computational Chemistry: A Case Study and Critical Evaluation. J Am Soc Mass Spectrom 2021; 32:1508-1518. [PMID: 33982573 DOI: 10.1021/jasms.1c00078] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]
Abstract
Organic pollutants can be identified by comparing their electron ionization (EI) mass spectra with those in libraries or obtained from authentic standards. Nevertheless, libraries are incomplete; standards may be unavailable or too costly, or their synthesis may be too time-consuming. This study evaluates the performance of quantum chemical electron ionization mass spectrometry (QCEIMS) vis-à-vis competitive fragmentation modeling (CFM) for suspect screening and unknown identification. EI mass spectra of 35 compounds, including halogenated organics, organophosphorus flame retardants (OPFRs), and disinfection byproducts, were computed. Computational results were compared with EI mass spectra compiled in the NIST Library as well as collision-induced dissociation (CID) mass spectra obtained from radical cations M•+ generated by charge-exchange atmospheric pressure chemical ionization (APCI). The results indicate that QCEIMS performs as well as or better than CFM. Average match factors between computed and experimental (NIST) EI mass spectra were 656 vs 503 for the halogenated organics, and on average, QCEIMS predicted 55% of the products generated by CID vs 17% predicted by CFM. QCEIMS predicted 37% of the OPFR CID products whereas CFM predicted 29%. QCEIMS performed comparably to a commercial combinatorial fragmentation method for suspect screening of a dust sample, identifying 19/20 targets. Examples of unknown pollutants, whose reference spectra were unavailable at the time of discovery, are also presented. The computational results suggest that QCEIMS can help guide the analyst in obtaining authentic standards and raise the possibility that, with advances in computing, an unknown may eventually be confirmed in hours as opposed to the days or months required to obtain authentic standards.
Affiliation(s)
- Sophia A Schreckenbach
  - Departments of Chemistry and Physical and Environmental Sciences, University of Toronto, Toronto, Ontario M1C 1A4, Canada
- James S M Anderson
  - Institute of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Jeroen Koopman
  - Mulliken Center for Theoretical Chemistry, University of Bonn, 53115 Bonn, Germany
- Stefan Grimme
  - Mulliken Center for Theoretical Chemistry, University of Bonn, 53115 Bonn, Germany
- Myrna J Simpson
  - Departments of Chemistry and Physical and Environmental Sciences, University of Toronto, Toronto, Ontario M1C 1A4, Canada
- Karl J Jobst
  - Department of Chemistry, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador A1B 3X7, Canada
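The match factors quoted in the Schreckenbach et al. abstract (656 vs 503) are similarity scores between a computed and a reference spectrum. As a rough illustration of how such a score behaves, the sketch below computes a squared-cosine similarity scaled to the range 0 to 999. This is a simplified stand-in: the function name `match_factor`, the unweighted intensities, and the toy spectra are assumptions, and the sketch does not reproduce the weighting used by the NIST library search or by the paper.

```python
from typing import Dict


def match_factor(computed: Dict[int, float], reference: Dict[int, float]) -> float:
    """Squared cosine similarity between two spectra, scaled to 0..999.

    Spectra are dicts mapping nominal m/z to intensity. No m/z or
    intensity weighting is applied (a simplifying assumption).
    """
    shared = set(computed) & set(reference)
    dot = sum(computed[mz] * reference[mz] for mz in shared)
    norm_c = sum(v * v for v in computed.values())
    norm_r = sum(v * v for v in reference.values())
    if norm_c == 0 or norm_r == 0:
        return 0.0
    return 999.0 * dot * dot / (norm_c * norm_r)


# Toy reference spectrum and three hypothetical predictions.
reference = {77: 30.0, 105: 100.0, 182: 60.0}
identical = match_factor(reference, reference)                     # every peak matches
partial = match_factor({105: 100.0, 182: 60.0}, reference)         # one peak missing
disjoint = match_factor({50: 100.0}, reference)                    # no shared peaks
print(round(identical), round(partial), round(disjoint))
```

A prediction that misses a minor fragment still scores high, while one that shares no peaks with the reference scores zero, so average scores across many compounds give a coarse ranking of prediction methods.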
9. Borges R, Colby SM, Das S, Edison AS, Fiehn O, Kind T, Lee J, Merrill AT, Merz KM, Metz TO, Nunez JR, Tantillo DJ, Wang LP, Wang S, Renslow RS. Quantum Chemistry Calculations for Metabolomics. Chem Rev 2021; 121:5633-5670. [PMID: 33979149 PMCID: PMC8161423 DOI: 10.1021/acs.chemrev.0c00901] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 08/24/2020] [Indexed: 02/07/2023]
Abstract
A primary goal of metabolomics studies is to fully characterize the small-molecule composition of complex biological and environmental samples. However, despite advances in analytical technologies over the past two decades, the majority of small molecules in complex samples are not readily identifiable due to the immense structural and chemical diversity present within the metabolome. Current gold-standard identification methods rely on reference libraries built using authentic chemical materials ("standards"), which are not available for most molecules. Computational quantum chemistry methods, which can be used to calculate chemical properties that are then measured by analytical platforms, offer an alternative route for building reference libraries, i.e., in silico libraries for "standards-free" identification. In this review, we cover the major roadblocks currently facing metabolomics and discuss applications where quantum chemistry calculations offer a solution. Several successful examples for nuclear magnetic resonance spectroscopy, ion mobility spectrometry, infrared spectroscopy, and mass spectrometry methods are reviewed. Finally, we consider current best practices, sources of error, and provide an outlook for quantum chemistry calculations in metabolomics studies. We expect this review will inspire researchers in the field of small-molecule identification to accelerate adoption of in silico methods for generation of reference libraries and to add quantum chemistry calculations as another tool at their disposal to characterize complex samples.
Affiliation(s)
- Ricardo M. Borges
  - Walter Mors Institute of Research on Natural Products, Federal University of Rio de Janeiro, Rio de Janeiro 21941-901, Brazil
- Sean M. Colby
  - Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Susanta Das
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Arthur S. Edison
  - Departments of Genetics and Biochemistry and Molecular Biology, Complex Carbohydrate Research Center and Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, United States
- Oliver Fiehn
  - West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
- Tobias Kind
  - West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
- Jesi Lee
  - West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Amy T. Merrill
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Kenneth M. Merz
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Thomas O. Metz
  - Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jamie R. Nunez
  - Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Dean J. Tantillo
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Lee-Ping Wang
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Shunyang Wang
  - West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Ryan S. Renslow
  - Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
10. Krettler CA, Thallinger GG. A map of mass spectrometry-based in silico fragmentation prediction and compound identification in metabolomics. Brief Bioinform 2021; 22:6184408. [PMID: 33758925 DOI: 10.1093/bib/bbab073] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/26/2020] [Revised: 01/29/2021] [Accepted: 02/12/2021] [Indexed: 12/27/2022]
Abstract
Metabolomics, the comprehensive study of the metabolome, and lipidomics, the large-scale study of pathways and networks of cellular lipids, are major driving forces in enabling personalized medicine. Complicated and error-prone data analysis still remains a bottleneck, however, especially for identifying novel metabolites. Comparing experimental mass spectra to curated databases containing reference spectra has been the gold standard for identification of compounds, but constructing such databases is a costly and time-demanding task. Many software applications try to circumvent this process by utilizing cutting-edge advances in computational methods, including quantum chemistry and machine learning, to simulate mass spectra by performing theoretical, so-called in silico, fragmentations of compounds. Other solutions concentrate directly on experimental spectra and try to identify structural properties by investigating recurring patterns and the relationships between them. The considerable progress made in the field allows recent approaches to provide valuable clues to expedite annotation of experimental mass spectra. This review sheds light on individual strengths and weaknesses of these tools, and attempts to evaluate them, especially in view of lipidomics, when considering complex mixtures found in biological samples as well as mass spectrometer inter-instrument variability.
Affiliation(s)
- Christoph A Krettler
  - Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria
  - Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria
- Gerhard G Thallinger
  - Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria
  - Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria
11. Schreckenbach SA, Simmons D, Ladak A, Mullin L, Muir DCG, Simpson MJ, Jobst KJ. Data-Independent Identification of Suspected Organic Pollutants Using Gas Chromatography-Atmospheric Pressure Chemical Ionization-Mass Spectrometry. Anal Chem 2021; 93:1498-1506. [PMID: 33355455 DOI: 10.1021/acs.analchem.0c03733] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
The identity of an unknown environmental pollutant is reflected by the mass and dissociation chemistry of its (quasi)molecular ion. Gas chromatography-atmospheric pressure chemical ionization-mass spectrometry (GC-APCI-MS) increases the yield of molecular ions (compared to conventional electron ionization) by collisional cooling. Scanning quadrupole data-independent acquisition (SQDIA) permits unbiased, unattended selection of (quasi)molecular ions and acquisition of structure-diagnostic collision-induced dissociation mass spectra, while minimizing interferences, by sequentially cycling a quadrupole isolation window through the m/z range. This study reports on the development of a suspect screening method based on industrial compounds with bioaccumulation potential. A comparison of false and correct identifications in a mixed standard containing 30 analytes suggests that SQDIA results in a markedly lower false-positive rate than standard DIA: 5 for SQDIA and 82 for DIA. Electronic waste dust was analyzed using GC and quadrupole time-of-flight MS with APCI and SQDIA acquisition. A total of 52 brominated, chlorinated, and organophosphorus compounds were identified by suspect screening; 15 unique elemental compositions were identified using nontargeted screening; 17 compounds were confirmed using standards and others identified to confidence levels 2, 3, or 4. SQDIA reduced false-positive identifications, compared to experiments without quadrupole isolation. False positives also varied by class: 20% for Br, 37% for Cl, 75% for P, and >99% for all other classes. The structure proposal of a previously reported halogenated compound was revisited. The results underline the utility of GC-SQDIA experiments that provide information on both the (quasi)molecular ions and their dissociation products for a more confident structural assignment.
Affiliation(s)
- Sophia A Schreckenbach
- Department of Chemistry, University of Toronto, Toronto, Ontario, M1C 1A4, Canada; Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- Denina Simmons
- Department of Biology, University of Ontario Institute of Technology, Oshawa, Ontario L1G 0C5, Canada
- Adam Ladak
- Waters Corporation, Milford, Massachusetts 01757, United States
- Lauren Mullin
- Waters Corporation, Milford, Massachusetts 01757, United States
- Derek C G Muir
- Environment and Climate Change Canada, Burlington, Ontario L7S 1A1, Canada
- Myrna J Simpson
- Department of Chemistry, University of Toronto, Toronto, Ontario, M1C 1A4, Canada; Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario M1C 1A4, Canada
- Karl J Jobst
- Department of Chemistry, Memorial University of Newfoundland, St. John's, Newfoundland A1B 3X7, Canada
|
12
|
Schüler JA, Rechner S, Müller-Hannemann M. MET: a Java package for fast molecule equivalence testing. J Cheminform 2020; 12:73. [PMID: 33334379 PMCID: PMC7745470 DOI: 10.1186/s13321-020-00480-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2020] [Accepted: 12/03/2020] [Indexed: 11/11/2022] Open
Abstract
An important task in cheminformatics is to test whether two molecules are equivalent with respect to their 2D structure. Mathematically, this amounts to solving the graph isomorphism problem for labelled graphs. In this paper, we present an approach which exploits chemical properties and the local neighbourhood of atoms to define highly distinctive node labels. These characteristic labels are the key to cleverly partitioning molecules into molecule equivalence classes and to an effective equivalence test. Based on extensive computational experiments, we show that our algorithm is significantly faster than existing implementations within SMSD, CDK and RDKit. We provide our Java implementation as an easy-to-use, open-source package (via GitHub) which is compatible with CDK. It fully supports the distinction of different isotopes and molecules with radicals.
Affiliation(s)
- Jördis-Ann Schüler
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120, Halle, Germany
- Steffen Rechner
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120, Halle, Germany
- Matthias Müller-Hannemann
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120, Halle, Germany
|
13
|
Li Y, Kuhn M, Gavin AC, Bork P. Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features. Bioinformatics 2020; 36:1213-1218. [PMID: 31605112 PMCID: PMC7703789 DOI: 10.1093/bioinformatics/btz736] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/20/2019] [Revised: 07/30/2019] [Accepted: 09/25/2019] [Indexed: 01/11/2023] Open
Abstract
Motivation: Untargeted mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites’ structures from MS/MS spectra is still a great challenge. Results: We present a new analysis method, called SubFragment-Matching (SF-Matching), that is based on the hypothesis that molecules with similar structural features will exhibit similar fragmentation patterns. We combine information on fragmentation patterns of molecules with shared substructures and then use random forest models to predict whether a given structure can yield a certain fragmentation pattern. These models can then be used to score candidate molecules for a given mass spectrum. For rapid identification, we pre-compute such scores for common biological molecular structure databases. Using benchmarking datasets, we find that our method has similar performance to CSI:FingerID and that very high accuracies can be achieved by combining our method with CSI:FingerID. Rarefaction analysis of the training dataset shows that the performance of our method will increase as more experimental data become available. Availability and implementation: SF-Matching is available from http://www.bork.embl.de/Docu/sf_matching. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuanyue Li
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Michael Kuhn
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Anne-Claude Gavin
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), 69117 Heidelberg, Germany
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), 69117 Heidelberg, Germany; Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany; Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
|
14
|
Moumbock AFA, Ntie-Kang F, Akone SH, Li J, Gao M, Telukunta KK, Günther S. An overview of tools, software, and methods for natural product fragment and mass spectral analysis. PHYSICAL SCIENCES REVIEWS 2019. [DOI: 10.1515/psr-2018-0126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/13/2023]
Abstract
One major challenge in natural product (NP) discovery is the determination of the chemical structure of unknown metabolites using automated software tools from either GC–mass spectrometry (MS) or liquid chromatography–MS/MS data only. This chapter reviews the existing spectral libraries and predictive computational tools used in MS-based untargeted metabolomics, which is currently a hot topic in NP structure elucidation. We begin by focusing on spectral databases and the general workflow of MS annotation. We then describe software and tools used in MS, particularly those used to predict fragmentation patterns, mass spectral classifiers, and tools for fragmentation tree analysis. We then round off the chapter by looking at more advanced approaches implemented in tools for competitive fragmentation modeling and quantum chemical approaches.
|