1
Yu T, Chen JM, Liu W, Zhao JQ, Li P, Liu FJ, Jiang Y, Li HJ. In-depth characterization of cycloartane triterpenoids and discovery of species-specific markers from three Cimicifuga species guided by a strategy that integrates in-source fragment elimination, diagnostic ion recognition, and feature-based molecular networking. J Chromatogr A 2024; 1728:465015. [PMID: 38821032 DOI: 10.1016/j.chroma.2024.465015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/13/2024] [Accepted: 05/21/2024] [Indexed: 06/02/2024]
Abstract
Characterization studies of the plant metabolome are crucial for revealing plant physiology, developing functional foods, and controlling quality. Mass spectrometry-based metabolite profiling allows unprecedented qualitative coverage of complex biological extract composition. However, the electrospray ionization used in metabolite profiling generates multiple artifactual signals for a single analyte, which makes it challenging to filter out redundant signals and organize the signals corresponding to abundant constituents. This study proposed a strategy integrating in-source fragment elimination, diagnostic ion recognition, and feature-based molecular networking (ISFE-DIR-FBMN) to simultaneously characterize cycloartane triterpenoids (CTs) from three medicinal Cimicifuga species. The results showed that 63.1% of the measured ions were redundant. A total of 184 CTs were annotated, with 27.1% being reported for the first time. The strategy presents a promising approach to assessing the composition of natural extracts, thus facilitating new ingredient registrations or natural-extract-based drug discovery campaigns. In addition, chemometric analysis of the three Cimicifuga species identified 32 species-specific markers, highlighting significant differences among them. This information can enhance the sustainable utilization and further development of Cimicifuga resources. The code for ISFE-DIR-FBMN is freely available on GitHub (https://github.com/LHJ-Group/ISFE-DIR-FBMN.git).
Affiliation(s)
- Ting Yu
  - State Key Laboratory of Natural Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Jia-Min Chen
  - State Key Laboratory of Natural Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Wei Liu
  - State Key Laboratory of Natural Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Jin-Quan Zhao
  - State Key Laboratory of Natural Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Ping Li
  - State Key Laboratory of Natural Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Feng-Jie Liu
  - Key Laboratory of Pharmaceutical Quality Control of Hebei Province, College of Pharmaceutical Science, Hebei University, Baoding 071002, China
- Yan Jiang
  - College of Chemical Engineering, Nanjing Forestry University, Nanjing 210037, China
- Hui-Jun Li
  - State Key Laboratory of Natural Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 211198, China
2
Samanipour S, Barron LP, van Herwerden D, Praetorius A, Thomas KV, O’Brien JW. Exploring the Chemical Space of the Exposome: How Far Have We Gone? JACS Au 2024; 4:2412-2425. [PMID: 39055136 PMCID: PMC11267556 DOI: 10.1021/jacsau.4c00220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 05/29/2024] [Accepted: 05/31/2024] [Indexed: 07/27/2024]
Abstract
Around two-thirds of chronic human disease cannot be explained by genetics alone. The Lancet Commission on Pollution and Health estimates that 16% of global premature deaths are linked to pollution. Additionally, it is now thought that humankind has surpassed the safe planetary operating space for introducing human-made chemicals into the Earth System. Direct and indirect exposure to a myriad of chemicals, known and unknown, poses a significant threat to biodiversity and human health, from vaccine efficacy to the rise of antimicrobial resistance as well as autoimmune diseases and mental health disorders. The exposome chemical space remains largely uncharted due to the sheer number of possible chemical structures, estimated at over 10^60 unique forms. Conventional methods have cataloged only a fraction of the exposome, overlooking transformation products and often yielding uncertain results. In this Perspective, we review the latest efforts in mapping the exposome chemical space and its subspaces. We also provide our view on how the integration of data-driven approaches might be able to bridge the identified gaps.
Affiliation(s)
- Saer Samanipour
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - UvA Data Science Center, University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
- Leon Patrick Barron
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - MRC Centre for Environment and Health, Environmental Research Group, School of Public Health, Faculty of Medicine, Imperial College London, London W12 0BZ, United Kingdom
- Denice van Herwerden
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- Antonia Praetorius
  - Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- Kevin V. Thomas
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
- Jake William O’Brien
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
3
Bui-Thi D, Liu Y, Lippens JL, Laukens K, De Vijlder T. TransExION: a transformer based explainable similarity metric for comparing IONS in tandem mass spectrometry. J Cheminform 2024; 16:61. [PMID: 38807166 PMCID: PMC11134763 DOI: 10.1186/s13321-024-00858-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 05/12/2024] [Indexed: 05/30/2024] Open
Abstract
Small molecule identification is a crucial task in analytical chemistry and life sciences. One of the most commonly used technologies to elucidate small molecule structures is mass spectrometry. Spectral library search of product ion spectra (MS/MS) is a popular strategy to identify compounds or find structural analogues. This approach relies on the assumption that spectral similarity and structural similarity are correlated. However, popular spectral similarity measures, usually calculated based on identical fragment matches between the MS/MS spectra, do not always accurately reflect the structural similarity. In this study, we propose TransExION, a Transformer-based Explainable similarity metric for IONS. TransExION detects related fragments between MS/MS spectra through their mass difference and uses these to estimate spectral similarity. These related fragments can be nearly identical, but can also share a substructure. TransExION also provides a post-hoc explanation of its estimation, which can be used to support scientists in evaluating the spectral library search results and thus in structure elucidation of unknown molecules. Our model has a Transformer-based architecture and is trained on data derived from GNPS MS/MS libraries. The experimental results show that it improves on existing spectral similarity measures in searching and interpreting structural analogues as well as in molecular networking. SCIENTIFIC CONTRIBUTION: We propose a transformer-based spectral similarity metric that improves the comparison of small molecule tandem mass spectra. We provide a post-hoc explanation that can serve as a good starting point for unknown spectra annotation based on database spectra.
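The abstract contrasts TransExION with conventional measures "calculated based on identical fragment matches". As a rough illustration of that baseline only (this is not TransExION and not the code of any particular library), the sketch below computes a greedy, tolerance-matched cosine score between two peak lists. The function name `cosine_score`, the `(m/z, intensity)` tuple layout, and the 0.01 Da tolerance are assumptions made for this example.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.01):
    """Greedy cosine similarity between two peak lists [(mz, intensity), ...].

    Hypothetical helper: peaks match only if their m/z values agree
    within `tol`, i.e. an identical-fragment match.
    """
    # Collect every candidate peak pair within tolerance, scored by
    # the product of the two intensities.
    pairs = [
        (ia * ib, i, j)
        for i, (mza, ia) in enumerate(spec_a)
        for j, (mzb, ib) in enumerate(spec_b)
        if abs(mza - mzb) <= tol
    ]
    # Assign pairs greedily, highest product first, using each peak once.
    pairs.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in pairs:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        dot += product
    norm_a = math.sqrt(sum(inten * inten for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten * inten for _, inten in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


reference = [(100.0, 1.0), (150.0, 0.5)]
# Same fragmentation pattern, every fragment shifted by 14 Da
# (an invented analogue for illustration).
shifted_analogue = [(114.0, 1.0), (164.0, 0.5)]

same = cosine_score(reference, reference)
analogue = cosine_score(reference, shifted_analogue)
```

In this toy case the shifted analogue shares no identical fragments with the reference, so its score collapses to zero even though the pattern is the same. That is the kind of gap between spectral and structural similarity the abstract describes.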
Affiliation(s)
- Danh Bui-Thi
  - Computer Science Department, University of Antwerp, Middelheimlaan 1, 2020, Antwerp, Belgium
- Youzhong Liu
  - Therapeutic Development and Supply, Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Jennifer L Lippens
  - Therapeutic Development and Supply, Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Kris Laukens
  - Computer Science Department, University of Antwerp, Middelheimlaan 1, 2020, Antwerp, Belgium
- Thomas De Vijlder
  - Therapeutic Development and Supply, Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340, Beerse, Belgium
4
Kalinski JCJ, Noundou XS, Petras D, Matcher GF, Polyzois A, Aron AT, Gentry EC, Bornman TG, Adams JB, Dorrington RA. Urban and agricultural influences on the coastal dissolved organic matter pool in the Algoa Bay estuaries. Chemosphere 2024; 355:141782. [PMID: 38548083 DOI: 10.1016/j.chemosphere.2024.141782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 02/28/2024] [Accepted: 03/22/2024] [Indexed: 04/08/2024]
Abstract
While anthropogenic pollution is a major threat to aquatic ecosystem health, our knowledge of the presence of xenobiotics in coastal Dissolved Organic Matter (DOM) is still relatively poor. This is especially true for water bodies in the Global South with limited information gained mostly from targeted studies that rely on comparison with authentic standards. In recent years, non-targeted tandem mass spectrometry has emerged as a powerful tool to collectively detect and identify pollutants and biogenic DOM components in the environment, but this approach has yet to be widely utilized for monitoring ecologically important aquatic systems. In this study, we compared the DOM composition of Algoa Bay, Eastern Cape, South Africa, and its two estuaries. The Swartkops Estuary is highly urbanized and severely impacted by anthropogenic pollution, while the Sundays Estuary is impacted by commercial agriculture in its catchment. We employed solid-phase extraction followed by liquid chromatography tandem mass spectrometry to annotate more than 200 pharmaceuticals, pesticides, urban xenobiotics, and natural products based on spectral matching. The identification with authentic standards confirmed the presence of methamphetamine, carbamazepine, sulfamethoxazole, N-acetylsulfamethoxazole, imazapyr, caffeine and hexa(methoxymethyl)melamine, and allowed semi-quantitative estimations for annotated xenobiotics. The Swartkops Estuary DOM composition was strongly impacted by features annotated as urban pollutants including pharmaceuticals such as melamines and antiretrovirals. By contrast, the Sundays Estuary exhibited significant enrichment of molecules annotated as agrochemicals widely used in the citrus farming industry, with predicted concentrations for some of them exceeding predicted no-effect concentrations. This study provides new insight into anthropogenic impact on the Algoa Bay system and demonstrates the utility of non-targeted tandem mass spectrometry as a sensitive tool for assessing the health of ecologically important coastal ecosystems. It will serve as a valuable foundation for strategizing long-term monitoring efforts.
Affiliation(s)
- Xavier Siwe Noundou
  - Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; Department of Pharmaceutical Sciences, Sefako Makgatho Health Sciences University, Pretoria, South Africa
- Daniel Petras
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, USA; Department of Biochemistry, University of California Riverside, Riverside, USA; CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Medicine, University of Tuebingen, Tuebingen, Germany
- Gwynneth F Matcher
  - Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; South African Institute for Aquatic Biodiversity, 6139, Makhanda, South Africa
- Alexandros Polyzois
  - Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; Boyce Thompson Institute and Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, 14853, United States
- Allegra T Aron
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, USA; Department of Chemistry and Biochemistry, University of Denver, Denver, CO, 80210, United States
- Emily C Gentry
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, USA; Department of Chemistry, Virginia Tech, Blacksburg, VA, 24061, United States
- Thomas G Bornman
  - Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; South African Environmental Observation Network SAEON, Elwandle Coastal Node, Gqeberha, South Africa; Institute for Coastal and Marine Research, Nelson Mandela University, Gqeberha, South Africa
- Janine B Adams
  - DSI/NRF Research Chair, Shallow Water Ecosystems, Department of Botany and Institute for Coastal and Marine Research, Nelson Mandela University, Gqeberha, South Africa; Department of Botany, Institute for Coastal and Marine Research CMR, Nelson Mandela University, Gqeberha, South Africa
- Rosemary A Dorrington
  - Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; South African Institute for Aquatic Biodiversity, 6139, Makhanda, South Africa
5
van Tetering L, Spies S, Wildeman QDK, Houthuijs KJ, van Outersterp RE, Martens J, Wevers RA, Wishart DS, Berden G, Oomens J. A spectroscopic test suggests that fragment ion structure annotations in MS/MS libraries are frequently incorrect. Commun Chem 2024; 7:30. [PMID: 38355930 PMCID: PMC10867025 DOI: 10.1038/s42004-024-01112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Accepted: 01/22/2024] [Indexed: 02/16/2024] Open
Abstract
Modern untargeted mass spectrometry (MS) analyses quickly detect and resolve thousands of molecular compounds. Although features are readily annotated with a molecular formula in high-resolution small-molecule MS applications, the large majority of them remains unidentified in terms of their full molecular structure. Collision-induced dissociation tandem mass spectrometry (CID-MS2) provides a diagnostic molecular fingerprint to resolve the molecular structure through a library search. However, for de novo identifications, one must often rely on in silico generated MS2 spectra as reference. The ability of different in silico algorithms to correctly predict MS2 spectra and thus to retrieve correct molecular structures is a topic of lively debate, for instance in the CASMI contest. Underlying the predicted MS2 spectra are the in silico generated product ion structures, which are normally not used in de novo identification, but which can serve to critically assess the fragmentation algorithms. Here we evaluate in silico generated MSn product ion structures by comparison with structures established experimentally by infrared ion spectroscopy (IRIS). For a set of three dozen product ion structures from five precursor molecules, we find that virtually all fragment ion structure annotations in three major in silico MS2 libraries (HMDB, METLIN, mzCloud) are incorrect and caution the reader against their use for structure annotation of MS/MS ions.
Affiliation(s)
- Lara van Tetering
  - Radboud University, Institute for Molecules and Materials, FELIX Laboratory, Toernooiveld 7, 6525ED, Nijmegen, The Netherlands
- Sylvia Spies
  - Radboud University, Institute for Molecules and Materials, FELIX Laboratory, Toernooiveld 7, 6525ED, Nijmegen, The Netherlands
- Quirine D K Wildeman
  - Radboud University, Institute for Molecules and Materials, FELIX Laboratory, Toernooiveld 7, 6525ED, Nijmegen, The Netherlands
- Kas J Houthuijs
  - Radboud University, Institute for Molecules and Materials, FELIX Laboratory, Toernooiveld 7, 6525ED, Nijmegen, The Netherlands
- Rianne E van Outersterp
  - Radboud University, Institute for Molecules and Materials, FELIX Laboratory, Toernooiveld 7, 6525ED, Nijmegen, The Netherlands
- Jonathan Martens
  - Radboud University, Institute for Molecules and Materials, FELIX Laboratory, Toernooiveld 7, 6525ED, Nijmegen, The Netherlands
- Ron A Wevers
  - Department of Laboratory Medicine, Translational Metabolic Laboratory, Radboud University Medical Center, Geert Grooteplein Zuid 10, 6525GA, Nijmegen, The Netherlands
- David S Wishart
  - Departments of Computing Science and Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Giel Berden
  - Radboud University, Institute for Molecules and Materials, FELIX Laboratory, Toernooiveld 7, 6525ED, Nijmegen, The Netherlands
- Jos Oomens
  - Radboud University, Institute for Molecules and Materials, FELIX Laboratory, Toernooiveld 7, 6525ED, Nijmegen, The Netherlands
  - van 't Hoff Institute for Molecular Sciences, University of Amsterdam, Science Park 904, 1098XH, Amsterdam, The Netherlands
6
Adalia R, Patel S, Paiva A, Kaufman T, Zamora I, Cai X, Sanjuan G, Shou WZ. Development of a Predictive Multiple Reaction Monitoring (MRM) Model for High-Throughput ADME Analyses Using Learning-to-Rank (LTR) Techniques. J Am Soc Mass Spectrom 2024; 35:131-139. [PMID: 38014625 DOI: 10.1021/jasms.3c00363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Multiple Reaction Monitoring (MRM) is an important MS/MS technique commonly used in drug discovery and development, allowing for the selective and sensitive quantification of compounds in complex matrices. However, compound optimization can be resource intensive and requires experimental determination of product ions for each compound. In this study, we developed a Learning-to-Rank (LTR) model to predict the product ions directly from compound structures, eliminating the requirement for MRM optimization experiments. Experimentally determined MRM conditions for 5757 compounds were used to develop the model. Using the MassChemSite software, theoretical fragments and their mass-to-charge ratios were generated, which were then matched to the experimental product ions to create a data set. Each possible fragment was ranked based on its intensity in the experimental data. Different LTR models were built on a training split. Hyperparameter selection was performed using 5-fold cross validation. The models were evaluated using the Normalized Discounted Cumulative Gain at top k (NDCG@k) and the Coverage at top k (Coverage@k) metrics. Finally, the model was applied to predict MRM conditions for a prospective set of 235 compounds in high-throughput Caco-2 permeability and metabolic stability assays, and quantification results were compared to those obtained with experimentally acquired MRM conditions. The LTR model achieved an NDCG@5 of 0.732 and a Coverage@5 of 0.841 on the validation split, and its predictions led to biologically equivalent results in 97% of cases in the Caco-2 permeability and metabolic stability assays.
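For readers unfamiliar with the ranking metric named in this abstract, the following is a minimal sketch of NDCG@k in its common linear-gain form. It is not taken from the paper: the function names are hypothetical, and the abstract does not say which gain or discount variant the authors used, nor how Coverage@k is defined, so Coverage@k is left out.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    # The item at 0-based position `rank` is discounted by log2(rank + 2),
    # so the top-ranked item is divided by log2(2) = 1.
    return sum(
        rel / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances_in_predicted_order, k):
    """NDCG@k: DCG of the predicted ranking divided by the ideal DCG."""
    ideal_order = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal_order, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg


# Invented relevance grades for three candidate fragments, listed in the
# order a model ranked them (higher = more intense in the experiment).
perfect_ranking = ndcg_at_k([3, 2, 1], k=3)
reversed_ranking = ndcg_at_k([1, 2, 3], k=3)
```

A ranking that places the most intense experimental fragments first scores 1.0, and any other ordering scores lower. The reported NDCG@5 of 0.732 should be read on that 0-to-1 scale.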
Affiliation(s)
- Ramon Adalia
  - Lead Molecular Design S.L., 08172 Sant Cugat de Valles, Spain
  - Universitat Autònoma de Barcelona, 08193 Barcelona, Spain
- Shivani Patel
  - Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
- Anthony Paiva
  - Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
- Tierni Kaufman
  - Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
- Ismael Zamora
  - Lead Molecular Design S.L., 08172 Sant Cugat de Valles, Spain
- Xianmei Cai
  - Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
- Gemma Sanjuan
  - Universitat Autònoma de Barcelona, 08193 Barcelona, Spain
- Wilson Z Shou
  - Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
7
Li S, Bohman B, Flematti GR, Jayatilaka D. Determining the parent and associated fragment formulae in mass spectrometry via the parent subformula graph. J Cheminform 2023; 15:104. [PMID: 37936244 PMCID: PMC10631010 DOI: 10.1186/s13321-023-00776-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 10/25/2023] [Indexed: 11/09/2023] Open
Abstract
BACKGROUND: Identifying the molecular formula and fragmentation reactions of an unknown compound from its mass spectrum is crucial in areas such as natural product chemistry and metabolomics. We propose a method for identifying the correct candidate formula of an unidentified natural product from its mass spectrum. The method involves scoring the plausibility of parent candidate formulae based on a parent subformula graph (PSG), and two possible metrics relating to the number of edges in the PSG. This method is applicable to both electron-impact mass spectrometry (EI-MS) and tandem mass spectrometry (MS/MS) data. Additionally, this work introduces the two-dimensional fragmentation plot (2DFP) for visualizing PSGs. RESULTS: Our results suggest that incorporating information regarding the edges of the PSG results in enhanced performance in correctly identifying parent formulae, in comparison to the more well-accepted "MS/MS score", on the 2016 Computational Assessment of Small Molecule Identification (CASMI 2016) data set (76.3% vs 58.9% correct formula identification) and the Research Centre for Toxic Compounds in the Environment (RECETOX) data set (66.2% vs 59.4% correct formula identification). In the extension of our method to identify the correct candidate formula from complex EI-MS data of semiochemicals, our method again performed better (correct formula appearing in the top 4 candidates in 20/23 vs 7/23 cases) than the MS/MS score, and enables the rapid identification of both the correct parent ion mass and the correct parent formula with minimal expert intervention. CONCLUSION: Our method reliably identifies the correct parent formula even when the mass information is ambiguous. Furthermore, should parent formula identification be successful, the majority of associated fragment formulae can also be correctly identified. Our method can also identify the parent ion and its associated fragments in EI-MS spectra where the identity of the parent ion is unclear due to low quantities and overlapping compounds. Finally, our method does not inherently require empirical fitting of parameters or statistical learning, meaning it is easy to implement and extend. SCIENTIFIC CONTRIBUTION: Developed, implemented and tested new metrics for assessing plausibility of candidate molecular formulae obtained from HR-MS data.
Affiliation(s)
- Sean Li
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Australia
- Björn Bohman
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Australia
  - Department of Plant Protection Biology, Swedish University of Agricultural Sciences, Box 190, 23422, Lomma, Sweden
- Gavin R Flematti
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Australia
- Dylan Jayatilaka
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, Australia
8
Abram KJ, McCloskey D. In Search of Disentanglement in Tandem Mass Spectrometry Datasets. Biomolecules 2023; 13:1343. [PMID: 37759743 PMCID: PMC10526774 DOI: 10.3390/biom13091343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/16/2023] [Accepted: 08/25/2023] [Indexed: 09/29/2023] Open
Abstract
Generative modeling and representation learning of tandem mass spectrometry data aim to learn an interpretable and instrument-agnostic digital representation of metabolites directly from MS/MS spectra. Interpretable and instrument-agnostic digital representations would facilitate comparisons of MS/MS spectra between instrument vendors and enable better and more accurate queries of large MS/MS spectra databases for metabolite identification. In this study, we apply generative modeling and representation learning using variational autoencoders to understand the extent to which tandem mass spectra can be disentangled into their factors of generation (e.g., collision energy, ionization mode, instrument type, etc.) with minimal prior knowledge of the factors. We find that variational autoencoders can disentangle tandem mass spectra data with the proper choice of hyperparameters into meaningful latent representations aligned with known factors of variation. We develop a two-step approach to facilitate the selection of models that are disentangled, which could be applied to other complex and high-dimensional data sets.
Affiliation(s)
- Krzysztof Jan Abram
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark
  - Johnson & Johnson MedTech, Bregnerodvej 133, 3460 Birkerod, Denmark
- Douglas McCloskey
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark
  - BioMed X Institute, Im Neuenheimer Feld 515, 69120 Heidelberg, Germany
9
Houthuijs KJ, Berden G, Engelke UFH, Gautam V, Wishart DS, Wevers RA, Martens J, Oomens J. An In Silico Infrared Spectral Library of Molecular Ions for Metabolite Identification. Anal Chem 2023. [PMID: 37262385 DOI: 10.1021/acs.analchem.3c01078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Infrared ion spectroscopy (IRIS) continues to see increasing use as an analytical tool for small-molecule identification in conjunction with mass spectrometry (MS). The IR spectrum of an m/z-selected population of ions constitutes a unique fingerprint that is specific to the molecular structure. However, direct translation of an IR spectrum to a molecular structure remains challenging, as reference libraries of IR spectra of molecular ions largely do not exist. Quantum-chemically computed spectra can reliably be used as reference, but the challenge of selecting the candidate structures remains. Here, we introduce an in silico library of vibrational spectra of common MS adducts of over 4500 compounds found in the Human Metabolome Database. In total, the library currently contains more than 75,000 spectra computed at the DFT level that can be queried with an experimental IR spectrum. Moreover, we introduce a database of 189 experimental IRIS spectra, which is employed to validate the automated spectral matching routines. This validation demonstrates that 75% of the metabolites in the experimental data set are correctly identified, based solely on their exact m/z and IRIS spectrum. Additionally, we demonstrate an approach for specifically identifying substructures by performing a search without m/z constraints to find structural analogues. Such an unsupervised search paves the way toward the de novo identification of unknowns that are absent in spectral libraries. We apply the in silico spectral library to identify an unknown in a plasma sample as 3-hydroxyhexanoic acid, highlighting the potential of the method.
Affiliation(s)
- Kas J Houthuijs
  - Institute for Molecules and Materials, FELIX Laboratory, Radboud University, Nijmegen 6525 ED, The Netherlands
- Giel Berden
  - Institute for Molecules and Materials, FELIX Laboratory, Radboud University, Nijmegen 6525 ED, The Netherlands
- Udo F H Engelke
  - Department of Genetics, Translational Metabolic Laboratory, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
- Vasuk Gautam
  - Department of Biological Sciences, University of Alberta, Edmonton AB T6G 2E9, Canada
- David S Wishart
  - Department of Biological Sciences, University of Alberta, Edmonton AB T6G 2E9, Canada
  - Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
  - Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada
  - Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada
- Ron A Wevers
  - Department of Genetics, Translational Metabolic Laboratory, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
- Jonathan Martens
  - Institute for Molecules and Materials, FELIX Laboratory, Radboud University, Nijmegen 6525 ED, The Netherlands
- Jos Oomens
  - Institute for Molecules and Materials, FELIX Laboratory, Radboud University, Nijmegen 6525 ED, The Netherlands
  - van 't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam 1098 XH, The Netherlands
10
Gaudêncio SP, Bayram E, Lukić Bilela L, Cueto M, Díaz-Marrero AR, Haznedaroglu BZ, Jimenez C, Mandalakis M, Pereira F, Reyes F, Tasdemir D. Advanced Methods for Natural Products Discovery: Bioactivity Screening, Dereplication, Metabolomics Profiling, Genomic Sequencing, Databases and Informatic Tools, and Structure Elucidation. Mar Drugs 2023; 21:md21050308. [PMID: 37233502 DOI: 10.3390/md21050308] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 03/23/2023] [Revised: 05/11/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023] Open
Abstract
Natural Products (NP) are essential for the discovery of novel drugs and products for numerous biotechnological applications. The NP discovery process is expensive and time-consuming, having as major hurdles dereplication (early identification of known compounds) and structure elucidation, particularly the determination of the absolute configuration of metabolites with stereogenic centers. This review comprehensively focuses on recent technological and instrumental advances, highlighting the development of methods that alleviate these obstacles, paving the way for accelerating NP discovery towards biotechnological applications. Herein, we emphasize the most innovative high-throughput tools and methods for advancing bioactivity screening, NP chemical analysis, dereplication, metabolite profiling, metabolomics, genome sequencing and/or genomics approaches, databases, bioinformatics, chemoinformatics, and three-dimensional NP structure elucidation.
Affiliation(s)
- Susana P Gaudêncio
- Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal
- UCIBIO-Applied Molecular Biosciences Unit, Chemistry Department, NOVA School of Science and Technology, NOVA University of Lisbon, 2819-516 Caparica, Portugal
| | - Engin Bayram
- Institute of Environmental Sciences, Room HKC-202, Hisar Campus, Bogazici University, Bebek, Istanbul 34342, Turkey
| | - Lada Lukić Bilela
- Department of Biology, Faculty of Science, University of Sarajevo, 71000 Sarajevo, Bosnia and Herzegovina
| | - Mercedes Cueto
- Instituto de Productos Naturales y Agrobiología-CSIC, 38206 La Laguna, Spain
| | - Ana R Díaz-Marrero
- Instituto de Productos Naturales y Agrobiología-CSIC, 38206 La Laguna, Spain
- Instituto Universitario de Bio-Orgánica (IUBO), Universidad de La Laguna, 38206 La Laguna, Spain
| | - Berat Z Haznedaroglu
- Institute of Environmental Sciences, Room HKC-202, Hisar Campus, Bogazici University, Bebek, Istanbul 34342, Turkey
| | - Carlos Jimenez
- CICA- Centro Interdisciplinar de Química e Bioloxía, Departamento de Química, Facultade de Ciencias, Universidade da Coruña, 15071 A Coruña, Spain
| | - Manolis Mandalakis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, HCMR Thalassocosmos, 71500 Gournes, Crete, Greece
| | - Florbela Pereira
- LAQV, REQUIMTE, Chemistry Department, NOVA School of Science and Technology, NOVA University of Lisbon, 2819-516 Caparica, Portugal
| | - Fernando Reyes
- Fundación MEDINA, Avda. del Conocimiento 34, 18016 Armilla, Spain
| | - Deniz Tasdemir
- GEOMAR Centre for Marine Biotechnology (GEOMAR-Biotech), Research Unit Marine Natural Products Chemistry, GEOMAR Helmholtz Centre for Ocean Research Kiel, Am Kiel-Kanal 44, 24106 Kiel, Germany
- Faculty of Mathematics and Natural Science, Kiel University, Christian-Albrechts-Platz 4, 24118 Kiel, Germany
|
11
|
Mahood EH, Bennett AA, Komatsu K, Kruse LH, Lau V, Rahmati Ishka M, Jiang Y, Bravo A, Louie K, Bowen BP, Harrison MJ, Provart NJ, Vatamaniuk OK, Moghe GD. Information theory and machine learning illuminate large-scale metabolomic responses of Brachypodium distachyon to environmental change. Plant J 2023; 114:463-481. [PMID: 36880270 DOI: 10.1111/tpj.16160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 02/06/2023] [Accepted: 02/19/2023] [Indexed: 05/10/2023]
Abstract
Plant responses to environmental change are mediated via changes in cellular metabolomes. However, <5% of signals obtained from liquid chromatography tandem mass spectrometry (LC-MS/MS) can be identified, limiting our understanding of how metabolomes change under biotic/abiotic stress. To address this challenge, we performed untargeted LC-MS/MS of leaves, roots, and other organs of Brachypodium distachyon (Poaceae) under 17 organ-condition combinations, including copper deficiency, heat stress, low phosphate, and arbuscular mycorrhizal symbiosis. We found that both leaf and root metabolomes were significantly affected by the growth medium. Leaf metabolomes were more diverse than root metabolomes, but the latter were more specialized and more responsive to environmental change. We found that 1 week of copper deficiency shielded the root, but not the leaf metabolome, from perturbation due to heat stress. Machine learning (ML)-based analysis annotated approximately 81% of the fragmented peaks versus approximately 6% using spectral matches alone. We performed one of the most extensive validations of ML-based peak annotations in plants using thousands of authentic standards, and analyzed approximately 37% of the annotated peaks based on these assessments. Analyzing responsiveness of each predicted metabolite class to environmental change revealed significant perturbations of glycerophospholipids, sphingolipids, and flavonoids. Co-accumulation analysis further identified condition-specific biomarkers. To make these results accessible, we developed a visualization platform on the Bio-Analytic Resource for Plant Biology website (https://bar.utoronto.ca/efp_brachypodium_metabolites/cgi-bin/efpWeb.cgi), where perturbed metabolite classes can be readily visualized. Overall, our study illustrates how emerging chemoinformatic methods can be applied to reveal novel insights into the dynamic plant metabolome and stress adaptation.
Affiliation(s)
- Elizabeth H Mahood
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
| | - Alexandra A Bennett
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
| | - Karyn Komatsu
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Lars H Kruse
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
| | - Vincent Lau
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Maryam Rahmati Ishka
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Boyce Thompson Institute, Ithaca, NY, USA
| | - Yulin Jiang
- Soil and Crop Sciences Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
| | | | - Katherine Louie
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Lawrence Berkeley National Laboratory, Department of Energy Joint Genome Institute, Berkeley, CA, USA
| | - Benjamin P Bowen
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Lawrence Berkeley National Laboratory, Department of Energy Joint Genome Institute, Berkeley, CA, USA
| | | | - Nicholas J Provart
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Olena K Vatamaniuk
- Soil and Crop Sciences Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
| | - Gaurav D Moghe
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
|
12
|
Borelli TC, Arini GS, Feitosa LGP, Dorrestein PC, Lopes NP, da Silva RR. Improving annotation propagation on molecular networks through random walks: introducing ChemWalker. Bioinformatics 2023; 39:7067745. [PMID: 36864626 PMCID: PMC9991053 DOI: 10.1093/bioinformatics/btad078] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/06/2022] [Revised: 01/13/2023] [Indexed: 03/04/2023] Open
Abstract
Motivation
Annotation of the mass signals is still the biggest bottleneck for the untargeted mass spectrometry analysis of complex mixtures. Molecular networks are being increasingly adopted by the mass spectrometry community as a tool to annotate large-scale experiments. We have previously shown that the process of propagating annotations from spectral library matches on molecular networks can be automated using Network Annotation Propagation (NAP). One of the limitations of NAP is that the information for the spectral matches is only propagated locally, to the first neighbor of a spectral match. Here, we show that annotation propagation can be expanded to nodes not directly connected to spectral matches using random walks on graphs, introducing the ChemWalker Python library.
Results
Similarly to NAP, ChemWalker relies on combinatorial in silico fragmentation results, performed by MetFrag, searching biologically relevant databases. Starting from the combination of a spectral network and the structural similarity among candidate structures, we used the MetFusion scoring function to create a weight function, producing a weighted graph. This graph was subsequently used by the random walk to calculate the probability of 'walking' through a set of candidates, starting from seed nodes (represented by spectral library matches). This approach allowed information to propagate to nodes not directly connected to the spectral library match. Compared with NAP, ChemWalker offers a series of improvements in running time, scalability and maintainability, and is available as a standalone Python package.
Availability and implementation
ChemWalker is freely available at https://github.com/computational-chemical-biology/ChemWalker.
Contact
ridasilva@usp.br.
Supplementary information
Supplementary data are available at Bioinformatics online.
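The propagation idea in the ChemWalker abstract above (a random walk on a weighted graph that starts from seed nodes) can be illustrated with a generic random walk with restart. This is only a minimal sketch under stated assumptions: the function name, the `restart` parameter, and the toy graph are hypothetical, and the code is not ChemWalker's API or its MetFusion-based weighting.

```python
def random_walk_with_restart(weights, seeds, restart=0.3, tol=1e-9, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at seed nodes.

    weights: dict mapping node -> {neighbor: positive edge weight} (hypothetical input format)
    seeds:   set of nodes standing in for spectral library matches
    """
    nodes = sorted(weights)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            total = sum(weights[u].values())
            if total == 0:
                # Isolated node: send its probability mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            for v, w in weights[u].items():
                # Move along each edge in proportion to its weight.
                nxt[v] += (1 - restart) * p[u] * w / total
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with A as the only seed (illustrative weights).
toy_graph = {
    "A": {"B": 0.9},
    "B": {"A": 0.9, "C": 0.5},
    "C": {"B": 0.5, "D": 0.1},
    "D": {"C": 0.1},
}
scores = random_walk_with_restart(toy_graph, seeds={"A"})
```

In this toy graph, nodes C and D receive a non-zero score even though they are not direct neighbors of the seed, which is the behavior the abstract contrasts with first-neighbor propagation.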
Affiliation(s)
- Tiago Cabral Borelli
- NPPNS, Department of Molecular Biosciences, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP 14040-903, Brazil
| | - Gabriel Santos Arini
- NPPNS, Department of Molecular Biosciences, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP 14040-903, Brazil
| | - Luís G P Feitosa
- NPPNS, Department of Molecular Biosciences, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP 14040-903, Brazil
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA 92093, USA
| | - Norberto Peporine Lopes
- NPPNS, Department of Molecular Biosciences, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP 14040-903, Brazil
|
13
|
Boelrijk J, van Herwerden D, Ensing B, Forré P, Samanipour S. Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data. J Cheminform 2023; 15:28. [PMID: 36829215 PMCID: PMC9960388 DOI: 10.1186/s13321-023-00699-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/29/2022] [Accepted: 02/13/2023] [Indexed: 02/26/2023] Open
Abstract
Non-target analysis combined with liquid chromatography high resolution mass spectrometry is considered one of the most comprehensive strategies for the detection and identification of known and unknown chemicals in complex samples. However, many compounds remain unidentified due to data complexity and the limited number of structures in chemical databases. In this work, we have developed and validated a novel machine learning algorithm to predict the retention index (r_i) values for structurally (un)known chemicals based on their measured fragmentation pattern. The developed model, for the first time, enabled the prediction of r_i values without the need for the exact structure of the chemicals, with an R² of 0.91 and 0.77 and root mean squared error (RMSE) of 47 and 67 r_i units for the NORMAN ([Formula: see text]) and amide ([Formula: see text]) test sets, respectively. This fragment-based model showed comparable accuracy in r_i prediction compared to conventional descriptor-based models that rely on known chemical structure, which obtained an R² of 0.85 with an RMSE of 67.
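For readers less familiar with the error metrics quoted in the abstract above, RMSE and a coefficient of determination can be computed as in the following minimal sketch. The function names and the example retention-index values are illustrative assumptions, not data or code from the cited study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the inputs."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up retention-index values for illustration only.
measured = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
example_rmse = rmse(measured, predicted)
example_r2 = r_squared(measured, predicted)
```

Because RMSE keeps the units of the predicted quantity, an RMSE of 47 in the abstract means a typical error of about 47 retention-index units.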
Affiliation(s)
- Jim Boelrijk
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands
- Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
| | - Denice van Herwerden
- Van't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands
| | - Bernd Ensing
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands
- Computational Chemistry Group, Van't Hoff Institute for Molecular Sciences (HIMS), Amsterdam, The Netherlands
| | - Patrick Forré
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands
- Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
| | - Saer Samanipour
- Van't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands
- UvA Data Science Center, University of Amsterdam, Amsterdam, The Netherlands
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Woolloongabba, Australia
|
14
|
MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem. Metabolites 2023; 13:metabo13030314. [PMID: 36984753 PMCID: PMC10053663 DOI: 10.3390/metabo13030314] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/20/2023] [Revised: 02/14/2023] [Accepted: 02/15/2023] [Indexed: 02/23/2023] Open
Abstract
Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after more than a decade of method development, in silico methods still do not reach the correct annotation rates that users would wish for. Here, we present a novel computational method called Mad Hatter for this task. Mad Hatter combines CSI:FingerID results with information from the searched structure database via a metascore. Compound information includes the melting point, and the number of words in the compound description starting with the letter ‘u’. We then show that Mad Hatter reaches a stunning 97.6% correct annotations when searching PubChem, one of the largest and most comprehensive molecular structure databases. Unfortunately, Mad Hatter is not a real method. Rather, we developed Mad Hatter solely for the purpose of demonstrating common issues in computational method development and evaluation. We explain what evaluation glitches were necessary for Mad Hatter to reach this annotation level, what is wrong with similar metascores in general, and why metascores may screw up not only method evaluations but also the analysis of biological experiments. This paper may serve as an example of problems in the development and evaluation of machine learning models for metabolite annotation.
|
15
|
Morehouse NJ, Clark TN, McMann EJ, van Santen JA, Haeckl FPJ, Gray CA, Linington RG. Annotation of natural product compound families using molecular networking topology and structural similarity fingerprinting. Nat Commun 2023; 14:308. [PMID: 36658161 PMCID: PMC9852437 DOI: 10.1038/s41467-022-35734-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Accepted: 12/20/2022] [Indexed: 01/20/2023] Open
Abstract
Spectral matching of MS2 fragmentation spectra has become a popular method for characterizing natural products libraries, but identification remains challenging due to differences in MS2 fragmentation properties between instruments and the low coverage of current spectral reference libraries. To address this bottleneck we present Structural similarity Network Annotation Platform for Mass Spectrometry (SNAP-MS), which matches chemical similarity grouping in the Natural Products Atlas to grouping of mass spectrometry features from molecular networking. This approach assigns compound families to molecular networking subnetworks without the need for experimental or calculated reference spectra. We demonstrate SNAP-MS can accurately annotate subnetworks built from both reference spectra and an in-house microbial extract library, and correctly predict compound families from published molecular networks acquired on a range of MS instrumentation. Compound family annotations for the microbial extract library are validated by co-injection of standards or isolation and spectroscopic analysis. SNAP-MS is freely available at www.npatlas.org/discover/snapms.
Affiliation(s)
- Nicholas J Morehouse
- Department of Biological Sciences, University of New Brunswick, Saint John, NB, Canada
| | - Trevor N Clark
- Department of Chemistry, Simon Fraser University, Burnaby, BC, Canada
| | - Emily J McMann
- Department of Chemistry, Simon Fraser University, Burnaby, BC, Canada
| | | | - F P Jake Haeckl
- Department of Chemistry, Simon Fraser University, Burnaby, BC, Canada
| | - Christopher A Gray
- Department of Biological Sciences, University of New Brunswick, Saint John, NB, Canada
- Department of Chemistry, University of New Brunswick, Fredericton, NB, Canada
| | - Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, BC, Canada.
|
16
|
Gomes PWP, de Tralia Medeiros TC, Maimone NM, Leão TF, de Moraes LAB, Bauermeister A. Microbial Metabolites Annotation by Mass Spectrometry-Based Metabolomics. Adv Exp Med Biol 2023; 1439:225-248. [PMID: 37843811 DOI: 10.1007/978-3-031-41741-2_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2023]
Abstract
Since the discovery of penicillin, microbial metabolites have been extensively investigated for drug discovery purposes. In the last decades, microbially derived compounds have gained increasing attention in different fields, from pharmacognosy to industry and agriculture. Microbial metabolites in microbiomes have specific functions and can be associated with the maintenance of natural ecosystems. These metabolites may exhibit a broad range of biological activities of great interest for human purposes. Samples from either isolated microbial cultures or microbiomes consist of complex mixtures of metabolites, and their analysis is not a simple process. Mass spectrometry-based metabolomics encompasses a set of analytical methods that have brought several improvements to the microbial natural products field. This analytical tool allows the comprehensive detection of metabolites and, therefore, access to the chemical profile of those biological samples. These analyses generate thousands of mass spectra, which are challenging to analyse. In this context, bioinformatic metabolomics tools have been successfully employed to accelerate and facilitate the investigation of specialized microbial metabolites. Herein, we describe metabolomics tools used to provide chemical information for the metabolites, and furthermore, we discuss how they can improve the investigation of microbial cultures and interactions.
Affiliation(s)
- Paulo Wender P Gomes
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Talita Carla de Tralia Medeiros
- Departamento de Química, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Naydja Moralles Maimone
- Departamento de Ciências Exatas, Escola Superior de Agricultura 'Luiz de Queiroz', Universidade de São Paulo, Piracicaba, São Paulo, Brazil
| | - Tiago F Leão
- Centro de Energia Nuclear na Agricultura, Universidade de São Paulo, Piracicaba, São Paulo, Brazil
| | - Luiz Alberto Beraldo de Moraes
- Departamento de Química, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Anelize Bauermeister
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA.
- Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil.
|
17
|
Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00577-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Structural annotation of small molecules in biological samples remains a key bottleneck in untargeted metabolomics, despite rapid progress in predictive methods and tools during the past decade. Liquid chromatography–tandem mass spectrometry, one of the most widely used analysis platforms, can detect thousands of molecules in a sample, the vast majority of which remain unidentified even with best-of-class methods. Here we present LC-MS2Struct, a machine learning framework for structural annotation of small-molecule data arising from liquid chromatography–tandem mass spectrometry (LC-MS2) measurements. LC-MS2Struct jointly predicts the annotations for a set of mass spectrometry features in a sample, using a novel structured prediction model trained to optimally combine the output of state-of-the-art MS2 scorers and observed retention orders. We evaluate our method on a dataset covering all publicly available reversed-phase LC-MS2 data in the MassBank reference database, including 4,327 molecules measured using 18 different LC conditions from 16 contributors, greatly expanding the chemical analytical space covered in previous multi-MS2 scorer evaluations. LC-MS2Struct obtains significantly higher annotation accuracy than earlier methods and improves the annotation accuracy of state-of-the-art MS2 scorers by up to 106%. The use of stereochemistry-aware molecular fingerprints improves prediction performance, which highlights limitations in existing approaches and has strong implications for future computational LC-MS2 developments.
|
18
|
Menger F, Celma A, Schymanski EL, Lai FY, Bijlsma L, Wiberg K, Hernández F, Sancho JV, Ahrens L. Enhancing spectral quality in complex environmental matrices: Supporting suspect and non-target screening in zebra mussels with ion mobility. Environ Int 2022; 170:107585. [PMID: 36265356 DOI: 10.1016/j.envint.2022.107585] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/24/2022] [Revised: 10/11/2022] [Accepted: 10/13/2022] [Indexed: 06/16/2023]
Abstract
Identification of bioaccumulating contaminants of emerging concern (CECs) via suspect and non-target screening remains a challenging task. In this study, ion mobility separation with high-resolution mass spectrometry (IM-HRMS) was used to investigate the effects of drift time (DT) alignment on spectrum quality and peak annotation for screening of CECs in complex sample matrices using data independent acquisition (DIA). Data treatment approaches (Binary Sample Comparison) and prioritisation strategies (Halogen Match, co-occurrence of features in biota and the water phase) were explored in a case study on zebra mussel (Dreissena polymorpha) in Lake Mälaren, Sweden's largest drinking water reservoir. DT alignment evidently improved the fragment spectrum quality by increasing the similarity score to reference spectra from on average (±standard deviation) 0.33 ± 0.31 to 0.64 ± 0.30 points, thus positively influencing structure elucidation efforts. Thirty-two features were tentatively identified at confidence level 3 or higher using MetFrag coupled with the new PubChemLite database, which included predicted collision cross-section values from CCSbase. The implementation of predicted mobility data was found to support compound annotation. This study illustrates a quantitative assessment of the benefits of IM-HRMS on spectral quality, which will enhance the performance of future screening studies of CECs in complex environmental matrices.
Affiliation(s)
- Frank Menger
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), SE-75007 Uppsala, Sweden.
| | - Alberto Celma
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Avda. Sos Baynat s/n, E-12071 Castellón, Spain
| | - Emma L Schymanski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6, Avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Foon Yin Lai
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), SE-75007 Uppsala, Sweden
| | - Lubertus Bijlsma
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Avda. Sos Baynat s/n, E-12071 Castellón, Spain
| | - Karin Wiberg
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), SE-75007 Uppsala, Sweden
| | - Félix Hernández
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Avda. Sos Baynat s/n, E-12071 Castellón, Spain
| | - Juan V Sancho
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Avda. Sos Baynat s/n, E-12071 Castellón, Spain
| | - Lutz Ahrens
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), SE-75007 Uppsala, Sweden.
|
19
|
Cai Y, Zhou Z, Zhu ZJ. Advanced analytical and informatic strategies for metabolite annotation in untargeted metabolomics. Trends Analyt Chem 2022. [DOI: 10.1016/j.trac.2022.116903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2022]
|
20
|
Sakurai N, Yamazaki S, Suda K, Hosoki A, Akimoto N, Takahashi H, Shibata D, Aoki Y. The Thing Metabolome Repository family (XMRs): comparable untargeted metabolome databases for analyzing sample-specific unknown metabolites. Nucleic Acids Res 2022; 51:D660-D677. [PMID: 36417935 PMCID: PMC9825447 DOI: 10.1093/nar/gkac1058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 10/21/2022] [Accepted: 10/25/2022] [Indexed: 11/25/2022] Open
Abstract
The identification of unknown chemicals has emerged as a significant issue in untargeted metabolome analysis owing to the limited availability of purified standards for identification; this is a major bottleneck for the accumulation of reusable metabolome data in systems biology. Public resources for discovering and prioritizing the unknowns that should be subject to practical identification, as well as further detailed study of spending costs and the risks of misprediction, are lacking. As such a resource, we released databases, Food-, Plant- and Thing-Metabolome Repository (http://metabolites.in/foods, http://metabolites.in/plants, and http://metabolites.in/things, referred to as XMRs) in which the sample-specific localization of unknowns detected by liquid chromatography-mass spectrometry in a wide variety of samples can be examined, helping to discover and prioritize the unknowns. A set of application programming interfaces for the XMRs facilitates the use of metabolome data for large-scale analysis and data mining. Several applications of XMRs, including integrated metabolome and genome analyses, are presented. Expanding the concept of XMRs will accelerate the identification of unknowns and increase the discovery of new knowledge.
Affiliation(s)
- Nozomu Sakurai
- To whom correspondence should be addressed. Tel: +81 55 981 6895; Fax: +81 55 981 9448
| | | | - Kunihiro Suda
- Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818, Japan
| | - Ai Hosoki
- Bioinformation and DDBJ Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
| | - Nayumi Akimoto
- Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818, Japan
| | - Haruya Takahashi
- Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
| | - Daisuke Shibata
- Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818, Japan
| | - Yuichi Aoki
- Correspondence may also be addressed to Yuichi Aoki. Tel: +81 22 274 6040; Fax: +81 22 274 6040
|
21
|
LC-DAD-ESI-MS/MS and NMR Analysis of Conifer Wood Specialized Metabolites. Cells 2022; 11:cells11203332. [PMID: 36291197 PMCID: PMC9600761 DOI: 10.3390/cells11203332] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/01/2022] [Revised: 10/13/2022] [Accepted: 10/20/2022] [Indexed: 11/16/2022] Open
Abstract
Many species from the Pinaceae family have been recognized as a rich source of lignans, flavonoids, and other polyphenolics. The great common occurrence of conifers in Europe, as well as their use in the wood industry, makes both plant material and industrial waste material easily accessible and inexpensive. This is a promising prognosis for both discovery of new active compounds as well as for finding new applications for wood and its industry waste products. This study aimed to analyze and phytochemically profile 13 wood extracts of the Pinaceae family species, endemic or introduced in Polish flora, using the LC-DAD–ESI-MS/MS method and compare their respective metabolite profiles. Branch wood methanolic extracts were phytochemically profiled. Lignans, stilbenes, flavonoids, diterpenes, procyanidins, and other compounds were detected, with a considerable variety of chemical content among distinct species. Norway spruce (Picea abies (L.) H.Karst.) branch wood was the most abundant source of stilbenes, European larch (Larix decidua Mill.) mostly contained flavonoids, while silver fir (Abies alba Mill.) was rich in lignans. Furthermore, 10 lignans were isolated from the studied material. Our findings confirm that wood industry waste materials, such as conifer branches, can be a potent source of different phytochemicals, with the plant matrix being relatively simple, facilitating future isolation of target compounds.
|
22
|
Ljoncheva M, Stepišnik T, Kosjek T, Džeroski S. Machine learning for identification of silylated derivatives from mass spectra. J Cheminform 2022; 14:62. [PMID: 36109826 PMCID: PMC9476372 DOI: 10.1186/s13321-022-00636-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/13/2022] [Accepted: 07/31/2022] [Indexed: 11/10/2022] Open
Abstract
Motivation
Compound structure identification is using increasingly more sophisticated computational tools, among which machine learning tools are a recent addition that quickly gains in importance. These tools, of which the method titled Compound Structure Identification:Input Output Kernel Regression (CSI:IOKR) is an excellent example, have been used to elucidate compound structure from mass spectral (MS) data with significant accuracy, confidence and speed. They have, however, largely focused on data coming from liquid chromatography coupled to tandem mass spectrometry (LC–MS).
Gas chromatography coupled to mass spectrometry (GC–MS) is an alternative which offers several advantages as compared to LC–MS, including higher data reproducibility. Of special importance is the substantial compound coverage offered by GC–MS, further expanded by derivatization procedures, such as silylation, which can improve the volatility, thermal stability and chromatographic peak shape of semi-volatile analytes. Despite these advantages and the increasing size of compound databases and MS libraries, GC–MS data have not yet been used by machine learning approaches to compound structure identification.
Results
This study presents a successful application of the CSI:IOKR machine learning method for the identification of environmental contaminants from GC–MS spectra. We use CSI:IOKR as an alternative to exhaustive search of MS libraries, independent of instrumental platform and data processing software. We use a comprehensive dataset of GC–MS spectra of trimethylsilyl derivatives and their molecular structures, derived from a large commercially available MS library, to train a model that maps between spectra and molecular structures. We test the learned model on a different dataset of GC–MS spectra of trimethylsilyl derivatives of environmental contaminants, generated in-house and made publicly available. The results show that 37% (resp. 50%) of the tested compounds are correctly ranked among the top 10 (resp. 20) candidate compounds suggested by the model. Even though spectral comparisons with reference standards or de novo structural elucidations are necessary to validate the predictions, machine learning provides efficient candidate prioritization and reduces the time spent on compound annotation.
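The top-10 and top-20 figures reported above are rank-based rates: for each test spectrum, the rank of the correct structure in the model's candidate list is recorded, and the rate is the fraction of spectra whose correct structure falls within the first k candidates. A minimal sketch of that calculation follows; the function name and the rank values are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def top_k_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose correct candidate is ranked within the top k.

    Ranks are 1-based; None means the correct structure was not in the
    candidate list at all, which counts as a miss.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks of the correct structure for eight test spectra.
ranks = [1, 4, 12, None, 18, 35, 7, 2]

top10 = top_k_rate(ranks, 10)
top20 = top_k_rate(ranks, 20)
print(f"top-10: {top10:.2f}, top-20: {top20:.2f}")
```

Counting unretrieved structures as misses keeps the denominator equal to the number of tested compounds, which matches how the percentages above are worded.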
|
23
|
Bremer PL, Vaniya A, Kind T, Wang S, Fiehn O. How Well Can We Predict Mass Spectra from Structures? Benchmarking Competitive Fragmentation Modeling for Metabolite Identification on Untrained Tandem Mass Spectra. J Chem Inf Model 2022; 62:4049-4056. [PMID: 36043939 DOI: 10.1021/acs.jcim.2c00936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Competitive Fragmentation Modeling for Metabolite Identification (CFM-ID) is a machine learning tool to predict in silico tandem mass spectra (MS/MS) for known or suspected metabolites for which chemical reference standards are not available. As a machine learning tool, it relies on both an underlying statistical model and an explicit training set that encompasses experimental mass spectra for specific compounds. Such mass spectra depend on specific parameters such as collision energies, instrument types, and adducts which are accumulated in libraries. Yet, ultimately prediction tools that are meant to cover wide expanses of entities must be validated on cases that were not included in the initial training and testing sets. Hence, we here benchmarked the performance of CFM-ID 4.0 to correctly predict MS/MS spectra for spectra that were not included in the CFM-ID training set and for different mass spectrometry conditions. We used 609,456 experimental tandem spectra from the NIST20 mass spectral library that were newly added to the previous NIST17 library version. We found that CFM-ID's highest energy prediction output would maximize the capacity for library generation. Matching the experimental collision energy with CFM-ID's prediction energy produced the best results, even for HCD-Orbitrap instruments. For benzenoids, better MS/MS predictions were achieved than for heterocyclic compounds. However, when exploring CFM-ID's performance on 8,305 compounds at 40 eV HCD-Orbitrap collision energy, >90% of the 20/80 split test compounds showed <700 MS/MS similarity score. Instead of a stand-alone tool, CFM-ID 4.0 might be useful to boost candidate structures in the greater context of identification workflows.
Affiliation(s)
- Parker Ladd Bremer
- Department of Chemistry, University of California Davis, Davis, California 95616, United States
| | - Arpana Vaniya
- West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California Davis, Davis, California 95616, United States
| | - Tobias Kind
- West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California Davis, Davis, California 95616, United States
| | - Shunyang Wang
- Department of Chemistry, University of California Davis, Davis, California 95616, United States
| | - Oliver Fiehn
- West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California Davis, Davis, California 95616, United States
|
24
|
MSNovelist: de novo structure generation from mass spectra. Nat Methods 2022; 19:865-870. [PMID: 35637304 PMCID: PMC9262714 DOI: 10.1038/s41592-022-01486-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 07/06/2021] [Accepted: 04/07/2022] [Indexed: 12/29/2022]
Abstract
Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds, but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder–decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS2) spectra. In an evaluation with 3,863 MS2 spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly on first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having ever seen the structure in the training phase. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist in a bryophyte MS2 dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation in the case of poorly represented analyte classes and novel compounds. MSNovelist combines fingerprint prediction with an encoder–decoder neural network for de novo structure generation of small molecules from mass spectra.
|
25
|
Sussman EM, Oktem B, Isayeva IS, Liu J, Wickramasekara S, Chandrasekar V, Nahan K, Shin HY, Zheng J. Chemical Characterization and Non-targeted Analysis of Medical Device Extracts: A Review of Current Approaches, Gaps, and Emerging Practices. ACS Biomater Sci Eng 2022; 8:939-963. [PMID: 35171560 DOI: 10.1021/acsbiomaterials.1c01119] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 12/11/2022]
Abstract
The developers of medical devices evaluate the biocompatibility of their device prior to FDA's review and subsequent introduction to the market. Chemical characterization, described in ISO 10993-18:2020, can generate information for toxicological risk assessment and is an alternative approach for addressing some biocompatibility end points (e.g., systemic toxicity, genotoxicity, carcinogenicity, reproductive/developmental toxicity) that can reduce the time and cost of testing and the need for animal testing. Additionally, chemical characterization can be used to determine whether modifications to the materials and manufacturing processes alter the chemistry of a patient-contacting device to an extent that could impact device safety. Extractables testing is one approach to chemical characterization that employs combinations of non-targeted analysis, non-targeted screening, and/or targeted analysis to establish the identities and quantities of the various chemical constituents that can be released from a device. Due to the difficulty in obtaining a priori information on all the constituents in finished devices, information generation strategies in the form of analytical chemistry testing are often used. Identified and quantified extractables are then assessed using toxicological risk assessment approaches to determine if reported quantities are sufficiently low to overcome the need for further chemical analysis, biological evaluation of select end points, or risk control. For extractables studies to be useful as a screening tool, comprehensive and reliable non-targeted methods are needed. Although non-targeted methods have been adopted by many laboratories, they are laboratory-specific and require expensive analytical instruments and advanced technical expertise to perform. 
In this Perspective, we describe the elements of extractables studies and provide an overview of the current practices, identified gaps, and emerging practices that may be adopted on a wider scale in the future. This Perspective is outlined according to the steps of an extractables study: information gathering, extraction, extract sample processing, system selection, qualification, quantification, and identification.
Affiliation(s)
- Eric M Sussman
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Berk Oktem
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Irada S Isayeva
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Jinrong Liu
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Samanthi Wickramasekara
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Vaishnavi Chandrasekar
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Keaton Nahan
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Hainsworth Y Shin
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Jiwen Zheng
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
|
26
|
Wasito H, Causon T, Hann S. Alternating in-source fragmentation with single-stage high-resolution mass spectrometry with high annotation confidence in non-targeted metabolomics. Talanta 2022; 236:122828. [PMID: 34635218 DOI: 10.1016/j.talanta.2021.122828] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/16/2021] [Revised: 08/18/2021] [Accepted: 08/24/2021] [Indexed: 02/07/2023]
Abstract
Non-targeted metabolomics is increasingly applied in various applications for understanding biological processes and finding novel biomarkers in living organisms. However, high-confidence identity confirmation of metabolites in complex biological samples is still a significant bottleneck, especially when using single-stage mass analysers. In the current study, a complete workflow for alternating in-source fragmentation on a time-of-flight mass spectrometry (TOFMS) instrument for non-targeted metabolomics is presented. Hydrophilic interaction liquid chromatography (HILIC) was employed to assess polar metabolites in yeast following ESI parameter optimization using experimental design principles, which revealed the key influence of fragmentor voltage for this application. Datasets from alternating in-source fragmentation high resolution mass spectrometry (HRMS) were evaluated using open-source data processing tools combined with public reference mass spectral databases. The significant influence of the selected fragmentor voltages on the abundance of the primary analyte ion of interest and the extent of in-source fragmentation allowed an optimum selection of qualifier fragments for the different metabolites. The new acquisition and evaluation workflow was implemented for the non-targeted analysis of yeast extract samples whereby more than 130 metabolites were putatively annotated with more than 40% considered to be of high confidence. The presented workflow contains a fully elaborated acquisition and evaluation methodology using alternating in-source fragmentor voltages suitable for peak annotation and metabolite identity confirmation for non-targeted metabolomics applications performed on a single-stage HRMS platform.
Affiliation(s)
- Hendri Wasito
- Institute of Analytical Chemistry, Department of Chemistry, University of Natural Resources and Life Sciences, Vienna (BOKU), Muthgasse 18, 1190, Vienna, Austria; Department of Pharmacy, Faculty of Health Sciences, Jenderal Soedirman University, Dr. Soeparno Street, 53122, Purwokerto, Indonesia
| | - Tim Causon
- Institute of Analytical Chemistry, Department of Chemistry, University of Natural Resources and Life Sciences, Vienna (BOKU), Muthgasse 18, 1190, Vienna, Austria
| | - Stephan Hann
- Institute of Analytical Chemistry, Department of Chemistry, University of Natural Resources and Life Sciences, Vienna (BOKU), Muthgasse 18, 1190, Vienna, Austria.
|
27
|
Abstract
Motivation
Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases, allowing us to overcome this limitation. The best-performing in silico methods use machine learning to predict a molecular fingerprint from tandem mass spectra, then use the predicted fingerprint to search in a molecular structure database. Predicted molecular fingerprints are also of great interest for compound class annotation, de novo structure elucidation, and other tasks. So far, kernel support vector machines are the best tool for fingerprint prediction. However, they cannot be trained on all publicly available reference spectra because their training time scales cubically with the number of training data.
Results
We use the Nyström approximation to transform the kernel into a linear feature map. We evaluate two methods that use this feature map as input: a linear support vector machine and a deep neural network (DNN). For evaluation, we use a cross-validated dataset of 156 017 compounds and three independent datasets with 1734 compounds. We show that the combination of kernel method and DNN outperforms the kernel support vector machine, which is the current gold standard, as well as a DNN on tandem mass spectra on all evaluation datasets.
Availability and implementation
The deep kernel learning method for fingerprint prediction is part of the SIRIUS software, available at https://bio.informatik.uni-jena.de/software/sirius.
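The Nyström step named in the Results above can be illustrated in a few lines. The sketch below is a generic NumPy version, not the SIRIUS implementation: it assumes an RBF kernel on random vectors as a stand-in for the spectrum kernels used in the paper, and the function names, landmark count, and `gamma` value are illustrative choices. Given m landmark points, each sample is mapped to an explicit feature vector whose inner products approximate the kernel, so that a linear model or a DNN can be trained on those features instead of on the full kernel matrix.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """RBF kernel matrix between the rows of a and b (stand-in kernel, an assumption)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystroem_features(x, landmarks, kernel=rbf_kernel, eps=1e-10):
    """Map x to explicit features phi(x) = k(x, Z) K_ZZ^(-1/2) for landmarks Z."""
    k_mm = kernel(landmarks, landmarks)
    evals, evecs = np.linalg.eigh(k_mm)       # K_ZZ is symmetric positive semi-definite
    keep = evals > eps                        # drop numerically zero directions
    inv_sqrt = evecs[:, keep] / np.sqrt(evals[keep])
    return kernel(x, landmarks) @ inv_sqrt    # shape: (n_samples, n_kept)


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))                               # toy data, not spectra
landmarks = x[rng.choice(len(x), size=50, replace=False)]

phi = nystroem_features(x, landmarks)
approx = phi @ phi.T                                        # approximate kernel matrix
exact = rbf_kernel(x, x)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(phi.shape, f"relative Frobenius error: {rel_err:.3f}")
```

With m landmarks the feature map costs roughly O(n m^2) to build rather than the cubic cost in n mentioned in the Motivation, which is the trade-off that makes training on all reference spectra feasible.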
Affiliation(s)
- Kai Dührkop
|
28
|
Place BJ, Ulrich EM, Challis JK, Chao A, Du B, Favela K, Feng YL, Fisher CM, Gardinali P, Hood A, Knolhoff AM, McEachran AD, Nason SL, Newton SR, Ng B, Nuñez J, Peter KT, Phillips AL, Quinete N, Renslow R, Sobus JR, Sussman EM, Warth B, Wickramasekara S, Williams AJ. An Introduction to the Benchmarking and Publications for Non-Targeted Analysis Working Group. Anal Chem 2021; 93:16289-16296. [PMID: 34842413 PMCID: PMC8848292 DOI: 10.1021/acs.analchem.1c02660] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/16/2022]
Abstract
Non-targeted analysis (NTA) encompasses a rapidly evolving set of mass spectrometry techniques aimed at characterizing the chemical composition of complex samples, identifying unknown compounds, and/or classifying samples, without prior knowledge regarding the chemical content of the samples. Recent advances in NTA are the result of improved and more accessible instrumentation for data generation and analysis tools for data evaluation and interpretation. As researchers continue to develop NTA approaches in various scientific fields, there is a growing need to identify, disseminate, and adopt community-wide method reporting guidelines. In 2018, NTA researchers formed the Benchmarking and Publications for Non-Targeted Analysis Working Group (BP4NTA) to address this need. Consisting of participants from around the world and representing fields ranging from environmental science and food chemistry to 'omics and toxicology, BP4NTA provides resources addressing a variety of challenges associated with NTA. Thus far, BP4NTA group members have aimed to establish a consensus on NTA-related terms and concepts and to create consistency in reporting practices by providing resources on a public Web site, including consensus definitions, reference content, and lists of available tools. Moving forward, BP4NTA will provide a setting for NTA researchers to continue discussing emerging challenges and contribute to additional harmonization efforts.
Affiliation(s)
- Benjamin J. Place
- National Institute of Standards and Technology, Gaithersburg, MD, USA 20899 (corresponding author)
| | - Elin M. Ulrich
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC, USA 27711
| | | | - Alex Chao
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC, USA 27711
| | - Bowen Du
- Southern California Coastal Water Research Project Authority, Costa Mesa, CA, USA 92626
| | - Kristin Favela
- Southwest Research Institute, San Antonio, TX, USA 78238
| | - Yong-Lai Feng
- Exposure and Biomonitoring Division, Environmental Health Science and Research Bureau, Health Canada, Ottawa, Ontario, Canada, K1A 0K9
| | - Christine M. Fisher
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, MD, USA 20740
| | - Piero Gardinali
- Institute of Environment & Department of Chemistry and Biochemistry, Florida International University, North Miami, FL 33181
| | - Alan Hood
- U.S. Food and Drug Administration, Center for Devices and Radiological Health, Silver Spring, MD, USA 20993
| | - Ann M. Knolhoff
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, MD, USA 20740
| | | | - Sara L. Nason
- Connecticut Agricultural Experiment Station, New Haven, CT, USA 06511
| | - Seth R. Newton
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC, USA 27711
| | - Brian Ng
- Institute of Environment & Department of Chemistry and Biochemistry, Florida International University, North Miami, FL 33181
| | - Jamie Nuñez
- Pacific Northwest National Laboratory, Richland, WA, USA 99352
| | - Katherine T. Peter
- National Institute of Standards and Technology, Charleston, SC, USA 29412
| | - Allison L. Phillips
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Public Health and Environmental Assessment, Research Triangle Park, NC, USA 27711
| | - Natalia Quinete
- Institute of Environment & Department of Chemistry and Biochemistry, Florida International University, North Miami, FL 33181
| | - Ryan Renslow
- Pacific Northwest National Laboratory, Richland, WA, USA 99352
| | - Jon R. Sobus
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC, USA 27711
| | - Eric M. Sussman
- U.S. Food and Drug Administration, Center for Devices and Radiological Health, Silver Spring, MD, USA 20993
| | - Benedikt Warth
- Department of Food Chemistry and Toxicology, Faculty of Chemistry, University of Vienna, 1090 Vienna, Austria
| | - Samanthi Wickramasekara
- U.S. Food and Drug Administration, Center for Devices and Radiological Health, Silver Spring, MD, USA 20993
| | - Antony J. Williams
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC, USA 27711
|
29
|
Beniddir MA, Kang KB, Genta-Jouve G, Huber F, Rogers S, van der Hooft JJJ. Advances in decomposing complex metabolite mixtures using substructure- and network-based computational metabolomics approaches. Nat Prod Rep 2021; 38:1967-1993. [PMID: 34821250 PMCID: PMC8597898 DOI: 10.1039/d1np00023c] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 04/07/2021] [Indexed: 12/13/2022]
Abstract
Covering: up to the end of 2020. Recently introduced computational metabolome mining tools have started to positively impact the chemical and biological interpretation of untargeted metabolomics analyses. We believe that these current advances make it possible to start decomposing complex metabolite mixtures into substructure and chemical class information, thereby supporting pivotal tasks in metabolomics analysis including metabolite annotation, the comparison of metabolic profiles, and network analyses. In this review, we highlight and explain key tools and emerging strategies covering 2015 up to the end of 2020. The majority of these tools aim at processing and analyzing liquid chromatography coupled to mass spectrometry fragmentation data. We start with defining what substructures are, how they relate to molecular fingerprints, and how recognizing them helps to decompose complex mixtures. We continue with chemical classes that are based on the presence or absence of particular molecular scaffolds and/or functional groups and are thus intrinsically related to substructures. We discuss novel tools to mine substructures, annotate chemical compound classes, and create mass spectral networks from metabolomics data and demonstrate them using two case studies. We also review and speculate about the opportunities that NMR spectroscopy-based metabolome mining of complex metabolite mixtures offers to discover substructures and chemical classes. Finally, we will describe the main benefits and limitations of the current tools and strategies that rely on them, and our vision on how this exciting field can develop toward repository-scale-sized metabolomics analyses. Complementary sources of structural information from genomics analyses and well-curated taxonomic records are also discussed.
Many research fields such as natural products discovery, pharmacokinetic and drug metabolism studies, and environmental metabolomics increasingly rely on untargeted metabolomics to gain biochemical and biological insights. The technical advances described here will benefit all those metabolomics disciplines by transforming spectral data into knowledge that can answer biological questions.
Affiliation(s)
- Mehdi A Beniddir
- Université Paris-Saclay, CNRS, BioCIS, 5 rue J.-B Clément, 92290 Châtenay-Malabry, France
| | - Kyo Bin Kang
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Sookmyung Women's University, Seoul 04310, Republic of Korea
| | - Grégory Genta-Jouve
- Laboratoire de Chimie-Toxicologie Analytique et Cellulaire (C-TAC), UMR CNRS 8038, CiTCoM, Université de Paris, 4, Avenue de l'Observatoire, 75006, Paris, France
- Laboratoire Ecologie, Evolution, Interactions des Systèmes Amazoniens (LEEISA), USR 3456, Université De Guyane, CNRS Guyane, 275 Route de Montabo, 97334 Cayenne, French Guiana, France
| | - Florian Huber
- Netherlands eScience Center, 1098 XG Amsterdam, The Netherlands
| | - Simon Rogers
- School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK
|
30
|
Tsugawa H, Rai A, Saito K, Nakabayashi R. Metabolomics and complementary techniques to investigate the plant phytochemical cosmos. Nat Prod Rep 2021; 38:1729-1759. [PMID: 34668509 DOI: 10.1039/d1np00014d] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/14/2022]
Abstract
Covering: up to 2021. Plants and their associated microbial communities are known to produce millions of metabolites, a majority of which are still not characterized and are speculated to possess novel bioactive properties. In addition to their role in plant physiology, these metabolites are also relevant as existing and next-generation medicine candidates. Elucidation of the plant metabolite diversity is thus valuable for the successful exploitation of natural resources for humankind. Herein, we present a comprehensive review on recent metabolomics approaches to illuminate molecular networks in plants, including chemical isolation and enzymatic production as well as the modern metabolomics approaches such as stable isotope labeling, ultrahigh-resolution mass spectrometry, metabolome imaging (spatial metabolomics), single-cell analysis, cheminformatics, and computational mass spectrometry. Mass spectrometry-based strategies to characterize plant metabolomes through metabolite identification and annotation are described in detail. We also highlight the use of phytochemical genomics to mine genes associated with specialized metabolites' biosynthesis. Understanding the metabolic diversity through biotechnological advances is fundamental to elucidate the functions of the plant-derived specialized metabolome.
Affiliation(s)
- Hiroshi Tsugawa
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 2-24-16 Nakamachi, Koganei, Tokyo 184-8588, Japan; Graduate School of Medical Life Science, Yokohama City University, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Amit Rai
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Plant Molecular Science Center, Chiba University, 1-8-1 Inohana, Chuo-ku, Chiba 260-8675, Japan
| | - Kazuki Saito
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Plant Molecular Science Center, Chiba University, 1-8-1 Inohana, Chuo-ku, Chiba 260-8675, Japan
| | - Ryo Nakabayashi
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
|
31
|
High-confidence structural annotation of metabolites absent from spectral libraries. Nat Biotechnol 2021; 40:411-421. [PMID: 34650271 PMCID: PMC8926923 DOI: 10.1038/s41587-021-01045-9] [Citation(s) in RCA: 91] [Impact Index Per Article: 30.3] [Received: 04/09/2021] [Accepted: 08/04/2021] [Indexed: 12/14/2022]
Abstract
Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
|
32
|
Wang F, Liigand J, Tian S, Arndt D, Greiner R, Wishart DS. CFM-ID 4.0: More Accurate ESI-MS/MS Spectral Prediction and Compound Identification. Anal Chem 2021; 93:11692-11700. [PMID: 34403256 DOI: 10.1021/acs.analchem.1c01465] [Citation(s) in RCA: 122] [Impact Index Per Article: 40.7] [Indexed: 01/03/2023]
Abstract
In the field of metabolomics, mass spectrometry (MS) is the method most commonly used for identifying and annotating metabolites. As this typically involves matching a given MS spectrum against an experimentally acquired reference spectral library, this approach is limited by the coverage and size of such libraries (which typically number in the thousands). These experimental libraries can be greatly extended by predicting the MS spectra of known chemical structures (which number in the millions) to create computational reference spectral libraries. To facilitate the generation of predicted spectral reference libraries, we developed CFM-ID, a computer program that can accurately predict the ESI-MS/MS spectrum for a given compound structure. CFM-ID is one of the best-performing methods for compound-to-mass-spectrum prediction and also one of the top tools for in silico mass-spectrum-to-compound identification. This work improves CFM-ID's ability to predict ESI-MS/MS spectra from compounds by (1) learning parameters from features based on the molecular topology, (2) adding a new approach to ring cleavage that models such cleavage as a sequence of simple chemical bond dissociations, and (3) expanding its hand-written rule-based predictor to cover more chemical classes, including acylcarnitines, acylcholines, flavonols, flavones, flavanones, and flavonoid glycosides. We demonstrate that this new version of CFM-ID (version 4.0) is significantly more accurate than previous CFM-ID versions in terms of both ESI-MS/MS spectral prediction and compound identification. CFM-ID 4.0 is available at http://cfmid4.wishartlab.com/ as a web server and docker images can be downloaded at https://hub.docker.com/r/wishartlab/cfmid.
Affiliation(s)
- Fei Wang
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada; Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1, Canada
| | - Jaanus Liigand
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada; Institute of Chemistry, University of Tartu, Tartu 50411, Estonia
| | - Siyang Tian
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
| | - David Arndt
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
| | - Russell Greiner
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada; Department of Psychiatry, University of Alberta, Edmonton, AB T6G 2R3, Canada; Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1, Canada
| | - David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada; Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada; Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| |
Collapse
|
33
|
Bach E, Rogers S, Williamson J, Rousu J. Probabilistic framework for integration of mass spectrum and retention time information in small molecule identification. Bioinformatics 2021; 37:1724-1731. [PMID: 33244585 PMCID: PMC8289373 DOI: 10.1093/bioinformatics/btaa998] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/18/2020] [Revised: 10/27/2020] [Accepted: 11/17/2020] [Indexed: 11/14/2022] Open
Abstract
Motivation: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve identifications solely based on MS information, such as precursor mass-per-charge and tandem mass spectrometry (MS2). Results: We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show improved identification accuracy by combining MS2 data and retention orders using our approach, thereby outperforming state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of LC-MS features has MS2 measurements available besides MS1. Availability and implementation: Software and data are freely available at https://github.com/aalto-ics-kepaco/msms_rt_score_integration. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eric Bach
- Department of Computer Science, School of Science, Aalto University, Espoo, Finland
- Simon Rogers
- School of Computing Science, University of Glasgow, Glasgow, UK
- John Williamson
- School of Computing Science, University of Glasgow, Glasgow, UK
- Juho Rousu
- Department of Computer Science, School of Science, Aalto University, Espoo, Finland

34
Li D, Gaquerel E. Next-Generation Mass Spectrometry Metabolomics Revives the Functional Analysis of Plant Metabolic Diversity. Annual Review of Plant Biology 2021; 72:867-891. [PMID: 33781077 DOI: 10.1146/annurev-arplant-071720-114836] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 06/12/2023]
Abstract
The remarkable diversity of specialized metabolites produced by plants has inspired several decades of research and nucleated a long list of theories to guide empirical ecological studies. However, analytical constraints and the lack of untargeted processing workflows have long precluded comprehensive metabolite profiling and, consequently, the collection of the critical currencies to test theory predictions for the ecological functions of plant metabolic diversity. Developments in mass spectrometry (MS) metabolomics have revolutionized the large-scale inventory and annotation of chemicals from biospecimens. Hence, the next generation of MS metabolomics propelled by new bioinformatics developments provides a long-awaited framework to revisit metabolism-centered ecological questions, much like the advances in next-generation sequencing of the last two decades impacted all research horizons in genomics. Here, we review advances in plant (computational) metabolomics to foster hypothesis formulation from complex metabolome data. Additionally, we reflect on how next-generation metabolomics could reinvigorate the testing of long-standing theories on plant metabolic diversity.
Affiliation(s)
- Dapeng Li
- Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, 07745 Jena, Germany
- Emmanuel Gaquerel
- Institut de Biologie Moléculaire des Plantes du CNRS, Université de Strasbourg, 67084 Strasbourg, France

35
González-Gaya B, Lopez-Herguedas N, Bilbao D, Mijangos L, Iker AM, Etxebarria N, Irazola M, Prieto A, Olivares M, Zuloaga O. Suspect and non-target screening: the last frontier in environmental analysis. Analytical Methods: Advancing Methods and Applications 2021; 13:1876-1904. [PMID: 33913946 DOI: 10.1039/d1ay00111f] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Indexed: 06/12/2023]
Abstract
Suspect and non-target screening (SNTS) techniques are arising as new analytical strategies useful to disentangle the environmental occurrence of the thousands of exogenous chemicals present in our ecosystems. The unbiased discovery of the wide number of substances present over environmental analysis needs to find a consensus with powerful technical and computational requirements, as well as with the time-consuming unequivocal identification of discovered analytes. Within these boundaries, the potential applications of SNTS include the studies of environmental pollution in aquatic, atmospheric, solid and biological samples, the assessment of new compounds, transformation products and metabolites, contaminant prioritization, bioremediation or soil/water treatment evaluation, and retrospective data analysis, among many others. In this review, we evaluate the state of the art of SNTS techniques going over the normalized workflow from sampling and sample treatment to instrumental analysis, data processing and a brief review of the more recent applications of SNTS in environmental occurrence and exposure to xenobiotics. The main issues related to harmonization and knowledge gaps are critically evaluated and the challenges of their implementation are assessed in order to ensure a proper use of these promising techniques in the near future.
Affiliation(s)
- B González-Gaya
- Department of Analytical Chemistry, University of the Basque Country (UPV/EHU), 48940 Leioa, Basque Country, Spain

36
Dührkop K, Nothias LF, Fleischauer M, Reher R, Ludwig M, Hoffmann MA, Petras D, Gerwick WH, Rousu J, Dorrestein PC, Böcker S. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat Biotechnol 2021; 39:462-471. [PMID: 33230292 DOI: 10.1038/s41587-020-0740-8] [Citation(s) in RCA: 252] [Impact Index Per Article: 84.0] [Received: 04/15/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022]
Abstract
Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.
Affiliation(s)
- Kai Dührkop
- Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- Louis-Félix Nothias
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Raphael Reher
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- Marcus Ludwig
- Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- Martin A Hoffmann
- Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- International Max Planck Research School 'Exploration of Ecological Interactions with Molecular and Chemical Techniques', Max Planck Institute for Chemical Ecology, Jena, Germany
- Daniel Petras
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- William H Gerwick
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Juho Rousu
- Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, Espoo, Finland
- Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany

37
Krettler CA, Thallinger GG. A map of mass spectrometry-based in silico fragmentation prediction and compound identification in metabolomics. Brief Bioinform 2021; 22:6184408. [PMID: 33758925 DOI: 10.1093/bib/bbab073] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/26/2020] [Revised: 01/29/2021] [Accepted: 02/12/2021] [Indexed: 12/27/2022] Open
Abstract
Metabolomics, the comprehensive study of the metabolome, and lipidomics, the large-scale study of pathways and networks of cellular lipids, are major driving forces in enabling personalized medicine. Complicated and error-prone data analysis remains a bottleneck, however, especially for identifying novel metabolites. Comparing experimental mass spectra to curated databases containing reference spectra has been the gold standard for identification of compounds, but constructing such databases is a costly and time-demanding task. Many software applications try to circumvent this process by utilizing cutting-edge advances in computational methods, including quantum chemistry and machine learning, and simulate mass spectra by performing theoretical, so-called in silico, fragmentations of compounds. Other solutions concentrate directly on experimental spectra and try to identify structural properties by investigating recurring patterns and the relationships between them. The considerable progress made in the field allows recent approaches to provide valuable clues to expedite annotation of experimental mass spectra. This review sheds light on individual strengths and weaknesses of these tools and attempts to evaluate them, especially in view of lipidomics, when considering complex mixtures found in biological samples as well as mass spectrometer inter-instrument variability.
Affiliation(s)
- Christoph A Krettler
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria
- Gerhard G Thallinger
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria

38
Peters K, Balcke G, Kleinenkuhnen N, Treutler H, Neumann S. Untargeted In Silico Compound Classification-A Novel Metabolomics Method to Assess the Chemodiversity in Bryophytes. Int J Mol Sci 2021; 22:ijms22063251. [PMID: 33806786 PMCID: PMC8005083 DOI: 10.3390/ijms22063251] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/27/2020] [Revised: 03/16/2021] [Accepted: 03/18/2021] [Indexed: 12/29/2022] Open
Abstract
In plant ecology, biochemical analyses of bryophytes and vascular plants are often conducted on dried herbarium specimens, as species typically grow in distant and inaccessible locations. Here, we present an automated in silico compound classification framework to annotate metabolites using an untargeted data independent acquisition (DIA)–LC/MS–QToF-sequential windowed acquisition of all theoretical fragment ion mass spectra (SWATH) ecometabolomics analytical method. We perform a comparative investigation of the chemical diversity at the global level and the composition of metabolite families in ten different species of bryophytes using fresh samples collected on-site and dried specimens stored in a herbarium for half a year. Shannon and Pielou’s diversity indices, hierarchical clustering analysis (HCA), sparse partial least squares discriminant analysis (sPLS-DA), distance-based redundancy analysis (dbRDA), ANOVA with post-hoc Tukey honestly significant difference (HSD) test, and Fisher’s exact test were used to determine differences in the richness and composition of metabolite families with regard to herbarium conditions, ecological characteristics, and species. We functionally annotated metabolite families to biochemical processes related to the structural integrity of membranes and cell walls (proto-lignin, glycerophospholipids, carbohydrates), chemical defense (polyphenols, steroids), reactive oxygen species (ROS) protection (alkaloids, amino acids, flavonoids), nutrition (nitrogen- and phosphate-containing glycerophospholipids), and photosynthesis. Changes in the composition of metabolite families also explained variance related to ecological functioning, such as physiological adaptations of bryophytes to dry environments (proteins, peptides, flavonoids, terpenes), light availability (flavonoids, terpenes, carbohydrates), temperature (flavonoids), and biotic interactions (steroids, terpenes).
The results from this study allow the construction of chemical traits that can be attributed to biogeochemistry, habitat conditions, environmental changes, and biotic interactions. Our classification framework accelerates the complex annotation process in metabolomics and can be used to simplify biochemical patterns. We show that compound classification is a powerful tool for exploring relationships both in molecular biology, by “zooming in”, and in ecology, by “zooming out”. The insights revealed by our framework support the formulation of new research hypotheses and enable detailed follow-up studies.
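For readers unfamiliar with the Shannon and Pielou diversity indices named in the abstract above, the following is a minimal sketch of the standard definitions, H' = -Σ p_i ln p_i and J' = H' / ln S, where p_i is the relative abundance of metabolite family i and S is the number of families present. It is not the authors' code; the function names and the example counts are hypothetical, and returning 0.0 when fewer than two families are present is a convention assumed here.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S), with S the number of non-zero classes.

    Returns 0.0 when fewer than two classes are present (assumed convention,
    since J' is undefined in that case).
    """
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0
    return shannon_index(counts) / math.log(richness)


# Hypothetical feature counts per metabolite family in one sample.
family_counts = [40, 25, 20, 10, 5]
print(shannon_index(family_counts), pielou_evenness(family_counts))
```

A perfectly even sample gives J' = 1, while a sample dominated by one family gives a value close to 0.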
Affiliation(s)
- Kristian Peters
- Bioinformatics & Scientific Data, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle (Saale), Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany
- Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, 06108 Halle (Saale), Germany
- Correspondence: Tel.: +49-345-5582-1475
- Gerd Balcke
- Cell and Metabolic Biology, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle (Saale), Germany
- Niklas Kleinenkuhnen
- Max Planck Research Group Chromatin and Ageing, Max Planck Institute for Biology of Ageing, Joseph-Stelzmann-Str. 9b, 50931 Cologne, Germany
- MS-Platform, Cluster of Excellence on Plant Sciences, Botanical Institute (CEPLAS), University of Cologne, 50931 Cologne, Germany
- Hendrik Treutler
- Bioinformatics & Scientific Data, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle (Saale), Germany
- Datameer GmbH, Magdeburger Straße 23, 06112 Halle (Saale), Germany
- Steffen Neumann
- Bioinformatics & Scientific Data, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle (Saale), Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany

39
Schymanski EL, Kondić T, Neumann S, Thiessen PA, Zhang J, Bolton EE. Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag. J Cheminform 2021; 13:19. [PMID: 33685519 PMCID: PMC7938590 DOI: 10.1186/s13321-021-00489-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 11/10/2020] [Revised: 01/13/2021] [Accepted: 01/22/2021] [Indexed: 12/31/2022] Open
Abstract
Compound (or chemical) databases are an invaluable resource for many scientific disciplines. Exposomics researchers need to find and identify relevant chemicals that cover the entirety of potential (chemical and other) exposures over entire lifetimes. This daunting task, with over 100 million chemicals in the largest chemical databases, coupled with broadly acknowledged knowledge gaps in these resources, leaves researchers faced with too much, yet not enough, information at the same time to perform comprehensive exposomics research. Furthermore, the improvements in analytical technologies and computational mass spectrometry workflows, coupled with the rapid growth in databases and increasing demand for high-throughput "big data" services from the research community, present significant challenges for both data hosts and workflow developers. This article explores how to reduce candidate search spaces in non-target small molecule identification workflows, while increasing content usability in the context of environmental and exposomics analyses, so as to profit from the increasing size and information content of large compound databases, while increasing efficiency at the same time. In this article, these methods are explored using PubChem, the NORMAN Network Suspect List Exchange and the in silico fragmentation approach MetFrag. A subset of the PubChem database relevant for exposomics, PubChemLite, is presented as a database resource that can be (and has been) integrated into current workflows for high resolution mass spectrometry. Benchmarking datasets from earlier publications are used to show how experimental knowledge and existing datasets can be used to detect and fill gaps in compound databases to progressively improve large resources such as PubChem, and topic-specific subsets such as PubChemLite.
PubChemLite is a living collection, updating as annotation content in PubChem is updated, and exported to allow direct integration into existing workflows such as MetFrag. The source code and files necessary to recreate or adjust this are jointly hosted between the research parties (see data availability statement). This effort shows that enhancing the FAIRness (Findability, Accessibility, Interoperability and Reusability) of open resources can mutually enhance several resources for whole community benefit. The authors explicitly welcome additional community input on ideas for future developments.
Affiliation(s)
- Emma L Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, 4367, Belvaux, Luxembourg
- Todor Kondić
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, 4367, Belvaux, Luxembourg
- Steffen Neumann
- Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry (IPB Halle), 06120, Halle, Germany; German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany
- Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA

40
Data processing strategies for non-targeted analysis of foods using liquid chromatography/high-resolution mass spectrometry. Trends Analyt Chem 2021. [DOI: 10.1016/j.trac.2021.116188] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/17/2022]

41
Chemically informed analyses of metabolomics mass spectrometry data with Qemistree. Nat Chem Biol 2021; 17:146-151. [PMID: 33199911 PMCID: PMC8189545 DOI: 10.1038/s41589-020-00677-3] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 05/05/2020] [Accepted: 09/18/2020] [Indexed: 01/28/2023]
Abstract
Untargeted mass spectrometry is employed to detect small molecules in complex biospecimens, generating data that are difficult to interpret. We developed Qemistree, a data exploration strategy based on the hierarchical organization of molecular fingerprints predicted from fragmentation spectra. Qemistree allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies. By expressing molecular relationships as a tree, we can apply ecological tools that are designed to analyze and visualize the relatedness of DNA sequences to metabolomics data. Here we demonstrate the use of tree-guided data exploration tools to compare metabolomics samples across different experimental conditions such as chromatographic shifts. Additionally, we leverage a tree representation to visualize chemical diversity in a heterogeneous collection of samples. The Qemistree software pipeline is freely available to the microbiome and metabolomics communities in the form of a QIIME2 plugin, and a global natural products social molecular networking workflow.

42
Dueñas ME, Lee YJ. Single-Cell Metabolomics by Mass Spectrometry Imaging. Advances in Experimental Medicine and Biology 2021; 1280:69-82. [PMID: 33791975 DOI: 10.1007/978-3-030-51652-9_5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]
Abstract
Multicellular organisms achieve their complex living activities through the highly organized metabolic interplay of individual cells and tissues. This complexity has driven the need to spatially resolve metabolomics down to the cellular and subcellular level. Recent technological advances have enabled mass spectrometry imaging (MSI), especially matrix-assisted laser desorption/ionization (MALDI), to become a powerful tool for the visualization of molecular species down to subcellular spatial resolution. In the present chapter, we summarize recent advances in the field of MALDI-MSI, with respect to single-cell level resolution metabolomics directly on tissue. In more detail, we focus on advancements in instrumentation for MSI at single-cell resolution, and the applications towards metabolomic scale imaging. Finally, we discuss new computational tools to aid in metabolite identification, future perspective, and the overall direction that the field of single-cell metabolomics directly on tissue may take in the years to come.
Affiliation(s)
- Maria Emilia Dueñas
- Department of Chemistry, Iowa State University, Ames, IA, USA
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Young Jin Lee
- Department of Chemistry, Iowa State University, Ames, IA, USA

43
Rodrigues JF, Florea L, de Oliveira MCF, Diamond D, Oliveira ON. Big data and machine learning for materials science. Discover Materials 2021; 1:12. [PMID: 33899049 PMCID: PMC8054236 DOI: 10.1007/s43939-021-00012-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 05/11/2023]
Abstract
Herein, we review aspects of leading-edge research and innovation in materials science that exploit big data and machine learning (ML), two computer science concepts that combine to yield computational intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. However, the potential benefits of ML come at the cost of big data production; that is, the algorithms demand large volumes of data of various natures and from different sources, from material properties to sensor data. In the survey, we propose a roadmap for future developments with emphasis on computer-aided discovery of new materials and analysis of chemical sensing compounds, both prominent research fields for ML in the context of materials science. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to materials science, outlining processes, discussing pitfalls, and reviewing cases of success and failure.
Affiliation(s)
- Jose F. Rodrigues
- Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP, Brazil
- Larisa Florea
- SFI Research Centre for Advanced Materials and BioEngineering Research, Trinity College Dublin, The University of Dublin, Dublin, Ireland
- Maria C. F. de Oliveira
- Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP, Brazil
- Dermot Diamond
- Insight Centre for Data Analytics, National Centre for Sensor Research, Dublin City University, Dublin 9, Dublin, Ireland
- Osvaldo N. Oliveira
- São Carlos Institute of Physics, University of São Paulo (USP), São Carlos, SP, Brazil

44
Xing S, Hu Y, Yin Z, Liu M, Tang X, Fang M, Huan T. Retrieving and Utilizing Hypothetical Neutral Losses from Tandem Mass Spectra for Spectral Similarity Analysis and Unknown Metabolite Annotation. Anal Chem 2020; 92:14476-14483. [DOI: 10.1021/acs.analchem.0c02521] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 02/06/2023]
Affiliation(s)
- Shipei Xing
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1 BC, Canada
- Yan Hu
- Department of Computer Sciences, University of British Columbia, 2366 Main Mall, Vancouver, V6T 1Z1 BC, Canada
- Zixuan Yin
- Fortinet, 4190 Still Creek Dr, Burnaby, V5C 6C6 BC, Canada
- Min Liu
- School of Civil and Environmental Engineering, Nanyang Technological University, 639798, Singapore
- Xiaoyu Tang
- Institute of Chemical Biology, Shenzhen Bay Laboratory, Shenzhen 518132, China
- Mingliang Fang
- School of Civil and Environmental Engineering, Nanyang Technological University, 639798, Singapore
- Tao Huan
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1 BC, Canada

45
Ludwig M, Nothias LF, Dührkop K, Koester I, Fleischauer M, Hoffmann MA, Petras D, Vargas F, Morsy M, Aluwihare L, Dorrestein PC, Böcker S. Database-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nat Mach Intell 2020. [DOI: 10.1038/s42256-020-00234-6] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Indexed: 12/19/2022]

46
Fan Z, Alley A, Ghaffari K, Ressom HW. MetFID: artificial neural network-based compound fingerprint prediction for metabolite annotation. Metabolomics 2020; 16:104. [PMID: 32997169 PMCID: PMC9547616 DOI: 10.1007/s11306-020-01726-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/08/2020] [Accepted: 09/19/2020] [Indexed: 12/11/2022]
Abstract
INTRODUCTION: Metabolite annotation is a critical and challenging step in mass spectrometry-based metabolomic profiling. In a typical untargeted MS/MS-based metabolomic study, experimental MS/MS spectra are matched against those in spectral libraries for metabolite annotation. Yet, existing spectral libraries comprise merely a marginal percentage of known compounds. OBJECTIVE: The objective is to develop a method that helps rank putative metabolite IDs for analytes whose reference MS/MS spectra are not present in spectral libraries. METHODS: We introduce MetFID, which uses an artificial neural network (ANN) trained for predicting molecular fingerprints based on experimental MS/MS data. To narrow the search space, MetFID retrieves candidates from metabolite databases using the molecular formula or m/z value of the precursor ions of the analytes. The candidate whose fingerprint is most similar to the predicted fingerprint is used for metabolite annotation. A comprehensive evaluation was performed by training MetFID using MS/MS spectra from the MoNA repository and NIST library and by testing with structure-disjoint MS/MS spectra from the NIST library, the CASMI 2016 dataset, and in-house MS/MS data from a cancer biomarker discovery study. RESULTS: We observed that training separate models for distinct ranges of collision energies enhanced model performance compared to a single model that covers a wide range of collision energies. Using MetaboQuest to retrieve candidates, MetFID ranked the correct putative ID first for about 50% of the testing cases. Through the independent testing dataset, we demonstrated that MetFID has the potential to improve the accuracy of ranking putative metabolite IDs by more than 5% compared to other tools such as ChemDistiller, CSI:FingerID, and MetFrag. CONCLUSION: MetFID offers a promising opportunity to enhance the accuracy of metabolite annotation by using ANN for molecular fingerprint prediction.
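The candidate-ranking step described in the abstract above can be sketched as follows. This is not MetFID code: the abstract does not name the similarity measure, so the Tanimoto (Jaccard) coefficient used here is an assumption, as are the function names, the representation of fingerprints as sets of on-bit indices, and the toy candidates.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def rank_candidates(predicted_fp, candidates):
    """Sort candidate structures by similarity to the predicted fingerprint.

    `candidates` maps a candidate name to its fingerprint (set of on-bits);
    in a real workflow these would be retrieved by formula or precursor m/z.
    """
    scored = [(name, tanimoto(predicted_fp, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprint predicted from an MS/MS spectrum, plus candidates.
predicted = {1, 2, 3, 4}
candidates = {
    "candidate_A": {1, 2, 3, 4},
    "candidate_B": {1, 2},
    "candidate_C": {7, 8},
}
for name, score in rank_candidates(predicted, candidates):
    print(name, round(score, 2))
```

The top-ranked candidate would be reported as the putative annotation; the remaining ranks are useful when the first hit is implausible on other grounds.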
Affiliation(s)
- Ziling Fan
- Department of Biochemistry and Molecular & Cellular Biology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
- Amber Alley
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Suite 173, Building D, 4000 Reservoir Road NW, Washington, DC, 20057, USA
- Kian Ghaffari
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Suite 173, Building D, 4000 Reservoir Road NW, Washington, DC, 20057, USA
- Habtom W Ressom
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Suite 173, Building D, 4000 Reservoir Road NW, Washington, DC, 20057, USA

47
Li Y, Kuhn M, Gavin AC, Bork P. Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features. Bioinformatics 2020; 36:1213-1218. [PMID: 31605112 PMCID: PMC7703789 DOI: 10.1093/bioinformatics/btz736] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/20/2019] [Revised: 07/30/2019] [Accepted: 09/25/2019] [Indexed: 01/11/2023] Open
Abstract
Motivation Untargeted mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites' structures from MS/MS spectra is still a great challenge. Results We present a new analysis method, called SubFragment-Matching (SF-Matching), that is based on the hypothesis that molecules with similar structural features will exhibit similar fragmentation patterns. We combine information on fragmentation patterns of molecules with shared substructures and then use random forest models to predict whether a given structure can yield a certain fragmentation pattern. These models can then be used to score candidate molecules for a given mass spectrum. For rapid identification, we pre-compute such scores for common biological molecular structure databases. Using benchmarking datasets, we find that our method has similar performance to CSI:FingerID and that very high accuracies can be achieved by combining our method with CSI:FingerID. Rarefaction analysis of the training dataset shows that the performance of our method will increase as more experimental data become available. Availability and implementation SF-Matching is available from http://www.bork.embl.de/Docu/sf_matching. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuanyue Li
  Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Michael Kuhn
  Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Anne-Claude Gavin
  Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), 69117 Heidelberg, Germany
- Peer Bork
  Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), 69117 Heidelberg, Germany; Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany; Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
|
48
|
Application of High Resolution Mass Spectrometric methods coupled with chemometric techniques in olive oil authenticity studies - A review. Anal Chim Acta 2020; 1134:150-173. [PMID: 33059861 DOI: 10.1016/j.aca.2020.07.029] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 11/12/2019] [Revised: 07/13/2020] [Accepted: 07/14/2020] [Indexed: 12/21/2022]
Abstract
Extra Virgin Olive Oil (EVOO), the emblematic food of the Mediterranean diet, is recognized for its nutritional value and beneficial health effects. The main authenticity issues associated with EVOO's quality involve the organoleptic properties (EVOO or defective), mislabeling of production type (organic or conventional), variety and geographical origin, and adulteration. Currently, there is an emerging need to characterize EVOOs and evaluate their genuineness. This can be achieved through the development of analytical methodologies applying advanced "omics" technologies and the investigation of EVOOs' chemical fingerprints. The objective of this review is to demonstrate the analytical performance of High Resolution Mass Spectrometry (HRMS) in the field of food authenticity assessment, allowing the determination of a wide range of food constituents with exceptional identification capabilities. HRMS-based workflows used for the investigation of critical olive oil authenticity issues are presented and discussed, combined with advanced data processing, comprehensive data mining and chemometric tools. The use of unsupervised classification tools, such as Principal Component Analysis (PCA) and Hierarchical Clustering Analysis (HCA), as well as supervised classification techniques, including Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Partial Least Squares Discriminant Analysis (PLS-DA), Orthogonal Projection to Latent Structure-Discriminant Analysis (OPLS-DA), Counter Propagation Artificial Neural Networks (CP-ANNs), Self-Organizing Maps (SOMs) and Random Forest (RF), is summarized. The combination of HRMS methodologies with chemometrics improves the quality and reliability of the conclusions from experimental data (profile or fingerprints), provides valuable information suggesting potential authenticity markers and is widely applied in food authenticity studies.
|
49
|
Senan O, Aguilar-Mogas A, Navarro M, Capellades J, Noon L, Burks D, Yanes O, Guimerà R, Sales-Pardo M. CliqueMS: a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network. Bioinformatics 2019; 35:4089-4097. [PMID: 30903689 PMCID: PMC6792096 DOI: 10.1093/bioinformatics/btz207] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 04/26/2018] [Revised: 01/30/2019] [Accepted: 03/21/2019] [Indexed: 11/26/2022]
Abstract
Motivation The analysis of biological samples in untargeted metabolomic studies using LC-MS yields tens of thousands of ion signals. Annotating these features is of the utmost importance for answering questions as fundamental as, e.g. how many metabolites are there in a given sample. Results Here, we introduce CliqueMS, a new algorithm for annotating in-source LC-MS1 data. CliqueMS is based on the similarity between coelution profiles and therefore, as opposed to most methods, allows for the annotation of a single spectrum. Furthermore, CliqueMS improves upon the state of the art in several dimensions: (i) it uses a more discriminatory feature similarity metric; (ii) it treats the similarities between features in a transparent way by means of a simple generative model; (iii) it uses a well-grounded maximum likelihood inference approach to group features; (iv) it uses empirical adduct frequencies to identify the parental mass and (v) it deals more flexibly with the identification of the parental mass by proposing and ranking alternative annotations. We validate our approach with simple mixtures of standards and with real complex biological samples. CliqueMS reduces the thousands of features typically obtained in complex samples to hundreds of metabolites, and it is able to correctly annotate more metabolites and adducts from a single spectrum than available tools. Availability and implementation https://CRAN.R-project.org/package=cliqueMS and https://github.com/osenan/cliqueMS. Supplementary information Supplementary data are available at Bioinformatics online.
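The idea of linking features by the similarity of their coelution profiles can be sketched as follows. This is an illustration only, not the CliqueMS algorithm: cosine similarity, the 0.9 threshold, the intensity values, and the names `cosine_similarity` and `similarity_edges` are assumptions made for the example, and the generative model and maximum likelihood grouping mentioned in the abstract are not implemented.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def cosine_similarity(p: List[float], q: List[float]) -> float:
    """Cosine similarity of two intensity profiles sampled at the same scans."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same scans")
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def similarity_edges(
    profiles: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str]]:
    """Return feature pairs whose profile similarity reaches the threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        if cosine_similarity(profiles[a], profiles[b]) >= threshold:
            edges.append((a, b))
    return edges


# Hypothetical extracted-ion intensities over eight consecutive scans.
profiles = {
    "feature_1": [0, 10, 50, 100, 50, 10, 0, 0],
    "feature_2": [0, 2, 10, 20, 10, 2, 0, 0],     # same peak shape, lower intensity
    "feature_3": [0, 0, 0, 0, 10, 50, 100, 50],   # elutes later
}

sim_12 = cosine_similarity(profiles["feature_1"], profiles["feature_2"])
sim_13 = cosine_similarity(profiles["feature_1"], profiles["feature_3"])
edges = similarity_edges(profiles, threshold=0.9)  # threshold is an assumed value
print(edges)
```

Features joined by an edge are candidates for being ions of the same metabolite (for example, an adduct and an in-source fragment), which is the kind of redundancy the abstract says reduces thousands of features to hundreds of metabolites.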
Affiliation(s)
- Oriol Senan
  Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona, Spain
- Antoni Aguilar-Mogas
  Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona, Spain
- Miriam Navarro
  Department of Electronic Engineering, Metabolomics Platform, IISPV, Universitat Rovira i Virgili, Tarragona, Spain; CIBER of Diabetes and Associated Metabolic Diseases (CIBERDEM), Madrid, Spain
- Jordi Capellades
  Department of Electronic Engineering, Metabolomics Platform, IISPV, Universitat Rovira i Virgili, Tarragona, Spain; CIBER of Diabetes and Associated Metabolic Diseases (CIBERDEM), Madrid, Spain
- Luke Noon
  CIBER of Diabetes and Associated Metabolic Diseases (CIBERDEM), Madrid, Spain; Centro de Investigación Príncipe Felipe, Valencia, Spain
- Deborah Burks
  CIBER of Diabetes and Associated Metabolic Diseases (CIBERDEM), Madrid, Spain; Centro de Investigación Príncipe Felipe, Valencia, Spain
- Oscar Yanes
  Department of Electronic Engineering, Metabolomics Platform, IISPV, Universitat Rovira i Virgili, Tarragona, Spain; CIBER of Diabetes and Associated Metabolic Diseases (CIBERDEM), Madrid, Spain
- Roger Guimerà
  Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona, Spain; ICREA, Barcelona, Spain
- Marta Sales-Pardo
  Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona, Spain
|
50
|
McEachran AD, Chao A, Al-Ghoul H, Lowe C, Grulke C, Sobus JR, Williams AJ. Revisiting Five Years of CASMI Contests with EPA Identification Tools. Metabolites 2020; 10:E260. [PMID: 32585902 PMCID: PMC7345619 DOI: 10.3390/metabo10060260] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/03/2020] [Revised: 06/03/2020] [Accepted: 06/17/2020] [Indexed: 01/02/2023]
Abstract
Software applications for high resolution mass spectrometry (HRMS)-based non-targeted analysis (NTA) continue to enhance chemical identification capabilities. Given the variety of available applications, determining the most fit-for-purpose tools and workflows can be difficult. The Critical Assessment of Small Molecule Identification (CASMI) contests were initiated in 2012 to provide a means to evaluate compound identification tools on a standardized set of blinded tandem mass spectrometry (MS/MS) data. Five CASMI contests have resulted in recommendations, publications, and invaluable datasets for practitioners of HRMS-based screening studies. The US Environmental Protection Agency's (EPA) CompTox Chemicals Dashboard is now recognized as a valuable resource for compound identification in NTA studies. However, this application was too new and immature in functionality to participate in the five previous CASMI contests. In this work, we performed compound identification on all five CASMI contest datasets using Dashboard tools and data in order to critically evaluate Dashboard performance relative to that of other applications. CASMI data was accessed via the CASMI webpage and processed for use in our spectral matching and identification workflow. Relative to applications used by former contest participants, our tools, data, and workflow performed well, placing more challenge compounds in the top five of ranked candidates than did the winners of three contest years and tying in a fourth. In addition, we conducted an in-depth review of the CASMI structure sets and made these reviewed sets available via the Dashboard. Our results suggest that Dashboard data and tools would enhance chemical identification capabilities for practitioners of HRMS-based NTA.
Affiliation(s)
- Andrew D. McEachran
  Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, USA
- Alex Chao
  Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, USA
- Hussein Al-Ghoul
  Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, USA
- Charles Lowe
  Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, USA
- Christopher Grulke
  Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, USA
- Jon R. Sobus
  Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, USA
- Antony J. Williams
  Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, USA
|