1
An S, Lu M, Wang R, Wang J, Jiang H, Xie C, Tong J, Yu C. Ion entropy and accurate entropy-based FDR estimation in metabolomics. Brief Bioinform 2024; 25:bbae056. [PMID: 38426325 PMCID: PMC10939419 DOI: 10.1093/bib/bbae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/07/2024] [Accepted: 01/25/2024] [Indexed: 03/02/2024] Open
Abstract
Accurate metabolite annotation and false discovery rate (FDR) control remain challenging in large-scale metabolomics. Recent progress leveraging proteomics experiences and interdisciplinary inspirations has provided valuable insights. While target-decoy strategies have been introduced, generating reliable decoy libraries is difficult due to metabolite complexity. Moreover, continuous bioinformatics innovation is imperative to improve the utilization of expanding spectral resources while reducing false annotations. Here, we introduce the concept of ion entropy for metabolomics and propose two entropy-based decoy generation approaches. Assessment of public databases validates ion entropy as an effective metric to quantify ion information in massive metabolomics datasets. Our entropy-based decoy strategies outperform current representative methods in metabolomics and achieve superior FDR estimation accuracy. Analysis of 46 public datasets provides instructive recommendations for practical application.
Affiliation(s)
- Shaowei An
  - Shandong First Medical University & Central Hospital Affiliated to Shandong First Medical University, 6699 Qingdao Road, Jinan 271016, Shandong, China
  - Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
  - Fudan University, 220 Handan Road, Shanghai 200433, China
- Miaoshan Lu
  - Shandong First Medical University & Central Hospital Affiliated to Shandong First Medical University, 6699 Qingdao Road, Jinan 271016, Shandong, China
  - Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
  - Zhejiang University, 866 Yuhangtang Road, Hangzhou 310009, Zhejiang, China
- Ruimin Wang
  - Shandong First Medical University & Central Hospital Affiliated to Shandong First Medical University, 6699 Qingdao Road, Jinan 271016, Shandong, China
  - Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
  - Fudan University, 220 Handan Road, Shanghai 200433, China
- Jinyin Wang
  - Shandong First Medical University & Central Hospital Affiliated to Shandong First Medical University, 6699 Qingdao Road, Jinan 271016, Shandong, China
  - Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
  - Zhejiang University, 866 Yuhangtang Road, Hangzhou 310009, Zhejiang, China
- Hengxuan Jiang
  - Shandong First Medical University & Central Hospital Affiliated to Shandong First Medical University, 6699 Qingdao Road, Jinan 271016, Shandong, China
- Cong Xie
  - Shandong First Medical University & Central Hospital Affiliated to Shandong First Medical University, 6699 Qingdao Road, Jinan 271016, Shandong, China
- Junjie Tong
  - Shandong First Medical University & Central Hospital Affiliated to Shandong First Medical University, 6699 Qingdao Road, Jinan 271016, Shandong, China
- Changbin Yu
  - Shandong First Medical University & Central Hospital Affiliated to Shandong First Medical University, 6699 Qingdao Road, Jinan 271016, Shandong, China
2
Bilbao A, Munoz N, Kim J, Orton DJ, Gao Y, Poorey K, Pomraning KR, Weitz K, Burnet M, Nicora CD, Wilton R, Deng S, Dai Z, Oksen E, Gee A, Fasani RA, Tsalenko A, Tanjore D, Gardner J, Smith RD, Michener JK, Gladden JM, Baker ES, Petzold CJ, Kim YM, Apffel A, Magnuson JK, Burnum-Johnson KE. PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements. Nat Commun 2023; 14:2461. [PMID: 37117207 PMCID: PMC10147702 DOI: 10.1038/s41467-023-37031-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/02/2022] [Accepted: 02/24/2023] [Indexed: 04/30/2023] Open
Abstract
Multidimensional measurements using state-of-the-art separations and mass spectrometry provide advantages in untargeted metabolomics analyses for studying biological and environmental bio-chemical processes. However, the lack of rapid analytical methods and robust algorithms for these heterogeneous data has limited its application. Here, we develop and evaluate a sensitive and high-throughput analytical and computational workflow to enable accurate metabolite profiling. Our workflow combines liquid chromatography, ion mobility spectrometry and data-independent acquisition mass spectrometry with PeakDecoder, a machine learning-based algorithm that learns to distinguish true co-elution and co-mobility from raw data and calculates metabolite identification error rates. We apply PeakDecoder for metabolite profiling of various engineered strains of Aspergillus pseudoterreus, Aspergillus niger, Pseudomonas putida and Rhodosporidium toruloides. Results, validated manually and against selected reaction monitoring and gas-chromatography platforms, show that 2683 features could be confidently annotated and quantified across 116 microbial sample runs using a library built from 64 standards.
Affiliation(s)
- Aivett Bilbao
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Nathalie Munoz
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Joonhoon Kim
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Daniel J Orton
  - Pacific Northwest National Laboratory, Richland, WA, USA
- Yuqian Gao
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Kyle R Pomraning
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Karl Weitz
  - Pacific Northwest National Laboratory, Richland, WA, USA
- Meagan Burnet
  - Pacific Northwest National Laboratory, Richland, WA, USA
- Rosemarie Wilton
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
  - Argonne National Laboratory, Lemont, IL, USA
- Shuang Deng
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Ziyu Dai
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Ethan Oksen
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Aaron Gee
  - Agilent Research Laboratories, Agilent Technologies, Santa Clara, CA, USA
- Rick A Fasani
  - Agilent Research Laboratories, Agilent Technologies, Santa Clara, CA, USA
- Anya Tsalenko
  - Agilent Research Laboratories, Agilent Technologies, Santa Clara, CA, USA
- Deepti Tanjore
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- James Gardner
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Joshua K Michener
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
  - Oak Ridge National Laboratory, Oak Ridge, TN, USA
- John M Gladden
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
  - Sandia National Laboratory, Livermore, CA, USA
- Erin S Baker
  - Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA
- Christopher J Petzold
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Young-Mo Kim
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Alex Apffel
  - Agilent Research Laboratories, Agilent Technologies, Santa Clara, CA, USA
- Jon K Magnuson
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
- Kristin E Burnum-Johnson
  - Pacific Northwest National Laboratory, Richland, WA, USA
  - US Department of Energy, Agile BioFoundry, Emeryville, CA, USA
3
Xing S, Shen S, Xu B, Li X, Huan T. BUDDY: molecular formula discovery via bottom-up MS/MS interrogation. Nat Methods 2023:10.1038/s41592-023-01850-x. [PMID: 37055660 DOI: 10.1038/s41592-023-01850-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 08/03/2022] [Accepted: 03/15/2023] [Indexed: 04/15/2023]
Abstract
A substantial fraction of metabolic features remains undetermined in mass spectrometry (MS)-based metabolomics, and molecular formula annotation is the starting point for unraveling their chemical identities. Here we present bottom-up tandem MS (MS/MS) interrogation, a method for de novo formula annotation. Our approach prioritizes MS/MS-explainable formula candidates, implements machine-learned ranking and offers false discovery rate estimation. Compared with the mathematically exhaustive formula enumeration, our approach shrinks the formula candidate space by 42.8% on average. Method benchmarking on annotation accuracy was systematically carried out on reference MS/MS libraries and real metabolomics datasets. Applied on 155,321 recurrent unidentified spectra, our approach confidently annotated >5,000 novel molecular formulae absent from chemical databases. Beyond the level of individual metabolic features, we combined bottom-up MS/MS interrogation with global optimization to refine formula annotations while revealing peak interrelationships. This approach allowed the systematic annotation of 37 fatty acid amide molecules in human fecal data. All bioinformatics pipelines are available in a standalone software, BUDDY ( https://github.com/HuanLab/BUDDY ).
Affiliation(s)
- Shipei Xing
  - Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Sam Shen
  - Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Banghua Xu
  - Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Xiaoxiao Li
  - Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Tao Huan
  - Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
4
Bauermeister A, Mannochio-Russo H, Costa-Lotufo LV, Jarmusch AK, Dorrestein PC. Mass spectrometry-based metabolomics in microbiome investigations. Nat Rev Microbiol 2022; 20:143-160. [PMID: 34552265 PMCID: PMC9578303 DOI: 10.1038/s41579-021-00621-9] [Citation(s) in RCA: 174] [Impact Index Per Article: 87.0] [Accepted: 08/03/2021] [Indexed: 02/08/2023]
Abstract
Microbiotas are a malleable part of ecosystems, including the human ecosystem. Microorganisms affect not only the chemistry of their specific niche, such as the human gut, but also the chemistry of distant environments, such as other parts of the body. Mass spectrometry-based metabolomics is one of the key technologies to detect and identify the small molecules produced by the human microbiota, and to understand the functional role of these microbial metabolites. This Review provides a foundational introduction to common forms of untargeted mass spectrometry and the types of data that can be obtained in the context of microbiome analysis. Data analysis remains an obstacle; therefore, the emphasis is placed on data analysis approaches and integrative analysis, including the integration of microbiome sequencing data.
Affiliation(s)
- Anelize Bauermeister
  - Institute of Biomedical Science, Universidade de São Paulo, São Paulo, SP, Brazil
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA, USA
- Helena Mannochio-Russo
  - Department of Biochemistry and Organic Chemistry, Institute of Chemistry, São Paulo State University, Araraquara, SP, Brazil
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA, USA
- Alan K. Jarmusch
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA, USA
- Pieter C. Dorrestein
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA, USA
  - Department of Pediatrics, University of California, San Diego, CA, USA
  - Center for Microbiome Innovation, University of California, San Diego, CA, USA
5
Anderson BG, Raskind A, Habra H, Kennedy RT, Evans CR. Modifying Chromatography Conditions for Improved Unknown Feature Identification in Untargeted Metabolomics. Anal Chem 2021; 93:15840-15849. [PMID: 34794310 PMCID: PMC10634695 DOI: 10.1021/acs.analchem.1c02149] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
Untargeted metabolomics is an essential component of systems biology research, but it is plagued by a high proportion of detectable features not identified with a chemical structure. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments produce spectra that can be searched against databases to help identify or classify these unknowns, but many features do not generate spectra of sufficient quality to enable successful annotation. Here, we explore alterations to gradient length, mass loading, and rolling precursor ion exclusion parameters for reversed phase liquid chromatography (RPLC) and hydrophilic interaction liquid chromatography (HILIC) that improve compound identification performance for human plasma samples. A manual review of spectral matches from the HILIC data set was used to determine reasonable thresholds for search score and other metrics to enable semi-automated MS/MS data analysis. Compared to typical LC-MS/MS conditions, methods adapted for compound identification increased the total number of unique metabolites that could be matched to a spectral database from 214 to 2052. Following data alignment, 68.0% of newly identified features from the modified conditions could be detected and quantitated using a routine 20-min LC-MS run. Finally, a localized machine learning model was developed to classify the remaining unknowns and select a subset that shared spectral characteristics with successfully identified features. A total of 576 and 749 unidentified features in the HILIC and RPLC data sets were classified by the model as high-priority unknowns or higher-importance targets for follow-up analysis. Overall, our study presents a simple strategy to more deeply annotate untargeted metabolomics data for a modest additional investment of time and sample.
Affiliation(s)
- Brady G. Anderson
  - Department of Chemistry, University of Michigan, Ann Arbor, MI 48109
  - Biomedical Research Core Facilities Metabolomics Core, University of Michigan, Ann Arbor, MI 48109
- Alexander Raskind
  - Biomedical Research Core Facilities Metabolomics Core, University of Michigan, Ann Arbor, MI 48109
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Hani Habra
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Robert T. Kennedy
  - Department of Chemistry, University of Michigan, Ann Arbor, MI 48109
  - Biomedical Research Core Facilities Metabolomics Core, University of Michigan, Ann Arbor, MI 48109
  - Department of Pharmacology, University of Michigan, Ann Arbor, MI 48109
- Charles R. Evans
  - Biomedical Research Core Facilities Metabolomics Core, University of Michigan, Ann Arbor, MI 48109
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109
6
Beniddir MA, Kang KB, Genta-Jouve G, Huber F, Rogers S, van der Hooft JJJ. Advances in decomposing complex metabolite mixtures using substructure- and network-based computational metabolomics approaches. Nat Prod Rep 2021; 38:1967-1993. [PMID: 34821250 PMCID: PMC8597898 DOI: 10.1039/d1np00023c] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 04/07/2021] [Indexed: 12/13/2022]
Abstract
Covering: up to the end of 2020
Recently introduced computational metabolome mining tools have started to positively impact the chemical and biological interpretation of untargeted metabolomics analyses. We believe that these current advances make it possible to start decomposing complex metabolite mixtures into substructure and chemical class information, thereby supporting pivotal tasks in metabolomics analysis including metabolite annotation, the comparison of metabolic profiles, and network analyses. In this review, we highlight and explain key tools and emerging strategies covering 2015 up to the end of 2020. The majority of these tools aim at processing and analyzing liquid chromatography coupled to mass spectrometry fragmentation data. We start with defining what substructures are, how they relate to molecular fingerprints, and how recognizing them helps to decompose complex mixtures. We continue with chemical classes that are based on the presence or absence of particular molecular scaffolds and/or functional groups and are thus intrinsically related to substructures. We discuss novel tools to mine substructures, annotate chemical compound classes, and create mass spectral networks from metabolomics data and demonstrate them using two case studies. We also review and speculate about the opportunities that NMR spectroscopy-based metabolome mining of complex metabolite mixtures offers to discover substructures and chemical classes. Finally, we will describe the main benefits and limitations of the current tools and strategies that rely on them, and our vision on how this exciting field can develop toward repository-scale-sized metabolomics analyses. Complementary sources of structural information from genomics analyses and well-curated taxonomic records are also discussed.
Many research fields such as natural products discovery, pharmacokinetic and drug metabolism studies, and environmental metabolomics increasingly rely on untargeted metabolomics to gain biochemical and biological insights. The technical advances described here will benefit all those metabolomics disciplines by transforming spectral data into knowledge that can answer biological questions.
Affiliation(s)
- Mehdi A Beniddir
  - Université Paris-Saclay, CNRS, BioCIS, 5 rue J.-B Clément, 92290 Châtenay-Malabry, France
- Kyo Bin Kang
  - Research Institute of Pharmaceutical Sciences, College of Pharmacy, Sookmyung Women's University, Seoul 04310, Republic of Korea
- Grégory Genta-Jouve
  - Laboratoire de Chimie-Toxicologie Analytique et Cellulaire (C-TAC), UMR CNRS 8038, CiTCoM, Université de Paris, 4, Avenue de l'Observatoire, 75006, Paris, France
  - Laboratoire Ecologie, Evolution, Interactions des Systèmes Amazoniens (LEEISA), USR 3456, Université De Guyane, CNRS Guyane, 275 Route de Montabo, 97334 Cayenne, French Guiana, France
- Florian Huber
  - Netherlands eScience Center, 1098 XG Amsterdam, The Netherlands
- Simon Rogers
  - School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK
7
Rampler E, Abiead YE, Schoeny H, Rusz M, Hildebrand F, Fitz V, Koellensperger G. Recurrent Topics in Mass Spectrometry-Based Metabolomics and Lipidomics-Standardization, Coverage, and Throughput. Anal Chem 2021; 93:519-545. [PMID: 33249827 PMCID: PMC7807424 DOI: 10.1021/acs.analchem.0c04698] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Indexed: 02/07/2023]
Affiliation(s)
- Evelyn Rampler
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
  - Vienna Metabolomics Center (VIME), University of Vienna, Althanstraße 14, 1090 Vienna, Austria
  - University of Vienna, Althanstraße 14, 1090 Vienna, Austria
- Yasin El Abiead
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
- Harald Schoeny
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
- Mate Rusz
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
  - Institute of Inorganic Chemistry, University of Vienna, Währinger Straße 42, 1090 Vienna, Austria
- Felina Hildebrand
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
- Veronika Fitz
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
- Gunda Koellensperger
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Str. 38, 1090 Vienna, Austria
  - Vienna Metabolomics Center (VIME), University of Vienna, Althanstraße 14, 1090 Vienna, Austria
  - University of Vienna, Althanstraße 14, 1090 Vienna, Austria
8
Piovesana S, Cavaliere C, Cerrato A, Montone CM, Laganà A, Capriotti AL. Developments and pitfalls in the characterization of phenolic compounds in food: From targeted analysis to metabolomics-based approaches. Trends Analyt Chem 2020. [DOI: 10.1016/j.trac.2020.116083] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/11/2022]