1
Baygi SF, Barupal DK. IDSL_MINT: a deep learning framework to predict molecular fingerprints from mass spectra. J Cheminform 2024; 16:8. [PMID: 38238779 PMCID: PMC10797927 DOI: 10.1186/s13321-024-00804-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 01/14/2024] [Indexed: 01/22/2024] Open
Abstract
The majority of tandem mass spectrometry (MS/MS) spectra in untargeted metabolomics and exposomics studies lack any annotation. Our deep learning framework, Integrated Data Science Laboratory for Metabolomics and Exposomics-Mass INTerpreter (IDSL_MINT), can translate MS/MS spectra into molecular fingerprint descriptors. IDSL_MINT allows users to leverage the power of the transformer model for mass spectrometry data, similar to large language models. Models are trained on user-provided reference MS/MS libraries using any customizable molecular fingerprint descriptors. IDSL_MINT was benchmarked using the LipidMaps database and improved the annotation rate of a test study for MS/MS spectra that were not originally annotated using existing mass spectral libraries. IDSL_MINT may improve the overall annotation rates in untargeted metabolomics and exposomics studies. The IDSL_MINT framework and tutorials are available in the GitHub repository at https://github.com/idslme/IDSL_MINT. Scientific contribution statement: Structural annotation of MS/MS spectra from untargeted metabolomics and exposomics datasets is a major bottleneck in gaining new biological insights. Machine learning models that convert spectra into molecular fingerprints can help in the annotation process. Here, we present IDSL_MINT, a new, easy-to-use, and customizable deep learning framework to train and use new models that predict molecular fingerprints from spectra for compound annotation workflows.
Affiliation(s)
- Sadjad Fakouri Baygi
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, CAM Building, 3rd Floor, 17 E 102 St, New York, NY, 10029, USA
- Dinesh Kumar Barupal
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, CAM Building, 3rd Floor, 17 E 102 St, New York, NY, 10029, USA

2
Baygi SF, Kumar Y, Barupal DK. IDSL.CSA: Composite Spectra Analysis for Chemical Annotation of Untargeted Metabolomics Datasets. Anal Chem 2023; 95:9480-9487. [PMID: 37311059 PMCID: PMC11080491 DOI: 10.1021/acs.analchem.3c00376] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Poor chemical annotation of high-resolution mass spectrometry data limits applications of untargeted metabolomics datasets. Our new software, the Integrated Data Science Laboratory for Metabolomics and Exposomics - Composite Spectra Analysis (IDSL.CSA) R package, generates composite mass spectra libraries from MS1-only data, enabling the chemical annotation of high-resolution mass spectrometry coupled with liquid chromatography peaks regardless of the availability of MS2 fragmentation spectra. We demonstrate comparable annotation rates for commonly detected endogenous metabolites in human blood samples using IDSL.CSA libraries versus MS/MS libraries in validation tests. IDSL.CSA can create and search composite spectra libraries from any untargeted metabolomics dataset generated using high-resolution mass spectrometry coupled to liquid or gas chromatography instruments. The cross-applicability of these libraries across independent studies may provide access to new biological insights that may be missed due to the lack of MS2 fragmentation data. The IDSL.CSA package is available in the R-CRAN repository at https://cran.r-project.org/package=IDSL.CSA. Detailed documentation and tutorials are provided at https://github.com/idslme/IDSL.CSA.
Affiliation(s)
- Sadjad Fakouri Baygi
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Yashwant Kumar
  - Non-communicable Diseases Division, Translational Health Science and Technology Institute, Faridabad, Haryana, 121001, India
- Dinesh Kumar Barupal
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA

3
Baygi SF, Kumar Y, Barupal DK. IDSL.CSA: Composite Spectra Analysis for Chemical Annotation of Untargeted Metabolomics Datasets. bioRxiv 2023:2023.02.09.527886. [PMID: 36798308 PMCID: PMC9934657 DOI: 10.1101/2023.02.09.527886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Poor chemical annotation of high-resolution mass spectrometry data limits applications of untargeted metabolomics datasets. Our new software, the Integrated Data Science Laboratory for Metabolomics and Exposomics - Composite Spectra Analysis (IDSL.CSA) R package, generates composite mass spectra libraries from MS1-only data, enabling the chemical annotation of LC/HRMS peaks regardless of the availability of MS2 fragmentation spectra. We demonstrate comparable annotation rates for commonly detected endogenous metabolites in human blood samples using IDSL.CSA libraries versus MS/MS libraries in validation tests. IDSL.CSA can create and search composite spectra libraries from any untargeted metabolomics dataset generated using high-resolution mass spectrometry coupled to liquid or gas chromatography instruments. The cross-applicability of these libraries across independent studies may provide access to new biological insights that may be missed due to the lack of MS2 fragmentation data. The IDSL.CSA package is available in the R CRAN repository at https://cran.r-project.org/package=IDSL.CSA. Detailed documentation and tutorials are provided at https://github.com/idslme/IDSL.CSA.
Affiliation(s)
- Sadjad Fakouri Baygi
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Yashwant Kumar
  - Non-communicable Diseases Division, Translational Health Science and Technology Institute, Faridabad, Haryana, 121001, India
- Dinesh Kumar Barupal
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA

4
Sun X, Jia Z, Zhang Y, Zhao X, Zhao C, Lu X, Xu G. A Strategy for Uncovering the Serum Metabolome by Direct-Infusion High-Resolution Mass Spectrometry. Metabolites 2023; 13:metabo13030460. [PMID: 36984900 PMCID: PMC10057860 DOI: 10.3390/metabo13030460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 03/18/2023] [Accepted: 03/20/2023] [Indexed: 03/30/2023] Open
Abstract
Direct infusion nanoelectrospray high-resolution mass spectrometry (DI-nESI-HRMS) is a promising tool for high-throughput metabolomics analysis. However, metabolite assignment is limited by inadequate mass accuracy and by the chemical space covered by the metabolome database. Here, a serum metabolome characterization method was proposed to make full use of the potential of DI-nESI-HRMS. Unlike the widely used database search approach, unambiguous formula assignments were achieved by a reaction network combined with mass accuracy and isotopic pattern filters. To provide enough initial known nodes, an initial network was constructed directly from known metabolite formulas. Experimental formula candidates were then screened against the network using the predefined reactions. The effects of the sources and scales of the networks on assignment performance were investigated. Further, a scoring rule for filtering unambiguous formula candidates was proposed. The developed approach was validated with a pooled serum sample spiked with reference standards. The coverage and accuracy rates for the spiked standards were 98.9% and 93.6%, respectively. A total of 1958 monoisotopic features in the pooled serum were assigned unique formula candidates, twice as many as obtained by the database search. Finally, a case study of serum metabolomics in diabetes was carried out using the developed method.
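The mass accuracy filter mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the assumed [M+H]+ adduct, the 5 ppm tolerance, the observed m/z value, and the two candidate formulas are all hypothetical and chosen only for illustration. The reaction network and isotopic pattern steps are not covered.

```python
# Monoisotopic masses (u) of the lightest stable isotopes; standard reference values.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647  # mass shift for an assumed [M+H]+ adduct (assumption)


def neutral_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def filter_by_mass_accuracy(observed_mz, candidates, tol_ppm=5.0):
    """Keep candidate names whose [M+H]+ m/z lies within tol_ppm of observed_mz."""
    kept = []
    for name, formula in candidates.items():
        theoretical_mz = neutral_mass(formula) + PROTON
        if abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm:
            kept.append(name)
    return kept


# Hypothetical candidates for a hypothetical observed feature at m/z 181.0710.
candidates = {
    "C6H12O6": {"C": 6, "H": 12, "O": 6},
    "C7H16O5": {"C": 7, "H": 16, "O": 5},
}
print(filter_by_mass_accuracy(181.0710, candidates))
```

In this toy case only the first candidate falls within the tolerance; the second differs by roughly 200 ppm. A mass filter alone often leaves several candidates, which is why the abstract combines it with isotopic patterns and a reaction network.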
Affiliation(s)
- Xiaoshan Sun
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Zhen Jia
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
  - Department of Cell Biology, College of Life Sciences, China Medical University, Shenyang 110122, China
- Yuqing Zhang
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
  - Zhang Dayu School of Chemistry, Dalian University of Technology, Dalian 116024, China
- Xinjie Zhao
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Chunxia Zhao
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Xin Lu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Guowang Xu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China

5
Ren J, Fernando S, Hopke PK, Holsen TM, Crimmins BS. Suspect Screening and Nontargeted Analysis of Per- and Polyfluoroalkyl Substances in a Lake Ontario Food Web. Environ Sci Technol 2022; 56:17626-17634. [PMID: 36468978 DOI: 10.1021/acs.est.2c04321] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Per- and polyfluoroalkyl substances (PFAS) are globally distributed in the natural environment, and their persistent and bioaccumulative potential elicits public concern. The production of certain PFAS has been halted or controlled by regulation due to their adverse effects on the health of humans and wildlife. However, new PFAS are continuously developed as alternatives to legacy PFAS. Additionally, many precursors are unknown, and their metabolites have not been assessed. To better understand the PFAS profiles in the Lake Ontario (LO) aquatic food web, a quadrupole time-of-flight mass spectrometer (QToF) coupled to ultrahigh-performance liquid chromatography (UPLC) was used to generate high-resolution mass spectra (HRMS) from sample extracts. The HRMS data files were analyzed using an isotopic profile deconvoluted chromatogram (IPDC) algorithm to isolate PFAS profiles in aquatic organisms. Fourteen legacy PFAAs (C5-C14) and 15 known precursors were detected in the LO food web. In addition, over 400 unknown PFAS features that appear to biomagnify in the LO food web were found. Profundal benthic organisms, deepwater sculpin (Myoxocephalus thompsonii), and Mysis were found to have more known precursors than other species in the food web, suggesting that there is a large reservoir of fluorinated substances in the benthic zone.
Affiliation(s)
- Junda Ren
  - Department of Civil and Environmental Engineering, Clarkson University, 8 Clarkson Avenue, Potsdam, New York 13699, United States
- Sujan Fernando
  - Department of Chemical and Biomolecular Engineering, Clarkson University, 8 Clarkson Avenue, Potsdam, New York 13699, United States
- Philip K Hopke
  - Institute for a Sustainable Environment, Clarkson University, Potsdam, New York 13699, United States
  - Center for Air Resources Engineering and Science, Clarkson University, 8 Clarkson Avenue, Potsdam, New York 13699, United States
  - Department of Public Health Sciences, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, United States
- Thomas M Holsen
  - Department of Civil and Environmental Engineering, Clarkson University, 8 Clarkson Avenue, Potsdam, New York 13699, United States
  - Department of Chemical and Biomolecular Engineering, Clarkson University, 8 Clarkson Avenue, Potsdam, New York 13699, United States
- Bernard S Crimmins
  - Department of Civil and Environmental Engineering, Clarkson University, 8 Clarkson Avenue, Potsdam, New York 13699, United States
  - AEACS, LLC, New Kensington, Pennsylvania 15068, United States