1. Beck A, Muhoberac M, Randolph CE, Beveridge CH, Wijewardhane PR, Kenttämaa HI, Chopra G. Recent Developments in Machine Learning for Mass Spectrometry. ACS Measurement Science Au 2024; 4:233-246. [PMID: 38910862 PMCID: PMC11191731 DOI: 10.1021/acsmeasuresciau.3c00060] [Received: 10/06/2023] [Revised: 12/27/2023] [Accepted: 01/22/2024] [Indexed: 06/25/2024]
Abstract
Statistical analysis and modeling of mass spectrometry (MS) data have a long and rich history with several modern MS-based applications using statistical and chemometric methods. Recently, machine learning (ML) has experienced a renaissance due to advances in computational hardware and the development of new algorithms for artificial neural networks (ANN) and deep learning architectures. Moreover, recent successes of new ANN and deep learning architectures in several areas of science, engineering, and society have further strengthened the ML field. Importantly, modern ML methods and architectures have enabled new approaches for tasks related to MS that are now widely adopted in several popular MS-based subdisciplines, such as mass spectrometry imaging and proteomics. Herein, we aim to provide an introductory summary of the practical aspects of ML methodology relevant to MS. Additionally, we seek to provide an up-to-date review of the most recent developments in ML integration with MS-based techniques while also providing critical insights into the future direction of the field.
Affiliation(s)
- Armen G. Beck
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Matthew Muhoberac
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Caitlin E. Randolph
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Connor H. Beveridge
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Prageeth R. Wijewardhane
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Hilkka I. Kenttämaa
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Gaurav Chopra
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
  - Department of Computer Science (by courtesy), Purdue University, West Lafayette, Indiana 47907, United States
  - Purdue Institute for Drug Discovery, Purdue Institute for Cancer Research, Regenstrief Center for Healthcare Engineering, Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue Institute for Integrative Neuroscience, West Lafayette, Indiana 47907, United States
2. Potemkin AA, Proskurnin MA, Volkov DS. Noise Filtering Algorithm Using Gaussian Mixture Models for High-Resolution Mass Spectra of Natural Organic Matter. Anal Chem 2024; 96:5455-5461. [PMID: 38530650 DOI: 10.1021/acs.analchem.3c05453] [Indexed: 03/28/2024]
Abstract
High-resolution mass spectra of natural organic matter (NOM) contain a large number of noise signals. These signals interfere with the correct molecular composition estimation during nontargeted analysis because formula-assignment programs find empirical formulas for such peaks as well. Previously proposed noise filtering methods that utilize the profile of the intensity distribution of mass spectrum peaks rely on a histogram to calculate the intensity threshold value. However, the histogram profile can vary depending on the user settings. In addition, these algorithms are not automated, so they are handled manually. To overcome the mentioned drawbacks, we propose a new algorithm for noise filtering in mass spectra. This filter is based on Gaussian Mixture Models (GMMs), a machine learning method to find the intensity threshold value. The algorithm is completely data-driven and eliminates the need to work with a histogram. It has no customizable parameters and automatically determines the noise level for each individual mass spectrum. The algorithm performance was tested on mass spectra of natural organic matter obtained by averaging a different number of microscans (transients), and the results were compared with other noise filters proposed in the literature. Finally, the effect of this noise filtering approach on the fraction of peaks with assigned formulas was investigated. It was shown that there is always an increase in the identification rate, but the magnitude of the effect changes with the number of microscans averaged. The increase can be as high as 15%.
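To make the idea of a mixture-model intensity threshold concrete, here is a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes that log10 peak intensities are roughly a mixture of two Gaussians (noise and signal), fits that mixture with plain expectation-maximization, and takes as the threshold the point between the two means where both components are equally likely. The function names, the min/max initialization, the iteration count, and the synthetic data are all illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture with plain EM (illustrative)."""
    n = len(values)
    grand_mean = sum(values) / n
    grand_var = max(sum((v - grand_mean) ** 2 for v in values) / n, 1e-6)
    # Assumed initialization: one component at each end of the data range.
    means = [min(values), max(values)]
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in values:
            p = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow to zero
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-300)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances


def noise_threshold(intensities):
    """Return a log10 intensity threshold separating noise from signal.

    Assumes all intensities are positive and the two components are
    separated well enough for a single crossing between their means.
    """
    logs = [math.log10(i) for i in intensities]
    weights, means, variances = fit_two_component_gmm(logs)
    noise, signal = sorted(range(2), key=lambda k: means[k])
    lo, hi = means[noise], means[signal]
    # Bisection for the point where the weighted densities are equal.
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        p_noise = weights[noise] * normal_pdf(mid, means[noise], variances[noise])
        p_signal = weights[signal] * normal_pdf(mid, means[signal], variances[signal])
        if p_signal > p_noise:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Synthetic example: 500 noise peaks and 100 signal peaks (made-up values).
random.seed(0)
log_intensities = [random.gauss(3.0, 0.2) for _ in range(500)]
log_intensities += [random.gauss(5.0, 0.3) for _ in range(100)]
intensities = [10.0 ** x for x in log_intensities]

log_threshold = noise_threshold(intensities)
kept = [i for i in intensities if math.log10(i) >= log_threshold]
print(f"threshold = {10 ** log_threshold:.0f}, kept {len(kept)} of {len(intensities)} peaks")
```

Unlike the published filter, which is described as having no customizable parameters, this sketch exposes the iteration count and relies on a fixed initialization, so it should be read only as an illustration of the thresholding principle.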
Affiliation(s)
- Alexander A Potemkin
  - Chemistry Department of M.V. Lomonosov Moscow State University, Leninskie Gory, 1-3, GSP-1, Moscow 119991, Russia
- Mikhail A Proskurnin
  - Chemistry Department of M.V. Lomonosov Moscow State University, Leninskie Gory, 1-3, GSP-1, Moscow 119991, Russia
- Dmitry S Volkov
  - Chemistry Department of M.V. Lomonosov Moscow State University, Leninskie Gory, 1-3, GSP-1, Moscow 119991, Russia
3. Pan Q, He C, Shi Q. Graph-Based Method for Calibration of High-Resolution Mass Spectra of Natural Organic Matter. Anal Chem 2024; 96:3739-3743. [PMID: 38391144 DOI: 10.1021/acs.analchem.3c05423] [Indexed: 02/24/2024]
Abstract
Inaccuracies in ion detection and signal processing can undermine confidence in the molecular formula assignment of high-resolution mass spectrometry, which relies on precise matching of the mass-to-charge ratio (m/z). This study proposes a novel graph-based spectra calibration method, MSCMcalib, which implements coordinate transformation and pattern detection. MSCMcalib maps uncalibrated m/z data onto a modified 2D mass defect plot, facilitating the automatic calibration of detected lines, i.e., the calibration of uncalibrated peaks aligned with these lines. The "propagation" method is subsequently employed to accurately and automatically calibrate 605 m/z values across multiple lines, encompassing 98% of the m/z range. The calibrated m/z values divide the m/z range of the spectrum into multiple subintervals, with each subinterval undergoing a process of "scaling" calibration. The use of narrower partitions effectively mitigates the divergence at both ends of the range that arises from polynomial fitting of errors against m/z. The effectiveness of MSCMcalib is validated through the calibration of SRFA data with m/z errors ranging from -10 to -6 ppm, resulting in the assignment of 11%-30% more molecular formulas than the quadratic fitting calibration.
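The subinterval idea can be illustrated with a short sketch. This is not MSCMcalib: the abstract does not specify the form of the "scaling" step, so the code below assumes the simplest option, linear interpolation of the ppm error between neighbouring anchor peaks whose true m/z is already known. The function names and the anchor values are hypothetical.

```python
from bisect import bisect_right


def ppm_error(measured_mz, reference_mz):
    """Relative m/z error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def piecewise_calibrate(mz_values, anchors):
    """Correct m/z values using ppm errors interpolated between anchors.

    anchors: list of (measured_mz, reference_mz) pairs for peaks whose true
    m/z is known. Peaks outside the anchor range reuse the nearest anchor's
    error instead of extrapolating, which avoids divergence at the ends.
    """
    anchors = sorted(anchors)
    anchor_mz = [m for m, _ in anchors]
    anchor_err = [ppm_error(m, r) for m, r in anchors]
    calibrated = []
    for mz in mz_values:
        i = bisect_right(anchor_mz, mz)
        if i == 0:
            err = anchor_err[0]
        elif i == len(anchors):
            err = anchor_err[-1]
        else:
            # Linear interpolation of the ppm error inside the subinterval.
            x0, x1 = anchor_mz[i - 1], anchor_mz[i]
            t = (mz - x0) / (x1 - x0)
            err = anchor_err[i - 1] + t * (anchor_err[i] - anchor_err[i - 1])
        # measured = true * (1 + err * 1e-6), so invert to recover true m/z.
        calibrated.append(mz / (1.0 + err * 1e-6))
    return calibrated


# Hypothetical anchors with errors drifting from -8 ppm to -6 ppm.
anchors = [
    (200.0 * (1 - 8e-6), 200.0),
    (400.0 * (1 - 7e-6), 400.0),
    (600.0 * (1 - 6e-6), 600.0),
]
# A peak at true m/z 300 measured with an intermediate error of -7.5 ppm.
measured = [300.0 * (1 - 7.5e-6)]
corrected = piecewise_calibrate(measured, anchors)
residual_ppm = ppm_error(corrected[0], 300.0)
print(f"corrected m/z = {corrected[0]:.6f}, residual = {residual_ppm:.4f} ppm")
```

Because each subinterval is corrected only from its two bounding anchors, an error in one region does not propagate across the whole spectrum, in contrast to a single global polynomial fit.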
Affiliation(s)
- Qiong Pan
  - State Key Laboratory of Heavy Oil Processing, China University of Petroleum, Beijing 102249, People's Republic of China
  - Department of Chemical Engineering, Faculty of Science and Engineering, The University of Manchester, Manchester M13 9PL, United Kingdom
- Chen He
  - State Key Laboratory of Heavy Oil Processing, China University of Petroleum, Beijing 102249, People's Republic of China
- Quan Shi
  - State Key Laboratory of Heavy Oil Processing, China University of Petroleum, Beijing 102249, People's Republic of China