1
Lai Y, Koelmel JP, Walker DI, Price EJ, Papazian S, Manz KE, Castilla-Fernández D, Bowden JA, Nikiforov V, David A, Bessonneau V, Amer B, Seethapathy S, Hu X, Lin EZ, Jbebli A, McNeil BR, Barupal D, Cerasa M, Xie H, Kalia V, Nandakumar R, Singh R, Tian Z, Gao P, Zhao Y, Froment J, Rostkowski P, Dubey S, Coufalíková K, Seličová H, Hecht H, Liu S, Udhani HH, Restituito S, Tchou-Wong KM, Lu K, Martin JW, Warth B, Godri Pollitt KJ, Klánová J, Fiehn O, Metz TO, Pennell KD, Jones DP, Miller GW. High-Resolution Mass Spectrometry for Human Exposomics: Expanding Chemical Space Coverage. Environ Sci Technol 2024; 58:12784-12822. [PMID: 38984754 PMCID: PMC11271014 DOI: 10.1021/acs.est.4c01156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 06/11/2024] [Accepted: 06/12/2024] [Indexed: 07/11/2024]
Abstract
In the modern "omics" era, measurement of the human exposome is a critical missing link between genetic drivers and disease outcomes. High-resolution mass spectrometry (HRMS), routinely used in proteomics and metabolomics, has emerged as a leading technology for broadly profiling chemical exposure agents and related biomolecules, owing to its accurate mass measurement, high sensitivity, rapid data acquisition, and increased resolution of chemical space. Non-targeted approaches are increasingly accessible, supporting a shift from conventional hypothesis-driven, quantitation-centric targeted analyses toward data-driven, hypothesis-generating profiling of the chemical exposome. However, HRMS-based exposomics faces unique challenges. New analytical and computational infrastructure is needed to expand analytical coverage through streamlined, scalable, and harmonized workflows and data pipelines that permit longitudinal tracking of the chemical exposome, retrospective validation, and multi-omics integration for meaningful health-oriented inferences. In this article, we survey the literature on state-of-the-art HRMS-based technologies, review current analytical workflows and informatics pipelines, and provide an up-to-date reference on exposomic approaches for chemists, toxicologists, epidemiologists, care providers, and other stakeholders in health sciences and medicine. We propose efforts to benchmark fit-for-purpose platforms, including gas and liquid chromatography-HRMS (GC-HRMS and LC-HRMS), for expanding coverage of chemical space, and discuss opportunities, challenges, and strategies to advance the burgeoning field of the exposome.
Affiliation(s)
- Yunjia Lai
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Jeremy P. Koelmel
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06520, United States
- Douglas I. Walker
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
- Elliott J. Price
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Stefano Papazian
- Department of Environmental Science, Science for Life Laboratory, Stockholm University, SE-106 91 Stockholm, Sweden
- National Facility for Exposomics, Metabolomics Platform, Science for Life Laboratory, Stockholm University, Solna 171 65, Sweden
- Katherine E. Manz
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, United States
- Delia Castilla-Fernández
- Department of Food Chemistry and Toxicology, Faculty of Chemistry, University of Vienna, 1010 Vienna, Austria
- John A. Bowden
- Center for Environmental and Human Toxicology, Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, Florida 32611, United States
- Arthur David
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail), UMR_S 1085, Rennes, France
- Vincent Bessonneau
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail), UMR_S 1085, Rennes, France
- Bashar Amer
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Xin Hu
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
- Elizabeth Z. Lin
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06520, United States
- Akrem Jbebli
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Brooklynn R. McNeil
- Biomarkers Core Laboratory, Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, New York 10032, United States
- Dinesh Barupal
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
- Marina Cerasa
- Institute of Atmospheric Pollution Research, Italian National Research Council, 00015 Monterotondo, Rome, Italy
- Hongyu Xie
- Department of Environmental Science, Science for Life Laboratory, Stockholm University, SE-106 91 Stockholm, Sweden
- Vrinda Kalia
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Renu Nandakumar
- Biomarkers Core Laboratory, Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, New York 10032, United States
- Randolph Singh
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Zhenyu Tian
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
- Peng Gao
- Department of Environmental and Occupational Health, and Department of Civil and Environmental Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- UPMC Hillman Cancer Center, Pittsburgh, Pennsylvania 15232, United States
- Yujia Zhao
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht 3584CM, The Netherlands
- Saurabh Dubey
- Biomarkers Core Laboratory, Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, New York 10032, United States
- Kateřina Coufalíková
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Hana Seličová
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Helge Hecht
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Sheng Liu
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06520, United States
- Hanisha H. Udhani
- Biomarkers Core Laboratory, Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, New York 10032, United States
- Sophie Restituito
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Kam-Meng Tchou-Wong
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Kun Lu
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Jonathan W. Martin
- Department of Environmental Science, Science for Life Laboratory, Stockholm University, SE-106 91 Stockholm, Sweden
- National Facility for Exposomics, Metabolomics Platform, Science for Life Laboratory, Stockholm University, Solna 171 65, Sweden
- Benedikt Warth
- Department of Food Chemistry and Toxicology, Faculty of Chemistry, University of Vienna, 1010 Vienna, Austria
- Krystal J. Godri Pollitt
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06520, United States
- Jana Klánová
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Oliver Fiehn
- West Coast Metabolomics Center, University of California-Davis, Davis, California 95616, United States
- Thomas O. Metz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Kurt D. Pennell
- School of Engineering, Brown University, Providence, Rhode Island 02912, United States
- Dean P. Jones
- Department of Medicine, School of Medicine, Emory University, Atlanta, Georgia 30322, United States
- Gary W. Miller
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
2
Sadia M, Boudguiyer Y, Helmus R, Seijo M, Praetorius A, Samanipour S. A stochastic approach for parameter optimization of feature detection algorithms for non-target screening in mass spectrometry. Anal Bioanal Chem 2024:10.1007/s00216-024-05425-3. [PMID: 38995405 DOI: 10.1007/s00216-024-05425-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 06/05/2024] [Accepted: 06/18/2024] [Indexed: 07/13/2024]
Abstract
Feature detection plays a crucial role in non-target screening (NTS) and requires careful selection of algorithm parameters to minimize false positive (FP) features. In this study, a stochastic approach was used to optimize the parameter settings of feature detection algorithms for processing high-resolution mass spectrometry data. The approach was demonstrated with four open-source algorithms (OpenMS, SAFD, XCMS, and KPIC2) within the patRoon software platform, using extracts of drinking water samples spiked with 46 per- and polyfluoroalkyl substances (PFAS). The method randomly samples the parameter space and uses Pearson correlation to assess the impact of each parameter on the number of detected suspect analytes. The optimized parameters improved algorithm performance, increasing suspect hits for SAFD and XCMS and reducing the total number of detected features (i.e., minimizing FP) for OpenMS. These improvements were further validated on three different drinking water samples used as a test dataset. The optimized parameters gave a lower false discovery rate (FDR%) than the default parameters, effectively increasing the detection of true positive features. This work also highlights the need to optimize algorithm parameters before starting NTS in order to reduce the complexity of such datasets.
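The sampling-and-correlation strategy described in this abstract can be illustrated with a short Python sketch. It is not the authors' implementation: `hypothetical_detector` is an invented toy response surface standing in for a real OpenMS, SAFD, XCMS, or KPIC2 run in patRoon, and the parameter names, ranges, and sample count are assumptions chosen only for illustration.

```python
import numpy as np


def hypothetical_detector(noise_threshold, min_peak_width, mass_tol_ppm):
    """Toy stand-in for one feature-detection run (hypothetical, not patRoon).

    Returns a made-up count of spiked suspects "found" (0 to 46). A real run
    would process the sample with OpenMS/SAFD/XCMS/KPIC2 and match the
    resulting features against the suspect list.
    """
    score = float(
        46
        - 0.01 * noise_threshold            # monotonic: higher threshold, fewer hits
        - 2.0 * abs(min_peak_width - 6.0)   # optimum mid-range: non-monotonic effect
        + 0.2 * mass_tol_ppm                # weak monotonic effect
    )
    return max(0, min(46, int(round(score))))


def screen_parameters(ranges, detector, n_samples=500, seed=0):
    """Randomly sample the parameter space; relate each parameter to hits via Pearson r."""
    rng = np.random.default_rng(seed)
    names = list(ranges)
    # One column of uniform random draws per parameter, within its (low, high) bounds
    samples = np.column_stack(
        [rng.uniform(low, high, n_samples) for low, high in ranges.values()]
    )
    hits = np.array([detector(*row) for row in samples], dtype=float)
    # Pearson correlation between each parameter column and the suspect-hit count
    corr = {
        name: float(np.corrcoef(samples[:, i], hits)[0, 1])
        for i, name in enumerate(names)
    }
    # Best sampled setting (first one reaching the maximum number of hits)
    best = {name: float(v) for name, v in zip(names, samples[np.argmax(hits)])}
    return corr, best


corr, best = screen_parameters(
    {"noise_threshold": (100, 2000), "min_peak_width": (2, 10), "mass_tol_ppm": (1, 10)},
    hypothetical_detector,
)
```

Because Pearson r measures only linear association, a parameter whose optimum lies mid-range (the toy `min_peak_width` here) can show a near-zero correlation even though it matters, so the sampled responses are worth inspecting alongside r.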
Affiliation(s)
- Mohammad Sadia
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Youssef Boudguiyer
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Rick Helmus
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Marianne Seijo
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Antonia Praetorius
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Saer Samanipour
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands
3
Nikita S, Bhattacharya S, Manocha K, Rathore AS. Deep learning framework for peak detection at the intact level of therapeutic proteins. J Sep Sci 2024; 47:e2400051. [PMID: 38819868 DOI: 10.1002/jssc.202400051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 05/14/2024] [Accepted: 05/21/2024] [Indexed: 06/01/2024]
Abstract
Although automated peak detection is available in commercial software, achieving optimal true positive rates often requires visual inspection and manual adjustment. In the first phase of this study, hetero-variants (glycoforms) of a monoclonal antibody were distinguished by liquid chromatography-mass spectrometry, revealing discernible peaks at the intact level. In the second phase, a deep learning approach based on convolutional neural networks (CNNs) was used to identify each peak (hetero-variant) in the intact-level analysis comprehensively. In the present case study, conventional peak identification software detected five peaks using a 0.5 threshold, whereas the CNN model identified seven. On data acquired in real time during experimentation, the model performed strongly, with a probability area under the curve (AUC) of 0.9949, surpassing partial least squares discriminant analysis (PLS-DA; probability AUC of 0.8041) and locally weighted regression (LWR; probability AUC of 0.6885). The AUC of the receiver operating characteristic (ROC) curve likewise showed the superior performance of the CNN over PLS-DA and LWR.
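The AUC values quoted above summarize how well each model's scores separate peaks from non-peaks. As a reference point, the standard ROC AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, and the minimal Python sketch below computes it that way. The abstract does not say exactly how its "probability AUC" is computed, so this shows only the conventional definition; the labels and scores in the example are hypothetical.

```python
import numpy as np


def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as one half (the Mann-Whitney U formulation).
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Pairwise differences: every positive score against every negative score
    diff = pos[:, None] - neg[None, :]
    wins = np.count_nonzero(diff > 0) + 0.5 * np.count_nonzero(diff == 0)
    return float(wins / (pos.size * neg.size))


# Hypothetical labels (1 = true peak) and model scores for six candidate windows;
# 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9.
print(roc_auc([1, 1, 0, 1, 0, 0], [0.95, 0.80, 0.60, 0.55, 0.30, 0.10]))
```

The pairwise form is O(n·m) in memory and time, which is fine for illustration; rank-based formulas give the same value more efficiently for large datasets.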
Affiliation(s)
- Saxena Nikita
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, India
- Kriti Manocha
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, India
- Anurag S Rathore
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, India
4
Reuschenbach M, Drees F, Leupold MS, Tintrop LK, Schmidt TC, Renner G. qPeaks: A Linear Regression-Based Asymmetric Peak Model for Parameter-Free Automatized Detection and Characterization of Chromatographic Peaks in Non-Target Screening Data. Anal Chem 2024; 96:7120-7129. [PMID: 38666514 DOI: 10.1021/acs.analchem.4c00494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
We present qPeaks (quality peaks), a novel, user-parameter-free algorithm for detecting and characterizing peaks in chromatographic data. The algorithm is based on a linearizable regression model that describes asymmetric peaks and estimates the uncertainties associated with the peak regression parameters. These uncertainties are used to derive a data quality score, DQSpeak, which makes low-reliability results more transparent during processing and allows generated features to be prioritized. Chromatographic peaks with a high DQSpeak are less likely to be classified as false positives and show higher repeatability across multiple measurements. The algorithm's high efficiency makes it particularly useful in nontarget screening workflows based on chromatography coupled with high-resolution mass spectrometry. qPeaks is integrated into the qAlgorithms nontarget screening processing toolbox, adding a parameter-free chromatographic peak detection and characterization step. With qAlgorithms, high-resolution mass spectra are now centroided with the qCentroids algorithm, centroids are clustered into extracted ion chromatograms (EICs) with the qBinning algorithm, and chromatographic peaks are detected on the resulting EICs with qPeaks. All qAlgorithms tools can also be used independently.
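qPeaks' exact regression model is defined in the paper. The Python sketch below assumes one simple linearizable asymmetric form, a log-transformed peak with separate quadratic terms on either side of the apex, to show why such a model can be solved by ordinary least squares and how parameter uncertainties come out of the same fit. The function name and model form are illustrative assumptions, and the sketch does not compute DQSpeak.

```python
import numpy as np


def fit_asymmetric_peak(t, y):
    """Least-squares fit of an asymmetric, log-linearized peak model (assumed form).

    ln(y) = b0 + b1*x + b2*x**2*[x < 0] + b3*x**2*[x >= 0],
    with x = t - t_apex and t_apex the most intense point, so b2 and b3 are
    separate curvatures for the leading and tailing edges.
    Returns (coefficients, standard errors) as NumPy arrays.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > 0                         # the log transform needs positive intensities
    t, y = t[keep], y[keep]
    x = t - t[np.argmax(y)]
    X = np.column_stack([np.ones_like(x), x, x**2 * (x < 0), x**2 * (x >= 0)])
    z = np.log(y)
    beta, _, rank, _ = np.linalg.lstsq(X, z, rcond=None)
    dof = z.size - X.shape[1]
    if rank < X.shape[1] or dof <= 0:
        raise ValueError("need enough points on both sides of the apex")
    resid = z - X @ beta
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)   # OLS parameter covariance
    return beta, np.sqrt(np.diag(cov))
```

For a synthetic peak whose tailing edge is twice as wide as its leading edge (Gaussian widths 1 and 2), the fit returns b2 = -0.5 and b3 = -0.125, and sqrt(b2/b3) = 2 recovers the width ratio; with real, noisy data the standard errors become non-zero and indicate how reliable each coefficient is.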
Affiliation(s)
- Max Reuschenbach
- Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- Felix Drees
- Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- Michael S Leupold
- Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- Lucie K Tintrop
- Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- Torsten C Schmidt
- Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- IWW Water Center, Moritzstr. 26, Mülheim an der Ruhr 45476, Germany
- Gerrit Renner
- Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
5
van Herwerden D, O’Brien JW, Lege S, Pirok BWJ, Thomas KV, Samanipour S. Cumulative Neutral Loss Model for Fragment Deconvolution in Electrospray Ionization High-Resolution Mass Spectrometry Data. Anal Chem 2023; 95:12247-12255. [PMID: 37549176 PMCID: PMC10448439 DOI: 10.1021/acs.analchem.3c00896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 07/03/2023] [Indexed: 08/09/2023]
Abstract
Clean high-resolution mass spectra (HRMS) are essential for successful structural elucidation of an unknown feature in nontarget analysis (NTA) workflows. This step is especially important for spectra generated by data-independent acquisition or in direct infusion experiments. Most commonly available tools exploit only the time domain for spectral cleanup. Here, we present an algorithm that combines time-domain and mass-domain information to perform spectral deconvolution, using a probability-based cumulative neutral loss (CNL) model for fragment deconvolution. The optimized model, with a mass tolerance of 0.005 Da and a scoreCNL threshold of 0.00, achieved a true positive rate (TPr) of 95.0%, a false discovery rate (FDr) of 20.6%, and a reduction rate of 35.4%. The CNL model was also tested extensively on real samples containing predominantly pesticides at different concentration levels and with matrix effects. Overall, it obtained a TPr above 88.8%, with FDr values between 33 and 79% and reduction rates between 9 and 45%. Finally, the CNL model was compared with the retention time difference method and peak shape correlation analysis; combining correlation analysis with the CNL model was the most effective approach for fragment deconvolution, with a TPr of 84.7%, an FDr of 54.4%, and a reduction rate of 51.0%.
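The abstract benchmarks the CNL model against peak shape correlation analysis, a time-domain approach. A minimal Python sketch of that baseline (not of the CNL model itself) keeps a fragment only if its extracted ion chromatogram rises and falls with the precursor's. The function name, the 0.9 correlation threshold, and the example m/z values are assumptions for illustration.

```python
import numpy as np


def correlated_fragments(precursor_eic, fragment_eics, min_r=0.9):
    """Keep fragments whose EIC profile correlates with the precursor's EIC.

    precursor_eic: 1-D intensities over scans.
    fragment_eics: mapping of fragment m/z -> 1-D intensities on the same scans.
    Returns {mz: r} for fragments with Pearson r >= min_r.
    """
    p = np.asarray(precursor_eic, dtype=float)
    kept = {}
    for mz, eic in fragment_eics.items():
        f = np.asarray(eic, dtype=float)
        if f.shape != p.shape or f.std() == 0 or p.std() == 0:
            continue  # misaligned or flat traces cannot be correlated
        r = float(np.corrcoef(p, f)[0, 1])
        if r >= min_r:
            kept[mz] = r
    return kept


precursor = [0, 1, 4, 9, 4, 1, 0]
fragments = {
    91.054: [0, 0.5, 2, 4.5, 2, 0.5, 0],  # co-elutes with the precursor
    77.039: [9, 4, 1, 0, 1, 4, 9],        # anti-correlated trace
    65.039: [5] * 7,                      # flat background
}
print(correlated_fragments(precursor, fragments))
```

Only the co-eluting fragment survives; the CNL model adds mass-domain evidence on top of this kind of time-domain check.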
Affiliation(s)
- Denice van Herwerden
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1012 WX, The Netherlands
- Jake W. O’Brien
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1012 WX, The Netherlands
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane 4102, Australia
- Sascha Lege
- Agilent Technologies Deutschland GmbH, Waldbronn 76337, Germany
- Bob W. J. Pirok
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1012 WX, The Netherlands
- Kevin V. Thomas
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane 4102, Australia
- Saer Samanipour
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1012 WX, The Netherlands
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane 4102, Australia
- UvA Data Science Center, University of Amsterdam, Amsterdam 1012 WP, The Netherlands
6
Renner G, Reuschenbach M. Critical review on data processing algorithms in non-target screening: challenges and opportunities to improve result comparability. Anal Bioanal Chem 2023; 415:4111-4123. [PMID: 37380744 PMCID: PMC10328864 DOI: 10.1007/s00216-023-04776-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/30/2022] [Revised: 04/23/2023] [Accepted: 05/15/2023] [Indexed: 06/30/2023]
Abstract
Non-target screening (NTS) is a powerful environmental and analytical chemistry approach for detecting and identifying unknown compounds in complex samples. High-resolution mass spectrometry has enhanced NTS capabilities but created challenges in data analysis, including data preprocessing, peak detection, and feature extraction. This review provides an in-depth understanding of NTS data processing methods, focusing on centroiding, extracted ion chromatogram (XIC) building, chromatographic peak characterization, alignment, componentization, and prioritization of features. We discuss the strengths and weaknesses of various algorithms, the influence of user input parameters on the results, and the need for automated parameter optimization. We address uncertainty and data quality issues, emphasizing the importance of incorporating confidence intervals and raw data quality assessment in data processing workflows. Furthermore, we highlight the need for cross-study comparability and propose potential solutions, such as utilizing standardized statistics and open-access data exchange platforms. In conclusion, we offer future perspectives and recommendations for developers and users of NTS data processing algorithms and workflows. By addressing these challenges and capitalizing on the opportunities presented, the NTS community can advance the field, improve the reliability of results, and enhance data comparability across different studies.
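Of the processing steps this review covers, XIC building is the easiest to show compactly. The Python sketch below is a deliberately naive, assumed approach (greedy binning of centroids by a ppm tolerance) and is not any specific published algorithm; the function name is hypothetical, and `ppm_tol` is exactly the kind of user input parameter whose influence on results the review discusses.

```python
def build_xics(scans, ppm_tol=5.0):
    """Naively group centroids from successive scans into XICs by m/z proximity.

    scans: iterable of (retention_time, [(mz, intensity), ...]) in RT order.
    Each centroid joins the closest existing XIC whose mean m/z is within
    ppm_tol and which has no point in the current scan yet; otherwise it
    starts a new XIC. Returns a list of dicts with "mz", "rt", "intensity" lists.
    """
    xics = []
    for rt, centroids in scans:
        for mz, intensity in centroids:
            best, best_ppm = None, ppm_tol
            for xic in xics:
                if xic["rt"][-1] == rt:
                    continue  # at most one point per scan in each XIC
                ref = sum(xic["mz"]) / len(xic["mz"])
                ppm = abs(mz - ref) / ref * 1e6
                if ppm <= best_ppm:
                    best, best_ppm = xic, ppm
            if best is None:
                best = {"mz": [], "rt": [], "intensity": []}
                xics.append(best)
            best["mz"].append(mz)
            best["rt"].append(rt)
            best["intensity"].append(intensity)
    return xics


scans = [
    (1.0, [(200.0000, 10.0), (300.0000, 5.0)]),
    (1.1, [(200.0004, 20.0), (300.0009, 6.0)]),
    (1.2, [(200.0002, 15.0)]),
]
for xic in build_xics(scans):
    print(round(sum(xic["mz"]) / len(xic["mz"]), 4), xic["intensity"])
```

Real binning algorithms also handle gaps between scans, mass drift, and overlapping traces, none of which this sketch attempts.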
Affiliation(s)
- Gerrit Renner
- Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen, D-45141, NRW, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen, D-45141, NRW, Germany
- Max Reuschenbach
- Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen, D-45141, NRW, Germany
- Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen, D-45141, NRW, Germany
7
Feraud M, O'Brien JW, Samanipour S, Dewapriya P, van Herwerden D, Kaserzon S, Wood I, Rauert C, Thomas KV. InSpectra - A platform for identifying emerging chemical threats. J Hazard Mater 2023; 455:131486. [PMID: 37172382 DOI: 10.1016/j.jhazmat.2023.131486] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/10/2022] [Revised: 04/20/2023] [Accepted: 04/23/2023] [Indexed: 05/14/2023]
Abstract
Non-target analysis (NTA) using high-resolution mass spectrometry (HRMS) coupled with liquid chromatography is increasingly used to identify chemicals of biological relevance. HRMS datasets are large and complex, making the identification of potentially relevant chemicals extremely challenging. Because these datasets are recorded in vendor-specific formats, interpreting them often relies on vendor-specific software that may not accommodate advances in data processing. Here we present InSpectra, a vendor-independent, automated platform for the systematic detection of newly identified emerging chemical threats. InSpectra is web-based, open-source and open-access, and modular, providing highly flexible and extensible NTA and suspect screening workflows. As a cloud-based platform, InSpectra exploits parallel computing and big-data archiving capabilities, with a focus on sharing and community curation of HRMS data. InSpectra offers a reproducible and transparent approach to the identification, tracking, and prioritisation of emerging chemical threats.
Affiliation(s)
- Mathieu Feraud
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Jake W O'Brien
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia; Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands
- Saer Samanipour
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia; Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands; UvA Data Science Center, University of Amsterdam, Netherlands
- Pradeep Dewapriya
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Denice van Herwerden
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands
- Sarit Kaserzon
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Ian Wood
- School of Mathematics and Physics, The University of Queensland, Australia
- Cassandra Rauert
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Kevin V Thomas
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
8
Boelrijk J, van Herwerden D, Ensing B, Forré P, Samanipour S. Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data. J Cheminform 2023; 15:28. [PMID: 36829215 PMCID: PMC9960388 DOI: 10.1186/s13321-023-00699-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/29/2022] [Accepted: 02/13/2023] [Indexed: 02/26/2023]
Abstract
Non-target analysis combined with liquid chromatography-high-resolution mass spectrometry is considered one of the most comprehensive strategies for detecting and identifying known and unknown chemicals in complex samples. However, many compounds remain unidentified because of data complexity and the limited number of structures in chemical databases. In this work, we developed and validated a novel machine learning algorithm that predicts retention index (RI) values for structurally (un)known chemicals from their measured fragmentation patterns. For the first time, the developed model enabled the prediction of RI values without the exact structure of the chemicals, with an R² of 0.91 and 0.77 and a root mean squared error (RMSE) of 47 and 67 RI units for the NORMAN ([Formula: see text]) and amide ([Formula: see text]) test sets, respectively. This fragment-based model showed RI prediction accuracy comparable to that of conventional descriptor-based models, which rely on known chemical structures and obtained an R² of 0.85 with an RMSE of 67.
Affiliation(s)
- Jim Boelrijk
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands; Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
- Denice van Herwerden
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands
- Bernd Ensing
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands; Computational Chemistry Group, Van 't Hoff Institute for Molecular Sciences (HIMS), Amsterdam, The Netherlands
- Patrick Forré
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands; Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
- Saer Samanipour
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands; UvA Data Science Center, University of Amsterdam, Amsterdam, The Netherlands; Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Woolloongabba, Australia
9
Development of a scoring parameter to characterize data quality of centroids in high-resolution mass spectra. Anal Bioanal Chem 2022; 414:6635-6645. [PMID: 35871703 PMCID: PMC9411079 DOI: 10.1007/s00216-022-04224-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/06/2022] [Revised: 05/31/2022] [Accepted: 07/06/2022] [Indexed: 11/29/2022]
Abstract
High-resolution mass spectrometry is widely used in many research fields because it allows accurate mass determination. In this context, it is standard practice to reduce high-resolution profile-mode mass spectra to centroided data, on which many data processing routines rely for further evaluation. However, these approaches do not retain information on peak profile quality, which makes describing the reliability of results almost impossible. We overcome this limitation by developing a new statistical parameter, the data quality score (DQS). To calculate the DQS, we performed a very fast and robust regression analysis of the individual high-resolution peak profiles and used error propagation to estimate the uncertainties of the regression coefficients. We successfully validated the new algorithm against the vendor-specific algorithm implemented in ProteoWizard's msConvert. Moreover, we show that the DQS is a sum parameter associated with centroid accuracy and precision. We also demonstrate the benefit of the new algorithm in nontarget screening, as the DQS prioritizes signals that are not influenced by non-resolved isobaric ions or isotopic fine structures. The algorithm is implemented in the Python, R, and Julia programming languages and supports multi- and cross-platform downstream data handling.
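The DQS formula itself is given in the paper. The Python sketch below illustrates only the general idea the abstract describes, regression on the individual profile points followed by error propagation, under the assumption of a Gaussian peak profile (so ln I is quadratic in m/z). The function name is hypothetical, and the output is a centroid m/z with a standard error, not a DQS.

```python
import numpy as np


def centroid_with_uncertainty(mz, intensity):
    """Centroid of one profile peak from a log-quadratic (Gaussian) least-squares fit.

    Fits ln(I) = c0 + c1*u + c2*u**2 with u = (mz - mz_max) / s, where mz_max is
    the most intense profile point and s rescales offsets for numerical stability.
    centroid = mz_max - s*c1 / (2*c2); its standard error is obtained by
    first-order error propagation of the fitted coefficient covariance.
    Assumes at least 4 distinct m/z values with positive intensity.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = intensity > 0
    mz, intensity = mz[keep], intensity[keep]
    if mz.size < 4:
        raise ValueError("need at least 4 positive profile points")
    mz_max = mz[np.argmax(intensity)]
    s = np.max(np.abs(mz - mz_max))
    u = (mz - mz_max) / s
    X = np.column_stack([np.ones_like(u), u, u**2])
    z = np.log(intensity)
    c, *_ = np.linalg.lstsq(X, z, rcond=None)
    _, c1, c2 = c
    if c2 >= 0:
        raise ValueError("profile is not peak-shaped (curvature must be negative)")
    resid = z - X @ c
    cov = (resid @ resid / (z.size - 3)) * np.linalg.inv(X.T @ X)
    # Gradient of the centroid with respect to (c0, c1, c2)
    grad = np.array([0.0, -s / (2 * c2), s * c1 / (2 * c2**2)])
    se = float(np.sqrt(max(0.0, grad @ cov @ grad)))
    return float(mz_max - s * c1 / (2 * c2)), se


# Synthetic noise-free Gaussian profile centred at m/z 100.0003
mz = 100.0 + np.arange(-3, 4) * 0.001
inten = 1e5 * np.exp(-((mz - 100.0003) ** 2) / (2 * 0.0015**2))
print(centroid_with_uncertainty(mz, inten))
```

Rescaling the m/z offsets before fitting keeps the normal-equations matrix well conditioned; without it, the constant, linear, and quadratic columns differ by several orders of magnitude.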
10
Minkus S, Bieber S, Letzel T. Spotlight on mass spectrometric non-target screening analysis: Advanced data processing methods recently communicated for extracting, prioritizing and quantifying features. Anal Sci Adv 2022; 3:103-112. [PMID: 38715638 PMCID: PMC10989605 DOI: 10.1002/ansa.202200001] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/01/2022] [Revised: 03/22/2022] [Accepted: 03/24/2022] [Indexed: 06/13/2024]
Abstract
Non-target screening of trace organic compounds complements routine monitoring of water bodies. So-called features, which ideally each represent a chemical compound, must first be extracted from the raw data. Relevant features then need to be prioritized and interpreted further, for instance by identifying them. Finally, quantitative data are required to assess the risks of a detected compound. This review presents recent and noteworthy contributions to the processing of non-target screening (NTS) data, the prioritization of features, and (semi-)quantitative methods that do not require analytical standards. The focus is on environmental water samples measured by liquid chromatography, electrospray ionization, and high-resolution mass spectrometry. Examples of fully integrated data processing workflows are given, with options for parameter optimization and for choosing between different feature extraction algorithms to increase feature coverage. The regions of interest-multivariate curve resolution method, which combines an alternative approach to data compression with chemometric feature extraction, is reviewed. Furthermore, prioritization strategies based on a confined chemical space for annotation, guidance by targeted analysis, and signal intensity are presented. The use of retention time (RT) as diagnostic evidence in NTS investigations is highlighted through a discussion of RT indexing and of RT prediction with quantitative structure-retention relationship models. Finally, a seminal technology for quantitative NTS without analytical standards, based on predicting ionization efficiencies, is discussed.
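One common way to express retention time on a calibrated scale, as in the RT indexing mentioned above, is linear interpolation between calibrant compounds of known index. The Python sketch below assumes that convention; the function name, calibrant times, and index values are illustrative, and the reviewed works may use other indexing schemes.

```python
import bisect


def retention_index(rt, calibrants):
    """Linear retention index by interpolation between bracketing calibrants.

    calibrants: list of (retention_time, assigned_index) pairs, in any order.
    Raises ValueError outside the calibrated range (no extrapolation).
    """
    cal = sorted(calibrants)
    times = [t for t, _ in cal]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside calibrant range")
    i = bisect.bisect_right(times, rt)
    if i == len(times):  # rt coincides with the last calibrant
        return float(cal[-1][1])
    (t0, ri0), (t1, ri1) = cal[i - 1], cal[i]
    # Fraction of the way from the earlier to the later calibrant
    return ri0 + (ri1 - ri0) * (rt - t0) / (t1 - t0)


# Hypothetical calibrants eluting at 2, 4, and 6 min with indices 100, 300, 500
calibrants = [(2.0, 100), (6.0, 500), (4.0, 300)]
print(retention_index(3.0, calibrants))  # halfway between 100 and 300
```

Refusing to extrapolate is a deliberate choice: outside the calibrant range, the local linear relationship between RT and index is not anchored by any reference compound.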
Affiliation(s)
- Susanne Minkus
- AFIN-TS GmbH, Augsburg, Germany
- Technical University of Munich (Chair of Urban Water Systems Engineering), Munich, Germany
- Thomas Letzel
- AFIN-TS GmbH, Augsburg, Germany
- Technical University of Munich (Chair of Urban Water Systems Engineering), Munich, Germany