Reference Citation Analysis

For: Barupal DK, Baygi SF, Wright RO, Arora M. Data Processing Thresholds for Abundance and Sparsity and Missed Biological Insights in an Untargeted Chemical Analysis of Blood Specimens for Exposomics. Front Public Health 2021;9:653599. [PMID: 34178917 PMCID: PMC8222544 DOI: 10.3389/fpubh.2021.653599] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/14/2021] [Accepted: 05/19/2021] [Indexed: 01/27/2023]

Cited by Other Article(s)

Kumler W, Hazelton BJ, Ingalls AE. Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics. BMC Bioinformatics 2023;24:404. [PMID: 37891484 PMCID: PMC10612323 DOI: 10.1186/s12859-023-05533-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 10/16/2023] [Indexed: 10/29/2023]

Abstract

BACKGROUND

Chromatographic peakpicking continues to represent a significant bottleneck in automated LC-MS workflows. Uncontrolled false discovery rates and the lack of manually-calibrated quality metrics require researchers to visually evaluate individual peaks, requiring large amounts of time and breaking replicability. This problem is exacerbated in noisy environmental datasets and for novel separation methods such as hydrophilic interaction columns in metabolomics, creating a demand for a simple, intuitive, and robust metric of peak quality.

RESULTS

Here, we manually labeled four HILIC oceanographic particulate metabolite datasets to assess the performance of individual peak quality metrics. We used these datasets to construct a predictive model calibrated to the likelihood that visual inspection by an MS expert would include a given mass feature in the downstream analysis. We implemented two novel peak quality metrics, a custom signal-to-noise metric and a test of similarity to a bell curve, both calculated from the raw data in the extracted ion chromatogram, and found that these outperformed existing measurements of peak quality. A simple logistic regression model built on two metrics reduced the fraction of false positives in the analysis from 70-80% down to 1-5% and showed minimal overfitting when applied to novel datasets. We then explored the implications of this quality thresholding on the conclusions obtained by the downstream analysis and found that while only 10% of the variance in the dataset could be explained by depth in the default output from the peakpicker, approximately 40% of the variance was explained when restricted to high-quality peaks alone.

CONCLUSIONS

We conclude that the poor performance of peakpicking algorithms significantly reduces the power of both univariate and multivariate statistical analyses to detect environmental differences. We demonstrate that simple models built on intuitive metrics and derived from the raw data are more robust and can outperform more complex models when applied to new data. Finally, we show that in properly curated datasets, depth is a major driver of variability in the marine microbial metabolome and identify several interesting metabolite trends for future investigation.

Guo J, Huan T. Mechanistic Understanding of the Discrepancies between Common Peak Picking Algorithms in Liquid Chromatography–Mass Spectrometry-Based Metabolomics. Anal Chem 2023;95:5894-5902. [PMID: 36972195 DOI: 10.1021/acs.analchem.2c04887] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/29/2023]

Abstract

Inconsistent peak picking outcomes are a critical concern in processing liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics data. This work systematically studied the mechanisms behind the discrepancies among five commonly used peak picking algorithms, including CentWave in XCMS, linear-weighted moving average in MS-DIAL, automated data analysis pipeline (ADAP) in MZmine 2, Savitzky-Golay in El-MAVEN, and FeatureFinderMetabo in OpenMS. We first collected 10 public metabolomics datasets representing various LC-MS analytical conditions. We then incorporated several novel strategies to (i) acquire the optimal peak picking parameters of each algorithm for a fair comparison, (ii) automatically recognize false metabolic features with poor chromatographic peak shapes, and (iii) evaluate the real metabolic features that are missed by the algorithms. By applying these strategies, we compared the true, false, and undetected metabolic features in each data processing outcome. Our results show that linear-weighted moving average consistently outperforms the other peak picking algorithms. To facilitate a mechanistic understanding of the differences, we proposed six peak attributes: ideal slope, sharpness, peak height, mass deviation, peak width, and scan number. We also developed an R program to automatically measure these attributes for detected and undetected true metabolic features. From the results of the 10 datasets, we concluded that four peak attributes, including ideal slope, scan number, peak width, and mass deviation, are critical for the detectability of a peak. For instance, the focus on ideal slope critically hinders the extraction of true metabolic features with low ideal slope scores in linear-weighted moving average, Savitzky-Golay, and ADAP. The relationships between peak picking algorithms and peak attributes were also visualized in a principal component analysis biplot. Overall, the clear comparison and explanation of the differences between peak picking algorithms can lead to the design of better peak picking strategies in the future.

Houriet J, Vidar WS, Manwill PK, Todd DA, Cech NB. How Low Can You Go? Selecting Intensity Thresholds for Untargeted Metabolomics Data Preprocessing. Anal Chem 2022;94:17964-17971. [PMID: 36516972 DOI: 10.1021/acs.analchem.2c04088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]

Barupal DK. Response: Commentary: Data processing thresholds for abundance and sparsity and missed biological insights in an untargeted chemical analysis of blood specimens for exposomics. Front Public Health 2022;10:1003148. [PMID: 36330107 PMCID: PMC9622927 DOI: 10.3389/fpubh.2022.1003148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 09/28/2022] [Indexed: 01/27/2023]

Petrick LM, Shomron N. AI/ML-driven advances in untargeted metabolomics and exposomics for biomedical applications. Cell Rep Phys Sci 2022;3:100978. [PMID: 35936554 PMCID: PMC9354369 DOI: 10.1016/j.xcrp.2022.100978] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/15/2023]

Barupal DK, Mahajan P, Fakouri-Baygi S, Wright RO, Arora M, Teitelbaum SL. CCDB: A database for exploring inter-chemical correlations in metabolomics and exposomics datasets. Environ Int 2022;164:107240. [PMID: 35461097 PMCID: PMC9195052 DOI: 10.1016/j.envint.2022.107240] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/01/2022] [Revised: 04/01/2022] [Accepted: 04/08/2022] [Indexed: 05/18/2023]

Fakouri Baygi S, Kumar Y, Barupal DK. IDSL.IPA Characterizes the Organic Chemical Space in Untargeted LC/HRMS Data Sets. J Proteome Res 2022;21:1485-1494. [PMID: 35579321 DOI: 10.1021/acs.jproteome.2c00120] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/26/2022]

Keski-Rahkonen P, Robinson O, Alfano R, Plusquin M, Scalbert A. Commentary: Data Processing Thresholds for Abundance and Sparsity and Missed Biological Insights in an Untargeted Chemical Analysis of Blood Specimens for Exposomics. Front Public Health 2022;9:755837. [PMID: 35111711 PMCID: PMC8801530 DOI: 10.3389/fpubh.2021.755837] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/09/2021] [Accepted: 12/06/2021] [Indexed: 11/13/2022]