1. Cain CN, Ochoa GS, Synovec RE. Enhancing partial least squares modeling of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry data by tile-based variance ranking. J Chromatogr A 2023; 1694:463920. [PMID: 36933463 DOI: 10.1016/j.chroma.2023.463920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 03/02/2023] [Accepted: 03/07/2023] [Indexed: 03/12/2023]
Abstract
Chemometric methods like partial least squares (PLS) regression are valuable for correlating sample-based differences hidden in comprehensive two-dimensional gas chromatography (GC × GC) data to independently measured physicochemical properties. This work establishes the first implementation of tile-based variance ranking as a selective data reduction methodology to improve PLS modeling performance of 58 diverse aerospace fuels. Tile-based variance ranking discovered a total of 521 analytes with a square of the relative standard deviation (RSD²) in signal between 0.07 and 22.84. The goodness-of-fit of the models was determined by their normalized root-mean-square error of cross-validation (NRMSECV) and normalized root-mean-square error of prediction (NRMSEP). PLS models developed for viscosity, hydrogen content, and heat of combustion using all 521 features discovered by tile-based variance ranking had a respective NRMSECV (NRMSEP) equal to 10.5 % (10.2 %), 8.3 % (7.6 %), and 13.1 % (13.5 %). In contrast, use of a single-grid binning scheme, a common data reduction strategy for PLS analysis, resulted in less accurate models for viscosity (NRMSECV = 14.2 %; NRMSEP = 14.3 %), hydrogen content (NRMSECV = 12.1 %; NRMSEP = 11.0 %), and heat of combustion (NRMSECV = 14.4 %; NRMSEP = 13.6 %). Further, the features discovered by tile-based variance ranking can be optimized for each PLS model with RReliefF analysis, a machine learning algorithm. RReliefF feature optimization selected 48, 125, and 172 analytes out of the original 521 discovered by tile-based variance ranking to model viscosity, hydrogen content, and heat of combustion, respectively. The RReliefF optimized features developed highly accurate property-composition models for viscosity (NRMSECV = 7.9 %; NRMSEP = 5.8 %), hydrogen content (NRMSECV = 7.0 %; NRMSEP = 4.9 %), and heat of combustion (NRMSECV = 7.9 %; NRMSEP = 8.4 %). This work also demonstrates that processing the chromatograms with a tile-based approach allows the analyst to directly identify the analytes of importance in a PLS model. Coupling tile-based feature selection with PLS analysis allows for deeper understanding in any property-composition study.
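The two quantities this abstract reports, the squared relative standard deviation used for ranking and the normalized root-mean-square error used for scoring, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiling step and the PLS model are omitted, the feature names and numbers are hypothetical, the sample variance (n − 1) is assumed, and normalizing the RMSE by the range of the measured values is an assumption, since the abstract does not state the convention used.

```python
import math
import statistics


def rsd_squared(signals):
    """Squared relative standard deviation: sample variance / mean^2."""
    mean = statistics.fmean(signals)
    return statistics.variance(signals) / mean ** 2


def rank_by_variance(features):
    """Return feature names ordered from highest to lowest RSD^2."""
    return sorted(features, key=lambda name: rsd_squared(features[name]), reverse=True)


def nrmse(measured, predicted):
    """RMSE divided by the range of the measured values (assumed convention)."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    rmse = math.sqrt(statistics.fmean(squared_errors))
    return rmse / (max(measured) - min(measured))


# Hypothetical signals for three features across four samples.
features = {
    "tile_a": [10.0, 10.5, 9.5, 10.0],
    "tile_b": [1.0, 5.0, 9.0, 13.0],
    "tile_c": [4.0, 6.0, 5.0, 5.0],
}
ranking = rank_by_variance(features)

# Hypothetical measured property values and model predictions.
measured = [1.0, 2.0, 3.0, 5.0]
predicted = [1.5, 2.0, 2.5, 5.0]
error_percent = 100 * nrmse(measured, predicted)
```

In this toy data the feature whose signal varies most relative to its mean ranks first, which is the behavior a variance-based data reduction step relies on before any property values are consulted.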
Affiliation(s)
- Caitlin N Cain
  - Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA
- Grant S Ochoa
  - Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA
- Robert E Synovec
  - Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA
2. Trinklein TJ, Cain CN, Ochoa GS, Schöneich S, Mikaliunaite L, Synovec RE. Recent Advances in GC×GC and Chemometrics to Address Emerging Challenges in Nontargeted Analysis. Anal Chem 2023; 95:264-286. [PMID: 36625122 DOI: 10.1021/acs.analchem.2c04235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Affiliation(s)
- Timothy J Trinklein
  - Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
- Caitlin N Cain
  - Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
- Grant S Ochoa
  - Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
- Sonia Schöneich
  - Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
- Lina Mikaliunaite
  - Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
- Robert E Synovec
  - Department of Chemistry, University of Washington, Box 351700, Seattle, Washington 98195-1700, United States
3. Trinklein TJ, Synovec RE. Simulating comprehensive two-dimensional gas chromatography mass spectrometry data with realistic run-to-run shifting to evaluate the robustness of tile-based Fisher ratio analysis. J Chromatogr A 2022; 1677:463321. [PMID: 35853427 DOI: 10.1016/j.chroma.2022.463321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 07/01/2022] [Accepted: 07/07/2022] [Indexed: 10/17/2022]
Abstract
Untargeted analysis of comprehensive two-dimensional (2D) gas chromatography time-of-flight mass spectrometry (GC×GC-TOFMS) data has the potential to be hindered by run-to-run retention time shifting. To address this challenge, tile-based Fisher ratio (F-ratio) analysis (FRA) has been developed, which utilizes a supervised, untargeted approach involving a chromatographic segmentation routine termed "tiling" combined with the ANOVA F-ratio statistic to discover class-distinguishing analytes while minimizing false positives arising from shifting. The tiling algorithm is designed to account for retention shifting in both separation dimensions. Although applications of FRA have been reported, there remains a need to thoroughly evaluate the robustness of FRA for different levels of run-to-run retention shifting in order to broaden the scope of its application. To this end, a novel method of simulating GC×GC-TOFMS chromatograms with realistic run-to-run shifting is presented by random generation of low-frequency "shift functions". The dimensionless retention-time precision, <δr>, which is four times the standard deviation in retention time normalized to the peak width-at-base, is used as a key modeling variable along with the 2D chromatographic saturation, αe,2D, and the within-class relative standard deviation in peak area, RSDwc. We demonstrate that all three of these variables operate together to impact true positive discovery. To quantify the "success" of true positive discovery, GC×GC-TOFMS datasets for various combinations of <δr>, αe,2D, and RSDwc were simulated and then analyzed by FRA using a wide range of relative tile areas (RTA), a dimensionless measure of tile size. Since each hit in the FRA hit list was known a priori as either a true or false positive based on the simulation inputs, receiver operating characteristic (ROC) curves were readily constructed. Then, the area under the ROC curve (AUROC) was used as a metric for discovery "success" for various combinations of the modeling variables. Based on the results of this study, recommendations for tile size selection and experimental design are provided, and further supported by comparison to previous tile-based FRA applications. For instance, values for <δr>, αe,2D, and RSDwc obtained from a GC×GC-TOFMS dataset of yeast metabolites suggested an optimum RTA of 6.25, corresponding closely to the RTA of 4.00 employed in the study, implying the simulation results obtained here can be generalized to real datasets.
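Three quantities named in this abstract can be sketched in a few lines of Python: the one-way ANOVA F-ratio, the AUROC of a ranked hit list, and the dimensionless retention-time precision as the abstract defines it. This is only an illustrative sketch with hypothetical numbers. It does not reproduce the tiling algorithm or the shift-function simulation, and the function names are placeholders rather than anything taken from the paper or its software.

```python
import statistics


def fisher_ratio(classes):
    """One-way ANOVA F-ratio: between-class over within-class variance."""
    all_values = [v for group in classes for v in group]
    grand_mean = statistics.fmean(all_values)
    k, n = len(classes), len(all_values)
    between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in classes
    ) / (k - 1)
    within = sum(
        (v - statistics.fmean(g)) ** 2 for g in classes for v in g
    ) / (n - k)
    return between / within


def auroc(scores, is_true_positive):
    """Chance that a random true positive outscores a random false positive.

    Ties count as half a win (rank-based form of the area under the ROC curve).
    """
    pos = [s for s, t in zip(scores, is_true_positive) if t]
    neg = [s for s, t in zip(scores, is_true_positive) if not t]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def retention_precision(retention_times, width_at_base):
    """Four times the std dev in retention time, divided by peak width-at-base."""
    return 4 * statistics.stdev(retention_times) / width_at_base


# Hypothetical signals for one feature in two sample classes.
f_value = fisher_ratio([[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]])

# Hypothetical hit list: F-ratios with known true/false positive labels.
hit_scores = [54.0, 12.0, 3.0, 2.0, 1.0]
hit_labels = [True, True, False, True, False]
discovery_auroc = auroc(hit_scores, hit_labels)

# Hypothetical retention times (s) for one peak over three runs, 1.6 s wide at base.
delta_r = retention_precision([100.0, 100.2, 99.8], 1.6)
```

An AUROC of 1.0 would mean every true positive ranks above every false positive; in the toy hit list one true positive falls below a false positive, so the value drops below 1.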
Affiliation(s)
- Timothy J Trinklein
  - Department of Chemistry, Box 351700, University of Washington, Seattle, WA 98195, USA
- Robert E Synovec
  - Department of Chemistry, Box 351700, University of Washington, Seattle, WA 98195, USA
4. Sudol PE, Ochoa GS, Cain CN, Synovec RE. Tile-based variance rank initiated-unsupervised sample indexing for comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry. Anal Chim Acta 2022; 1209:339847. [DOI: 10.1016/j.aca.2022.339847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Revised: 03/13/2022] [Accepted: 04/16/2022] [Indexed: 11/30/2022]
5. Sudol PE, Galletta M, Tranchida PQ, Zoccali M, Mondello L, Synovec RE. Untargeted profiling and differentiation of geographical variants of wine samples using headspace solid-phase microextraction flow-modulated comprehensive two-dimensional gas chromatography with the support of tile-based Fisher ratio analysis. J Chromatogr A 2021; 1662:462735. [PMID: 34936905 DOI: 10.1016/j.chroma.2021.462735] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/12/2021] [Revised: 11/29/2021] [Accepted: 12/02/2021] [Indexed: 12/25/2022]
Abstract
The volatile fraction of food, also called the food volatilome, is increasingly used to develop new fingerprinting approaches. The characterization of the food volatilome is important to achieve desired flavor profiles in food production processes, or to distinguish between products, with winemaking being one popular area of interest. In the present research, headspace solid-phase microextraction (HS SPME) coupled to flow-modulated comprehensive two-dimensional gas chromatography with time-of-flight mass spectrometry (FM GC×GC-TOFMS) was used to characterize geographical-based differences in the volatilome of five white "Grillo" wines (of Sicilian origin), comprising the five sample classes. All wines were produced with the same vinification method in 2019. To minimize the influence of minor bottle-to-bottle differences, three bottles of the same wine were randomly selected, and three samples were collected per bottle, resulting in nine sample replicates per wine. Particular emphasis was devoted to the operational conditions of a novel low duty cycle flow modulator. A fast FM GC×GC-TOFMS method with a modulation period of 700 ms and a re-injection period of 80 ms was developed. Next, the instrument software (i.e., ChromaTOF Tile) was used to identify class-distinguishing analytes in the dataset via tile-based Fisher ratio analysis. A tile size of 10 modulations (7 s) on the first dimension and 45 spectra (300 ms) on the second dimension was used to encompass average peak widths and to account for minor retention time shifting. Off-line software was used to apply an ANOVA test. A p-value of 0.01 was applied in order to select the most important class-distinguishing analytes, which were input to principal component analysis (PCA). The PCA scores plot showed distinct clustering of the wines according to geographical origin, although the loadings revealed that only a few analytes were necessary to differentiate the wines. However, a comprehensive flavor profile assessment underscored the importance of all the information output by the ChromaTOF Tile software.
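The tile dimensions and replicate counts quoted in this abstract follow from simple arithmetic, checked in the short Python sketch below. The constant names are illustrative. Two values are inferences rather than statements from the abstract: the spectral acquisition rate implied by 45 spectra spanning 300 ms, and the total number of chromatograms, which assumes each sample replicate was analyzed once.

```python
# Values quoted in the abstract (times in milliseconds).
MODULATION_PERIOD_MS = 700
TILE_MODULATIONS = 10
TILE_SPECTRA = 45
TILE_SECOND_DIM_MS = 300

# Sampling design quoted in the abstract.
WINES = 5
BOTTLES_PER_WINE = 3
SAMPLES_PER_BOTTLE = 3

# First-dimension tile width in seconds: 10 modulations of 700 ms each.
tile_first_dim_s = TILE_MODULATIONS * MODULATION_PERIOD_MS / 1000

# Inferred, not stated in the abstract: spectra per second implied by
# 45 spectra covering 300 ms on the second dimension.
implied_spectra_per_s = TILE_SPECTRA * 1000 / TILE_SECOND_DIM_MS

# Replicates per wine class, and total runs assuming one run per replicate.
replicates_per_wine = BOTTLES_PER_WINE * SAMPLES_PER_BOTTLE
total_chromatograms = WINES * replicates_per_wine

print(tile_first_dim_s, implied_spectra_per_s, replicates_per_wine, total_chromatograms)
```

The first-dimension result matches the 7 s stated in the abstract, and the replicate count matches the nine replicates per wine.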
Affiliation(s)
- Paige E Sudol
  - Department of Chemistry, Box 351700, University of Washington, Seattle, WA 98195, United States of America
- Micaela Galletta
  - Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy
- Peter Q Tranchida
  - Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy
- Mariosimone Zoccali
  - Department of Mathematical and Computer Science, Physical Sciences and Earth Sciences, University of Messina, Messina, Italy
- Luigi Mondello
  - Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy; Chromaleont s.r.l., c/o Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy; BeSep s.r.l., c/o Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy; Unit of Food Science and Nutrition, Department of Medicine, University Campus Bio-Medico of Rome, Rome, Italy
- Robert E Synovec
  - Department of Chemistry, Box 351700, University of Washington, Seattle, WA 98195, United States of America