Ekpe OD, Choo G, Kang JK, Yun ST, Oh JE. Identification of organic chemical indicators for tracking pollution sources in groundwater by machine learning from GC-HRMS-based suspect and non-target screening data.
Water Research 2024; 252:121130. [PMID: 38295453 DOI: 10.1016/j.watres.2024.121130]
[Received: 10/28/2023] [Revised: 01/05/2024] [Accepted: 01/10/2024] [Indexed: 02/02/2024]
Abstract
In this study, the analytical power of gas chromatography coupled to high-resolution mass spectrometry (GC-HRMS) in suspect and non-target screening (SNTS) of organic micropollutants was combined with machine learning tools to propose a novel and robust systematic environmental forensics workflow focused on groundwater contamination. Groundwater samples were collected from four regions with different contamination histories (oil [OC], agricultural [AGR], industrial [IND], and landfill [LF]), and a total of 252 organic micropollutants were identified, including pharmaceuticals, personal care products, pesticides, polycyclic aromatic hydrocarbons, plasticizers, phenols, organophosphate flame retardants, transformation products, and others, with detection frequencies ranging from 3% to 100%. Among the SNTS-identified compounds, 51 chemical indicators (OC: 13, LF: 12, AGR: 19, IND: 7), comprising level 1 and level 2 SNTS identifications, were pinpointed across all sampling regions by integrating a bootstrapped feature selection method involving the bootfs algorithm with a partial least squares discriminant analysis (PLS-DA) model to determine potential prevalent contamination sources. The proposed workflow showed good predictive ability (Q² = 0.897), and the suggested contamination sources were gasoline, diesel, and/or other light petroleum products for the OC region, anthropogenic activities for the LF region, agricultural and human activities for the AGR region, and industrial/human activities for the IND region. These results suggest that the proposed workflow can select a subset of the most diagnostic features in the chemical space that best distinguish a specific contamination source class.
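As a rough illustration of the idea behind bootstrapped indicator selection, the sketch below resamples the samples with replacement, ranks features by how well they separate one source class from the rest, and keeps the features that rank highly in most resamples. It is not the bootfs algorithm or the authors' PLS-DA model: the difference-of-means score, the function names, the thresholds, and the toy intensity table are all assumptions made for illustration, and the features are assumed to be pre-scaled. The `q2` helper shows the usual definition Q² = 1 - PRESS/TSS, computed from cross-validated predictions supplied by the caller.

```python
import random
from collections import Counter
from statistics import mean


def separation_score(values, labels, target):
    """Hypothetical score: |mean(target class) - mean(all other classes)|."""
    inside = [v for v, lab in zip(values, labels) if lab == target]
    outside = [v for v, lab in zip(values, labels) if lab != target]
    return abs(mean(inside) - mean(outside))


def bootstrap_indicators(X, labels, target, n_boot=200, top_k=1,
                         min_freq=0.8, seed=0):
    """Return indices of features ranked in the top_k in >= min_freq of resamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        boot_labels = [labels[i] for i in idx]
        n_in = boot_labels.count(target)
        if n_in == 0 or n_in == n:
            continue  # resample lacks one of the two groups; nothing to rank
        scores = []
        for j in range(p):
            column = [X[i][j] for i in idx]
            scores.append((separation_score(column, boot_labels, target), j))
        scores.sort(reverse=True)
        counts.update(j for _, j in scores[:top_k])
    return sorted(j for j, c in counts.items() if c / n_boot >= min_freq)


def q2(y_true, y_cv_pred):
    """Q2 = 1 - PRESS/TSS, with y_cv_pred taken from cross-validation."""
    y_bar = mean(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_cv_pred))
    tss = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - press / tss


# Made-up, pre-scaled intensities (rows: samples, columns: features).
# Feature 0 is elevated only in OC samples, feature 1 only in AGR samples.
labels = ["OC"] * 3 + ["AGR"] * 3 + ["IND"] * 3 + ["LF"] * 3
X = [
    [9.0, 0.3, 0.5, 0.1], [10.0, 0.1, 0.2, 0.7], [11.0, 0.2, 0.9, 0.4],
    [0.2, 9.0, 0.4, 0.6], [0.1, 10.0, 0.8, 0.2], [0.3, 11.0, 0.1, 0.9],
    [0.2, 0.3, 0.6, 0.3], [0.1, 0.2, 0.3, 0.8], [0.3, 0.1, 0.7, 0.5],
    [0.1, 0.1, 0.2, 0.2], [0.2, 0.3, 0.9, 0.6], [0.3, 0.2, 0.5, 0.4],
]

oc_indicators = bootstrap_indicators(X, labels, "OC")
agr_indicators = bootstrap_indicators(X, labels, "AGR")
```

Counting how often a feature is selected across resamples, rather than trusting a single fit, is what makes the chosen subset less sensitive to individual samples. In a real workflow the simple score would be replaced by a multivariate classifier such as PLS-DA.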