1
von Borries K, Holmquist H, Kosnik M, Beckwith KV, Jolliet O, Goodman JM, Fantke P. Potential for Machine Learning to Address Data Gaps in Human Toxicity and Ecotoxicity Characterization. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:18259-18270. [PMID: 37914529 PMCID: PMC10666540 DOI: 10.1021/acs.est.3c05300] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/14/2023] [Revised: 10/12/2023] [Accepted: 10/13/2023] [Indexed: 11/03/2023]
Abstract
Machine Learning (ML) is increasingly applied to fill data gaps in assessments to quantify impacts associated with chemical emissions and chemicals in products. However, the systematic application of ML-based approaches to fill chemical data gaps is still limited, and their potential for addressing a wide range of chemicals is unknown. We prioritized chemical-related parameters for chemical toxicity characterization to inform ML model development based on two criteria: (1) each parameter's relevance to robustly characterize chemical toxicity described by the uncertainty in characterization results attributable to each parameter and (2) the potential for ML-based approaches to predict parameter values for a wide range of chemicals described by the availability of chemicals with measured parameter data. We prioritized 13 out of 38 parameters for developing ML-based approaches, while flagging another nine with critical data gaps. For all prioritized parameters, we performed a chemical space analysis to assess further the potential for ML-based approaches to predict data for diverse chemicals considering the structural diversity of available measured data, showing that ML-based approaches can potentially predict 8-46% of marketed chemicals based on 1-10% with available measured data. Our results can systematically inform future ML model development efforts to address data gaps in chemical toxicity characterization.
Affiliation(s)
- Kerstin von Borries
  - Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, Bygningstorvet 115, 2800 Kgs. Lyngby, Denmark
- Hanna Holmquist
  - IVL Swedish Environmental Research Institute, Aschebergsgatan 44, 411 33 Göteborg, Sweden
- Marissa Kosnik
  - Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, Bygningstorvet 115, 2800 Kgs. Lyngby, Denmark
- Katie V. Beckwith
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Olivier Jolliet
  - Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, Bygningstorvet 115, 2800 Kgs. Lyngby, Denmark
- Jonathan M. Goodman
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Peter Fantke
  - Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, Bygningstorvet 115, 2800 Kgs. Lyngby, Denmark
2
Zhang B, Hou H, Liu L, Huang Z, Zhao L. Spatial prediction and influencing factors identification of potential toxic element contamination in soil of different karst landform regions using integration model. CHEMOSPHERE 2023; 327:138404. [PMID: 36931406 DOI: 10.1016/j.chemosphere.2023.138404] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/06/2023] [Revised: 03/05/2023] [Accepted: 03/13/2023] [Indexed: 06/18/2023]
Abstract
Predicting the contamination distribution of potentially toxic elements (PTEs) in the soils of Guangxi province, China, and identifying their controlling factors are challenging because of diverse bedrock types, intense leaching and weathering, and discontinuous terrain. Herein, we integrated random forest (RF) and empirical Bayesian kriging (EBK) to interpret and predict the complex distribution of PTE contamination across three karst landform regions (fenglin, fengcong, and isolated peak plain) in Guangxi province. The modeling results were compared with those of the commonly used ordinary kriging and regression-kriging. The RF-EBK model combines the advantages of RF and EBK to make predictions more accurate and efficient. The integrated RF-EBK model performed well for Cd and As concentrations, with R² of 0.89 and 0.83, respectively. The average RMSE and MAE of the integrated RF-EBK model were 39% and 44% lower, respectively, than those of regression-kriging, the second most accurate method. Furthermore, the modeling results showed that approximately 41.96% and 18.96% of the total area of Guangxi province was classified as polluted or worse (Igeo > 0) for Cd and As, respectively. Cd concentrations were higher in the soils of the fenglin and fengcong regions than in the isolated peak plain region, owing to secondary enrichment and parent rock inheritance, whereas As concentrations did not differ significantly among the three regions. The modeling results indicated that elevated Cd concentrations might be associated with soil CaO concentration and an alkaline soil environment, whereas As concentrations tended to increase with Fe2O3 concentration in a weakly acidic soil environment. These results confirm the applicability and effectiveness of the integrated model in predicting complex spatial patterns of soil PTEs and identifying their controlling factors.
Affiliation(s)
- Bolun Zhang
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China; School of Chemical & Environmental Engineering, China University of Mining and Technology-Beijing, Beijing, 100083, China
- Hong Hou
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Lingling Liu
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Zhanbin Huang
  - School of Chemical & Environmental Engineering, China University of Mining and Technology-Beijing, Beijing, 100083, China
- Long Zhao
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
3
Zhang B, Hou H, Huang Z, Zhao L. Estimation of heavy metal soil contamination distribution, hazard probability, and population at risk by machine learning prediction modeling in Guangxi, China. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2023; 330:121607. [PMID: 37031848 DOI: 10.1016/j.envpol.2023.121607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 03/20/2023] [Accepted: 04/07/2023] [Indexed: 05/27/2023]
Abstract
Owing to the superposition of diverse pollution sources, soil heavy metal concentrations exceed the recommended maximum permissible levels in many areas of Guangxi province, China. However, the contamination distribution, hazard probability, and population at risk of heavy metals across the entire province remain largely unclear. In this study, machine learning prediction models, with standard risk values set according to land use type, were used to identify high-risk areas and estimate the populations at risk from Cr and Ni based on 658 topsoil samples from Guangxi province. Our results showed that soil Cr and Ni contamination derived from carbonate rocks was relatively serious in Guangxi province, and that their co-enrichment during soil formation was associated with Fe and Mn oxides and an alkaline soil environment. Our established model exhibited excellent performance in predicting contamination distribution (R² > 0.85) and hazard probability (AUC > 0.85). Cr and Ni pollution decreased gradually from the central-west areas to the surrounding areas. The polluted area (Igeo > 0) accounted for approximately 24.46% and 29.24% of the total area of Guangxi province for Cr and Ni, respectively, but only 10.4% and 8.51% of the total area was classified as high-risk for Cr and Ni. We estimated that approximately 1.44 and 1.47 million people were potentially exposed to the risk of Cr and Ni contamination, respectively, mainly in Nanning, Laibin, and Guigang. These are the main heavily populated agricultural regions in Guangxi, so locating heavy metal contamination and controlling its risk in these regions is urgent and essential from the perspective of food safety.
Affiliation(s)
- Bolun Zhang
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China; School of Chemical & Environmental Engineering, China University of Mining and Technology-Beijing, Beijing, 100083, China
- Hong Hou
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Zhanbin Huang
  - School of Chemical & Environmental Engineering, China University of Mining and Technology-Beijing, Beijing, 100083, China
- Long Zhao
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
4
Shen Y, Zhao E, Zhang W, Baccarelli AA, Gao F. Predicting pesticide dissipation half-life intervals in plants with machine learning models. JOURNAL OF HAZARDOUS MATERIALS 2022; 436:129177. [PMID: 35643003 DOI: 10.1016/j.jhazmat.2022.129177] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/07/2022] [Revised: 05/04/2022] [Accepted: 05/15/2022] [Indexed: 06/15/2023]
Abstract
Pesticide dissipation half-life in plants is an important factor in assessing the environmental fate of pesticides and establishing pre-harvest intervals critical to good agricultural practices. However, empirically measured pesticide dissipation half-lives are highly variable, and accurate prediction with models is challenging. This study used a dataset of pesticide dissipation half-lives containing 1363 data points covering 311 pesticides, 10 plant types, and 4 plant component classes. Novel dissipation half-life intervals were proposed and predicted to account for the high variation in empirical data. Four machine learning models (i.e., gradient boosting regression tree [GBRT], random forest [RF], support vector classifier [SVC], and logistic regression [LR]) were developed to predict dissipation half-life intervals using extended connectivity fingerprints (ECFP), temperature, plant type, and plant component class as model inputs. GBRT-ECFP had the best model performance, with an F1-microbinary score of 0.698 ± 0.010 for the binary classification, compared with other machine learning models (e.g., LR-ECFP, F1-microbinary = 0.662 ± 0.009). Feature importance analysis of molecular structures in the binary classification identified aromatic rings, carbonyl groups, organophosphate, =C-H, and N-containing heterocyclic groups as important substructures related to pesticide dissipation half-lives. This study suggests the utility of machine learning models in assessing the environmental fate of pesticides in agricultural crops.
Affiliation(s)
- Yike Shen
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032, United States
- Ercheng Zhao
  - Institute of Plant Protection, Beijing Academy of Agricultural and Forestry Science, Beijing 100097, PR China
- Wei Zhang
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan 48823, United States
- Andrea A Baccarelli
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032, United States
- Feng Gao
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032, United States