1
Li H, Chu Y, Zhu Y, Han X, Shu S. Trihalomethane prediction model for water supply system based on machine learning and Log-linear regression. Environ Geochem Health 2024; 46:31. [PMID: 38227052 DOI: 10.1007/s10653-023-01778-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 11/09/2023] [Indexed: 01/17/2024]
Abstract
Laboratory determination of trihalomethanes (THMs) is time-consuming. A model that predicts THMs from easily obtained water quality parameters would therefore be very helpful. This study explored random forest regression (RFR), support vector regression (SVR), and Log-linear regression models for predicting the concentrations of total trihalomethanes (T-THMs), bromodichloromethane (BDCM), and dibromochloromethane (DBCM), using nine water quality parameters as input variables. The models were developed and tested on a dataset of 175 samples collected from a water treatment plant. With its optimal parameter combination, the RFR model outperformed the Log-linear regression model in predicting T-THMs (N25 = 82-88%, rp = 0.70-0.80), while the SVR model performed slightly better than the RFR model in predicting BDCM (N25 = 85-98%, rp = 0.70-0.97). The RFR model outperformed the other two models in predicting T-THMs and DBCM. The study concludes that the RFR model is superior overall to the SVR and Log-linear regression models and could be used to monitor THMs concentrations in water supply systems.
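The abstract reports N25 and rp without defining them. In THM modeling studies, N25 is conventionally the percentage of predictions that fall within ±25% of the measured value, and rp is the Pearson correlation coefficient between predicted and measured values; the sketch below assumes those definitions. It fits a Log-linear model of the common form ln(THM) = b0 + Σ bi·ln(xi) by ordinary least squares and scores it with both metrics. The two predictors, the synthetic data, and all function names are hypothetical; the study's nine input parameters and exact model form are not given in the abstract.

```python
import numpy as np


def fit_log_linear(X, y):
    """Fit ln(y) = b0 + sum_i b_i * ln(x_i) by ordinary least squares.

    X: (n_samples, n_features) array of strictly positive predictors.
    y: (n_samples,) array of strictly positive concentrations.
    Returns the coefficient vector [b0, b1, ..., bp].
    """
    design = np.column_stack([np.ones(len(X)), np.log(X)])
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    return coef


def predict_log_linear(coef, X):
    """Back-transform the log-scale prediction to concentration units."""
    return np.exp(coef[0] + np.log(X) @ coef[1:])


def n25(measured, predicted):
    """Percentage of predictions within +/-25% of the measured value (assumed definition)."""
    rel_err = np.abs(predicted - measured) / measured
    return 100.0 * np.mean(rel_err <= 0.25)


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    return float(np.corrcoef(measured, predicted)[0, 1])


# Hypothetical data: two positive water-quality predictors, 175 samples.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 5.0, size=(175, 2))
true_coef = np.array([np.log(20.0), 0.6, 0.3])
y = np.exp(true_coef[0] + np.log(X) @ true_coef[1:] + rng.normal(0, 0.1, 175))

coef = fit_log_linear(X, y)
y_hat = predict_log_linear(coef, X)
print(f"coefficients: {np.round(coef, 2)}")
print(f"N25 = {n25(y, y_hat):.1f}%, rp = {pearson_r(y, y_hat):.3f}")
```

Fitting on the log scale and exponentiating the prediction makes the model multiplicative in the original units; RFR or SVR predictions could be scored with the same two metric functions.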
Affiliation(s)
- Hui Li
- College of Environmental Science and Engineering, Donghua University, No. 2999 North Renmin Road, Shanghai, 201620, China
- Yangyang Chu
- College of Environmental Science and Engineering, Donghua University, No. 2999 North Renmin Road, Shanghai, 201620, China
- Yanping Zhu
- College of Environmental Science and Engineering, Donghua University, No. 2999 North Renmin Road, Shanghai, 201620, China
- Xiaomeng Han
- College of Environmental Science and Engineering, Donghua University, No. 2999 North Renmin Road, Shanghai, 201620, China
- Shihu Shu
- College of Environmental Science and Engineering, Donghua University, No. 2999 North Renmin Road, Shanghai, 201620, China
2
Li X, Li Z, Shen H, Zhao H, Qin G, Xue J. Effects of long-term and low-concentration exposures of benzene and formaldehyde on mortality of Drosophila melanogaster. Environ Pollut 2022; 300:118924. [PMID: 35104555 DOI: 10.1016/j.envpol.2022.118924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Revised: 01/07/2022] [Accepted: 01/26/2022] [Indexed: 06/14/2023]
Abstract
Single-chemical thresholds cannot comprehensively evaluate the risk of exposure to chemical mixtures in indoor air. Moreover, much of the existing research has focused on short-term, high-concentration co-exposure scenarios in different species and with diverse endpoints, which hampers the application and improvement of existing risk evaluation models for chemical mixture exposures. More importantly, current risk evaluation models are not user-friendly for construction practitioners who lack sufficient toxicological knowledge. Therefore, in this study, an inhalation experiment system and a hazard index (HI) were developed to investigate the risks of low-concentration, long-term inhalation exposure to formaldehyde and benzene, individually and combined, based on Drosophila melanogaster mortality. The results showed that the system reproducibly provided stable exposure concentrations throughout the D. melanogaster life cycle. Furthermore, within the range of experimental concentrations, the interaction between formaldehyde and benzene was additive or synergistic, depending on concentration and mixing ratio. This study helps harmonise and provide toxicity data for long-term, low-concentration exposure scenarios, which supports the establishment of a new user-friendly risk evaluation model for indoor chemical mixture exposures. The proposed HI value could indicate the degree of hazard of long-term inhalation exposure to formaldehyde and benzene, individually and combined, for D. melanogaster. However, further experiments are required to establish whether the index applies to the exposure risks of other volatile organic compounds (VOCs) to D. melanogaster.
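The abstract does not give the formula for its HI. For orientation, the conventional mixture hazard index used in risk assessment sums hazard quotients, HI = Σ Ci/RVi, where Ci is the exposure concentration of chemical i and RVi is a reference (threshold) value; HI > 1 flags a potential concern under a dose-additivity assumption. The sketch below implements only that textbook form with made-up concentrations and reference values; the study's mortality-based HI may well be defined differently.

```python
from typing import Dict


def hazard_quotient(concentration: float, reference_value: float) -> float:
    """Ratio of exposure concentration to a reference (threshold) value."""
    if reference_value <= 0:
        raise ValueError("reference value must be positive")
    return concentration / reference_value


def hazard_index(concentrations: Dict[str, float], reference_values: Dict[str, float]) -> float:
    """Conventional dose-additive HI: sum of hazard quotients over all chemicals."""
    return sum(
        hazard_quotient(c, reference_values[name]) for name, c in concentrations.items()
    )


# Hypothetical exposure and reference concentrations (mg/m^3), for illustration only.
exposure = {"formaldehyde": 0.06, "benzene": 0.05}
reference = {"formaldehyde": 0.10, "benzene": 0.11}

hi = hazard_index(exposure, reference)
print(f"HI = {hi:.2f} -> {'potential concern' if hi > 1 else 'below threshold'}")
```

Because this form assumes dose addition, it would understate the hazard in cases where, as the study reports for some concentrations and ratios, the two chemicals act synergistically.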
Affiliation(s)
- Xiaoying Li
- College of Mechanical Engineering, Tongji University, Shanghai, 200092, China
- Zhenhai Li
- College of Mechanical Engineering, Tongji University, Shanghai, 200092, China
- Hao Shen
- Shanghai Institute of Measurement and Testing Technology, Shanghai, 201203, China
- Haishan Zhao
- Shanghai Institute of Measurement and Testing Technology, Shanghai, 201203, China
- Guojun Qin
- College of Mechanical Engineering, Tongji University, Shanghai, 200092, China
- Jingchuan Xue
- College of Mechanical Engineering, Tongji University, Shanghai, 200092, China
3
Yan H, Fan W, Chen X, Wang H, Qin C, Jiang X. Component spectra extraction and quantitative analysis for preservative mixtures by combining terahertz spectroscopy and machine learning. Spectrochim Acta A Mol Biomol Spectrosc 2022; 271:120908. [PMID: 35077979 DOI: 10.1016/j.saa.2022.120908] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/16/2021] [Revised: 01/09/2022] [Accepted: 01/13/2022] [Indexed: 06/14/2023]
Abstract
Preservatives are commonly used in synergistic combinations to enhance their antimicrobial effect. Identifying the composition of preservative mixtures and quantifying their components are crucial steps in quality monitoring to guarantee product safety. In this work, three of the most common preservatives, sorbic acid, potassium sorbate, and sodium benzoate, were mixed in pairs at different mass ratios to serve as "unknown" multicomponent systems, which were measured by terahertz (THz) time-domain spectroscopy. Three major tasks were then accomplished with machine learning methods. Singular value decomposition (SVD) effectively determined the number of components in the mixed preservatives. The component spectra were then successfully extracted by non-negative matrix factorization (NMF) and self-modeling mixture analysis (SMMA), and they match the measured THz spectra of the pure reagents well. Finally, support vector regression (SVR) was used to build a model for the target components and simultaneously determine the content of each component in the validation mixtures, with a coefficient of determination R² = 0.989. By taking advantage of fingerprint-based THz spectroscopy and machine learning methods, the approach demonstrates great potential as a useful strategy for detecting preservative mixtures in practical applications.
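The SVD and NMF steps can be illustrated on synthetic data. The sketch below builds mixture spectra as non-negative combinations of two made-up Gaussian bands, estimates the number of components by counting singular values above a relative threshold, and then extracts non-negative component spectra with Lee-Seung multiplicative-update NMF. The band shapes, noise level, 1% threshold, and iteration count are assumptions chosen for illustration; SMMA and the SVR quantification step are not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "pure" spectra: two Gaussian bands on a 200-point frequency grid.
freq = np.arange(200)
pure = np.vstack([
    np.exp(-((freq - 60) ** 2) / (2 * 10.0 ** 2)),
    np.exp(-((freq - 140) ** 2) / (2 * 10.0 ** 2)),
])

# Binary mixtures at mass fractions c and 1 - c, plus small measurement noise.
c = np.linspace(0.1, 0.9, 9)
conc = np.column_stack([c, 1 - c])
mixtures = conc @ pure + rng.normal(0, 1e-3, size=(9, 200))


def estimate_n_components(data, rel_tol=0.01):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(data, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0])), s


def nmf(V, rank, n_iter=2000, eps=1e-12, seed=0):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F with W, H >= 0."""
    r = np.random.default_rng(seed)
    W = r.uniform(0.1, 1.0, size=(V.shape[0], rank))
    H = r.uniform(0.1, 1.0, size=(rank, V.shape[1]))
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update component spectra
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update mixture weights
        errors.append(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
    return W, H, errors


n_comp, singular_values = estimate_n_components(mixtures)
V = np.clip(mixtures, 0, None)  # NMF needs non-negative input
W, H, errors = nmf(V, n_comp)
print("estimated number of components:", n_comp)
print(f"relative reconstruction error: {errors[-1]:.4f}")
```

NMF factors are determined only up to scaling and permutation, so the recovered rows of `H` would normally be normalized before being compared with measured spectra of the pure reagents.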
Affiliation(s)
- Hui Yan
- State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China; College of Science, Zhongyuan University of Technology, Zhengzhou Key Laboratory of Low-dimensional Quantum Materials and Devices, Zhengzhou 450007, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wenhui Fan
- State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China; University of Chinese Academy of Sciences, Beijing 100049, China; Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan 030006, China
- Xu Chen
- State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China
- Hanqi Wang
- State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Chong Qin
- State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaoqiang Jiang
- State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China; University of Chinese Academy of Sciences, Beijing 100049, China
4
Zhang Y, Liang X. Understanding Organic Nonpoint-Source Pollution in Watersheds via Pollutant Indicators, Disinfection By-Product Precursor Predictors, and Composition of Dissolved Organic Matter. J Environ Qual 2019; 48:102-116. [PMID: 30640343 DOI: 10.2134/jeq2018.06.0228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The analytical techniques and instruments used to assess organic pollution loading from agricultural and rural nonpoint sources are usually complex and expensive. There is strong demand for alternative methods to determine the presence and composition of organic pollutants and to predict their levels. In this work, we investigated a simple, inexpensive approach that combines excitation-emission matrix (EEM) fluorescence spectroscopy with support vector machines to measure pollution and predict the levels of disinfection by-product precursors, which are organic pollutants derived from agricultural and rural nonpoint sources in small watersheds. Using parallel factor analysis (PARAFAC), a four-component model was developed to explain the composition of dissolved organic matter in water affected by nonpoint-source pollution. Support vector classification and support vector regression using the model components allow fluorescence properties to serve as proxy indicators of nonpoint-source pollution. With the model components as input variables, the formation potential of disinfection by-products can be predicted. This method gives water utility managers tools to control pollution, monitor aquatic environments, and ensure drinking water safety.
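PARAFAC decomposes a stack of EEMs (samples × excitation × emission) into a sum of rank-one components, each with a sample score, an excitation loading, and an emission loading; the sample scores are the kind of model components that would then be fed to the support vector models. The sketch below is a bare-bones CP/PARAFAC decomposition by alternating least squares on a synthetic rank-2 tensor. Array sizes, rank, and iteration count are assumptions; practical EEM-PARAFAC workflows typically add scatter removal, non-negativity constraints, and split-half validation, none of which is shown here.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: row j*K + k holds B[j] * C[k]."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])


def cp_als(X, rank, n_iter=200, seed=0):
    """Rank-`rank` CP/PARAFAC decomposition of a 3-way array by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    # Mode-n unfoldings whose column ordering matches khatri_rao's row ordering.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    errors = []
    for _ in range(n_iter):
        # Each update is the exact least-squares solution with the other two factors fixed.
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        approx = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(np.linalg.norm(X - approx) / np.linalg.norm(X))
    return A, B, C, errors


# Hypothetical EEM stack: 20 samples, 30 excitation x 40 emission points, 2 fluorophores.
rng = np.random.default_rng(42)
ex = np.linspace(0, 1, 30)
em = np.linspace(0, 1, 40)
true_B = np.column_stack([np.exp(-((ex - 0.3) / 0.1) ** 2), np.exp(-((ex - 0.7) / 0.1) ** 2)])
true_C = np.column_stack([np.exp(-((em - 0.4) / 0.1) ** 2), np.exp(-((em - 0.6) / 0.1) ** 2)])
true_A = rng.uniform(0.1, 2.0, size=(20, 2))  # per-sample component intensities
X = np.einsum("ir,jr,kr->ijk", true_A, true_B, true_C)

A, B, C, errors = cp_als(X, rank=2)
print(f"relative fit error after {len(errors)} sweeps: {errors[-1]:.2e}")
# Rows of A are per-sample component scores, usable as regression/classification inputs.
```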
5
Semi-correlations combined with the index of ideality of correlation: a tool to build up model of mutagenic potential. Mol Cell Biochem 2018; 452:133-140. [PMID: 30074137 DOI: 10.1007/s11010-018-3419-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/05/2018] [Accepted: 07/28/2018] [Indexed: 02/01/2023]
Abstract
Mutagenicity is the ability of a substance to induce mutations. This hazardous property is decisive from the point of view of ecotoxicology. The number of substances used for practical needs grows every year, so methods for at least a preliminary estimation of the mutagenic potential of new substances are necessary. Semi-correlations are a special case of traditional correlations and can be described as "correlations along two parallel lines." This kind of correlation has been tested as a tool to predict endpoints represented by only two values: "inactive/active" (0/1). Here, this approach is used to build predictive models of mutagenicity on a large dataset (n = 3979). The so-called index of ideality of correlation (IIC) was tested as a statistical criterion to estimate the semi-correlation. Three random splits of the experimental data into training, invisible training, calibration, and validation sets were analyzed. Two models were built for each split: the first based on optimization without the IIC, and the second based on Monte Carlo optimization that involves the IIC. The best model (calculated taking the IIC into account) had the following statistical characteristics: n = 969; sensitivity = 0.8050; specificity = 0.9069; accuracy = 0.8648; Matthews correlation coefficient = 0.7196. Thus, using the IIC improves the statistical quality of binary classification models of the mutagenic potential (Ames test) of organic compounds.
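The statistics quoted for the best model all follow from a 2×2 confusion matrix. The sketch below computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient from 0/1 labels. It also adds an IIC function; the abstract does not define the IIC, so the sketch assumes the form usually given in the CORAL literature by Toropov and Toropova: IIC = r × min(-MAE, +MAE) / max(-MAE, +MAE), where -MAE and +MAE are the mean absolute residuals (observed - calculated) of the negative and non-negative residual groups. The toy labels and values are made up.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]):
    """Return (TP, TN, FP, FN) for binary labels coded 1 = active, 0 = inactive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_stats(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def iic(observed, calculated):
    """Index of ideality of correlation (form assumed from the CORAL literature)."""
    residuals = [o - c for o, c in zip(observed, calculated)]
    neg = [abs(d) for d in residuals if d < 0]
    pos = [abs(d) for d in residuals if d >= 0]  # assumption: zero residuals count as non-negative
    mae_neg = sum(neg) / len(neg) if neg else 0.0
    mae_pos = sum(pos) / len(pos) if pos else 0.0
    r = pearson_r(observed, calculated)
    largest = max(mae_neg, mae_pos)
    return r if largest == 0 else r * min(mae_neg, mae_pos) / largest


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_stats(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.5)
```

When the residuals are balanced on both sides of the fit, the ratio term approaches 1 and the IIC approaches r; lopsided residuals pull it toward 0, which is why it can act as an extra optimization criterion.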
6
Wang CC, Lin YC, Lin YC, Jhang SR, Tung CW. Identification of informative features for predicting proinflammatory potentials of engine exhausts. Biomed Eng Online 2017; 16:66. [PMID: 28830522 PMCID: PMC5568601 DOI: 10.1186/s12938-017-0355-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/03/2022]
Abstract
BACKGROUND The immunotoxicity of engine exhausts is of high concern for human health given the increasing prevalence of immune-related diseases. However, immunotoxicity evaluation of engine exhausts currently relies on expensive and time-consuming experiments, so efficient methods for immunotoxicity assessment are desirable. METHODS To accelerate the development of safe alternative fuels, this study proposed a computational method for identifying informative features for predicting the proinflammatory potentials of engine exhausts. A principal component regression (PCR) algorithm was applied to develop prediction models, and informative features were identified by a sequential backward feature elimination (SBFE) algorithm. RESULTS The SBFE algorithm identified a total of 19 informative chemical and biological features. These features were used to develop a computational method named FS-CBM for predicting the proinflammatory potentials of engine exhausts. The FS-CBM model achieved high performance, with correlation coefficients of 0.997 and 0.943 on the training and independent test sets, respectively. CONCLUSIONS The FS-CBM model predicts the proinflammatory potentials of engine exhausts with substantially better performance than our previous CBM model. The proposed method could be further applied to construct models of the bioactivities of mixtures.
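Principal component regression regresses the response on the leading principal components of the centered predictors, and sequential backward feature elimination greedily drops the feature whose removal gives the best validation score. The sketch below combines the two on synthetic data, scoring each subset by the Pearson correlation between held-out predictions and observations, since the abstract reports correlation coefficients. The number of components, the split, the "keep the best subset seen" stopping rule, and the data are assumptions; the paper's actual settings and its 19 features are not reproduced.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression: OLS of y on the top principal components of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # loadings, shape (p, k)
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                         # coefficients mapped back to feature space
    return beta, y_mean - x_mean @ beta


def pcr_predict(model, X):
    beta, intercept = model
    return X @ beta + intercept


def score(X_tr, y_tr, X_te, y_te, features, n_components):
    """Pearson r between held-out predictions and observations for a feature subset."""
    k = min(n_components, len(features))
    model = pcr_fit(X_tr[:, features], y_tr, k)
    pred = pcr_predict(model, X_te[:, features])
    return float(np.corrcoef(pred, y_te)[0, 1])


def sbfe(X_tr, y_tr, X_te, y_te, n_components=3, min_features=1):
    """Greedy backward elimination; returns the best-scoring subset seen."""
    current = list(range(X_tr.shape[1]))
    best_subset = current[:]
    best_score = score(X_tr, y_tr, X_te, y_te, current, n_components)
    while len(current) > min_features:
        trials = [
            (score(X_tr, y_tr, X_te, y_te, [f for f in current if f != drop], n_components), drop)
            for drop in current
        ]
        trial_score, drop = max(trials)       # removal that leaves the highest score
        current.remove(drop)
        if trial_score >= best_score:
            best_score, best_subset = trial_score, current[:]
    return best_subset, best_score


# Hypothetical data: 8 candidate features, only the first three carry signal.
rng = np.random.default_rng(7)
X = rng.normal(size=(60, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(0, 0.5, 60)
X_tr, X_te, y_tr, y_te = X[:40], X[40:], y[:40], y[40:]

full = score(X_tr, y_tr, X_te, y_te, list(range(8)), n_components=3)
subset, best = sbfe(X_tr, y_tr, X_te, y_te, n_components=3)
print(f"all features: r = {full:.3f}; selected {subset}: r = {best:.3f}")
```

Because the same held-out split both selects the features and reports the score here, the selected score is optimistically biased; evaluating on a separate independent test set, as the abstract describes, avoids that.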
Affiliation(s)
- Chia-Chi Wang
- School of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan
- Ph.D. Program in Toxicology, Kaohsiung Medical University, Kaohsiung, Taiwan
- Institute of Environmental Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
- National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli County, Taiwan
- Ying-Chi Lin
- School of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan
- Ph.D. Program in Toxicology, Kaohsiung Medical University, Kaohsiung, Taiwan
- Yuan-Chung Lin
- Ph.D. Program in Toxicology, Kaohsiung Medical University, Kaohsiung, Taiwan
- Institute of Environmental Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
- Syu-Ruei Jhang
- Institute of Environmental Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
- Chun-Wei Tung
- School of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan
- Ph.D. Program in Toxicology, Kaohsiung Medical University, Kaohsiung, Taiwan
- National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli County, Taiwan
7
Zhang H, Kang YL, Zhu YY, Zhao KX, Liang JY, Ding L, Zhang TG, Zhang J. Novel naïve Bayes classification models for predicting the chemical Ames mutagenicity. Toxicol In Vitro 2017; 41:56-63. [DOI: 10.1016/j.tiv.2017.02.016] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 06/27/2016] [Revised: 01/04/2017] [Accepted: 02/18/2017] [Indexed: 10/20/2022]
8
Tian D, Zheng W, He G, Zheng Y, Andersen ME, Tan H, Qu W. Predicting cytotoxicity of complex mixtures in high cancer incidence regions of the Huai River Basin based on GC-MS spectrum with partial least squares regression. Environ Res 2015; 137:391-397. [PMID: 25614340 DOI: 10.1016/j.envres.2014.12.027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/16/2014] [Revised: 12/29/2014] [Accepted: 12/30/2014] [Indexed: 06/04/2023]
Abstract
Complex mixture exposures, such as those associated with water sources, are an important issue in health risk assessment. This study assessed the cytotoxicity of chemical mixtures extracted from water sources in regions of the Huai River Basin with high cancer incidence and built statistical models of cytotoxicity based on pollution profiles measured by gas chromatography-mass spectrometry (GC-MS). Surface and ground waters were collected from rural water sources in Shenqiu County, Henan Province, China, from 2008 to 2011 and extracted with XAD-2 resins. Cytotoxicity was evaluated with Chinese hamster ovary K1 (CHO-K1) cells and compared against the pollution profiles of the extracts. The IC50 values of the water samples ranged from 0.023 to 0.338 L-eq/mL. The pollutants detected in the waters by GC-MS were complex, and some of the compounds that contributed to cytotoxicity lack toxicity data. A partial least squares (PLS) regression model of cytotoxicity was built on a linear aggregation of predictor variables (i.e., peaks for single compounds in the gas chromatograms). The PLS model contains 2 PLS factors extracted from 141 variables. The model was validated internally by permutation of the training data and externally with a test sample. It explained 92% of the cytotoxicity in the training samples and 40% in the test sample. This approach provides a general, rapid method for relating water toxicity to GC-MS chromatograms and for predicting the compounds that contribute most to toxicity.
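A PLS model with a small number of latent factors extracted from many predictor variables can be sketched with the NIPALS algorithm for a single response (PLS1). The example below fits 2 factors to synthetic data and then runs a simple response-permutation (y-randomization) check in the spirit of the internal validation described above. The data are deliberately smaller than the study's 141 variables, and the sample size, number of permutations, and use of training R² as the permutation statistic are assumptions, not the study's protocol.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns (coef, intercept) on the original scale."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)               # unit weight vector
        t = Xk @ w                           # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        q_a = yk @ t / tt                    # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - q_a * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P'W)^-1 q
    return coef, y_mean - x_mean @ coef


def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


# Hypothetical data: 40 samples x 20 peak areas; toxicity driven by the first five peaks.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 20))
y = X[:, :5] @ np.array([1.0, 0.8, -0.6, 0.5, 0.4]) + rng.normal(0, 0.3, 40)

coef, b0 = pls1_fit(X, y, n_components=2)
train_r2 = r2(y, X @ coef + b0)

# y-randomization: refit on shuffled responses; a genuine model should clearly beat these.
perm_r2 = []
for _ in range(100):
    y_perm = rng.permutation(y)
    c, b = pls1_fit(X, y_perm, n_components=2)
    perm_r2.append(r2(y_perm, X @ c + b))
print(f"training R2 = {train_r2:.2f}; mean permuted R2 = {np.mean(perm_r2):.2f}")
```

R² on an independent test sample remains the more demanding check; the abstract's drop from 92% explained in training to 40% in the test sample illustrates the gap.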
Affiliation(s)
- Dajun Tian
- Key Laboratory of Public Health and Safety, Ministry of Education, Department of Environmental Health, School of Public Health, Fudan University, Yi Xue Yuan Road 138, Shanghai 200032, China
- Weiwei Zheng
- Key Laboratory of Public Health and Safety, Ministry of Education, Department of Environmental Health, School of Public Health, Fudan University, Yi Xue Yuan Road 138, Shanghai 200032, China
- Gengsheng He
- Key Laboratory of the Public Health Safety, Ministry of Education, Department of Nutrition and Food Hygiene, Fudan University, Shanghai 200032, China
- Yuxin Zheng
- Chinese Center for Disease Control and Prevention, Nan Wei Road 29, Beijing 100050, China
- Melvin E Andersen
- Institute for Chemical Safety Sciences, The Hamner Institutes for Health Sciences, Research Triangle Park, NC 27709, USA
- Hui Tan
- Key Laboratory of the Public Health Safety, Ministry of Education, Department of Childhood and Adolescent, Fudan University, Shanghai 200032, China
- Weidong Qu
- Key Laboratory of Public Health and Safety, Ministry of Education, Department of Environmental Health, School of Public Health, Fudan University, Yi Xue Yuan Road 138, Shanghai 200032, China
9
Lampa E, Lind L, Lind PM, Bornefalk-Hermansson A. The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees. Environ Health 2014; 13:57. [PMID: 24993424 PMCID: PMC4120739 DOI: 10.1186/1476-069x-13-57] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 10/10/2013] [Accepted: 06/28/2014] [Indexed: 05/29/2023]
Abstract
BACKGROUND There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures from additivity using product terms between the variables of interest. In this paper, we present an approach that searches for interaction effects among several variables using boosted regression trees. METHODS We simulate a continuous outcome from real data on 27 environmental contaminants, some of which are correlated, and test the method's ability to uncover the simulated interactions. The simulated outcome contains one four-way interaction, one non-linear effect, and one interaction between a continuous variable and a binary variable. Four scenarios reflecting different strengths of association are simulated. We also illustrate the method using real data. RESULTS The method identified the true interactions in all scenarios except the one with the weakest association. However, some spurious interactions were also found. The method was also able to identify interactions in the real data set. CONCLUSIONS We conclude that boosted regression trees can be used to uncover complex interaction effects in epidemiological studies.
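The "usual approach" named in the background, an additive model checked for departures with a product term, can be sketched with a few least-squares fits. The example simulates an outcome containing one continuous-by-binary interaction (one of the three structures the abstract says were simulated) and compares the residual sum of squares with and without the product term via a nested-model F statistic. The coefficients, sample size, and noise level are made up, and this illustrates the conventional baseline only, not the boosted-regression-tree method.

```python
import numpy as np


def ols_rss(design, y):
    """Fit y ~ design by least squares; return (coefficients, residual sum of squares)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, float(resid @ resid)


rng = np.random.default_rng(11)
n = 500
x = rng.normal(size=n)                 # hypothetical continuous exposure
z = rng.integers(0, 2, size=n)         # hypothetical binary covariate
y = 1.0 + 0.5 * x + 0.3 * z + 0.8 * x * z + rng.normal(0, 1.0, n)

ones = np.ones(n)
additive = np.column_stack([ones, x, z])
with_product = np.column_stack([ones, x, z, x * z])

_, rss_add = ols_rss(additive, y)
coef_int, rss_int = ols_rss(with_product, y)

# Nested-model F statistic for the single added product term (1 numerator df).
df_resid = n - with_product.shape[1]
f_stat = (rss_add - rss_int) / (rss_int / df_resid)
print(f"product-term coefficient = {coef_int[3]:.2f}, F = {f_stat:.1f} on (1, {df_resid}) df")
```

This check requires each product term to be specified in advance, which quickly becomes impractical for higher-order terms such as the four-way interaction in the simulation; searching for interactions with boosted regression trees sidesteps that.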
Affiliation(s)
- Erik Lampa
- Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala, Sweden
- Lars Lind
- Department of Medical Sciences, Cardiovascular Epidemiology, Uppsala University, 75185 Uppsala, Sweden
- P Monica Lind
- Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala, Sweden
10
11
Wang S, Zhang H, Zheng W, Wang X, Andersen ME, Pi J, He G, Qu W. Organic extract contaminants from drinking water activate Nrf2-mediated antioxidant response in a human cell line. Environ Sci Technol 2013; 47:4768-4777. [PMID: 23560486 DOI: 10.1021/es305133k] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 06/02/2023]
Abstract
Traditional risk assessment methods face challenges in estimating risks from drinking waters that contain low levels of large numbers of contaminants. Here, we evaluate the toxicity of organic contaminant (OC) extracts from drinking water by examining activation of the nuclear factor E2-related factor 2 (Nrf2)-mediated antioxidant response. In HepG2 cells, the Nrf2-mediated antioxidant response, measured as Nrf2 protein accumulation, expression of antioxidant response element (ARE)-regulated genes, and ARE-luciferase reporter gene assays, was activated by OC extracts from drinking water sources in which 25 compounds in 9 classification groups were detected. Individual OCs induced oxidative stress at concentrations much higher than their environmental levels; however, mixtures of contaminants induced an oxidative stress response at only 8 times the environmental levels. Additionally, a synthetic OC mixture prepared from the contamination profile of the drinking water induced ARE activity to the same extent as the real-world mixture, reinforcing our conclusion that these mixture exposures produce responses relevant to human exposure situations. Our study tested the feasibility of assessing the toxicity of OCs in drinking water using a specific ARE-pathway measurement. This approach should be broadly useful in supporting risk assessment of mixed environmental exposures.
Affiliation(s)
- Shu Wang
- Key Laboratory of the Public Health Safety, Ministry of Education, Department of Environmental Health, School of Public Health, Fudan University, Shanghai, 200032, People's Republic of China