1
|
Vidal V, Sampognaro L, de León F, Kruk C, Perera G, Crisci C, Segura AM. A critical review of model construction and performance for nowcast systems for faecal contamination in recreational beaches. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 954:176233. [PMID: 39277000 DOI: 10.1016/j.scitotenv.2024.176233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 08/22/2024] [Accepted: 09/10/2024] [Indexed: 09/17/2024]
Abstract
Faecal contamination is a widespread environmental and public health problem on recreational beaches around the world. The implementation of predictive models has been recommended by the World Health Organization as a complement to traditional monitoring to assist decision-makers and reduce health risks. Despite several advances in the modeling of faecal coliforms, tools and algorithms from machine learning are still scarcely used in the field and their implementation in nowcast systems is delayed. Here, we perform a literature review on modeling strategies to predict faecal contamination in recreational beaches in the last two decades and on the implementation of models in nowcast systems to aid management. Models constructed for surface waters of continental (lakes, rivers and streams), estuarine and marine coastal ecosystems were analyzed and compared based on performance metrics for continuous (i.e. regression; R2, Root Mean Square Error: RMSE) and categorical (i.e. classification; accuracy, sensitivity, specificity) responses. We found 67 articles matching the search criteria and 40 with enough information to evaluate and compare predictive ability. In the early 2000s, Multiple Linear Regressions were common, followed by a peak of Artificial Neural Networks (ANNs) from 2010 to 2015, and the rise of machine learning techniques such as decision trees (CART and Random Forest) since 2015. ANNs and decision trees presented better accuracy than the remaining models. Rainfall and its lags were important predictor variables, followed by water temperature. Specificity was much higher than sensitivity in all modeling strategies, which is typical in data sets where one category (e.g. closed beach) is far less common than the normal state (i.e. unbalanced data sets).
We registered the implementation of statistical models in early warning systems in 6 countries, mainly by public beach quality management institutions, followed by NGOs in conjunction with universities. We identified critical steps towards improving model construction, evaluation and usage: i) the need to balance the data set before model training, ii) the need to split the data set into training, validation and test sets to perform an honest evaluation of model performance and iii) the translation of model outputs into plain language for relevant stakeholders. Integrating in situ monitoring, model construction and nowcasting systems into a single framework could help to improve decision-making systems to protect users from bathing in contaminated waters. Still, reducing the arrival of faecal coliforms to aquatic ecosystems (e.g. by improving sewage treatment systems) will be the ultimate factor in reducing health risk.
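The gap between specificity and sensitivity on unbalanced data can be illustrated with a minimal Python sketch. The labels and predictions are invented for illustration and do not come from the review; the helper names `confusion_counts`, `sensitivity` and `specificity` are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fn, tn, fp) for binary labels, where 1 = exceedance."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Share of true exceedances that the model flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of safe days that the model leaves unflagged."""
    return tn / (tn + fp)


# Hypothetical unbalanced data set: 10 exceedances among 100 samples.
y_true = [1] * 10 + [0] * 90
# Hypothetical model: catches 3 of 10 exceedances, raises 2 false alarms.
y_pred = [1] * 3 + [0] * 7 + [1] * 2 + [0] * 88

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(f"sensitivity={sensitivity(tp, fn):.2f}")
print(f"specificity={specificity(tn, fp):.2f}")
print(f"accuracy={accuracy:.2f}")
```

In this invented case accuracy and specificity look strong while most exceedances are missed, which is why sensitivity is worth reporting alongside them.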
Affiliation(s)
- Victoria Vidal
- Departamento Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), CURE-Rocha, Universidad de la República, Ruta Nacional N°9 intersección Ruta N°15, Rocha 27000, Uruguay.
| | - Lia Sampognaro
- Departamento Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), CURE-Rocha, Universidad de la República, Ruta Nacional N°9 intersección Ruta N°15, Rocha 27000, Uruguay
| | - Fernanda de León
- Departamento Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), CURE-Rocha, Universidad de la República, Ruta Nacional N°9 intersección Ruta N°15, Rocha 27000, Uruguay
| | - Carla Kruk
- Departamento Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), CURE-Rocha, Universidad de la República, Ruta Nacional N°9 intersección Ruta N°15, Rocha 27000, Uruguay
| | - Gonzalo Perera
- Departamento Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), CURE-Rocha, Universidad de la República, Ruta Nacional N°9 intersección Ruta N°15, Rocha 27000, Uruguay
| | - Carolina Crisci
- Departamento Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), CURE-Rocha, Universidad de la República, Ruta Nacional N°9 intersección Ruta N°15, Rocha 27000, Uruguay
| | - Angel M Segura
- Departamento Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), CURE-Rocha, Universidad de la República, Ruta Nacional N°9 intersección Ruta N°15, Rocha 27000, Uruguay
|
2
|
Lloyd SD, Carvajal G, Campey M, Taylor N, Osmond P, Roser DJ, Khan SJ. Predicting recreational water quality and public health safety in urban estuaries using Bayesian Networks. WATER RESEARCH 2024; 254:121319. [PMID: 38422692 DOI: 10.1016/j.watres.2024.121319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 02/05/2024] [Accepted: 02/14/2024] [Indexed: 03/02/2024]
Abstract
To support the reactivation of urban rivers and estuaries for bathing while ensuring public safety, it is critical to have access to real-time information on microbial water quality and associated health risks. Predictive modelling can provide this information, though challenges concerning the optimal size of training data, model transferability, and communication of uncertainty still need attention. Further, urban estuaries undergo distinctive hydrological variations requiring tailored modelling approaches. This study assessed the use of Bayesian Networks (BNs) for the prediction of enterococci exceedances and extrapolation of health risks at planned bathing sites in an urban estuary in Sydney, Australia. The transferability of network structures between sites was assessed. Models were validated using a novel application of the k-fold walk-forward validation procedure and further tested using independent compliance and event-based sampling datasets. Learning curves indicated the model's sensitivity reached a minimum performance threshold of 0.8 once training data included ≥ 400 observations. It was demonstrated that Semi-Naïve BN structures can be transferred while maintaining stable predictive performance. In all sites, salinity and solar exposure had the greatest influence on Posterior Probability Distributions (PPDs), when combined with antecedent rainfall. The BNs provided a novel and transparent framework to quantify and visualise enterococci, stormwater impact, health risks, and associated uncertainty under varying environmental conditions. This study has advanced the application of BNs in predicting recreational water quality and providing decision support in urban estuarine settings, proposed for bathing, where uncertainty is high.
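A generic walk-forward split can be sketched in Python as follows. This is an expanding-window variant written under assumptions; the abstract does not specify the exact procedure, so the function `walk_forward_splits` and its parameters `n_folds` and `min_train` are hypothetical and not the authors' implementation.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) for time-ordered samples.

    Each fold trains on everything before its test block, so the model
    is never evaluated on observations older than its training data.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n_samples:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n_samples")
    fold_size = (n_samples - min_train) // n_folds
    if fold_size == 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any leftover samples.
        test_end = train_end + fold_size if k < n_folds - 1 else n_samples
        yield range(0, train_end), range(train_end, test_end)


# Hypothetical example: 10 time-ordered samples, 3 folds, 4 initial samples.
for train_idx, test_idx in walk_forward_splits(10, 3, 4):
    print(list(train_idx), "->", list(test_idx))
```

Unlike shuffled k-fold cross-validation, each test block here lies strictly after its training window, which mimics how a nowcast would be used in operation.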
Affiliation(s)
- Simon D Lloyd
- School of Built Environment, University of New South Wales, NSW, Australia.
| | - Guido Carvajal
- Facultad de Ingeniería, Universidad Andrés Bello, Antonio Varas 880, Providencia, Santiago, Chile
| | - Meredith Campey
- Beachwatch, NSW Department of Planning and Environment, NSW, Australia
| | | | - Paul Osmond
- School of Built Environment, University of New South Wales, NSW, Australia
| | - David J Roser
- School of Civil and Environmental Engineering, University of New South Wales, NSW, Australia
| | - Stuart J Khan
- School of Civil Engineering, University of Sydney, NSW, Australia
|
3
|
Seis W, Veldhuis MCT, Rouault P, Steffelbauer D, Medema G. A new Bayesian approach for managing bathing water quality at river bathing locations vulnerable to short-term pollution. WATER RESEARCH 2024; 252:121186. [PMID: 38340453 DOI: 10.1016/j.watres.2024.121186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 12/21/2023] [Accepted: 01/23/2024] [Indexed: 02/12/2024]
Abstract
Short-term fecal pollution events are a major challenge for managing microbial safety at recreational waters. Long turn-over times of current laboratory methods for analyzing fecal indicator bacteria (FIB) delay water quality assessments. Data-driven models have been shown to be valuable approaches to enable fast water quality assessments. However, a major barrier towards the wider use of such models is the prevalent data scarcity at existing bathing waters, which questions the representativeness and thus usefulness of such datasets for model training. The present study explores the ability of five data-driven modelling approaches to predict short-term fecal pollution episodes at recreational bathing locations under data scarce situations and imbalanced datasets. The study explicitly focuses on the potential benefits of adopting an innovative modeling and risk-based assessment approach, based on state/cluster-based Bayesian updating of FIB distributions in relation to different hydrological states. The models are benchmarked against commonly applied supervised learning approaches, particularly linear regression, and random forests, as well as to a zero-model which closely resembles the current way of classifying bathing water quality in the European Union. For model-based clustering we apply a non-parametric Bayesian approach based on a Dirichlet Process Mixture Model. The study tests and demonstrates the proposed approaches at three river bathing locations in Germany, known to be influenced by short-term pollution events. At each river two modelling experiments ("longest dry period", "sequential model training") are performed to explore how the different modelling approaches react and adapt to scarce and uninformative training data, i.e., datasets that do not include event pollution information in terms of elevated FIB concentrations. 
We demonstrate that the proposed Bayesian approaches in particular are able to raise correct warnings in such situations (> 90 % true positive rate). The zero-model and random forest are shown to be unable to predict contamination episodes if pollution episodes are not present in the training data. Our research shows that the investigated Bayesian approaches reduce the risk of missed pollution events, thereby improving bathing water safety management. Additionally, the approaches provide a transparent solution for setting minimum data quality requirements under various conditions. The proposed approaches open the way for developing data-driven models for bathing water quality prediction against the reality that data scarcity is a common problem at existing and prospective bathing waters.
Affiliation(s)
- Wolfgang Seis
- KWB Kompetenzzentrum Wasser Berlin gGmbH, Cicerostraße 24, Berlin 10709, Germany; Water Management Department, Faculty of Civil Engineering and Geosciences, Delft University of Technology, Stevinweg 1, Delft 2628 CN, the Netherlands.
| | - Marie-Claire Ten Veldhuis
- Water Management Department, Faculty of Civil Engineering and Geosciences, Delft University of Technology, Stevinweg 1, Delft 2628 CN, the Netherlands
| | - Pascale Rouault
- KWB Kompetenzzentrum Wasser Berlin gGmbH, Cicerostraße 24, Berlin 10709, Germany
| | - David Steffelbauer
- KWB Kompetenzzentrum Wasser Berlin gGmbH, Cicerostraße 24, Berlin 10709, Germany
| | - Gertjan Medema
- Water Management Department, Faculty of Civil Engineering and Geosciences, Delft University of Technology, Stevinweg 1, Delft 2628 CN, the Netherlands; KWR Water Research Institute, Groningenhaven 7, Nieuwegein 3433PE, the Netherlands
|
4
|
Searcy RT, Boehm AB. Know Before You Go: Data-Driven Beach Water Quality Forecasting. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17930-17939. [PMID: 36472482 DOI: 10.1021/acs.est.2c05972] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Forecasting environmental hazards is critical in preventing or building resilience to their impacts on human communities and ecosystems. Environmental data science is an emerging field that can be harnessed for forecasting, yet more work is needed to develop methodologies that can leverage increasingly large and complex data sets for decision support. Here, we design a data-driven framework that can, for the first time, forecast bacterial standard exceedances at marine beaches with 3 days lead time. Using historical data sets collected at two California sites, we train nearly 400 forecast models using statistical and machine learning techniques and test forecasts against predictions from both a naive "persistence" model and a baseline nowcast model. Overall, forecast models are found to have similar sensitivities and specificities to the persistence model, but significantly higher areas under the ROC curve (a metric distinguishing a model's ability to effectively parse classes across decision thresholds), suggesting that forecasts can provide enhanced information beyond past observations alone. Forecast model performance at all lead times was similar to that of nowcast models. Together, results suggest that integrating the forecasting framework developed in this study into beach management programs can enable better public notification and aid in proactive pollution and health risk management.
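The two ideas in this abstract, a persistence baseline and the area under the ROC curve, can be sketched in plain Python. The data are invented and the function names `persistence_forecast` and `roc_auc` are hypothetical; the AUC is computed from its pairwise-ranking definition and is not the authors' code.

```python
def persistence_forecast(observations, lead=1):
    """Predict each value as the observation made `lead` steps earlier.

    The returned list aligns with observations[lead:].
    """
    if lead < 1 or lead >= len(observations):
        raise ValueError("lead must be between 1 and len(observations) - 1")
    return list(observations[:-lead])


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win, matching the usual rank-based definition.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical daily exceedance record (1 = standard exceeded).
observed = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
targets = observed[1:]
baseline = persistence_forecast(observed, lead=1)
print("persistence AUC:", roc_auc(targets, baseline))
```

Because a persistence model only emits 0 or 1, its ROC curve has a single interior point, whereas a model that outputs probabilities can be compared across all decision thresholds.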
Affiliation(s)
- Ryan T Searcy
- Department of Civil & Environmental Engineering, Stanford University, 473 Via Ortega, Stanford, California 94305, United States
| | - Alexandria B Boehm
- Department of Civil & Environmental Engineering, Stanford University, 473 Via Ortega, Stanford, California 94305, United States
|
5
|
Zimmer-Faust AG, Griffith JF, Steele JA, Santos B, Cao Y, Asato L, Chiem T, Choi S, Diaz A, Guzman J, Laak D, Padilla M, Quach-Cu J, Ruiz V, Woo M, Weisberg SB. Relationship between coliphage and Enterococcus at southern California beaches and implications for beach water quality management. WATER RESEARCH 2023; 230:119383. [PMID: 36630853 DOI: 10.1016/j.watres.2022.119383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 11/08/2022] [Accepted: 11/17/2022] [Indexed: 06/17/2023]
Abstract
Coliphage have been suggested as an alternative to fecal indicator bacteria for assessing recreational beach water quality, but it is unclear how frequently and at what types of beaches coliphage produces a different management outcome. Here we conducted side-by-side sampling of male-specific and somatic coliphage by the new EPA dead-end hollow fiber ultrafiltration (D-HFUF-SAL) method and Enterococcus at southern California beaches over two years. When samples were combined for all beach sites, somatic and male-specific coliphage both correlated with Enterococcus. When examined categorically, Enterococcus would have resulted in approximately two times the number of health advisories as somatic coliphage and four times that of male-specific coliphage, using recently proposed thresholds of 60 PFU/100 mL for somatic and 30 PFU/100 mL for male-specific coliphage. Overall, only 12% of total exceedances would have been for coliphage alone. Somatic coliphage exceedances that occurred in the absence of an Enterococcus exceedance were limited to a single site during south swell events, when this beach is known to be affected by nearby minimally treated sewage. Thus, somatic coliphage provided additional valuable health protection information, but may be more appropriate as a supplement to FIB measurements rather than as a replacement because: (a) EPA-approved PCR methods for Enterococcus allow a more rapid response, (b) coliphage is more challenging owing to its greater sampling volume and laboratory time requirements, and (c) Enterococcus' long data history has yielded predictive management models that would need to be recreated for coliphage.
Affiliation(s)
- Amity G Zimmer-Faust
- Southern California Coastal Water Research Project Authority, 3535 Harbor Blvd., Costa Mesa, CA 92626, United States.
| | - John F Griffith
- Southern California Coastal Water Research Project Authority, 3535 Harbor Blvd., Costa Mesa, CA 92626, United States
| | - Joshua A Steele
- Southern California Coastal Water Research Project Authority, 3535 Harbor Blvd., Costa Mesa, CA 92626, United States
| | - Bryan Santos
- City of San Diego, Environmental Monitoring and Technical Services, United States
| | - Yiping Cao
- Orange County Sanitation District, United States
| | - Laralyn Asato
- City of San Diego, Environmental Monitoring and Technical Services, United States
| | - Tania Chiem
- Orange County Public Health Laboratory, United States
| | - Samuel Choi
- Orange County Sanitation District, United States
| | - Arturo Diaz
- Orange County Sanitation District, United States
| | - Joe Guzman
- Orange County Public Health Laboratory, United States
| | - David Laak
- Ventura County Public Works Agency, United States
| | | | | | - Victor Ruiz
- Los Angeles City Sanitation Department, United States
| | - Mary Woo
- California State University Channel Islands, Ventura, CA, United States
| | - Stephen B Weisberg
- Southern California Coastal Water Research Project Authority, 3535 Harbor Blvd., Costa Mesa, CA 92626, United States
|
6
|
Khanna H, Fan YW, Chan SN. Automated Secchi disk depth measurement based on artificial intelligence object recognition. MARINE POLLUTION BULLETIN 2022; 185:114378. [PMID: 36435020 DOI: 10.1016/j.marpolbul.2022.114378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 11/07/2022] [Accepted: 11/13/2022] [Indexed: 06/16/2023]
Abstract
Water transparency affects the degree of sunlight penetration in water, which is important to many water quality processes. It can be visually measured by lowering a Secchi disk (SD) into water and recording its disappearance depth - the Secchi disk depth (SDD). High frequency SDD measurement is manpower intensive, precluding better understanding of the daily and diurnal variation of water transparency. For the first time, an artificial intelligence based object detection algorithm was employed for the automatic detection of SD from images, mimicking SDD measurement by human eyes. The trained model was validated on a large number of images (about 2000 for a single day in daytime) obtained from a remote-controlled imaging system in a fish farm in a Hong Kong embayment, demonstrating high detection accuracy of 93 %. The work opens up opportunities in the nowcast and forecast of short-term water quality changes (e.g. algal blooms) in coastal waters.
Affiliation(s)
- Harshit Khanna
- Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
| | - Y W Fan
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
| | - S N Chan
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China.
|
7
|
Long L, Zhu LT, Huang Q. Correlation between lung cancer markers and air pollutants in western China population. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:64022-64030. [PMID: 35467186 DOI: 10.1007/s11356-022-20354-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/23/2022] [Accepted: 04/15/2022] [Indexed: 06/14/2023]
Abstract
The relationship between serum lung cancer markers and air pollution remains unclear. To further reveal the correlation between air pollutants and lung cancer, a retrospective analysis of 446,032 asymptomatic healthy people and symptomatic healthy people from the Health Management Center of the First Affiliated Hospital of Chongqing Medical University from 2014 to 2019 was performed. The distribution characteristics of the serum lung cancer markers carcinoembryonic antigen (CEA), cytokeratin 19 fragment (CYFRA211), squamous cell carcinoma antigen (SCC), and neuron-specific enolase (NSE) were analyzed in this population. A two-independent-sample Mann-Whitney U test was used to analyze the correlation of lung cancer markers and age, and a Chi-square test was used to analyze the relationship between lung cancer markers and gender. The daily trend was profiled for six main air quality indicators (PM10, PM2.5, SO2, NO2, CO, O3) during the same period. The correlation between lung cancer markers and air pollutants was investigated by Spearman correlation and multiple linear regression. The results showed that CYFRA211 had the highest excess rate in the screening population. There were differences in the number of cases with concentrated expression of lung cancer markers in the different age groups. Among them, the people with NSE exceeding the standard were the youngest, and most of them were 40-55 years old. Except for SCC, the expression levels of the markers increased with age, and the expression levels of the four markers in males were significantly higher than those in females. Although the levels of PM10 and PM2.5 exceeded the WHO standard (World Health Organization. 2011), they were not correlated with lung cancer markers. Multiple comparisons showed that the air pollutant SO2 was closely related to CYFRA211, and NO2 to NSE, but there was no significant linear relationship between CEA, SCC, and air pollutants.
In conclusion, among the four lung cancer markers, CYFRA211 had the highest abnormal excess rate in total screening population, and the expression levels of these markers varied by gender and age, with males showing significantly higher expression levels than females, and they increased significantly with age except for SCC. The differential expression of these lung cancer markers may provide more strategies for lung cancer screening in the corresponding population. Lung cancer markers, CYFRA211 and NSE, can be used as sensitive biomarkers for exposure to certain air pollutants and provide references for the prevention and management of air pollution.
Affiliation(s)
- Li Long
- Health Management Center, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
| | - Li-Ting Zhu
- Xiamen Key Laboratory of Indoor Air and Health, Key Lab of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, 361021, China
- National Basic Science Data Center, Beijing, 100190, China
| | - Qiansheng Huang
- Xiamen Key Laboratory of Indoor Air and Health, Key Lab of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, 361021, China
- National Basic Science Data Center, Beijing, 100190, China
|
8
|
Li L, Qiao J, Yu G, Wang L, Li HY, Liao C, Zhu Z. Interpretable tree-based ensemble model for predicting beach water quality. WATER RESEARCH 2022; 211:118078. [PMID: 35066260 DOI: 10.1016/j.watres.2022.118078] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 08/14/2021] [Revised: 11/29/2021] [Accepted: 01/12/2022] [Indexed: 06/14/2023]
Abstract
Tree-based machine learning models based on environmental features offer low-cost and timely solutions for predicting microbial fecal contamination in beach water to inform the public of the health risk. However, many of these models are black boxes that are difficult for humans to understand, which may cause severe consequences such as unexplained decisions and failure in accountability. To develop interpretable predictive models for beach water quality, we evaluate five tree-based models, namely classification tree, random forest, CatBoost, XGBoost, and LightGBM, and employ a state-of-the-art explanation method SHAP to explain the models. When tested on the Escherichia coli (E. coli) concentration data collected from three beach sites along Lake Erie shores, LightGBM, followed by XGBoost, achieves the highest averaged precision and recall scores. For all three sites, both models suggest lake turbidity as the most important predictor, and elucidate the crucial role of accurate local data of wave height and rainfall in the model development. Local SHAP values further reveal the robustness of the importance of lake turbidity as its SHAP value increases nearly monotonically with its value and is minimally affected by other environmental factors. Moreover, we found an intriguing interaction between lake turbidity and day-of-year. This work suggests that the combination of LightGBM and SHAP has a promising potential to develop interpretable models for predicting microbial water quality in freshwater lakes.
Affiliation(s)
- Lingbo Li
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
| | - Jundong Qiao
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
| | - Guan Yu
- Department of Biostatistics, University at Buffalo, The State University of New York, Buffalo, NY, USA
| | - Leizhi Wang
- Nanjing Hydraulic Research Institute, State Key laboratory of Hydrology, Water Resources and Hydraulic Engineering & Science, Nanjing 210029, China
| | - Hong-Yi Li
- Department of Civil and Environmental Engineering, University of Houston, Houston, TX, USA
| | - Chen Liao
- Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, NY, USA.
| | - Zhenduo Zhu
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA.
|
9
|
Sokolova E, Ivarsson O, Lillieström A, Speicher NK, Rydberg H, Bondelind M. Data-driven models for predicting microbial water quality in the drinking water source using E. coli monitoring and hydrometeorological data. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 802:149798. [PMID: 34454142 DOI: 10.1016/j.scitotenv.2021.149798] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/08/2021] [Revised: 07/08/2021] [Accepted: 08/16/2021] [Indexed: 06/13/2023]
Abstract
Rapid changes in microbial water quality in surface waters pose challenges for production of safe drinking water. If not treated to an acceptable level, microbial pathogens present in the drinking water can result in severe consequences for public health. The aim of this paper was to evaluate the suitability of data-driven models of different complexity for predicting the concentrations of E. coli in the river Göta älv at the water intake of the drinking water treatment plant in Gothenburg, Sweden. The objectives were to (i) assess how the complexity of the model affects the model performance; and (ii) identify relevant factors and assess their effect as predictors of E. coli levels. To forecast E. coli levels one day ahead, the data on laboratory measurements of E. coli and total coliforms, Colifast measurements of E. coli, water temperature, turbidity, precipitation, and water flow were used. The baseline approaches included Exponential Smoothing and ARIMA (Autoregressive Integrated Moving Average), which are commonly used univariate methods, and a naive baseline that used the previous observed value as its next prediction. Also, models common in the machine learning domain were included: LASSO (Least Absolute Shrinkage and Selection Operator) Regression and Random Forest, and a tool for optimising machine learning pipelines - TPOT (Tree-based Pipeline Optimization Tool). Also, a multivariate autoregressive model VAR (Vector Autoregression) was included. The models that included multiple predictors performed better than univariate models. Random Forest and TPOT resulted in higher performance but showed a tendency of overfitting. Water temperature, microbial concentrations upstream and at the water intake, and precipitation upstream were shown to be important predictors. 
Data-driven modelling enables water producers to interpret the measurements in the context of what concentrations can be expected based on the recent historic data, and thus identify unexplained deviations warranting further investigation of their origin.
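Two of the univariate baselines named in this abstract, the naive previous-value forecast and exponential smoothing, can be compared in a short Python sketch. The series is invented, simple (level-only) exponential smoothing is an assumption because the abstract does not say which variant was used, and the smoothing weight `alpha` is hypothetical.

```python
import math


def naive_forecasts(series):
    """One-step-ahead forecasts for series[1:]: repeat the last observed value."""
    return list(series[:-1])


def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts for series[1:].

    level_t = alpha * x_t + (1 - alpha) * level_{t-1}, seeded with the first value.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    forecasts = []
    for value in series[:-1]:
        level = alpha * value + (1.0 - alpha) * level
        forecasts.append(level)
    return forecasts


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical daily log10 E. coli concentrations.
series = [2.0, 2.2, 2.1, 3.0, 2.8, 2.3]
actual = series[1:]
print("naive RMSE:", round(rmse(actual, naive_forecasts(series)), 3))
print("SES RMSE:  ", round(rmse(actual, ses_forecasts(series, alpha=0.5)), 3))
```

With `alpha = 1.0` the smoothing forecast collapses to the naive baseline, so the naive model can be read as the limiting case of exponential smoothing.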
Affiliation(s)
- Ekaterina Sokolova
- Chalmers University of Technology, Department of Architecture and Civil Engineering, Sweden.
| | - Oscar Ivarsson
- Chalmers University of Technology, Department of Computer Science and Engineering, Sweden
| | - Ann Lillieström
- Chalmers University of Technology, Department of Computer Science and Engineering, Sweden
| | - Nora K Speicher
- Chalmers University of Technology, Department of Computer Science and Engineering, Sweden
| | - Henrik Rydberg
- City of Gothenburg, Department of Sustainable Water and Waste, Sweden
| | - Mia Bondelind
- Chalmers University of Technology, Department of Architecture and Civil Engineering, Sweden
|
10
|
Abstract
Predictive models of bathing water quality are a useful support to traditional monitoring and provide timely and adequate information for the protection of public health. When developing models, it is critical to select an appropriate model type and appropriate metrics to reduce errors so that the predicted outcome is reliable. It is usually necessary to conduct intensive sampling to collect a sufficient amount of data. This paper presents the process of developing a predictive model in Kaštela Bay (Adriatic Sea) using only data from regular (official) bathing water quality monitoring collected during five bathing seasons. The predictive modelling process, which included data preprocessing, model training, and model tuning, showed no silver bullet model and selected two model types that met the specified requirements: a neural network (ANN) for Escherichia coli and a random forest (RF) for intestinal enterococci. The different model types are probably the result of the different persistence of two indicator bacteria to the effects of marine environmental factors and consequently the different die-off rates. By combining these two models, the bathing water samples were classified with acceptable performances, an informedness of 71.7%, an F-score of 47.1%, and an overall accuracy of 80.6%.
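The three reported metrics can be derived from a confusion matrix as in the Python sketch below. The counts are invented and do not reproduce the paper's results; informedness is taken here in its usual sense of sensitivity plus specificity minus one (Youden's J), and the function name `summary_metrics` is hypothetical.

```python
def summary_metrics(tp, fn, tn, fp):
    """Return informedness, F-score and accuracy from confusion-matrix counts."""
    sens = tp / (tp + fn)          # recall on the polluted class
    spec = tn / (tn + fp)          # recall on the clean class
    precision = tp / (tp + fp)
    return {
        "informedness": sens + spec - 1.0,
        "f_score": 2.0 * precision * sens / (precision + sens),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts: 10 polluted samples and 40 clean samples.
metrics = summary_metrics(tp=8, fn=2, tn=36, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Informedness stays near zero for a model that always predicts the majority class, which makes it a useful companion to accuracy when polluted samples are rare.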
|
11
|
Huang R, Ma C, Ma J, Huangfu X, He Q. Machine learning in natural and engineered water systems. WATER RESEARCH 2021; 205:117666. [PMID: 34560616 DOI: 10.1016/j.watres.2021.117666] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 05/31/2021] [Revised: 09/01/2021] [Accepted: 09/11/2021] [Indexed: 06/13/2023]
Abstract
Water resources of desired quality and quantity are the foundation for human survival and sustainable development. To better protect the water environment and conserve water resources, efficient water management, purification, and transportation are of critical importance. In recent years, machine learning (ML) has exhibited its practicability, reliability, and high efficiency in numerous applications; furthermore, it has solved conventional and emerging problems in both natural and engineered water systems. For example, ML can predict various water quality indicators in situ and real-time by considering the complex interactions among water-related variables. ML approaches can also solve emerging pollution problems with proven rules or universal mechanisms summarized from the related research. Moreover, by applying image recognition technology to analyze the relationships between image information and physicochemical properties of the research object, ML can effectively identify and characterize specific contaminants. In view of the bright prospects of ML, this review comprehensively summarizes the development of ML applications in natural and engineered water systems. First, the concept and modeling steps of ML are briefly introduced, including data preparation, algorithm selection and model evaluation. In addition, comprehensive applications of ML in recent studies, including predicting water quality, mapping groundwater contaminants, classifying water resources, tracing contaminant sources, and evaluating pollutant toxicity in natural water systems, as well as modeling treatment techniques, assisting characterization analysis, purifying and distributing drinking water, and collecting and treating sewage water in engineered water systems, are summarized. 
Finally, the advantages and disadvantages of commonly used algorithms are analyzed according to their structures and mechanisms, and recommendations on the selection of ML algorithms for different studies, as well as prospects on the application and development of ML in water science are proposed. This review provides references for solving a wider range of water-related problems and brings further insights into the intelligent development of water science.
Affiliation(s)
- Ruixing Huang
- Key Laboratory of Eco-environments in the Three Gorges Reservoir Region, Ministry of Education, College of Environmental and Ecology, Chongqing University, Chongqing 400044, China; State Key Laboratory of Urban Water Resource and Environment, School of Municipal and Environmental Engineering, Harbin Institute of Technology, Harbin 150090, China
- Chengxue Ma
- Key Laboratory of Eco-environments in the Three Gorges Reservoir Region, Ministry of Education, College of Environmental and Ecology, Chongqing University, Chongqing 400044, China; State Key Laboratory of Urban Water Resource and Environment, School of Municipal and Environmental Engineering, Harbin Institute of Technology, Harbin 150090, China
- Jun Ma
- State Key Laboratory of Urban Water Resource and Environment, School of Municipal and Environmental Engineering, Harbin Institute of Technology, Harbin 150090, China
- Xiaoliu Huangfu
- Key Laboratory of Eco-environments in the Three Gorges Reservoir Region, Ministry of Education, College of Environmental and Ecology, Chongqing University, Chongqing 400044, China.
- Qiang He
- Key Laboratory of Eco-environments in the Three Gorges Reservoir Region, Ministry of Education, College of Environmental and Ecology, Chongqing University, Chongqing 400044, China
|
12
|
Machine Learning-Based Prediction of Chlorophyll-a Variations in Receiving Reservoir of World’s Largest Water Transfer Project—A Case Study in the Miyun Reservoir, North China. WATER 2021. [DOI: 10.3390/w13172406] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Although water transfer projects can alleviate the water crisis, they may cause potential risks to water quality safety in receiving areas. The Miyun Reservoir in northern China, one of the receiving reservoirs of the world’s largest water transfer project (South-to-North Water Transfer Project, SNWTP), was selected as a case study. Considering its potential eutrophication trend, two machine learning models, i.e., the support vector machine (SVM) model and the random forest (RF) model, were built to investigate the trophic state by predicting the variations of chlorophyll-a (Chl-a) concentrations, the typical reflection of eutrophication, in the reservoir after the implementation of SNWTP. The results showed that compared with the SVM model, the RF model had higher prediction accuracy and more robust prediction ability with abnormal data, and was thus more suitable for predicting Chl-a concentration variations in the receiving reservoir. Additionally, short-term water transfer would not cause significant variations of Chl-a concentrations. After the project implementation, the impact of transferred water on the water quality of the receiving reservoir would have gradually increased. After a 10-year implementation, transferred water would cause a significant decline in the receiving reservoir’s water quality, and Chl-a concentrations would increase, especially from July to August. This led to a potential risk of trophic state change in the Miyun Reservoir and required further attention from managers. This study can provide prediction techniques and advice on water quality security management associated with eutrophication risks resulting from water transfer projects.
|
13
|
Gupta S, Aga D, Pruden A, Zhang L, Vikesland P. Data Analytics for Environmental Science and Engineering Research. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:10895-10907. [PMID: 34338518 DOI: 10.1021/acs.est.1c01026] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 06/13/2023]
Abstract
The advent of new data acquisition and handling techniques has opened the door to alternative and more comprehensive approaches to environmental monitoring that will improve our capacity to understand and manage environmental systems. Researchers have recently begun using machine learning (ML) techniques to analyze complex environmental systems and their associated data. Herein, we provide an overview of data analytics frameworks suitable for various Environmental Science and Engineering (ESE) research applications. We present current applications of ML algorithms within the ESE domain using three representative case studies: (1) Metagenomic data analysis for characterizing and tracking antimicrobial resistance in the environment; (2) Nontarget analysis for environmental pollutant profiling; and (3) Detection of anomalies in continuous data generated by engineered water systems. We conclude by proposing a path to advance incorporation of data analytics approaches in ESE research and application.
Affiliation(s)
- Suraj Gupta
- The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, Virginia 24061, United States
- Diana Aga
- Department of Chemistry, University at Buffalo, The State University of New York, Buffalo, New York 14226, United States
- Amy Pruden
- Via Department of Civil and Environmental Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
- Liqing Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Peter Vikesland
- Via Department of Civil and Environmental Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
|
14
|
Wang L, Zhu Z, Sassoubre L, Yu G, Liao C, Hu Q, Wang Y. Improving the robustness of beach water quality modeling using an ensemble machine learning approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021; 765:142760. [PMID: 33131841 DOI: 10.1016/j.scitotenv.2020.142760] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/15/2020] [Revised: 09/28/2020] [Accepted: 09/28/2020] [Indexed: 05/12/2023]
Abstract
Microbial pollution of beach water can expose swimmers to harmful pathogens. Predictive modeling provides an alternative method for beach management that addresses several limitations associated with traditional culture-based methods of assessing water quality. Widely-used machine learning methods often suffer from high variability in performance from one year or beach to another. Therefore, the best machine learning method varies between beaches and years, making method selection difficult. This study proposes an ensemble machine learning approach referred to as model stacking that has a two-layered learning structure, where the outputs of five widely-used individual machine learning models (multiple linear regression, partial least square, sparse partial least square, random forest, and Bayesian network) are taken as input features for another model that produces the final prediction. Applying this approach to three beaches along eastern Lake Erie, New York, USA, we show that generally the model stacking approach was able to generate reliably good predictions compared to all of the five base models. The accuracy rankings of the stacking model consistently stayed 1st or 2nd every year, with yearly-average accuracy of 78%, 81%, and 82.3% at the three studied beaches, respectively. This study highlights the value of the model stacking approach in predicting beach water quality and solving other pressing environmental problems.
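The two-layer structure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the five base models for two hypothetical single-feature linear regressions, and the second layer is a simple least-squares blend weight fitted on out-of-fold predictions. All function names and the k-fold scheme are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def fit_base(rows: List[Sequence[float]], ys: Sequence[float], col: int) -> Model:
    """Hypothetical base learner: linear regression on a single feature column."""
    slope, intercept = fit_line([r[col] for r in rows], ys)
    return lambda row: intercept + slope * row[col]


def out_of_fold(rows, ys, col: int, k: int) -> List[float]:
    """Predict each sample with a base model that never saw it during fitting."""
    preds = [0.0] * len(rows)
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        model = fit_base([rows[i] for i in train], [ys[i] for i in train], col)
        for i in range(fold, len(rows), k):
            preds[i] = model(rows[i])
    return preds


def fit_stack(rows, ys, k: int = 3) -> Tuple[Model, float]:
    """Second layer: blend weight w minimising squared error of w*p1 + (1-w)*p2."""
    p1 = out_of_fold(rows, ys, 0, k)
    p2 = out_of_fold(rows, ys, 1, k)
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    w = num / den if den else 0.5
    m1, m2 = fit_base(rows, ys, 0), fit_base(rows, ys, 1)  # refit on all data
    return (lambda row: w * m1(row) + (1 - w) * m2(row)), w


# Illustrative data only: the target depends on feature 0, feature 1 is noise.
rows = [[0, 3], [1, 1], [2, 4], [3, 1], [4, 5], [5, 9]]
ys = [1, 3, 5, 7, 9, 11]
stacked_model, blend_weight = fit_stack(rows, ys, k=3)
```

Fitting the blend on out-of-fold predictions keeps the second layer from rewarding a base model that merely memorised its training data, which is the usual motivation for stacking over picking a single best model.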
Affiliation(s)
- Leizhi Wang
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo 14220, NY, USA; Nanjing Hydraulic Research Institute, State Key laboratory of Hydrology, Water Resources and Hydraulic Engineering & Science, Nanjing 210029, China; Yangtze Institute for Conservation and Development, Nanjing, 210098, China
- Zhenduo Zhu
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo 14220, NY, USA.
- Lauren Sassoubre
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo 14220, NY, USA
- Guan Yu
- Department of Biostatistics, University at Buffalo, The State University of New York, Buffalo 14220, NY, USA
- Chen Liao
- Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, NY 10065, New York, USA
- Qingfang Hu
- Nanjing Hydraulic Research Institute, State Key laboratory of Hydrology, Water Resources and Hydraulic Engineering & Science, Nanjing 210029, China; Yangtze Institute for Conservation and Development, Nanjing, 210098, China
- Yintang Wang
- Nanjing Hydraulic Research Institute, State Key laboratory of Hydrology, Water Resources and Hydraulic Engineering & Science, Nanjing 210029, China; Yangtze Institute for Conservation and Development, Nanjing, 210098, China
|
15
|
Searcy RT, Boehm AB. A Day at the Beach: Enabling Coastal Water Quality Prediction with High-Frequency Sampling and Data-Driven Models. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:1908-1918. [PMID: 33471505 DOI: 10.1021/acs.est.0c06742] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 06/12/2023]
Abstract
To reduce the incidence of recreational waterborne illness, fecal indicator bacteria (FIB) are measured to assess water quality and inform beach management. Recently, predictive FIB models have been used to aid managers in making beach posting and closure decisions. However, those predictive models must be trained using rich historical data sets consisting of FIB and environmental data that span years, and many beaches lack such data sets. Here, we investigate whether water quality data collected during discrete short duration, high-frequency beach sampling events (e.g., samples collected at sub-hourly intervals for 24-48 h) are sufficient to train predictive models that can be used for beach management. We use data collected during six high-frequency sampling events at three California marine beaches and train a total of 126 models using common data-driven techniques. Tide, solar irradiation, water temperature, significant wave height, and offshore wind speed were found to be the most important environmental variables in the models. We validate the predictive performance of models using withheld data. Random forests are consistently the top performing model type. Overall, we find that data-driven models trained using high-frequency FIB and environmental data perform well at predicting water quality and can be used to inform public health decisions at beaches.
Affiliation(s)
- Ryan T Searcy
- Department of Civil & Environmental Engineering, Stanford University, 473 Via Ortega, Stanford, Palo Alto 94305, California, United States
- Alexandria B Boehm
- Department of Civil & Environmental Engineering, Stanford University, 473 Via Ortega, Stanford, Palo Alto 94305, California, United States
|
16
|
Hart JD, Blackwood AD, Noble RT. Examining coastal dynamics and recreational water quality by quantifying multiple sewage specific markers in a North Carolina estuary. THE SCIENCE OF THE TOTAL ENVIRONMENT 2020; 747:141124. [PMID: 32795790 DOI: 10.1016/j.scitotenv.2020.141124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/07/2020] [Revised: 07/16/2020] [Accepted: 07/18/2020] [Indexed: 06/11/2023]
Abstract
Fecal contamination is observed downstream of municipal separate storm sewer systems in coastal North Carolina. While it is well accepted that wet weather contributes to this phenomenon, less is understood about the contribution of the complex hydrology in this low-lying coastal plain. A quantitative microbial assessment was conducted in Beaufort, North Carolina to identify trends and potential sources of fecal contamination in stormwater receiving waters. Fecal indicator concentrations were significantly higher in receiving water downstream of a tidally submerged outfall compared to an outfall that was permanently submerged (p < 0.001), though tidal height was not predictive of human-specific microbial source tracking (MST) marker concentrations at the tidally submerged site. Short-term rainfall (i.e. <12 h) was predictive of E. coli, Enterococcus spp., and human-specific MST marker concentrations (Fecal Bacteroides, BacHum, and HF183) in receiving waters. The strong correlation between 12-hr antecedent rainfall and Enterococcus spp. (r = 0.57, p < 0.001, n = 92) suggests a predictive model could be developed based on rainfall to communicate risk for bathers. Additional molecular marker data indicate that the delivery of fecal sources is complex and highly variable, likely due to the influence of tidal influx (saltwater intrusion from the estuary) into the low-lying stormwater pipes. In particular, elevated MST marker concentrations (up to 2.56 × 10^4 gene copies HF183/mL) were observed in standing water near a surcharging street storm drain. These data are being used to establish a baseline for stormwater dynamics prior to dramatic rainfall in 2018 and to characterize the interaction between complex stormwater dynamics and water quality impairment in coastal NC.
Affiliation(s)
- Justin D Hart
- University of North Carolina Institute of Marine Sciences, Morehead City, NC, United States of America; Department of Environmental Sciences and Engineering, University of North Carolina Gillings School of Global Public Health, Chapel Hill, NC, United States of America
- A Denene Blackwood
- University of North Carolina Institute of Marine Sciences, Morehead City, NC, United States of America
- Rachel T Noble
- University of North Carolina Institute of Marine Sciences, Morehead City, NC, United States of America; Department of Environmental Sciences and Engineering, University of North Carolina Gillings School of Global Public Health, Chapel Hill, NC, United States of America.
|
17
|
Poulin C, Peletz R, Ercumen A, Pickering AJ, Marshall K, Boehm AB, Khush R, Delaire C. What Environmental Factors Influence the Concentration of Fecal Indicator Bacteria in Groundwater? Insights from Explanatory Modeling in Uganda and Bangladesh. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2020; 54:13566-13578. [PMID: 32975935 DOI: 10.1021/acs.est.0c02567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Information about microbial water quality is critical for managing water safety and protecting public health. In low-income countries, monitoring all drinking water supplies is impractical because financial resources and capacity are insufficient. Data sets derived from satellite imagery, census, and hydrological models provide an opportunity to examine relationships between a suite of environmental risk factors and microbial water quality over large geographical scales. We investigated the relationships between groundwater fecal contamination and different environmental parameters in Uganda and Bangladesh. In Uganda, groundwater contamination was associated with high population density (p < 0.001; OR = 1.27), high cropland coverage (p < 0.001; OR = 1.47), high average monthly precipitation (p < 0.001; OR = 1.14), and high surface runoff (p < 0.001; OR = 1.37), while low groundwater contamination was more likely in areas further from cities (p < 0.001; OR = 0.66) and with higher forest coverage (p < 0.001; OR = 0.70). In Bangladesh, contamination was associated with higher weekly precipitation (p < 0.001; OR = 1.44) and higher livestock density (p = 0.05; OR = 1.11), while low contamination was associated with low forest coverage (p < 0.001; OR = 1.23) and high cropland coverage (p < 0.001; OR = 0.80). We developed a groundwater contamination index for each country to help decision-makers identify areas where groundwater is most prone to fecal contamination and prioritize monitoring activities. Our approach demonstrates how to harness satellite-derived data to guide water safety management.
Affiliation(s)
- Chloé Poulin
- The Aquaya Institute, PO Box 21862, Nairobi, Kenya
- Ayse Ercumen
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, North Carolina 27695, United States
- Amy J Pickering
- Civil and Environmental Engineering, Tufts University, Medford, Massachusetts 02153, United States
- Alexandria B Boehm
- Department of Civil and Environmental Engineering, Stanford University, Stanford California 94305-4020, United States
- Ranjiv Khush
- The Aquaya Institute, PO Box 21862, Nairobi, Kenya
|
18
|
Peng Z, Hu Y, Liu G, Hu W, Zhang H, Gao R. Calibration and quantifying uncertainty of daily water quality forecasts for large lakes with a Bayesian joint probability modelling approach. WATER RESEARCH 2020; 185:116162. [PMID: 32810742 DOI: 10.1016/j.watres.2020.116162] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/29/2020] [Revised: 07/04/2020] [Accepted: 07/07/2020] [Indexed: 06/11/2023]
Abstract
Correcting the systematic bias and quantifying the uncertainty associated with operational water quality forecasts are imperative for risk-based environmental decision making. This work proposes a post-processing method that addresses both bias correction and total uncertainty quantification for daily forecasts of water quality parameters derived from dynamical lake models. The post-processing is implemented based on a Bayesian Joint Probability (BJP) modeling approach. The BJP model uses a log-sinh transformation to normalize the raw forecasts and corresponding observations, and uses a bivariate Gaussian distribution to characterize the dependence relationship. The posterior distribution of the transformation parameters is inferred through Metropolis Markov chain Monte Carlo sampling; it generates unbiased probabilistic forecasts that account for uncertainties from all sources. The BJP is used to post-process raw daily forecasts of dissolved oxygen (DO), ammonium nitrogen (NH), total phosphorus (TP) and total nitrogen (TN) concentrations of Lake Chaohu, the fifth largest lake in China, with lead times from 0 to 5 days. Results suggest that, on average, 93.1% of the forecast bias has been removed by BJP. The root mean square error in probability skill scores range from 5.8% for NH to 68.2% for TP, and the non-parametric bootstrapping test suggests that 67.7% of forecasts are significantly improved, averaged across all sampling sites, water quality parameters and lead times. The probabilities of the calibrated forecasts are reasonably consistent with the observed relative frequencies, and have appropriate spread and thus correctly quantify forecast uncertainty. The BJP post-processing method used in this study can be a useful operational tool that helps to better realize the potential of water quality forecasts derived from dynamical models.
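The log-sinh transformation mentioned in this abstract can be sketched as follows. The abstract does not give the functional form, so this assumes the form commonly written in the hydrological forecasting literature, z = ln(sinh(a + b·y)) / b, with a and b as transformation parameters. The parameter values below are hypothetical; in the BJP approach they would be inferred by sampling rather than fixed by hand.

```python
import math


def log_sinh(y: float, a: float, b: float) -> float:
    """Assumed log-sinh transform: z = ln(sinh(a + b*y)) / b, valid for a + b*y > 0."""
    arg = a + b * y
    if arg <= 0:
        raise ValueError("a + b*y must be positive")
    return math.log(math.sinh(arg)) / b


def inv_log_sinh(z: float, a: float, b: float) -> float:
    """Back-transform: y = (asinh(exp(b*z)) - a) / b."""
    return (math.asinh(math.exp(b * z)) - a) / b


# Hypothetical parameter values, chosen only to exercise the functions.
A, B = 0.1, 0.3
transformed = [log_sinh(y, A, B) for y in (0.5, 2.0, 10.0)]
recovered = [inv_log_sinh(z, A, B) for z in transformed]
```

For small arguments the transform behaves like a logarithm and for large arguments it is close to linear, which is why it is used to stabilise variance in skewed concentration data before fitting a Gaussian model.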
Affiliation(s)
- Zhaoliang Peng
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, 210008, China.
- Yuemin Hu
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, 210008, China
- Gang Liu
- Administration Bureau of Lake Chaohu, Chaohu, 238000, China
- Weiping Hu
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, 210008, China
- Hui Zhang
- Administration Bureau of Lake Chaohu, Chaohu, 238000, China
- Rui Gao
- Administration Bureau of Lake Chaohu, Chaohu, 238000, China; Lake Chaohu Research Institute, Hefei, 238000, China
|
19
|
Madani M, Seth R. Evaluating multiple predictive models for beach management at a freshwater beach in the Great Lakes region. JOURNAL OF ENVIRONMENTAL QUALITY 2020; 49:896-908. [PMID: 33016491 DOI: 10.1002/jeq2.20107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/07/2020] [Revised: 05/10/2020] [Accepted: 05/14/2020] [Indexed: 06/11/2023]
Abstract
Recreational water quality is currently monitored at Sandpoint Beach on Lake St. Clair using culture-based enumeration of Escherichia coli. Using water quality and weather data collected over 4 yr, several multiple linear regression (MLR)-based models were developed for near real-time prediction of E. coli concentration and were tested using independent data from the fifth year. Model performance was assessed by the determination of metrics such as RMSE, accuracy, specificity, sensitivity, and area under the receiver operating characteristic curve (AUROC). Each of the developed MLR models described herein resulted in increased correct responses for both exceedance and non-exceedance of the applicable standard as compared to predictions based on E. coli measurements (persistence models, using the previous day's E. coli concentration), which is the method currently being used. The AUROC values for persistence models are between 0.5 and 0.6, as compared to >0.7 for all the MLR models described herein. Among the MLR models, model performance improved when qualitative sky weather condition, which is commonly reported but was not previously used in similar models, was included. To select the best model, a principal coordinate analysis was used to combine multiple model performance metrics and provide a more sensitive tool for model comparison. Although models developed using 2, 3, and 4 yr of monitoring data provided reasonable performance, the model developed using the most recent 2-yr data was marginally better. Thus, data from the most recent 2 yr are likely sufficient as a training dataset for updating the MLR model for Sandpoint Beach in the future.
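The categorical metrics named in this abstract, and the persistence baseline they are compared against, can be sketched in a few lines. The concentrations and the exceedance threshold below are invented for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
from typing import Dict, List, Sequence


def classification_metrics(observed: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for exceedance (True) vs non-exceedance."""
    tp = sum(o and p for o, p in zip(observed, predicted))
    tn = sum((not o) and (not p) for o, p in zip(observed, predicted))
    fp = sum((not o) and p for o, p in zip(observed, predicted))
    fn = sum(o and (not p) for o, p in zip(observed, predicted))
    return {
        "accuracy": (tp + tn) / len(observed),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def persistence_predictions(exceedances: Sequence[bool]) -> List[bool]:
    """Persistence model: today's prediction is yesterday's observed status."""
    return list(exceedances[:-1])


# Illustrative daily E. coli concentrations and a hypothetical threshold.
THRESHOLD = 200.0
concentrations = [50.0, 300.0, 80.0, 60.0, 400.0, 450.0, 90.0]
exceed = [c > THRESHOLD for c in concentrations]

# Day t is predicted from day t-1, so the first day has no prediction.
metrics = classification_metrics(exceed[1:], persistence_predictions(exceed))
```

In this toy series the persistence model misses two of the three exceedance days, which shows why overall accuracy alone says little when exceedances are the events that matter.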
Affiliation(s)
- Mohammad Madani
- Dep. of Civil and Environmental Engineering, Univ. of Windsor, Windsor, ON, N9B3P4, Canada
- Rajesh Seth
- Dep. of Civil and Environmental Engineering, Univ. of Windsor, Windsor, ON, N9B3P4, Canada
|
20
|
Zhang X, Zhi X, Chen L, Shen Z. Spatiotemporal variability and key influencing factors of river fecal coliform within a typical complex watershed. WATER RESEARCH 2020; 178:115835. [PMID: 32330732 PMCID: PMC7160644 DOI: 10.1016/j.watres.2020.115835] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 01/25/2020] [Revised: 03/30/2020] [Accepted: 04/14/2020] [Indexed: 05/08/2023]
Abstract
Fecal coliform bacteria are a key indicator of human health risks; however, the spatiotemporal variability and key influencing factors of river fecal coliform have yet to be explored in a rural-suburban-urban watershed with multiple land uses. In this study, the fecal coliform concentrations in 21 river sections were monitored for 20 months, and 441 samples were analyzed. Multivariable regressions were used to evaluate the spatiotemporal dynamics of fecal coliform. The results showed that spatial differences were mainly dominated by urbanization level, and environmental factors could explain the temporal dynamics of fecal coliform in different urban patterns except in areas with high urbanization levels. Reducing suspended solids is a direct way to manage fecal coliform in the Beiyun River when natural factors, such as temperature and solar radiation, are difficult to change. The export of fecal coliform from urban areas showed a quick and sensitive response to rainfall events and increased dozens of times in the short term. Landscape patterns, such as the fragmentation of impervious surfaces and the overall landscape, were identified as key factors influencing urban non-point source bacteria. The results obtained from this study will provide insight into the management of river fecal pollution.
Affiliation(s)
- Xiaoyue Zhang
- State Key Laboratory of Water Environment Simulation, School of Environment, Beijing Normal University, Beijing, 100875, PR China
- Xiaosha Zhi
- State Key Laboratory of Water Environment Simulation, School of Environment, Beijing Normal University, Beijing, 100875, PR China; Satellite Environment Centre, Ministry of Environmental Protection, Beijing, 100094, PR China
- Lei Chen
- State Key Laboratory of Water Environment Simulation, School of Environment, Beijing Normal University, Beijing, 100875, PR China.
- Zhenyao Shen
- State Key Laboratory of Water Environment Simulation, School of Environment, Beijing Normal University, Beijing, 100875, PR China
|
21
|
Xu T, Coco G, Neale M. A predictive model of recreational water quality based on adaptive synthetic sampling algorithms and machine learning. WATER RESEARCH 2020; 177:115788. [PMID: 32330740 DOI: 10.1016/j.watres.2020.115788] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/09/2019] [Revised: 04/01/2020] [Accepted: 04/02/2020] [Indexed: 06/11/2023]
Abstract
Predicting recreational water quality is one of the most difficult tasks in water management with major implications for humans and society. Many data-driven models have been used to predict water quality indicators to allow a real time assessment of public health risk. This assessment is most commonly based on Faecal Indicator Bacteria (FIB), with the value of FIB compared with thresholds published in guidelines. However, FIB values usually tend to be unbalanced within water quality datasets, with small proportions of data exceeding guideline thresholds and far larger numbers that do not. This can be a limiting factor in the uptake of model predictions since, even if the overall accuracy is high, the sensitivity of the predictions can be low. To address this issue, this paper proposes an adaptive synthetic sampling algorithm (ADASYN) to generate synthetic above-threshold FIB instances and tests the validity of the approach for the prediction of recreational water quality. The models in this paper are based on four machine learning techniques: k-mean nearest neighbour, boosting decision tree, support vector machine, and multi-layer perceptron artificial neural network, and are applied to five different locations in Auckland, New Zealand. Aside from support vector machine, all models provide favourable predictions with relatively high sensitivity (around 75%) and overall accuracy (over 90%), indicating that both the compliant and exceedance conditions can be effectively predicted through the use of more sophisticated model training which involves artificial data. Considering the model accuracy and stability, boosting decision trees (BDT) and the multi-layer perceptron artificial neural network (MLP-ANN) are the best two models, and the multi-layer perceptron is the most efficient with the shortest computation time.
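The idea behind adaptive synthetic sampling can be sketched as below. This is a simplified illustration rather than the algorithm as used in the paper: minority (above-threshold) samples that have more majority-class neighbours receive a larger share of the synthetic points, and each synthetic point is an interpolation between a minority sample and one of its minority neighbours. The per-sample counts of the original ADASYN are replaced here by weighted random draws, and all names and data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def adasyn_sketch(minority: Sequence[Point], majority: Sequence[Point],
                  n_new: int, k: int = 3, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority points (needs at least two minority samples)."""
    rng = random.Random(seed)
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # Difficulty of each minority sample: share of majority points among its k neighbours.
    ratios = []
    for p in minority:
        neighbours = sorted((q for q in labelled if q[0] is not p),
                            key=lambda q: math.dist(p, q[0]))[:k]
        ratios.append(sum(1 for _, label in neighbours if label == 0) / len(neighbours))

    total = sum(ratios)
    weights = [r / total for r in ratios] if total else [1.0] * len(minority)

    synthetic = []
    for _ in range(n_new):
        # Harder samples are picked more often (simplification of ADASYN's counts).
        i = rng.choices(range(len(minority)), weights=weights)[0]
        base = minority[i]
        partners = sorted((q for j, q in enumerate(minority) if j != i),
                          key=lambda q: math.dist(base, q))[:k]
        partner = rng.choice(partners)
        t = rng.random()
        synthetic.append(tuple(b + t * (q - b) for b, q in zip(base, partner)))
    return synthetic


# Illustrative two-feature data; values are invented.
exceedance_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
compliant_samples = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (0.5, 0.2)]
new_points = adasyn_sketch(exceedance_samples, compliant_samples, n_new=10)
```

Because every synthetic point lies on a segment between two real minority samples, the augmented training set stays inside the region the exceedance class already occupies while rebalancing the class counts.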
Affiliation(s)
- Tingting Xu
- School of Environment, Faculty of Science, University of Auckland, New Zealand.
- Giovanni Coco
- School of Environment, Faculty of Science, University of Auckland, New Zealand
- Martin Neale
- School of Environment, Faculty of Science, University of Auckland, New Zealand
|
22
|
Panidhapu A, Li Z, Aliashrafi A, Peleato NM. Integration of weather conditions for predicting microbial water quality using Bayesian Belief Networks. WATER RESEARCH 2020; 170:115349. [PMID: 31830650 DOI: 10.1016/j.watres.2019.115349] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/15/2019] [Revised: 10/27/2019] [Accepted: 11/27/2019] [Indexed: 06/10/2023]
Abstract
Levels of fecal indicator bacteria (FIB) provide a surrogate measure of the microbial quality of water used for a wide range of applications. Despite the common use of these measures, a significant limitation is a delay in results due to the time required for cultivation and enumeration of FIB. Testing requires at least 18-24 h, and therefore, FIB cannot be used to identify current or real-time microbial water quality. Nowcasting, or empirical modelling approaches that incorporate water quality, environmental, and weather variables to predict FIB levels in real-time, have been developed with some success. However, FIB levels are dependent on a complex interaction of numerous variables, which can be challenging to model with the ordinary linear regression or classification methods most commonly applied. In this study, novel use of Bayesian Belief Networks (BBNs) that allow for a probabilistic representation of complex variable interactions is investigated for real-time modelling of FIB levels in surface waters. In particular, the integration of both water quality measures and current/historical weather for prediction of fecal coliforms and Escherichia coli levels is achieved using BBNs. For 4-bin classification of fecal coliform levels, BBNs increased prediction accuracy by 25%-54% compared to other previously used techniques including logistic regression, Naïve Bayes, and random forests. Binary prediction of E. coli levels exceeding a threshold of 20 CFU/100 mL was also significantly improved using BBNs with prediction accuracies >90% for all monitoring sites. Advantages of the BBN approach are also demonstrated, including the ability to make predictions from incomplete monitoring data as well as probabilistic inference of variable importance in FIB levels. In particular, the results indicate that water quality surrogates such as conductivity are essential to real-time prediction of FIB. 
The results and models described in this work can be readily utilized to provide accurate and real-time assessments of FIB levels in surface waters utilizing commonly monitored parameters.
Affiliation(s)
- Anjaneyulu Panidhapu
- School of Engineering, University of British Columbia Okanagan, 1137, Alumni Ave., Kelowna, BC, Canada
- Ziyu Li
- School of Engineering, University of British Columbia Okanagan, 1137, Alumni Ave., Kelowna, BC, Canada
- Atefeh Aliashrafi
- School of Engineering, University of British Columbia Okanagan, 1137, Alumni Ave., Kelowna, BC, Canada
- Nicolás M Peleato
- School of Engineering, University of British Columbia Okanagan, 1137, Alumni Ave., Kelowna, BC, Canada.
|
23
|
Peng Z, Hu W, Liu G, Zhang H, Gao R, Wei W. Development and evaluation of a real-time forecasting framework for daily water quality forecasts for Lake Chaohu to Lead time of six days. THE SCIENCE OF THE TOTAL ENVIRONMENT 2019; 687:218-231. [PMID: 31207512 DOI: 10.1016/j.scitotenv.2019.06.067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/10/2019] [Revised: 06/02/2019] [Accepted: 06/04/2019] [Indexed: 06/09/2023]
Abstract
The socioeconomic benefits associated with informative water quality forecasts for large lakes are becoming increasingly evident. However, it remains an enormous challenge to produce forecasts of water quality variables that are accurate enough to meet public demand. In this study, we developed and evaluated a new forecast framework for real-time forecasting of daily dissolved oxygen (DO), ammonium nitrogen (NH4+-N), total phosphorus (TP) and total nitrogen (TN) concentrations at lead times from one to six days for Lake Chaohu, the fifth largest freshwater lake in China. The forecast framework is based on a 3-D hydrodynamic ecological model referred to as EcoLake. We used hydrological, meteorological and water quality data from multiple sources to generate initial conditions and forcing functions. Solar radiation and inflows from tributaries, which are not readily available, were calculated using forecasted cloud cover and rainfall. Forecast skill was evaluated based on 122 forecasts produced on different days in 2017 and for each of the 12 sampling sites. Results indicate that the skill of the forecast framework varies considerably across water quality variables, sampling sites, and lead times. Generally, the forecast framework is more skillful than the persistence forecasts, which use the most recent observations as forecasts. The TN forecasts tend to be the most skillful with a mean RMSE skill score of 28.5% averaged across the six lead times. The DO forecasts tend to have the lowest skill with an average value of 10.9%. Model sensitivity experiments further revealed that errors in the raw air temperature and wind speed forecasts have a noticeable impact on the overall skill of DO and NH4+-N forecasts. The forecast framework proposed here could be a useful operational forecasting tool to enhance the effectiveness of the drinking water supply and public health protection based on the water quality management of Lake Chaohu.
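The RMSE skill score reported above measures forecast error against the persistence baseline. As a rough sketch, assuming the common definition skill = 100 × (1 − RMSE_forecast / RMSE_persistence), which the abstract does not spell out, the calculation could look like this in Python. The function names and sample values are illustrative and are not taken from the study.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_skill_score(forecast, persistence, observed):
    """Percent reduction in RMSE relative to a persistence baseline.

    Assumed definition: 100 * (1 - RMSE_forecast / RMSE_persistence).
    """
    return 100.0 * (1.0 - rmse(forecast, observed) / rmse(persistence, observed))

# Toy values (hypothetical): the model is off by 1, persistence by 2.
observed = [2.0, 4.0]
forecast = [3.0, 5.0]
persistence = [4.0, 6.0]
skill = rmse_skill_score(forecast, persistence, observed)
```

Under this definition a positive score means the forecast beats persistence, while zero or a negative score means it offers no improvement.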
Affiliation(s)
- Zhaoliang Peng
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, China.
- Weiping Hu
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, China
- Gang Liu
- Administration Bureau of Lake Chaohu of Anhui Province, Chaohu 238000, China
- Hui Zhang
- Administration Bureau of Lake Chaohu of Anhui Province, Chaohu 238000, China
- Rui Gao
- Administration Bureau of Lake Chaohu of Anhui Province, Chaohu 238000, China
- Wei Wei
- Hefei Bureau of Hydrology, Hefei 230000, China
24
|
Bertone E, Purandare J, Durand B. Spatiotemporal prediction of Escherichia coli and Enterococci for the Commonwealth Games triathlon event using Bayesian Networks. MARINE POLLUTION BULLETIN 2019; 146:11-21. [PMID: 31426138 DOI: 10.1016/j.marpolbul.2019.05.066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/16/2019] [Revised: 05/28/2019] [Accepted: 05/29/2019] [Indexed: 06/10/2023]
Abstract
A number of Bayesian Networks were developed in order to nowcast and forecast, up to 4 days ahead and in different locations, the likelihood of water quality within the 2018 Commonwealth Games Triathlon swim course exceeding the critical limits for Enterococci and Escherichia coli. The models are data-driven, but the identification of potential inputs and optimal model structure was performed through the parallel contribution of several stakeholders and experts, consulted through workshops. The models, whose main nodes were discretised with a customised discretisation algorithm, were validated over a test set of data and deployed in real-time during the Commonwealth Games in support of a traditional water quality monitoring program. The proposed modelling framework proved to be cost-effective and less time-consuming than process-based models while still achieving high accuracy; in addition, the added value of continuous stakeholder engagement guarantees a shared understanding of the model outputs and its future deployment.
Affiliation(s)
- E Bertone
- School of Engineering and Built Environment, Griffith University, Gold Coast Campus, QLD 4222, Australia; Cities Research Institute, Griffith University, Gold Coast Campus, QLD 4222, Australia.
- J Purandare
- Cities Research Institute, Griffith University, Gold Coast Campus, QLD 4222, Australia; Gold Coast Water and Waste, City of Gold Coast, QLD 4211, Australia
- B Durand
- Gold Coast Water and Waste, City of Gold Coast, QLD 4211, Australia
25
|
Harris AR, Pickering AJ, Boehm AB, Mrisho M, Davis J. Comparison of analytical techniques to explain variability in stored drinking water quality and microbial hand contamination of female caregivers in Tanzania. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2019; 21:893-903. [PMID: 31017132 DOI: 10.1039/c8em00460a] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
Exposure to fecal contamination continues to be a major public health concern for low-income households in sub-Saharan Africa. Drinking water and hands are known transmission routes for pathogens in household environments. In an effort to identify explanatory variables of water and hand contamination, a variety of analytical approaches have been employed that model variation in E. coli contamination as a function of behaviors and household characteristics. Using data collected from 1217 households in Bagamoyo, Tanzania, this investigation compares the explanatory variables identified in the three different modeling methods to explain hand and water contamination: ordinary least squares regression, logistic regression, and classification tree. Although the modeling approaches varied, there were some similarities in the results, with certain explanatory variables being consistently identified as being related to hand and water contamination (e.g., water source type for the water models and activity prior to sampling for the hand models). At the same time, there were also marked differences across the models. In sum, these results suggest there are benefits to using multiple analysis methods to assess relationships in complex systems. The models were also characterized by low explanatory power, suggesting that variation in hand and water contamination is difficult to capture when analyzing one-time water and hand rinse samples. For improved model performance, future studies could explore modeling of repeat measures of water quality and hand contamination.
Affiliation(s)
- Angela R Harris
- Environmental and Water Studies, Department of Civil and Environmental Engineering, Stanford University, Stanford, CA, USA
26
|
Sagarduy M, Courtois S, Del Campo A, Garmendia JM, Petrau A. Differential decay and prediction of persistence of Enterococcus spp. and Escherichia coli culturable cells and molecular markers in freshwater and seawater environments. Int J Hyg Environ Health 2019; 222:695-704. [PMID: 31097324 DOI: 10.1016/j.ijheh.2019.04.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/11/2019] [Revised: 04/19/2019] [Accepted: 04/23/2019] [Indexed: 10/26/2022]
Abstract
To quantify the impact of fecal pollution on the microbiological bathing water quality, predictive modeling is being increasingly used in which the decay rate of the fecal indicators plays an important role. The decay of sewage-sourced enterococci and Escherichia coli culturable cells and their associated molecular markers (16S rRNA) quantified by Quantitative Reverse transcription PCR were measured in controlled microcosms as well as under in situ conditions using different water types, from marine waters to fresh waters with intermediate salinity. All bacterial decays were fitted to a first order decay model. In the laboratory study, light radiation was the most influential factor affecting E. coli and enterococci survival by culture methods, although environmental conditions weakly impacted the decay of molecular markers. The results also indicated differential persistence of genetic markers and culturable organisms of fecal indicator bacteria in different water systems. For each bacterial indicator and analytical method, four equations were obtained to predict the time required to have a 90% reduction (T90) according to irradiance, salinity and temperature parameters. The weighted model RMSE (Root Mean Square Error) calculated for all field experiments showed that quantification obtained with the equations defined by the laboratory-based study compared reasonably well with in-situ observed quantification (0.4 and 0.2 log by standard culture methods for E. coli and Enterococcus spp. and 0.6 and 0.3 log by RT-qPCR for E. coli and Enterococcus spp. respectively). The modeling tool can be used to predict the presence of fecal pollution in marine and fresh waters in combination with either culture-based or rapid molecular methods.
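The first-order decay model and T90 mentioned above can be illustrated with a short sketch. It assumes the natural-log form C(t) = C0·exp(−k·t), so that T90 = ln(10)/k. The abstract does not state which base or fitting procedure the authors used, and the function names and synthetic data below are hypothetical.

```python
import math

def fit_decay_rate(times, concentrations):
    """Least-squares slope of ln(C) versus t; returns k (per unit time)."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx  # decay gives a negative slope, so k is positive

def t90(k):
    """Time needed for a 90% (1 log10) reduction under first-order decay."""
    return math.log(10) / k

# Synthetic data (hypothetical): exact decay with k = 0.5 per day.
times = [0.0, 1.0, 2.0, 3.0]
conc = [1000.0 * math.exp(-0.5 * t) for t in times]
k = fit_decay_rate(times, conc)
days_to_90_percent_loss = t90(k)
```

In a setting like the one described, k would itself be expressed as a function of irradiance, salinity and temperature; the sketch fits a single constant k only.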
Affiliation(s)
- Maialen Sagarduy
- Rivages Pro Tech, 2, Allée Théodore Monod, 64210, Bidart, France.
- Sophie Courtois
- Suez, CIRSEE, 38 rue du président Wilson, 78230, Le Pecq, France
- Andrea Del Campo
- AZTI Tecnalia, Herrera Kaia - Portualdea z/g, E-20110, Pasaia, Spain
- Agnès Petrau
- Rivages Pro Tech, 2, Allée Théodore Monod, 64210, Bidart, France
27
|
He Y, He Y, Sen B, Li H, Li J, Zhang Y, Zhang J, Jiang SC, Wang G. Storm runoff differentially influences the nutrient concentrations and microbial contamination at two distinct beaches in northern China. THE SCIENCE OF THE TOTAL ENVIRONMENT 2019; 663:400-407. [PMID: 30716630 DOI: 10.1016/j.scitotenv.2019.01.369] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/03/2018] [Revised: 01/24/2019] [Accepted: 01/28/2019] [Indexed: 06/09/2023]
Abstract
With the escalating coastal development and loss of vegetated landscape, the volume of storm runoff increases significantly in Chinese coastal cities. To protect human health and valuable recreational resources, it is necessary to develop a quantitative understanding of coastal pollution. Here we studied the influence of storm runoff on the nutrients and microbial pathogens at two popular bathing beaches in northern China. Dongshan Beach, located near the mouth of an urban river, is influenced by non-point source pollution while Tiger-Rock Beach, a coastal beach, is primarily influenced by a point source from a storm drain outfall. Storm runoff significantly (P < 0.001) decreased the salinity and Chl a post-storm at both the beaches, but only reduced the concentration of dissolved inorganic N at Tiger-Rock Beach. Escherichia coli decreased by 68.7% at Dongshan Beach, possibly due to the dilution effect of the stormflow, contradicting the notion of elevated fecal contamination in coastal beaches from storm runoff. Vibrio parahaemolyticus increased at both beaches post-storm, by 155.7% at Dongshan Beach and 136.7% at Tiger-Rock Beach. Regardless of storm impact, both E. coli and V. parahaemolyticus were much higher at Dongshan Beach than that at Tiger-Rock, suggesting the influence of different surrounding topographies. Lastly, the statistical models developed based on the environmental and microbial parameters regression showed predictive power (adjusted R2 > 0.5) to estimate the concentration of E. coli at Dongshan Beach and V. parahaemolyticus at Tiger-Rock Beach. Overall, the results suggest the unique role of the individual beaches in attenuating the effect of rainfall on the concentration of microbial pathogens in bathing water quality and provide unique predictive models for recreational water management and public health protection.
Affiliation(s)
- Yike He
- Center for Marine Environmental Ecology, School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China
- Yaodong He
- Center for Marine Environmental Ecology, School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China
- Biswarup Sen
- Center for Marine Environmental Ecology, School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China
- Hao Li
- Center for Marine Environmental Ecology, School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China
- Jiaqian Li
- Center for Marine Environmental Ecology, School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China
- Yongfeng Zhang
- Qinhuangdao Marine Environmental Monitoring Central Station, SOA, Qinhuangdao, Hebei 066002, China
- Jianle Zhang
- Qinhuangdao Marine Environmental Monitoring Central Station, SOA, Qinhuangdao, Hebei 066002, China
- Sunny C Jiang
- Department of Civil and Environmental Engineering, University of California at Irvine, CA 92697, USA
- Guangyi Wang
- Center for Marine Environmental Ecology, School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China.
28
|
García-Alba J, Bárcena JF, Ugarteburu C, García A. Artificial neural networks as emulators of process-based models to analyse bathing water quality in estuaries. WATER RESEARCH 2019; 150:283-295. [PMID: 30529593 DOI: 10.1016/j.watres.2018.11.063] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 07/24/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 06/09/2023]
Abstract
This study aims to provide a method for developing artificial neural networks in estuaries as emulators of process-based models to analyse bathing water quality and its variability over time and space. The methodology forecasts the concentration of faecal indicator organisms, integrating the accuracy and reliability of field measurements, the spatial and temporal resolution of process-based modelling, and the decrease in computational costs by artificial neural networks whilst preserving the accuracy of results. Thus, the overall approach integrates a coupled hydrodynamic-bacteriological model previously calibrated with field data at the bathing sites into a low-order emulator by using artificial neural networks, which are trained by the process-based model outputs. The application of the method to the Eo Estuary, located on the northwestern coast of Spain, demonstrated that artificial neural networks are viable surrogates of highly nonlinear process-based models and highly variable forcings. The results showed that the process-based model and the neural networks conveniently reproduced the measurements of Escherichia coli (E. coli) concentrations, indicating a slightly better fit for the process-based model (R2 = 0.87) than for the neural networks (R2 = 0.83). This application also highlighted that during the model setup of both predictive tools, the computational time of the process-based approach was 0.78 times lower than that of the artificial neural networks (ANNs) approach due to the additional time spent on ANN development. Conversely, the computational costs of forecasting are considerably reduced by the neural networks compared with the process-based model, with a decrease in hours of 25, 600, 3900, and 31633 times for forecasting 1 h, 1 day, 1 month, and 1 bathing season, respectively. Therefore, the longer the forecasting period, the greater the reduction in computational time by artificial neural networks.
Affiliation(s)
- Javier García-Alba
- Environmental Hydraulics Institute "IHCantabria", Universidad de Cantabria - Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain.
- Javier F Bárcena
- Environmental Hydraulics Institute "IHCantabria", Universidad de Cantabria - Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain.
- Carlos Ugarteburu
- Environmental Hydraulics Institute "IHCantabria", Universidad de Cantabria - Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain.
- Andrés García
- Environmental Hydraulics Institute "IHCantabria", Universidad de Cantabria - Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain.
29
|
Laureano-Rosario AE, Duncan AP, Symonds EM, Savic DA, Muller-Karger FE. Predicting culturable enterococci exceedances at Escambron Beach, San Juan, Puerto Rico using satellite remote sensing and artificial neural networks. JOURNAL OF WATER AND HEALTH 2019; 17:137-148. [PMID: 30758310 DOI: 10.2166/wh.2018.128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Predicting recreational water quality is key to protecting public health from exposure to wastewater-associated pathogens. It is not feasible to monitor recreational waters for all pathogens; therefore, monitoring programs use fecal indicator bacteria (FIB), such as enterococci, to identify wastewater pollution. Artificial neural networks (ANNs) were used to predict when culturable enterococci concentrations exceeded the U.S. Environmental Protection Agency (U.S. EPA) Recreational Water Quality Criteria (RWQC) at Escambron Beach, San Juan, Puerto Rico. Ten years of culturable enterococci data were analyzed together with satellite-derived sea surface temperature (SST), direct normal irradiance (DNI), turbidity, and dew point, along with local observations of precipitation and mean sea level (MSL). The factors identified as the most relevant for enterococci exceedance predictions based on the U.S. EPA RWQC were DNI, turbidity, cumulative 48 h precipitation, MSL, and SST; they predicted culturable enterococci exceedances with an accuracy of 75% and power greater than 60% based on the Receiver Operating Characteristic curve and F-Measure metrics. Results show the applicability of satellite-derived data and ANNs to predict recreational water quality at Escambron Beach. Future work should incorporate local sanitary survey data to predict risky recreational water conditions and protect human health.
Affiliation(s)
- Abdiel E Laureano-Rosario
- College of Marine Science, University of South Florida, 140 7th Avenue South, Saint Petersburg, FL 33701, USA
- Andrew P Duncan
- Centre for Water Systems, University of Exeter, Harrison Building, North Park Road, Exeter EX4 4QF, UK
- Erin M Symonds
- College of Marine Science, University of South Florida, 140 7th Avenue South, Saint Petersburg, FL 33701, USA
- Dragan A Savic
- Centre for Water Systems, University of Exeter, Harrison Building, North Park Road, Exeter EX4 4QF, UK
- Frank E Muller-Karger
- College of Marine Science, University of South Florida, 140 7th Avenue South, Saint Petersburg, FL 33701, USA
30
|
Wyer MD, Kay D, Morgan H, Naylor S, Clark S, Watkins J, Davies CM, Francis C, Osborn H, Bennett S. Within-day variability in microbial concentrations at a UK designated bathing water: Implications for regulatory monitoring and the application of predictive modelling based on historical compliance data. WATER RESEARCH X 2018; 1:100006. [PMID: 31193990 PMCID: PMC6549935 DOI: 10.1016/j.wroa.2018.10.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/11/2018] [Revised: 10/15/2018] [Accepted: 10/19/2018] [Indexed: 06/09/2023]
Abstract
Prediction of bathing water quality is recommended by the World Health Organization (WHO), the European Union (EU) and the United States Environmental Protection Agency (USEPA) and is an established element in bathing water management designed to protect public health. Most commonly, historical regulatory compliance data are used for model calibration and provide the dependent variable for modelling. Independent (or predictor) variables (e.g. rainfall, river flow and received irradiance) measured over some antecedent period are used to deliver prediction of the faecal indicator concentration measured on the day of the regulatory sample collection. The implied linked assumptions of this approach are, therefore, that: (i) the independent variables accurately predict the bathing-day water quality, which is (ii) accurately characterized by the single regulatory sample. Assumption (ii) will not be the case where significant within-day variability in water quality is evident. This study built a detailed record of water quality change through 60 days at a UK coastal bathing water in 2011 using half-hourly samples each subjected to triplicate filtration designed to enhance enumeration precision. On average, the mean daily variation in FIO concentrations exceeded 1 log10 order, with the largest daily variations exceeding 2 log10 orders. Significant diurnality was observed at this bathing water, which would determine its EU Directive compliance category if the regulatory samples were collected at the same time each day.
A sampling programme of this intensity has not been reported elsewhere to date and, if this pattern is proven to be characteristic of other bathing waters world-wide, it has significance for: (a) the design of regulatory sampling programmes; (b) the use of historical data to assess compliance, which often comprises a single sample taken at the compliance point on a regular, often weekly, basis; and (c) the use of regulatory compliance data to build predictive models of water quality.
Affiliation(s)
- Mark D. Wyer
- Department of Geography and Earth Sciences, Llandinam Building, Aberystwyth University, SY23 3DB, UK
- David Kay
- Department of Geography and Earth Sciences, Llandinam Building, Aberystwyth University, SY23 3DB, UK
- Huw Morgan
- Place, Housing and Public Protection Services, Pollution Control, Swansea Council, The Guildhall, Swansea, SA1 4PE, UK
- Sam Naylor
- Place, Housing and Public Protection Services, Pollution Control, Swansea Council, The Guildhall, Swansea, SA1 4PE, UK
- Simon Clark
- Place, Housing and Public Protection Services, Pollution Control, Swansea Council, The Guildhall, Swansea, SA1 4PE, UK
- John Watkins
- Department of Geography and Earth Sciences, Llandinam Building, Aberystwyth University, SY23 3DB, UK
- Cheryl M. Davies
- Department of Geography and Earth Sciences, Llandinam Building, Aberystwyth University, SY23 3DB, UK
- Carol Francis
- Department of Geography and Earth Sciences, Llandinam Building, Aberystwyth University, SY23 3DB, UK
- Hamish Osborn
- Natural Resources Wales, Area Office, Maes Newydd, Llandarcy, SA10 6JQ, UK
- Sarah Bennett
- Natural Resources Wales, Area Office, Maes Newydd, Llandarcy, SA10 6JQ, UK
31
|
Zimmer-Faust AG, Brown CA, Manderson A. Statistical models of fecal coliform levels in Pacific Northwest estuaries for improved shellfish harvest area closure decision making. MARINE POLLUTION BULLETIN 2018; 137:360-369. [PMID: 30503445 PMCID: PMC6290359 DOI: 10.1016/j.marpolbul.2018.09.028] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/02/2018] [Revised: 09/07/2018] [Accepted: 09/16/2018] [Indexed: 05/03/2023]
Abstract
There is a substantial need for tools that effectively predict spatial and temporal fecal pollution patterns in estuarine waters. In this study, statistical models of exceedances of shellfish fecal coliform (FC) water quality criteria were developed using a 10-year dataset of FC levels and environmental data. Performance (sensitivity, specificity, and predictive capacity) of five different types of models was tested (MLR regression, Tobit (censored) regression, Firth's binary logistic regression (BLR), classification trees, and mixed-effects regression) for each of three conditionally managed shellfish-harvesting areas in Tillamook Bay, Oregon (USA). The most influential variables were related to precipitation and river stage height in the wet season and wind and tidal-stage in the dry season. Classification tree and Firth's BLR approaches better explained exceedances of shellfish water quality standards than the current closure thresholds. Findings demonstrate the utility of statistical modeling approaches for improved management of shellfish harvesting waters.
Affiliation(s)
- Amity G Zimmer-Faust
- U.S. Environmental Protection Agency, Office of Research and Development, 2111 Marine Science Dr, Newport, OR 97365, United States of America.
- Cheryl A Brown
- U.S. Environmental Protection Agency, Office of Research and Development, 2111 Marine Science Dr, Newport, OR 97365, United States of America
- Alex Manderson
- Oregon Department of Agriculture, Salem, OR, United States of America
32
|
Searcy RT, Taggart M, Gold M, Boehm AB. Implementation of an automated beach water quality nowcast system at ten California oceanic beaches. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2018; 223:633-643. [PMID: 29975890 DOI: 10.1016/j.jenvman.2018.06.058] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/13/2018] [Revised: 06/12/2018] [Accepted: 06/17/2018] [Indexed: 06/08/2023]
Abstract
Fecal indicator bacteria like Escherichia coli and enterococci are monitored at beaches around the world to reduce incidence of recreational waterborne illness. Measurements are usually made weekly, but FIB concentrations can exhibit extreme variability, fluctuating at shorter periods. The result is that water quality has likely changed by the time data are provided to beachgoers. Here, we present an automated water quality prediction system (called the nowcast system) that is capable of providing daily predictions of water quality for numerous beaches. We created nowcast models for 10 California beaches using weather, oceanographic, and other environmental variables as input to tuned regression models to predict if FIB concentrations were above single sample water quality standards. Rainfall was used as a variable in nearly every model. The models were calibrated and validated using historical data. Subsequently, models were implemented during the 2017 swim season in collaboration with local beach managers. During the 2017 swim season, the median sensitivity of the nowcast models was 0.5 compared to 0 for the current method of using day-to-week old measurements to make beach posting decisions. Model specificity was also high (median of 0.87). During the implementation phase, nowcast models provided an average of 140 additional days per beach of updated water quality information to managers when water quality measurements were not made. The work presented herein emphasizes that a one-size-fits-all approach to nowcast modeling, even when beaches are in close proximity, is infeasible. Flexibility in modeling approaches and adaptive responses to modeling and data challenges are required when implementing nowcast models for beach management.
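Sensitivity and specificity, the two metrics reported above, can be computed from paired observed and predicted exceedances. The sketch below uses the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)). The function name and the toy data are illustrative assumptions, not values from the study.

```python
def sensitivity_specificity(observed, predicted):
    """Return (sensitivity, specificity) for binary exceedance flags.

    observed, predicted: equal-length sequences of 0/1 (1 = exceedance).
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # exceedance correctly flagged
    fn = sum(1 for o, p in pairs if o and not p)      # exceedance missed
    tn = sum(1 for o, p in pairs if not o and not p)  # clean day correctly passed
    fp = sum(1 for o, p in pairs if not o and p)      # false alarm
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Toy example (hypothetical): 2 exceedance days, 3 clean days.
observed = [1, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(observed, predicted)
```

Because exceedances are usually rare, a rule that never flags an exceedance scores a specificity of 1 and a sensitivity of 0, which is why both metrics are worth reporting together.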
Affiliation(s)
- Ryan T Searcy
- Heal the Bay, 1444 9th Street, Santa Monica, CA 90401, USA
- Mitzy Taggart
- Heal the Bay, 1444 9th Street, Santa Monica, CA 90401, USA
- Mark Gold
- UCLA, 2248 Murphy Hall, 410 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Alexandria B Boehm
- Department of Civil and Environmental Engineering, Stanford University, 473 Via Ortega, Stanford, CA, 94305, USA.
33
|
Park Y, Kim M, Pachepsky Y, Choi SH, Cho JG, Jeon J, Cho KH. Development of a Nowcasting System Using Machine Learning Approaches to Predict Fecal Contamination Levels at Recreational Beaches in Korea. JOURNAL OF ENVIRONMENTAL QUALITY 2018; 47:1094-1102. [PMID: 30272778 DOI: 10.2134/jeq2017.11.0425] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Microbial contamination in beach water poses a public health threat due to waterborne diseases. To reduce the risk of exposure to fecal contamination, informing beachgoers in advance about the microbial water quality is important. Currently, determining the level of fecal contamination takes 24 h. The objective of this study is to predict the current level of fecal contamination (enterococcus [ENT] and E. coli) using readily available environmental variables. Artificial neural network (ANN) and support vector regression (SVR) models were constructed using data from the Haeundae and Gwangalli Beaches in Busan City. The input variables included the tidal level, air and water temperature, solar radiation, wind direction and velocity, precipitation, discharge from the wastewater treatment plant, and suspended solid concentration in beach water. The dependence of fecal contamination on the input variables was statistically evaluated; precipitation, discharge from the wastewater treatment plant, and wind direction at the two beaches were positively correlated to the changes in the two bacterial concentrations (p < 0.01), whereas solar radiation was negatively correlated (p < 0.01). The performance of the ANN model for predicting ENT and E. coli at Gwangalli Beach was significantly higher than that of the SVR model with the training dataset (p < 0.05). Based on the comparison of residual values between the predicted and observed fecal indicator bacteria concentrations in two models, the ANN demonstrated better performance than SVR. This study suggests an effective prediction method to determine whether a beach is safe for recreational use.
34
|
Enterococcal Concentrations in a Coastal Ecosystem Are a Function of Fecal Source Input, Environmental Conditions, and Environmental Sources. Appl Environ Microbiol 2018; 84:AEM.01038-18. [PMID: 30006393 DOI: 10.1128/aem.01038-18] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/30/2018] [Accepted: 06/23/2018] [Indexed: 02/01/2023]
Abstract
Fecal pollution at coastal beaches requires management efforts to address public health and economic concerns. Feces-borne bacterial concentrations are influenced by different fecal sources, environmental conditions, and ecosystem reservoirs, making their public health significance convoluted. In this study, we sought to delineate the influences of these factors on enterococcal concentrations in southern Maine coastal recreational waters. Weekly water samples and water quality measurements were conducted at freshwater, estuarine, and marine beach sites from June through September 2016. The samples were analyzed for total and particle-associated enterococcal concentrations, total suspended solids, and microbial source tracking markers (PCR: Bac32, HF183, CF128, DF475, and Gull2; quantitative PCR [qPCR]: AllBac, HF183, and GFD). Water, soil, sediment, and marine sediment samples were also subjected to 16S rRNA sequencing and SourceTracker analysis to determine the influence from these environmental reservoirs on water sample microbial communities. Enterococcal and particle-associated enterococcal concentrations were elevated in freshwater, but the concentrations of suspended solids were relatively similar. Mammal fecal contamination was significantly elevated in the estuary, with human and bird fecal contaminant levels similar between sites. A partial least-squares regression model indicated particle-associated enterococcal and mammal marker concentrations had the most significant positive relationships with enterococcal concentrations across marine, estuary, and freshwater environments. Freshwater microbial communities were significantly influenced by underlying sediment, while estuarine/marine beach communities were influenced by freshwater, high tide height, and estuarine sediment. 
Elevated enterococcal levels were reflective of a combination of increased fecal source input, environmental sources, and environmental conditions, highlighting the need for encompassing microbial source tracking (MST) approaches for managing water quality issues.
IMPORTANCE: Enterococci have long been the federal standard in determining water quality in estuarine and marine environments. Although enterococci are highly abundant in the intestines of many animals, they are not exclusive to that environment and can persist and grow outside fecal tracts. This presents a management problem for areas that are largely impaired by nonpoint source contamination, as fecal sources might not be the root cause of contamination. This study employed different microbial source tracking methods for delineating the influences from fecal source input, environmental sources, and environmental conditions to determine which combination of variables is influencing enterococcal concentrations in recreational waters at a historically impaired coastal town. The results showed that fecal source input, environmental sources, and conditions all play roles in influencing enterococcal concentrations. This highlights the need to include an encompassing microbial source tracking approach to assess the effects of all important variables on enterococcal concentrations.
|
35
|
de Souza RV, Campos CJA, Garbossa LHP, Seiffert WQ. Developing, cross-validating and applying regression models to predict the concentrations of faecal indicator organisms in coastal waters under different environmental scenarios. THE SCIENCE OF THE TOTAL ENVIRONMENT 2018; 630:20-31. [PMID: 29471188 DOI: 10.1016/j.scitotenv.2018.02.139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/26/2017] [Revised: 02/09/2018] [Accepted: 02/12/2018] [Indexed: 06/08/2023]
Abstract
This study developed, cross-validated and applied a regression-based model to predict concentrations of faecal indicator organisms (FIOs) under different environmental conditions in the North and South bays of Santa Catarina, South of Brazil. The model was developed using a database of FIO concentrations in seawater sampled at 50 sites and the validation was performed using a different database by comparing 288 pairs of measured and modelled results for 15 sites. The index of agreement between the model outputs and the FIO concentrations measured during the validation period was 66%; the mean average error was 0.43 log₁₀ and the root mean square error was 0.58 log₁₀ MPN·100 mL⁻¹. These validation results indicate that the model provides a fair representation of the FIO contamination in the bays for the meteorological conditions under which the model was trained. The simulation of different scenarios showed that under typical levels of resident human population in the catchments and median rainfall and solar radiation conditions, the median FIO concentration in the bays is 0.4 MPN·100 mL⁻¹. Under extreme meteorological conditions, the combined effect of high rainfall and low solar radiation increased FIO concentrations up to 5 log₁₀ MPN·100 mL⁻¹. The simulated scenarios also show that increases in resident population during the summer tourist season and average rainfall concentrations do not increase median FIO concentrations in the bays relative to periods of time with average population, possibly because of higher bacterial die-off in the waters. The models can be an effective tool for management of human health risks in bathing and shellfish waters impacted by sewage pollution.
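The three validation statistics named in this abstract can be computed from paired measured and modelled values. The sketch below is illustrative only: it assumes the "index of agreement" is Willmott's d, which the abstract does not specify, and it treats the "mean average error" as the mean absolute error. The function name and the input values are hypothetical and are not taken from the study.

```python
import math


def validation_metrics(measured, modelled):
    """Return MAE, RMSE and Willmott's index of agreement for paired values.

    Both inputs are sequences of equal length, e.g. log10-transformed
    concentrations. Willmott's d is an assumption; the abstract only
    says "index of agreement".
    """
    if len(measured) != len(modelled) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    errors = [p - o for o, p in zip(measured, modelled)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_obs = sum(measured) / n
    # d = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)
    denom = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(measured, modelled)
    )
    d = 1.0 - sum(e * e for e in errors) / denom if denom > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "index_of_agreement": d}


# Hypothetical log10 values, for illustration only.
example = validation_metrics([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
```

An index of agreement of 1 means the modelled values match the measured ones exactly; values closer to 0 indicate poorer agreement.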
Affiliation(s)
- Robson V de Souza
- Empresa de Pesquisa Agropecuária e Extensão Rural de Santa Catarina (Epagri), Rodovia Admar Gonzaga, 1347, Itacorubi, Florianópolis, SC 88034-901, Brazil.
- Carlos J A Campos
- Centre for Environment, Fisheries & Aquaculture Science (Cefas), Weymouth Laboratory, Barrack Road, The Nothe DT4 8UB, UK
- Luis H P Garbossa
- Empresa de Pesquisa Agropecuária e Extensão Rural de Santa Catarina (Epagri), Rodovia Admar Gonzaga, 1347, Itacorubi, Florianópolis, SC 88034-901, Brazil
- Walter Q Seiffert
- Universidade Federal de Santa Catarina (UFSC), Rodovia Admar Gonzaga, 1346, Itacorubi, Florianópolis, SC 88034-001, Brazil
|
36
|
Steele JA, Blackwood AD, Griffith JF, Noble RT, Schiff KC. Quantification of pathogens and markers of fecal contamination during storm events along popular surfing beaches in San Diego, California. WATER RESEARCH 2018; 136:137-149. [PMID: 29501758 DOI: 10.1016/j.watres.2018.01.056] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 01/27/2017] [Revised: 01/19/2018] [Accepted: 01/24/2018] [Indexed: 05/08/2023]
Abstract
Along southern California beaches, the concentrations of fecal indicator bacteria (FIB) used to quantify the potential presence of fecal contamination in coastal recreational waters have been previously documented to be higher during wet weather conditions (typically winter or spring) than those observed during summer dry weather conditions. FIB are used for management of recreational waters because measurement of the bacterial and viral pathogens that are the potential causes of illness in beachgoers exposed to stormwater can be expensive, time-consuming, and technically difficult. Here, we use droplet digital Polymerase Chain Reaction (digital PCR) and digital reverse transcriptase PCR (digital RT-PCR) assays for direct quantification of pathogenic viruses, pathogenic bacteria, and source-specific markers of fecal contamination in the stormwater discharges. We applied these assays across multiple storm events from two different watersheds that discharge to popular surfing beaches in San Diego, CA. Stormwater discharges had higher FIB concentrations as compared to proximal beaches, often by ten-fold or more during wet weather. Multiple lines of evidence indicated that the stormwater discharges contained human fecal contamination, despite the presence of separate storm sewer and sanitary sewer systems in both watersheds. Human fecal source markers (up to 100% of samples, 20-12440 HF183 copies per 100 ml) and human norovirus (up to 96% of samples, 25-495 NoV copies per 100 ml) were routinely detected in stormwater discharge samples. Potential bacterial pathogens were also detected and quantified: Campylobacter spp. (up to 100% of samples, 16-504 gene copies per 100 ml) and Salmonella (up to 25% of samples, 6-86 gene copies per 100 ml). 
Other viral human pathogens were also measured, but occurred at generally lower concentrations: adenovirus (detected in up to 22% of samples, 14-41 AdV copies per 100 ml); no enterovirus was detected in any stormwater discharge sample. Higher concentrations of avian source markers were noted in the stormwater discharge located immediately downstream of a large bird sanctuary along with increased Campylobacter concentrations and notably different Campylobacter species composition than the watershed that had no bird sanctuary. This study is one of the few to directly measure an array of important bacterial and viral pathogens in stormwater discharges to recreational beaches, and provides context for stormwater-based management of beaches during high risk wet-weather periods. Furthermore, the combination of culture-based and digital PCR-derived data is demonstrated to be valuable for assessing hydrographic relationships, considering delivery mechanisms, and providing foundational exposure information for risk assessment.
Affiliation(s)
- Joshua A Steele
- Southern California Coastal Water Research Project, 3535 Harbor Blvd. Ste 110, Costa Mesa, CA 92626, USA.
- A Denene Blackwood
- UNC Institute of Marine Science, 3431 Arendell Street, Morehead City, NC 28557, USA
- John F Griffith
- Southern California Coastal Water Research Project, 3535 Harbor Blvd. Ste 110, Costa Mesa, CA 92626, USA
- Rachel T Noble
- UNC Institute of Marine Science, 3431 Arendell Street, Morehead City, NC 28557, USA
- Kenneth C Schiff
- Southern California Coastal Water Research Project, 3535 Harbor Blvd. Ste 110, Costa Mesa, CA 92626, USA
|
37
|
Jennings WC, Chern EC, O'Donohue D, Kellogg MG, Boehm AB. Frequent detection of a human fecal indicator in the urban ocean: environmental drivers and covariation with enterococci. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2018; 20:480-492. [PMID: 29404550 PMCID: PMC6686843 DOI: 10.1039/c7em00594f] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 05/03/2023]
Abstract
Fecal pollution of surface waters presents a global human health threat. New molecular indicators of fecal pollution have been developed to address shortcomings of traditional culturable fecal indicators. However, there is still little information on their fate and transport in the environment. The present study uses spatially and temporally extensive data on traditional (culturable enterococci, cENT) and molecular (qPCR-enterococci, qENT and human-associated marker, HF183/BacR287) indicator concentrations in marine water surrounding highly-urbanized San Francisco, California, USA to investigate environmental and anthropogenic processes that impact fecal pollution. We constructed multivariable regression models for fecal indicator bacteria at 14 sampling stations. The human marker was detected more frequently in our study than in many other published studies, with detection frequency at some stations as high as 97%. The odds of cENT, qENT, and HF183/BacR287 exceeding health-relevant thresholds were statistically elevated immediately following discharges of partially treated combined sewage, and cENT levels dissipated after approximately 1 day. However, combined sewer discharges were not important predictors of indicator levels typically measured in weekly monitoring samples. Instead, precipitation and solar insolation were important predictors of cENT in weekly samples, while precipitation and water temperature were important predictors of HF183/BacR287 and qENT. The importance of precipitation highlights the significance of untreated storm water as a source of fecal pollution to the urban ocean, even for a city served by a combined sewage system. Sunlight and water temperature likely control persistence of the indicators via photoinactivation and dark decay processes, respectively.
Affiliation(s)
- Wiley C Jennings
- Department of Civil and Environmental Engineering, Environmental Engineering and Science, Stanford University, 94305-4020, USA.
- Eunice C Chern
- San Francisco Public Utilities Commission, Water Quality Laboratory, 1000 El Camino Real, Millbrae, CA 94030, USA and EPA Region 10 Laboratory, 7411 Beach Dr E, Port Orchard, WA 98366, USA
- Diane O'Donohue
- San Francisco Public Utilities Commission, Oceanside Biology Laboratory, 3500 Great Highway, San Francisco, CA 94132, USA
- Michael G Kellogg
- San Francisco Public Utilities Commission, Oceanside Biology Laboratory, 3500 Great Highway, San Francisco, CA 94132, USA
- Alexandria B Boehm
- Department of Civil and Environmental Engineering, Environmental Engineering and Science, Stanford University, 94305-4020, USA.
|
38
|
Avila R, Horn B, Moriarty E, Hodson R, Moltchanova E. Evaluating statistical model performance in water quality prediction. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2018; 206:910-919. [PMID: 29207304 DOI: 10.1016/j.jenvman.2017.11.049] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 06/29/2017] [Revised: 10/19/2017] [Accepted: 11/19/2017] [Indexed: 06/07/2023]
Abstract
Exposure to contaminated water while swimming or boating or participating in other recreational activities can cause gastrointestinal and respiratory disease. It is not uncommon for water bodies to experience rapid fluctuations in water quality, and it is therefore vital to be able to predict them accurately and in time so as to minimise the population's exposure to pathogenic organisms. E. coli is commonly used as an indicator to measure water quality in freshwater, and higher counts of E. coli are associated with increased risk of illness. In this case study, we compare the performance of a wide range of statistical models in the prediction of water quality via E. coli levels for the weekly data collected over the summer months from 2006 to 2014 at the recreational site on the Oreti river in Wallacetown, New Zealand. The models include a naive model, multiple linear regression, dynamic regression, regression tree, Markov chain, classification tree, random forests, multinomial logistic regression, discriminant analysis and Bayesian network. The results show that the Bayesian network was superior to all the other models. Overall, it had a leave-one-out and k-fold cross validation error rate of 21%, while predicting the majority of instances of E. coli levels classified as unsafe by the Microbiological Water Quality Guidelines for Marine and Freshwater Recreational Areas 2003, New Zealand. Because Bayesian networks are also flexible in handling missing data and outliers and allow for continuous updating in real time, we have found them to be a promising tool, and in the future, plan to extend the analysis beyond the current case study site.
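A leave-one-out cross validation error rate of the kind reported here can be sketched in a few lines. The example below is a minimal illustration, not the authors' procedure: it uses a majority-class predictor as a stand-in for a naive model (the abstract does not define its naive model), and the labels are hypothetical.

```python
from collections import Counter


def fit_majority(train_labels):
    """Stand-in 'naive' model: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def loocv_error_rate(labels):
    """Leave-one-out cross validation error rate for the majority-class model.

    Each observation is held out in turn, the model is fitted on the
    remaining observations, and the held-out label is predicted.
    """
    if len(labels) < 2:
        raise ValueError("need at least two observations")
    errors = 0
    for i, held_out in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        if fit_majority(train) != held_out:
            errors += 1
    return errors / len(labels)


# Hypothetical weekly classifications, for illustration only.
weekly = ["safe", "safe", "safe", "safe", "unsafe"]
rate = loocv_error_rate(weekly)
```

With these labels the only misclassified hold-out is the single "unsafe" week, which shows why an overall error rate alone can hide poor detection of the rarer unsafe class.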
Affiliation(s)
- Rodelyn Avila
- School of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand; Institute of Environmental Science and Research, ESR, PO Box 29181, Christchurch 8540, New Zealand.
- Beverley Horn
- Institute of Environmental Science and Research, ESR, PO Box 29181, Christchurch 8540, New Zealand
- Elaine Moriarty
- Institute of Environmental Science and Research, ESR, PO Box 29181, Christchurch 8540, New Zealand
- Roger Hodson
- Environment Southland, Private Bag 90116, Invercargill 9840, New Zealand
- Elena Moltchanova
- School of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand
|
39
|
Soller JA, Schoen M, Steele JA, Griffith JF, Schiff KC. Incidence of gastrointestinal illness following wet weather recreational exposures: Harmonization of quantitative microbial risk assessment with an epidemiologic investigation of surfers. WATER RESEARCH 2017; 121:280-289. [PMID: 28558279 DOI: 10.1016/j.watres.2017.05.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 12/16/2016] [Revised: 05/02/2017] [Accepted: 05/08/2017] [Indexed: 05/26/2023]
Abstract
We modeled the risk of gastrointestinal (GI) illness associated with recreational exposures to marine water following storm events in San Diego County, California. We estimated GI illness risks via quantitative microbial risk assessment (QMRA) techniques by consolidating site specific pathogen monitoring data of stormwater, site specific dilution estimates, literature-based water ingestion data, and literature based pathogen dose-response and morbidity information. Our water quality results indicated that human sources of contamination contribute viral and bacterial pathogens to streams draining an urban watershed during wet weather that then enter the ocean and affect nearshore water quality. We evaluated a series of approaches to account for uncertainty in the norovirus dose-response model selection and compared our model results to those from a concurrently conducted epidemiological study that provided empirical estimates for illness risk following ocean exposure. The preferred norovirus dose-response approach yielded median risk estimates for water recreation-associated illness (15 GI illnesses per 1000 recreation events) that closely matched the reported epidemiological results (12 excess GI illnesses per 1000 wet weather recreation events). The results are consistent with norovirus, or other pathogens associated with norovirus, as an important cause of gastrointestinal illness among surfers in this setting. This study demonstrates the applicability of QMRA for recreational water risk estimation, even under wet weather conditions and describes a process that might be useful in developing site-specific water quality criteria in this and other locations.
Affiliation(s)
- Jeffrey A Soller
- Soller Environmental, LLC, 3022 King St., Berkeley, CA 94703, USA.
- Mary Schoen
- Soller Environmental, LLC, 3022 King St., Berkeley, CA 94703, USA
- Joshua A Steele
- Southern California Coastal Water Research Project, 3535 Harbor Blvd #110, Costa Mesa, CA 92626, USA
- John F Griffith
- Southern California Coastal Water Research Project, 3535 Harbor Blvd #110, Costa Mesa, CA 92626, USA
- Kenneth C Schiff
- Southern California Coastal Water Research Project, 3535 Harbor Blvd #110, Costa Mesa, CA 92626, USA
|
40
|
Wiegner TN, Edens CJ, Abaya LM, Carlson KM, Lyon-Colbert A, Molloy SL. Spatial and temporal microbial pollution patterns in a tropical estuary during high and low river flow conditions. MARINE POLLUTION BULLETIN 2017; 114:952-961. [PMID: 27866724 DOI: 10.1016/j.marpolbul.2016.11.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/24/2016] [Revised: 11/01/2016] [Accepted: 11/10/2016] [Indexed: 05/19/2023]
Abstract
Spatial and temporal patterns of coastal microbial pollution are not well documented. Our study examined these patterns through measurements of fecal indicator bacteria (FIB), nutrients, and physicochemical parameters in Hilo Bay, Hawai'i, during high and low river flow. More than 40% of samples tested positive for the human-associated Bacteroides marker, with highest percentages near rivers. Other FIB were also higher near rivers, but only Clostridium perfringens concentrations were related to discharge. During storms, FIB concentrations were three times to an order of magnitude higher, and increased with decreasing salinity and water temperature, and increasing turbidity. These relationships and high spatial resolution data for these parameters were used to create Enterococcus spp. and C. perfringens maps that predicted exceedances with 64% and 95% accuracy, respectively. Mapping microbial pollution patterns and predicting exceedances is a valuable tool that can improve water quality monitoring and aid in visualizing FIB hotspots for management actions.
Affiliation(s)
- T N Wiegner
- Marine Science Department, University of Hawai'i at Hilo, 200 W. Kawili St., Hilo, HI 96720, United States.
- C J Edens
- Tropical Conservation Biology and Environmental Science Graduate Program, University of Hawai'i at Hilo, 200 W. Kawili St., Hilo, HI 96720, United States.
- L M Abaya
- Tropical Conservation Biology and Environmental Science Graduate Program, University of Hawai'i at Hilo, 200 W. Kawili St., Hilo, HI 96720, United States.
- K M Carlson
- Marine Science Department, University of Hawai'i at Hilo, 200 W. Kawili St., Hilo, HI 96720, United States.
- A Lyon-Colbert
- Department of Biological Sciences, California State University, East Bay, Hayward, CA 94542, United States.
- S L Molloy
- Department of Biological Sciences, California State University, East Bay, Hayward, CA 94542, United States.
|
41
|
Bedri Z, Corkery A, O'Sullivan JJ, Deering LA, Demeter K, Meijer WG, O'Hare G, Masterson B. Evaluating a microbial water quality prediction model for beach management under the revised EU Bathing Water Directive. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2016; 167:49-58. [PMID: 26613350 DOI: 10.1016/j.jenvman.2015.10.046] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/17/2015] [Revised: 10/21/2015] [Accepted: 10/28/2015] [Indexed: 06/05/2023]
Abstract
The revised Bathing Water Directive (2006/7/EC) requires EU member states to minimise the risk to public health from faecal pollution at bathing waters through improved monitoring and management approaches. While increasingly sophisticated measurement methods (such as microbial source tracking) assist in the management of bathing water resources, the use of deterministic predictive models for this purpose, while having the potential to provide decision-making support, remains less common. This study explores an integrated, deterministic catchment-coastal hydro-environmental model as a decision-making tool for beach management which, based on advance predictions of bathing water quality, can inform beach managers on appropriate management actions (to prohibit bathing or advise the public not to bathe) in the event of a poor water quality forecast. The model provides a 'moving window' five-day forecast of Escherichia coli levels at a bathing water compliance point off the Irish coast and the accuracy of bathing water management decisions was investigated for model predictions under two scenarios over the period from the 11th August to the 5th September, 2012. Decisions for Scenario 1 were based on model predictions where rainfall forecasts from a meteorological source (www.yr.no) were used to drive the rainfall-runoff processes in the catchment component of the model, and for Scenario 2, were based on predictions that were improved by incorporating real-time rainfall data from a sensor network within the catchment into the forecasted meteorological input data. The accuracy of the model in the decision-making process was assessed using the contingency table and its metrics. The predictive model gave reasonable outputs to support appropriate decision making for public health protection.
Scenario 1 provided real-time predictions that, on 77% of instances during the study period where both predicted and E. coli concentrations were available, would correctly inform a beach manager to either take action to mitigate for poor bathing water quality or take no action. However, Scenario 1 also provided data to support a decision to take action (when none was necessary - a type I error) in 4% of instances and to take no action (when action was required - a type II error) in 19% of the instances analysed. Type II errors are critical in terms of public health protection given that for this error, bathers can be exposed to risks from poor bathing water quality. Scenario 2, on the other hand, provided predictions that would support correct management actions for 79% of the instances but would result in type I and type II errors for 4% and 17% of the instances respectively. Comparison of Scenarios 1 and 2 for this study indicates that Scenario 2 gave a marginally better overall performance in terms of supporting correct management decisions, as it provided data that could result in a lower occurrence of the more critical type II errors. Given that the 28 member states of the European Union are required to engage with the public health provisions of the revised Bathing Water Directive, issues of compliance, pertaining particularly to the management of bathing water resources, remain topical. Decision supports for managing bathing waters in the context of the Directive are likely to become the focus of much attention and although the current study has been validated in bathing waters off the east coast of Ireland, the approach of using a deterministic and integrated catchment-coastal model for such purposes is easily transferable to other bathing water jurisdictions.
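The contingency-table metrics described in this abstract can be illustrated with a short sketch. The function name and the counts below are hypothetical and are not the study's data; the abstract reports only percentages of correct decisions, type I errors and type II errors, from which sensitivity and specificity cannot be recovered without the underlying counts.

```python
def contingency_metrics(tp, fp, fn, tn):
    """Summarise a 2x2 contingency table of management decisions.

    tp: action taken and needed      fp: action taken but not needed (type I)
    fn: no action but action needed (type II)      tn: no action, none needed
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("contingency table is empty")
    return {
        # Shares of all instances, matching how the abstract reports results.
        "correct": (tp + tn) / total,
        "type_i": fp / total,
        "type_ii": fn / total,
        # Class-conditional rates, which need the raw counts.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical counts for 25 forecast instances, for illustration only.
metrics = contingency_metrics(tp=5, fp=1, fn=4, tn=15)
```

In this made-up table 80% of decisions are correct, yet sensitivity is only 5/9, which illustrates why type II errors deserve separate attention when poor water quality is the rarer outcome.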
Affiliation(s)
- Zeinab Bedri
- Centre for Water Resources Research, School of Civil, Structural, and Environmental Engineering, University College Dublin, Belfield, Dublin 4, Ireland.
- Aisling Corkery
- Centre for Water Resources Research, School of Civil, Structural, and Environmental Engineering, University College Dublin, Belfield, Dublin 4, Ireland
- John J O'Sullivan
- Centre for Water Resources Research, School of Civil, Structural, and Environmental Engineering, University College Dublin, Belfield, Dublin 4, Ireland; UCD Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Louise A Deering
- School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Katalin Demeter
- School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Wim G Meijer
- School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4, Ireland; UCD Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Gregory O'Hare
- Clarity Centre, UCD School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland; UCD Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Bartholomew Masterson
- School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4, Ireland
|
42
|
Thoe W, Choi KW, Lee JHW. Predicting 'very poor' beach water quality gradings using classification tree. JOURNAL OF WATER AND HEALTH 2016; 14:97-108. [PMID: 26837834 DOI: 10.2166/wh.2015.094] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
A beach water quality prediction system has been developed in Hong Kong using multiple linear regression (MLR) models. However, linear models are found to be weak at capturing the infrequent 'very poor' water quality occasions when Escherichia coli (E. coli) concentration exceeds 610 counts/100 mL. This study uses a classification tree to increase the accuracy in predicting the 'very poor' water quality events at three Hong Kong beaches affected either by non-point source or point source pollution. Binary-output classification trees (to predict whether E. coli concentration exceeds 610 counts/100 mL) are developed over the periods before and after the implementation of the Harbour Area Treatment Scheme, when systematic changes in water quality were observed. Results show that classification trees can capture more 'very poor' events in both periods when compared to the corresponding linear models, with an increase in correct positives by an average of 20%. Classification trees are also developed at two beaches to predict the four-category Beach Water Quality Indices. They perform worse than the binary tree and give excessive false alarms of 'very poor' events. Finally, a combined modelling approach using both MLR model and classification tree is proposed to enhance the beach water quality prediction system for Hong Kong.
Affiliation(s)
- Wai Thoe
- Department of Civil and Environmental Engineering, Environmental and Water Studies, Stanford University, Stanford, CA 94305, USA
- King Wah Choi
- Department of Civil and Environmental Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Joseph Hun-wei Lee
- Department of Civil and Environmental Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
|
43
|
Wang J, Song P, Wang Z, Zhang B, Liu W, Yu J. A Combined Model for Regional Eco-environmental Quality Evaluation Based on Particle Swarm Optimization–Radial Basis Function Network. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2015. [DOI: 10.1007/s13369-015-1958-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/22/2022]
|
44
|
Thoe W, Gold M, Griesbach A, Grimmer M, Taggart ML, Boehm AB. Sunny with a chance of gastroenteritis: predicting swimmer risk at California beaches. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2015; 49:423-431. [PMID: 25489920 DOI: 10.1021/es504701j] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 06/04/2023]
Abstract
Traditional beach management that uses concentrations of cultivatable fecal indicator bacteria (FIB) may lead to delayed notification of unsafe swimming conditions. Predictive, nowcast models of beach water quality may help reduce beach management errors and enhance protection of public health. This study compares performances of five different types of statistical, data-driven predictive models: multiple linear regression model, binary logistic regression model, partial least-squares regression model, artificial neural network, and classification tree, in predicting advisories due to FIB contamination at 25 beaches along the California coastline. Classification tree and the binary logistic regression model with threshold tuning are consistently the best performing model types for California beaches. Beaches with good performing models usually have a rainfall/flow related dominating factor affecting beach water quality, while beaches having a deteriorating water quality trend or low FIB exceedance rates are less likely to have a good performing model. This study identifies circumstances when predictive models are the most effective, and suggests that using predictive models for public notification of unsafe swimming conditions may improve public health protection at California beaches relative to current practices.
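Threshold tuning for a binary classifier, as mentioned for the logistic regression model, can be sketched as follows. This is an assumption-laden illustration: the abstract does not state which tuning criterion was used, so the sketch picks the decision threshold that maximises sensitivity while keeping specificity at or above a chosen floor. The function name, the floor and the example probabilities are all hypothetical.

```python
def tune_threshold(probs, labels, min_specificity=0.8):
    """Pick a probability cut-off for posting an advisory.

    probs:  predicted exceedance probabilities from any binary model
    labels: 1 if the sample exceeded the standard, else 0
    Returns (threshold, sensitivity, specificity), or None if no candidate
    threshold meets the specificity floor.
    """
    best = None
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        if tp + fn == 0 or tn + fp == 0:
            continue  # one class is absent, rates are undefined
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        if specificity >= min_specificity and (
            best is None or sensitivity > best[1]
        ):
            best = (t, sensitivity, specificity)
    return best


# Hypothetical model outputs, for illustration only.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
labels = [0, 0, 0, 1, 0, 1]
strict = tune_threshold(probs, labels)
relaxed = tune_threshold(probs, labels, min_specificity=0.75)
```

Lowering the specificity floor lets the threshold drop, so more true exceedances are caught at the cost of more false advisories.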
Affiliation(s)
- W Thoe
- Department of Civil and Environmental Engineering, Environmental and Water Studies, Stanford University, Stanford, California 94305, United States
|