1
Moeinzadeh H, Yong KT, Withana A. A critical analysis of parameter choices in water quality assessment. WATER RESEARCH 2024; 258:121777. [PMID: 38781620; DOI: 10.1016/j.watres.2024.121777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/25/2024] [Accepted: 05/12/2024] [Indexed: 05/25/2024]
Abstract
The determination of water quality depends heavily on the selection of parameters recorded from water samples for the water quality index (WQI). Data-driven methods, including machine learning models and statistical approaches, are frequently used to refine the parameter set for four main reasons: reducing cost, reducing uncertainty, addressing the eclipsing problem, and enhancing the performance of models predicting the WQI. Despite their widespread use, comprehensive reviews that systematically examine previous studies in this area are notably lacking. Such reviews are essential to assess the validity of these objectives and to demonstrate the effectiveness of data-driven methods in achieving these goals. This paper has two primary aims: first, to review the existing literature on methods for selecting parameters; second, to delineate and evaluate the four principal motivations for parameter selection identified in the literature. The manuscript categorizes existing studies into two methodological groups for refining parameters: one focuses on preserving information within the dataset, and the other ensures predictions consistent with those obtained using the full set of parameters. It characterizes each group and evaluates how effectively each approach meets the four predefined objectives. The study shows that the minimal WQI approach, common to both categories, is the only approach that has successfully reduced recording costs. Nonetheless, it notes that simply reducing the number of parameters does not guarantee cost savings. Furthermore, the group of studies that preserve information within the dataset has shown potential to mitigate the eclipsing problem, whereas studies in the consistent-prediction group have not. Additionally, since data-driven approaches still rely on the initial parameters chosen by experts, they do not eliminate the need for expert judgment.
The study further points out that the WQI formula is a straightforward and expedient tool for assessing water quality. Consequently, the paper argues that employing machine learning solely to reduce the number of parameters to enhance WQI prediction is not a standalone solution. Rather, this objective should be integrated with a more comprehensive set of research goals. The critical analysis of research objectives and the characterization of previous studies lay the groundwork for future research. This groundwork will enable subsequent studies to evaluate how their proposed methods can effectively achieve these objectives.
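The eclipsing problem and the minimal WQI are easier to see with numbers. The sketch below assumes a simple weighted arithmetic WQI (the weighted mean of per-parameter quality ratings on a 0-100 scale). The parameter names, ratings, and weights are hypothetical illustrations, not values from the paper, and published WQI formulations differ in how sub-indices and weights are defined.

```python
from typing import Dict, Iterable, Optional


def weighted_wqi(sub_indices: Dict[str, float],
                 weights: Dict[str, float],
                 keep: Optional[Iterable[str]] = None) -> float:
    """Weighted arithmetic WQI: sum(w_i * q_i) / sum(w_i).

    sub_indices: per-parameter quality ratings q_i on a 0-100 scale (hypothetical).
    weights: relative importance w_i of each parameter (hypothetical).
    keep: optional subset of parameter names, i.e. a reduced ("minimal") WQI.
    """
    names = list(keep) if keep is not None else list(sub_indices)
    total_w = sum(weights[n] for n in names)
    if total_w <= 0:
        raise ValueError("selected weights must sum to a positive value")
    return sum(weights[n] * sub_indices[n] for n in names) / total_w


# Hypothetical sample in which one parameter (fecal coliforms) is very poor
q = {"DO": 90.0, "pH": 85.0, "BOD": 80.0, "turbidity": 88.0, "fecal_coliform": 5.0}
w = {"DO": 0.25, "pH": 0.15, "BOD": 0.20, "turbidity": 0.15, "fecal_coliform": 0.25}

full_index = weighted_wqi(q, w)                                # all five parameters
minimal_index = weighted_wqi(q, w, keep=["DO", "pH", "BOD"])   # reduced parameter set
print(f"full WQI = {full_index:.1f}, minimal WQI = {minimal_index:.1f}")
```

With these made-up values, the full index is about 65.7 even though fecal coliforms score 5, because averaging lets the good sub-indices offset the poor one (eclipsing). The three-parameter index rises to about 85.4 because the critical parameter is no longer recorded at all.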
Affiliation(s)
- Hossein Moeinzadeh
- School of Computer Science, The University of Sydney, Sydney, 2006, New South Wales, Australia.
- Ken-Tye Yong
- School of Computer Science, The University of Sydney, Sydney, 2006, New South Wales, Australia; School of Biomedical Engineering, The University of Sydney, Sydney, 2006, New South Wales, Australia; Sydney Nano, The University of Sydney, Sydney, 2006, New South Wales, Australia
- Anusha Withana
- School of Computer Science, The University of Sydney, Sydney, 2006, New South Wales, Australia; Sydney Nano, The University of Sydney, Sydney, 2006, New South Wales, Australia
2
Lloyd SD, Carvajal G, Campey M, Taylor N, Osmond P, Roser DJ, Khan SJ. Predicting recreational water quality and public health safety in urban estuaries using Bayesian Networks. WATER RESEARCH 2024; 254:121319. [PMID: 38422692; DOI: 10.1016/j.watres.2024.121319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 02/05/2024] [Accepted: 02/14/2024] [Indexed: 03/02/2024]
Abstract
To support the reactivation of urban rivers and estuaries for bathing while ensuring public safety, it is critical to have access to real-time information on microbial water quality and associated health risks. Predictive modelling can provide this information, though challenges concerning the optimal size of training data, model transferability, and communication of uncertainty still need attention. Further, urban estuaries undergo distinctive hydrological variations requiring tailored modelling approaches. This study assessed the use of Bayesian Networks (BNs) for the prediction of enterococci exceedances and extrapolation of health risks at planned bathing sites in an urban estuary in Sydney, Australia. The transferability of network structures between sites was assessed. Models were validated using a novel application of the k-fold walk-forward validation procedure and further tested using independent compliance and event-based sampling datasets. Learning curves indicated the model's sensitivity reached a minimum performance threshold of 0.8 once training data included ≥ 400 observations. It was demonstrated that Semi-Naïve BN structures can be transferred while maintaining stable predictive performance. In all sites, salinity and solar exposure had the greatest influence on Posterior Probability Distributions (PPDs), when combined with antecedent rainfall. The BNs provided a novel and transparent framework to quantify and visualise enterococci, stormwater impact, health risks, and associated uncertainty under varying environmental conditions. This study has advanced the application of BNs in predicting recreational water quality and providing decision support in urban estuarine settings, proposed for bathing, where uncertainty is high.
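Walk-forward validation keeps every test block strictly later in time than its training data. The abstract does not describe the exact "k-fold walk-forward" procedure, so the sketch below assumes an expanding training window, a fixed minimum training size, and k consecutive test blocks of near-equal size. The function name and parameters are hypothetical. The `min_train` argument is where a learning-curve finding such as "at least 400 observations" would plug in.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n_obs: int, k: int,
                        min_train: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield k expanding-window (train, test) index splits in time order.

    The first `min_train` observations are always used for training; the
    remaining observations are cut into k consecutive test blocks. Each fold
    trains on everything before its test block, so no future data leaks in.
    """
    if k < 1 or min_train < 1 or n_obs - min_train < k:
        raise ValueError("need at least one observation per test block")
    base, extra = divmod(n_obs - min_train, k)
    start = min_train
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # spread the remainder over early folds
        yield list(range(0, start)), list(range(start, start + size))
        start += size


for train_idx, test_idx in walk_forward_splits(n_obs=10, k=3, min_train=4):
    print(len(train_idx), test_idx)
```

A sliding (fixed-length) window is a common alternative when older observations are thought to be less representative. Which variant the authors used is not stated.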
Affiliation(s)
- Simon D Lloyd
- School of Built Environment, University of New South Wales, NSW, Australia.
- Guido Carvajal
- Facultad de Ingeniería, Universidad Andrés Bello, Antonio Varas 880, Providencia, Santiago, Chile
- Meredith Campey
- Beachwatch, NSW Department of Planning and Environment, NSW, Australia
- Paul Osmond
- School of Built Environment, University of New South Wales, NSW, Australia
- David J Roser
- School of Civil and Environmental Engineering, University of New South Wales, NSW, Australia
- Stuart J Khan
- School of Civil Engineering, University of Sydney, NSW, Australia
3
Peng T, Xiong J, Sun K, Qian S, Tao Z, Nazir MS, Zhang C. Research and application of a novel selective stacking ensemble model based on error compensation and parameter optimization for AQI prediction. ENVIRONMENTAL RESEARCH 2024; 247:118176. [PMID: 38215922; DOI: 10.1016/j.envres.2024.118176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 12/11/2023] [Accepted: 01/09/2024] [Indexed: 01/14/2024]
Abstract
With ongoing industrialization, declining air quality is becoming an increasingly critical concern. Accurate prediction of the Air Quality Index (AQI), an all-inclusive measure of the extent of pollutants present in the atmosphere, is therefore of paramount importance. This study introduces a novel methodology that combines a stacking ensemble with error correction to improve AQI prediction, and employs the reptile search algorithm (RSA) to optimize model parameters. AQI data from four distinct regions, comprising 34,864 samples in total, are collected. First, ten commonly used single models are cross-validated to obtain prediction results; then, based on evaluation indices, five of them are selected for the ensemble. The results show that the proposed model improves accuracy by around 10% compared with the conventional model. The model thus offers a more scientifically grounded approach to tackling air pollution.
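The core of stacking is that the meta-learner is trained on out-of-fold predictions, so it never sees base-model outputs for rows those models were fitted on. The sketch below shows that mechanism only. The two base learners (least squares and k-nearest neighbours) are hypothetical stand-ins for the five models selected in the paper, the data are synthetic, and neither the error-compensation step nor the RSA parameter optimization is reproduced.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor: mean target of the k closest training rows."""
    def predict(Z):
        dist = np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


BASE_LEARNERS = [fit_ols, fit_knn]   # hypothetical stand-ins for the selected base models


def stack_fit(X, y, n_folds=5):
    """Stacking: out-of-fold base predictions train a linear meta-learner."""
    n = len(X)
    oof = np.zeros((n, len(BASE_LEARNERS)))
    for test_idx in np.array_split(np.arange(n), n_folds):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for j, fit in enumerate(BASE_LEARNERS):
            oof[test_idx, j] = fit(X[train_idx], y[train_idx])(X[test_idx])
    meta = fit_ols(oof, y)                          # meta-learner sees only out-of-fold predictions
    bases = [fit(X, y) for fit in BASE_LEARNERS]    # refit base learners on all training data
    return lambda Z: meta(np.column_stack([b(Z) for b in bases]))


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))            # synthetic features
y = 2 * X[:, 0] - X[:, 1] + np.sin(6 * X[:, 2]) + rng.normal(0.0, 0.05, 200)
model = stack_fit(X, y)
pred = model(X)
```

For time-ordered AQI data, the contiguous folds here would ideally be replaced by a walk-forward scheme so that no fold trains on the future.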
Affiliation(s)
- Tian Peng
- Faculty of Automation, Huaiyin Institute of Technology, Huai'an, 223003, China; Jiangsu Permanent Magnet Motor Engineering Research Center, Huaiyin Institute of Technology, Huai'an, 223003, China.
- Jinlin Xiong
- Faculty of Automation, Huaiyin Institute of Technology, Huai'an, 223003, China
- Kai Sun
- Faculty of Automation, Huaiyin Institute of Technology, Huai'an, 223003, China
- Shijie Qian
- Faculty of Automation, Huaiyin Institute of Technology, Huai'an, 223003, China
- Zihan Tao
- Faculty of Automation, Huaiyin Institute of Technology, Huai'an, 223003, China
- Chu Zhang
- Faculty of Automation, Huaiyin Institute of Technology, Huai'an, 223003, China; Jiangsu Permanent Magnet Motor Engineering Research Center, Huaiyin Institute of Technology, Huai'an, 223003, China.
4
Essamlali I, Nhaila H, El Khaili M. Advances in machine learning and IoT for water quality monitoring: A comprehensive review. Heliyon 2024; 10:e27920. [PMID: 38533055; PMCID: PMC10963334; DOI: 10.1016/j.heliyon.2024.e27920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 02/22/2024] [Accepted: 03/08/2024] [Indexed: 03/28/2024]
Abstract
Water is a vital resource in everyday life, which makes continuous monitoring of its quality important to ensure its usability. The advent of the Internet of Things (IoT) has brought about a revolutionary shift by enabling real-time data collection from diverse sources, thereby facilitating efficient monitoring of water quality (WQ). Using machine learning (ML) techniques, the gathered data can be analyzed to make accurate predictions regarding water quality. These predictive insights play a crucial role in decision-making aimed at safeguarding water quality, such as identifying areas in need of immediate attention and implementing preventive measures to avert contamination. This paper aims to provide a comprehensive review of the current state of the art in water quality monitoring, with a specific focus on the use of IoT wireless technologies and ML techniques. The study examines a range of IoT wireless technologies used for water quality monitoring, including Low-Power Wide Area Networks (LpWAN), Wi-Fi, Zigbee, Radio Frequency Identification (RFID), cellular networks, and Bluetooth. Furthermore, it explores the application of both supervised and unsupervised ML algorithms for analyzing and interpreting the collected data. In addition to discussing the current state of the art, this survey also addresses the challenges and open research questions involved in integrating IoT wireless technologies and ML for water quality monitoring (WQM).
Affiliation(s)
- Ismail Essamlali
- Electrical Engineering and Intelligent Systems Laboratory, ENSET Mohammedia, Hassan 2nd University of Casablanca, Mail Box 159, Morocco
- Hasna Nhaila
- Electrical Engineering and Intelligent Systems Laboratory, ENSET Mohammedia, Hassan 2nd University of Casablanca, Mail Box 159, Morocco
- Mohamed El Khaili
- Electrical Engineering and Intelligent Systems Laboratory, ENSET Mohammedia, Hassan 2nd University of Casablanca, Mail Box 159, Morocco
5
Sakizadeh M, Zhang C, Milewski A. Spatial distribution pattern and health risk of groundwater contamination by cadmium, manganese, lead and nitrate in groundwater of an arid area. ENVIRONMENTAL GEOCHEMISTRY AND HEALTH 2024; 46:80. [PMID: 38367130; DOI: 10.1007/s10653-023-01845-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 12/21/2023] [Indexed: 02/19/2024]
Abstract
Stacking is an ensemble approach in which the results of base models are combined to create a meta-model. In this study, stacking of five base learners (eXtreme gradient boosting, random forest, feed-forward neural networks, generalized linear models with Lasso or Elastic Net regularization, and support vector machines) was used to study the spatial variation of Mn, Cd, Pb, and nitrate in the Qom-Kahak aquifers, Iran. Owing to its high accuracy and stability compared with individual learners, the stacking strategy proved to be an effective alternative to existing machine learning approaches. By contrast, no single base model performed best for all of the parameters. For instance, in the case of cadmium, random forest was the best-performing base learner, with an adjusted R² and RMSE of 0.108 and 0.014, as opposed to 0.337 and 0.013 obtained by the stacking method. Redundancy analysis (RDA) showed a tight link between Mn and Cd and phosphate, which demonstrates the effect of phosphate fertilizers used in agricultural operations. To analyze the causes of groundwater pollution, spatial methods can be combined with multivariate analytic techniques such as RDA to help uncover hidden sources of contamination that would otherwise go undetected. According to the probabilistic health risk assessment, lead poses a larger health risk than nitrate: 34.4% and 6.3% of the simulated values for children and adults, respectively, exceeded HQ = 1. Furthermore, cadmium exposure risk affected 84% of children and 47% of adults in the study area.
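A probabilistic health risk assessment typically propagates uncertain inputs through a hazard quotient (HQ) by Monte Carlo simulation and reports the share of simulated HQ values above 1. The sketch below assumes the common ingestion form HQ = CDI / RfD with CDI = C × IR × EF × ED / (BW × AT). Every distribution, the exposure duration, and the lead reference dose are hypothetical placeholders, not the inputs used in the study, which the abstract does not report.

```python
import numpy as np


def hazard_quotient(conc_mg_l, intake_l_day, body_weight_kg, rfd_mg_kg_day,
                    exposure_freq_days=365.0, exposure_years=6.0):
    """Ingestion hazard quotient HQ = CDI / RfD.

    CDI (mg/kg/day) = C * IR * EF * ED / (BW * AT), with the averaging time
    AT = ED * 365 days for non-carcinogenic effects.
    """
    averaging_days = exposure_years * 365.0
    cdi = (conc_mg_l * intake_l_day * exposure_freq_days * exposure_years
           / (body_weight_kg * averaging_days))
    return cdi / rfd_mg_kg_day


def fraction_hq_above_one(n=100_000, seed=0):
    """Monte Carlo share of simulated HQ values above 1 (all inputs hypothetical)."""
    rng = np.random.default_rng(seed)
    conc = rng.lognormal(mean=np.log(0.01), sigma=0.8, size=n)   # Pb in water, mg/L
    intake = np.clip(rng.normal(1.0, 0.2, size=n), 0.1, None)    # child water intake, L/day
    weight = np.clip(rng.normal(16.0, 3.0, size=n), 5.0, None)   # child body weight, kg
    hq = hazard_quotient(conc, intake, weight, rfd_mg_kg_day=0.0014)
    return float(np.mean(hq > 1.0))


print(f"share of simulated HQ > 1: {fraction_hq_above_one():.1%}")
```

Running the same simulation with adult intake and body-weight distributions is what would give a separate adult percentage of the kind reported in the abstract.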
Affiliation(s)
- Mohamad Sakizadeh
- Department of Environmental Sciences, Shahid Rajaee Teacher Training University, Lavizan, 1678815811, Tehran, Iran.
- Chaosheng Zhang
- International Network for Environment and Health (INEH), School of Geography, Archaeology and Irish Studies, University of Galway, Galway, Ireland
- Adam Milewski
- Department of Geology, University of Georgia, Athens, USA
6
Tselemponis A, Stefanis C, Giorgi E, Kalmpourtzi A, Olmpasalis I, Tselemponis A, Adam M, Kontogiorgis C, Dokas IM, Bezirtzoglou E, Constantinidis TC. Coastal Water Quality Modelling Using E. coli, Meteorological Parameters and Machine Learning Algorithms. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:6216. [PMID: 37444064; PMCID: PMC10341787; DOI: 10.3390/ijerph20136216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 06/19/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023]
Abstract
In this study, machine learning models were implemented to predict the classification of coastal waters in the region of Eastern Macedonia and Thrace (EMT) from Escherichia coli (E. coli) concentration and weather variables, within the framework of Directive 2006/7/EC. Six sampling stations in EMT, located on beaches of the regional units of Kavala, Xanthi, Rhodopi, Evros, Thasos and Samothraki, were selected. All 1039 samples were collected from May to September within a 14-year follow-up period (2009-2021). The weather parameters were acquired from nearby meteorological stations. The samples were analysed according to ISO 9308-1 for the detection and enumeration of E. coli. The vast majority of the samples fall into category 1 (Excellent), reflecting the high quality of the coastal waters of EMT. The experimental results also show that two-class classifiers, namely Decision Forest, Decision Jungle and Boosted Decision Tree, achieved accuracy scores above 99%. Comparison with the performance metrics reported by other researchers shows that a variety of algorithms have been used for water quality prediction, with Decision Tree, Artificial Neural Networks and Bayesian Belief Networks demonstrating satisfactory results. Machine learning approaches can provide critical information about the dynamics of E. coli contamination while taking meteorological parameters into account for coastal water classification.
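For reference, Directive 2006/7/EC classifies a bathing site from a set of samples using a log-normal percentile evaluation: take log10 of the counts, compute the mean μ and standard deviation σ, and compare antilog(μ + 1.65σ) (95th percentile) or antilog(μ + 1.282σ) (90th percentile) with the Annex I limits. The sketch below covers E. coli in coastal waters only. It is not necessarily how the authors labelled individual samples. Replacing zero counts with 1 cfu/100 mL and using the sample standard deviation are assumptions made here.

```python
import math
import statistics
from typing import Sequence

# E. coli limits (cfu/100 mL) for coastal and transitional waters, Annex I of
# Directive 2006/7/EC: "excellent" and "good" use the 95th percentile,
# "sufficient" uses the 90th percentile.
ECOLI_COASTAL = {"excellent": 250.0, "good": 500.0, "sufficient": 500.0}


def percentile_point(counts: Sequence[float], z: float) -> float:
    """Annex II-style percentile: antilog10(mean + z * sd) of log10 counts."""
    # Assumption: zero counts are replaced by 1 cfu/100 mL before taking log10
    logs = [math.log10(max(c, 1.0)) for c in counts]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)   # sample SD; needs at least two samples
    return 10 ** (mu + z * sigma)


def classify_coastal_ecoli(counts: Sequence[float]) -> str:
    """Classify a site's E. coli sample set into a Directive quality class."""
    p95 = percentile_point(counts, 1.65)
    p90 = percentile_point(counts, 1.282)
    if p95 <= ECOLI_COASTAL["excellent"]:
        return "excellent"
    if p95 <= ECOLI_COASTAL["good"]:
        return "good"
    if p90 <= ECOLI_COASTAL["sufficient"]:
        return "sufficient"
    return "poor"


print(classify_coastal_ecoli([15, 40, 8, 120, 30]))   # hypothetical season of counts
```

Because most sample sets in such data land in "excellent", a classifier's overall accuracy can be high even when the rarer classes are predicted poorly. Per-class metrics are therefore a useful complement.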
Affiliation(s)
- Athanasios Tselemponis
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Christos Stefanis
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Elpida Giorgi
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Aikaterini Kalmpourtzi
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Ioannis Olmpasalis
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Antonios Tselemponis
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Maria Adam
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Christos Kontogiorgis
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Ioannis M. Dokas
- Department of Civil Engineering, Democritus University of Thrace, 69100 Komotini, Greece
- Eugenia Bezirtzoglou
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
- Theodoros C. Constantinidis
- Laboratory of Hygiene and Environmental Protection, Medical School, Democritus University of Thrace, 68100 Alexandroupoli, Greece
7
Yang R, Liu H, Li Y. Quantifying uncertainty of marine water quality forecasts for environmental management using a dynamic multi-factor analysis and multi-resolution ensemble approach. CHEMOSPHERE 2023; 331:138831. [PMID: 37137396; DOI: 10.1016/j.chemosphere.2023.138831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/25/2023] [Accepted: 04/30/2023] [Indexed: 05/05/2023]
Abstract
Unpredictable climate change and human activities pose enormous challenges to assessing water quality components in the marine environment. Accurately quantifying the uncertainty of water quality forecasts can help decision-makers implement more scientific water pollution management strategies. This work introduces a new method of uncertainty quantification, driven by point prediction, for forecasting water quality under the influence of complex environmental factors. The constructed multi-factor correlation analysis system dynamically adjusts the combined weight of environmental indicators according to their performance, thereby increasing the interpretability of data fusion. Singular spectrum analysis is used to reduce the volatility of the original water quality data, and real-time decomposition avoids the problem of data leakage. A multi-resolution, multi-objective optimization ensemble method is adopted to absorb the characteristics of data at different resolutions and so mine deeper latent information. Experimental studies use six actual high-resolution water quality signals with 21,600 sampling points from the Pacific islands, and corresponding low-resolution signals with 900 sampling points, covering temperature, salinity, turbidity, chlorophyll, dissolved oxygen, and oxygen saturation. The results show that the model outperforms existing models in quantifying the uncertainty of water quality prediction.
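Singular spectrum analysis (SSA) smooths a series in four steps: embed it in a trajectory (Hankel) matrix, take the SVD, keep the leading components, and diagonal-average back to a series. The sketch below is basic batch SSA on synthetic data. The window length and component count are hypothetical choices. The paper's real-time variant, which decomposes only data available up to each time step to avoid leakage, is not reproduced. Applying this function to a trailing window of past observations is one way to approximate it.

```python
import numpy as np


def ssa_smooth(x, window, n_components):
    """Basic SSA: embed, SVD, keep leading components, diagonal-average."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 1 < window < n:
        raise ValueError("window must be between 2 and len(x) - 1")
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window]
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Low-rank approximation from the leading singular triples
    approx = (u[:, :n_components] * s[:n_components]) @ vt[:n_components, :]
    # Diagonal averaging (Hankelisation) back to a series of length n
    out = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        out[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return out / counts


rng = np.random.default_rng(0)
t = np.arange(300)
clean = np.sin(2 * np.pi * t / 50)                  # synthetic periodic signal
noisy = clean + rng.normal(0.0, 0.3, size=t.size)
smoothed = ssa_smooth(noisy, window=40, n_components=2)
```

A pure sinusoid occupies two SSA components, which is why two are kept here. Real water quality signals usually need the component count chosen from the singular-value spectrum.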
Affiliation(s)
- Rui Yang
- Institute of Artificial Intelligence and Robotics (IAIR), Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha, 410075, Hunan, China
- Hui Liu
- Institute of Artificial Intelligence and Robotics (IAIR), Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha, 410075, Hunan, China.
- Yanfei Li
- School of Mechatronic Engineering, Hunan Agricultural University, Changsha, 410128, Hunan, China
8
Zheng HL, An SY, Qiao BJ, Guan P, Huang DS, Wu W. A data-driven interpretable ensemble framework based on tree models for forecasting the occurrence of COVID-19 in the USA. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:13648-13659. [PMID: 36131178; PMCID: PMC9492466; DOI: 10.1007/s11356-022-23132-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 09/16/2022] [Indexed: 06/15/2023]
Abstract
The prevalence of coronavirus disease 2019 (COVID-19) has become one of the most serious public health crises. Tree-based machine learning methods, with the advantages of high efficiency and strong interpretability, have been widely used in disease prediction. A data-driven interpretable ensemble framework based on tree models was designed to forecast daily new cases of COVID-19 in the USA and to determine the important factors related to COVID-19. Using a hyperparameter optimization technique, we developed three decision-tree-based machine learning algorithms, random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), and used three linear ensemble models to integrate their outputs for better prediction accuracy. Finally, SHapley Additive exPlanations (SHAP) values were used to obtain the feature importance ranking. Among the three base learners, prediction accuracy in descending order was LightGBM, XGBoost, and RF. The optimized LAD ensemble was the most precise prediction model, reducing the prediction error of the best base learner (LightGBM) by approximately 3.111%, while vaccination, wearing masks, reduced mobility, and government interventions had positive effects on the control and prevention of COVID-19.
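The abstract names a "LAD" linear ensemble without details. Assuming LAD stands for least absolute deviation, the ensemble chooses combination weights that minimise the sum of absolute errors instead of squared errors, which makes it less sensitive to outlying days. The sketch below approximates that fit with iteratively reweighted least squares (IRLS); linear programming would give an exact solution. Whether the authors constrained the weights (for example to be non-negative or to sum to one) is not stated, so no constraints are applied here, and the test data are synthetic.

```python
import numpy as np


def lad_ensemble_weights(base_preds, y, n_iter=300, eps=1e-6):
    """Least-absolute-deviation combination weights via IRLS.

    base_preds: (n_samples, n_models) predictions from the base learners.
    Returns an intercept followed by one weight per base learner.
    """
    A = np.column_stack([np.ones(len(y)), base_preds])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)       # start from ordinary least squares
    for _ in range(n_iter):
        resid = np.abs(y - A @ coef)
        w = 1.0 / np.maximum(resid, eps)                 # large residuals get small weights
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef


rng = np.random.default_rng(0)
preds = rng.normal(10.0, 2.0, size=(100, 2))            # two synthetic base-learner outputs
target = 0.7 * preds[:, 0] + 0.3 * preds[:, 1]
target[:5] += 50.0                                       # a few extreme outliers
print(np.round(lad_ensemble_weights(preds, target), 3))
```

On this synthetic data the LAD weights should recover roughly (0, 0.7, 0.3), whereas an ordinary least-squares combination would be pulled toward the five outliers.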
Affiliation(s)
- Hu-Li Zheng
- Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province China
- Shu-Yi An
- Liaoning Provincial Center for Disease Control and Prevention, Shenyang, Liaoning China
- Bao-Jun Qiao
- Liaoning Provincial Center for Disease Control and Prevention, Shenyang, Liaoning China
- Peng Guan
- Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province China
- De-Sheng Huang
- Department of Mathematics, School of Intelligent Medicine, China Medical University, Shenyang, Liaoning China
- Wei Wu
- Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province China
9
Lučin I, Družeta S, Mauša G, Alvir M, Grbčić L, Lušić DV, Sikirica A, Kranjčević L. Predictive modeling of microbiological seawater quality in karst region using cascade model. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 851:158009. [PMID: 35987218; DOI: 10.1016/j.scitotenv.2022.158009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Revised: 08/06/2022] [Accepted: 08/09/2022] [Indexed: 06/15/2023]
Abstract
This paper presents an in-depth analysis of seawater quality measurements during the bathing seasons from 2009 to 2020 in the city of Rijeka, Croatia. Because measurements with less than excellent water quality are rare, the dataset is highly imbalanced. It also includes measurements influenced by submerged groundwater discharges (SGD), which were observed at some bathing locations. These discharges were previously thought to dry up during the summer season and are now suspected to be one of the causes of elevated Escherichia coli values. Because the accuracy of prediction models can be significantly influenced by temporal and spatial variation in the input data, a novel cascade prediction modeling strategy was proposed. It consists of a sequence of prediction models that aim to identify general environmental conditions which confidently lead to excellent bathing water quality. The proposed model uses environmental features that can readily be estimated or obtained from the weather forecast. The model was trained on a highly biased dataset, consisting of data from locations with and without SGD influence and covering extremely dry and warm seasons, extremely wet seasons, and normal seasons. To simulate realistic application, the model was tested using temporal and spatial stratification of the data. The cascade strategy proved to be a good approach for reliably detecting environmental parameters that produce excellent water quality. The proposed model is designed as a filter, in which instances classified as less-than-excellent water quality require further analysis. The cascade model offers great flexibility, as it can be customized to the particular needs of the investigated area and the specifics of the dataset.
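The filter behaviour described above can be sketched as a chain of stages. Each stage either accepts a sample as "excellent" with enough confidence or passes it on, and samples that no stage accepts are flagged for further analysis. In the paper each stage is a trained prediction model. Here the stage rules, feature names (`rain_72h_mm`, `sgd_site`), and thresholds are hypothetical placeholders chosen only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


@dataclass
class Stage:
    name: str
    score: Callable[[Features], float]   # confidence that water quality is excellent
    threshold: float                      # minimum confidence to accept "excellent"


def cascade_filter(stages: Sequence[Stage], sample: Features) -> str:
    """Return 'excellent (<stage>)' at the first confident stage, else flag the sample."""
    for stage in stages:
        if stage.score(sample) >= stage.threshold:
            return f"excellent ({stage.name})"
    return "needs further analysis"


# Hypothetical stages; in practice each would be a trained classifier
stages = [
    Stage("dry spell", lambda s: 1.0 if s["rain_72h_mm"] < 1.0 else 0.0, 0.5),
    Stage("light rain, no SGD",
          lambda s: 0.95 if s["rain_72h_mm"] < 10.0 and not s["sgd_site"] else 0.2, 0.9),
]
print(cascade_filter(stages, {"rain_72h_mm": 25.0, "sgd_site": 1.0}))
```

Keeping high per-stage thresholds makes the "excellent" label conservative, which suits a filter whose rejected cases go on to laboratory analysis anyway.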
Affiliation(s)
- Ivana Lučin
- Department of Fluid Mechanics and Computational Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka 51000, Croatia; Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić 2, Rijeka 51000, Croatia
- Siniša Družeta
- Department of Fluid Mechanics and Computational Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka 51000, Croatia; Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić 2, Rijeka 51000, Croatia
- Goran Mauša
- Department of Computer Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka 51000, Croatia; Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić 2, Rijeka 51000, Croatia
- Marta Alvir
- Department of Fluid Mechanics and Computational Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka 51000, Croatia
- Luka Grbčić
- Department of Fluid Mechanics and Computational Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka 51000, Croatia; Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić 2, Rijeka 51000, Croatia
- Darija Vukić Lušić
- Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić 2, Rijeka 51000, Croatia; Department of Environmental Health, Faculty of Medicine, University of Rijeka, Braće Branchetta 20/1, Rijeka 51000, Croatia; Department of Environmental Health, Teaching Institute of Public Health of Primorje-Gorski Kotar County, Krešimirova 52a, Rijeka 51000, Croatia
- Ante Sikirica
- Department of Fluid Mechanics and Computational Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka 51000, Croatia; Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić 2, Rijeka 51000, Croatia
- Lado Kranjčević
- Department of Fluid Mechanics and Computational Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka 51000, Croatia; Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić 2, Rijeka 51000, Croatia.
10
Prediction and Interpretation of Water Quality Recovery after a Disturbance in a Water Treatment System Using Artificial Intelligence. WATER 2022. [DOI: 10.3390/w14152423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
In this study, an ensemble machine learning model was developed to predict the recovery rate of water quality in a water treatment plant after a disturbance. XGBoost, one of the most popular ensemble machine learning models, was used as the main framework of the model. Water quality and operational data observed in a pilot plant were used to train and test the model. A disturbance was identified when the observed turbidity exceeded a given turbidity criterion. The recovery rate of water quality at time t was then defined over the falling limb of the turbidity recovery period as the ratio of the difference between the peak turbidity and the observed turbidity at time t to the difference between the peak turbidity and the turbidity criterion. Pretreatment that removed the observations for the rising limb of the disturbance from the training data improved the root mean square error–observation standard deviation ratio (RSR) of the XGBoost model from 0.730 to 0.373. Moreover, Shapley value analysis, a novel explainable artificial intelligence method, was used to provide a reasonable interpretation of the model's performance.
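Written as formulas, the recovery rate is R(t) = (T_peak − T_obs(t)) / (T_peak − T_criterion), which is 0 at the turbidity peak and 1 once turbidity has fallen back to the criterion. The RSR is RMSE divided by the standard deviation of the observations, so lower is better, which is why the change from 0.730 to 0.373 is an improvement. The function names and the NTU unit below are assumptions for illustration.

```python
import math
from typing import Sequence


def recovery_rate(peak_ntu: float, observed_ntu: float, criterion_ntu: float) -> float:
    """R(t) = (T_peak - T_obs(t)) / (T_peak - T_criterion) on the falling limb."""
    if peak_ntu <= criterion_ntu:
        raise ValueError("a disturbance requires the peak to exceed the criterion")
    return (peak_ntu - observed_ntu) / (peak_ntu - criterion_ntu)


def rsr(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE-observation standard deviation ratio: RMSE / SD(observed)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    return rmse / sd_obs


# Hypothetical event: peak 10 NTU, criterion 2 NTU, current reading 6 NTU
print(recovery_rate(10.0, 6.0, 2.0))   # halfway recovered
```

Both the RMSE and the standard deviation use the same 1/n normalisation here, so the factor cancels and the ratio matches the usual sum-of-squares form.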
11
Fei S, Hassan MA, Xiao Y, Su X, Chen Z, Cheng Q, Duan F, Chen R, Ma Y. UAV-based multi-sensor data fusion and machine learning algorithm for yield prediction in wheat. PRECISION AGRICULTURE 2022; 24:187-212. [PMID: 35967193; PMCID: PMC9362526; DOI: 10.1007/s11119-022-09938-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Accepted: 06/30/2022] [Indexed: 05/31/2023]
Abstract
Early prediction of grain yield helps scientists make better breeding decisions for wheat. Using machine learning (ML) methods to fuse unmanned aerial vehicle (UAV)-based multi-sensor data can improve the prediction accuracy of crop yield. To this end, five ML algorithms, Cubist, support vector machine (SVM), deep neural network (DNN), ridge regression (RR) and random forest (RF), were used for multi-sensor data fusion and ensemble learning for grain yield prediction in wheat. Thirty wheat cultivars and breeding lines were grown under three irrigation treatments (light, moderate and high) to evaluate the yield prediction capabilities of a low-cost multi-sensor (RGB, multispectral and thermal infrared) UAV platform. In each ML model, yield prediction based on multi-sensor data fusion was more accurate than prediction from individual-sensor data. The coefficient of determination (R²) values of the Cubist, SVM, DNN and RR models for grain yield prediction ranged from 0.527 to 0.670. Ensemble learning that integrated these models increased accuracy further, with R² values of up to 0.692, higher than those of the individual ML models across the multi-sensor data. The root mean square error (RMSE), residual prediction deviation (RPD) and ratio of prediction performance to inter-quartile range (RPIQ) were 0.916 t ha⁻¹, 1.771 and 2.602, respectively. The results show that low-altitude UAV-based multi-sensor data can be used for early, highly accurate grain yield prediction with data fusion and an ensemble learning framework. This high-throughput phenotyping approach is valuable for improving the efficiency of selection in large breeding programs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11119-022-09938-8.
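The four metrics quoted above are related: RPD is the standard deviation of the observations divided by RMSE, and RPIQ is their inter-quartile range divided by RMSE, so both are unitless while RMSE keeps the yield unit (t ha⁻¹). The sketch below assumes the sample standard deviation (ddof=1) and NumPy's default linear-interpolation percentiles. Conventions vary between papers, so these choices are assumptions rather than the authors' exact definitions.

```python
import numpy as np


def regression_report(observed, predicted):
    """R², RMSE, RPD (SD / RMSE) and RPIQ (IQR / RMSE) for a set of predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    rmse = float(np.sqrt(np.mean((obs - pred) ** 2)))
    r2 = float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))
    rpd = float(np.std(obs, ddof=1) / rmse)          # sample SD of observations
    q1, q3 = np.percentile(obs, [25, 75])            # linear interpolation (NumPy default)
    rpiq = float((q3 - q1) / rmse)
    return {"R2": r2, "RMSE": rmse, "RPD": rpd, "RPIQ": rpiq}


# Hypothetical observed and predicted yields, t/ha
print(regression_report([1, 2, 3, 4, 5], [1.5, 2, 3, 4, 4.5]))
```

RPIQ is often preferred over RPD when the observed values are skewed, because the inter-quartile range is less affected by extreme yields than the standard deviation.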
Affiliation(s)
- Shuaipeng Fei: Institute of Farmland Irrigation, Chinese Academy of Agricultural Sciences, Xinxiang, 453002 China
- Muhammad Adeel Hassan: National Wheat Improvement Centre, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China; Dezhou Academy of Agricultural Sciences, Dezhou, 253050 China
- Yonggui Xiao: National Wheat Improvement Centre, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
- Xin Su: Water Diversion and Irrigation Engineering Technology Center, Yellow River Institute of Hydraulic Research, Zhengzhou, 450003 China
- Zhen Chen: Institute of Farmland Irrigation, Chinese Academy of Agricultural Sciences, Xinxiang, 453002 China
- Qian Cheng: Institute of Farmland Irrigation, Chinese Academy of Agricultural Sciences, Xinxiang, 453002 China
- Fuyi Duan: Institute of Farmland Irrigation, Chinese Academy of Agricultural Sciences, Xinxiang, 453002 China
- Riqiang Chen: School of Information Science and Technology, Beijing Forestry University, Beijing, 100083 China
- Yuntao Ma: College of Land Science and Technology, China Agricultural University, Beijing, 100193 China
12
Li Z, Zhang C, Liu H, Zhang C, Zhao M, Gong Q, Fu G. Developing stacking ensemble models for multivariate contamination detection in water distribution systems. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 828:154284. [PMID: 35247409 DOI: 10.1016/j.scitotenv.2022.154284] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/16/2021] [Revised: 02/25/2022] [Accepted: 02/28/2022] [Indexed: 06/14/2023]
Abstract
This study presents a new stacking ensemble model for detecting contamination events using multiple water quality parameters. The stacking model consists of several machine learning base predictors and a meta-predictor; it is trained with cross-validation to capture different features of multiple water quality parameters and is then used to predict water quality. For each water quality parameter, the residuals between predicted and measured data are classified as anomalies using thresholds derived from sequential model-based optimization, and detection probabilities are updated using Bayesian analysis. Alarms from individual water quality parameters are fused to strengthen anomaly signals and improve detection accuracy. The proposed stacking-based method is evaluated on a data set of six water quality parameters from a real water distribution system with randomly simulated events. It detected 2496 of a total of 2500 events without a false alarm. The results show that the stacking method outperforms an artificial neural network (ANN) benchmark in contamination event detection, with a higher true positive rate, a lower false positive rate and a higher F1 score. This indicates that the stacking method holds great promise for detecting contamination events in water distribution systems.
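The detection step (a per-parameter residual threshold followed by Bayesian fusion of the resulting alarms) can be sketched in Python as below. This is not the authors' implementation: the thresholds are fixed by hand instead of being tuned by sequential model-based optimization, the fusion rule is a naive-Bayes update that assumes alarms are conditionally independent, every parameter shares one hypothetical true-positive and false-positive rate, and the parameter names and readings are hypothetical.

```python
def flag_anomaly(measured, predicted, threshold):
    """Flag an anomaly when the absolute residual exceeds the parameter's threshold."""
    return abs(measured - predicted) > threshold


def fuse_alarms(alarms, prior, tpr, fpr):
    """Posterior probability of a contamination event given per-parameter alarms.

    Naive-Bayes assumption: alarms are conditionally independent given the event
    state; tpr = P(alarm | event) and fpr = P(alarm | no event) are assumed values.
    """
    p_event, p_none = prior, 1.0 - prior
    for alarm in alarms:
        p_event *= tpr if alarm else 1.0 - tpr
        p_none *= fpr if alarm else 1.0 - fpr
    return p_event / (p_event + p_none)


# Hypothetical readings per parameter: (measured, predicted by the stacking model, threshold).
readings = {
    "chlorine": (0.20, 0.45, 0.10),
    "turbidity": (1.9, 0.8, 0.5),
    "pH": (7.3, 7.2, 0.3),
}
alarms = [flag_anomaly(m, p, t) for m, p, t in readings.values()]
print(alarms, fuse_alarms(alarms, prior=0.01, tpr=0.9, fpr=0.05))
```

Each additional alarm multiplies the odds of an event by tpr / fpr, which is how fusing several weak per-parameter signals can sharpen detection.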
Affiliation(s)
- Zilin Li: School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
- Chi Zhang: School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
- Haixing Liu: School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
- Chao Zhang: School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
- Mengke Zhao: School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
- Qiang Gong: Dalian Water Supply Group Co. Ltd., Dalian, Liaoning 116011, China
- Guangtao Fu: Centre for Water Systems, University of Exeter, Exeter EX4 4QF, UK
13
Zhu M, Wang J, Yang X, Zhang Y, Zhang L, Ren H, Wu B, Ye L. A review of the application of machine learning in water quality evaluation. ECO-ENVIRONMENT & HEALTH (ONLINE) 2022; 1:107-116. [PMID: 38075524 PMCID: PMC10702893 DOI: 10.1016/j.eehl.2022.06.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 02/01/2022] [Revised: 05/19/2022] [Accepted: 06/01/2022] [Indexed: 12/31/2023]
Abstract
With the rapid increase in the volume of data on the aquatic environment, machine learning has become an important tool for data analysis, classification, and prediction. Unlike traditional models used in water-related research, data-driven models based on machine learning can efficiently solve more complex nonlinear problems. In water environment research, models and conclusions derived from machine learning have been applied to the construction, monitoring, simulation, evaluation, and optimization of various water treatment and management systems. Machine learning can also provide solutions for water pollution control, water quality improvement, and watershed ecosystem security management. In this review, we describe cases in which machine learning algorithms have been applied to evaluate water quality in different water environments, such as surface water, groundwater, drinking water, sewage, and seawater. Finally, we propose possible future applications of machine learning approaches in water environments.
Affiliation(s)
- Mengyuan Zhu: State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Jiawei Wang: State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Xiao Yang: State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Yu Zhang: State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Linyu Zhang: State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Hongqiang Ren: State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Bing Wu: State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Lin Ye: State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
14
Li L, Qiao J, Yu G, Wang L, Li HY, Liao C, Zhu Z. Interpretable tree-based ensemble model for predicting beach water quality. WATER RESEARCH 2022; 211:118078. [PMID: 35066260 DOI: 10.1016/j.watres.2022.118078] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 08/14/2021] [Revised: 11/29/2021] [Accepted: 01/12/2022] [Indexed: 06/14/2023]
Abstract
Tree-based machine learning models built on environmental features offer low-cost, timely predictions of microbial fecal contamination in beach water that can be used to inform the public of health risks. However, many of these models are black boxes that are difficult for humans to understand, which can have severe consequences such as unexplained decisions and failures of accountability. To develop interpretable predictive models of beach water quality, we evaluate five tree-based models, namely classification tree, random forest, CatBoost, XGBoost, and LightGBM, and use the state-of-the-art explanation method SHAP (SHapley Additive exPlanations) to explain them. When tested on Escherichia coli (E. coli) concentration data collected at three beach sites along the shores of Lake Erie, LightGBM, followed by XGBoost, achieves the highest averaged precision and recall scores. For all three sites, both models identify lake turbidity as the most important predictor and highlight the crucial role of accurate local wave height and rainfall data in model development. Local SHAP values further confirm the robustness of lake turbidity's importance: its SHAP value increases nearly monotonically with its value and is minimally affected by other environmental factors. We also found an intriguing interaction between lake turbidity and day of year. This work suggests that combining LightGBM and SHAP is a promising way to develop interpretable models for predicting microbial water quality in freshwater lakes.
Affiliation(s)
- Lingbo Li: Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Jundong Qiao: Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Guan Yu: Department of Biostatistics, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Leizhi Wang: Nanjing Hydraulic Research Institute, State Key Laboratory of Hydrology, Water Resources and Hydraulic Engineering & Science, Nanjing 210029, China
- Hong-Yi Li: Department of Civil and Environmental Engineering, University of Houston, Houston, TX, USA
- Chen Liao: Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, NY, USA
- Zhenduo Zhu: Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
15
A Stacking Ensemble Learning Model for Monthly Rainfall Prediction in the Taihu Basin, China. WATER 2022. [DOI: 10.3390/w14030492] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Abstract
Monthly rainfall prediction is highly beneficial for water resources management and flood control projects. Machine learning (ML) techniques, an increasingly popular approach, have been applied in diverse climatic regions, each showing its own strengths. In addition, ensemble learning models, which combine the advantages of different ML models, deserve more attention. In this study, an ensemble learning model based on the stacking approach was proposed. Four prevalent ML models, namely k-nearest neighbors (KNN), extreme gradient boosting (XGB), support vector regression (SVR), and artificial neural networks (ANN), were used as base models. A weighting algorithm served as the second-layer learner, combining the outputs of the base models to generate predictions. Large-scale climate indices, large-scale atmospheric variables, and local meteorological variables were used as predictors, and R², RMSE and MAE were used as evaluation metrics. The performance of the base models varied among the nine stations in the Taihu Basin, while the stacking approach generally outperformed all four base models. The stacking model performed better in spring and winter than in summer and autumn, and its prediction accuracy varied more markedly during wet months. Overall, the performance measures indicate that the proposed stacking ensemble of multiple ML models provides a flexible and reasonable prediction framework that can be applied to other regions.
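The abstract does not specify the weighting algorithm used as the second-layer learner. As one simple possibility, the Python sketch below weights each base model by the inverse of its mean squared error on a held-out validation period and then averages the base forecasts with those weights. The inverse-MSE rule, the function names and the toy numbers are all assumptions for illustration.

```python
def inverse_mse_weights(val_obs, val_preds):
    """Weight each base model by 1 / MSE on validation data, normalised to sum to 1.

    The inverse-MSE rule is an assumed stand-in for the paper's weighting algorithm.
    val_preds holds one list of validation predictions per base model; a model
    with zero validation error would raise ZeroDivisionError.
    """
    inverse = []
    for preds in val_preds:
        mse = sum((o - p) ** 2 for o, p in zip(val_obs, preds)) / len(val_obs)
        inverse.append(1.0 / mse)
    total = sum(inverse)
    return [w / total for w in inverse]


def stack_predict(weights, base_preds):
    """Second-layer prediction: weighted sum of the base-model forecasts at each time step."""
    return [sum(w * p for w, p in zip(weights, step)) for step in zip(*base_preds)]


# Hypothetical toy values for two base models, for illustration only.
val_obs = [1.0, 2.0, 3.0]
weights = inverse_mse_weights(val_obs, [[1.0, 2.0, 4.0], [2.0, 3.0, 4.0]])
print(weights, stack_predict(weights, [[10.0, 20.0], [20.0, 40.0]]))
```

Fitting the weights on a validation period rather than on the training data keeps the second layer from simply favouring whichever base model overfits most.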
16
Sokolova E, Ivarsson O, Lillieström A, Speicher NK, Rydberg H, Bondelind M. Data-driven models for predicting microbial water quality in the drinking water source using E. coli monitoring and hydrometeorological data. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 802:149798. [PMID: 34454142 DOI: 10.1016/j.scitotenv.2021.149798] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/08/2021] [Revised: 07/08/2021] [Accepted: 08/16/2021] [Indexed: 06/13/2023]
Abstract
Rapid changes in microbial water quality in surface waters pose challenges for the production of safe drinking water. If not treated to an acceptable level, microbial pathogens in drinking water can have severe consequences for public health. The aim of this paper was to evaluate the suitability of data-driven models of different complexity for predicting E. coli concentrations in the river Göta älv at the water intake of the drinking water treatment plant in Gothenburg, Sweden. The objectives were to (i) assess how model complexity affects model performance; and (ii) identify relevant factors and assess their effect as predictors of E. coli levels. To forecast E. coli levels one day ahead, data on laboratory measurements of E. coli and total coliforms, Colifast measurements of E. coli, water temperature, turbidity, precipitation, and water flow were used. The baseline approaches included Exponential Smoothing and ARIMA (Autoregressive Integrated Moving Average), which are commonly used univariate methods, and a naive baseline that used the previous observed value as its next prediction. Models common in the machine learning domain were also included: LASSO (Least Absolute Shrinkage and Selection Operator) regression, Random Forest, and TPOT (Tree-based Pipeline Optimization Tool), a tool for optimising machine learning pipelines. A multivariate autoregressive model, VAR (Vector Autoregression), was included as well. Models that included multiple predictors performed better than univariate models. Random Forest and TPOT achieved higher performance but showed a tendency toward overfitting. Water temperature, microbial concentrations upstream and at the water intake, and upstream precipitation were shown to be important predictors.
Data-driven modelling enables water producers to interpret measurements in the context of the concentrations that can be expected from recent historical data, and thus to identify unexplained deviations that warrant further investigation of their origin.
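The two simplest baselines mentioned above fit in a few lines. The Python sketch below shows the naive forecast (the next value equals the last observed value) and simple exponential smoothing, the most basic member of the exponential smoothing family. The abstract does not say which exponential smoothing variant or smoothing parameter the study used, so `alpha`, the initialisation with the first observation and the counts are assumptions.

```python
def naive_forecast(series):
    """One-day-ahead forecast that repeats the most recent observation."""
    return series[-1]


def ses_forecast(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing (assumed variant).

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with the level
    initialised to the first observation; the forecast is the final level.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


# Hypothetical daily E. coli counts (per 100 mL), for illustration only.
counts = [40.0, 55.0, 120.0, 80.0]
print(naive_forecast(counts), ses_forecast(counts, alpha=0.3))
```

With `alpha=1.0` simple exponential smoothing reduces to the naive forecast, which is why the naive model is a natural floor for judging whether the more complex models add value.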
Affiliation(s)
- Ekaterina Sokolova: Chalmers University of Technology, Department of Architecture and Civil Engineering, Sweden
- Oscar Ivarsson: Chalmers University of Technology, Department of Computer Science and Engineering, Sweden
- Ann Lillieström: Chalmers University of Technology, Department of Computer Science and Engineering, Sweden
- Nora K Speicher: Chalmers University of Technology, Department of Computer Science and Engineering, Sweden
- Henrik Rydberg: City of Gothenburg, Department of Sustainable Water and Waste, Sweden
- Mia Bondelind: Chalmers University of Technology, Department of Architecture and Civil Engineering, Sweden
17
Bourel M, Segura AM, Crisci C, López G, Sampognaro L, Vidal V, Kruk C, Piccini C, Perera G. Machine learning methods for imbalanced data set for prediction of faecal contamination in beach waters. WATER RESEARCH 2021; 202:117450. [PMID: 34352535 DOI: 10.1016/j.watres.2021.117450] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/11/2020] [Revised: 07/09/2021] [Accepted: 07/15/2021] [Indexed: 06/13/2023]
Abstract
Predicting water contamination with statistical models is a useful tool for managing health risks at recreational beaches. Extreme contamination events, i.e. those exceeding regulatory limits, are generally rare relative to normal bathing conditions, so the data are said to be imbalanced. Modeling and predicting such rare events presents unique challenges. Here we introduce and evaluate several machine learning techniques and metrics for modeling imbalanced data and evaluating model performance. We do so using a) simulated data sets and b) a real database with records of faecal coliform abundance monitored for 10 years at 21 recreational beaches in Uruguay (N ≈ 19000), using in situ and meteorological variables. We discuss the advantages and disadvantages of the methods and provide a simple guide to building these models for a general audience. We also provide R code to reproduce model fitting and testing. We found that most machine learning techniques are sensitive to imbalance and require specific data pre-treatment (e.g. upsampling) to improve performance. Accuracy (i.e. correctly classified cases over total cases) is not adequate for evaluating model performance on imbalanced data sets; true positive rates (TPR) and false positive rates (FPR) are recommended instead. Among the 52 candidate algorithms tested, the stratified random forest performed best, improving TPR by 50% relative to the baseline (0.4) and outperforming the baseline in the evaluated metrics. Support vector machines combined with upsampling or the synthetic minority oversampling technique (SMOTE) also performed well, similar to AdaBoost with SMOTE. These results suggest that combining modeling strategies is necessary to improve our capacity to anticipate water contamination and avoid health risks.
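The point that accuracy is misleading on imbalanced data can be checked with a small calculation. In the Python sketch below, a classifier that never predicts a contamination event still reaches 95% accuracy on a hypothetical data set with 5% events, while its TPR is zero. The sketch also shows plain random upsampling of the minority class, one of the pre-treatments named in the abstract; SMOTE, which synthesizes new minority samples instead of duplicating existing ones, is not shown. The data and function names are hypothetical.

```python
import random


def confusion_rates(y_true, y_pred):
    """Return (accuracy, TPR, FPR) for binary labels, where 1 marks a contamination event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return accuracy, tpr, fpr


def random_upsample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equally frequent.

    Apply to training data only, so duplicated rows never leak into the test set.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_extra = abs(len(pos) - len(neg))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return rows + extra, labels + [label] * n_extra


# Hypothetical sample: 95 clean days (0) and 5 contamination events (1).
y_true = [0] * 95 + [1] * 5
print(confusion_rates(y_true, [0] * 100))  # a model that never predicts an event
```

Reporting TPR and FPR separately exposes the useless "always clean" model that a single accuracy score hides.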
Affiliation(s)
- Mathias Bourel: IMERL, Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay; Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Angel M Segura: Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Carolina Crisci: Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Guzmán López: Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Lia Sampognaro: Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Victoria Vidal: Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Carla Kruk: Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay; Departamento de Microbiología, Instituto de Investigaciones Biológicas Clemente Estable, Ministerio de Educación y Cultura, Montevideo, Uruguay; Instituto de Ecología y Ciencias Ambientales, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Claudia Piccini: Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay; Departamento de Microbiología, Instituto de Investigaciones Biológicas Clemente Estable, Ministerio de Educación y Cultura, Montevideo, Uruguay
- Gonzalo Perera: Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
18
Heasley C, Sanchez JJ, Tustin J, Young I. Systematic review of predictive models of microbial water quality at freshwater recreational beaches. PLoS One 2021; 16:e0256785. [PMID: 34437625 PMCID: PMC8389397 DOI: 10.1371/journal.pone.0256785] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/04/2021] [Accepted: 08/14/2021] [Indexed: 11/19/2022]
Abstract
Monitoring fecal indicator bacteria at recreational waters is an important public health measure to minimize water-borne disease; however, traditional culture methods for quantifying bacteria can take 18-24 hours to produce a result. To support real-time notifications of water quality, models using environmental variables have been created to predict indicator bacteria levels on the day of sampling. We conducted a systematic review of predictive models of fecal indicator bacteria at freshwater recreational sites in temperate climates to identify and describe existing approaches, trends, and their performance in order to inform beach water management policies. We used a comprehensive search strategy covering five databases and grey literature, screened abstracts for relevance, and extracted data using structured forms. Data were descriptively summarized. A total of 53 relevant studies were identified. Most studies (n = 44, 83%) were conducted in the United States and evaluated water quality using E. coli as the fecal indicator bacterium (n = 46, 87%). Studies were conducted primarily in lakes (n = 40, 75%) rather than rivers (n = 13, 25%). The most commonly reported method for building predictive models was multiple linear regression (n = 37, 70%). Predictors frequently used in best-fitting models included rainfall (n = 39, 74%), turbidity (n = 31, 58%), wave height (n = 24, 45%), and wind speed and direction (n = 25, 47%, and n = 23, 43%, respectively). In the 19 (36%) studies that measured accuracy, predictive models averaged 81.0% accuracy, and all but one were more accurate than traditional methods. Limitations identified by the risk-of-bias assessment included not validating models (n = 21, 40%), limited reporting on whether modelling assumptions were met (n = 40, 75%), and lack of reporting on the handling of missing data (n = 37, 70%).
Additional research is warranted on the utility and accuracy of more advanced predictive modelling methods, such as Bayesian networks and artificial neural networks, which were investigated in comparatively few studies, and on creating risk-of-bias tools for non-medical predictive modelling.
Affiliation(s)
- Cole Heasley: School of Occupational and Public Health, Ryerson University, Toronto, Ontario, Canada
- J. Johanna Sanchez: School of Occupational and Public Health, Ryerson University, Toronto, Ontario, Canada
- Jordan Tustin: School of Occupational and Public Health, Ryerson University, Toronto, Ontario, Canada
- Ian Young: School of Occupational and Public Health, Ryerson University, Toronto, Ontario, Canada
19
Ye GH, Alim M, Guan P, Huang DS, Zhou BS, Wu W. Improving the precision of modeling the incidence of hemorrhagic fever with renal syndrome in mainland China with an ensemble machine learning approach. PLoS One 2021; 16:e0248597. [PMID: 33725011 PMCID: PMC7963064 DOI: 10.1371/journal.pone.0248597] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/25/2020] [Accepted: 03/02/2021] [Indexed: 11/19/2022]
Abstract
OBJECTIVE Hemorrhagic fever with renal syndrome (HFRS), one of the main public health concerns in mainland China, is a group of clinically similar diseases caused by hantaviruses. Statistical approaches have long been used to forecast future incidence rates of certain infectious diseases in order to control their prevalence and outbreak potential effectively. Compared with a single base model, model stacking can often produce better forecasts. In this study, we fitted the monthly reported cases of HFRS in mainland China with a model stacking approach and compared its forecasting performance with that of five base models. METHOD We fitted the monthly reported HFRS cases in mainland China from January 2004 to June 2019 with five models: an autoregressive integrated moving average (ARIMA) model; the Holt-Winters (HW) method; seasonal decomposition of the time series by LOESS (STL); a neural network autoregressive (NNAR) model; and an exponential smoothing state space model with Box-Cox transformation, ARMA errors, and trend and seasonal components (TBATS). We then combined their forecasts with the inverse rank approach. Forecasting performance was assessed with several accuracy criteria, including the mean absolute percentage error (MAPE), root-mean-squared error (RMSE) and mean absolute error (MAE). RESULT The HFRS time series for mainland China showed a slight downward trend and obvious seasonal periodicity. The model stacking method performed best in terms of both fitting (RMSE 128.19, MAE 85.63, MAPE 8.18) and prediction (RMSE 151.86, MAE 118.28, MAPE 13.16). CONCLUSION Model stacking using the optimal mean forecasting weight of the five models above achieved the best performance in predicting HFRS one year into the future.
This study corroborates the conclusion that model stacking is an easy way to enhance prediction accuracy when modeling HFRS.
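The inverse rank combination and the three accuracy criteria can be sketched as follows. The Python code assumes the common form of inverse-rank weighting, in which each model is ranked by its error (rank 1 = lowest error) and weighted by 1/rank, with the weights normalised to sum to 1. The ranking criterion, the tie-breaking rule and the error values are assumptions, not details taken from the study.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error (%); undefined when any observation is zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def inverse_rank_weights(errors):
    """Weight models by 1/rank of their error (rank 1 = best), normalised to sum to 1.

    Ties are broken by list order, which is an arbitrary (assumed) choice.
    """
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    ranks = [0] * len(errors)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    inverse = [1.0 / r for r in ranks]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(weights, forecasts):
    """Weighted combination of per-model forecast series, one value per time step."""
    return [sum(w * f for w, f in zip(weights, step)) for step in zip(*forecasts)]


# Hypothetical validation RMSEs for three base models, for illustration only.
weights = inverse_rank_weights([3.0, 1.0, 2.0])
print(weights)
```

Because only ranks enter the weights, a single model with a very small error cannot dominate the combination the way it could under weights proportional to 1/error.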
Affiliation(s)
- Guo-hua Ye: Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
- Mirxat Alim: Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
- Peng Guan: Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
- De-sheng Huang: Department of Mathematics, School of Fundamental Sciences, China Medical University, Shenyang, Liaoning, China
- Bao-sen Zhou: Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
- Wei Wu: Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China