1
Mamun MAA, Islam ARMT, Aktar MN, Uddin MN, Islam MS, Pal SC, Islam A, Bari ABMM, Idris AM, Senapathi V. Predicting groundwater phosphate levels in coastal multi-aquifers: A geostatistical and data-driven approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 953:176024. [PMID: 39241889 DOI: 10.1016/j.scitotenv.2024.176024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 08/19/2024] [Accepted: 09/02/2024] [Indexed: 09/09/2024]
Abstract
The groundwater (GW) resource plays a central role in securing water supply in the coastal region of Bangladesh and therefore the future sustainability of this valuable resource is crucial for the area. However, there is limited research on the driving factors and prediction of phosphate concentration in groundwater. In this work, geostatistical modeling, self-organizing maps (SOM) and data-driven algorithms were combined to determine the driving factors and predict GW phosphate content in coastal multi-aquifers in southern Bangladesh. The SOM analysis identified three distinct spatial patterns: K⁺-Na⁺-pH, Ca²⁺-Mg²⁺-NO₃⁻, and HCO₃⁻-SO₄²⁻-PO₄³⁻-F⁻. Four data-driven algorithms, including CatBoost, Gradient Boosting Machine (GBM), Long Short-Term Memory (LSTM), and Support Vector Regression (SVR) were used to predict phosphate concentration in GW using 380 samples and 15 prediction parameters. Forecasting accuracy was evaluated using RMSE, R², RAE, CC, and MAE. Phosphate dissolution and saltwater intrusion, along with phosphorus fertilizers, increase PO₄³⁻ content in GW. Using input parameters selected by multicollinearity and SOM, the CatBoost model showed exceptional performance in both training (RMSE = 0.002, MAE = 0.001, R² = 0.999, RAE = 0.057, CC = 1.00) and testing (RMSE = 0.001, MAE = 0.002, R² = 0.989, RAE = 0.057, CC = 0.998). Na⁺, K⁺, and Mg²⁺ significantly influenced prediction accuracy. The uncertainty study revealed a low standard error for the CatBoost model, indicating robustness and consistency. Semi-variogram models confirmed that the most influential attributes showed weak dependence, suggesting that agricultural runoff increases the heterogeneity of PO₄³⁻ distribution in GW. These findings are crucial for developing conservation and strategic plans for sustainable utilization of coastal GW resources.
Affiliation(s)
- Abu Reza Md Towfiqul Islam
- Department of Disaster Management, Begum Rokeya University, Rangpur 5400, Bangladesh; Department of Development Studies, Daffodil International University, Dhaka 1216, Bangladesh.
- Mst Nazneen Aktar
- Department of Disaster Management, Begum Rokeya University, Rangpur 5400, Bangladesh
- Md Nashir Uddin
- Department of Civil Engineering, Dhaka University of Engineering and Technology, Gazipur, Bangladesh
- Md Saiful Islam
- Department of Soil Science, Patuakhali Science and Technology University, Dumki, Patuakhali 8602, Bangladesh
- Subodh Chandra Pal
- Department of Geography, The University of Burdwan, Purba Bardhaman, West Bengal 713104, India
- Aznarul Islam
- Department of Geography, Aliah University, 17 Gorachand Road, Kolkata 700014, India
- A B M Mainul Bari
- Department of Industrial and Production Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
- Abubakr M Idris
- Department of Chemistry, College of Science, King Khalid University, Abha 62529, Saudi Arabia
- Venkatramanan Senapathi
- PG and Research Department of Geology, National College (Autonomous), Tiruchirappalli 620001, Tamil Nadu, India.
2
Tian Y, Liu Q, Ji Y, Dang Q, Sun Y, He X, Liu Y, Su J. Prediction of sulfate concentrations in groundwater in areas with complex hydrogeological conditions based on machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 923:171312. [PMID: 38423319 DOI: 10.1016/j.scitotenv.2024.171312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/16/2024] [Accepted: 02/25/2024] [Indexed: 03/02/2024]
Abstract
The persistent and increasing levels of sulfate due to a variety of human activities over the last decades present a widely concerning environmental issue. Understanding the controlling factors of groundwater sulfate and predicting sulfate concentration is critical for governments or managers to provide information on groundwater protection. In this study, the integration of the self-organizing map (SOM) approach and machine learning (ML) modeling offers the potential to determine the factors and predict sulfate concentrations in the Huaibei Plain, where groundwater is enriched with sulfate and the areas have complex hydrogeological conditions. The SOM calculation was used to illustrate groundwater hydrochemistry and analyze the correlations among the hydrochemical parameters. Three ML algorithms, random forest (RF), support vector machine (SVM), and back propagation neural network (BPNN), were adopted to predict sulfate levels in groundwater by using 501 groundwater samples and 8 predictor variables. The prediction performance was evaluated through statistical metrics (R², MSE and MAE). Mine drainage was the main driver of the increase in groundwater SO₄²⁻, while gypsum dissolution and pyrite oxidation were found to be two other potential sources. The major water chemistry type was Ca-HCO₃. The dominant cation was Na⁺ while the dominant anion was HCO₃⁻. There was an intuitive correlation between groundwater sulfate and total dissolved solids (TDS), Cl⁻, and Na⁺. By using input variables identified by the SOM method, the evaluation results of ML algorithms showed that the R², MSE, and MAE of RF, SVM, and BPNN were 0.43-0.70, 0.16-0.49, and 0.25-0.44, respectively. Overall, BPNN showed the best prediction performance and had higher R² values and lower error indices. TDS and Na⁺ had a high contribution to the prediction accuracy. These findings are crucial for developing groundwater protection and remediation policies, enabling more sustainable management.
Affiliation(s)
- Yushan Tian
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Quanli Liu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Yao Ji
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Qiuling Dang
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Yuanyuan Sun
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Xiaosong He
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Yue Liu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China.
- Jing Su
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China.
3
Prediction and Interpretation of Water Quality Recovery after a Disturbance in a Water Treatment System Using Artificial Intelligence. WATER 2022. [DOI: 10.3390/w14152423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
In this study, an ensemble machine learning model was developed to predict the recovery rate of water quality in a water treatment plant after a disturbance. XGBoost, one of the most popular ensemble machine learning models, was used as the main framework of the model. Water quality and operational data observed in a pilot plant were used to train and test the model. Disturbance was determined when the observed turbidity was higher than the given turbidity criteria. Therefore, the recovery rate of water quality at a time t was defined during the falling limb of the turbidity recovery period. It was considered as a relative ratio of the differences between the peak and observed turbidities at time t to the difference between the peak turbidity and turbidity criteria. The root mean square error–observation standard deviation ratio of the XGBoost model improved from 0.730 to 0.373 by pretreatment, removing the observation for the rising limb of the disturbance from the training data. Moreover, Shapley value analysis, a novel explainable artificial intelligence method, was used to provide a reasonable interpretation of the model’s performance.
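The recovery rate and the RMSE-observation standard deviation ratio (RSR) described in this abstract can be written out as a short sketch. The function and argument names are illustrative assumptions, not taken from the paper, and the RSR follows the commonly used form (root of the squared-error sum divided by the root of the squared deviations of the observations from their mean), which may differ in detail from the authors' implementation.

```python
import math


def recovery_rate(peak, observed, criterion):
    """Relative recovery at time t on the falling limb of a turbidity event.

    Computed as (peak - observed) / (peak - criterion): 0 at the peak,
    1 once the observed turbidity has dropped back to the criterion.
    """
    if peak <= criterion:
        raise ValueError("peak turbidity must exceed the turbidity criterion")
    return (peak - observed) / (peak - criterion)


def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(sse) / math.sqrt(sst)
```

An RSR of 0 means a perfect fit, while an RSR of 1 means the model is no better than predicting the mean of the observations, which is one way to read the reported improvement from 0.730 to 0.373.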
4
Park J, Lee WH, Kim KT, Park CY, Lee S, Heo TY. Interpretation of ensemble learning to predict water quality using explainable artificial intelligence. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 832:155070. [PMID: 35398119 DOI: 10.1016/j.scitotenv.2022.155070] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 01/27/2022] [Revised: 03/31/2022] [Accepted: 04/02/2022] [Indexed: 06/14/2023]
Abstract
Algal bloom is a significant issue when managing water quality in freshwater; specifically, predicting the concentration of algae is essential to maintaining the safety of the drinking water supply system. The chlorophyll-a (Chl-a) concentration is a commonly used indicator to obtain an estimation of algal concentration. In this study, an XGBoost ensemble machine learning (ML) model was developed from eighteen input variables to predict Chl-a concentration. The composition and pretreatment of input variables to the model are important factors for improving model performance. Explainable artificial intelligence (XAI) is an emerging area of ML modeling that provides a reasonable interpretation of model performance. The effect of input variable selection on model performance was estimated, where the priority of input variable selection was determined using three indices: Shapley value (SHAP), feature importance (FI), and variance inflation factor (VIF). SHAP analysis is an XAI algorithm designed to compute the relative importance of input variables with consistency, providing an interpretable analysis for model prediction. The XGB models simulated with independent variables selected using three indices were evaluated with root mean square error (RMSE), RMSE-observation standard deviation ratio, and Nash-Sutcliffe efficiency. This study shows that the model exhibited the most stable performance when the priority of input variables was determined by SHAP. This implies that on-site monitoring can be designed to collect the selected input variables from the SHAP analysis to reduce the cost of overall water quality analysis. The independent variables were further analyzed using SHAP summary plot, force plot, target plot, and partial dependency plot to provide understandable interpretation on the performance of the XGB model. 
While XAI is still in the early stages of development, this study successfully demonstrated a good example of XAI application to improve the interpretation of machine learning model performance in predicting water quality.
Affiliation(s)
- Jungsu Park
- Department of Civil and Environmental Engineering, Hanbat National University, 125, Dongseo-daero, Yuseong-gu, Daejeon 34158, Republic of Korea.
- Woo Hyoung Lee
- Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, USA.
- Keug Tae Kim
- Department of Environmental & Energy Engineering, The University of Suwon, 17 Wauan-gil, Bongdam-eup, Hwaseong-si, Gyeonggi-do 18323, Republic of Korea.
- Sanghun Lee
- Department of Information & Statistics, Chungbuk National University, Chungdae-Ro 1, SeoWon-Gu, Cheongju, Chungbuk 28644, Republic of Korea
- Tae-Young Heo
- Department of Information & Statistics, Chungbuk National University, Chungdae-Ro 1, SeoWon-Gu, Cheongju, Chungbuk 28644, Republic of Korea.
5
Hamlin QF, Martin SL, Kendall AD, Hyndman DW. Examining Relationships Between Groundwater Nitrate Concentrations in Drinking Water and Landscape Characteristics to Understand Health Risks. GEOHEALTH 2022; 6:e2021GH000524. [PMID: 35509496 PMCID: PMC9060635 DOI: 10.1029/2021gh000524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2021] [Revised: 02/11/2022] [Accepted: 03/31/2022] [Indexed: 06/14/2023]
Abstract
Nitrate ingested from drinking water has been linked to adverse health outcomes (e.g., cancer, birth defects) at levels as low as ∼2 mg/L NO3-N, far below the regulatory limits of 10 mg/L. In many areas, groundwater is a common drinking water source and may contain elevated nitrate, but limited data on the patterns and concentrations are available. Using an extensive regulatory data set of over 100,000 nitrate drinking water well samples, we developed new maps of groundwater nitrate concentrations from 76,724 wells in Michigan's Lower Peninsula, USA for the 2006-2015 period. Kriging, a geostatistical method, was used to interpolate concentrations and quantify probability of exceeding relevant thresholds (>0.4 [common detection limit], >2 mg/L NO3-N). We summarized this probability in small watersheds (∼80 km2) to identify correlated variables using the machine learning method classification and regression trees (CARTs). We found 79% of wells had concentrations below the detection limit in this analysis (<0.4 mg/L NO3-N). In the shallow aquifer (focus of study), 13% of wells exceeded 2 mg/L NO3-N and 2% exceeded the EPA maximum contaminant level of 10 mg/L. CART explained 40%-45% of variation in each model and identified three categories of critical correlated variables: source (high agricultural nitrogen inputs), vulnerable soil conditions (low soil organic carbon and high hydraulic conductivity), and transport mechanisms (high aquifer recharge). These findings add to the body of literature seeking to identify communities at risk of elevated nitrate and study associated adverse health outcomes.
Affiliation(s)
- Q. F. Hamlin
- Department of Earth and Environmental Sciences, Michigan State University, East Lansing, MI, USA
- S. L. Martin
- Department of Earth and Environmental Sciences, Michigan State University, East Lansing, MI, USA
- A. D. Kendall
- Department of Earth and Environmental Sciences, Michigan State University, East Lansing, MI, USA
- D. W. Hyndman
- Department of Earth and Environmental Sciences, Michigan State University, East Lansing, MI, USA
- Department of Geosciences, School of Natural Sciences and Mathematics, University of Texas at Dallas, Richardson, TX, USA
6
Alkindi KM, Mukherjee K, Pandey M, Arora A, Janizadeh S, Pham QB, Anh DT, Ahmadi K. Prediction of groundwater nitrate concentration in a semiarid region using hybrid Bayesian artificial intelligence approaches. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:20421-20436. [PMID: 34735705 DOI: 10.1007/s11356-021-17224-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/22/2021] [Accepted: 10/21/2021] [Indexed: 06/13/2023]
Abstract
Nitrate is a major pollutant in groundwater whose main sources are municipal wastewater and agricultural activities. In the present study, Bayesian approaches such as Bayesian generalized linear model (BGLM), Bayesian regularized neural network (BRNN), Bayesian additive regression tree (BART), and Bayesian ridge regression (BRR) were used to model groundwater nitrate contamination in the semiarid Marvdasht watershed, Fars province, Iran. Eleven groundwater (GW) nitrate conditioning factors were taken as input parameters for predictive modeling. The results showed that the Bayesian models used in this study were all competent to model groundwater nitrate and that the BART model, with R² = 0.83, was more efficient than the other models. The variable importance results showed that potassium (K) has the highest importance in the models, followed by rainfall, altitude, groundwater depth, and distance from the residential area. The results of the study can support the decision-making process to control and reduce the sources of nitrate pollution.
Affiliation(s)
- Khalifa M Alkindi
- UNESCO Chair on Aflaj Studies, Archaeohydrology, University of Nizwa, Nizwa, Oman
- Kaustuv Mukherjee
- Department of Geography, Chandidas Mahavidyalaya, Birbhum, WB, 731215, India
- Manish Pandey
- University Center for Research & Development (UCRD), Chandigarh University, Mohali, 140413, Punjab, India
- Department of Civil Engineering, University Institute of Engineering, Chandigarh University, Mohali, 140413, Punjab, India
- Aman Arora
- Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, 10025, Delhi, India
- Saeid Janizadeh
- Department of Watershed Management Engineering and Sciences, Faculty in Natural Resources and Marine Science, Tarbiat Modares University, 14115-111, Tehran, Iran
- Quoc Bao Pham
- Institute of Applied Technology, Thu Dau Mot University, Binh Duong Province, Vietnam
- Duong Tran Anh
- Ho Chi Minh City University of Technology (HUTECH) 475A, Dien Bien Phu, Ward 25, Binh Thanh District, Ho Chi Minh City, Vietnam.
- Kourosh Ahmadi
- Department of Forestry, Faculty in Natural Resources and Marine Science, Tarbiat Modares University, 14115-111, Tehran, Iran
7
Neural Network and Random Forest-Based Analyses of the Performance of Community Drinking Water Arsenic Treatment Plants. WATER 2021. [DOI: 10.3390/w13243507] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
A plethora of technologies has been developed over decades of extensive research on arsenic remediation, although the technical and financial perspective of arsenic removal plants in the field requires critical evaluation. In the present study, focusing on some of the pronounced arsenic-affected areas in West Bengal, India, we assessed the implementation and operation of different arsenic removal technologies using a dataset of 4000 spatio-temporal data collected from an in-depth field survey of 136 arsenic removal plants engaged in the public water supply. Our statistical analysis of this dataset indicates a 120% rise in the average cumulative capacity of the plants during 2014–2021. The majority of the plants are based on the activated alumina with FeCl₃ technology and serve about 49% of the population in the study area. The average cost of water production for the activated alumina with FeCl₃ technology was found to be ₹7.56/m³ (USD $1 ≈ INR ₹70), while the lowest was ₹0.39/m³ for granular ferric hydroxide technology. A machine learning-based framework was employed to analyze the impact of water quality and treatment plant parameters on the removal efficiency, capital cost, and operational cost of the plants. The artificial neural network model exhibited adequate statistical significance, with a high F-value and R² of 5830.94 and 0.72 for the capital cost model and of 136,954 and 0.98 for the operational cost model, respectively. The relative importance of the process variables was identified through random forest models. The models indicated that flow rate, media, and chemicals are the predominant cost factors, while contaminant loading in influent water and the coagulating agent were important for removal efficiency. The established framework may be instrumental as a decision-making tool for water providers to assess the expected performance and financial involvement for proposed or ongoing arsenic removal plants concerning various design and quality parameters.
8
Comparative Analysis of Artificial Intelligence Models for Accurate Estimation of Groundwater Nitrate Concentration. SENSORS 2020; 20:s20205763. [PMID: 33053663 PMCID: PMC7599737 DOI: 10.3390/s20205763] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/07/2020] [Revised: 09/23/2020] [Accepted: 09/28/2020] [Indexed: 11/17/2022]
Abstract
Prediction of the groundwater nitrate concentration is of utmost importance for pollution control and water resource management. This research aims to model the spatial groundwater nitrate concentration in the Marvdasht watershed, Iran, using several machine learning models: support vector machine (SVM), Cubist, random forest (RF), and Bayesian artificial neural network (Bayesian-ANN). For this purpose, 11 independent variables affecting groundwater nitrate changes were prepared for the study area: elevation, slope, plan curvature, profile curvature, rainfall, piezometric depth, distance from the river, distance from residential areas, sodium (Na), potassium (K), and topographic wetness index (TWI). Nitrate levels were also measured in 67 wells and used as the dependent variable for modeling. Data were divided into training (70%) and testing (30%) sets. The coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), and Nash–Sutcliffe efficiency (NSE) were used to evaluate the performance of the models. The modeling results showed that the RF model (R² = 0.89, RMSE = 4.24, NSE = 0.87) performed better than the Cubist (R² = 0.87, RMSE = 5.18, NSE = 0.81), SVM (R² = 0.74, RMSE = 6.07, NSE = 0.74), and Bayesian-ANN (R² = 0.79, RMSE = 5.91, NSE = 0.75) models. The results of groundwater nitrate concentration zoning showed that the northern parts of the study area have the highest nitrate levels, which are higher in these agricultural areas than elsewhere. The most important cause of nitrate pollution in these areas is agricultural activity: wells lie close to agricultural land, groundwater is used to irrigate crops, and chemical fertilizers are used indiscriminately, so that irrigation water or rainwater washes the fertilizers into the groundwater and pollutes the aquifer.
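The evaluation setup in this abstract (a 70/30 split of 67 wells, scored with R², MAE, RMSE, and NSE) can be illustrated with a minimal sketch. All names are hypothetical, the shuffling and rounding of the split are assumptions, and R² is taken here as the squared Pearson correlation, which is one common convention; the abstract does not say which definition the authors used.

```python
import math
import random


def split_train_test(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r2(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)
```

Under these definitions a constant bias leaves R² untouched but lowers NSE, which is one reason the two values can differ for the same model, as they do for most models in the reported results.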
9
Prediction of Chlorophyll-a Concentrations in the Nakdong River Using Machine Learning Methods. WATER 2020. [DOI: 10.3390/w12061822] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/24/2022]
Abstract
Many studies have attempted to predict chlorophyll-a concentrations using multiple regression models and validating them with a hold-out technique. In this study commonly used machine learning models, such as Support Vector Regression, Bagging, Random Forest, Extreme Gradient Boosting (XGBoost), Recurrent Neural Network (RNN), and Long–Short-Term Memory (LSTM), are used to build a new model to predict chlorophyll-a concentrations in the Nakdong River, Korea. We employed 1–step ahead recursive prediction to reflect the characteristics of the time series data. In order to increase the prediction accuracy, the model construction was based on forward variable selection. The fitted models were validated by means of cumulative learning and rolling window learning, as opposed to the hold–out technique. The best results were obtained when the chlorophyll-a concentration was predicted by combining the RNN model with the rolling window learning method. The results suggest that the selection of explanatory variables and 1–step ahead recursive prediction in the machine learning model are important processes for improving its prediction performance.
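The difference between cumulative learning and rolling window learning, as contrasted with a single hold-out split, can be sketched by generating the training indices used for each 1-step-ahead prediction. This is an illustrative assumption about how such schemes are typically set up; the function names, the `initial_train` and `window` parameters, and the index layout are hypothetical and not taken from the paper.

```python
def cumulative_splits(n_obs, initial_train):
    """Yield (train_indices, test_index) pairs with a growing training set.

    Each step trains on every observation before time t and predicts time t.
    """
    for t in range(initial_train, n_obs):
        yield list(range(0, t)), t


def rolling_window_splits(n_obs, window):
    """Yield (train_indices, test_index) pairs with a fixed-length training set.

    Each step trains on the most recent `window` observations before time t
    and predicts time t, so the oldest observation is dropped as t advances.
    """
    for t in range(window, n_obs):
        yield list(range(t - window, t)), t


if __name__ == "__main__":
    for train, test in rolling_window_splits(n_obs=6, window=3):
        print("train on", train, "-> predict", test)
```

Both schemes respect time order, since no prediction is trained on later observations; they differ only in whether older data is retained.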