1
|
Singh G, Mehta S. Prediction of geogenic source of groundwater fluoride contamination in Indian states: A comparative study of different supervised machine learning algorithms. JOURNAL OF WATER AND HEALTH 2024; 22:1387-1408. [PMID: 39212277 DOI: 10.2166/wh.2024.063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 06/27/2024] [Indexed: 09/04/2024]
Abstract
India has been dealing with fluoride contamination of groundwater for the past few decades. Long-term exposure to fluoride can cause skeletal and dental fluorosis. Therefore, an in-depth exploration of fluoride concentrations in different parts of India is desirable. This work employs machine learning algorithms to analyze the fluoride concentrations in five major affected Indian states (Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal). A correlation matrix was used to identify appropriate predictor variables for fluoride prediction. The algorithms used for prediction were K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector classifier (SVC), Gaussian naive Bayes (Gaussian NB), multilayer perceptron (MLP) classifier, decision tree classifier, gradient boosting classifier, and soft and hard voting classifiers. The performance of these models was assessed using accuracy, precision, recall, error rate and the receiver operating characteristic (ROC) curve. As the dataset was skewed, the performance of the models was evaluated before and after resampling. Analysis of the results indicates that the RF model is the best model for predicting fluoride contamination in groundwater in Indian states.
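The evaluation described in this abstract (accuracy, precision, recall and error rate, before and after resampling a skewed dataset) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which resampling method was used, so random oversampling of the minority class is an assumption, and the function names `oversample_minority` and `binary_metrics` and all data values are hypothetical.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of the smaller classes until every class
    has as many rows as the largest class (assumed resampling scheme)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and error rate for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "error_rate": 1.0 - accuracy,
    }


# Hypothetical skewed data: label 1 marks a high-fluoride sample.
rows = [[0.4], [0.6], [0.5], [0.7], [2.1]]
labels = [0, 0, 0, 0, 1]
bal_rows, bal_labels = oversample_minority(rows, labels)

# A model that always predicts "safe" looks accurate on skewed data
# but finds none of the contaminated samples.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
metrics = binary_metrics(y_true, y_pred)
```

In this toy case the always-negative predictor has high accuracy and zero recall, which is why class imbalance is usually handled before models are compared.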
|
2
|
Nafouanti MB, Li J, Nyakilla EE, Mwakipunda GC, Mulashani A. A novel hybrid random forest linear model approach for forecasting groundwater fluoride contamination. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:50661-50674. [PMID: 36800089 DOI: 10.1007/s11356-023-25886-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 02/07/2023] [Indexed: 02/18/2023]
Abstract
Groundwater quality in the Datong basin is threatened by high fluoride contamination. Laboratory analysis, the standard method for estimating groundwater quality parameters, is expensive and time-consuming. Therefore, this paper proposes a hybrid random forest linear model (HRFLM) as a novel approach for estimating groundwater fluoride contamination. Light gradient boosting (LightGBM), random forest (RF), and extreme gradient boosting (XGBoost) were also employed for comparison with HRFLM in predicting fluoride contamination in groundwater. A total of 202 groundwater samples were collected to evaluate the performance of the models in forecasting groundwater fluoride contamination. Model performance was assessed using the area under the receiver operating characteristic (ROC) curve (AUC) and the confusion matrix (CM). The CM results reveal that, with nine predictor variables, the HRFLM achieved an accuracy of 95%, outperforming the XGBoost, LightGBM, and RF models, which attained 88%, 88%, and 85%, respectively. Likewise, the HRFLM achieved an AUC of 0.98, compared with 0.95, 0.90, and 0.88 for XGBoost, LightGBM, and RF, respectively. The study demonstrates that the HRFLM can be applied as an advanced approach for groundwater fluoride contamination prediction in the Datong basin and could be adopted in other areas facing a similar challenge.
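The two evaluation tools named in this abstract, accuracy read off a confusion matrix and the area under the ROC curve, can be illustrated with a small standard-library sketch. The function names and all numbers below are hypothetical and are not taken from the paper; the AUC is computed through its rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one), which is one common way to obtain it.

```python
def accuracy_from_confusion(cm):
    """cm[i][j] counts samples of true class i predicted as class j.
    Accuracy is the diagonal sum divided by the total count."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 2x2 confusion matrix: rows are true classes, columns predictions.
cm = [[18, 2],
      [1, 19]]
acc = accuracy_from_confusion(cm)  # (18 + 19) / 40

# Hypothetical predicted probabilities for four samples.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

Accuracy depends on the chosen decision threshold, while AUC summarizes ranking quality over all thresholds, which is why abstracts such as this one report both.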
Affiliation(s)
- Mouigni Baraka Nafouanti: State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan, 430074, China.
- Junxia Li: State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan, 430074, China; China Laboratory of Basin Hydrology and Wetland Eco-restoration, China University of Geosciences, Wuhan, 430074, China.
- Edwin E Nyakilla: Department of Petroleum Engineering, Faculty of Earth Resources, China University of Geosciences, Wuhan, 430074, China.
- Grant Charles Mwakipunda: Department of Petroleum Engineering, Faculty of Earth Resources, China University of Geosciences, Wuhan, 430074, China.
- Alvin Mulashani: Department of Geosciences and Mining Technology, College of Engineering and Technology, Mbeya University of Science and Technology, Box 131, Mbeya, Tanzania.
|
3
|
Khosravi K, Safari MJS, Sheikh Khozani Z, Crookston B, Golkarian A. Stacking ensemble-based hybrid algorithms for discharge computation in sharp-crested labyrinth weirs. Soft comput 2022. [DOI: 10.1007/s00500-022-07073-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
4
|
Intelligent flow discharge computation in a rectangular channel with free overfall condition. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07112-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
5
|
Daily Water Level Prediction of Zrebar Lake (Iran): A Comparison between M5P, Random Forest, Random Tree and Reduced Error Pruning Trees Algorithms. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2020. [DOI: 10.3390/ijgi9080479] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 01/08/2023]
Abstract
Zrebar Lake is one of the largest freshwater lakes in Iran and plays an important role in the regional ecosystem; its desiccation has a negative impact on the surrounding ecosystem. The lake is also an attractive recreational setting for ecotourism. Predicting and forecasting the water level of the lake through simple but practical methods can provide a reliable tool for future lake water resource management. In the present study, we predict the daily water level of Zrebar Lake through well-known decision tree-based algorithms, including the M5 pruned (M5P), random forest (RF), random tree (RT) and reduced error pruning tree (REPT). We used five different input combinations to find the most effective one. For our modeling, we chose 70% of the dataset for training (from 2011 to 2015) and 30% for model evaluation (from 2015 to 2017). We evaluated the models' performance using quantitative measures (root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R²), percent bias (PBIAS) and the ratio of the root mean square error to the standard deviation of measured data (RSR)) and visual tools (Taylor diagram and box plot). Our results showed that the water level with a one-day lag time had the greatest effect on the result and that the effect decreased as the lag time increased. All the developed models had good prediction capability, but the M5P model outperformed the others, followed by RF and RT equally and then REPT. Our results showed that these algorithms can predict the water level accurately using only the one-day-lagged water level as an input, and that they are cost-effective tools for future predictions.
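The data preparation implied by this abstract, building lagged water-level inputs and splitting the record 70:30 in time order, might look like the sketch below. The helper names `make_lagged` and `chronological_split` and the water-level values are hypothetical; the abstract does not list the five input combinations, so only the general lag construction is shown.

```python
def make_lagged(series, n_lags):
    """Return (X, y) where X[i] holds the n_lags values that precede y[i],
    most recent first (lag 1, lag 2, ...)."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t])
    return X, y


def chronological_split(X, y, train_fraction=0.7):
    """Split without shuffling so the test period follows the training period."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


# Hypothetical daily water levels (m); not data from the study.
levels = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5,
          10.3, 10.7, 10.9, 11.0, 10.8, 11.1]

# One-day lag only: each target is paired with the previous day's level.
X1, y1 = make_lagged(levels, n_lags=1)
X_train, y_train, X_test, y_test = chronological_split(X1, y1)

# Two lags: each row is [level at t-1, level at t-2].
X2, y2 = make_lagged(levels, n_lags=2)
```

Keeping the split chronological avoids training on days that come after the evaluation period, which a shuffled split would allow.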
|
6
|
Bui DT, Khosravi K, Tiefenbacher J, Nguyen H, Kazakis N. Improving prediction of water quality indices using novel hybrid machine-learning algorithms. THE SCIENCE OF THE TOTAL ENVIRONMENT 2020; 721:137612. [PMID: 32169637 DOI: 10.1016/j.scitotenv.2020.137612] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 01/23/2020] [Revised: 02/26/2020] [Accepted: 02/26/2020] [Indexed: 06/10/2023]
Abstract
River water quality assessment is one of the most important tasks for improving water resources management plans. A water quality index (WQI) considers several water quality variables simultaneously. Traditional WQI calculations are time-consuming and are often fraught with errors during the derivation of sub-indices. In this study, 4 standalone (random forest (RF), M5P, random tree (RT), and reduced error pruning tree (REPT)) and 12 hybrid data-mining algorithms (combinations of the standalone algorithms with bagging (BA), CV parameter selection (CVPS) and randomizable filtered classification (RFC)) were used to create Iran WQI (IRWQIsc) predictions. Six years (2012 to 2018) of monthly data from two water quality monitoring stations within the Talar catchment were compiled. Using Pearson correlation coefficients, 10 different input combinations were constructed. The data were divided into two groups (ratio 70:30) for model building (training dataset) and model validation (testing dataset) using a 10-fold cross-validation technique. The models were evaluated using several statistical and visual evaluation metrics. Results show that fecal coliform (FC) and total solids (TS) had the greatest and least effect, respectively, on the prediction of IRWQIsc. The best input combinations varied among the algorithms; in general, variables with very low correlations gave weaker performance. Hybrid algorithms improved the prediction power of several of the standalone models, but not all. Hybrid BA-RT outperformed the other models (R² = 0.941, RMSE = 2.71, MAE = 1.87, NSE = 0.941, PBIAS = 0.500). PBIAS indicated that all algorithms, with the exceptions of RT, BA-RT and CVPS-REPT, overestimated WQI values.
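For readers unfamiliar with the error measures quoted in this abstract (RMSE, MAE, NSE and PBIAS), a short standard-library sketch follows. The formulas are the usual textbook definitions. The PBIAS sign convention used here, 100 × Σ(observed − simulated) / Σ(observed), under which negative values indicate overestimation, is an assumption, since the abstract does not state which convention the authors used. The sample values are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error
    to the variance of the observations around their mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pbias(obs, sim):
    """Percent bias, assumed convention: negative means overestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical observed and simulated index values.
observed = [50.0, 60.0, 70.0, 80.0]
simulated = [52.0, 58.0, 73.0, 79.0]

scores = {
    "RMSE": rmse(observed, simulated),
    "MAE": mae(observed, simulated),
    "NSE": nse(observed, simulated),
    "PBIAS": pbias(observed, simulated),
}
```

An NSE of 1 corresponds to a perfect fit, and an NSE of 0 means the model does no better than predicting the observed mean.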
Affiliation(s)
- Duie Tien Bui: Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam.
- Khabat Khosravi: School of Engineering, University of Guelph, Guelph, Canada.
- John Tiefenbacher: Department of Geography, Texas State University, San Marcos, TX 78666, USA.
- Hoang Nguyen: Institute of Research and Development, Duy Tan University, Da Nang 550000, Viet Nam.
- Nerantzis Kazakis: Aristotle University of Thessaloniki, Department of Geology, Lab. of Engineering Geology & Hydrogeology, 54124 Thessaloniki, Greece.
|
7
|
Improvement of Credal Decision Trees Using Ensemble Frameworks for Groundwater Potential Modeling. SUSTAINABILITY 2020. [DOI: 10.3390/su12072622] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/13/2022]
Abstract
Groundwater is one of the most important sources of fresh water worldwide, especially in countries where rainfall is erratic, such as Vietnam. Machine learning (ML) models are now widely used to assess the groundwater potential of a region, and credal decision trees (CDT) are among the ML models used in such studies. In the present study, the performance of the CDT was improved using various ensemble frameworks: Bagging, Dagging, Decorate, Multiboost, and Random SubSpace. Based on these methods, five hybrid models, namely BCDT, Dagging-CDT, Decorate-CDT, MBCDT, and RSSCDT, were developed and applied for groundwater potential mapping of DakLak province, Vietnam. Data from 227 groundwater wells in the study area were used for the construction and validation of the models. Twelve groundwater potential conditioning factors, namely rainfall, slope, elevation, river density, Sediment Transport Index (STI), curvature, flow direction, aspect, soil, land use, Topographic Wetness Index (TWI), and geology, were considered. Various statistical measures, including the area under the receiver operating characteristic curve (AUC), were applied to validate and compare the performance of the models. The results show that the performance of the hybrid CDT ensemble models MBCDT (AUC = 0.770), BCDT (AUC = 0.731), Dagging-CDT (AUC = 0.763), Decorate-CDT (AUC = 0.750), and RSSCDT (AUC = 0.766) improved significantly in comparison with the single CDT model (AUC = 0.722). Therefore, these hybrid models can be applied for better groundwater potential mapping and groundwater resources management in the study area as well as in other regions of the world.
|
8
|
Investigation and Optimization of the C-ANN Structure in Predicting the Compressive Strength of Foamed Concrete. MATERIALS 2020; 13:ma13051072. [PMID: 32121104 PMCID: PMC7084645 DOI: 10.3390/ma13051072] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 12/31/2019] [Revised: 02/24/2020] [Accepted: 02/27/2020] [Indexed: 11/16/2022]
Abstract
The development of Foamed Concrete (FC) and continual advances in fabrication technology have paved the way for many promising civil engineering applications. Nevertheless, the design of FC requires a large number of experiments to determine the appropriate Compressive Strength (CS). Machine learning algorithms have been employed to take advantage of existing experimental databases, but model performance can still be improved. In this study, the performance of an Artificial Neural Network (ANN) in predicting the 28-day CS of FC was analyzed in detail. Monte Carlo simulations (MCS) were used to statistically analyze the convergence of the modeled results under the effect of random sampling strategies and the network structures selected. Statistical measures such as the Coefficient of Determination (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) were used to validate model performance. The results show that the ANN is a highly efficient predictor of the CS of FC, achieving a maximum R² of 0.976 on the training part and 0.972 on the testing part using the optimized C-ANN-[3–4–5–1] structure, which is comparable with previously published studies. In addition, a sensitivity analysis using Partial Dependence Plots (PDP) over 1000 MCS was performed to interpret the relationship between the input parameters and the 28-day CS of FC. Dry density was found to be the variable with the greatest impact on the predicted CS of FC. The results presented could facilitate and enhance the use of C-ANN in other civil engineering-related problems.
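The idea of using Monte Carlo simulations to study how a performance score converges under random sampling can be sketched as below. This is only an illustration of the general procedure: a one-variable least-squares line stands in for the ANN, the data are synthetic, and the function names, run count and split ratio are all assumptions rather than details from the study.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def r2(y_true, y_pred):
    """Coefficient of determination on a held-out set."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def monte_carlo_r2(x, y, n_runs=200, test_fraction=0.3, seed=0):
    """Repeat a random train/test split n_runs times and collect test R2."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_test = max(2, int(len(x) * test_fraction))
    scores = []
    for _ in range(n_runs):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [slope * x[i] + intercept for i in test]
        scores.append(r2([y[i] for i in test], preds))
    return scores


def running_mean(values):
    """Mean of the first k values for k = 1..n, used to inspect convergence."""
    out, total = [], 0.0
    for k, v in enumerate(values, start=1):
        total += v
        out.append(total / k)
    return out


# Synthetic data: a noisy linear relation standing in for real measurements.
data_rng = random.Random(42)
x = [i / 10 for i in range(60)]
y = [2.0 * xi + 1.0 + data_rng.gauss(0.0, 0.3) for xi in x]

scores = monte_carlo_r2(x, y)
convergence = running_mean(scores)
```

Plotting `convergence` against the run index shows how many random splits are needed before the average score stabilizes, which is the kind of question a convergence analysis over Monte Carlo runs addresses.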
|
9
|
A Comparative Study of Kernel Logistic Regression, Radial Basis Function Classifier, Multinomial Naïve Bayes, and Logistic Model Tree for Flash Flood Susceptibility Mapping. WATER 2020. [DOI: 10.3390/w12010239] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/17/2022]
Abstract
The risk of flash floods is currently an important problem in many parts of Vietnam. In this study, we used four machine-learning methods, namely Kernel Logistic Regression (KLR), Radial Basis Function Classifier (RBFC), Multinomial Naïve Bayes (NBM), and Logistic Model Tree (LMT), to generate flash flood susceptibility maps for a small part of Nghe An province in the central region of Vietnam, where recurrent flood problems are being experienced. The performance of these four methods was evaluated to select the best method for flash flood susceptibility mapping. Ten flash flood conditioning factors, namely soil, slope, curvature, river density, flow direction, distance from rivers, elevation, aspect, land use, and geology, were chosen based on the topography and geo-environmental conditions of the site. The models were validated using the area under the Receiver Operating Characteristic (ROC) curve (AUC) and various statistical indices. The results indicated that all the models perform well in generating flash flood susceptibility maps (AUC = 0.983–0.988). However, the LMT model performs best among the four methods (LMT: AUC = 0.988; KLR: AUC = 0.985; RBFC: AUC = 0.984; and NBM: AUC = 0.983). The present study should be useful for constructing accurate flash flood susceptibility maps that identify flood-susceptible areas for proper flash flood risk management.
|