1
Boudibi S, Fadlaoui H, Hiouani F, Bouzidi N, Aissaoui A, Khomri ZE. Groundwater salinity modeling and mapping using machine learning approaches: a case study in Sidi Okba region, Algeria. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024:10.1007/s11356-024-34440-1. [PMID: 39042194 DOI: 10.1007/s11356-024-34440-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 07/16/2024] [Indexed: 07/24/2024]
Abstract
The complexity of the groundwater salinization process and the lack of data on its controlling factors are the main challenges for accurate prediction and mapping of aquifer salinity. To address them, machine learning (ML) methodologies are employed to model and map groundwater salinity (GWS) in the Mio-Pliocene aquifer in the Sidi Okba region, Algeria, based on a limited dataset of electrical conductivity (EC) measurements and readily available digital elevation model (DEM) derivatives. The dataset was randomly split into training (70%) and testing (30%) sets, and three wrapper selection methods, recursive feature elimination (RFE), forward feature selection (FFS), and backward feature selection (BFS), are applied to the training data. The resulting combinations are used as inputs for five ML models, namely random forest (RF), hybrid neuro-fuzzy inference system (HyFIS), K-nearest neighbors (KNN), cubist regression model (CRM), and support vector machine (SVM). The best-performing model is identified and applied to predict and map GWS across the entire study area. The applied selection methods yield different input variable combinations, a critical factor that is often overlooked by many researchers and that substantially impacts the models' accuracy. Among the alternatives, the RF model emerged as the most effective for predicting and mapping GWS in the study area, with high performance in both the training (RMSE = 1.016, R = 0.854, and MAE = 0.759) and testing (RMSE = 1.069, R = 0.831, and MAE = 0.921) phases. The generated digital map highlighted the alarming situation regarding excessive GWS levels in the study area, particularly in zones of low elevation and far from the Foum Elgherza dam and Elbiraz wadi. Overall, this study represents a significant advancement over previous approaches, offering enhanced predictive performance for GWS with the minimum number of input variables.
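The evaluation protocol described in this abstract (a random 70/30 split scored with RMSE, R, and MAE) can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the toy numbers in the usage note are illustrative assumptions, not the authors' code or data, and R is assumed here to be the Pearson correlation coefficient.

```python
import math
import random


def split_train_test(rows, test_fraction=0.3, seed=42):
    """Randomly split rows into (train, test); 0.3 mirrors a 70/30 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(observed, predicted):
    """Root mean square error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

For example, with hypothetical observed values `[1.0, 2.0, 3.0, 4.0]` and predictions `[1.5, 2.5, 2.5, 4.5]`, both `rmse` and `mae` evaluate to 0.5, since every prediction is off by exactly 0.5.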
Affiliation(s)
- Samir Boudibi
  - Centre de Recherche Scientifique et Technique sur les Régions Arides, CRSTRA, Biskra, Algeria.
- Haroun Fadlaoui
  - Centre de Recherche Scientifique et Technique sur les Régions Arides, CRSTRA, Biskra, Algeria
- Fatima Hiouani
  - Department of Agricultural Sciences, University of Mohammed Khider, Biskra, Algeria
- Narimen Bouzidi
  - Centre de Recherche Scientifique et Technique sur les Régions Arides, CRSTRA, Biskra, Algeria
- Azeddine Aissaoui
  - Centre de Recherche Scientifique et Technique sur les Régions Arides, CRSTRA, Biskra, Algeria
- Zine-Eddine Khomri
  - Centre de Recherche Scientifique et Technique sur les Régions Arides, CRSTRA, Biskra, Algeria
2
Krishnamoorthy L, Lakshmanan VR. Groundwater quality assessment using machine learning models: a comprehensive study on the industrial corridor of a semi-arid region. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024:10.1007/s11356-024-34119-7. [PMID: 38963621 DOI: 10.1007/s11356-024-34119-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 06/21/2024] [Indexed: 07/05/2024]
Abstract
Water plays a significant role in sustaining the lives of humans and other living organisms. Groundwater quality analysis has become inevitable, because of increased contamination of water resources and global warming. This study used machine learning (ML) models to predict the water quality index (WQI) and water quality classification (WQC). Forty groundwater samples were collected near the Ranipet industrial corridor, and the hydrogeochemistry and heavy metal contamination were analyzed. WQC prediction employed random forest (RF), gradient boosting (GB), decision tree (DT), and K-nearest neighbor (KNN) models, and WQI prediction used extreme gradient boosting (XGBoost), support vector regressor (SVR), RF, and multi-layer perceptron (MLP) models. The grid search method is used to evaluate the ML model by F1 score, accuracy, recall, precision, and Matthews correlation coefficient (MCC) for WQC and the coefficient of determination (R2), mean absolute error (MAE), mean square error (MSE), and median absolute percentage error (MAPE) for WQI. The WQI results indicate that the groundwater quality of the study area is very poor and unsuitable for drinking or irrigation purposes. The performance metrics of the RF model excelled in predicting both WQC (accuracy = 97%) and WQI (R2 = 91.0%), outperforming other models and emphasizing ML's superiority in groundwater quality assessment. The findings suggest that ML models perform well and yield better accuracy than conventional techniques used in groundwater quality assessment studies.
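The classification metrics named in this abstract (accuracy, precision, recall, F1 score, and MCC) can all be derived from confusion-matrix counts. The sketch below is a simplified binary illustration using only the standard library. The study's water quality classes may well be multi-class, so the binary setting, the function name, and the returned dictionary layout are assumptions made for illustration.

```python
import math


def binary_classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F1 and MCC for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }
```

MCC is worth reporting alongside accuracy because it uses all four confusion-matrix cells, which makes it less flattering than accuracy when the classes are imbalanced.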
3
Elzain HE, Abdalla O, A Ahmed H, Kacimov A, Al-Maktoumi A, Al-Higgi K, Abdallah M, Yassin MA, Senapathi V. An innovative approach for predicting groundwater TDS using optimized ensemble machine learning algorithms at two levels of modeling strategy. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 351:119896. [PMID: 38171121 DOI: 10.1016/j.jenvman.2023.119896] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/18/2023] [Revised: 12/12/2023] [Accepted: 12/19/2023] [Indexed: 01/05/2024]
Abstract
Groundwater salinization in coastal aquifers is a major socioeconomic challenge in Oman and many other regions worldwide due to several anthropogenic activities and natural drivers. Therefore, assessing the salinization of groundwater resources is crucial to ensure the protection of water resources and sustainable management. The aim of this study is to apply a novel approach using predictive optimized ensemble tree-based (ETB) machine learning models, namely Catboost regression (CBR), Extra trees regression (ETR), and Bagging regression (BA), at two levels of modeling strategy for predicting groundwater TDS as an indicator of seawater intrusion in a coastal aquifer in Oman. At level 1, the ETR and CBR models were used as base models whose outputs served as inputs for BA at level 2. The results show that the models at level 1 (i.e., ETR and CBR) yielded satisfactory results using a limited number of inputs (Cl, K, and Sr) from a small set of 40 groundwater wells. The BA model at level 2 improved the overall performance of the modeling by extracting more information from the ETR and CBR models at level 1. At level 2, the BA model achieved a significant improvement in accuracy (MSE = 0.0002, RSR = 0.062, R2 = 0.995 and NSE = 0.996) compared to the individual level 1 models, ETR (MSE = 0.0007, RSR = 0.245, R2 = 0.98 and NSE = 0.94) and CBR (MSE = 0.0035, RSR = 0.258, R2 = 0.933 and NSE = 0.934), on the testing dataset. The BA model at level 2 outperformed all models in predictive accuracy, generalization to new data, and matching the locations of the polluted and unpolluted wells. Our approach predicts groundwater TDS with high accuracy and thus provides early warnings of water quality deterioration along coastal aquifers, which will improve water resources sustainability.
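Two of the less familiar scores reported here, NSE and RSR, can be sketched as follows. This assumes the commonly used definitions (Nash-Sutcliffe efficiency, and RMSE divided by the standard deviation of the observations); the abstract does not spell out its formulas, so treat the definitions and function names as assumptions.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst


def rsr(observed, predicted):
    """RMSE-to-observation-standard-deviation ratio (assumed definition)."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return math.sqrt(sse) / math.sqrt(sst)
```

Under these definitions the two scores are linked by RSR² = 1 − NSE. The reported values are consistent with that relation: for the BA model, 0.062² ≈ 0.004 = 1 − 0.996, and for CBR, 0.258² ≈ 0.067 ≈ 1 − 0.934.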
Affiliation(s)
- Hussam Eldin Elzain
  - Water Research Center, Sultan Qaboos University, P.O. 50, Al Khoudh 123, Oman.
- Osman Abdalla
  - Department of Earth Sciences, College of Science, Sultan Qaboos University, P.O. 36, Al Khoudh 123, Oman.
- Hamdi A Ahmed
  - Department of Industrial and Data Engineering, Pukyong National University, Busan, 48513, South Korea.
- Anvar Kacimov
  - Department of Soils, Water and Agricultural Engineering, Sultan Qaboos University, P.O. 34, Al Khoudh 123, Oman.
- Ali Al-Maktoumi
  - Water Research Center, Sultan Qaboos University, P.O. 50, Al Khoudh 123, Oman; Department of Soils, Water and Agricultural Engineering, Sultan Qaboos University, P.O. 34, Al Khoudh 123, Oman.
- Khalifa Al-Higgi
  - Department of Earth Sciences, College of Science, Sultan Qaboos University, P.O. 36, Al Khoudh 123, Oman.
- Mohammed Abdallah
  - College of Hydrology and Water Resources, Hohai University, Nanjing, Jiangsu, 210024, China.
- Mohamed A Yassin
  - Interdisciplinary Research Centre for Membranes and Water Security, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia.
4
Elsayed A, Rixon S, Levison J, Binns A, Goel P. Application of classification machine learning algorithms for characterizing nutrient transport in a clay plain agricultural watershed. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 345:118924. [PMID: 37678017 DOI: 10.1016/j.jenvman.2023.118924] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/06/2023] [Revised: 08/28/2023] [Accepted: 08/30/2023] [Indexed: 09/09/2023]
Abstract
Excess nutrients in surface water and groundwater can lead to water quality deterioration in available water resources. Thus, the classification of nutrient concentrations in water resources has gained significant attention during recent decades. Machine learning (ML) algorithms are considered an efficient tool to describe nutrient loss from agricultural land to surface water and groundwater. Previous studies have applied regression and classification ML algorithms to predict nutrient concentrations in surface water and/or groundwater, or to categorize an output variable using a limited number of input variables. However, there have been no studies that examined the application of different ML classification algorithms in agricultural settings to classify various output variables using a wide range of input variables. In this study, twenty-four ML classification algorithms were implemented on a dataset from three locations within the Upper Parkhill watershed, an agricultural watershed in southern Ontario, Canada. Nutrient concentrations in surface water were classified using geochemical and physical water parameters of surface water and groundwater (e.g., pH), climate and field conditions as the input variables. The performance of these algorithms was evaluated using four evaluation metrics (e.g., classification accuracy) to identify the optimal algorithm for classifying the output variables. Ensemble bagged trees was found to be the optimal ML algorithm for classifying nitrate concentration in surface water (accuracy of 90.9%), while the weighted KNN was the most appropriate algorithm for categorizing the total phosphorus concentration (accuracy of 87%). The ensemble subspace discriminant algorithm gave the highest overall classification accuracy for the concentration of soluble reactive phosphorus and total dissolved phosphorus in surface water with an accuracy of 79.2% and 77.9%, respectively. 
This study exemplifies that ML algorithms can be used to signify exceedance of recommended concentrations of nutrients in surface waters in agricultural watersheds. Results are useful for decision makers to develop nutrient management strategies.
Affiliation(s)
- Ahmed Elsayed
  - School of Engineering, Morwick G360 Groundwater Research Institute, University of Guelph, 50 Stone Road East, Guelph, Ontario, N1G 2W1, Canada; Irrigation and Hydraulics Department, Faculty of Engineering, Cairo University, 1 Gamaa Street, Giza, 12613, Egypt.
- Sarah Rixon
  - School of Engineering, Morwick G360 Groundwater Research Institute, University of Guelph, 50 Stone Road East, Guelph, Ontario, N1G 2W1, Canada
- Jana Levison
  - School of Engineering, Morwick G360 Groundwater Research Institute, University of Guelph, 50 Stone Road East, Guelph, Ontario, N1G 2W1, Canada
- Andrew Binns
  - School of Engineering, Morwick G360 Groundwater Research Institute, University of Guelph, 50 Stone Road East, Guelph, Ontario, N1G 2W1, Canada
- Pradeep Goel
  - Ministry of the Environment, Conservation and Parks (MECP), 125 Resources Road, Etobicoke, Ontario, M9P 3V6, Canada
5
Yu X, Chen S, Zhang X, Wu H, Guo Y, Guan J. Research progress of the artificial intelligence application in wastewater treatment during 2012-2022: a bibliometric analysis. WATER SCIENCE AND TECHNOLOGY : A JOURNAL OF THE INTERNATIONAL ASSOCIATION ON WATER POLLUTION RESEARCH 2023; 88:1750-1766. [PMID: 37830995 PMCID: wst_2023_296 DOI: 10.2166/wst.2023.296] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/14/2023]
Abstract
This study identified literature from the Web of Science Core Collection on the application of artificial intelligence in wastewater treatment from 2011 to 2022, through bibliometrics, to summarize achievements and capture the scientific and technological progress. The number of papers published is on the rise, and the number of papers issued after 2018 in particular has increased sharply, with China contributing the most in this regard, followed by the US, Iran and India. The University of Tehran has the largest number of papers, WATER is the journal with the most publications, and Nasr M has the largest number of articles. A collaborative network has developed mainly through cooperation between European countries, China and the US. Remote sensing in developing countries needs to be further integrated with water quality monitoring programs. It is worth noting that artificial neural networks have been a research hotspot in recent years. Keyword clustering analysis shows that 'machine learning' and 'deep learning' are hot keywords that have emerged since 2019. The use of neural networks for predicting the effectiveness of treatment of difficult-to-degrade wastewater is a future research trend. The rapid advancement of deep learning provides the opportunity to build automated pipeline defect detection systems through image recognition.
Affiliation(s)
- Xiaoman Yu
  - School of Resources and Environmental Engineering, Shanghai Polytechnic University, Shanghai 201209, China
- Shuai Chen
  - School of Resources and Environmental Engineering, Shanghai Polytechnic University, Shanghai 201209, China; Anhui International Joint Research Center for Nano Carbon-based Materials and Environmental Health, Huainan 232001, China
- Xiaojiao Zhang
  - School of Resources and Environmental Engineering, Shanghai Polytechnic University, Shanghai 201209, China
- Hongcheng Wu
  - Shanghai Wobai Environmental Development Co. Ltd, Shanghai 201209, China
- Yaoguang Guo
  - School of Resources and Environmental Engineering, Shanghai Polytechnic University, Shanghai 201209, China
- Jie Guan
  - School of Resources and Environmental Engineering, Shanghai Polytechnic University, Shanghai 201209, China
6
Tokatlı C, Uğurluoğlu A, Muhammad S. Ecotoxicological evaluation of organic contamination in the world's two significant gateways to the Black Sea using GIS techniques: Turkish Straits. MARINE POLLUTION BULLETIN 2023; 194:115405. [PMID: 37598535 DOI: 10.1016/j.marpolbul.2023.115405] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/05/2023] [Revised: 08/01/2023] [Accepted: 08/07/2023] [Indexed: 08/22/2023]
Abstract
This study was carried out to determine the spatial-temporal distributions of limnological parameters of Çanakkale Strait (ÇS) and İstanbul Strait (İS), Türkiye. Water samples from fluvial (n = 11) and lacustrine (n = 4) habitats were collected in the dry and rainy seasons of 2022-2023. Among limnological parameters, the highest mean value was 6063 μS/cm for electrical conductivity in the İS basin during the rainy season, and the lowest was 0.04 mg/L for nitrite in the ÇS basin. Generally, the levels of organic contaminants and ecological risk indices were as follows: rivers of İS > rivers of ÇS > Alibey Dam Lake (İS) > Atikhisar Dam Lake (ÇS). The highest non-carcinogenic health risk (0.88) was noted for children in the ÇS basin during the dry season, and the lowest (<0.01) in Atikhisar Dam Lake during the rainy season. Multivariate statistical techniques were applied to the data to categorize the investigated ecosystems, apportion contaminant sources, and describe their geospatial distribution.
Affiliation(s)
- Cem Tokatlı
  - Trakya University, Evrenos Gazi Campus, İpsala Vocational School, Department of Laboratory Technology, Edirne, Turkey
- Said Muhammad
  - National Centre of Excellence in Geology, University of Peshawar, 25130, Pakistan.
7
Reis Pereira M, dos Santos FN, Tavares F, Cunha M. Enhancing host-pathogen phenotyping dynamics: early detection of tomato bacterial diseases using hyperspectral point measurement and predictive modeling. FRONTIERS IN PLANT SCIENCE 2023; 14:1242201. [PMID: 37662158 PMCID: PMC10468592 DOI: 10.3389/fpls.2023.1242201] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/18/2023] [Accepted: 07/27/2023] [Indexed: 09/05/2023]
Abstract
Early diagnosis of plant diseases is needed to promote sustainable plant protection strategies. Applied predictive modeling over hyperspectral spectroscopy (HS) data can be an effective, fast, cost-effective approach for improving plant disease diagnosis. This study aimed to investigate the potential of HS point-of-measurement (POM) data for in-situ, non-destructive diagnosis of tomato bacterial speck, caused by Pseudomonas syringae pv. tomato (Pst), and bacterial spot, caused by Xanthomonas euvesicatoria (Xeu), on leaves (cv. cherry). Bacterial artificial infection was performed on tomato plants at the same phenological stage. A sensing system composed of a hyperspectral spectrometer, a transmission optical fiber bundle with a slitted probe, and a white light source was used for spectral data acquisition, allowing the assessment of 3478 spectral points. An applied predictive classification model was developed, consisting of a normalizing pre-processing strategy allied with a Linear Discriminant Analysis (LDA) for reducing data dimensionality and a supervised machine learning algorithm (Support Vector Machine - SVM) for the classification task. The predictive model achieved classification accuracies of 100% and 74% for Pst and Xeu test set assessments, respectively, before symptom appearance. Model predictions were coherent with host-pathogen interactions mentioned in the literature (e.g., changes in photosynthetic pigment levels, production of bacterial-specific molecules, and activation of plants' defense mechanisms). Furthermore, these results were coherent with visual phenotyping inspection and PCR results. The reported outcomes support the application of spectral point measurements acquired in-vivo for plant disease diagnosis, aiming for more precise and eco-friendly phytosanitary approaches.
Affiliation(s)
- Mafalda Reis Pereira
  - Faculdade de Ciências da Universidade do Porto (FCUP), Rua do Campo Alegre, Porto, Portugal
  - Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Campus da Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, Porto, Portugal
- Filipe Neves dos Santos
  - Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Campus da Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, Porto, Portugal
- Fernando Tavares
  - Faculdade de Ciências da Universidade do Porto (FCUP), Rua do Campo Alegre, Porto, Portugal
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão, Portugal
- Mário Cunha
  - Faculdade de Ciências da Universidade do Porto (FCUP), Rua do Campo Alegre, Porto, Portugal
  - Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Campus da Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, Porto, Portugal
8
Araya D, Podgorski J, Berg M. Groundwater salinity in the Horn of Africa: Spatial prediction modeling and estimated people at risk. ENVIRONMENT INTERNATIONAL 2023; 176:107925. [PMID: 37209488 DOI: 10.1016/j.envint.2023.107925] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/23/2022] [Revised: 04/06/2023] [Accepted: 04/07/2023] [Indexed: 05/22/2023]
Abstract
BACKGROUND Changes in climate and anthropogenic activities have made water salinization a significant threat worldwide, affecting biodiversity, crop productivity and contributing to water insecurity. The Horn of Africa, which includes eastern Ethiopia, northeast Kenya, Eritrea, Djibouti, and Somalia, has natural characteristics that favor high groundwater salinity. Excess salinity has been linked to infrastructure and health problems, including increased infant mortality. This region has suffered successive droughts that have limited the availability of safe drinking water resources, leading to a humanitarian crisis for which little spatially explicit information about groundwater salinity is available. METHODS Machine learning (random forest) is used to make spatial predictions of salinity levels at three electrical conductivity (EC) thresholds using data from 8646 boreholes and wells along with environmental predictor variables. Attention is paid to understanding the input data, balancing classes, performing many iterations, specifying cut-off values, employing spatial cross-validation, and identifying spatial uncertainties. RESULTS Estimates are made for this transboundary region of the population potentially exposed to hazardous salinity levels. The findings indicate that about 11.6 million people (∼7% of the total population), including 400,000 infants and half a million pregnant women, rely on groundwater for drinking and live in areas of high groundwater salinity (EC > 1500 µS/cm). Somalia is the most affected and has the largest number of people potentially exposed. Around 50% of the Somali population (5 million people) may be exposed to unsafe salinity levels in their drinking water. In only five of Somalia's 18 regions are less than 50% of infants potentially exposed to unsafe salinity levels. The main drivers of high salinity include precipitation, groundwater recharge, evaporation, ocean proximity, and fractured rocks. 
The combined overall accuracy and area under the curve of multiple runs is ∼ 82%. CONCLUSIONS The modelled groundwater salinity maps for three different salinity thresholds in the Horn of Africa highlight the uneven spatial distribution of salinity in the studied countries and the large area affected, which is mainly arid flat lowlands. The results of this study provide the first detailed mapping of groundwater salinity in the region, providing essential information for water and health scientists along with decision-makers to identify and prioritize areas and populations in need of assistance.
Affiliation(s)
- Dahyann Araya
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Department of Water Resources and Drinking Water, 8600 Dübendorf, Switzerland.
- Joel Podgorski
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Department of Water Resources and Drinking Water, 8600 Dübendorf, Switzerland
- Michael Berg
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Department of Water Resources and Drinking Water, 8600 Dübendorf, Switzerland.
9
Haggerty R, Sun J, Yu H, Li Y. Application of machine learning in groundwater quality modeling - A comprehensive review. WATER RESEARCH 2023; 233:119745. [PMID: 36812816 DOI: 10.1016/j.watres.2023.119745] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 05/05/2022] [Revised: 11/30/2022] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
Groundwater is a crucial resource across agricultural, civil, and industrial sectors. The prediction of groundwater pollution due to various chemical components is vital for planning, policymaking, and management of groundwater resources. In the last two decades, the application of machine learning (ML) techniques for groundwater quality (GWQ) modeling has grown exponentially. This review assesses all supervised, semi-supervised, unsupervised, and ensemble ML models implemented to predict any groundwater quality parameter, making this the most extensive modern review on this topic. Neural networks are the most used ML model in GWQ modeling. Their usage has declined in recent years, giving rise to more accurate or advanced techniques such as deep learning or unsupervised algorithms. Iran and the United States lead the world in areas modeled, with a wealth of historical data available. Nitrate has been modeled most exhaustively, targeted by nearly half of all studies. Advancements in future work will be made with further implementation of deep learning and explainable artificial intelligence or other cutting-edge techniques, application of these techniques for sparsely studied variables, the modeling of new or unique study areas, and the implementation of ML techniques for groundwater quality management.
Affiliation(s)
- Ryan Haggerty
  - Department of Civil and Environmental Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, United States
- Jianxin Sun
  - School of Computing, University of Nebraska-Lincoln, Lincoln, NE 68588, United States
- Hongfeng Yu
  - School of Computing, University of Nebraska-Lincoln, Lincoln, NE 68588, United States; Holland Computing Center, University of Nebraska-Lincoln, Lincoln, NE 68588, United States
- Yusong Li
  - Department of Civil and Environmental Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, United States.
10
Saha A, Pal SC, Chowdhuri I, Roy P, Chakrabortty R. Effect of hydrogeochemical behavior on groundwater resources in Holocene aquifers of moribund Ganges Delta, India: Infusing data-driven algorithms. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2022; 314:120203. [PMID: 36150620 DOI: 10.1016/j.envpol.2022.120203] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/10/2022] [Revised: 08/16/2022] [Accepted: 09/15/2022] [Indexed: 06/16/2023]
Abstract
Access to clean drinking water has been recognized as one of the fundamental sustainable development goals. In the Anthropocene era, rapid urbanization has put further stress on water resources, and the associated groundwater contamination has expanded into a significant global environmental issue. Natural arsenic and related water pollution have already increased groundwater vulnerability and the corresponding health hazard in and around the Ganges delta. A field-based hydrogeochemical analysis has been carried out in the elevated-arsenic-prone areas of the moribund Ganges delta, West Bengal, a part of the western Ganga-Brahmaputra delta (GBD). New data-driven heuristic algorithms are rarely used in groundwater vulnerability studies and have not yet been used in the elevated-arsenic-prone areas of the Ganges delta, India. Therefore, the current study emphasizes the integration of heuristic algorithms with random forest (RF), i.e., "RF-particle swarm optimization (PSO)", "RF-grey wolf optimizer (GWO)" and "RF-grasshopper optimization algorithm (GOA)", to identify groundwater vulnerable zones on the basis of field-based hydrogeochemical parameters. In addition, the corresponding health hazard in this area was assessed through the human health hazard index. The spatial distribution of groundwater vulnerability revealed that the middle-eastern and north-western parts of the study area are covered by very high and high vulnerability zones, whereas the central, western and south-western parts are covered by very low and low vulnerability zones in the outcomes of all the applied models. The evaluation results indicate that the RF-GOA model (AUC = 0.911) performed best on the testing dataset, followed by RF-GWO, RF-PSO and RF with AUC values of 0.901, 0.892 and 0.812, respectively. The findings also revealed that the groundwater in this study region is quite unfavorable for drinking and irrigation purposes.
The suggested models demonstrate their usefulness in foretelling sustainable groundwater resource management in various deltaic regions of the world through taking appropriate measures by policy-makers.
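The models in this abstract are ranked by AUC on the testing dataset. As a minimal sketch of what that score measures, the function below computes the area under the ROC curve as the probability that a randomly chosen positive (vulnerable) sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the 0/1 label encoding are illustrative assumptions; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: model scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of samples, which is fine for small field datasets; a rank-based formulation gives the same value more efficiently for large ones.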
Affiliation(s)
- Asish Saha
  - Department of Geography, The University of Burdwan, Bardhaman, West Bengal, 713104, India
- Subodh Chandra Pal
  - Department of Geography, The University of Burdwan, Bardhaman, West Bengal, 713104, India.
- Indrajit Chowdhuri
  - Department of Geography, The University of Burdwan, Bardhaman, West Bengal, 713104, India
- Paramita Roy
  - Department of Geography, The University of Burdwan, Bardhaman, West Bengal, 713104, India
- Rabin Chakrabortty
  - Department of Geography, The University of Burdwan, Bardhaman, West Bengal, 713104, India
11
Dutta S, Barman R, Radhapyari K, Datta S, Lale K, Ray B, Chakraborty T, Srivastava SK. Potentially toxic elements in groundwater of the upper Brahmaputra floodplains of Assam, India: water quality and health risk. ENVIRONMENTAL MONITORING AND ASSESSMENT 2022; 194:923. [PMID: 36258132 DOI: 10.1007/s10661-022-10637-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/24/2021] [Accepted: 10/07/2022] [Indexed: 06/16/2023]
Abstract
This paper presents the groundwater quality assessment of the upper Brahmaputra floodplains of Assam on a seasonal basis. A total of 88 samples were analyzed for the presence of potentially toxic elements in two seasons. In addition, an attempt is made to identify any possible associated health risks to the residents via the drinking water pathway. The study reveals the presence of various potentially toxic elements, in particular manganese, iron, nickel, and fluoride, at concentrations exceeding the drinking water specifications set by BIS and the WHO drinking water standards. The degree of groundwater contamination was assessed using the Water Quality Index, Heavy metal Pollution Index, Heavy metal Evaluation Index, and Degree of Contamination. The spatial distribution maps of groundwater quality were prepared using a geographical information system. The non-carcinogenic health risk was evaluated using hazard quotients and the hazard index as per the United States Environmental Protection Agency (USEPA) methodology. The hazard quotients of fluoride and manganese are > 1, exceeding the USEPA-recommended benchmark. The health risk assessment identified that the risk was highest during the pre-monsoon season, and that the child population is more vulnerable to non-carcinogenic risk than adults. Findings of cancer risk identified that pre-monsoon groundwater samples from the Golaghat District pose the highest health risks in the upper Brahmaputra floodplains. The risk is highest in the southwest of the study area, followed by the south and then by the north.
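The hazard quotient (HQ) and hazard index (HI) mentioned above can be sketched for the drinking water pathway as follows. The formulation is the widely used one, in which a chronic daily intake is divided by a reference dose and the quotients are then summed. Every number in the example (intake rates, body weights, exposure durations, concentration, reference dose) is a hypothetical placeholder chosen for illustration, not a value taken from the study or from USEPA tables.

```python
from dataclasses import dataclass


@dataclass
class Receptor:
    """Exposure assumptions for one population group (all hypothetical)."""
    intake_l_per_day: float
    body_weight_kg: float
    exposure_years: float
    exposure_days_per_year: float = 365.0


def chronic_daily_intake(conc_mg_per_l, receptor):
    """CDI in mg/kg/day = (C * IR * EF * ED) / (BW * AT)."""
    averaging_days = receptor.exposure_years * 365.0  # non-carcinogenic AT
    dose = (conc_mg_per_l * receptor.intake_l_per_day
            * receptor.exposure_days_per_year * receptor.exposure_years)
    return dose / (receptor.body_weight_kg * averaging_days)


def hazard_quotient(conc_mg_per_l, reference_dose, receptor):
    """HQ = CDI / RfD; values above 1 flag potential non-carcinogenic risk."""
    return chronic_daily_intake(conc_mg_per_l, receptor) / reference_dose


def hazard_index(hazard_quotients):
    """HI is the sum of the HQs of the individual contaminants."""
    return sum(hazard_quotients)


# Hypothetical receptors: placeholder values, not taken from the paper.
ADULT = Receptor(intake_l_per_day=2.0, body_weight_kg=70.0, exposure_years=30.0)
CHILD = Receptor(intake_l_per_day=1.0, body_weight_kg=15.0, exposure_years=6.0)
```

With a placeholder concentration of 1.5 mg/L and a placeholder reference dose of 0.06 mg/kg/day, `hazard_quotient(1.5, 0.06, ADULT)` is about 0.71 while `hazard_quotient(1.5, 0.06, CHILD)` is about 1.67. The child's higher intake per kilogram of body weight is the mechanism behind the abstract's finding that children are more vulnerable than adults.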
Affiliation(s)
- Snigdha Dutta
- Central Ground Water Board, North Eastern Region, Guwahati, 781035, Assam, India
- Rinkumoni Barman
- Central Ground Water Board, North Eastern Region, Guwahati, 781035, Assam, India
- Keisham Radhapyari
- Central Ground Water Board, North Eastern Region, Guwahati, 781035, Assam, India.
- Suparna Datta
- Central Ground Water Board, Eastern Region, Kolkata, 700091, West Bengal, India
- Kiran Lale
- Central Ground Water Board, North Western Region, Chandigarh, 160019, India
- Biplab Ray
- Central Ground Water Board, North Eastern Region, Guwahati, 781035, Assam, India
- Tapan Chakraborty
- Central Ground Water Board, State Unit Office, Shillong, 793001, Meghalaya, India
- Central Ground Water Board, Central Head Quarters, Faridabad, 121001, Haryana, India
|
12
|
Chandra Pal S, Towfiqul Islam ARM, Chakrabortty R, Islam MS, Saha A, Shit M. Application of data-mining technique and hydro-chemical data for evaluating vulnerability of groundwater in Indo-Gangetic Plain. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2022; 318:115582. [PMID: 35772277 DOI: 10.1016/j.jenvman.2022.115582] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/02/2022] [Revised: 04/08/2022] [Accepted: 06/17/2022] [Indexed: 06/15/2023]
Abstract
Assessing the vulnerability of groundwater is critical for the sustainable development of groundwater resources, especially in the freshwater-limited coastal Indo-Gangetic plains. Here, we develop an integrated novel approach for delineating groundwater vulnerability using hydro-chemical analysis and data-mining methods, i.e., Decision Tree (DT) and K-Nearest Neighbor (KNN), via the k-fold cross-validation (CV) technique. A total of 110 groundwater samples were obtained during the dry and wet seasons to generate an inventory map. A four-fold CV approach was used to delineate the vulnerable region from sixteen vulnerability causal factors. Statistical error metrics, i.e., the area under the receiver operating characteristic curve (AUC-ROC) and other advanced metrics, were adopted to validate model outcomes. The results demonstrated the excellent ability of the proposed models to recognize vulnerable groundwater zones in the Indo-Gangetic plain. The DT model showed the higher performance (AUC = 0.97), followed by the KNN model (AUC = 0.95). The north-central and north-eastern parts are more vulnerable due to high salinity and high nitrate (NO3-), fluoride (F-), and arsenic (As) concentrations. Policy-makers and groundwater managers can utilize the proposed integrated novel approach and the resulting groundwater vulnerability maps to attain sustainable groundwater development and safeguard human-induced activities at the regional level.
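The four-fold cross-validation split described in this abstract can be sketched as below. This is a generic k-fold partition in plain Python, assuming a simple random shuffle; the abstract does not say how the authors assigned samples to folds, and the function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train, test) index pairs covering every sample exactly once as test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# 110 samples and four folds, as stated in the abstract.
splits = k_fold_indices(n_samples=110, k=4)
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

Each model would then be fitted on `train` and scored on `test` once per fold, and the per-fold scores averaged.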
Affiliation(s)
- Subodh Chandra Pal
- Department of Geography, The University of Burdwan, Bardhaman, West Bengal, 713104, India.
- Rabin Chakrabortty
- Department of Geography, The University of Burdwan, Bardhaman, West Bengal, 713104, India
- Md Saiful Islam
- Department of Soil Science, Patuakhali Science and Technology University, Dumki, Patuakhali, 8602, Bangladesh
- Asish Saha
- Department of Geography, The University of Burdwan, Bardhaman, West Bengal, 713104, India
- Manisa Shit
- Department of Geography, Raiganj University, Raiganj, Uttar Dinajpur, West Bengal, 733134, India
|
13
|
Reis-Pereira M, Tosin R, Martins R, Neves dos Santos F, Tavares F, Cunha M. Kiwi Plant Canker Diagnosis Using Hyperspectral Signal Processing and Machine Learning: Detecting Symptoms Caused by Pseudomonas syringae pv. actinidiae. PLANTS (BASEL, SWITZERLAND) 2022; 11:2154. [PMID: 36015456 PMCID: PMC9414239 DOI: 10.3390/plants11162154] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/17/2022] [Revised: 07/26/2022] [Accepted: 08/04/2022] [Indexed: 11/16/2022]
Abstract
Pseudomonas syringae pv. actinidiae (Psa) has been responsible for numerous epidemics of bacterial canker of kiwi (BCK), resulting in high losses in kiwi production worldwide. Current diagnostic approaches for this disease usually depend on visible signs of infection (disease symptoms) being present. Since these symptoms frequently appear in the middle to late stages of the infection process, the effectiveness of phytosanitary measures can be compromised. Hyperspectral spectroscopy has the potential to be an effective, non-invasive, rapid, cost-effective, high-throughput approach for improving BCK diagnostics. This study aimed to investigate the potential of hyperspectral UV-VIS reflectance for in-situ, non-destructive discrimination of bacterial canker on kiwi leaves. Spectral reflectance (325-1075 nm) of twenty plants was obtained with a handheld spectroradiometer in two commercial kiwi orchards located in Portugal over 15 weeks, totaling 504 spectral measurements. Several modeling approaches based on continuous hyperspectral data or specific wavelengths, chosen by different feature selection algorithms, were tested to discriminate BCK on leaves. Spectral separability of asymptomatic and symptomatic leaves was observed in all multivariate and machine learning models, including the FDA, GLM, PLS, and SVM methods. The combination of a stepwise forward variable selection approach with a support vector machine algorithm using a radial kernel and class weights was selected as the final model. Its overall accuracy was 85%, with a 0.70 kappa score and 0.84 F-measure. These results were consistent with the classification of leaves as asymptomatic or symptomatic by visual inspection. Overall, the findings reported here support the implementation of spectral point measurements acquired in situ for crop disease diagnosis.
Affiliation(s)
- Mafalda Reis-Pereira
- Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Campus da Faculdade de Engenharia da Universidade do Porto, Rua Roberto Frias, 4200-465 Porto, Portugal
- Renan Tosin
- Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Campus da Faculdade de Engenharia da Universidade do Porto, Rua Roberto Frias, 4200-465 Porto, Portugal
- Rui Martins
- Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Campus da Faculdade de Engenharia da Universidade do Porto, Rua Roberto Frias, 4200-465 Porto, Portugal
- Filipe Neves dos Santos
- Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Campus da Faculdade de Engenharia da Universidade do Porto, Rua Roberto Frias, 4200-465 Porto, Portugal
- Fernando Tavares
- Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
- Mário Cunha
- Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Campus da Faculdade de Engenharia da Universidade do Porto, Rua Roberto Frias, 4200-465 Porto, Portugal
|
14
|
Fallatah O, Ahmed M, Gyawali B, Alhawsawi A. Factors controlling groundwater radioactivity in arid environments: An automated machine learning approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 830:154707. [PMID: 35331768 DOI: 10.1016/j.scitotenv.2022.154707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2022] [Revised: 03/02/2022] [Accepted: 03/16/2022] [Indexed: 06/14/2023]
Abstract
Groundwater resources in the Kingdom of Saudi Arabia (KSA) have high levels of natural radioactivity. Within the northwestern KSA, gross alpha (α) and gross beta (β) levels exceed national and international drinking-water limits. In this study, we developed and used an automated machine learning (AML) approach to quantify relationships between gross α and gross β activities and different geological, hydrogeological, and geochemical conditions. Two AML model groups (group I for gross α; group II for gross β) were constructed, using water samples collected from 360 irrigation and water supply wells, to define a robust model that explains the spatial variability in gross α and gross β activities, as well as the variables that control the gross activities. Each group contained four model families: deep neural network (DNN), gradient boosting machine (GBM), generalized linear model (GLM), and distributed random forest (DRF). Model inputs include chemical compositions as well as geological and hydrogeological conditions. Three performance metrics were used to evaluate the models during training and testing: normalized root mean square error (NRMSE), Pearson's correlation coefficient (r), and Nash-Sutcliffe efficiency (NSE) coefficient. Results indicate that (1) the GBM model outperformed (training: NRMSE: 0.37 ± 0.10; r: 0.92 ± 0.05; NSE: 0.85 ± 0.09; testing: NRMSE: 0.71 ± 0.08; r: 0.72 ± 0.08; NSE: 0.49 ± 0.12) the DNN, DRF, and GLM models when modelling gross α activities; (2) gross α activities are controlled by pH, stream density, nitrate, manganese, and vegetation index; (3) the DRF model outperformed (training: NRMSE: 0.41 ± 0.05; r: 0.92 ± 0.02; NSE: 0.83 ± 0.04; testing: NRMSE: 0.67 ± 0.09; r: 0.77 ± 0.07; NSE: 0.54 ± 0.12) the GBM, DNN, and GLM models when modelling gross β activities; (4) input variables that affect the gross β activities are pH, temperature, stream density, lithology, and nitrate; and (5) no single model could be used to model both gross α and gross β activities; instead, a combination of AML models should be used. Our computationally efficient approach provides a framework and insights for using AML techniques in water quality investigations and promotes more and improved use of different geological, hydrogeological, and geochemical datasets by the scientific community and decision makers to develop guidelines for mitigation.
Affiliation(s)
- Othman Fallatah
- Department of Nuclear Engineering, Faculty of Engineering, King Abdulaziz University, P.O. Box 80204, Jeddah 21589, Saudi Arabia
- Mohamed Ahmed
- Department of Physical and Environmental Sciences, Texas A&M University-Corpus Christi, 6300 Ocean Drive, Corpus Christi, TX 78412, USA.
- Bimal Gyawali
- Department of Physical and Environmental Sciences, Texas A&M University-Corpus Christi, 6300 Ocean Drive, Corpus Christi, TX 78412, USA
- Abdulsalam Alhawsawi
- Department of Nuclear Engineering, Faculty of Engineering, King Abdulaziz University, P.O. Box 80204, Jeddah 21589, Saudi Arabia
|
15
|
Computational assessment of groundwater salinity distribution within coastal multi-aquifers of Bangladesh. Sci Rep 2022; 12:11165. [PMID: 35778436 PMCID: PMC9249886 DOI: 10.1038/s41598-022-15104-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/07/2022] [Accepted: 06/17/2022] [Indexed: 11/30/2022]
Abstract
The rising salinity trend in the country’s coastal groundwater has reached an alarming rate due to unplanned use of groundwater in agriculture and seawater intrusion driven by sea-level rise caused by global warming. Therefore, assessing salinity is crucial for determining the status of safe groundwater in coastal aquifers. In this research, a rigorous hybrid neurocomputing approach comprising an Adaptive Neuro-Fuzzy Inference System (ANFIS) hybridized with a new meta-heuristic optimization algorithm, namely Aquila optimization (AO), and Boruta-Random Forest feature selection (FS) was developed for estimating the salinity of multi-aquifers in coastal regions of Bangladesh. To this end, 539 data samples, including ten water quality indices, were collected to build the predictive model. Moreover, the individual ANFIS, ANFIS coupled with the Slime Mould Algorithm (SMA) and with Ant Colony Optimization for Continuous Domains (ACOR) (i.e., ANFIS-SMA and ANFIS-ACOR), and LASSO regression (Lasso-Reg) schemes were examined for comparison with the primary model. Several goodness-of-fit indices, such as the correlation coefficient (R), the root mean squared error (RMSE), and the Kling-Gupta efficiency (KGE), were used to validate the robustness of the predictive models. Here, the Boruta-Random Forest (B-RF), a new robust tree-based FS method, was adopted to identify the most significant candidate inputs and effective input combinations to reduce the computational cost and time of the modeling. The outcomes for four selected input combinations showed that ANFIS-AO achieved the best accuracy (R = 0.9450, RMSE = 1.1253 ppm, and KGE = 0.9146) and outperformed the ANFIS-SMA (R = 0.9406, RMSE = 1.1534 ppm, and KGE = 0.8793), ANFIS-ACOR (R = 0.9402, RMSE = 1.1388 ppm, and KGE = 0.8653), Lasso-Reg (R = 0.9358), and ANFIS (R = 0.9306) models. In addition, the first candidate input combination (C1), with three inputs, Cl− (mg/l), Mg2+ (mg/l), and Na+ (mg/l), yielded the best accuracy among all alternatives, underscoring the importance of B-RF feature selection. Finally, the spatial salinity distribution assessment in the study area confirmed the high predictive potential of the ANFIS-AO hybrid with B-RF feature selection compared to other paradigms. The main novelty of this research is the use of a robust framework comprising a non-linear data filtering technique and a new hybrid neuro-computing approach, which can be considered a reliable tool for assessing water salinity in coastal aquifers.
|
16
|
Zhou P, Li Z, Snowling S, Goel R, Zhang Q. Multi-step ahead prediction of hourly influent characteristics for wastewater treatment plants: a case study from North America. ENVIRONMENTAL MONITORING AND ASSESSMENT 2022; 194:389. [PMID: 35445887 DOI: 10.1007/s10661-022-09957-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 03/12/2022] [Indexed: 06/14/2023]
Abstract
Prediction of influent characteristics, before any treatment takes place, is of great importance to the operation and management of wastewater treatment plants (WWTPs). In this study, four machine-learning models, including multilayer perceptron (MLP), long short-term memory network (LSTM), K-nearest neighbour (KNN), and random forest (RF), are introduced to utilize real-time wastewater data from three WWTPs in North America (i.e., Tres Rios, Woodward, and one confidential plant) for predicting hourly influent characteristics. Input variables are selected using an autocorrelation analysis and a variable importance measure from RF. Both univariate and multivariate analyses are investigated to improve model accuracy. The performances of one- and multiple-step-ahead models are compared. With a short prediction horizon, all the models derived from both univariate and multivariate analyses show excellent performance. It was found that the performance deterioration as the prediction horizon expands could be mitigated significantly by including extra variables, such as meteorological variables. This work can provide valuable support for the high-temporal-resolution prediction of wastewater influent characteristics for WWTPs. The proposed models can also bridge the gap between data and decision-making in the wastewater sector.
Affiliation(s)
- Pengxiao Zhou
- Department of Civil Engineering, McMaster University, Hamilton, ON, L8S 4L7, Canada
- Zhong Li
- Department of Civil Engineering, McMaster University, Hamilton, ON, L8S 4L7, Canada.
- Spencer Snowling
- Hatch Ltd., Sheridan Science & Technology Park, 2800 Speakman Drive, Mississauga, ON, L5K 2R7, Canada
- Rajeev Goel
- Hatch Ltd., Sheridan Science & Technology Park, 2800 Speakman Drive, Mississauga, ON, L5K 2R7, Canada
- Qianqian Zhang
- Department of Civil Engineering, McMaster University, Hamilton, ON, L8S 4L7, Canada
|
17
|
Prediction of Rockfill Materials' Shear Strength Using Various Kernel Function-Based Regression Models-A Comparative Perspective. MATERIALS 2022; 15:ma15051739. [PMID: 35268965 PMCID: PMC8911239 DOI: 10.3390/ma15051739] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/26/2022] [Revised: 02/15/2022] [Accepted: 02/23/2022] [Indexed: 11/21/2022]
Abstract
The mechanical behavior of the rockfill materials (RFMs) used in a dam’s shell must be evaluated for the safe and cost-effective design of embankment dams. However, the characterization of RFMs with specific reference to shear strength is challenging and costly, as the materials may contain particles larger than 500 mm in diameter. This study explores the potential of various kernel function-based Gaussian process regression (GPR) models to predict the shear strength of RFMs. A total of 165 datasets compiled from the literature were selected to train and test the proposed models. A comparison of the developed GPR models shows that the best model was the Pearson universal kernel (PUK) model, with an R-squared (R2) of 0.9806, a correlation coefficient (r) of 0.9903, a mean absolute error (MAE) of 0.0646 MPa, a root mean square error (RMSE) of 0.0965 MPa, a relative absolute error (RAE) of 13.0776%, and a root relative squared error (RRSE) of 14.6311% in the training phase. It also performed well in the testing phase, with R2 = 0.9455, r = 0.9724, MAE = 0.1048 MPa, RMSE = 0.1443 MPa, RAE = 21.8554%, and RRSE = 23.6865%. The predictions of the GPR-PUK model are found to be more accurate and in good agreement with the actual shear strength of RFMs, thus verifying the feasibility and effectiveness of the model.
|
18
|
Neurocomputing Modelling of Hydrochemical and Physical Properties of Groundwater Coupled with Spatial Clustering, GIS, and Statistical Techniques. SUSTAINABILITY 2022. [DOI: 10.3390/su14042250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Groundwater (GW) is a critical freshwater resource for billions of individuals worldwide. Rapid anthropogenic exploitation has increasingly deteriorated GW quality and quantity. Reliable estimation of complex hydrochemical properties of GW is crucial for sustainable development. Field and experimental studies were conducted in an agricultural area of a significant sandstone aquifer (the Wajid Aquifer). For modelling purposes, three types of computational models, namely the emerging Hammerstein–Wiener (HW) model, the back propagation neural network (BPNN), and statistical multi-variate regression (MVR), were developed for the multi-station estimation of total dissolved solids (TDS) (mg/L) and total hardness (TH) (mg/L). A geographic information system (GIS) was used for the spatial variability assessment of 32 hydrochemical and physical properties of the GW aquifer. A comprehensive visualized literature review spanning several decades was conducted in order to gain an understanding of the existing research and debates relevant to GW and artificial intelligence (AI) studies. Experimental data collection, pre-processing, and feature selection were conducted to determine the most dominant variables for AI-based modelling. The estimation results were evaluated using the determination coefficient (DC), mean bias error (MBE), mean square error (MSE), and root mean square error (RMSE). The outcomes showed that TDS (mg/L) and TH (mg/L) correlated more than 90% and 70–85% with the Ca2+, Cl−, Br−, NO3−, and Fe combination and the Na+, SO42−, Mg2+, and F− combination, respectively. HW-M1 was the most promising of all the models, with MBE = 1.41 × 10^−11 and 1.14 × 10^−14 and MSE = 7.52 × 10^−2 and 3.88 × 10^−11 for TDS (mg/L) and TH (mg/L), respectively. This accuracy supports the practical estimation of hydrochemical variables (TDS, TH) (mg/L) and the development of decision-making benchmarks.
|
19
|
Comparative Assessment of Individual and Ensemble Machine Learning Models for Efficient Analysis of River Water Quality. SUSTAINABILITY 2022. [DOI: 10.3390/su14031183] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 02/01/2023]
Abstract
The prediction accuracies of machine learning (ML) models may depend not only on the input parameters and training dataset, but also on whether an ensemble or individual learning model is selected. The present study compares individual supervised ML models, namely gene expression programming (GEP) and artificial neural network (ANN), with an ensemble learning model, i.e., random forest (RF), for predicting river water salinity in terms of electrical conductivity (EC) and total dissolved solids (TDS) in the Upper Indus River basin, Pakistan. The proposed models were trained and tested using a dataset of seven input parameters chosen on the basis of significant correlation. The ensemble RF model was optimized by producing 20 sub-models in order to choose the most accurate one. The goodness-of-fit of the models was assessed through well-known statistical indicators, such as the coefficient of determination (R2), mean absolute error (MAE), root mean squared error (RMSE), and Nash–Sutcliffe efficiency (NSE). The results demonstrated a strong association between inputs and modeling outputs, where the R2 value was found to be 0.96, 0.98, and 0.92 for the GEP, RF, and ANN models, respectively. The comparative performance of the proposed methods showed the relative superiority of the RF compared to GEP and ANN. Among the 20 RF sub-models, the most accurate models yielded R2 equal to 0.941 and 0.938, with 70 and 160 estimators, respectively. The lowest RMSE values of 1.37 and 3.1 were yielded by the ensemble RF model on training and testing data, respectively. The results of the sensitivity analysis demonstrated that HCO3− is the most influential variable, followed by Cl− and SO42−, for both EC and TDS. The assessment of the models on external criteria confirmed that the results of all the aforementioned techniques generalize. In conclusion, the outcome of the present research indicated that the RF model with selected key parameters could be prioritized for water quality assessment and management.
|
20
|
Dauji S, Keesari T. Decision tree for estimating groundwater contaminant through proxies considering seasonality and soil saturation. ENVIRONMENTAL MONITORING AND ASSESSMENT 2021; 193:779. [PMID: 34748103 DOI: 10.1007/s10661-021-09577-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2021] [Accepted: 10/26/2021] [Indexed: 06/13/2023]
Abstract
Chloride ion is an important indicator of water quality. Field measurement of chloride is difficult, whereas laboratory measurement is both time-consuming and chemical-intensive. The conservative nature of chloride and its good correlation with electrical conductivity (EC) justify the use of EC as a proxy for chloride estimation. Comparison of the best regression models (RMs) and a data-driven decision tree (DT) model enables appreciation of the relative merits of the two approaches for this purpose. Quantitative improvements over the models from the literature are an increase in correlation (RM: 0.70 to 0.77; DT: 0.70 to 0.78) and a decrease in relative errors (RM: MARE: 0.88 to 0.65 and RMSRE: 1.91 to 0.92; DT: MARE: 0.88 to 0.40 and RMSRE: 1.91 to 0.54); DT thereby emerged as the better modeling approach for this case. Considering the influence of seasonality (pre- or post-monsoon) and the degree of soil saturation (waterlogged or water-depleted) narrowed the correlation range of the basic variables (0.24-0.87) to a smaller range (0.44-0.89) for estimates of Cl-, with relative error ranging from 0.35 to 0.57; the improvement was more pronounced for lower values of variable correlation. The overall comparison on the evaluation datasets between the RM from the literature and the RM and DT models from this study showed that, for the study area, the case-specific models developed using the data-driven DT tool gave the most accurate estimation of chloride in groundwater from the chosen proxy, EC.
Affiliation(s)
- Saha Dauji
- Nuclear Recycle Board, Bhabha Atomic Research Centre, Anushaktinagar, Mumbai, 400094, India.
- Homi Bhabha National Institute, Anushaktinagar, Mumbai, 400094, India.
- Tirumalesh Keesari
- Isotope and Radiation Application Division, Bhabha Atomic Research Centre, Mumbai, 400085, India
- Homi Bhabha National Institute, Anushaktinagar, Mumbai, 400094, India
|
21
|
A Novel Hybrid Model for Developing Groundwater Potentiality Model Using High Resolution Digital Elevation Model (DEM) Derived Factors. WATER 2021. [DOI: 10.3390/w13192632] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022]
Abstract
The present work aims to build a unique hybrid model by combining six fuzzy operator feature selection-based techniques with logistic regression (LR) for producing groundwater potential models (GPMs) utilising high resolution DEM-derived parameters in Saudi Arabia’s Bisha area. The current work focuses exclusively on the influence of DEM-derived parameters on GPM modelling, without considering other variables. The six hybrid models based on fuzzy feature selection are AND, OR, GAMMA 0.75, GAMMA 0.8, GAMMA 0.85, and GAMMA 0.9. The GPMs were validated using empirical and binormal receiver operating characteristic (ROC) curves. An RF-based sensitivity analysis was performed in order to examine the influence of the GPM parameters. The six hybrid algorithms and the one unique hybrid model predicted 1835–2149 km² as very high and 3235–4585 km² as high groundwater potential regions. The AND model (ROCe-AUC: 0.81; ROCb-AUC: 0.804) outperformed the other fuzzy models based on the area under the ROC curve (AUC). A novel hybrid model was constructed by combining the six GPMs (treated as variables) with the LR model. The AUC of ROCe and ROCb revealed that the novel hybrid model outperformed the existing fuzzy-based GPMs (ROCe: 0.866; ROCb: 0.892). With DEM-derived parameters, the present work will help to improve the effectiveness of GPMs for developing sustainable groundwater management plans.
|
22
|
Bourel M, Segura AM, Crisci C, López G, Sampognaro L, Vidal V, Kruk C, Piccini C, Perera G. Machine learning methods for imbalanced data set for prediction of faecal contamination in beach waters. WATER RESEARCH 2021; 202:117450. [PMID: 34352535 DOI: 10.1016/j.watres.2021.117450] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/11/2020] [Revised: 07/09/2021] [Accepted: 07/15/2021] [Indexed: 06/13/2023]
Abstract
Predicting water contamination with statistical models is a useful tool for managing health risk at recreational beaches. Extreme contamination events, i.e. those exceeding the normative limit, are generally rare with respect to bathing conditions, and thus the data are said to be imbalanced. Modeling and predicting those rare events presents unique challenges. Here we introduce and evaluate several machine learning techniques and metrics to model imbalanced data and evaluate model performance. We do so by using a) simulated data sets and b) a real database with records of faecal coliform abundance monitored for 10 years at 21 recreational beaches in Uruguay (N ≈ 19000) using in situ and meteorological variables. We discuss advantages and disadvantages of the methods and provide a simple guide to fitting models for a general audience. We also provide R code to reproduce model fitting and testing. We found that most machine learning techniques are sensitive to imbalance and require specific data pre-treatment (e.g. upsampling) to improve performance. Accuracy (i.e. correctly classified cases over total cases) is not adequate for evaluating model performance on an imbalanced data set. Instead, true positive rates (TPR) and false positive rates (FPR) are recommended. Among the 52 candidate algorithms tested, the stratified random forest showed the best performance, improving TPR by 50% with respect to the baseline (0.4), and outperformed the baseline in the evaluated metrics. Support vector machines combined with upsampling or the synthetic minority oversampling technique (SMOTE) performed well, similar to Adaboost with SMOTE. These results suggest that combining modeling strategies is necessary to improve our capacity to anticipate water contamination and avoid health risk.
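The point about accuracy on imbalanced data can be illustrated with a small sketch. The labels below are synthetic (5 contamination events in 100 records) and the "model" is a hypothetical classifier that never predicts an event; neither comes from the study, which provides its own R code.

```python
def classification_rates(y_true, y_pred):
    """Return accuracy, true positive rate, and false positive rate for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of real events detected
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of non-events flagged
    return accuracy, tpr, fpr


# Synthetic imbalanced labels: 95 clean records (0) and 5 contamination events (1).
y_true = [0] * 95 + [1] * 5
# Hypothetical classifier that always predicts "no event".
y_pred = [0] * 100

accuracy, tpr, fpr = classification_rates(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Here accuracy is 0.95 even though no contamination event is detected (TPR = 0), which is why TPR and FPR are more informative for rare events.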
Affiliation(s)
- Mathias Bourel
- IMERL, Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay; Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay.
- Angel M Segura
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Carolina Crisci
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Guzmán López
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Lia Sampognaro
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Victoria Vidal
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Carla Kruk
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay; Departamento de Microbiología, Instituto de Investigaciones Biológicas Clemente Estable, Ministerio de Educación y Cultura, Montevideo, Uruguay; Instituto de Ecología y Ciencias Ambientales, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Claudia Piccini
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay; Departamento de Microbiología, Instituto de Investigaciones Biológicas Clemente Estable, Ministerio de Educación y Cultura, Montevideo, Uruguay
| | - Gonzalo Perera
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
| |
|
23
|
Development of Prediction Models for Shear Strength of Rockfill Material Using Machine Learning Techniques. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11136167] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 11/17/2022]
Abstract
Supervised machine learning algorithms are increasingly used to predict the mechanical properties of rockfill material (RFM). This study investigates four supervised learning algorithms, support vector machine (SVM), random forest (RF), AdaBoost, and k-nearest neighbor (KNN), for predicting RFM shear strength. A total of 165 RFM case studies with 13 key material properties for rockfill characterization were used to construct and validate the models. The performance of the SVM, RF, AdaBoost, and KNN models is assessed using statistical parameters, including the coefficient of determination (R²), the Nash–Sutcliffe efficiency (NSE) coefficient, the root mean square error (RMSE), and the ratio of the RMSE to the standard deviation of the measured data (RSR). The applications of these models for predicting the shear strength of RFM are compared and discussed. The analysis of R² together with NSE, RMSE, and RSR for the RFM shear strength data set shows that the SVM achieved the best prediction performance (R² = 0.9655, NSE = 0.9639, RMSE = 0.1135, and RSR = 0.1899), followed by the RF model (R² = 0.9545, NSE = 0.9542, RMSE = 0.1279, and RSR = 0.2140), the AdaBoost model (R² = 0.9390, NSE = 0.9388, RMSE = 0.1478, and RSR = 0.2474), and the KNN model (R² = 0.6233, NSE = 0.6180, RMSE = 0.3693, and RSR = 0.6181). Furthermore, the sensitivity analysis shows that normal stress was the key parameter affecting the shear strength of RFM.
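The four evaluation statistics named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `regression_metrics` is hypothetical, the coefficient of determination is assumed to be the squared Pearson correlation between measured and predicted values, and RSR is assumed to use the population standard deviation of the measured data. The abstract does not state which conventions the study used.

```python
import numpy as np


def regression_metrics(observed, predicted):
    """Return R2, NSE, RMSE and RSR for paired measured/predicted values.

    Assumptions (not confirmed by the source): R2 is the squared Pearson
    correlation, and RSR divides by the population standard deviation.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residuals = obs - pred

    # Root mean square error of the predictions.
    rmse = float(np.sqrt(np.mean(residuals ** 2)))

    # Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of
    # squares to the sum of squares of the measured data about their mean.
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((obs - obs.mean()) ** 2))
    nse = 1.0 - ss_res / ss_tot

    # RMSE-to-standard-deviation ratio (population standard deviation).
    rsr = rmse / float(np.std(obs))

    # Squared Pearson correlation between measured and predicted values.
    r = float(np.corrcoef(obs, pred)[0, 1])

    return {"R2": r ** 2, "NSE": nse, "RMSE": rmse, "RSR": rsr}


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the study.
    measured = [1.0, 2.0, 3.0, 4.0]
    modelled = [1.1, 1.9, 3.2, 3.8]
    print(regression_metrics(measured, modelled))
```

Under these conventions RSR² = 1 − NSE, so the two statistics carry the same information. The reported SVM values are consistent with that relation: 0.1899² ≈ 0.0361 = 1 − 0.9639.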
|
24
|
Song C, Yao L, Hua C, Ni Q. A water quality prediction model based on variational mode decomposition and the least squares support vector machine optimized by the sparrow search algorithm (VMD-SSA-LSSVM) of the Yangtze River, China. ENVIRONMENTAL MONITORING AND ASSESSMENT 2021; 193:363. [PMID: 34041601 DOI: 10.1007/s10661-021-09127-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/26/2021] [Accepted: 05/16/2021] [Indexed: 05/12/2023]
Abstract
Accurate and reliable water quality forecasting is of great significance for water resource optimization and management. This study focuses on the prediction of water quality parameters such as dissolved oxygen (DO) in a river system. The accuracy of traditional water quality prediction methods is generally low, and the prediction results show serious autocorrelation. To overcome the nonstationarity, randomness, and nonlinearity of the water quality parameter data, an improved least squares support vector machine (LSSVM) model was proposed to improve prediction performance at two gaging stations, Panzhihua and Jiujiang, on the Yangtze River, China. In addition, a hybrid model that uses variational mode decomposition (VMD) to denoise the input data was adopted. A novel metaheuristic optimization algorithm, the sparrow search algorithm (SSA), was also implemented to compute the optimal parameter values for the LSSVM model. To validate the proposed hybrid model, standalone LSSVM, SSA-LSSVM, VMD-LSSVM, support vector regression (SVR), and a back propagation neural network (BPNN) were used as benchmark models. The results indicated that the VMD-SSA-LSSVM model exhibited the best forecasting performance among all the peer models at Panzhihua station. Furthermore, the forecasting results at Jiujiang were consistent with those at Panzhihua station, which further verified the accuracy and stability of the VMD-SSA-LSSVM model. Thus, the proposed hybrid model is an effective method for forecasting nonstationary and nonlinear water quality parameter series and can be recommended as a promising model for water quality parameter forecasting.
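As a rough illustration of the LSSVM component described in this abstract, the sketch below fits a least squares support vector regression by solving the standard LS-SVM linear system with an RBF kernel. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the regularization parameter `gamma` and kernel width `sigma` are fixed by hand (the study tunes such parameters with SSA), and the VMD denoising step is omitted entirely.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LS-SVM regression system for the bias b and weights alpha.

    The system is
        [ 0   1^T           ] [ b     ]   [ 0 ]
        [ 1   K + I / gamma ] [ alpha ] = [ y ]
    where K is the kernel matrix of the training inputs.
    """
    n = X.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]


def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b at each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


if __name__ == "__main__":
    # Synthetic smooth signal, not water quality data from the study.
    X = np.linspace(0.0, 2.0 * np.pi, 30).reshape(-1, 1)
    y = np.sin(X).ravel()
    b, alpha = lssvm_fit(X, y, gamma=1000.0, sigma=1.0)
    fitted = lssvm_predict(X, b, alpha, X, sigma=1.0)
    print("max abs training error:", np.max(np.abs(fitted - y)))
```

Because the equality constraints turn training into a single linear solve, each candidate pair of `gamma` and `sigma` is cheap to evaluate, which is what makes an outer metaheuristic search over these two parameters practical.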
Affiliation(s)
- Chenguang Song
- School of Engineering and Technology, China University of Geosciences (Beijing), Beijing, 100083, China
- Leihua Yao
- School of Engineering and Technology, China University of Geosciences (Beijing), Beijing, 100083, China
- Chengya Hua
- School of Engineering and Technology, China University of Geosciences (Beijing), Beijing, 100083, China
- Qihang Ni
- School of Engineering and Technology, China University of Geosciences (Beijing), Beijing, 100083, China
|