1
Yao Z, Wang Z, Huang J, Xu N, Cui X, Wu T. Interpretable prediction, classification and regulation of water quality: A case study of Poyang Lake, China. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 951:175407. [PMID: 39127213 DOI: 10.1016/j.scitotenv.2024.175407] [Received: 06/03/2024] [Revised: 08/06/2024] [Accepted: 08/07/2024] [Indexed: 08/12/2024]
Abstract
Effective identification and regulation of water quality impact factors is essential for water resource management and environmental protection. However, the complex coupling of water quality systems poses a significant challenge to this task. This study proposes a coherent model for water quality prediction, classification and regulation based on interpretable machine learning. The decomposition-reconstruction module is used to transform non-stationary water quality series into stationary series while effectively reducing the feature dimensions. Spatiotemporal multi-source data are introduced, using the Maximum Information Coefficient (MIC) for feature selection. The Temporal Convolutional Network (TCN) is used to extract the temporal features of different variables, followed by an External Attention (EA) mechanism that models the relationships between these features. Finally, the target water quality sequence is simulated using a Gated Recurrent Unit (GRU). The proposed model was applied to Poyang Lake in China to predict six water quality indicators: ammonia nitrogen (NH3-N), dissolved oxygen (DO), pH, total nitrogen (TN), total phosphorus (TP), and water temperature (WT). The water quality was then classified based on the prediction results using the XGBoost algorithm. The findings indicate that the proposed model's Nash-Sutcliffe Efficiency (NSE) value ranges from 0.88 to 0.99, surpassing that of the benchmark model, and that the model demonstrates strong interval prediction performance. The results highlight the superior performance of the XGBoost algorithm (with an accuracy of 0.89) in addressing water quality classification issues, particularly in cases of category imbalance. Subsequently, interpretability analysis using the SHapley Additive exPlanation (SHAP) method revealed that the model is capable of learning relationships between different variables and may be able to learn the underlying physical laws.
Ultimately, this study proposes a water quality regulation mechanism that improves TN and DO levels by changing water temperature in steps, which yields significant improvement even when data are limited. In conclusion, this study presents an overall framework for integrating water quality prediction, classification and improvement for the first time, forming a complete set of water quality early warning and improvement management strategies. This framework provides new ideas and ways for lake water quality management.
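For readers unfamiliar with the Nash-Sutcliffe Efficiency reported in this abstract, the following is a minimal sketch of the standard definition, NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). It is not the authors' code; the function name and the sample values are hypothetical and serve only as illustration.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Return the NSE of a simulated series against observations.

    1.0 means a perfect match; 0.0 means the simulation is no better
    than always predicting the mean of the observations.
    """
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the simulation.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - residual / variance


# Hypothetical dissolved-oxygen readings (mg/L), for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe_efficiency(obs, obs)
mean_only = nash_sutcliffe_efficiency(obs, [2.5, 2.5, 2.5, 2.5])
```

Under this definition, the reported range of 0.88 to 0.99 means the squared prediction error is between 12% and 1% of the variance of the observations.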
Affiliation(s)
- Zhiyuan Yao
  - College of Information, Shanghai Ocean University, Shanghai 201306, China
- Zhaocai Wang
  - College of Information, Shanghai Ocean University, Shanghai 201306, China
- Jinghan Huang
  - College of Economics and Management, Shanghai Ocean University, Shanghai 201306, China
- Nannan Xu
  - College of Information, Shanghai Ocean University, Shanghai 201306, China
- Xuefei Cui
  - College of Engineering, Shanghai Ocean University, Shanghai 201306, China
- Tunhua Wu
  - School of Information Engineering, Wenzhou Medical University, Wenzhou 325035, China
2
Rissmann CWF, Pearson LK, Snelder TH. Physiographic Environment Classification: a Controlling Factor Classification of Landscape Susceptibility to Waterborne Contaminant Loss. ENVIRONMENTAL MANAGEMENT 2024; 74:230-255. [PMID: 38441648 PMCID: PMC11227452 DOI: 10.1007/s00267-024-01950-0] [Received: 08/28/2023] [Accepted: 02/10/2024] [Indexed: 07/07/2024]
Abstract
Spatial variation in the landscape factors climate, geomorphology, and lithology causes significant differences in water quality issues even when land use pressures are similar. The Physiographic Environment Classification (PEC) classifies landscapes based on their susceptibility to the loss of water quality contaminants. The classification is informed by a conceptual model of the landscape factors that control the hydrochemical maturity of water discharged to streams. In New Zealand, a case study using climatic, topographic, and geological data classified the country into six, 36, and 320 classes at Levels 1 (Climate), 1-2 (Climate + Geomorphology), and 1-3 (Climate + Geomorphology + Lithology), respectively. Variance partitioning analysis applied to New Zealand's national surface water monitoring network (n = 810 stations) assessed the contributions of PEC classes and land use to the spatial variation of water quality contaminants. Compared to land use, PEC explained 0.6× the variation in Nitrate-Nitrite Nitrogen (NNN), 1.0× in Total Kjeldahl Nitrogen (TKN), 1.8× in Dissolved Reactive Phosphorus (DRP), 2.3× in Particulate Phosphorus (PP), 2.6× in E. coli, and 4.3× in Turbidity (TURB). Land use explained more variation in riverine NNN, while landscape factors explained more variation in DRP, PP, E. coli, and TURB. Overall, PEC accounted for 2.1× more variation in riverine contaminant concentrations than land use. The differences in contaminant concentrations between PEC classes (p < 0.05), after adjusting for land use, were consistent with the conceptual model of hydrochemical maturation. PEC elucidates underlying causes of contaminant loss susceptibility and can inform targeted land management across multiple scales.
Affiliation(s)
- Clinton W F Rissmann
  - Land and Water Science, Invercargill, New Zealand.
  - Waterways Centre for Freshwater Management, University of Canterbury, and Lincoln University, Christchurch, New Zealand.
3
Rojas-López AG, Rodríguez-Molina A, Uriarte-Arcia AV, Villarreal-Cervantes MG. Vertebral Column Pathology Diagnosis Using Ensemble Strategies Based on Supervised Machine Learning Techniques. Healthcare (Basel) 2024; 12:1324. [PMID: 38998860 PMCID: PMC11241707 DOI: 10.3390/healthcare12131324] [Received: 05/04/2024] [Revised: 06/25/2024] [Accepted: 06/28/2024] [Indexed: 07/14/2024] [Open Access]
Abstract
One expanding area of bioinformatics is medical diagnosis through the categorization of biomedical characteristics. Automatic medical strategies that improve diagnosis through machine learning (ML) methods are challenging. They require a formal examination of their performance to identify the best conditions that enhance the ML method. This work proposes variants of the Voting and Stacking (VC and SC) ensemble strategies based on diverse auto-tuning supervised machine learning techniques to increase the efficacy of traditional baseline classifiers for the automatic diagnosis of vertebral column orthopedic illnesses. The ensemble strategies are created by first combining a complete set of auto-tuned baseline classifiers based on different processes, such as geometric, probabilistic, logic, and optimization. Next, the three most promising classifiers are selected among k-Nearest Neighbors (kNN), Naïve Bayes (NB), Logistic Regression (LR), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Support Vector Machine (SVM), Artificial Neural Networks (ANN), and Decision Tree (DT). The grid-search K-Fold cross-validation strategy is applied to auto-tune the baseline classifier hyperparameters. The performances of the proposed ensemble strategies are independently compared with the auto-tuned baseline classifiers. A concise analysis evaluates accuracy, precision, recall, F1-score, and ROC-AUC metrics. The analysis also examines the misclassified disease elements to find the most and least reliable classifiers for this specific medical problem. The results show that the VC ensemble strategy provides an improvement comparable to that of the best baseline classifier (the kNN). Meanwhile, when all baseline classifiers are included in the SC ensemble, this strategy surpasses 95% in all the evaluated metrics, standing out as the most suitable option for classifying vertebral column diseases.
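To make the voting idea in this abstract concrete, the following is a minimal sketch of hard (majority) voting over the outputs of several already-trained classifiers. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether their VC variant uses hard or soft voting, and the function name, the tie-breaking rule, and the class labels below are all hypothetical.

```python
from collections import Counter


def hard_vote(predictions_per_classifier):
    """Combine per-classifier label lists by majority vote.

    predictions_per_classifier: one list of predicted labels per
    classifier, all covering the same samples in the same order.
    Ties go to the label voted by the earliest-listed classifier
    (an arbitrary choice made for this sketch).
    """
    if not predictions_per_classifier:
        raise ValueError("at least one classifier is required")
    n_samples = len(predictions_per_classifier[0])
    if any(len(p) != n_samples for p in predictions_per_classifier):
        raise ValueError("all classifiers must predict the same samples")

    combined = []
    for votes in zip(*predictions_per_classifier):
        counts = Counter(votes)  # keeps first-seen order of labels
        # max() returns the first label reaching the highest count.
        combined.append(max(counts, key=counts.get))
    return combined


# Hypothetical outputs of three baseline classifiers on three patients.
knn = ["normal", "abnormal", "normal"]
svm = ["normal", "normal", "abnormal"]
tree = ["abnormal", "abnormal", "abnormal"]
ensemble = hard_vote([knn, svm, tree])
```

A stacking ensemble differs in that the baseline predictions are not counted but passed as input features to a further classifier trained to combine them.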
Affiliation(s)
- Alam Gabriel Rojas-López
  - Optimal Mechatronic Design Laboratory, Postgraduate Department, Instituto Politécnico Nacional—Centro de Innovación y Desarrollo Tecnológico en Cómputo, Mexico City 07700, Mexico
- Abril Valeria Uriarte-Arcia
  - Optimal Mechatronic Design Laboratory, Postgraduate Department, Instituto Politécnico Nacional—Centro de Innovación y Desarrollo Tecnológico en Cómputo, Mexico City 07700, Mexico
- Miguel Gabriel Villarreal-Cervantes
  - Optimal Mechatronic Design Laboratory, Postgraduate Department, Instituto Politécnico Nacional—Centro de Innovación y Desarrollo Tecnológico en Cómputo, Mexico City 07700, Mexico
4
Talukdar S, Shahfahad, Bera S, Naikoo MW, Ramana GV, Mallik S, Kumar PA, Rahman A. Optimisation and interpretation of machine and deep learning models for improved water quality management in Lake Loktak. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 351:119866. [PMID: 38147770 DOI: 10.1016/j.jenvman.2023.119866] [Received: 08/31/2023] [Revised: 11/28/2023] [Accepted: 12/13/2023] [Indexed: 12/28/2023]
Abstract
Loktak Lake, one of the largest freshwater lakes in Manipur, India, is critical for the eco-hydrology and economy of the region, but faces deteriorating water quality due to urbanisation, anthropogenic activities, and domestic sewage. Addressing the urgent need for effective pollution management, this study aims to assess the lake's water quality status using the water quality index (WQI) and develop advanced machine learning (ML) tools for WQI assessment and ML model interpretation to improve pollution management decision making. The WQI was assessed using entropy-based weighted arithmetic, and three ML models, Gradient Boosting Machine (GBM), Random Forest (RF) and Deep Neural Network (DNN), were optimised using a grid search algorithm in the H2O Application Programming Interface (API). These models were validated by various metrics and interpreted globally and locally via Partial Dependency Plot (PDP), Accumulated Local Effect (ALE) and SHapley Additive exPlanations (SHAP). The results show a WQI range of 72.38-100, with 52.7% of samples categorised as very poor. The RF model outperformed GBM and DNN and showed the highest accuracy and generalisation ability, which is reflected in the superior R² values (0.97 in training, 0.9 in test) and the lower root mean square error (RMSE). RF's minimal margin of error and reliable feature interpretation contrasted with DNN's larger margin of error and inconsistency, which affected its usefulness for decision making. Turbidity was found to be a critical predictive feature in all models, significantly influencing WQI, with other variables such as pH and temperature also playing an important role. SHAP dependency plots illustrated the direct relationship between key water quality parameters such as turbidity and WQI predictions.
The novelty of this study lies in its comprehensive approach to the evaluation and interpretation of ML models for WQI estimation, which provides a nuanced understanding of water quality dynamics in Loktak Lake. By identifying the most effective ML models and key predictive features, this study provides invaluable insights for water quality management and paves the way for targeted strategies to monitor and improve water quality in this vital freshwater ecosystem.
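The entropy-based weighting mentioned in this abstract can be sketched as follows. This shows the commonly used entropy weight method, in which a parameter whose values vary more across samples receives a larger weight; the abstract does not give the authors' exact normalisation, so the formulation, the function name, and the sample values here are assumptions for illustration only.

```python
import math


def entropy_weights(samples):
    """Return one weight per parameter using the entropy weight method.

    samples: list of rows, one per water sample, each row holding
    non-negative parameter values in a fixed column order.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    n_params = len(samples[0])

    divergences = []
    for j in range(n_params):
        column = [row[j] for row in samples]
        total = sum(column)
        if total <= 0 or min(column) < 0:
            raise ValueError("each parameter needs non-negative values with a positive sum")
        # Shannon entropy of the column's share distribution, scaled to [0, 1].
        entropy = 0.0
        for value in column:
            share = value / total
            if share > 0:  # treat 0 * log(0) as 0
                entropy -= share * math.log(share)
        entropy /= math.log(n)
        # Low entropy (uneven values) means high information content.
        divergences.append(max(0.0, 1.0 - entropy))

    total_divergence = sum(divergences)
    if total_divergence == 0:
        # Every parameter is constant: fall back to equal weights.
        return [1.0 / n_params] * n_params
    return [d / total_divergence for d in divergences]


# Hypothetical readings: column 0 is constant, column 1 varies.
weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
```

In a weighted arithmetic index, such weights would then multiply each parameter's quality rating before summation; that second step is not shown because the abstract does not specify the rating scale.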
Affiliation(s)
- Swapan Talukdar
  - Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, 110025, India.
- Shahfahad
  - Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, 110025, India.
- Somnath Bera
  - Department of Geography, Central University of South Bihar, Gaya, Bihar, 823001, India.
- Mohd Waseem Naikoo
  - Department of Geography & Disaster Management, University of Kashmir, Srinagar, Jammu & Kashmir, 190006, India.
- G V Ramana
  - Department of Civil Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India.
- Santanu Mallik
  - Department of Civil Engineering, National Institute of Technology, Agartala, Tripura, 799046, India.
- Potsangbam Albino Kumar
  - Department of Civil Engineering, National Institute of Technology, Imphal, Manipur, 795004, India.
- Atiqur Rahman
  - Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, 110025, India.
5
O'Sullivan CM, Deo RC, Ghahramani A. Explainable AI approach with original vegetation data classifies spatio-temporal nitrogen in flows from ungauged catchments to the Great Barrier Reef. Sci Rep 2023; 13:18145. [PMID: 37875554 PMCID: PMC10598196 DOI: 10.1038/s41598-023-45259-0] [Received: 06/02/2023] [Accepted: 10/17/2023] [Indexed: 10/26/2023] [Open Access]
Abstract
Transfer of processed data and parameters to ungauged catchments from the most similar gauged counterpart is a common technique in water quality modelling. However, catchment similarities for Dissolved Inorganic Nitrogen (DIN) are ill-posed, which affects the predictive capability of models reliant on such methods for simulating DIN. Spatial data proxies to classify catchments for most similar DIN responses are a demonstrated solution, yet their applicability to ungauged catchments is unexplored. We adopted a neural network pattern recognition model (ANN-PR) and explainable artificial intelligence approach (SHAP-XAI) to match all ungauged catchments that flow to the Great Barrier Reef to gauged ones based on proxy spatial data. Catchment match suitability was verified using a neural network water quality (ANN-WQ) simulator trained on gauged catchment datasets, tested by simulating DIN for matched catchments in unsupervised learning scenarios. We show that discriminating training data by DIN regime benefits ANN-WQ simulation performance in unsupervised scenarios (p < 0.05). This phenomenon demonstrates that proxy spatial data is a useful tool to classify catchments with similar DIN regimes. Catchments lacking similarity with gauged ones are identified as priority monitoring areas to gain observed data for all DIN regimes in catchments that flow to the Great Barrier Reef, Australia.
Affiliation(s)
- Cherie M O'Sullivan
  - University of Southern Queensland, Toowoomba, QLD, 4350, Australia
- Ravinesh C Deo
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, QLD, 4300, Australia
  - Center for Applied Climate Sciences, University of Southern Queensland, Toowoomba, QLD, 4350, Australia
- Afshin Ghahramani
  - University of Southern Queensland, Toowoomba, QLD, 4350, Australia
  - Department of Environment and Science, Queensland Government, Rockhampton, QLD, 4700, Australia