1
Zhou H, Wu C, Li B, Lu C, Zhao Y, Zhao Z. Classification of deep and shallow groundwater wells based on machine learning in the Hebei Plain, North China. Sci Rep 2024; 14:18166. [PMID: 39107373 PMCID: PMC11303693 DOI: 10.1038/s41598-024-69238-1] [Received: 03/17/2024] [Accepted: 08/01/2024] [Indexed: 08/10/2024] Open
Abstract
Accurately determining the extraction volumes from various aquifers is crucial for effectively managing groundwater overexploitation. A key initial step in quantifying extracted groundwater volumes is classifying groundwater wells as either deep or shallow. This study evaluated 881,872 groundwater wells in the Hebei Plain, applying machine learning techniques to classify wells with unknown depths. Using hydrogeological borehole data, the groundwater wells with known depths were divided into deep and shallow wells. Four machine learning algorithms (Random Forest, Support Vector Machine, Logistic Regression, and Naive Bayes) were employed to classify groundwater wells with unknown depths. The accuracy of these models was validated using known-depth well classifications. The results reveal that the Random Forest algorithm exhibited the highest performance among the models, achieving an overall accuracy of 91.23%. According to the Random Forest model, 43.51% of groundwater wells with unknown depths were classified as deep, while 56.49% were classified as shallow. The study also found that wells in areas where salinity exceeds 2 g/L are primarily deep groundwater wells. These findings provide valuable technical insight for groundwater well decommissioning and facilitate the assessment of extracted volumes of deep and shallow groundwater.
Affiliation(s)
- Hang Zhou
- Key Laboratory of Roads and Railway Engineering Safety Control (Shijiazhuang Tiedao University), Ministry of Education, Shijiazhuang, 050043, China
- Chu Wu
- State Key Laboratory of Water Cycle Simulation and Regulation, China Institute of Water Resources and Hydropower Research, Beijing, 100038, China
- Baoqi Li
- Key Laboratory of Roads and Railway Engineering Safety Control (Shijiazhuang Tiedao University), Ministry of Education, Shijiazhuang, 050043, China
- Chuiyu Lu
- State Key Laboratory of Water Cycle Simulation and Regulation, China Institute of Water Resources and Hydropower Research, Beijing, 100038, China.
- Yong Zhao
- State Key Laboratory of Water Cycle Simulation and Regulation, China Institute of Water Resources and Hydropower Research, Beijing, 100038, China
- Ziyue Zhao
- Hebei Provincial Water Affairs Center, Shijiazhuang, 050043, China.
2
Lin RH, Lin P, Wang CC, Tung CW. A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example. J Cheminform 2024; 16:91. [PMID: 39095893 PMCID: PMC11297603 DOI: 10.1186/s13321-024-00891-4] [Received: 02/16/2024] [Accepted: 07/27/2024] [Indexed: 08/04/2024] Open
Abstract
Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. 
The developed models are effective for predicting zebrafish toxicity, and the proposed MTForestNet is expected to be applicable to other tasks with distinct chemical space. Scientific contribution: A novel multitask learning algorithm, MTForestNet, was proposed to address the challenges of developing models using datasets with distinct chemical space, a common issue in cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet, which provides superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing.
Affiliation(s)
- Run-Hsin Lin
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, 10675, Taiwan
- Pinpin Lin
- National Institute of Environmental Health Sciences, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Chia-Chi Wang
- Department and Graduate Institute of Veterinary Medicine, School of Veterinary Medicine, National Taiwan University, Taipei, 10617, Taiwan
- Chun-Wei Tung
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan.
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, 10675, Taiwan.
3
Zhang S, Yuan Y, Wang Z, Li J. The application of laser-induced fluorescence in oil spill detection. Environmental Science and Pollution Research International 2024; 31:23462-23481. [PMID: 38466385 DOI: 10.1007/s11356-024-32807-y] [Received: 10/13/2023] [Accepted: 03/03/2024] [Indexed: 03/13/2024]
Abstract
Over the past two decades, oil spills have been one of the most serious ecological disasters, causing massive damage to aquatic and terrestrial ecosystems as well as the socio-economy. In view of this situation, several methods have been developed and utilized to analyze oil samples. Among these methods, laser-induced fluorescence (LIF) technology has been widely used in oil spill detection because it classifies samples based on the fluorescence characteristics of the chemical materials in oil. This review systematically summarizes LIF technology from the perspective of excitation wavelength selection and the application of traditional and novel machine learning algorithms to fluorescence spectrum processing, both of which are critical for qualitative and quantitative analysis of oil spills. An appropriate excitation wavelength is indispensable for spectral discrimination because petroleum products contain different kinds of polycyclic aromatic hydrocarbon (PAH) compounds. By summarizing articles related to LIF technology, we discuss the influence of the excitation wavelength on the accuracy of the oil spill detection model and propose several suggestions on the selection of excitation wavelength. In addition, we introduce some traditional and novel machine learning (ML) algorithms and discuss the strengths and weaknesses of these algorithms and their applicable scenarios. With an appropriate excitation wavelength and data processing algorithm, it is believed that laser-induced fluorescence technology will become an efficient technique for real-time detection and analysis of oil spills.
Affiliation(s)
- Shubo Zhang
- Department of Optical Science and Engineering, Fudan University, Shanghai, 200433, China
- Yafei Yuan
- Department of Sports Media and Information Technology, Shandong Sport University, Jinan, 250102, Shandong, China.
- Zhanhu Wang
- Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai, 200083, China
- Jing Li
- Department of Optical Science and Engineering, Fudan University, Shanghai, 200433, China
4
Li X, Ge J, Liu Z, Yang S, Wang L, Liu Y. Estimating the methane flux of the Dajiuhu subalpine peatland using machine learning algorithms and the maximal information coefficient technique. The Science of the Total Environment 2024; 916:170241. [PMID: 38278264 DOI: 10.1016/j.scitotenv.2024.170241] [Received: 10/13/2023] [Revised: 01/04/2024] [Accepted: 01/15/2024] [Indexed: 01/28/2024]
Abstract
The eddy covariance (EC) technique has emerged as the most widely used method for long-term continuous methane flux (FCH4) observations. However, the completeness of the FCH4 time series is limited by instrumental failures and data quality issues, resulting in missing data gaps ranging from 20 % to 90 %. In this situation, the excellent performance of machine learning (ML) algorithms in filling missing FCH4 data has provided a foundation for developing regional-scale FCH4 models. In this study, we established estimation models for FCH4 utilizing random forest (RF), support vector machine (SVM), back propagation (BP) and nonlinear multiple regression (MLR) algorithms. The maximal information coefficient (MIC) technique was employed to identify and rank the environmental factors that were correlated with FCH4. Our findings revealed that soil temperature (Ts), soil water content (SWC) and air temperature (Ta) were the primary environmental factors influencing FCH4. Among the four algorithms, in terms of model accuracy and a relatively small number of driving factors, the RF models exhibited the best performance, followed by BP and SVM, whereas MLR demonstrated the lowest performance. Among the 144 RF models established using nine datasets, the RF model with 8 driving factors for the all-year dataset (RFall-year8) could capture seasonal variations. Ultimately, we recommend RFall-year8 as the optimal model for estimating FCH4 in the Dajiuhu subalpine peatland.
Affiliation(s)
- Xue Li
- School of Environmental Studies, China University of Geosciences, Wuhan 430074, China; Laboratory of Basin Hydrology and Wetland Eco-restoration, China University of Geosciences, Wuhan 430074, China; Hubei Key Laboratory of Wetland Evolution and Ecological Restoration, China University of Geosciences (Wuhan), Wuhan 430078, China; Institution of Ecology and Environmental Sciences, China University of Geosciences (Wuhan), Wuhan 430078, China
- Jiwen Ge
- School of Environmental Studies, China University of Geosciences, Wuhan 430074, China; Laboratory of Basin Hydrology and Wetland Eco-restoration, China University of Geosciences, Wuhan 430074, China; Hubei Key Laboratory of Wetland Evolution and Ecological Restoration, China University of Geosciences (Wuhan), Wuhan 430078, China; Institution of Ecology and Environmental Sciences, China University of Geosciences (Wuhan), Wuhan 430078, China.
- Ziwei Liu
- School of Environmental Studies, China University of Geosciences, Wuhan 430074, China; Laboratory of Basin Hydrology and Wetland Eco-restoration, China University of Geosciences, Wuhan 430074, China; Hubei Key Laboratory of Wetland Evolution and Ecological Restoration, China University of Geosciences (Wuhan), Wuhan 430078, China; Institution of Ecology and Environmental Sciences, China University of Geosciences (Wuhan), Wuhan 430078, China
- Shiyu Yang
- School of Environmental Studies, China University of Geosciences, Wuhan 430074, China; Laboratory of Basin Hydrology and Wetland Eco-restoration, China University of Geosciences, Wuhan 430074, China; Hubei Key Laboratory of Wetland Evolution and Ecological Restoration, China University of Geosciences (Wuhan), Wuhan 430078, China; Institution of Ecology and Environmental Sciences, China University of Geosciences (Wuhan), Wuhan 430078, China
- Linlin Wang
- School of Environmental Studies, China University of Geosciences, Wuhan 430074, China; Laboratory of Basin Hydrology and Wetland Eco-restoration, China University of Geosciences, Wuhan 430074, China; Hubei Key Laboratory of Wetland Evolution and Ecological Restoration, China University of Geosciences (Wuhan), Wuhan 430078, China; Institution of Ecology and Environmental Sciences, China University of Geosciences (Wuhan), Wuhan 430078, China
- Ye Liu
- School of Environmental Studies, China University of Geosciences, Wuhan 430074, China; Laboratory of Basin Hydrology and Wetland Eco-restoration, China University of Geosciences, Wuhan 430074, China; Hubei Key Laboratory of Wetland Evolution and Ecological Restoration, China University of Geosciences (Wuhan), Wuhan 430078, China; Institution of Ecology and Environmental Sciences, China University of Geosciences (Wuhan), Wuhan 430078, China
5
Zeng G, Ma Y, Du M, Chen T, Lin L, Dai M, Luo H, Hu L, Zhou Q, Pan X. Deep convolutional neural networks for aged microplastics identification by Fourier transform infrared spectra classification. The Science of the Total Environment 2024; 913:169623. [PMID: 38159742 DOI: 10.1016/j.scitotenv.2023.169623] [Received: 10/31/2023] [Revised: 12/20/2023] [Accepted: 12/21/2023] [Indexed: 01/03/2024]
Abstract
Infrared (IR) spectroscopy is a powerful technique for detecting and identifying microplastics (MPs) in the environment. However, the aging of MPs presents a challenge for accurate identification and classification. To address this challenge, a classification model based on deep convolutional neural networks (CNNs) was developed using infrared spectra results. In particular, original IR spectra were used as the sample dataset, so relevant spectral details were preserved and no additional noise or distortions were introduced. The Adam (Adaptive moment estimation) algorithm was employed to accelerate gradient descent and weight updates, and the Dropout function was implemented to prevent overfitting and enhance the generalization performance of the network. The ReLU (Rectified Linear Unit) activation function was also utilized to simplify the co-adaptation relationship among neurons and prevent vanishing gradients. The performance of the CNN model in MPs classification was evaluated based on accuracy and robustness, and compared with other machine learning techniques. The CNN model demonstrated superior capabilities in feature extraction and recognition, and greatly simplified the pre-processing procedure. The identification results of aged commercial microplastic samples showed accuracies of 40 % for Artificial Neural Network, 60 % for Random Forest, 80 % for Deep Neural Network, and 100 % for CNN, respectively. The CNN architecture developed in this work also demonstrates versatility by being suitable for both limited data cases and potential expansion to include more discrete data in the future.
Affiliation(s)
- Ganning Zeng
- College of Environment, Zhejiang University of Technology, Hangzhou 310014, China; Key Laboratory of Ocean Space Resource Management Technology, MNR, Hangzhou 310012, China.
- Yuan Ma
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China
- Mingming Du
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China
- Tiansheng Chen
- College of Environment, Zhejiang University of Technology, Hangzhou 310014, China
- Liangyu Lin
- Key Laboratory of Ocean Space Resource Management Technology, MNR, Hangzhou 310012, China
- Mengzheng Dai
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China
- Hongwei Luo
- College of Environment, Zhejiang University of Technology, Hangzhou 310014, China
- Lingling Hu
- College of Environment, Zhejiang University of Technology, Hangzhou 310014, China
- Qian Zhou
- College of Environment, Zhejiang University of Technology, Hangzhou 310014, China
- Xiangliang Pan
- College of Environment, Zhejiang University of Technology, Hangzhou 310014, China.
6
Khan N, Raza MA, Mirjat NH, Balouch N, Abbas G, Yousef A, Touti E. Unveiling the predictive power: a comprehensive study of machine learning model for anticipating chronic kidney disease. Front Artif Intell 2024; 6:1339988. [PMID: 38259821 PMCID: PMC10801895 DOI: 10.3389/frai.2023.1339988] [Received: 11/17/2023] [Accepted: 12/08/2023] [Indexed: 01/24/2024] Open
Abstract
In today's modern era, chronic kidney disease stands as a significantly grave ailment that detrimentally impacts human life. This issue is progressively escalating in both developed and developing nations. Precise and timely identification of chronic kidney disease is imperative for the prevention and management of kidney failure. Historical methods of diagnosing chronic kidney disease have often been deemed unreliable on several fronts. To distinguish between healthy individuals and those afflicted by chronic kidney disease, dependable and effective non-invasive techniques such as machine learning models have been adopted. In our ongoing research, we employ various machine learning models, encompassing logistic regression, random forest, decision tree, k-nearest neighbor, and support vector machine utilizing four kernel functions (linear, Laplacian, Bessel, and radial basis kernels), to forecast chronic kidney disease. The dataset used constitutes records from a case-control study involving chronic kidney disease patients in district Buner, Khyber Pakhtunkhwa, Pakistan. For comparative evaluation of the models in terms of classification and accuracy, diverse performance metrics, including accuracy, Brier score, sensitivity, Youden's index, and F1 score, were computed.
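The performance metrics named in this abstract can all be derived from a confusion matrix plus the predicted probabilities. The following is a minimal sketch, not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy inputs in the usage lines are assumptions made for illustration only.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Return accuracy, Brier score, sensitivity, specificity, Youden's index and F1.

    Assumes binary labels (1 = disease, 0 = healthy), probabilities for class 1,
    and that both classes occur in y_true (otherwise a division by zero occurs).
    The threshold is an illustrative assumption, not a value from the study.
    """
    # Turn probabilities into hard labels with the assumed threshold.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Brier score: mean squared error of the predicted probabilities.
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Youden's index J = sensitivity + specificity - 1.
        "youden": sensitivity + specificity - 1,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Toy usage with made-up labels and probabilities.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.3, 0.2, 0.6, 0.1])
print(metrics)
```

Unlike the other four metrics, the Brier score uses the probabilities directly, so it does not depend on the chosen threshold.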
Affiliation(s)
- Nitasha Khan
- Department of Electrical Engineering, Nazeer Hussain University, Karachi, Pakistan
- Muhammad Amir Raza
- Department of Electrical Engineering, Mehran University of Engineering and Technology, Khairpur Mirs, Sindh, Pakistan
- Nayyar Hussain Mirjat
- Department of Electrical Engineering, Mehran University of Engineering and Technology, Jamshoro, Sindh, Pakistan
- Neelam Balouch
- Department of Zoology, Shah Abdul Latif University Khairpur Mirs, Khairpur Mirs, Pakistan
- Ghulam Abbas
- School of Electrical Engineering, Southeast University, Nanjing, China
- Amr Yousef
- Electrical Engineering Department, University of Business and Technology, Jeddah, Saudi Arabia
- Engineering Mathematics Department, Alexandria University, Alexandria, Egypt
- Ezzeddine Touti
- Department of Electrical Engineering, College of Engineering, Northern Border University, Arar, Saudi Arabia
7
Ignatenko V, Surkov A, Koltcov S. Random forests with parametric entropy-based information gains for classification and regression problems. PeerJ Comput Sci 2024; 10:e1775. [PMID: 38196961 PMCID: PMC10773894 DOI: 10.7717/peerj-cs.1775] [Received: 07/14/2023] [Accepted: 12/04/2023] [Indexed: 01/11/2024]
Abstract
The random forest algorithm is one of the most popular and commonly used algorithms for classification and regression tasks. It combines the output of multiple decision trees to form a single result. Random forest algorithms demonstrate the highest accuracy on tabular data compared to other algorithms in various applications. However, random forests and, more precisely, decision trees, are usually built with the application of classic Shannon entropy. In this article, we consider the potential of deformed entropies, which are successfully used in the field of complex systems, to increase the prediction accuracy of random forest algorithms. We develop and introduce the information gains based on Renyi, Tsallis, and Sharma-Mittal entropies for classification and regression random forests. We test the proposed algorithm modifications on six benchmark datasets: three for classification and three for regression problems. For classification problems, the application of Renyi entropy allows us to improve the random forest prediction accuracy by 19-96% depending on the dataset, Tsallis entropy improves the accuracy by 20-98%, and Sharma-Mittal entropy improves accuracy by 22-111% compared to the classical algorithm. For regression problems, the application of deformed entropies improves the prediction by 2-23% in terms of R² depending on the dataset.
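To make the idea of entropy-based split criteria concrete, the sketch below computes Shannon, Renyi, Tsallis, and Sharma-Mittal entropies of a class distribution using their standard textbook definitions, and plugs any of them into a weighted-average information gain. This is an illustration only: the abstract does not give the paper's exact gain formulas, so the `information_gain` helper (parent entropy minus size-weighted child entropies) and all function names are assumptions.

```python
import math
from collections import Counter


def class_probs(labels):
    """Empirical class probabilities of a list of labels."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def shannon(p):
    """Shannon entropy (natural log)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi(p, alpha):
    """Renyi entropy, alpha > 0 and alpha != 1."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def tsallis(p, q):
    """Tsallis entropy, q != 1."""
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)


def sharma_mittal(p, alpha, beta):
    """Sharma-Mittal entropy, alpha != 1 and beta != 1."""
    s = sum(x ** alpha for x in p if x > 0)
    return (s ** ((1 - beta) / (1 - alpha)) - 1) / (1 - beta)


def information_gain(parent, children, entropy):
    """Parent entropy minus the size-weighted entropy of the child nodes.

    Assumption: the same weighted-average form is used for every entropy;
    the paper may define the gain differently for non-additive entropies.
    """
    n = len(parent)
    weighted = sum(len(c) / n * entropy(class_probs(c)) for c in children)
    return entropy(class_probs(parent)) - weighted


# Toy usage: a perfectly separating split of four samples.
parent = [0, 0, 1, 1]
split = [[0, 0], [1, 1]]
print(information_gain(parent, split, shannon))
print(information_gain(parent, split, lambda p: tsallis(p, q=2.0)))
```

The entropy is passed in as a function so that the deformation parameters (`alpha`, `q`, `beta`) can be treated as tunable hyperparameters of the split criterion.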
Affiliation(s)
- Vera Ignatenko
- Social and Cognitive Informatics Laboratory, National Research University Higher School of Economics, Saint-Petersburg, Russia
- Anton Surkov
- Social and Cognitive Informatics Laboratory, National Research University Higher School of Economics, Saint-Petersburg, Russia
- Sergei Koltcov
- Social and Cognitive Informatics Laboratory, National Research University Higher School of Economics, Saint-Petersburg, Russia
8
Aguayo R, León-Muñoz J, Aguayo M, Baez-Villanueva OM, Zambrano-Bigiarini M, Fernández A, Jacques-Coper M. PatagoniaMet: A multi-source hydrometeorological dataset for Western Patagonia. Sci Data 2024; 11:6. [PMID: 38167535 PMCID: PMC10761917 DOI: 10.1038/s41597-023-02828-2] [Received: 06/22/2023] [Accepted: 12/06/2023] [Indexed: 01/05/2024] Open
Abstract
Western Patagonia (40-56°S) is a clear example of how the systematic lack of publicly available data and poor quality control protocols have hindered further hydrometeorological studies. To address these limitations, we present PatagoniaMet (PMET), a compilation of ground-based hydrometeorological data (PMET-obs; 1950-2020), and a daily gridded product of precipitation and temperature (PMET-sim; 1980-2020). PMET-obs was developed considering a 4-step quality control process applied to 523 hydrometeorological time series obtained from eight institutions in Chile and Argentina. Following current guidelines for hydrological datasets, several climatic and geographic attributes were derived for each catchment. PMET-sim was developed using statistical bias correction procedures, spatial regression models and hydrological methods, and was compared against other bias-corrected alternatives using hydrological modelling. PMET-sim was able to achieve Kling-Gupta efficiencies greater than 0.7 in 72% of the catchments, while other alternatives exceeded this threshold in only 50% of the catchments. PatagoniaMet represents an important milestone in the availability of hydro-meteorological data that will facilitate new studies in one of the largest freshwater ecosystems in the world.
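For readers unfamiliar with the Kling-Gupta efficiency (KGE) used as the benchmark above, the sketch below computes it from simulated and observed series. It assumes the original 2009 formulation (correlation, variability ratio, and bias ratio); the abstract does not state which KGE variant the authors used, and the function names and toy numbers are illustrative assumptions.

```python
import math


def kling_gupta_efficiency(sim, obs):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2).

    r     : linear correlation between simulated and observed values
    alpha : ratio of standard deviations (sim / obs)
    beta  : ratio of means (sim / obs)
    Assumes equal-length series with non-zero variance and non-zero observed mean.
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n

    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mean_s / mean_o
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def share_above(scores, threshold=0.7):
    """Fraction of catchments whose score exceeds the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Toy usage with made-up streamflow values.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
print(kling_gupta_efficiency(simulated, observed))
```

A perfect simulation gives KGE = 1; the 0.7 default in `share_above` mirrors the threshold quoted in the abstract.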
Affiliation(s)
- Rodrigo Aguayo
- Facultad de Ciencias Ambientales, Centro EULA-Chile, Universidad de Concepción, Concepción, Chile.
- Jorge León-Muñoz
- Departamento de Química Ambiental, Universidad Católica de la Santísima Concepción, Concepción, Chile
- Centro Interdisciplinario para la Investigación Acuícola (INCAR), Concepción-Puerto Montt, Chile
- Centro de Energía, Universidad Católica de la Santísima Concepción, Concepcion, Chile
- Mauricio Aguayo
- Facultad de Ciencias Ambientales, Centro EULA-Chile, Universidad de Concepción, Concepción, Chile
- Mauricio Zambrano-Bigiarini
- Departamento de Ingeniería Civil, Universidad de La Frontera, Temuco, Chile
- Center for Climate and Resilience Research (CR2), Santiago, Chile
- Alfonso Fernández
- Departamento de Geografía, Mountain GeoScience Group, Universidad de Concepción, Concepción, Chile
- Programa Ciencia Interdisciplinaria para las Montañas de los Andes del Sur (CIMASur), Universidad de Concepción, Concepción, Chile
- Martin Jacques-Coper
- Center for Climate and Resilience Research (CR2), Santiago, Chile
- Departamento de Geofísica, Universidad de Concepción, Concepción, Chile
- Center for Oceanographic Research COPAS-Coastal, Universidad de Concepción, Concepción, Chile
9
Wang Y, Shi F, Yao P, Sheng Y, Zhao C. Assessing the evolution and attribution of watershed resilience in arid inland river basins, Northwest China. The Science of the Total Environment 2024; 906:167534. [PMID: 37797763 DOI: 10.1016/j.scitotenv.2023.167534] [Received: 06/28/2023] [Revised: 09/28/2023] [Accepted: 09/30/2023] [Indexed: 10/07/2023]
Abstract
Water scarcity significantly limits the sustainable development of oasis economies in arid inland river basins. Quantifying watershed resilience and its drivers is a major focus in the fields of hydrology and water resources. In this study, the resilience indicator pi represents watershed resilience, while meteorological, hydrological, socioeconomic, and ecological factors are used to investigate the spatial and temporal patterns of resilience and important driving factors in the Hotan River Basin from 1958 to 2020 by combining principal component analysis and a random forest model. Results show that the overall resilience of the Hotan River Basin is low, decreasing from the upper (upstream) to the middle and lower (downstream) reaches, and that the intensity of human activities has a negative impact on resilience. Rivers are more likely to reach maximum resilience after experiencing periods of wet and dry conditions, although there is a lag in this progress. The random forest machine learning algorithm was used to accurately predict the resilience levels of the two upstream tributaries, the Yurungkash and Karakash Rivers, and the downstream Hotan River, with classification accuracies of 84.2 %, 71.4 %, and 87 %, respectively. The factors affecting the resilience of the Yurungkash River are the 30-day maximum, base flow index, low pulse duration, median streamflow in May, median streamflow in August, median streamflow in October, and 7-day maximum. The set of factors used to classify the resilience of the Karakash River includes the 7-day maximum, 1-day maximum, median streamflow in June, 30-day maximum, 3-day maximum, median streamflow in February, and autumn temperature. The factors affecting the resilience of the Hotan River are the watershed inflow, Xiaota station runoff, population growth rate, and effective irrigated area. 
The findings of this study provide a theoretical basis for integrated water resource management and the sustainable development of the oasis economy in the Hotan River Basin.
Affiliation(s)
- Yuehui Wang
- State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; Key Laboratory of Surficial Geochemistry, Ministry of Education, Department of Hydrosciences, School of Earth Sciences and Engineering, Nanjing University, Nanjing 210023, China
- Fengzhi Shi
- State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; Akesu National Station of Observation and Research for Oasis Agro-ecosystem, Akesu 843017, Xinjiang, China; University of Chinese Academy of Sciences, Beijing 100049, China.
- Peng Yao
- State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; Akesu National Station of Observation and Research for Oasis Agro-ecosystem, Akesu 843017, Xinjiang, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yu Sheng
- State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; Akesu National Station of Observation and Research for Oasis Agro-ecosystem, Akesu 843017, Xinjiang, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Chengyi Zhao
- School of Geographical Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China
10
Isles PDF. A random forest approach to improve estimates of tributary nutrient loading. Water Research 2024; 248:120876. [PMID: 37984040 DOI: 10.1016/j.watres.2023.120876] [Received: 08/08/2023] [Revised: 11/13/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023]
Abstract
Estimating constituent loads from discrete water quality samples coupled with stream discharge measurements is critical for management of freshwater resources. Nutrient loads calculated based on discharge-concentration relationships form the basis of government nutrient load targets and scientific studies of the response of receiving waters to external loads. In this study, a new model is developed using random forests and applied to estimate concentrations and loads of total phosphorus, dissolved phosphorus, total nitrogen, and chloride, using data from 17 tributaries to Lake Champlain monitored from 1992 to 2021. I benchmark this model against one of the most widespread models currently used to estimate nutrient loads, Weighted Regressions on Time, Discharge, and Season (WRTDS). The random forest model outperformed both the base WRTDS model and an extension of the WRTDS model using Kalman filtering in the great majority of cases, likely due to the inclusion of rate-of-change in discharge and antecedent discharge over different leading windows as predictors, and to the flexibility of the random forest to model predictor-response relationships. The random forest also had useful visualization capabilities which provided important process insights. WRTDS remains a useful model for many applications, but this study represents a promising new approach for load estimation which can be applied easily to existing datasets, and which is easy to customize for different applications.
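The abstract credits part of the model's performance to predictors such as the rate of change in discharge and antecedent discharge over several windows. The sketch below shows one way such predictors, and a daily load, could be derived from a daily discharge series. It is not the author's implementation: the function names, the 3- and 7-day windows, and the assumed units (concentration in mg/L, discharge in m³/s) are all hypothetical choices for illustration.

```python
def discharge_features(q, windows=(3, 7)):
    """Build per-day predictor rows from a daily discharge series q.

    For each day t the row holds:
      q                    : discharge on day t
      dq                   : change since the previous day (None on day 0)
      antecedent_mean_<w>  : mean discharge over the w days before t
                             (None until w earlier days are available)
    The window lengths are illustrative assumptions.
    """
    rows = []
    for t in range(len(q)):
        row = {"q": q[t], "dq": q[t] - q[t - 1] if t > 0 else None}
        for w in windows:
            previous = q[max(0, t - w):t]  # the w days strictly before day t
            full = len(previous) == w
            row[f"antecedent_mean_{w}"] = sum(previous) / w if full else None
        rows.append(row)
    return rows


def daily_load_kg(concentration_mg_per_l, discharge_m3_per_s):
    """Daily load in kg/day from concentration (mg/L) and discharge (m^3/s).

    mg/L equals g/m^3, so C * Q is in g/s; multiplying by 86400 s/day and
    dividing by 1000 g/kg gives the factor 86.4.
    """
    return concentration_mg_per_l * discharge_m3_per_s * 86.4


# Toy usage with a made-up discharge series.
features = discharge_features([1.0, 2.0, 4.0, 8.0, 16.0], windows=(3,))
print(features[3])
```

Rows with `None` entries would typically be dropped or imputed before fitting a model; that choice is left open here.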
Affiliation(s)
- Peter D F Isles
- Vermont Department of Environmental Conservation, 1 National Life Drive, Montpelier, VT 05 USA.
11
Maal-Bared R, Brisolara K, Knight M, Mansfeldt C. To sample or not to sample: A governance-focused decision tree for wastewater service providers considering participation in wastewater-based epidemiology (WBE) in support of public health programs. The Science of the Total Environment 2023; 905:167128. [PMID: 37722431 DOI: 10.1016/j.scitotenv.2023.167128] [Received: 07/13/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 09/20/2023]
Abstract
Wastewater-based epidemiology (WBE) provides value to public health monitoring and protection. Participation of public and private wastewater system operators in WBE efforts is critical to public health surveillance program success and sustainability. However, given the number of WBE solicitations wastewater service providers receive, the limitation of service provider resources, the concerns around privacy, ethics, and equity, and the fatigue associated with responding to COVID-19, operators are becoming more hesitant to participate in WBE efforts. While various ethical concerns and sustainability challenges associated with WBE have been documented, no efforts to date have investigated what factors should systematically influence the decision to provide samples to a WBE effort. Therefore, this study develops a decision-making tool for WBE teams to proactively monitor, manage, and avoid wastewater system operators' operational risks and potential liabilities. Ultimately, using this tool allows WBE program partners in academia, government, and industry to better understand wastewater system operators' needs and challenges surrounding data quality and use, public health ethics, and daily wastewater infrastructure operation.
Affiliation(s)
- Kari Brisolara
- LSUHSC, School of Public Health, 2020 Gravier St, New Orleans, LA, USA.
- Mark Knight
- LuminUltra Technologies Ltd, 520 King St, Fredericton, NB E3B 6G3, Canada.
- Cresten Mansfeldt
- Department of Civil, Environmental, and Architectural Engineering, University of Colorado Boulder, 4001 Discovery Drive, Boulder, CO, USA; Environmental Engineering Program, University of Colorado Boulder, 4001 Discovery Drive, Boulder, CO, USA.
12
Zhang WX, Yue FJ, Wang Y, Li Y, Lang YC, Li SL. Dynamic N transport and N2O emission during rainfall events in the coastal river. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 903:166206. [PMID: 37567291 DOI: 10.1016/j.scitotenv.2023.166206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 07/21/2023] [Accepted: 08/08/2023] [Indexed: 08/13/2023]
Abstract
Coastal zones have high population densities and are heavily affected by anthropogenic activities, such as river impoundment to prevent saline intrusion, which results in weak hydrological conditions. Rainfall events can cause dramatic changes in hydrological and nutrient transport conditions, especially in rivers with weak hydrological conditions. However, how nitrogen transport, N2O emissions, and biogeochemistry respond to different types of rainfall events in rivers with weak hydrodynamics is poorly understood. In this study, hydrological and nitrogen characteristics, as well as N2O dynamics, were studied by high-frequency water sampling during two distinct rainfall events: high-intensity with short duration (E1) and low-intensity with long duration (E2). The results showed that hydrologic conditions in E1, with a wider range of d-excess values (from -9.50 to 32.1 ‰), were more dynamic than those observed in E2. The N2O concentrations (0.01-3.33 μmol/L) were higher during E1 than during E2 (0.03-1.11 μmol/L), which indicates that high-intensity rainfall has a greater potential for N2O emission. In contrast, the concentrations of nitrogen (e.g., TN and NO3⁻-N) were lower during E1 than during E2. Additionally, hysteresis was observed in both water and nitrogen components, resulting in a prolonged recovery time to pre-rainfall levels during the long-duration event. Moreover, the average N2O flux during the rainfall event period (78.3 μmol/m2/h) was much larger than that in the non-rainfall period (1.63 μmol/m2/h). Frequent dam regulation resulted in water level fluctuation, which could enhance wet-dry alternation and stimulate N2O emissions. This study highlights the characteristics of N dynamics and hydrological responses to diverse rainfall events in the coastal river. Rainfall could increase N2O emission, especially during high-intensity rainfall events, which cannot be ignored in the context of annual N2O release.
Affiliation(s)
- Wen-Xi Zhang
- Institute of Surface-Earth System Science, School of Earth System Science, Tianjin University, Tianjin 300072, China
- Fu-Jun Yue
- Institute of Surface-Earth System Science, School of Earth System Science, Tianjin University, Tianjin 300072, China; Tianjin Bohai Rim Coastal Earth Critical Zone National Observation and Research Station, Tianjin University, Tianjin 300072, China.
- Yong Wang
- Hydrology and Water Resources Management Center of Tianjin, Tianjin 300061, China
- Yun Li
- Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China.
- Yun-Chao Lang
- Institute of Surface-Earth System Science, School of Earth System Science, Tianjin University, Tianjin 300072, China; Tianjin Bohai Rim Coastal Earth Critical Zone National Observation and Research Station, Tianjin University, Tianjin 300072, China
- Si-Liang Li
- Institute of Surface-Earth System Science, School of Earth System Science, Tianjin University, Tianjin 300072, China; Tianjin Bohai Rim Coastal Earth Critical Zone National Observation and Research Station, Tianjin University, Tianjin 300072, China; Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
13
|
Liao Z, Lu J, Xie K, Wang Y, Yuan Y. Prediction of Photochemical Properties of Dissolved Organic Matter Using Machine Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17971-17980. [PMID: 37029743 DOI: 10.1021/acs.est.2c07545] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 06/19/2023]
Abstract
Apparent quantum yields (Φ) of photochemically produced reactive intermediates (PPRIs) formed by dissolved organic matter (DOM) are vital to element cycles and contaminant fates in surface water. Simultaneous determination of ΦPPRI values from numerous water samples through existing experimental methods is time consuming and ineffective. Herein, machine learning models were developed with a systematic data set including 1329 data points to predict the values of three ΦPPRIs (Φ3DOM*, Φ1O2, and Φ·OH) based on DOM spectral parameters, experimental conditions, and calculation parameters. The best predictive performances for Φ3DOM*, Φ1O2, and Φ·OH were achieved using the CatBoost model, which outperformed the traditional linear regression models. The significances of the wavelength range and spectral parameters on the three ΦPPRI predictions were revealed, suggesting that DOM with lower molecular weight, lower aromatic content, and a more autochthonous portion possessed higher ΦPPRIs. Chain models were constructed by adding the predicted Φ3DOM* as a new feature into the Φ1O2 and Φ·OH models, which consequently improved the predictive performance of Φ1O2 but worsened the Φ·OH prediction likely due to the complex formation pathways of ·OH. Overall, this study offered robust ΦPPRI prediction across interlaboratory differences and provided new insights into the relationship between PPRIs formation and DOM properties.
Affiliation(s)
- Zhiyang Liao
- Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou 510006, China
- Jinrong Lu
- Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou 510006, China
- Kunting Xie
- Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou 510006, China
- Yi Wang
- Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou 510006, China
- Yong Yuan
- Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou 510006, China
14
|
Zhao W, Ma J, Liu Q, Dou L, Qu Y, Shi H, Sun Y, Chen H, Tian Y, Wu F. Accurate Prediction of Soil Heavy Metal Pollution Using an Improved Machine Learning Method: A Case Study in the Pearl River Delta, China. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17751-17761. [PMID: 36821784 DOI: 10.1021/acs.est.2c07561] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/18/2023]
Abstract
In traditional soil heavy metal (HM) pollution assessment, spatial interpolation analysis is often carried out on the limited sampling points in the study area to get the overall status of heavy metal pollution. Unfortunately, in many machine learning spatial information enhancement algorithms, the additional spatial information introduced fails to reflect the hierarchical heterogeneity of the study area. Therefore, we designed hierarchical regionalization labels based on three interpolation techniques (inverse distance weight, ordinary kriging, and trend surface interpolation) as new spatial covariates for a machine learning (ML) model. It was demonstrated that regional spatial information improved the prediction performance of the model (R2 > 0.7). On the basis of the prediction results, the status of HM pollution in the Pearl River Delta (PRD) region was evaluated: cadmium (Cd) and copper (Cu) were the most serious pollutants in the PRD (the point overstandard rates are 18.77% and 12.95%, respectively). The analysis of index importance and bivariate local indicators of spatial association (LISA) shows that the key factors affecting the spatial distribution of heavy metals are geographical and climatic conditions [namely, altitude, humidity index, and normalized vegetation difference index (NDVI)] and some industrial activities (such as metal processing, printing and dyeing, and electronics industry). This study develops a novel approach to improve existing spatial interpolation techniques, which will enable more precise and scientific soil environmental management.
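Of the three interpolation techniques named in this abstract, inverse distance weighting is the simplest to illustrate. The sketch below is a generic textbook form, not the authors' code; the function name, the default power of 2, and the sample values are assumptions chosen for illustration.

```python
import math


def idw(sample_points, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    sample_points: list of ((x, y), value) pairs with measured values.
    target: (x, y) location to estimate.
    power: distance exponent (2.0 is a common default, assumed here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in sample_points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a sample: return the measurement itself.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

For two samples with values 10 and 20 placed at (0, 0) and (2, 0), the estimate at the midpoint (1, 0) is their plain average, because both samples receive the same weight.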
Affiliation(s)
- Wenhao Zhao
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Jin Ma
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Qiyuan Liu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Lei Dou
- Guangdong Geological Survey Institute, Guangzhou 510110, P. R. China
- Yajing Qu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Huading Shi
- Technical Centre for Soil, Agricultural and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, P. R. China
- Yi Sun
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Haiyan Chen
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Yuxin Tian
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Fengchang Wu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
15
|
Hao H, Li P, Jiao W, Ge D, Hu C, Li J, Lv Y, Chen W. Ensemble learning-based applied research on heavy metals prediction in a soil-rice system. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 898:165456. [PMID: 37451444 DOI: 10.1016/j.scitotenv.2023.165456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 07/06/2023] [Accepted: 07/08/2023] [Indexed: 07/18/2023]
Abstract
Accurate prediction of heavy metal accumulation in soil ecosystems is crucial for maintaining healthy soil environments and ensuring high-quality agricultural products, as well as a challenging scientific task. In this study, we constructed a dataset containing 490 sets of multidimensional environmental covariate data and proposed prediction models for heavy metal concentrations (HMC) in a soil-rice system, EL-HMC (including RF-HMC and GBM-HMC), based on Random Forest (RF) and Gradient Boosting Machine (GBM) ensemble learning (EL) techniques. To reasonably evaluate the effectiveness of each model, Multiple linear and Bayesian regressions were selected as benchmark models (BM), and mean absolute error (MAE), root mean square error (RMSE), and determination coefficient R2 were selected as evaluation indicators. In addition, sensitivity and spatial autocorrelation (SAC) analyses were used to examine the robustness of the model. The results showed that the R2 values of RF-HMC and GBM-HMC for modeling available cadmium (Cd) concentrations in soil were 0.654 and 0.690, respectively, with an average increase of 48.0 % compared to the BMs. The R2 values of RF-HMC and GBM-HMC for predicting Cd, lead (Pb), chromium (Cr), and mercury (Hg) concentrations in rice ranged from 0.618 to 0.824 and 0.645 to 0.850, respectively, with an average increase of 58.2 % compared with the BMs. The corresponding MAEs and RMSEs of RF-HMC and GBM-HMC had low error levels. Sensitivity analysis of the input features and the SAC of the prediction bias showed that the EL-HMC models have excellent robustness. Therefore, the EL technology-based prediction models for HMCs proposed herein are practical and feasible, demonstrating better accuracy and stability than the traditional model. 
This study verifies the application potential of EL technology in pollution ecology and provides a new perspective and solution for sustainable management and precise prevention of heavy metal pollution in farmland soil at the regional scale.
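The three evaluation indicators named in this abstract (MAE, RMSE, and R2) have standard definitions, sketched below. The function names are illustrative, and R2 is computed here as 1 - SS_res/SS_tot; the abstract does not say which R2 convention the authors used, so that choice is an assumption.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r2(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

With observations `[1, 2, 3, 4]` and predictions `[1.5, 2, 2.5, 4]`, these functions give an MAE of 0.25 and an R2 of 0.9.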
Affiliation(s)
- Huijuan Hao
- Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Panpan Li
- Information Centre, PLA Strategic Support Force Characteristic Medical Center, Beijing 100101, PR China.
- Wentao Jiao
- Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China.
- Dabing Ge
- College of Resources and Environment, Hunan Agricultural University, Changsha 410128, PR China
- Chengwei Hu
- Information Centre, PLA Strategic Support Force Characteristic Medical Center, Beijing 100101, PR China
- Jing Li
- Department of Oncology, Huludao Central Hospital, Huludao 125001, PR China
- Yuntao Lv
- Risk assessment Laboratory for Environmental Factors of Agro-product Quality Safety, Ministry of Agriculture and villages, Changsha 410005, PR China
- Wanming Chen
- Risk assessment Laboratory for Environmental Factors of Agro-product Quality Safety, Ministry of Agriculture and villages, Changsha 410005, PR China
16
|
Dieser M, Zieseniß S, Mielenz H, Müller K, Greef JM, Stever-Schoo B. Nitrate leaching potential from arable land in Germany: Identifying most relevant factors. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 345:118664. [PMID: 37499418 DOI: 10.1016/j.jenvman.2023.118664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/29/2023]
Abstract
Diffuse nitrogen losses from agriculture in Germany continue to cause regionally increased nitrate concentrations in groundwater. Groundwater quality monitoring cannot be a timely indicator of the effects of mitigation measures being applied in agriculture, due to frequently long transport routes and high residence times of the leachate. Instead, nitrate leaching potential is often determined at field and farm scale by monitoring soil mineral nitrogen contents at 0-90 cm depth in autumn (SMNa), i.e. before the start of the annual leachate period. In this study, we developed an understanding of the controls on the soil mineral nitrogen content at the start of winter. In an on-farm approach, extensive data was collected from 48 farms in five nitrate-sensitive regions in Germany from 2017 to 2020. From this data set, 25 management and site factors were evaluated with regard to their significance for SMNa by means of a random forest model. With the random forest regression, we identified the role of the factors on SMNa with an acceptable model accuracy with R2 = 0.56. The results show that the cultivated crop is the most important factor influencing SMNa. Potatoes, oilseed rape and maize produced the highest SMNas, whereas SMNas were lowest after spring barley, sugar beet and winter barley. Among site factors, soil type and texture as well as precipitation in October were most decisive. The effects of N fertilisation parameters such as rate and timing were masked by these site factors. The results show that the reduction of nitrogen-intensive crops in crop sequences can be a promising measure for the reduction of nitrate loads. On the other hand, our analysis makes clear that soil-related factors controlling nitrogen release and risk of leaching, as well as weather, can significantly mask the effect of cultivation.
Affiliation(s)
- Mona Dieser
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Crop and Soil Science, Braunschweig, Germany.
- Steffen Zieseniß
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Crop and Soil Science, Braunschweig, Germany
- Henrike Mielenz
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Crop and Soil Science, Braunschweig, Germany
- Karolin Müller
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Crop and Soil Science, Braunschweig, Germany
- Jörg-Michael Greef
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Crop and Soil Science, Braunschweig, Germany
- Burkhard Stever-Schoo
- Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Crop and Soil Science, Braunschweig, Germany
17
|
Saranya MS, Vinish VN. A comparative evaluation of streamflow prediction using the SWAT and NNAR models in the Meenachil River Basin of Central Kerala, India. WATER SCIENCE AND TECHNOLOGY : A JOURNAL OF THE INTERNATIONAL ASSOCIATION ON WATER POLLUTION RESEARCH 2023; 88:2002-2018. [PMID: 37906455 PMCID: wst_2023_330 DOI: 10.2166/wst.2023.330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Reliable and accurate modelling of streamflow is still a challenging task due to its complex behaviour, the need for extensive parameters for model development, and the lack of complete or accurate data. In this study, the applicability of an emerging data-driven model, specifically a neural network autoregression (NNAR) model, was evaluated for the first time as a substitute for the physically based hydrological model Soil and Water Assessment Tool (SWAT) for predicting streamflow under data-scarce conditions and for immediate high-quality modelling results. The inputs to the NNAR model were the lagged values of the daily streamflow time series data, and the output was the predicted value for the next day. Using streamflow data that was windowed by 20 days, the NNAR model produced the best prediction. The results of the statistical metrics used to evaluate the performance of the NNAR model were satisfactory (R = 0.90, RMSE = 28.27, MAE = 11.92, R2 = 0.83), indicating a high degree of agreement between the predicted and observed streamflow. The NNAR model outputs demonstrated its ability to accurately predict streamflow in the river basin, even without an explicit understanding of the physical processes that govern the system.
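The input/output arrangement described in this abstract (lagged daily values in, next-day value out) can be sketched as a windowing step that precedes any autoregressive model. The function name is hypothetical, and reading "windowed by 20 days" as 20 lagged inputs per sample is an assumption; the neural network itself is not shown.

```python
def make_lagged_samples(series, n_lags):
    """Turn a time series into (inputs, target) pairs for autoregression.

    Each input row holds the `n_lags` values preceding day t, in time
    order, and the matching target is the value on day t.
    """
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return inputs, targets
```

Calling `make_lagged_samples(flow, 20)` on a daily streamflow list `flow` would yield one training sample per day from the 21st day onward, each with 20 lagged inputs.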
Affiliation(s)
- M S Saranya
- Department of Civil Engineering, R.I.T. Govt. Engineering College, APJ Abdul Kalam Technological University, Kottayam, Kerala, India
- V Nair Vinish
- Department of Civil Engineering, R.I.T. Govt. Engineering College, APJ Abdul Kalam Technological University, Kottayam, Kerala, India
18
|
Puri D, Kumar R, Sihag P, Thakur MS, Perveen K, Alfaisal FM, Lee D. Analytical Investigation of the Impact of Jet Geometry on Aeration Effectiveness Using Soft Computing Techniques. ACS OMEGA 2023; 8:31811-31825. [PMID: 37692205 PMCID: PMC10483528 DOI: 10.1021/acsomega.3c03294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 08/02/2023] [Indexed: 09/12/2023]
Abstract
Jet aeration is a commonly used technique for introducing air into water during wastewater treatment. In this investigation, the efficacy of different soft computing models, namely, Random Forest, Reduced Error Pruning Tree, Artificial Neural Network (ANN), Gaussian Process, and Support Vector Machine, was examined in predicting the aeration efficiency (E20) of circular and square jet configurations in an open channel flow. A total of 126 experimental data points were utilized to develop and validate these models. To assess the models' performance, three goodness-of-fit parameters were employed: correlation coefficient (CC), root-mean-square error (RMSE), and mean absolute error (MAE). The analysis revealed that all of the developed models exhibited predictive capabilities, with CC values surpassing 0.8. Nonetheless, when it comes to predicting E20, the ANN model outperformed other soft computing models, achieving a CC of 0.9748, MAE of 0.0164, and RMSE of 0.0211. A sensitivity analysis emphasized that the angle of inclination exerted the most significant influence on the aeration in an open channel. Furthermore, the results demonstrated that square jets delivered superior aeration compared to that of circular jets under identical operating conditions.
Affiliation(s)
- Diksha Puri
- School of Environmental Science, Shoolini University, Solan, Himachal Pradesh 173229, India
- Raj Kumar
- Department of Mechanical Engineering, Gachon University, Seongnam 13120, South Korea
- Parveen Sihag
- Department of Civil Engineering, Chandigarh University, Mohali, Punjab 140301, India
- Mohindra Singh Thakur
- Department of Civil Engineering, Shoolini University, Solan, Himachal Pradesh 173229, India
- Kahkashan Perveen
- Department of Botany & Microbiology, College of Science, King Saud University, P.O. Box 22452, Riyadh 11495, Saudi Arabia
- Faisal M. Alfaisal
- Department of Civil Engineering, College of Engineering, King Saud University, Riyadh 11495, Saudi Arabia
- Daeho Lee
- Department of Mechanical Engineering, Gachon University, Seongnam 13120, South Korea
19
|
Barcala V, Rozemeijer J, Ouwerkerk K, Gerner L, Osté L. Value and limitations of machine learning in high-frequency nutrient data for gap-filling, forecasting, and transport process interpretation. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:892. [PMID: 37368078 DOI: 10.1007/s10661-023-11519-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 06/13/2023] [Indexed: 06/28/2023]
Abstract
High-frequency monitoring of water quality in catchments brings the challenge of post-processing large amounts of data. Moreover, monitoring stations are often remote, and technical issues resulting in data gaps are common. Machine learning algorithms can be applied to fill these gaps and, to a certain extent, for predictions and interpretation. The objectives of this study were (1) to evaluate six different machine learning models for gap-filling in a high-frequency nitrate and total phosphorus concentration time series, (2) to showcase the potential added value (and limitations) of machine learning to interpret underlying processes, and (3) to study the limits of machine learning algorithms for predictions outside the training period. We used a 4-year high-frequency dataset from a ditch draining one intensive dairy farm in the east of The Netherlands. Continuous time series of precipitation, evapotranspiration, groundwater levels, discharge, turbidity, and nitrate or total phosphorus were used as predictors for total phosphorus and nitrate concentrations, respectively. Our results showed that the random forest algorithm performed best at filling data gaps, with R2 higher than 0.92 and short computation times. The feature importance helped in understanding the changes in transport processes linked to water conservation measures and rain variability. Applying the machine learning model outside the training period resulted in low performance, largely due to system changes (manure surplus and water conservation) that were not included as predictors. This study offers a valuable and novel example of how to use and interpret machine learning models for post-processing high-frequency water quality data.
Affiliation(s)
- Victoria Barcala
- Unit Inland Water Systems, Daltonlaan 600, 3584 BK, Utrecht, The Netherlands.
- Joachim Rozemeijer
- Unit Subsurface and Groundwater Systems, Daltonlaan 600, 3584 BK, Utrecht, The Netherlands
- Kevin Ouwerkerk
- Unit Subsurface and Groundwater Systems, Daltonlaan 600, 3584 BK, Utrecht, The Netherlands
- Laurens Gerner
- Water Board Rijn and IJssel, Liemersweg 2, 7006 GG, Doetinchem, The Netherlands
- Leonard Osté
- Unit Inland Water Systems, Daltonlaan 600, 3584 BK, Utrecht, The Netherlands
20
|
Cheng Q, Chunhong Z, Qianglin L. Development and application of random forest regression soft sensor model for treating domestic wastewater in a sequencing batch reactor. Sci Rep 2023; 13:9149. [PMID: 37277429 DOI: 10.1038/s41598-023-36333-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/22/2023] [Accepted: 06/01/2023] [Indexed: 06/07/2023] Open
Abstract
Small-scale distributed water treatment equipment such as the sequencing batch reactor (SBR) is widely used in rural domestic sewage treatment because of its rapid installation and construction, low operating cost, and strong adaptability. However, because the SBR process is non-linear and exhibits hysteresis, it is difficult to construct a simulation model of wastewater treatment. In this study, a methodology was developed using artificial intelligence and an automatic control system that can save energy and correspondingly reduce carbon emissions. The methodology leverages a random forest model to determine a suitable soft sensor for predicting COD trends. This study uses pH and temperature sensors as the basis for the COD soft sensor. In the proposed method, data were pre-processed into 12 input variables, and the top 7 variables were selected as the variables of the optimized model. The cycle was ended by the artificial intelligence and automatic control system instead of by fixed-time control, which served as the uncontrolled scenario. In 12 test cases, COD removal was about 91.075%, while 24.25% of time or energy was saved on average. The proposed soft sensor selection methodology can be applied in rural domestic sewage treatment, with the advantages of time and energy saving. Time saving increases treatment capacity, and energy saving represents low-carbon technology. The proposed methodology provides a framework for investigating ways to reduce costs associated with data collection by replacing costly and unreliable sensors with affordable and reliable alternatives. By adopting this approach, energy conservation can be maintained while meeting emission standards.
Affiliation(s)
- Qiu Cheng
- Department of Material and Environmental Engineering, Chengdu Technological University, Chengdu, China
- Zhan Chunhong
- Huicai Environmental Technology Co., Ltd., De Yuan Zhen, Pidu District, Chengdu, Sichuan, China
- Li Qianglin
- Department of Material and Environmental Engineering, Chengdu Technological University, Chengdu, China.
21
|
Chen Y, Wang J, Jiang L, Li H, Wang H, Lv G, Li X. Prediction of spatial distribution characteristics of ecosystem functions based on a minimum data set of functional traits of desert plants. FRONTIERS IN PLANT SCIENCE 2023; 14:1131778. [PMID: 37332722 PMCID: PMC10272538 DOI: 10.3389/fpls.2023.1131778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 05/10/2023] [Indexed: 06/20/2023]
Abstract
The relationship between plant functional traits and ecosystem function is a hot topic in current ecological research, and community-level traits based on individual plant functional traits play important roles in ecosystem function. In temperate desert ecosystems, which functional trait to use to predict ecosystem function is an important scientific question. In this study, the minimum data sets of functional traits of woody (wMDS) and herbaceous (hMDS) plants were constructed and used to predict the spatial distribution of C, N, and P cycling in ecosystems. The results showed that the wMDS included plant height, specific leaf area, leaf dry weight, leaf water content, diameter at breast height (DBH), leaf width, and leaf thickness, and the hMDS included plant height, specific leaf area, leaf fresh weight, leaf length, and leaf width. The linear regression results based on the cross-validations (FTEIW - L, FTEIA - L, FTEIW - NL, and FTEIA - NL) for the MDS and TDS (total data set) showed that the R² (coefficient of determination) values for wMDS were 0.29, 0.34, 0.75, and 0.57, respectively, and those for hMDS were 0.82, 0.75, 0.76, and 0.68, respectively, proving that the MDSs can replace the TDS in predicting ecosystem function. Then, the MDSs were used to predict the C, N, and P cycling in the ecosystem. The results showed that non-linear models RF and BPNN were able to predict the spatial distributions of C, N and P cycling, and the distributions showed inconsistent patterns between different life forms under moisture restrictions. The C, N, and P cycling showed strong spatial autocorrelation and were mainly influenced by structural factors. Based on the non-linear models, the MDSs can be used to accurately predict the C, N, and P cycling, and the predicted values of woody plant functional traits visualized by regression kriging were closer to the kriging results based on raw values. This study provides a new perspective for exploring the relationship between biodiversity and ecosystem function.
Affiliation(s)
- Yudong Chen
- College of Ecology and Environment, Xinjiang University, Urumqi, China
- Key Laboratory of Oasis Ecology of Education Ministry, Xinjiang University, Urumqi, China
- Xinjiang Jinghe Observation and Research Station of Temperate Desert Ecosystem, Ministry of Education, Jinghe, China
- Jinlong Wang
- College of Ecology and Environment, Xinjiang University, Urumqi, China
- Key Laboratory of Oasis Ecology of Education Ministry, Xinjiang University, Urumqi, China
- Xinjiang Jinghe Observation and Research Station of Temperate Desert Ecosystem, Ministry of Education, Jinghe, China
- Lamei Jiang
- College of Ecology and Environment, Xinjiang University, Urumqi, China
- Key Laboratory of Oasis Ecology of Education Ministry, Xinjiang University, Urumqi, China
- Xinjiang Jinghe Observation and Research Station of Temperate Desert Ecosystem, Ministry of Education, Jinghe, China
- Hanpeng Li
- College of Ecology and Environment, Xinjiang University, Urumqi, China
- Key Laboratory of Oasis Ecology of Education Ministry, Xinjiang University, Urumqi, China
- Xinjiang Jinghe Observation and Research Station of Temperate Desert Ecosystem, Ministry of Education, Jinghe, China
- Hengfang Wang
- College of Ecology and Environment, Xinjiang University, Urumqi, China
- Key Laboratory of Oasis Ecology of Education Ministry, Xinjiang University, Urumqi, China
- Xinjiang Jinghe Observation and Research Station of Temperate Desert Ecosystem, Ministry of Education, Jinghe, China
- Guanghui Lv
- College of Ecology and Environment, Xinjiang University, Urumqi, China
- Key Laboratory of Oasis Ecology of Education Ministry, Xinjiang University, Urumqi, China
- Xinjiang Jinghe Observation and Research Station of Temperate Desert Ecosystem, Ministry of Education, Jinghe, China
- Xiaotong Li
- College of Ecology and Environment, Xinjiang University, Urumqi, China
- Key Laboratory of Oasis Ecology of Education Ministry, Xinjiang University, Urumqi, China
- Xinjiang Jinghe Observation and Research Station of Temperate Desert Ecosystem, Ministry of Education, Jinghe, China
22
Zhao F, Tang L, Jiang H, Mao Y, Song W, Chen H. Prediction of heavy metals adsorption by hydrochars and identification of critical factors using machine learning algorithms. BIORESOURCE TECHNOLOGY 2023:129223. [PMID: 37244307 DOI: 10.1016/j.biortech.2023.129223] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/08/2023] [Revised: 05/18/2023] [Accepted: 05/20/2023] [Indexed: 05/29/2023]
Abstract
Hydrochar has become a popular product for immobilizing heavy metals in water bodies. However, the relationships between the preparation conditions, hydrochar properties, adsorption conditions, heavy metal types, and the maximum adsorption capacity (Qm) of hydrochar are not adequately explored. Four artificial intelligence models were used in this study to predict the Qm of hydrochar and identify the key influencing factors. The gradient boosting decision tree (GBDT) showed excellent predictive capability for this study (R² = 0.93, RMSE = 25.65). Hydrochar properties (37%) controlled heavy metal adsorption. Meanwhile, the optimal hydrochar properties were revealed, including C, H, N, and O contents of 57.28-78.31%, 3.56-5.61%, 2.01-6.42%, and 20.78-25.37%, respectively. Higher hydrothermal temperatures (>220 °C) and longer hydrothermal times (>10 h) led to the optimal type and density of surface functional groups for heavy metal adsorption, which increased the Qm values. This study has great potential for guiding industrial applications of hydrochar in treating heavy metal pollution.
Affiliation(s)
- Fangzhou Zhao
- School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing, China
- Lingyi Tang
- Department of Earth and Atmospheric Sciences, University of Alberta, Edmonton, Alberta, T6G 2E3, Canada
- Hanfeng Jiang
- School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing, China
- Yajun Mao
- School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing, China
- Wenjing Song
- Key Laboratory of Tobacco Biology and Processing, Ministry of Agriculture, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China
- Haoming Chen
- School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing, China.
23
Rao W, Qian X, Fan Y, Liu T. A soft sensor for simulating algal cell density based on dynamic response to environmental changes in a eutrophic shallow lake. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 868:161543. [PMID: 36640876 DOI: 10.1016/j.scitotenv.2023.161543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 01/07/2023] [Accepted: 01/07/2023] [Indexed: 06/17/2023]
Abstract
There is a great need for timely monitoring and rapid water quality assessment to control the algal blooms that often occur in eutrophic lakes. While algal cell density (ACD) is a critical indicator of algal growth, field monitoring is laborious and time-consuming, and rapid assessment of algal blooms based on ACD is often not possible. To address the limitations of conventional ACD detection, we proposed a soft sensor approach that uses surrogate indicators to simulate ACD in machine learning models. We conducted a case study using monitoring data from Chaohu Lake collected between 2016 and 2019. By comparing various machine learning algorithms, we found that ensemble learning models, especially extreme gradient boosting (XGBoost), outperformed traditional machine learning algorithms. Also, considering the influence of input variable selection on model performance, we combined the results of different filter methods in the multi-stage variable selection process. Finally, we selected seven key variables from the 43 initial candidate variables: dissolved oxygen (DO), chlorophyll-a (Chl-a), Secchi disk depth (SD), pH, permanganate index (CODMn), week of the year (WOY), and wind velocity (WV). Their inclusion substantially improved data accessibility and supported the development of a rapid simulation model. The final model was capable of reliable spatiotemporal generalization, with an overall R² value of 0.761. On the theoretical side, our study makes a new attempt to simulate ACD values in a eutrophic lake. For practical purposes, the soft sensor can facilitate the rapid assessment of bloom conditions, which helps the local administration with emergency prevention and control.
Affiliation(s)
- Wenxin Rao
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Xin Qian
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China; Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, China.
- Yifan Fan
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Tong Liu
- Faculty of Environmental Earth Science, Hokkaido University, Sapporo 060-0810, Japan
24
Agarwal V, Akyilmaz O, Shum CK, Feng W, Yang TY, Forootan E, Syed TH, Haritashya UK, Uz M. Machine learning based downscaling of GRACE-estimated groundwater in Central Valley, California. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 865:161138. [PMID: 36586696 DOI: 10.1016/j.scitotenv.2022.161138] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/06/2022] [Revised: 12/19/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
California's Central Valley, one of the most agriculturally productive regions, is also one of the most stressed aquifers in the world due to anthropogenic groundwater over-extraction, primarily for irrigation. Groundwater depletion is further exacerbated by climate-driven droughts. Gravity Recovery and Climate Experiment (GRACE) satellite gravimetry has demonstrated the feasibility of quantifying global groundwater storage changes at uniform monthly sampling, though at a coarse resolution that is impractical for effective water resources management. Here, we employ the Random Forest machine learning algorithm to establish empirical relationships between GRACE-derived groundwater storage and in situ groundwater level variations over the Central Valley during 2002-2016 and achieve spatial downscaling of GRACE-observed groundwater storage changes from a few hundred km to 5 km. Validations of our modeled groundwater level against in situ groundwater level indicate excellent Nash-Sutcliffe Efficiency coefficients ranging from 0.94 to 0.97. In addition, the secular components of modeled groundwater show good agreement with those of vertical displacements observed by GPS and CryoSat-2 radar altimetry measurements and are perfectly consistent with findings from previous studies. Our estimated groundwater loss is about 30 km³ from 2002 to 2016, which also agrees well with previous studies in Central Valley. We find that the maximum groundwater storage loss rates of -5.7 ± 1.2 km³ yr⁻¹ and -9.8 ± 1.7 km³ yr⁻¹ occurred during the extended drought periods of January 2007-December 2009 and October 2011-September 2015, respectively, while Central Valley also experienced groundwater recharge during prolonged flood episodes. The 5-km resolution Central Valley-wide groundwater storage trends reveal that groundwater depletion occurs mostly in southern San Joaquin Valley, collocated with severe land subsidence due to aquifer compaction from excessive groundwater withdrawal.
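The Nash-Sutcliffe Efficiency (NSE) quoted in this abstract is one minus the ratio of the squared model error to the variance of the observations around their mean. The sketch below shows that standard definition only; the function name and the groundwater levels are hypothetical and are not taken from the study.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Standard NSE: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual_ss / variance_ss


# Hypothetical groundwater levels in metres, invented for illustration only.
obs = [10.0, 12.0, 14.0, 16.0]
sim = [10.5, 11.5, 14.5, 15.5]
nse = nash_sutcliffe_efficiency(obs, sim)
print(round(nse, 3))
```

An NSE of 1 means the simulated series matches the observations exactly, and an NSE of 0 means the model is no better than predicting the observed mean.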
Affiliation(s)
- Vibhor Agarwal
- Department of Earth Sciences, College of Wooster, USA; Department of Geology and Environmental Geosciences, University of Dayton, USA; Division of Geodetic Science, School of Earth Sciences, The Ohio State University, USA.
- Orhan Akyilmaz
- Department of Geomatic Engineering, Istanbul Technical University, Turkey
- C K Shum
- Division of Geodetic Science, School of Earth Sciences, The Ohio State University, USA; Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, China
- Wei Feng
- Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, China; School of Geospatial Engineering and Science, Sun Yat-sen University, China
- Umesh K Haritashya
- Department of Geology and Environmental Geosciences, University of Dayton, USA
- Metehan Uz
- Department of Geomatic Engineering, Istanbul Technical University, Turkey
25
Schwarz M, Trippel J, Engelhart M, Wagner M. Dynamic alpha factor prediction with operating data - a machine learning approach to model oxygen transfer dynamics in activated sludge. WATER RESEARCH 2023; 231:119650. [PMID: 36702025 DOI: 10.1016/j.watres.2023.119650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 12/13/2022] [Accepted: 01/18/2023] [Indexed: 06/17/2023]
Abstract
Aeration is an energy-intensive process of aerobic biological wastewater treatment. An accurate model of oxygen transfer dynamics in activated sludge tanks would improve design and operation of aeration systems. Such a model should consider spatial and diurnal variation of α-factor as well as site-specific conditions that impact oxygen transfer. For this dynamic prediction, a machine learning approach was used for the first time. The data-driven method was based on long-term ex-situ off-gas measurements with pilot-scale reactors (5.8 m height, 8.3 m³ volume) coupled to full-scale activated sludge tanks on the sites of two conventional and a two-stage activated sludge treatment plant. The ex-situ off-gas method made it possible to quantify theoretical off-gas parameters in non-aerated zones and thus consider the whole activated sludge tank. We introduced the α0-factor to compare aerated and non-aerated zones under nonsteady-state conditions. Like the established α-factor for steady-state conditions, the α0-factor describes oxygen transfer inhibiting effects in activated sludge. The α0-factor was lowest in upstream denitrification zones. This indicates an anoxic elimination of oxygen transfer inhibiting wastewater contaminants, which improved oxygen transfer in subsequent aerobic zones. Random Forest models predicted the α0-factor reliably in all examined activated sludge tanks, even for stormwater events and seasonal variation. Model development only required online sensor data already available to operators. Our results suggest that machine learning models can dynamically predict α-factors in a variety of activated sludge processes, thus considering site-specific conditions in model training without manual calibration.
Affiliation(s)
- M Schwarz
- Institute IWAR, Chair of Wastewater Technology, Technical University of Darmstadt, Franziska-Braun-Str. 7, Darmstadt 64287, Germany.
- J Trippel
- Institute IWAR, Chair of Wastewater Technology, Technical University of Darmstadt, Franziska-Braun-Str. 7, Darmstadt 64287, Germany
- M Engelhart
- Institute IWAR, Chair of Wastewater Technology, Technical University of Darmstadt, Franziska-Braun-Str. 7, Darmstadt 64287, Germany
- M Wagner
- Institute IWAR, Chair of Wastewater Technology, Technical University of Darmstadt, Franziska-Braun-Str. 7, Darmstadt 64287, Germany
26
Elbeltagi A, Pande CB, Kumar M, Tolche AD, Singh SK, Kumar A, Vishwakarma DK. Prediction of meteorological drought and standardized precipitation index based on the random forest (RF), random tree (RT), and Gaussian process regression (GPR) models. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:43183-43202. [PMID: 36648725 DOI: 10.1007/s11356-023-25221-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/26/2022] [Accepted: 01/05/2023] [Indexed: 06/17/2023]
Abstract
Agricultural, meteorological, and hydrological droughts are natural hazards that affect ecosystems in Maharashtra state, central India. Because limited historical data are available for drought monitoring and forecasting in this region, machine learning (ML) algorithms could allow for the prediction of future drought events. In this paper, we focus on the prediction accuracy of meteorological drought in the semi-arid region based on the standardized precipitation index (SPI) using the random forest (RF), random tree (RT), and Gaussian process regression (GPR-PUK kernel) models. Different combinations of machine learning models and variables were tested for forecasting meteorological drought based on the 6- and 12-month SPI (SPI-6 and SPI-12). Models were developed using monthly rainfall data for the period 2000-2019 at two meteorological stations, Karanjali and Gangawdi, each representing a geographical region of the Upper Godavari river basin in Maharashtra state, central India, which frequently experiences droughts. SPI data from 2000 to 2013 were used to train the models, and the remaining data from 2014 to 2019 were used for testing the forecasts of the SPI and meteorological drought. The mean square error (MSE), root mean square error (RMSE), adjusted R², Mallows' Cp, Akaike's information criterion (AIC), Schwarz's Bayesian criterion (SBC), and Amemiya's PC were used to identify the best input combination and the best subset regression for SPI-6 and SPI-12 at both stations. The correlation coefficient, mean absolute error (MAE), RMSE, relative absolute error (RAE), and root relative squared error (RRSE) were used to evaluate the RF, RT, and GPR-PUK kernel models for SPI-6 and SPI-12 at both stations during the training and testing scenarios. The results during the testing phase revealed that RF was the best model for forecasting droughts at Gangawdi station, with values of the correlation coefficient, MAE, RMSE, RAE (%), and RRSE (%) of 0.856, 0.551, 0.718, 74.778, and 54.019, respectively, for SPI-6 and 0.961, 0.361, 0.538, 34.926, and 28.262, respectively, for SPI-12. Further, the respective values of the evaluators at Karanjali station were 0.913 and 0.966, 0.541 and 0.386, 0.604 and 0.589, 52.592 and 36.959, and 42.315 and 31.394 for the PUK kernel and RT models, respectively, during SPI-6 and SPI-12. Machine learning models are potential drought warning techniques because they take less time, require fewer inputs, and are less sophisticated than dynamic or scientific models.
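The error measures named in this abstract (MAE, RMSE, RAE, and RRSE) can be computed as in the sketch below. It assumes the common definitions in which RAE and RRSE normalize the model error by the error of always predicting the mean of the actual values; the function name and the SPI-like numbers are hypothetical and do not come from the paper.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline errors of a model that always predicts the mean of `actual`.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RAE": 100.0 * sum(abs_err) / base_abs,
        "RRSE": 100.0 * math.sqrt(sum(sq_err) / base_sq),
    }


# Hypothetical SPI values, invented for illustration only.
actual = [-1.0, 0.0, 1.0, 2.0]
predicted = [-0.5, 0.0, 0.5, 2.0]
metrics = error_metrics(actual, predicted)
print(metrics)
```

Under these definitions, RAE or RRSE values below 100% indicate that the model beats the mean-value baseline.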
Affiliation(s)
- Ahmed Elbeltagi
- Agricultural Engineering Department, Faculty of Agriculture, Mansoura University, Mansoura, 35516, Egypt
- Chaitanya B Pande
- Indian Institute of Tropical Meteorology, Pune, India
- Universiti Tenaga Nasional (UNITEN), Kajang, Malaysia
- Manish Kumar
- College of Agricultural Engineering and Technology, Dr. R.P.C.A.U, Pusa-Bihar, 848125, India
- Abebe Debele Tolche
- Haramaya Institute of Technology, School of Water Resources and Environmental Engineering, Haramaya University, P.O. Box 138, Dire Dawa, Ethiopia
- Sudhir Kumar Singh
- K. Banerjee Centre of Atmospheric and Ocean Studies, IIDS, Nehru Science Centre, University of Allahabad, 211002, Prayagraj, India
- Akshay Kumar
- Environmental Science and Engineering and Department (ESED), Indian Institute of Technology, Bombay, Maharashtra, India
- Dinesh Kumar Vishwakarma
- Department of Irrigation and Drainage Engineering, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, 263145, India.
27
Santos F, Calle N, Bonilla S, Sarmiento F, Herrnegger M. Impacts of soil erosion and climate change on the built heritage of the Pambamarca Fortress Complex in northern Ecuador. PLoS One 2023; 18:e0281869. [PMID: 36821586 PMCID: PMC9949680 DOI: 10.1371/journal.pone.0281869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 02/02/2023] [Indexed: 02/24/2023] Open
Abstract
The Pambamarca fortress complex in northern Ecuador is a cultural and built heritage with 18 prehispanic fortresses known as Pucaras. They are mostly located on the ridge of the Pambamarca volcano, which is severely affected by erosion. In this research, we implemented a multiscale methodology to identify sheet, rill and gully erosion in the context of climate change for the prehistoric sites. In a first phase, we coupled the Revised Universal Soil Loss Equation (RUSLE) and four CMIP6 climate models to evaluate and prioritize which Pucaras are prone to sheet and rill erosion, after comparing historical and future climate scenarios. Then, we conducted field visits to collect geophotos and soil samples for validation purposes, as well as drone flight campaigns to derive high resolution digital elevation models and identify gully erosion with the stream power index. Our erosion maps achieved an overall accuracy of 0.75 when compared with geophotos and correlated positively with the sand fraction of the soil samples. The Pucaras evaluated with the historical climate scenario obtained erosion rates ranging between 0 and 20 t ha⁻¹ yr⁻¹. These rates also varied from -15.7% to 39.1% for four future climate change models that reported extreme conditions. In addition, after identifying and overflying six Pucaras that showed the highest erosion rates in the future climate models, we mapped their gully-prone areas, which represented between 0.9% and 3.2% of their analyzed areas. The proposed methodology allowed us to observe how the design of the Pucaras and their concentric terraces have managed to reduce gully erosion, but also to notice the pressures they suffer due to their susceptibility to erosion, anthropic pressures and climate change. To address this, we suggest management strategies to guide the protection of this cultural and built heritage landscape.
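RUSLE, mentioned in this abstract, estimates mean annual soil loss as the product of five factors, A = R · K · LS · C · P. The sketch below illustrates that general equation and the kind of percentage comparison between a historical and a future climate scenario described above. All factor values are hypothetical, and only the rainfall erosivity R is varied between scenarios, which is an assumption made for illustration and not a description of the study's workflow.

```python
def rusle_soil_loss(r, k, ls, c, p):
    """Mean annual soil loss A (t ha^-1 yr^-1) as the product R * K * LS * C * P."""
    return r * k * ls * c * p


def percent_change(historical, future):
    """Relative change of the future value with respect to the historical one, in %."""
    return 100.0 * (future - historical) / historical


# Hypothetical factor values, invented for illustration only.
k, ls, c, p = 0.03, 4.0, 0.2, 1.0   # erodibility, slope length-steepness, cover, practice
r_historical, r_future = 500.0, 600.0  # rainfall erosivity under two climate scenarios

loss_historical = rusle_soil_loss(r_historical, k, ls, c, p)
loss_future = rusle_soil_loss(r_future, k, ls, c, p)
change = percent_change(loss_historical, loss_future)
print(round(loss_historical, 2), round(loss_future, 2), round(change, 1))
```

Because the equation is a simple product, a given relative change in any single factor produces the same relative change in the estimated soil loss.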
Affiliation(s)
- Fabián Santos
- Centro de Investigación para el Territorio y el Hábitat Sostenible (CITEHS), Universidad Tecnológica Indoamérica, Machala y Sabanilla, Quito, Ecuador
- * E-mail:
- Nora Calle
- Departamento de Ciencias de la Computación, Universidad de las Fuerzas Armadas (ESPE), Sangolquí, Ecuador
- Santiago Bonilla
- Centro de Investigación para el Territorio y el Hábitat Sostenible (CITEHS), Universidad Tecnológica Indoamérica, Machala y Sabanilla, Quito, Ecuador
- Fausto Sarmiento
- Geography Department, Neotropical Montology Collaboratory, University of Georgia, Athens, Georgia, United States of America
- Mathew Herrnegger
- Department of Water, Atmosphere and Environment, Institute of Hydrology and Water Management, University of Natural Resources and Life Sciences, Vienna, Austria
28
V-BANet: Land cover change detection using effective deep learning technique. ECOL INFORM 2023. [DOI: 10.1016/j.ecoinf.2023.102019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
29
Liu K, Jiao Y, Du C, Zhang X, Chen X, Xu F, Jiang C. Driver Stress Detection Using Ultra-Short-Term HRV Analysis under Real World Driving Conditions. ENTROPY (BASEL, SWITZERLAND) 2023; 25:e25020194. [PMID: 36832561 PMCID: PMC9955749 DOI: 10.3390/e25020194] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/01/2022] [Revised: 01/13/2023] [Accepted: 01/17/2023] [Indexed: 05/09/2023]
Abstract
Considering that driving stress is a major contributor to traffic accidents, detecting drivers' stress levels in time is helpful for ensuring driving safety. This paper attempts to investigate the ability of ultra-short-term (30-s, 1-min, 2-min, and 3-min) HRV analysis for driver stress detection under real driving circumstances. Specifically, the t-test was used to investigate whether there were significant differences in HRV features under different stress levels. Ultra-short-term HRV features were compared with the corresponding short-term (5-min) features during low-stress and high-stress phases by the Spearman rank correlation and Bland-Altman plots analysis. Furthermore, four different machine-learning classifiers, including a support vector machine (SVM), random forests (RFs), K-nearest neighbor (KNN), and Adaboost, were evaluated for stress detection. The results show that the HRV features extracted from ultra-short-term epochs were able to detect binary drivers' stress levels accurately. In particular, although the capability of HRV features in detecting driver stress also varied between different ultra-short-term epochs, MeanNN, SDNN, NN20, and MeanHR were selected as valid surrogates of short-term features for driver stress detection across the different epochs. For drivers' stress levels classification, the best performance was achieved with the SVM classifier, with an accuracy of 85.3% using 3-min HRV features. This study makes a contribution to building a robust and effective stress detection system using ultra-short-term HRV features under actual driving environments.
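The four time-domain HRV features singled out in this abstract can be computed from a series of NN (normal-to-normal) intervals, as in the sketch below. It assumes the usual conventions: SDNN as the sample standard deviation of the intervals, NN20 as the number of successive differences larger than 20 ms, and MeanHR as the mean of the instantaneous heart rates. The function name and the interval values are hypothetical, and the paper's exact conventions may differ.

```python
import statistics


def hrv_time_domain_features(nn_ms):
    """Compute MeanNN, SDNN, NN20 and MeanHR from NN intervals in milliseconds."""
    successive_diffs = [abs(b - a) for a, b in zip(nn_ms, nn_ms[1:])]
    return {
        "MeanNN": statistics.mean(nn_ms),                  # ms
        "SDNN": statistics.stdev(nn_ms),                   # ms, sample standard deviation
        "NN20": sum(1 for d in successive_diffs if d > 20),  # count of differences > 20 ms
        "MeanHR": statistics.mean(60000.0 / nn for nn in nn_ms),  # beats per minute
    }


# Hypothetical NN intervals (ms) from a short epoch, invented for illustration only.
nn_intervals = [800.0, 820.0, 790.0, 850.0, 810.0]
features = hrv_time_domain_features(nn_intervals)
print(features)
```

In an ultra-short-term analysis, the same function would simply be applied to the intervals falling inside each 30-s to 3-min window.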
Affiliation(s)
- Kun Liu
- School of Transportation & Logistics, Southwest Jiaotong University, Chengdu 610097, China
- Yubo Jiao
- School of Transportation & Logistics, Southwest Jiaotong University, Chengdu 610097, China
- Congcong Du
- School of Mines, China University of Mining and Technology, Xuzhou 221116, China
- Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China
- Xiaoming Zhang
- School of Transportation & Logistics, Southwest Jiaotong University, Chengdu 610097, China
- Xiaoyu Chen
- School of Transportation & Logistics, Southwest Jiaotong University, Chengdu 610097, China
- Fang Xu
- Department of Purchase Management, Sichuan Tourism University, Chengdu 610100, China
- Chaozhe Jiang
- School of Transportation & Logistics, Southwest Jiaotong University, Chengdu 610097, China
- Correspondence:
30
Li Y, Yang X. Quantitative analysis of near infrared spectroscopic data based on dual-band transformation and competitive adaptive reweighted sampling. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2023; 285:121924. [PMID: 36208577 DOI: 10.1016/j.saa.2022.121924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
Near infrared (NIR) spectroscopy has the characteristics of rapid processing, nondestructive analysis and on-line detection. This technique has been widely used in the fields of quantitative determination and substance content analysis. However, for complex NIR spectral data, most traditional machine learning models cannot carry out effective quantitative analyses (manifested as underfitting; that is, the training effect of the model is not good). Small amounts of available data limit the performance of deep learning-based infrared spectroscopy methods, while the traditional threshold-based feature selection methods require more prior knowledge. To address the above problems, this paper proposes a competitive adaptive reweighted sampling method based on dual band transformation (DWT-CARS). DWT-CARS includes four types in total: CARS based on integrated two-dimensional correlation spectrum (i2DCOS-CARS), CARS based on difference coefficient (DI-CARS), CARS based on ratio coefficient (RI-CARS) and CARS based on normalized difference coefficient (NDI-CARS). We conducted comparative experiments on three datasets; compared to traditional machine learning methods, our method achieved good results, demonstrating that this method has considerable prospects for the quantitative analysis of near-infrared spectroscopic data. To further improve the performance and stability of this method, we combined the idea of integrated modeling and constructed a partial least squares model based on Monte Carlo sampling for the samples obtained by CARS (DWT-CARS-MC-PLS). Through comparative experiments, we verified that the integrated model could further enhance the accuracy and stability of the results.
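The difference, ratio, and normalized difference coefficients named in this abstract combine the responses at two bands into a single feature. The sketch below uses the textbook forms DI = a - b, RI = a / b, and NDI = (a - b) / (a + b). These forms are an assumption, since the abstract does not state the paper's exact definitions, and the function names and spectrum values are hypothetical. The i2DCOS transformation and the CARS selection step are not shown.

```python
def dual_band_indices(spectrum, i, j):
    """Combine the values at band positions i and j into DI, RI and NDI."""
    a, b = spectrum[i], spectrum[j]
    return {
        "DI": a - b,               # difference index
        "RI": a / b,               # ratio index
        "NDI": (a - b) / (a + b),  # normalized difference index
    }


def index_matrix(spectrum, name):
    """Evaluate one index for every ordered pair of bands (an n x n matrix)."""
    n = len(spectrum)
    return [[dual_band_indices(spectrum, i, j)[name] for j in range(n)]
            for i in range(n)]


# Hypothetical absorbance values at three wavelengths, invented for illustration only.
spectrum = [0.2, 0.4, 0.5]
pair = dual_band_indices(spectrum, 0, 1)
ndi_matrix = index_matrix(spectrum, "NDI")
print(pair)
```

Each matrix entry is a candidate variable, so a spectrum with n bands yields on the order of n² transformed features for a subsequent variable selection step to filter.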
Affiliation(s)
- Yiming Li
- Faculty of Information Technology, Beijing University of Technology, Beijing, China
- Xinwu Yang
- Faculty of Information Technology, Beijing University of Technology, Beijing, China.
31
Wu S, Wang L, Zhou G, Liu C, Ji Z, Li Z, Li W. Strategies for the content determination of capsaicin and the identification of adulterated pepper powder using a hand-held near-infrared spectrometer. Food Res Int 2023; 163:112192. [PMID: 36596130 DOI: 10.1016/j.foodres.2022.112192] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/22/2022] [Revised: 11/11/2022] [Accepted: 11/15/2022] [Indexed: 11/27/2022]
Abstract
To achieve the goals of rapid content determination of capsaicin and adulteration detection of pepper powder, a method based on a hand-held near-infrared spectrometer combined with ensemble preprocessing was proposed. A DoE-based ensemble preprocessing technique was utilized to develop the partial least squares regression models of red pepper [Capsicum annuum L. var. conoides (Mill.) Irish] powders. The performance of the final models was evaluated using the coefficient of determination (R²), root mean square error of prediction (RMSEP) and residual predictive deviation (RPD). Model development using selective ensemble preprocessing gave the best prediction of capsaicin in Yanjiao pepper powder (R² = 0.9800, RPD = 7.090, RMSEP = 0.00689) and Tianying pepper powder (R² = 0.8935, RPD = 3.017, RMSEP = 0.06154). Moreover, the potential of the grey wolf optimizer-support vector machine (GWO-SVM) to detect adulterated pepper powder was investigated. The samples were composed of two authentic products and three different adulterants with different adulteration levels. The results showed that the classification accuracy of the GWO-SVM model for Yanjiao peppers was over 90%, which realized the adulteration detection of Yanjiao pepper. GWO-SVM also showed better performance in detecting adulterated Tianying pepper than hierarchical cluster analysis, orthogonal partial least squares discriminant analysis and random forest. In summary, the quality control strategy established in this paper can provide a solution for the adulteration detection and quality evaluation of pepper powder in a rapid and on-site way.
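RMSEP and RPD, two of the figures of merit reported in this abstract, are related: RPD is commonly defined as the standard deviation of the reference values divided by the RMSEP. The sketch below assumes that definition with the sample standard deviation; the function names and the reference and predicted values are hypothetical and are not data from the study.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean square error of prediction over a validation set."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def residual_predictive_deviation(reference, predicted):
    """RPD = sample standard deviation of the reference values / RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)


# Hypothetical reference and predicted contents, invented for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
error = rmsep(reference, predicted)
rpd = residual_predictive_deviation(reference, predicted)
print(round(error, 4), round(rpd, 2))
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values.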
Affiliation(s)
- Sijun Wu
- College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; State key Laboratory of Component-based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
- Long Wang
- College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; State key Laboratory of Component-based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
- Guoming Zhou
- College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; State key Laboratory of Component-based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
- Chao Liu
- Shandong wisdom instrument Co., Ltd., Jinan 250000, China
- Zhongrui Ji
- Shandong wisdom instrument Co., Ltd., Jinan 250000, China
- Zheng Li
- College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; State key Laboratory of Component-based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; Haihe Laboratory of Modern Chinese Medicine, Tianjin 301617, China
- Wenlong Li
- College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; State key Laboratory of Component-based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; Haihe Laboratory of Modern Chinese Medicine, Tianjin 301617, China.
32
Behrouz MS, Yazdi MN, Sample DJ. Using Random Forest, a machine learning approach to predict nitrogen, phosphorus, and sediment event mean concentrations in urban runoff. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2022; 317:115412. [PMID: 35649331 DOI: 10.1016/j.jenvman.2022.115412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 05/22/2022] [Accepted: 05/24/2022] [Indexed: 06/15/2023]
Abstract
Estimating pollutant loads from developed watersheds is vitally important for reducing nonpoint source pollution from urban areas, because a key tool in meeting water quality goals is the implementation of Stormwater Control Measures (SCMs). SCMs are selected and sized based on influent pollutant loads. A common method used to estimate pollutant loads in urban runoff is the Event Mean Concentration (EMC) method. In this study, we develop and apply data-driven models using Random Forest (RF), a machine learning approach, to predict Total Nitrogen (TN), Total Phosphorus (TP), Total Suspended Solids (TSS), and Ortho-Phosphorus (Ortho-P) EMCs in urban runoff. The parameters considered in this study were climatological characteristics (i.e., Antecedent Dry Period or ADP, Precipitation Depth or P, Duration or D, and Intensity or I) and catchment characteristics, comprising land use-related parameters (Imperviousness or Imp, Saturated Hydraulic Conductivity or Ksat, and Available Water Capacity or AWC) and site-specific parameters (Slope or S, and Catchment Size or A). Stormwater quality data for this study were obtained from the National Stormwater Quality Database (NSQD), which is the largest repository of stormwater quality data in the U.S. Results demonstrate that land use-related characteristics (i.e., Imp, Ksat, and AWC) were the most effective variables for predicting all EMCs. For TP, TSS, and Ortho-P, site-specific characteristics (S and A) had a greater effect than climatological characteristics (i.e., ADP, P, D, and I). However, for TN, climatological characteristics had a greater effect than site-specific characteristics (S and A). In addition, for TN, TP, and TSS, precipitation characteristics (P, D, and I) were found to be more effective parameters for estimating EMCs than ADP. This study highlights the most influential parameters affecting EMCs, which can be used by stakeholders and SCM designers to improve estimates of nutrient and sediment EMCs. The selection and design of the highest performing SCMs is essential in achieving effective treatment of stormwater, attaining water quality goals, and protecting downstream waterbodies.
Affiliation(s)
- Mina Shahed Behrouz
- Department of Biological System Engineering, Virginia Polytechnic Institute and State University, Seitz Hall, 155 Ag-Quad Ln, Blacksburg, VA, 24060, United States; Hampton Roads Agricultural Research and Extension Center, Virginia Polytechnic and State University, 1444 Diamond Springs Rd, Virginia Beach, VA, 23455, United States.
| | - Mohammad Nayeb Yazdi
- Department of Biological System Engineering, Virginia Polytechnic Institute and State University, Seitz Hall, 155 Ag-Quad Ln, Blacksburg, VA, 24060, United States; Hampton Roads Agricultural Research and Extension Center, Virginia Polytechnic and State University, 1444 Diamond Springs Rd, Virginia Beach, VA, 23455, United States.
| | - David J Sample
- Department of Biological System Engineering, Virginia Polytechnic Institute and State University, Seitz Hall, 155 Ag-Quad Ln, Blacksburg, VA, 24060, United States.
| |
|
33
|
A Machine Learning-Based Surrogate Model for the Identification of Risk Zones Due to Off-Stream Reservoir Failure. WATER 2022. [DOI: 10.3390/w14152416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Approximately 70,000 Spanish off-stream reservoirs, many of them irrigation ponds, need to be evaluated in terms of their potential hazard to comply with the new national Regulation of the Hydraulic Public Domain. This requires a great engineering effort to evaluate different scenarios with two-dimensional hydraulic models, for which many owners lack the necessary resources. This work presents a simplified methodology based on machine learning to identify risk zones at any point in the vicinity of an off-stream reservoir without the need to build and run full two-dimensional hydraulic models. A predictive model based on random forest was created from datasets containing the results of synthetic cases computed with an automatic tool based on the two-dimensional numerical software Iber. Once fitted, the model provided an estimate of the potential hazard considering the physical characteristics of the structure, the surrounding terrain and the vulnerable locations. Two approaches were compared for balancing the dataset: synthetic minority oversampling and random undersampling. Results from the random forest model adjusted with the random undersampling technique proved useful for the estimation of risk zones. In a real application test, the simplified method achieved 91% accuracy.
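As a rough illustration of the random undersampling step mentioned in this abstract, the following Python sketch balances a labelled dataset by keeping, for every class, a random subset equal in size to the smallest class. The function name `random_undersample`, the list-based data layout, the fixed seed, and the toy labels are assumptions made for this illustration; the abstract does not describe the authors' implementation.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Return a class-balanced subset by randomly dropping majority-class rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is cut down to the size of the smallest class.
    n_keep = min(len(members) for members in by_class.values())

    balanced_rows, balanced_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_keep):
            balanced_rows.append(row)
            balanced_labels.append(label)
    return balanced_rows, balanced_labels


# Hypothetical toy data: 8 "safe" points and 2 "risk" points.
rows = [[i, i * 2] for i in range(10)]
labels = ["safe"] * 8 + ["risk"] * 2
X_bal, y_bal = random_undersample(rows, labels)
print(sorted(y_bal))
```

Undersampling discards majority-class data, whereas synthetic minority oversampling creates new minority-class points instead; which one works better depends on the dataset.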
|
34
|
Gao Z, Xia R, Zhang P. Prediction of anti-proliferation effect of [1,2,3]triazolo[4,5-d]pyrimidine derivatives by random forest and mix-kernel function SVM with PSO. Chem Pharm Bull (Tokyo) 2022; 70:684-693. [PMID: 35922903 DOI: 10.1248/cpb.c22-00376] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]
Abstract
To predict the anti-gastric cancer effect of [1,2,3]triazolo[4,5-d]pyrimidine derivatives (1,2,3-TPD), quantitative structure-activity relationship (QSAR) studies were performed. Based on five descriptors selected from a descriptor pool, four QSAR models were established using the heuristic method (HM), random forest (RF), support vector machine with radial basis kernel function (RBF-SVM), and mix-kernel function support vector machine (MIX-SVM), which combines radial basis and polynomial kernel functions. Furthermore, the model built by RF explained the importance of the descriptors selected by HM. Compared with RBF-SVM, the MIX-SVM enhanced both the generalization and the learning ability of the constructed model, and the multi-parameter optimization problem in this method was solved by the particle swarm optimization (PSO) algorithm with very low complexity and fast convergence. In addition, leave-one-out cross validation (LOO-CV) was adopted to test the robustness of the models, and Q2 was used to describe the results. The MIX-SVM model showed the best prediction ability and the strongest robustness: R2 = 0.927, Q2 = 0.916, MSE = 0.027 for the training set and R2 = 0.946, Q2 = 0.913, MSE = 0.023 for the test set. This study reveals five key descriptors of 1,2,3-TPD and will help to screen for efficient and novel drugs in the future.
Affiliation(s)
- Zhan Gao
- College of Computer Science and Technology, Qingdao University
| | - Runze Xia
- College of Computer Science and Technology, Qingdao University
| | - Peijian Zhang
- College of Computer Science and Technology, Qingdao University
| |
|
35
|
Dobson B, Barry S, Maes-Prior R, Mijic A, Woodward G, Pearse WD. Predicting catchment suitability for biodiversity at national scales. WATER RESEARCH 2022; 221:118764. [PMID: 35752096 DOI: 10.1016/j.watres.2022.118764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2022] [Revised: 06/11/2022] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Biomonitoring of water quality and catchment management are often disconnected, due to mismatching scales. Considerable effort and money are spent each year on routine reach-scale surveying across many sites, particularly in countries like the UK, where nationwide sampling has been conducted using standardised techniques for many decades. Most of these traditional freshwater biomonitoring schemes focus on pre-defined indicators of organic pollution to compare observed vs expected subsets of common macroinvertebrate indicator species. Other taxa, including many threatened species, are often ignored due to their rarity, as are many invasive species, which are seen as undesirable despite becoming increasingly common and widespread in freshwaters, especially in urban ecosystems. Both these types of taxa are often monitored separately for reasons related to biodiversity concerns rather than for gauging water quality. Repurposing such data could therefore provide important new biomonitoring tools that can help catchment managers to directly link the water quality they aim to control with the biodiversity they are trying to protect. Here we used extensive data held in the England Non-Native and Rare/Protected species records that track these two groups of species as a proof-of-concept for linking catchment scale management of freshwater ecosystems and biodiversity to a range of potential drivers across England. We used national land use (Centre for Ecology and Hydrology land cover map) and water quality indicator (Environment Agency water quality data archive) datasets to predict, at the catchment scale, the presence or absence of 48 focal threatened or invasive species of concern routinely sampled by the English Environment Agency, with a median accuracy of 0.81 area under the receiver operating characteristic curve. 
A variety of water quality indicators and land-use types were useful in predictions, highlighting that future biomonitoring schemes could use such complementary measures to capture a wider spectrum of drivers and responses. In particular, the percentage of a catchment covered by freshwater was the single most important metric, reinforcing the need for space/habitat to support biodiversity, but we were also able to resolve a range of key environmental drivers for particular focal species. We show how our method could inform new catchment management approaches, by highlighting how key relationships can be identified and how to understand, visualise and prioritise catchments that are most suitable for restoration or water quality interventions. The scale of this work, in terms of number of species, drivers and locations, represents a significant step towards forging a new approach to catchment management that enables managers to link drivers they can control (water quality and land use) to the biota they are trying to protect (biodiversity).
Affiliation(s)
- Barnaby Dobson
- Department of Civil and Environmental Engineering, Faculty of Engineering, Imperial College London.
| | - Saoirse Barry
- Department of Civil and Environmental Engineering, Faculty of Engineering, Imperial College London
| | - Robin Maes-Prior
- Department of Civil and Environmental Engineering, Faculty of Engineering, Imperial College London
| | - Ana Mijic
- Department of Civil and Environmental Engineering, Faculty of Engineering, Imperial College London
| | - Guy Woodward
- Georgina Mace Centre for the Living Planet, Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, Berkshire SL5 7PY, U.K
| | - William D Pearse
- Georgina Mace Centre for the Living Planet, Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, Berkshire SL5 7PY, U.K
| |
|
36
|
Wang J, Lu J, Zhang Z, Han X, Zhang C, Chen X. Agricultural non-point sources and their effects on chlorophyll-a in a eutrophic lake over three decades (1985-2020). ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:46634-46648. [PMID: 35171419 DOI: 10.1007/s11356-022-19220-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2021] [Accepted: 02/10/2022] [Indexed: 06/14/2023]
Abstract
Erhai Lake is the second largest freshwater lake in Yunnan Province but suffers from deteriorating water quality and agricultural non-point source pollution (ANPSP). However, little is known about the influence of ANPSP on the water quality of Erhai Lake. The export coefficient model (ECM) was used to obtain the total nitrogen (TN) and total phosphorus (TP) loads from ANPSP in Erhai Lake Basin (ELB). The trophic status of Erhai Lake as influenced by such sources of nutrient input was also assessed. Results indicated that the TN and TP loads in ELB increased from 1985 to 2005 due to sustainable agricultural development; thereafter, the TN and TP loads decreased from 2005 to 2020, indicating that agricultural pollution prevention improved in ELB. The northern part of ELB had higher pollution intensity than the southern and central parts, indicating that the ecosystem in the northern part of ELB appeared to be vulnerable. Driving force analysis showed that cattle breeding was the main reason for the exported TN loads in most watersheds, and intensive agricultural planting was the major contributor to TP loads. The mean annual Chl-a concentration had a strong correlation with the TN and TP loads exported from the northern part of ELB, and this finding suggested that ANPSP could lead to eutrophication. The results of this study demonstrate the impacts of agricultural activities on water quality at the watershed scale and provide a scientific foundation for lake management decision-making.
Affiliation(s)
- Jialin Wang
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China
| | - Jianzhong Lu
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China
| | - Zhan Zhang
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China
| | - Xingxing Han
- Institute of Surface-Earth System Science, School of Earth System Science, Tianjin University, Tianjin, 300072, China.
- Tianjin Key Laboratory of Earth Critical Zone Science and Sustainable Development in Bohai Rim, Tianjin University, Tianjin, 300072, China.
| | - Chen Zhang
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China
| | - Xiaoling Chen
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China.
| |
|
37
|
Application of Machine Learning and Process-Based Models for Rainfall-Runoff Simulation in DuPage River Basin, Illinois. HYDROLOGY 2022. [DOI: 10.3390/hydrology9070117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]
Abstract
Rainfall-runoff simulation is vital for flood planning and control. Hydrology modeling using the Hydrological Engineering Center Hydrologic Modeling System (HEC-HMS) is accepted globally for event-based or continuous simulation of the rainfall-runoff process. Similarly, machine learning is a fast-growing discipline that offers numerous alternatives suitable for hydrology research’s high demands and limitations. Conventional and process-based models such as HEC-HMS are typically created at specific spatiotemporal scales and do not easily accommodate diversified and complex input parameters. Therefore, in this research, the effectiveness of Random Forest, a machine learning model, was compared with HEC-HMS for the rainfall-runoff process. We also performed a hydraulic simulation in the Hydrological Engineering Center Geospatial River Analysis System (HEC-RAS) using the input discharge obtained from the Random Forest model. The reliability of the Random Forest model and the HEC-HMS model was evaluated using different statistical indexes. The coefficient of determination (R2), standard deviation ratio (RSR), and normalized root mean square error (NRMSE) were 0.94, 0.23, and 0.17 for the training data and 0.72, 0.56, and 0.26 for the testing data, respectively, for the Random Forest model. Similarly, the R2, RSR, and NRMSE were 0.99, 0.16, and 0.06 for the calibration period and 0.96, 0.35, and 0.10 for the validation period, respectively, for the HEC-HMS model. The Random Forest model slightly underestimated peak discharge values, whereas the HEC-HMS model slightly overestimated the peak discharge value. Statistical index values illustrated the good performance of the Random Forest and HEC-HMS models, which revealed the suitability of both models for hydrology analysis. In addition, HEC-RAS driven by the Random Forest predicted discharge underestimated the flood depth during the peak flooding event. This result indicates that HEC-HMS could compensate for Random Forest's underestimation of peak discharge and flood depth during extreme events. In conclusion, the integrated machine learning and physical-based model can provide more confidence in rainfall-runoff and flood depth prediction.
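The three statistical indexes named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: R2 is computed here as 1 minus the ratio of residual to total sum of squares, RSR as RMSE divided by the standard deviation of the observations, and NRMSE as RMSE divided by the mean of the observations. The abstract does not state which variants the authors used (NRMSE, for example, is sometimes normalized by the observed range instead), and the discharge values below are invented.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between observed and simulated values."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def r_squared(observed, simulated):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed variant)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rsr(observed, simulated):
    """RMSE divided by the (population) standard deviation of the observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    return rmse(observed, simulated) / sd_obs


def nrmse(observed, simulated):
    """RMSE normalized by the mean of the observations (assumed normalization)."""
    return rmse(observed, simulated) / (sum(observed) / len(observed))


# Invented discharge values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(r_squared(observed, simulated), rsr(observed, simulated), nrmse(observed, simulated))
```

With these particular definitions, R2 equals 1 minus RSR squared, so the two indexes carry the same information; reported values that do not satisfy this relation imply that a different R2 definition (such as the squared correlation coefficient) was used.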
|
38
|
Segmentation of PMSE Data Using Random Forests. REMOTE SENSING 2022. [DOI: 10.3390/rs14132976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Abstract
EISCAT VHF radar data are used for observing, monitoring, and understanding Earth’s upper atmosphere. This paper presents an approach to segment Polar Mesospheric Summer Echoes (PMSE) from datasets obtained from EISCAT VHF radar data. The data consist of 30 observation days, corresponding to 56,250 data samples. We manually labeled the data into three different categories: PMSE, Ionospheric background, and Background noise. For segmentation, we employed random forests on a set of simple features. These features include altitude derivative, time derivative, mean, median, standard deviation, minimum, and maximum values corresponding to neighborhood sizes ranging from 3 by 3 to 11 by 11 pixels. Next, in order to reduce the model bias and variance, we employed a method that decreases the weight applied to pixel labels with large uncertainty. Our results indicate that, first, it is possible to segment PMSE from the data using random forests. Second, the weighted-down labels technique improves the performance of the random forests method.
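The neighborhood statistics listed in this abstract (mean, median, standard deviation, minimum, maximum) can be illustrated with a small Python sketch that computes them for one pixel and one window size. The function name `window_features`, the nested-list image layout, and the choice to clip the window at the image border are assumptions for illustration; the derivative features and the authors' actual border handling are not covered.

```python
import statistics


def window_features(image, row, col, size):
    """Summary statistics of the size-by-size neighborhood centred on (row, col)."""
    half = size // 2
    n_rows, n_cols = len(image), len(image[0])

    # Collect the pixel values inside the window, clipped to the image bounds.
    values = [
        image[r][c]
        for r in range(max(0, row - half), min(n_rows, row + half + 1))
        for c in range(max(0, col - half), min(n_cols, col + half + 1))
    ]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical 3-by-3 image patch.
image = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
features = window_features(image, 1, 1, 3)
print(features)
```

Repeating the call for each odd window size from 3 to 11 would give one feature group per neighborhood size, which could then be concatenated into the per-pixel feature vector passed to a classifier.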
|
39
|
Water Level Forecasting Using Deep Learning Time-Series Analysis: A Case Study of Red River of the North. WATER 2022. [DOI: 10.3390/w14121971] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
The Red River of the North is vulnerable to floods, which have caused significant damage and economic loss to inhabitants. A better capability in flood-event prediction is essential to decision-makers for planning flood-loss-reduction strategies. Over the last decades, classical statistical methods and Machine Learning (ML) algorithms have greatly contributed to the growth of data-driven forecasting systems that provide cost-effective solutions and improved performance in simulating the complex physical processes of floods using mathematical expressions. To improve flood prediction for the Red River of the North, this paper presents approaches that make use of a classical statistical method, a classical ML algorithm, and a state-of-the-art Deep Learning method. Respectively, the methods are seasonal autoregressive integrated moving average (SARIMA), Random Forest (RF), and Long Short-Term Memory (LSTM). We used hourly water level records from three U.S. Geological Survey (USGS) stations, at Pembina, Drayton, and Grand Forks, covering twelve years of data (2007–2019), to predict the water level six hours, twelve hours, one day, three days, and one week in advance. Pembina, at the downstream location, has a water level gauge but not a flow-gauging station, unlike the others. The floodwater-level-prediction results show that the LSTM method outperforms the SARIMA and RF methods. For the one-week-ahead prediction, the RMSE values for Pembina, Drayton, and Grand Forks are 0.190, 0.151, and 0.107, respectively. These results demonstrate the high precision of the Deep Learning algorithm as a reliable choice for flood-water-level prediction.
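One common way to frame multi-lead-time forecasting as a supervised learning problem is to pair a window of past values with the value a fixed number of steps ahead. The Python sketch below shows that framing; the function name `make_supervised`, the window length, and the toy series are assumptions for illustration, and the abstract does not say how the authors actually built their training samples.

```python
def make_supervised(series, n_lags, horizon):
    """Pair each window of n_lags past values with the value `horizon` steps ahead."""
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        window = series[start:start + n_lags]
        # The target lies `horizon` steps after the last value in the window.
        target = series[start + n_lags + horizon - 1]
        inputs.append(window)
        targets.append(target)
    return inputs, targets


# Hypothetical hourly water levels; with hourly data, horizon=6 means six hours ahead.
levels = [float(i) for i in range(10)]
X, y = make_supervised(levels, n_lags=3, horizon=6)
print(X, y)
```

For hourly records, the lead times listed in the abstract would correspond to horizons of 6, 12, 24, 72, and 168 steps, with one set of samples (and typically one fitted model) per horizon.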
|
40
|
Baratto PFB, Cecílio RA, de Sousa Teixeira DB, Zanetti SS, Xavier AC. Random forest for spatialization of daily evapotranspiration (ET0) in watersheds in the Atlantic Forest. ENVIRONMENTAL MONITORING AND ASSESSMENT 2022; 194:449. [PMID: 35606615 DOI: 10.1007/s10661-022-10110-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 05/15/2022] [Indexed: 06/15/2023]
Abstract
The importance of daily data on reference evapotranspiration (ET0) has increased in recent years due to its relevance in planning and decision making regarding irrigated agriculture, water production, and forest restoration. Given the scarcity of this information measured in loco, the study of interpolation methods capable of representing ET0 becomes important. Therefore, this study aimed to evaluate the adequacy of the Random Forest (RF) method in the spatialization of ET0 in the watersheds of the Mid-South region of the Espírito Santo State, located within the Atlantic Forest biome, Brazil. The study found that the RF method is more suitable for ET0 spatialization than the angular distance weighting (ADW) and inverse distance weighting (IDW) techniques. Also, the spatializations carried out by this method were transformed into databases in a grid format and made available online. Furthermore, the RF database was compared to other ET0 grid databases, and it was concluded that the RF database also performed better than the others.
Affiliation(s)
| | - Roberto Avelino Cecílio
- Department of Forest and Wood Sciences, Federal University of Espírito Santo, Jerônimo Monteiro, ES, 29550-000, Brazil
| | - David Bruno de Sousa Teixeira
- Department of Agricultural Engineering, Federal University of Viçosa, Avenue Peter Henry Rolfs, Viçosa, MG, 36570-900, Brazil.
| | - Sidney Sara Zanetti
- Department of Forest and Wood Sciences, Federal University of Espírito Santo, Jerônimo Monteiro, ES, 29550-000, Brazil
| | - Alexandre Cândido Xavier
- Department of Forest and Wood Sciences, Federal University of Espírito Santo, Jerônimo Monteiro, ES, 29550-000, Brazil
| |
|
41
|
Time Series Features for Supporting Hydrometeorological Explorations and Predictions in Ungauged Locations Using Large Datasets. WATER 2022. [DOI: 10.3390/w14101657] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Regression-based frameworks for streamflow regionalization are built around catchment attributes that traditionally originate from catchment hydrology, flood frequency analysis and their interplay. In this work, we deviated from this traditional path by formulating and extensively investigating the first regression-based streamflow regionalization frameworks that largely emerge from general-purpose time series features for data science and, more precisely, from a large variety of such features. We focused on 28 features that included (partial) autocorrelation, entropy, temporal variation, seasonality, trend, lumpiness, stability, nonlinearity, linearity, spikiness, curvature and others. We estimated these features for daily temperature, precipitation and streamflow time series from 511 catchments and then merged them within regionalization contexts with traditional topographic, land cover, soil and geologic attributes. Precipitation and temperature features (e.g., the spectral entropy, seasonality strength and lag-1 autocorrelation of the precipitation time series, and the stability and trend strength of the temperature time series) were found to be useful predictors of many streamflow features. The same applies to traditional attributes such as the catchment mean elevation. Relationships between predictor and dependent variables were also revealed, while the spectral entropy, the seasonality strength and several autocorrelation features of the streamflow time series were found to be more regionalizable than others.
|
42
|
Taghavi N, Niven RK, Paull DJ, Kramer M. Groundwater vulnerability assessment: A review including new statistical and hybrid methods. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 822:153486. [PMID: 35122861 DOI: 10.1016/j.scitotenv.2022.153486] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/10/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 06/14/2023]
Abstract
The concept of groundwater vulnerability was first introduced in the 1970s in France to recognize sensitive areas in which surface pollution could affect groundwater, and to enable others to develop management methods for groundwater protection against surface pollutants. Since this time, numerous methods have been developed for groundwater vulnerability assessment (GVA). These can be categorized into four groups: (i) overlay and index-based methods, (ii) process-based simulation models, (iii) statistical methods, and (iv) hybrid methods. This work provides a comprehensive review of modern GVA methods, which in contrast to previous reviews, examines the last two categories in detail. First, the concept of groundwater vulnerability is defined, then the major GVA methods are introduced and classified. This includes detailed accounts of statistical methods, which can be subdivided into orthodox statistical, data-driven and Bayesian methods, and their advantages and disadvantages, as well as modern hybrid methods. It is concluded that Bayesian inference offers many advantages compared with other GVA methods. It combines theory and data to give the posterior probabilities of different models, which can be continually updated with new data. Furthermore, using the Bayesian approach, it is possible to calculate the probability of a proposition, which is exactly what is needed to make decisions. However, despite the advantages of Bayesian inference, its applications to date have been very limited.
Affiliation(s)
- Nasrin Taghavi
- School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT 2600, Australia
| | - Robert K Niven
- School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT 2600, Australia.
| | - David J Paull
- School of Science, The University of New South Wales, Canberra, ACT 2600, Australia
| | - Matthias Kramer
- School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT 2600, Australia
| |
|
43
|
New hybrid GR6J-wavelet-based genetic algorithm-artificial neural network (GR6J-WGANN) conceptual-data-driven model approaches for daily rainfall–runoff modelling. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07372-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
44
|
Ho L, Jerves-Cobo R, Barthel M, Six J, Bode S, Boeckx P, Goethals P. Greenhouse gas dynamics in an urbanized river system: influence of water quality and land use. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:37277-37290. [PMID: 35048344 DOI: 10.1007/s11356-021-18081-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/22/2021] [Accepted: 12/08/2021] [Indexed: 06/14/2023]
Abstract
Rivers act as a natural source of greenhouse gases (GHGs). However, anthropogenic activities can substantially alter the chemical composition and microbial communities of rivers, consequently affecting their GHG production. To investigate these impacts, we assessed the accumulation of CO2, CH4, and N2O in an urban river system (Cuenca, Ecuador). High variation in dissolved GHG concentrations was found among river tributaries, depending mainly on water quality and land use. By using Prati and Oregon water quality indices, we observed a clear pattern between water quality and the dissolved GHG concentration: the more polluted the sites were, the higher their dissolved GHG concentrations. When river water quality deteriorated from acceptable to very heavily polluted, the mean values of pCO2 and dissolved CH4 increased by up to ten times, while N2O concentrations increased by 15 times. Furthermore, surrounding land-use types, i.e., urban, roads, and agriculture, could considerably affect GHG production in the rivers. In particular, the average pCO2 and dissolved N2O of the sites close to urban areas were almost four times higher than those of the natural sites, while this ratio was 25 times in the case of CH4, reflecting the finding that urban areas had the worst water quality, with almost 70% of their sites being polluted, whereas this proportion for natural areas was only 12.5%. Lastly, we identified dissolved oxygen, ammonium, and flow characteristics as the most important factors for GHG production by applying statistical analysis and random forests. These results highlight the impacts of land-use types on the production of GHGs in rivers contaminated by sewage discharges and surface runoff.
Affiliation(s)
- Long Ho
- Department of Animal Sciences, Ghent University, Ghent, Belgium.
| | - Ruben Jerves-Cobo
- Department of Animal Sciences, Ghent University, Ghent, Belgium
- PROMAS, Universidad de Cuenca, Cuenca, Ecuador
- Department of Data Analysis and Mathematical Modelling, BIOMATH, Ghent University, Ghent, Belgium
| | - Matti Barthel
- Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland
| | - Johan Six
- Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland
| | - Samuel Bode
- Department of Green Chemistry and Technology, Isotope Bioscience Laboratory - ISOFYS, Ghent University, Ghent, Belgium
| | - Pascal Boeckx
- Department of Green Chemistry and Technology, Isotope Bioscience Laboratory - ISOFYS, Ghent University, Ghent, Belgium
| | - Peter Goethals
- Department of Animal Sciences, Ghent University, Ghent, Belgium
| |
|
45
|
Tree Based Approaches for Predicting Concrete Carbonation Coefficient. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12083874] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/07/2022]
Abstract
Carbonation is one of the critical durability issues in reinforced concrete structures in terms of their structural integrity and safety and may cause the fatal deterioration and corrosion of steel reinforcement if ignored. Researchers have performed a considerable number of studies to predict the carbonation of concrete structures. However, it is still challenging to predict the carbonation depth or carbonation coefficient, as they depend on various factors. Therefore, creating a model that can learn from available data using Data Driven Techniques (DDT) is a step forward in this research field. This study provides new approaches to predict the carbonation coefficient of concrete through Model Tree (MT), Random Forest (RF) and Multi-Gene Genetic Programming (MGGP) approaches. With 827 case studies, the predicted models can be seen as a function of a set of conditioning factors, which are statistically significant in explaining the carbonation mechanism. The results obtained through MT, RF and MGGP were compared with those obtained through previously developed Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs) and Genetic Programming models. The results reveal that MT, RF and MGGP perform better than the previous models. Moreover, the MT technique displays its output as a series of equations, RF as multiple trees and MGGP in the form of a single equation, which are more user-friendly and applicable in practice.
|
46
|
Prediction of Rainfall in Australia Using Machine Learning. INFORMATION 2022. [DOI: 10.3390/info13040163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Meteorology is a field in which a large amount of data is generated and in which predicting events is difficult because of the large number of variables on which they depend. Probabilistic models are generally used for this purpose, but they offer predictions with a margin of error and in many cases do not perform very well. Under these conditions, machine learning algorithms can help to improve predictions. This article describes an exploratory study of the use of machine learning to predict rainfall. To do this, an example dataset describing rainfall measurements gathered in the main cities of Australia over the last 10 years was used, and some of the main machine learning algorithms were applied (k-nearest neighbors, decision tree, random forest, and neural networks). The results show that the best model is based on neural networks.
|
47
|
Water Information Extraction Based on Multi-Model RF Algorithm and Sentinel-2 Image Data. SUSTAINABILITY 2022. [DOI: 10.3390/su14073797] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]
Abstract
For Sentinel-2 multispectral satellite remote sensing imagery, traditional water body extraction methods cannot meet the needs of practical applications because of the rich spatial information in the images. In this study, a random forest-based optimal combination model, RF_16, is proposed to extract water bodies. The study used Sentinel-2 multispectral satellite images and DEM data as the basic data, collected 24 characteristic variables (B2, B3, B4, B8, B11, B12, NDVI, MSAVI, B5, B6, B7, B8A, NDI45, MCARI, REIP, S2REP, IRECI, PSSRa, NDWI, MNDWI, LSWI, DEM, SLOPE, SLOPE ASPECT), and constructed four combined models with different input variables. After analysis, RF_16 was determined to be the optimal combination model for extracting water body information in the study area. The results show that: (1) the characteristic variables with an important impact on model accuracy are the modified normalized difference water index (MNDWI), band B2 (Blue), the normalized difference water index (NDWI), B4 (Red), B3 (Green), and band B5 (Vegetation Red-Edge 1); (2) the water extraction accuracy of the optimal combined model RF_16 reaches 93.16%, with a Kappa coefficient of 0.8214. The overall accuracy is 0.12% higher than that of the traditional Relief F algorithm. The RF_16 method based on the optimal combination model of random forest is an effective means of obtaining high-precision water body information in the study area. It can effectively reduce the “salt and pepper effect” and the influence of mixed pixels, such as water and shadows, on water extraction accuracy.
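Two of the most important variables named above, NDWI and MNDWI, are simple band ratios. The following is a minimal sketch of how they are commonly computed per pixel, assuming the widely used definitions (green versus near-infrared for NDWI, green versus shortwave infrared for MNDWI) and the usual Sentinel-2 band assignments B3, B8 and B11. The abstract does not state which bands the authors used, and the reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndwi(green, nir):
    """NDWI, assumed here as green vs. near-infrared (Sentinel-2 B3, B8)."""
    return normalized_difference(green, nir)


def mndwi(green, swir1):
    """MNDWI, assumed here as green vs. shortwave infrared (Sentinel-2 B3, B11)."""
    return normalized_difference(green, swir1)


# Hypothetical surface reflectances for one pixel (not taken from the paper).
pixel = {"B3": 0.08, "B8": 0.03, "B11": 0.01}

print("NDWI :", round(ndwi(pixel["B3"], pixel["B8"]), 4))
print("MNDWI:", round(mndwi(pixel["B3"], pixel["B11"]), 4))
```

Both indices tend to be positive over open water and negative over vegetation or bare soil, which is consistent with their high importance in a water/non-water classifier.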
|
48
|
Barros DB, Cardoso SM, Oliveira E, Brentan B, Ribeiro L. Using data mining techniques to isolate chemical intrusion in water distribution systems. ENVIRONMENTAL MONITORING AND ASSESSMENT 2022; 194:203. [PMID: 35182211 DOI: 10.1007/s10661-022-09867-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Accepted: 02/05/2022] [Indexed: 06/14/2023]
Abstract
The security of water distribution systems has become the subject of an increasing volume of research over the last decade. Data analysis and machine learning are linked to hydraulic and quality modeling to improve the capacity of water utilities to save lives when faced with the contamination of water networks. This research applies k-nearest neighbor (KNN) and random forest algorithms to estimate the location of contamination sources in near-real time. Epanet and Epanet-MSX software are used to simulate intrusions of pesticide into a water distribution system and the interaction with compounds already present in the bulk water. Different pesticide concentrations are considered in the simulations, and chlorine is monitored through water quality sensors placed in the network. The results show that random forest can localize [Formula: see text] of contamination scenarios, while the KNN algorithm found [Formula: see text]. Finally, an assessment of contamination spread is made for a better understanding of the impacts of non-localized contamination.
Affiliation(s)
- Daniel Bezerra Barros
- Hydraulic Engineering and Water Resources Department - School of Engineering, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Eva Oliveira
- School of Technology, University of Campinas, Campinas, Brazil
- Bruno Brentan
- Hydraulic Engineering and Water Resources Department - School of Engineering, Federal University of Minas Gerais, Belo Horizonte, Brazil
|
49
|
Comparative Assessment of Individual and Ensemble Machine Learning Models for Efficient Analysis of River Water Quality. SUSTAINABILITY 2022. [DOI: 10.3390/su14031183] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 02/01/2023]
Abstract
The prediction accuracies of machine learning (ML) models may depend not only on the input parameters and training dataset, but also on whether an ensemble or individual learning model is selected. The present study compares individual supervised ML models, namely gene expression programming (GEP) and artificial neural network (ANN), with an ensemble learning model, random forest (RF), for predicting river water salinity in terms of electrical conductivity (EC) and total dissolved solids (TDS) in the Upper Indus River basin, Pakistan. The models were trained and tested using a dataset of seven input parameters chosen on the basis of significant correlation. The ensemble RF model was optimized by producing 20 sub-models in order to choose the most accurate one. The goodness-of-fit of the models was assessed through well-known statistical indicators, such as the coefficient of determination (R²), mean absolute error (MAE), root mean squared error (RMSE), and Nash–Sutcliffe efficiency (NSE). The results demonstrated a strong association between inputs and modeling outputs, with R² values of 0.96, 0.98, and 0.92 for the GEP, RF, and ANN models, respectively. The comparative performance of the proposed methods showed the relative superiority of RF over GEP and ANN. Among the 20 RF sub-models, the most accurate models yielded R² values of 0.941 and 0.938, with 70 and 160 estimators, respectively. The lowest RMSE values of 1.37 and 3.1 were yielded by the ensemble RF model on training and testing data, respectively. The results of the sensitivity analysis demonstrated that HCO₃⁻ is the most influential variable, followed by Cl⁻ and SO₄²⁻, for both EC and TDS. The assessment of the models on external criteria confirmed that the results of all the aforementioned techniques generalize. In conclusion, the outcome of the present research indicates that the RF model with selected key parameters could be prioritized for water quality assessment and management.
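The four goodness-of-fit indicators named above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: R² is taken here as the squared Pearson correlation (one common convention; the abstract does not say which definition was used), and the observed and simulated values are hypothetical.

```python
import math


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SS_res / SS_tot (1.0 is a perfect fit)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r_squared(obs, sim):
    """R^2 as the squared Pearson correlation (an assumed convention)."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


# Hypothetical data: the simulation is biased by a constant +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.5, 3.5, 4.5]

print("MAE :", mae(observed, simulated))
print("RMSE:", rmse(observed, simulated))
print("NSE :", nse(observed, simulated))
print("R^2 :", r_squared(observed, simulated))
```

With a constant bias, the correlation-based R² stays at 1 while NSE drops below 1, which is one reason studies report several indicators side by side.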
|
50
|
Abstract
Predictive uncertainty in hydrological modelling is quantified by using post-processing or Bayesian-based methods. The former methods are not straightforward and the latter ones are not distribution-free (i.e., assumptions on the probability distribution of the hydrological model’s output are necessary). To alleviate possible limitations related to these specific attributes, in this work we propose the calibration of the hydrological model by using the quantile loss function. By following this methodological approach, one can directly simulate pre-specified quantiles of the predictive distribution of streamflow. As a proof of concept, we apply our method in the frameworks of three hydrological models to 511 river basins in the contiguous US. We illustrate the predictive quantiles and show how an honest assessment of the predictive performance of the hydrological models can be made by using proper scoring rules. We believe that our method can help towards advancing the field of hydrological uncertainty.
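The quantile loss mentioned above (often called the pinball loss) penalizes under- and over-prediction asymmetrically, so minimizing it for a level τ yields the τ-quantile. The sketch below illustrates this with a constant predictor standing in for the hydrological model. That stand-in, the function names, and the toy streamflow values are all assumptions for illustration, not the authors' setup.

```python
def quantile_loss(obs, pred, tau):
    """Mean pinball loss at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for y, q in zip(obs, pred):
        u = y - q
        # Under-prediction (u >= 0) costs tau * u; over-prediction costs (1 - tau) * |u|.
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(obs)


def best_constant(obs, tau):
    """Pick the observed value that minimizes the pinball loss as a constant forecast."""
    return min(obs, key=lambda c: quantile_loss(obs, [c] * len(obs), tau))


# Hypothetical streamflow observations (arbitrary units).
flows = [float(v) for v in range(11)]  # 0.0, 1.0, ..., 10.0

for tau in (0.1, 0.5, 0.9):
    print("tau =", tau, "-> best constant forecast:", best_constant(flows, tau))
```

In a calibration setting the constant would be replaced by the model's simulated streamflow, with the model parameters chosen to minimize the same loss for each pre-specified τ.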
|