1
Nong X, Lai C, Chen L, Wei J. A novel coupling interpretable machine learning framework for water quality prediction and environmental effect understanding in different flow discharge regulations of hydro-projects. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 950:175281. [PMID: 39117235 DOI: 10.1016/j.scitotenv.2024.175281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 08/01/2024] [Accepted: 08/02/2024] [Indexed: 08/10/2024]
Abstract
Machine learning models (MLMs) have been increasingly used to forecast water pollution. However, their "black box" nature, which obscures the underlying mechanistic processes, still limits the applicability of MLMs to water quality management in hydro-projects under complex and frequent artificial regulation. This study proposes an interpretable machine learning framework for water quality prediction that couples a hydrodynamic (flow discharge) scenario-based Random Forest (RF) model with multiple model-agnostic techniques and quantifies global, local, and joint interpretations (i.e., partial dependence, individual conditional expectation, and accumulated local effects) of environmental factor implications. The framework was applied and verified by predicting the permanganate index (CODMn) under different flow discharge regulation scenarios in the Middle Route of the South-to-North Water Diversion Project of China (MRSNWDPC). Data matrices covering 4664 sampling cases, including water quality, meteorological, and hydrological indicators from eight national stations along the main canal of the MRSNWDPC, were collected from May 2019 to December 2020. The results showed that the RF models were effective in forecasting CODMn in all flow discharge scenarios, with a mean square error, coefficient of determination, and mean absolute error of 0.006-0.026, 0.481-0.792, and 0.069-0.104, respectively, on the testing dataset. A global interpretation indicated that dissolved oxygen, flow discharge, and surface pressure are the three most important variables for predicting CODMn. Local and joint interpretations indicated that the RF-based prediction model provides a basic understanding of the physical mechanisms of environmental systems. The proposed framework can effectively learn the fundamental environmental implications of water quality variations and provide reliable prediction performance, highlighting the importance of model interpretability for trustworthy machine learning applications in water management projects. This study provides scientific references for applying advanced data-driven MLMs to water quality forecasting and a reliable methodological framework for water quality management and similar hydro-projects.
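The partial dependence and individual conditional expectation (ICE) interpretations named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_model`, the feature names, and the sample rows are hypothetical stand-ins for a fitted RF model and its monitoring data.

```python
from statistics import mean


def ice_curves(predict, rows, feature, grid):
    """Return one ICE curve per row: predictions as `feature` sweeps `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """Partial dependence is the pointwise average of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [mean(curve[j] for curve in curves) for j in range(len(grid))]


# Hypothetical stand-in for a fitted model (assumption, not from the paper).
def toy_model(row):
    return 3.0 - 0.1 * row["dissolved_oxygen"] + 0.002 * row["flow"]


rows = [
    {"dissolved_oxygen": 8.0, "flow": 100.0},
    {"dissolved_oxygen": 9.0, "flow": 200.0},
]
grid = [6.0, 8.0, 10.0]
ice = ice_curves(toy_model, rows, "dissolved_oxygen", grid)
pdp = partial_dependence(toy_model, rows, "dissolved_oxygen", grid)
```

Each ICE curve is a local interpretation for one sample, while the averaged curve gives the global view; accumulated local effects follow a different construction and are not covered by this sketch.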
Affiliation(s)
- Xizhi Nong
  - College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China; State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China; Centre for Urban Sustainability and Resilience, Department of Civil, Environmental and Geomatic Engineering, University College London, London WC1E 6BT, UK; School of Computing and Engineering, University of West London, London W5 5RF, UK
- Cheng Lai
  - College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China
- Lihua Chen
  - College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China
- Jiahua Wei
  - State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China
2
Clements E, Thompson KA, Hannoun D, Dickenson ERV. Classification machine learning to detect de facto reuse and cyanobacteria at a drinking water intake. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 948:174690. [PMID: 38992351 DOI: 10.1016/j.scitotenv.2024.174690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 06/25/2024] [Accepted: 07/08/2024] [Indexed: 07/13/2024]
Abstract
Harmful algal blooms (HABs) or higher levels of de facto water reuse (DFR) can increase the levels of certain contaminants at drinking water intakes. Therefore, the goal of this study was to use multi-class supervised machine learning (SML) classification with data collected from six online instruments measuring fourteen total water quality parameters to detect cyanobacteria (corresponding to approximately 950 cells/mL, 2900 cells/mL, and 8600 cells/mL) or DFR (0.5, 1 and 2 % of wastewater effluent) events in the raw water entering an intake. Among 56 screened models from the caret package in R, four (mda, LogitBoost, bagFDAGCV, and xgbTree) were selected for optimization. mda had the greatest testing set accuracy, 98.09 %, after optimization with 7 false alerts. Some of the most important water parameters for the different models were phycocyanin-like fluorescence, UVA254, and pH. SML could detect algae blending events (estimated <9000 cells/mL) due in part to the phycocyanin-like fluorescence sensor. UVA254 helped identify higher concentrations of DFR. These results show that multi-class SML classification could be used at drinking water intakes in conjunction with online instrumentation to detect and differentiate HABs and DFR events. This could be used to create alert systems for the water utilities at the intake, rather than the finished water, so any adjustment to the treatment process could be implemented.
Affiliation(s)
- Emily Clements
  - Southern Nevada Water Authority, 1299 Burkholder Blvd., Henderson, NV 89015, USA
- Kyle A Thompson
  - Southern Nevada Water Authority, 1299 Burkholder Blvd., Henderson, NV 89015, USA; Carollo Engineers, Inc., 10900 Stonelake Blvd Bldg 2 Ste 126, Austin, TX 78759, USA
- Deena Hannoun
  - Southern Nevada Water Authority, 1299 Burkholder Blvd., Henderson, NV 89015, USA
- Eric R V Dickenson
  - Southern Nevada Water Authority, 1299 Burkholder Blvd., Henderson, NV 89015, USA
3
Wang C, Wang Q, Ben W, Qiao M, Ma B, Bai Y, Qu J. Machine learning predicts the growth of cyanobacterial genera in river systems and reveals their different environmental responses. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 946:174383. [PMID: 38960197 DOI: 10.1016/j.scitotenv.2024.174383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 03/04/2024] [Accepted: 06/28/2024] [Indexed: 07/05/2024]
Abstract
Cyanobacterial blooms are a common and serious problem in global freshwater environments. However, the response mechanisms of various cyanobacterial genera to multiple nutrients and pollutants, as well as the factors driving their competitive dominance, remain unclear or controversial. The relative abundance and cell density of two dominant cyanobacterial genera (i.e., Cyanobium and Microcystis) in river ecosystems along a gradient of anthropogenic disturbance were predicted from physicochemical indices using random forest with post hoc interpretation. Results showed that the optimized predictions all achieved strong fits with R2 > 0.75, and conventional water quality indices played a dominant role. One-dimensional and two-dimensional partial dependence plots (PDPs) revealed that the responses of Cyanobium and Microcystis to nutrients and temperature were similar, but that they differed in preferable nutrient utilization and in their response to pollutants. Further prediction and PDPs for the ratio of Cyanobium to Microcystis unveiled that their distinct responses to PAHs and SPAHs were crucial drivers of their competitive dominance over each other. This study presents a new way of analyzing the response of cyanobacterial genera to multiple environmental factors and their dominance relationships through interpretable machine learning, which is suitable for the identification and interpretation of high-dimensional nonlinear ecosystems with complex interactions.
Affiliation(s)
- Chenchen Wang
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China; Tianjin Key Laboratory of Aquatic Science and Technology, Tianjin Chengjian University, Tianjin 300384, China
- Qiaojuan Wang
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Weiwei Ben
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Meng Qiao
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Baiwen Ma
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Yaohui Bai
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Jiuhui Qu
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
4
Kim JH, Byeon S, Lee H, Lee DH, Lee MY, Shin JK, Chon K, Jeong DS, Park Y. Deep-learning and data-resampling: A novel approach to predict cyanobacterial alert levels in a reservoir. ENVIRONMENTAL RESEARCH 2024:120135. [PMID: 39393456 DOI: 10.1016/j.envres.2024.120135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 10/05/2024] [Accepted: 10/08/2024] [Indexed: 10/13/2024]
Abstract
The proliferation of harmful algal blooms results in adverse impacts on aquatic ecosystems and public health. Early warning systems monitor algal bloom occurrences and provide management strategies for promptly addressing high-concentration algal blooms following their occurrence. In this study, we aimed to develop a proactive prediction model for cyanobacterial alert levels to enable efficient decision-making in management practices. We utilized 11 years of water quality, hydrodynamic, and meteorological data from a reservoir that experiences frequent harmful cyanobacterial blooms in summer. We used these data to construct a deep-learning model, specifically a 1D convolutional neural network (1D-CNN) model, to predict cyanobacterial alert levels one week in advance. However, the distribution of algal alert levels in the collected data was imbalanced, leading to the biased training of data-driven models and performance degradation in model predictions. Therefore, an adaptive synthetic sampling method was applied to address the imbalance in the minority class data and improve the predictive performance of the 1D-CNN. The adaptive synthetic sampling method resolved the imbalance in the data during the training phase by incorporating an additional 156 and 196 data points for the caution and warning levels, respectively. The selected optimal 1D-CNN model with a filter size of 5 and comprising 16 filters achieved training and testing prediction accuracies of 97.3% and 85.0%, respectively. During the test phase, the prediction accuracies for each algal alert level (L-0, L-1, and L-2) were 89.9%, 79.2%, and 71.4%, respectively, indicating reasonably consistent predictive results for all three alert levels. Therefore, the use of synthetic data addressed data imbalances and enhanced the predictive performance of the data-driven model. The reliable forecasts produced by the improved model can support the development of management strategies to mitigate harmful algal blooms in reservoirs and can aid in building an early warning system to facilitate effective responses.
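The adaptive synthetic sampling step described in this abstract can be sketched as follows. This is a simplified illustration of the general ADASYN idea (more synthetic points near minority samples that are surrounded by majority samples), not the authors' pipeline; the function name `adasyn`, its parameters, and the toy points are all assumptions made for the example.

```python
import math
import random


def adasyn(minority, majority, n_new, k=3, seed=0):
    """Simplified ADASYN: return roughly n_new synthetic minority points.

    Assumes at least two minority points; points are tuples of floats.
    Rounding per point means the total can differ slightly from n_new.
    """
    rng = random.Random(seed)
    # Pool every point with its label (1 = minority, 0 = majority).
    pool = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # Step 1: hardness of each minority point = share of majority points
    # among its k nearest neighbours in the pooled data.
    hardness = []
    for i, p in enumerate(minority):
        others = [pool[j] for j in range(len(pool)) if j != i]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        hardness.append(sum(1 for _, label in nearest if label == 0) / k)

    # Step 2: normalise hardness into weights (uniform if no point is hard).
    total = sum(hardness)
    if total > 0:
        weights = [h / total for h in hardness]
    else:
        weights = [1 / len(minority)] * len(minority)

    # Step 3: each minority point receives its share of synthetic points,
    # interpolated towards a randomly chosen nearby minority neighbour.
    synthetic = []
    for i, (p, w) in enumerate(zip(minority, weights)):
        mates = [q for j, q in enumerate(minority) if j != i]
        mates = sorted(mates, key=lambda q: math.dist(p, q))[:k]
        for _ in range(round(w * n_new)):
            mate = rng.choice(mates)
            t = rng.random()
            synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, mate)))
    return synthetic


# Toy data (hypothetical): three minority points, four majority points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(0.5, 0.5), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
synthetic = adasyn(minority, majority, n_new=6)
```

Because every synthetic point is an interpolation between two minority samples, the new points stay inside the region the minority class already occupies.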
Affiliation(s)
- Jin Hwi Kim
  - Future and Fusion Lab of Architectural Civil and Environmental Engineering, Korea University, Seoul 02841, Republic of Korea
- Seohyun Byeon
  - School of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul 05029, Republic of Korea
- Hankyu Lee
  - School of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul 05029, Republic of Korea
- Dong Hoon Lee
  - Department of Civil and Environmental Engineering, Dongguk University-Seoul, 30, Pildong-ro 1-gil, Jung-gu, Seoul, 04620, Republic of Korea
- Min-Yong Lee
  - Division of Chemical Research, National Institute of Environmental Research, Seogu, Incheon 22689, Republic of Korea
- Jae-Ki Shin
  - Limnoecological Science Research Institute Korea THE HANGANG, Gyeongnam 50440, Republic of Korea
- Kangmin Chon
  - Department of Environmental Engineering, Kangwon National University, Gangwon-do 24341, Republic of Korea; Department of Integrated Energy and Infra System, Kangwon National University, Gangwon-do 24341, Republic of Korea
- Dae Seong Jeong
  - Future and Fusion Lab of Architectural Civil and Environmental Engineering, Korea University, Seoul 02841, Republic of Korea
- Yongeun Park
  - School of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul 05029, Republic of Korea
5
Su T, Xu L, Liu X, Cui X, Lei B, Di J, Xie T. Study on the applicability of FAI linear fitting model in the extraction of cyanobacterial blooms. ENVIRONMENTAL MONITORING AND ASSESSMENT 2024; 196:909. [PMID: 39249606 DOI: 10.1007/s10661-024-13082-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 08/31/2024] [Indexed: 09/10/2024]
Abstract
Currently, more and more lakes around the world are experiencing outbreaks of cyanobacterial blooms, and high-precision, rapid monitoring of the spatial distribution of algae in water bodies is an important task. Remote sensing technology is one of the effective means of monitoring algae in water bodies. Studies have shown that the Floating Algae Index (FAI) is superior to methods such as the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) in monitoring cyanobacterial blooms. However, compared to the NDVI method, the FAI method has difficulty in determining the threshold, and choosing the threshold with the highest classification accuracy is challenging. In this study, the FAI linear fitting model (FAI-L) is selected to address the difficulty of determining the FAI threshold: it combines the FAI and the NDVI and uses the NDVI to find the FAI threshold. To analyze the applicability of FAI-L to the extraction of cyanobacterial blooms, this paper selected multi-temporal Landsat8, HJ-1B, and Sentinel-2 remote sensing images as data sources and took Chaohu Lake and Taihu Lake in China as research areas. The results show that (1) the accuracy of extracting cyanobacterial blooms by the FAI-L method is generally higher than that by NDVI and FAI. Under different data sources and different research areas, the average accuracy of extracting cyanobacterial blooms by the FAI-L method is 95.13%, which is 6.98% and 18.43% higher than that by NDVI and FAI, respectively. (2) The average accuracy of the FAI-L method for extracting cyanobacterial blooms varies from 84.09 to 99.03%, with a standard deviation of 4.04, indicating high stability and applicability. (3) For simultaneous multi-source image data, the FAI-L method has the highest average accuracy in extracting cyanobacterial blooms, at 95.93%, which is 6.77% and 13.26% higher than the NDVI and FAI methods, respectively. This paper finds that the FAI-L method shows high accuracy and stability in extracting cyanobacterial blooms and can extract their spatial distribution well, providing a new method for monitoring cyanobacterial blooms.
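The two indices combined in this abstract can be written down directly, and the idea of deriving an FAI threshold from an NDVI threshold can be sketched with an ordinary least-squares line. The abstract does not give the details of FAI-L, so the fitting step below is only one plausible reading; the default band-centre wavelengths (approximate Landsat 8 OLI values in nm) and all function names are assumptions.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and NIR reflectance."""
    return (nir - red) / (nir + red)


def fai(red, nir, swir, wl_red=655.0, wl_nir=865.0, wl_swir=1609.0):
    """Floating Algae Index: NIR reflectance above a red-to-SWIR baseline.

    The default wavelengths are assumed approximate band centres in nm.
    """
    baseline = red + (swir - red) * (wl_nir - wl_red) / (wl_swir - wl_red)
    return nir - baseline


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fai_threshold_from_ndvi(pixels, ndvi_threshold):
    """Map an NDVI threshold to an FAI threshold via the fitted line.

    `pixels` is a list of (red, nir, swir) reflectance triples.
    """
    xs = [ndvi(red, nir) for red, nir, _ in pixels]
    ys = [fai(red, nir, swir) for red, nir, swir in pixels]
    slope, intercept = fit_line(xs, ys)
    return slope * ndvi_threshold + intercept
```

Pixels whose FAI exceeds the returned value would then be labelled as bloom; how the NDVI threshold itself is chosen is outside this sketch.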
Affiliation(s)
- Tao Su
  - School of Spatial Information and Geomatics Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Liangquan Xu
  - School of Spatial Information and Geomatics Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Xinbei Liu
  - School of Spatial Information and Geomatics Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Xingyuan Cui
  - School of Spatial Information and Geomatics Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Bo Lei
  - Department of Irrigation and Drainage, China Institute of Water Resources and Hydropower Research, Beijing, 100038, China
- Junnan Di
  - School of Spatial Information and Geomatics Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Tian Xie
  - Anhui Yangtze River Administration, Hefei, 241000, China
6
Kim D, Lee K, Jeong S, Song M, Kim B, Park J, Heo TY. Real-time chlorophyll-a forecasting using machine learning framework with dimension reduction and hyperspectral data. ENVIRONMENTAL RESEARCH 2024; 262:119823. [PMID: 39173818 DOI: 10.1016/j.envres.2024.119823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 08/18/2024] [Accepted: 08/19/2024] [Indexed: 08/24/2024]
Abstract
Since water is an essential resource in various fields, it requires constant monitoring. Chlorophyll-a concentration is a crucial indicator of water quality and can be used to monitor it. In this study, we developed methods to forecast chlorophyll-a concentrations in real time using hyperspectral data on an IoT platform and various machine learning algorithms. Compared to regular cameras, which record information only in the three broad color bands of red, green, and blue, hyperspectral images of drinking water sources record data in dozens or even hundreds of distinct narrow wavelength bands, providing each pixel in an image with a full spectrum. Different machine learning algorithms were developed using hyperspectral data and field observations of water quality and weather conditions. Previous studies have predicted chlorophyll concentrations using either partial least squares (PLS), which is a dimensionality reduction method, or machine learning. In contrast, our study employed the PLS technique as a preprocessing step to reduce the dimensionality of the hyperspectral data, followed by the application of machine learning techniques with optimized hyperparameters to improve the precision of the predictions, thereby introducing a real-time mechanism for chlorophyll-a prediction. Consequently, a machine learning algorithm with R2 values of 0.9 or above and sufficiently small RMSE was developed for real-time chlorophyll-a forecasting. Real-time chlorophyll-a forecasting using LightGBM has the best performance, with a mean R2 of 0.963 and a mean RMSE of 2.679. This paper is expected to have applications in early detection of algal blooms in monitoring systems.
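The two scores reported in this abstract, R2 and RMSE, follow standard definitions that can be computed in a few lines. The sketch below uses those textbook definitions; the function names and the sample values are illustrative and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not measurements from the paper).
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
example_rmse = rmse(observed, predicted)    # sqrt(1/3)
example_r2 = r2_score(observed, predicted)  # 1 - 1/2 = 0.5
```

RMSE is expressed in the units of the target variable, whereas R2 is unitless, which is why the abstract reports both.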
Affiliation(s)
- Doyun Kim
  - Department of Information and Statistics, Chungbuk National University, South Korea
- KyoungJin Lee
  - Sales Department, Esolutions Co. Ltd, Daejeon, South Korea
- SeungMyeong Jeong
  - Autonomous IoT Research Center, Korea Electronics Technology Institute, South Korea
- MinSeok Song
  - EMS department, DongMoon ENT Co., Ltd., South Korea
- Jungsu Park
  - Department of Civil and Environmental Engineering, Hanbat National University, South Korea
- Tae-Young Heo
  - Department of Information and Statistics, Chungbuk National University, South Korea
7
Park J, Patel K, Lee WH. Recent advances in algal bloom detection and prediction technology using machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 938:173546. [PMID: 38810749 DOI: 10.1016/j.scitotenv.2024.173546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 05/18/2024] [Accepted: 05/24/2024] [Indexed: 05/31/2024]
Abstract
Harmful algal blooms (HABs), including red tides and cyanobacteria, are a significant environmental issue that can have harmful effects on aquatic ecosystems and human health. Traditional methods of detecting and managing algal blooms have been limited by their reliance on manual observation and analysis, which can be time-consuming and costly. Recent advances in machine learning (ML) technology have shown promise in improving the accuracy and efficiency of algal bloom detection and prediction. This paper provides an overview of the latest developments in using ML for algal bloom detection and prediction using various water quality parameters and environmental factors. First, we introduce ML for algal bloom prediction using regression and classification models. Then we explore image-based ML for algae detection using satellite images, surveillance cameras, and microscopic images. This study also highlights several real-world examples of successful implementation of ML for algal bloom detection and prediction. These examples show how ML can enhance the accuracy and efficiency of detecting and predicting algal blooms, contributing to the protection of aquatic ecosystems and human health. The study also outlines recent efforts to enhance the field applicability of ML models and suggests future research directions. The recent interest in explainable artificial intelligence (XAI) is discussed as an effort to understand the most influential environmental factors for algal blooms. XAI facilitates interpretation of ML model results, thereby enhancing the models' usability for decision-making in field management and improving their overall applicability in real-world settings. We also emphasize the significance of obtaining high-quality, field-representative data to enhance the efficiency of ML applications. The effectiveness of ML models in detecting and predicting algal blooms can be improved through management strategies for data quality, such as pre-treating missing data and integrating diverse datasets into a unified database. Overall, this paper presents a comprehensive review of the latest advancements in managing algal blooms using ML technology and proposes future research directions to enhance the utilization of ML techniques.
Affiliation(s)
- Jungsu Park
  - Department of Civil and Environmental Engineering, Hanbat National University, 125, Dongseo-daero, Yuseong-gu, Daejeon 34158, Republic of Korea
- Keval Patel
  - Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, United States
- Woo Hyoung Lee
  - Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, United States
8
Lee DH, Lee SI, Kang JH. Machine learning approaches to identify spatial factors and their influential distances for heavy metal contamination in downstream sediment. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 948:174755. [PMID: 39025146 DOI: 10.1016/j.scitotenv.2024.174755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 06/30/2024] [Accepted: 07/11/2024] [Indexed: 07/20/2024]
Abstract
Contaminated sediments can adversely affect aquatic ecosystems, making the identification and management of pollutant sources extremely important. In this study, we proposed machine learning approaches to reveal sources and their influential distances for heavy metal contamination of downstream sediment. We employed classification models based on artificial neural networks (ANN) and random forest (RF) to predict the heavy metal contamination of stream sediments using upland environmental variables as input features. A comprehensive Korean nationwide monitoring database containing 1546 datasets was used to train and test the models. These datasets encompass the concentrations of eight heavy metals (As, Cd, Cr, Cu, Hg, Ni, Pb, and Zn) in sediment samples collected from 160 stream sites across the nation from 2014 to 2018. The models' prediction accuracy was evaluated for input feature sets from different influential upland areas, defined by different buffer radii and the watershed boundary for each site. Although both ANN and RF models were unsatisfactory in predicting heavy metal quartile classes, RF classifiers with adaptive synthetic oversampling (ORFC) predicted the classes of the sediment samples based on Canada's Sediment Quality Guidelines reasonably well (accuracy ranged from 0.67 to 0.94). The best influential distance (i.e., buffer radius) was determined for each heavy metal based on the accuracy of ORFC. The results indicated that Cd, Cu and Pb had shorter influential distances (1.5-2.0 km) than the other heavy metals, with little difference in accuracy for different influential distances. Feature importance calculation revealed that upland soil contamination was the primary factor for Hg and Ni, while residential areas and roads were significant features associated with Pb and Zn contamination. This approach offers information on major contamination sources and their influential areas to be prioritized for managing contaminated stream sediments.
Affiliation(s)
- Dong Hoon Lee
  - Department of Civil and Environmental Engineering, Dongguk University-Seoul, Seoul 04620, Republic of Korea
- Sang-Il Lee
  - Department of Civil and Environmental Engineering, Dongguk University-Seoul, Seoul 04620, Republic of Korea
- Joo-Hyon Kang
  - Department of Civil and Environmental Engineering, Dongguk University-Seoul, Seoul 04620, Republic of Korea
9
Mao X, Wang Q, Chang H, Liu B, Zhou S, Deng L, Zhang B, Qu F. Moderate oxidation of algae-laden water: Principals and challenges. WATER RESEARCH 2024; 257:121674. [PMID: 38678835 DOI: 10.1016/j.watres.2024.121674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 04/21/2024] [Accepted: 04/23/2024] [Indexed: 05/01/2024]
Abstract
The occurrence of seasonal algae blooms represents a huge dilemma for water resource management and has garnered widespread attention. Therefore, finding methods to control algae pollution and improve water quality is urgently needed. Moderate oxidation has emerged as a feasible way of algae-laden water treatment and is an economical and prospective strategy for controlling algae and endogenous and exogenous pollutants. Despite this, a comprehensive understanding of algae-laden water treatment by moderate oxidation, particularly principles and summary of advanced strategies, as well as challenges in moderate oxidation application, is still lacking. This review outlines the properties and characterization of algae-laden water, which serve as a prerequisite for assessing the treatment efficiency of moderate oxidation. Biomass, cell viability, and organic matter are key components to assessing moderate oxidation performance. More importantly, the recent advancements in employing moderate oxidation as a treatment or pretreatment procedure were examined, and the suitability of different techniques was evaluated. Generally, moderate oxidation is more promising for improving the solid-liquid separation process by the reduction of cell surface charge (stability) and removal/degradation of the soluble algae secretions. Furthermore, this review presents an outlook on future research directions aimed at overcoming the challenges encountered by existing moderate oxidation technologies. This comprehensive examination aims to provide new and valuable insights into the moderate oxidation process.
Affiliation(s)
- Xin Mao
  - Hunan Engineering Research Center of Water Security Technology and Application, College of Civil Engineering, Hunan University, Changsha 410082, PR China
- Qingnan Wang
  - Hunan Engineering Research Center of Water Security Technology and Application, College of Civil Engineering, Hunan University, Changsha 410082, PR China
- Haiqing Chang
  - MOE Key Laboratory of Deep Earth Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu 610065, PR China
- Bin Liu
  - Hunan Engineering Research Center of Water Security Technology and Application, College of Civil Engineering, Hunan University, Changsha 410082, PR China
- Shiqing Zhou
  - Hunan Engineering Research Center of Water Security Technology and Application, College of Civil Engineering, Hunan University, Changsha 410082, PR China
- Lin Deng
  - Hunan Engineering Research Center of Water Security Technology and Application, College of Civil Engineering, Hunan University, Changsha 410082, PR China
- Bing Zhang
  - National Research Base of Intelligent Manufacturing Service, Chongqing Technology and Business University, Chongqing 400067, China
- Fangshu Qu
  - Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Guangzhou University, Guangzhou 510006, PR China
10
Kim JH, Lee DH, Mendoza JA, Lee MY. Applying machine learning random forest (RF) method in predicting the cement products with a co-processing of input materials: Optimizing the hyperparameters. ENVIRONMENTAL RESEARCH 2024; 248:118300. [PMID: 38281562 DOI: 10.1016/j.envres.2024.118300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 12/25/2023] [Accepted: 01/22/2024] [Indexed: 01/30/2024]
Abstract
Co-processing recycled waste during cement production, i.e., using alternative materials such as secondary raw materials or secondary raw fuels, is widely practiced in developed countries. Alternative raw materials or fuels contain high concentrations of heavy metals and other hazardous chemicals, which might lead to the potential for dangerous heavy metals and hazardous chemicals to be transferred to clinker or cement products, resulting in exposure and emissions to people or the environment. Managing input materials and predicting which inputs affect the final concentration is essential to prevent potential hazards. We used the data of six heavy metals by input raw materials and input fuels of cement manufacturers in 2016-2017. The concentrations of Pb and Cu in cement were about 10-200 times and 4 to 200 times higher than other heavy metals (Cr, As, Cd, Hg), respectively. We profiled the influence of heavy metal concentration of each input material using the principal component analysis (PCA), which analyzed the leading causes of each heavy metal. The Random Forest (RF) ensemble model predicted cement heavy metal concentrations according to input materials. In the case of Cu, Cd, and Cr, the training performance showed R square values of 0.71, 0.71, and 0.92, respectively, as a result of predicting the cement heavy metal concentration according to the heavy metal concentration of each cement input material using the RF model, which is a machine learning model. The results of this study show that the RF model can be used to predict the amount and concentration of alternative raw materials and alternative fuels by controlling the concentration of heavy metals in cement through the concentration of heavy metals in the input materials.
Affiliation(s)
- Jin Hwi Kim
- Department of Civil and Environmental Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Dong Hoon Lee
- Department of Civil and Environmental Engineering, Dongguk University, Seoul, 04620, Republic of Korea
- Joseph Albert Mendoza
- School of Chemical, Biological, Materials Engineering, and Sciences, Mapua University, 658 Muralla Street, Intramuros, Manila, 1002, Philippines
- Min-Yong Lee
- Division of Chemical Research, National Institute of Environmental Research, Seogu, Incheon, 22689, Republic of Korea
11
|
Chen Z, Zhang L, Zhang P, Guo H, Zhang R, Li L, Li X. Prediction of Cytochrome P450 Inhibition Using a Deep Learning Approach and Substructure Pattern Recognition. J Chem Inf Model 2024; 64:2528-2538. [PMID: 37864562 DOI: 10.1021/acs.jcim.3c01396] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/23/2023]
Abstract
Cytochrome P450 (CYP) is a family of enzymes responsible for about 75% of all metabolic reactions. Among them, CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 participate in the metabolism of most drugs and mediate many adverse drug reactions. Therefore, it is necessary to estimate the chemical inhibition of Cytochrome P450 enzymes in drug discovery and the food industry. In the past few decades, many computational models have been reported, and some have provided good performance. However, several issues remain unresolved for these models, such as coverage of only a single isoform, unbalanced performance, lack of structural characteristics analysis, and poor availability. In the present study, deep learning models were developed in Python using the Keras framework and TensorFlow to predict the chemical inhibition of each CYP isoform. These models were built on a large data set containing 85715 compounds extracted from the PubChem bioassay database. On external validation, the models provided good AUC values of 0.97, 0.94, 0.94, 0.96, and 0.94 for CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4, respectively. The models can be freely accessed on the Web server named CYPi-DNNpredictor (cypi.sapredictor.cn), and the code for the models was made open source in the Supporting Information. In addition, we analyzed the structural characteristics of chemicals with CYP450 inhibition and detected the structural alerts (SAs) that should be responsible for the inhibition. The SAs were also made available online through CYPi-SAdetector (cypisa.sapredictor.cn). The models can be used as a powerful tool for the prediction of CYP450 inhibitors, and the SAs should provide useful information on the mechanisms of Cytochrome P450 inhibition.
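The AUC values reported above can be read as the probability that a randomly chosen inhibitor receives a higher predicted score than a randomly chosen non-inhibitor. The following is a minimal Python sketch of that pairwise definition, not the authors' code; the function name `roc_auc` and the example labels and scores are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 class labels (1 = positive, e.g. inhibitor).
    scores: iterable of predicted scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```

The double loop is quadratic in the number of samples, which is fine for a small illustration; library implementations typically sort the scores once instead.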
Affiliation(s)
- Zhaoyang Chen
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Le Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Pei Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Huizhu Guo
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Ruiqiu Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Ling Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
12
|
Cai G, Yang X, Yu X, Zheng W, Cai R, Wang H. The novel application of violacein produced by a marine Duganella strain as a promising agent for controlling Heterosigma akashiwo bloom: Algicidal mechanism, fermentation optimization and agent formulation. JOURNAL OF HAZARDOUS MATERIALS 2024; 466:133548. [PMID: 38262320 DOI: 10.1016/j.jhazmat.2024.133548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 12/28/2023] [Accepted: 01/15/2024] [Indexed: 01/25/2024]
Abstract
Controlling harmful algal blooms with algicidal bacteria is thought to be an efficient and eco-friendly approach, but the lack of comprehensive studies from theory to practice has limited its field application. Here we present a purple bacterial strain, Duganella sp. A3, capable of killing several harmful algae, including Heterosigma akashiwo, a worldwide fish-killing microalga. A bioactivity-guided purification and identification approach revealed that the major algicidal compound of A3 is the pigment violacein, whose algicidal potential had not been reported before. Violacein rapidly disrupted cell permeability and caused long-term oxidative stress but only mildly affected the algal photosystem, which might explain its highly species-specific activity against unarmored H. akashiwo. To explore the application potential of violacein, a fermentation optimization approach combining single-factor and multi-factor experiments was conducted to increase the violacein yield, which finally reached 0.4199 g/L using only a simple medium formula that is beneficial for compound purification. Finally, taking advantage of its physical and chemical stability, we successfully developed violacein into a sustained-release and easy-to-preserve algicidal agent using alginate-acacia-gum-chitosan encapsulation, which paves the way for its future application in controlling H. akashiwo blooms.
Affiliation(s)
- Guanjing Cai
- Biology Department and Institute of Marine Sciences, College of Science, and Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- Xujun Yang
- Jimei University, Xiamen 361021, China; State Key Laboratory of Marine Environmental Science and Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, School of Life Sciences, Xiamen University, Xiamen 361005, China
- Xiaoqi Yu
- State Key Laboratory of Marine Environmental Science and Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, School of Life Sciences, Xiamen University, Xiamen 361005, China; Jimei Branch Xiamen Foreign Language School, Xiamen 361021, China
- Wei Zheng
- State Key Laboratory of Marine Environmental Science and Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, School of Life Sciences, Xiamen University, Xiamen 361005, China
- Runlin Cai
- Biology Department and Institute of Marine Sciences, College of Science, and Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- Hui Wang
- Biology Department and Institute of Marine Sciences, College of Science, and Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
13
|
Kim H, Lee G, Lee CG, Park SJ. Algae development in rivers with artificially constructed weirs: Dominant influence of discharge over temperature. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 355:120551. [PMID: 38460331 DOI: 10.1016/j.jenvman.2024.120551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Revised: 02/05/2024] [Accepted: 03/04/2024] [Indexed: 03/11/2024]
Abstract
Algal blooms contribute to water quality degradation, unpleasant odors, taste issues, and the presence of harmful substances in artificially constructed weirs. Mitigating these adverse effects through effective algal bloom management requires identifying the contributing factors and predicting algal concentrations. This study focused on the upstream region of the Seungchon Weir in Korea, which is characterized by elevated levels of total nitrogen and phosphorus due to a significant influx of water from a sewage treatment plant. We employed four distinct machine learning models to predict chlorophyll-a (Chl-a) concentrations and identified the influential variables linked to local algal bloom events. The gradient boosting model enabled an in-depth exploration of the intricate relationships between algal occurrence and water quality parameters, allowing accurate identification of the causal factors. The models identified the discharge flow rate (D-Flow) and water temperature as the primary determinants of Chl-a levels, with feature importance values of 0.236 and 0.212, respectively. Model precision improved when daily average D-Flow values were used, and both model accuracy and the significance of D-Flow increased as the temporal span of daily averaging increased. Elevated Chl-a concentrations correlated with diminished D-Flow and temperature, highlighting the pivotal role of D-Flow in regulating Chl-a concentration. This trend can be attributed to the constrained discharge of the Seungchon Weir during winter. Calculating the D-Flow required to maintain a desirable Chl-a concentration of up to 20 mg/m³ across varying temperatures revealed an escalating demand for D-Flow with rising temperatures. Specific D-Flow ranges, corresponding to each season and temperature condition, were identified as particularly influential on Chl-a concentration. Thus, Chl-a reduction can be optimized by strategically increasing D-Flow within these specified ranges for each season and temperature variation. This study highlights the importance of maintaining sufficient D-Flow levels to mitigate algal proliferation within river systems featuring weirs.
Affiliation(s)
- Hyunju Kim
- Faculty of Liberal Education, Seoul National University, Seoul, 08826, Republic of Korea
- Gyesik Lee
- School of Computer Engineering and Applied Mathematics, Hankyong National University, Anseong, 17579, Republic of Korea
- Chang-Gu Lee
- Department of Environmental and Safety Engineering, Ajou University, Suwon, 16499, Republic of Korea
- Seong-Jik Park
- Department of Bioresources and Rural System Engineering, Hankyong National University, Anseong, 17579, Republic of Korea
14
|
Xiao X, Peng Y, Zhang W, Yang X, Zhang Z, Ren B, Zhu G, Zhou S. Current status and prospects of algal bloom early warning technologies: A Review. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 349:119510. [PMID: 37951110 DOI: 10.1016/j.jenvman.2023.119510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 10/21/2023] [Accepted: 10/31/2023] [Indexed: 11/13/2023]
Abstract
In recent years, frequent occurrences of algal blooms due to environmental changes have posed significant threats to the environment and human health. This paper analyzes the causes of algal blooms from the perspective of environmental factors such as nutrients, temperature, light, hydrodynamic factors, and others. Various commonly used algal bloom monitoring methods are discussed, including traditional field monitoring methods, remote sensing techniques, molecular biology-based monitoring techniques, and sensor-based real-time monitoring techniques. The advantages and limitations of each method are summarized. Existing algal bloom prediction models, including traditional models and machine learning (ML) models, are introduced. Support Vector Machine (SVM), deep learning (DL), and other ML models are discussed in detail, along with their strengths and weaknesses. Finally, this paper provides an outlook on the future development of algal bloom warning techniques. It proposes combining various monitoring methods and prediction models to establish a multi-level and multi-perspective algal bloom monitoring system that further improves the accuracy and timeliness of early warning and provides more effective safeguards for environmental protection and human health.
Affiliation(s)
- Xiang Xiao
- College of Civil Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
- Yazhou Peng
- College of Civil Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
- Wei Zhang
- School of Hydraulic and Environmental Engineering, Changsha University of Science & Technology, Changsha, 410114, China
- Xiuzhen Yang
- College of Civil Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
- Zhi Zhang
- Laboratory of Three Gorges Reservoir Region, Chongqing University, Chongqing, 400045, China
- Bozhi Ren
- School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan, 411201, Hunan, China
- Guocheng Zhu
- College of Civil Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
- Saijun Zhou
- College of Civil Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
15
|
Ly QV, Tong NA, Lee BM, Nguyen MH, Trung HT, Le Nguyen P, Hoang THT, Hwang Y, Hur J. Improving algal bloom detection using spectroscopic analysis and machine learning: A case study in a large artificial reservoir, South Korea. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 901:166467. [PMID: 37611716 DOI: 10.1016/j.scitotenv.2023.166467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 08/17/2023] [Accepted: 08/19/2023] [Indexed: 08/25/2023]
Abstract
The prediction of algal blooms using traditional water quality indicators is expensive, labor-intensive, and time-consuming, making it challenging to meet the critical requirement of timely monitoring for prompt management. Using optical measures for forecasting algal blooms is a feasible and useful method to overcome these problems. This study explores the potential application of optical measures to enhance algal bloom prediction in terms of prediction accuracy and workload reduction, aided by machine learning (ML) models. Compared to absorption-derived parameters, commonly used fluorescence indices such as the fluorescence index (FI), humification index (HIX), biological index (BIX), and protein-like component improved the prediction accuracy. However, the prediction accuracy decreased when all optical indices were included in the computation, owing to increased noise and uncertainty in the models. With the exception of chemical oxygen demand (COD), this study successfully replaced biochemical oxygen demand (BOD), dissolved organic carbon (DOC), and nutrients with selected fluorescence indices, demonstrating comparable performance on both the training and testing data, with consistently good coefficient of determination (R²) values of approximately 0.85 and 0.74, respectively. Among all models considered, ensemble learning models consistently outperformed conventional regression models and artificial neural networks (ANNs). However, there was a trade-off between accuracy and computational efficiency among the ensemble learning models (i.e., Stacking and XGBoost) for algal bloom prediction. Our study offers a glimpse of the potential application of spectroscopic measures to improve accuracy and efficiency in algal bloom prediction, but further work should be carried out in other water bodies to validate our proposed hypothesis.
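For reference, the coefficient of determination quoted above compares the model's squared error with that of a baseline that always predicts the observed mean. The sketch below is a minimal illustration in Python, not code from the study; the function name `r_squared` and the sample values are assumptions.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted values, for illustration only.
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

A value of 1 means a perfect fit, 0 means the model does no better than the mean baseline, and negative values are possible on held-out data.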
Affiliation(s)
- Quang Viet Ly
- Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, South Korea
- Ngoc Anh Tong
- School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
- Bo-Mi Lee
- Water Quality Assessment Research Division, National Institute of Environmental Research, Incheon 22689, South Korea
- Minh Hieu Nguyen
- School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam; School of Information and Communication Technology, Griffith University, Gold Coast, Australia
- Huynh Thanh Trung
- Ecole Polytechnique Federale de Lausanne, 1015 Lausanne, Switzerland
- Phi Le Nguyen
- School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
- Thu-Huong T Hoang
- School of Chemistry and Life Science, Hanoi University of Science and Technology, Hanoi 10000, Vietnam
- Yuhoon Hwang
- Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, South Korea
- Jin Hur
- Department of Environment and Energy, Sejong University, Seoul 05006, South Korea
16
|
Kim JH, Lee H, Byeon S, Shin JK, Lee DH, Jang J, Chon K, Park Y. Machine Learning-Based Early Warning Level Prediction for Cyanobacterial Blooms Using Environmental Variable Selection and Data Resampling. TOXICS 2023; 11:955. [PMID: 38133356 PMCID: PMC10747537 DOI: 10.3390/toxics11120955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/14/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]
Abstract
Many countries have attempted to mitigate and manage issues related to harmful algal blooms (HABs) by monitoring and predicting their occurrence. The infrequency and duration of HAB occurrence pose the challenge of data imbalance when constructing machine learning models for their prediction. Furthermore, the appropriate selection of input variables is a significant issue because of the complex relationships between the input and output variables. Therefore, the objective of this study was to improve the predictive performance for HABs using feature selection and data resampling. Data resampling was used to address the imbalance in the minority class data. Two machine learning models were constructed to predict algal alert levels using 10 years of meteorological, hydrodynamic, and water quality data. The improvement in model accuracy due to changes in resampling methods was more noticeable than that due to changes in feature selection methods. Models constructed using combinations of original and synthetic data across all resampling methods demonstrated higher prediction performance for the caution level (L-1) and warning level (L-2) than models constructed using the original data. In particular, the optimal artificial neural network and random forest (RF) models constructed using combinations of original and synthetic data showed significantly improved prediction accuracy for L-1 and L-2, which represent the transition from normal to bloom formation states, in the training and testing steps. The test results of the optimal RF model using the original data indicated prediction accuracies of 98.8% for L-0, 50.0% for L-1, and 50.0% for L-2. In contrast, the optimal RF model using the Synthetic Minority Oversampling Technique-Edited Nearest Neighbor (SMOTE-ENN) sampling method achieved accuracies of 85.0% for L-0, 85.7% for L-1, and 100% for L-2. Therefore, applying synthetic data can address the imbalance in the observed data and improve the detection performance of machine learning models. Reliable predictions using improved models can support the design of management practices to mitigate HABs in reservoirs and ultimately ensure safe and clean water resources.
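To illustrate the oversampling idea behind SMOTE mentioned above, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbors. It is a simplified, standard-library-only illustration and not the study's implementation: it omits the Edited Nearest Neighbor cleaning step, and the function name `smote`, its parameters, and the sample points are assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class points.

    minority: list of equal-length numeric tuples (feature vectors).
    k: number of nearest minority neighbors considered for each base point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))          # pick a base sample
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        gap = rng.random()                        # position along the segment
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical two-feature minority class (e.g. rare alert-level samples).
rare = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote(rare, n_new=3, k=2, seed=1):
    print(point)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region the minority class already occupies, which is why the technique is usually paired with a cleaning step when classes overlap.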
Affiliation(s)
- Jin Hwi Kim
- School of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul 05029, Republic of Korea
- Hankyu Lee
- School of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul 05029, Republic of Korea
- Seohyun Byeon
- School of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul 05029, Republic of Korea
- Jae-Ki Shin
- Busan Region Branch Office of the Nakdong River, Korea Water Resources Corporation (K-Water), Saha-Gu, Busan 49300, Republic of Korea
- Dong Hoon Lee
- Department of Civil and Environmental Engineering, Dongguk University-Seoul, 30, Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
- Jiyi Jang
- Division of Atmospheric Sciences, Korea Polar Research Institute, 26, Songdomirae-ro, Yeonsu-gu, Incheon 21990, Republic of Korea
- Kangmin Chon
- Department of Environmental Engineering, Kangwon National University, Gangwon-do, Chuncheon 24341, Republic of Korea
- Department of Integrated Energy and Infra System, Kangwon National University, Gangwon-do, Chuncheon 24341, Republic of Korea
- Yongeun Park
- School of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul 05029, Republic of Korea
17
|
Li H, Bhattarai B, Barber M, Goel R. Stringent Response of Cyanobacteria and Other Bacterioplankton during Different Stages of a Harmful Cyanobacterial Bloom. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:16016-16032. [PMID: 37819800 DOI: 10.1021/acs.est.3c03114] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 10/13/2023]
Abstract
We conducted a field study to investigate the role of stringent response in cyanobacteria and coexisting bacterioplankton during nutrient-deprived periods at various stages of bloom in a freshwater lake (Utah Lake) for the first time. Using metagenomics and metatranscriptomics analyses, we examined the cyanobacterial ecology and expression of important functional genes related to stringent response, N and P metabolism, and regulation. Our findings mark a significant advancement in understanding the mechanisms by which toxic cyanobacteria survive and proliferate during nitrogen (N) and phosphorus (P) limitations. We successfully identified and analyzed the metagenome-assembled genomes (MAGs) of the dominant bloom-forming cyanobacteria, namely, Dolichospermum circinale, Aphanizomenon flos-aquae UKL13-PB, Planktothrix agardhii, and Microcystis aeruginosa. By mapping RNA-seq data to the coding sequences of the MAGs, we observed that these four prevalent cyanobacteria species activated multiple functions to adapt to the depletion of inorganic nutrients. During and after the blooms, the four dominant cyanobacteria species expressed high levels of transcripts related to toxin production, such as microcystins (mcy), anatoxins (ana), and cylindrospermopsins (cyr). Additionally, genes associated with polyphosphate (poly-P) storage and the stringent response alarmone (p)ppGpp synthesis/hydrolysis, including ppk, relA, and spoT, were highly activated in both cyanobacteria and bacterioplankton. Under N deficiency, the main N pathways shifted from denitrification and dissimilatory nitrate reduction in bacterioplankton toward N2-fixing and assimilatory nitrate reduction in certain cyanobacteria with a corresponding shift in the community composition. P deprivation triggered a stringent response mediated by spoT-dependent (p)ppGpp accumulation and activation of the Pho regulon in both cyanobacteria and bacterioplankton, facilitating inorganic and organic P uptake. 
The dominant cyanobacterial MAGs exhibited the presence of multiple alkaline phosphatase (APase) transcripts (e.g., phoA in Dolichospermum, phoX in Planktothrix, and Microcystis), suggesting their ability to synthesize and release APase enzymes to convert ambient organic P into bioavailable forms. Conversely, transcripts associated with bacterioplankton-dominated pathways like denitrification were low and did not align with the occurrence of intense cyanoHABs. The strong correlations observed among N, P, stringent response metabolisms and the succession of blooms caused by dominant cyanobacterial species provide evidence that the stringent response, induced by nutrient limitation, may activate unique N and P functions in toxin-producing cyanobacteria, thereby sustaining cyanoHABs.
Affiliation(s)
- Hanyan Li
- Institute for Environmental Genomics, The University of Oklahoma, 101 David L Boren Blvd, Norman, Oklahoma 73019, United States
- Bishav Bhattarai
- Department of Civil and Environmental Engineering, The University of Utah, 110 S Central Campus, Salt Lake City, Utah 84112, United States
- Michael Barber
- Department of Civil and Environmental Engineering, The University of Utah, 110 S Central Campus, Salt Lake City, Utah 84112, United States
- Ramesh Goel
- Department of Civil and Environmental Engineering, The University of Utah, 110 S Central Campus, Salt Lake City, Utah 84112, United States
18
|
Ahn JM, Kim J, Kim K. Ensemble Machine Learning of Gradient Boosting (XGBoost, LightGBM, CatBoost) and Attention-Based CNN-LSTM for Harmful Algal Blooms Forecasting. Toxins (Basel) 2023; 15:608. [PMID: 37888638 PMCID: PMC10611362 DOI: 10.3390/toxins15100608] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/04/2023] [Revised: 09/26/2023] [Accepted: 10/02/2023] [Indexed: 10/28/2023]
Abstract
Harmful algal blooms (HABs) are a serious threat to ecosystems and human health. Accurate prediction of HABs is crucial for proactive preparation and management. While mechanism-based numerical modeling, such as the Environmental Fluid Dynamics Code (EFDC), has been widely used in the past, the recent development of machine learning technology with data-based processing capabilities has opened up new possibilities for HABs prediction. In this study, we developed and evaluated two types of machine learning-based models for HABs prediction: Gradient Boosting models (XGBoost, LightGBM, CatBoost) and attention-based CNN-LSTM models. We used Bayesian optimization for hyperparameter tuning and applied bagging and stacking ensemble techniques with the optimal hyperparameters to obtain the final prediction results, and we evaluated the applicability of these predictions to HABs. We judge that predicting HABs with an ensemble technique can improve overall prediction performance by combining the advantages of each model and averaging out errors, such as overfitting, of individual models. Our study highlights the potential of machine learning-based models for HABs prediction and emphasizes the need to incorporate the latest technology into this important field.
Affiliation(s)
- Jung Min Ahn
- Water Quality Assessment Research Division, Water Environment Research Department, National Institute of Environmental Research, Incheon 22689, Republic of Korea
19
|
Zeinolabedini Rezaabad M, Lacey H, Marshall L, Johnson F. Influence of resampling techniques on Bayesian network performance in predicting increased algal activity. WATER RESEARCH 2023; 244:120558. [PMID: 37666153 DOI: 10.1016/j.watres.2023.120558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 08/10/2023] [Accepted: 08/30/2023] [Indexed: 09/06/2023]
Abstract
Early warning of increased algal activity is important to mitigate potential impacts on aquatic life and human health. While many methods have been developed to predict increased algal activity, an ongoing issue is that severe algal blooms often occur with low frequency in water bodies. This results in imbalanced data sets available for model specification, leading to poor predictions of the frequency of increased algal activity. One approach to address this is to resample data sets of increased algal activity to increase the prevalence of higher-than-normal algal activity in calibration data and ultimately improve model predictions. This study investigates the use of resampling techniques to address the imbalanced dataset and determines whether such methods can improve the prediction of increased algal activity. Three techniques were investigated: K-means under-sampling (US_Kmeans), the synthetic minority over-sampling technique (SMOTE), and the 'SMOTE and cluster-based under-sampling technique' (SCUT). The resampling methods were applied to a Bayesian network (BN) model of Lake Burragorang in New South Wales, Australia. The model was developed to predict chlorophyll-a (chl-a) using a range of water quality parameters as predictors. The original data and each of the balanced datasets were used for BN structure and parameter learning. The results showed that the best graphical structure, with the highest true positive rate (TPR) and area under the curve (AUC), was obtained by adding synthetic data from SMOTE. When compared using a fixed graphical structure for the BN, all resampling techniques increased the ability of the BN to detect events with a higher probability of increased algal activity. The resampling model results can also be used to better understand the most important influences on high chl-a concentrations and to suggest future data collection and model development priorities.
Collapse
Affiliation(s)
- Maryam Zeinolabedini Rezaabad
- Water Research Centre, School of Civil and Environmental Engineering, University of New South Wales, Kensington, New South Wales, Australia; ARC Training Centre Data Analytics for Resources and Environments, School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia.
| | | | - Lucy Marshall
- Water Research Centre, School of Civil and Environmental Engineering, University of New South Wales, Kensington, New South Wales, Australia; ARC Training Centre Data Analytics for Resources and Environments, School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia; Faculty of Science and Engineering, Macquarie University, North Ryde, New South Wales, Australia
| | - Fiona Johnson
- Water Research Centre, School of Civil and Environmental Engineering, University of New South Wales, Kensington, New South Wales, Australia; ARC Training Centre Data Analytics for Resources and Environments, School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia
|
20
|
Rao W, Qian X, Fan Y, Liu T. A soft sensor for simulating algal cell density based on dynamic response to environmental changes in a eutrophic shallow lake. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 868:161543. [PMID: 36640876 DOI: 10.1016/j.scitotenv.2023.161543] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/11/2022] [Revised: 01/07/2023] [Accepted: 01/07/2023] [Indexed: 06/17/2023]
Abstract
There is a great need for timely monitoring and rapid water quality assessment to control the algal blooms that often occur in eutrophic lakes. While algal cell density (ACD) is a critical indicator of algal growth, field monitoring is laborious and time-consuming, and rapid assessment of algal blooms based on ACD is often not possible. To address the limitations of conventional ACD detection, we proposed a soft sensor approach that uses surrogate indicators to simulate ACD in machine learning models. We conducted a case study using monitoring data from Chaohu Lake collected between 2016 and 2019. By comparing various machine learning algorithms, we found that ensemble learning models, especially extreme gradient boosting (XGBoost), outperformed traditional machine learning algorithms. Also, considering the influence of input variable selection on model performance, we combined the results of different filter methods in a multi-stage variable selection process. Finally, we screened seven key variables from the 43 initial candidate variables: dissolved oxygen (DO), chlorophyll-a (Chl-a), Secchi disk depth (SD), pH, permanganate index (CODMn), week of the year (WOY), and wind velocity (WV). Their inclusion substantially improved data accessibility and supported the development of a rapid simulation model. The final model was capable of reliable spatiotemporal generalization, with an overall R² value of 0.761. On the theoretical side, our study makes a new attempt to simulate ACD values in a eutrophic lake. For practical purposes, the soft sensor can facilitate the rapid assessment of bloom conditions, which helps the local administration with emergency prevention and control.
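The abstract does not say which filter methods were combined or how, so the following is only an illustrative sketch of one way to merge filter rankings. It assumes two filters (absolute Pearson and Spearman correlation) and a simple mean-rank rule; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def rank_desc(scores):
    """Return ranks where rank 1 is the highest score."""
    order = np.argsort(-scores)
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def to_ranks(a):
    """Column-wise ordinal ranks (ties broken by position)."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

def abs_pearson(X, y):
    """Absolute Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

def select_variables(X, y, names, n_keep):
    """Keep the n_keep variables with the best mean rank over two filters."""
    pearson = abs_pearson(X, y)
    # Spearman correlation is Pearson correlation computed on ranks.
    spearman = abs_pearson(to_ranks(X), to_ranks(y[:, None])[:, 0])
    mean_rank = (rank_desc(pearson) + rank_desc(spearman)) / 2.0
    keep = np.argsort(mean_rank, kind="stable")[:n_keep]
    return [names[i] for i in keep]

# Synthetic example: the target depends on the first and third columns only.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=500)
names = ["DO", "pH", "SD", "WV"]
selected = select_variables(X, y, names, n_keep=2)
```

Averaging ranks rather than raw scores keeps filters with different scales comparable, which is one plausible reason to combine several filters before a final model-based selection stage.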
Affiliation(s)
- Wenxin Rao
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Xin Qian
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China; Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, China.
- Yifan Fan
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Tong Liu
- Faculty of Environmental Earth Science, Hokkaido University, Sapporo 060-0810, Japan
|
21
|
Kim J, Jung W, An J, Oh HJ, Park J. Self-optimization of training dataset improves forecasting of cyanobacterial bloom by machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 866:161398. [PMID: 36621510 DOI: 10.1016/j.scitotenv.2023.161398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 11/30/2022] [Accepted: 01/01/2023] [Indexed: 06/17/2023]
Abstract
Data-driven model (DDM) prediction of aquatic ecological responses, such as cyanobacterial harmful algal blooms (CyanoHABs), is critically influenced by the choice of training dataset. However, a systematic method to choose the optimal training dataset considering data history has not yet been developed. Providing a comprehensive procedure with a self-operating algorithm for selecting the optimal training dataset would self-improve the DDM performance. In this study, a novel algorithm was developed to self-generate possible training dataset candidates from the available input and output variable data and self-choose the optimal training dataset that maximizes CyanoHAB forecasting performance. Nine years of meteorological and water quality data (input) and CyanoHAB data (output) from a site on the Nakdong River, South Korea, were acquired and pretreated via an automated process. An artificial neural network (ANN) was chosen from among the DDM candidates by first-cut training and validation using the entire collected dataset. Optimal training datasets for the ANN were self-selected from among the possible self-generated training datasets by systematically simulating the performance in response to 46 periods and 40 sizes (number of data elements) of the generated training datasets. The best-performing models were screened to identify the candidate models. The best performance corresponded to 6-7 years of training data (~18% lower error) for forecasting 1-28 d ahead (1-28 d of forecasting lead time (FLT)). After the hyperparameters of the screened model candidates were fine-tuned, the best-performing model (7 years of data with 14 d FLT) was self-determined by comparing the forecasts with unseen CyanoHAB events. The self-determined model could reasonably predict CyanoHABs occurring in Korean waters (cyanobacteria cells/mL ≥ 1000). Thus, our proposed method of self-optimizing the training dataset effectively improved the predictive accuracy and operational efficiency of the DDM prediction of CyanoHAB.
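The general idea of generating candidate training datasets by period and size, then keeping the one with the lowest validation error, can be sketched as below. This is an assumption-laden illustration, not the published algorithm: a least-squares linear model stands in for the ANN, and `choose_training_window`, the window sizes, and the synthetic regime-shift data are all hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept (a stand-in for the ANN)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def choose_training_window(X, y, n_val, sizes, step):
    """Try training windows of several sizes and start positions, all taken
    from before the final n_val rows, and return (rmse, start, size) of the
    window with the lowest validation RMSE."""
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_val, y_val = X[-n_val:], y[-n_val:]
    best = None
    for size in sizes:
        for start in range(0, len(X_tr) - size + 1, step):
            window = slice(start, start + size)
            coef = fit_linear(X_tr[window], y_tr[window])
            err = predict_linear(coef, X_val) - y_val
            rmse = float(np.sqrt(np.mean(err ** 2)))
            if best is None or rmse < best[0]:
                best = (rmse, start, size)
    return best

# Synthetic series whose input-output relation changes halfway through,
# so older data are less relevant to the most recent (validation) period.
rng = np.random.default_rng(0)
x = rng.normal(size=(400, 1))
slope = np.where(np.arange(400) < 200, 1.0, 3.0)
y = slope * x[:, 0] + rng.normal(scale=0.1, size=400)
best = choose_training_window(x, y, n_val=50, sizes=[50, 100, 200], step=50)
```

In this toy setting the search prefers windows drawn entirely from the recent regime, which mirrors the abstract's point that more history is not automatically better for forecasting.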
Affiliation(s)
- Jayun Kim
- Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
- Woosik Jung
- Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
- Jusuk An
- Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea; Department of Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang, Republic of Korea
- Hyun Je Oh
- Department of Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang, Republic of Korea
- Joonhong Park
- Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea.
|
22
|
Chang H, Wu H, Zhang L, Wu W, Zhang C, Zhong N, Zhong D, Xu Y, He X, Yang J, Zhang Y, Zhang T, Liao Q, Ho SH. Gradient electro-processing strategy for efficient conversion of harmful algal blooms to biohythane with mechanisms insight. WATER RESEARCH 2022; 222:118929. [PMID: 35970007 DOI: 10.1016/j.watres.2022.118929] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/14/2022] [Revised: 07/22/2022] [Accepted: 07/30/2022] [Indexed: 06/15/2023]
Abstract
Globally eruptive harmful algal blooms (HABs) have caused numerous negative effects on aquatic ecosystems and human health. Conversion of HABs into biohythane via dark fermentation (DF) is a promising approach to simultaneously cope with environmental and energy issues, but low HABs harvesting efficiency and biohythane productivity severely hinder its application. Here we designed a gradient electro-processing strategy for efficient HABs harvesting and disruption, which had intrinsic advantages of no secondary pollution and high economic feasibility. Firstly, a low current density (0.888-4.444 mA/cm²) was supplied to the HABs suspension to harvest biomass via electro-flocculation, which achieved 98.59% harvesting efficiency. A mathematical model considering the coupling effects of multiple influencing factors on HABs harvesting was constructed to guide large-scale application. Then, the harvested HABs biomass was disrupted via electro-oxidation under a higher current density (44.44 mA/cm²) to improve bioavailability for DF. As a result, hydrogen and methane yields of 64.46 mL/(g VS) and 171.82 mL/(g VS) were obtained under 6 min of electro-oxidation, along with the highest energy yield (50.1 kJ/L) and energy conversion efficiency (44.87%). Mechanisms of HABs harvesting and disruption under gradient electro-processing were revealed, along with the conversion pathways from HABs to biohythane. Together, this work provides a promising strategy for efficient disposal of HABs with the extra benefit of biohythane production.
Affiliation(s)
- Haixing Chang
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China; State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Haihua Wu
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Lei Zhang
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Wenbo Wu
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Chaofan Zhang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Nianbing Zhong
- Intelligent Fiber Sensing Technology of Chongqing Municipal Engineering Research Center of Institutions of Higher Education, Chongqing Key Laboratory of Fiber Optic Sensor and Photodetector, Chongqing University of Technology, Chongqing 400054, China
- Dengjie Zhong
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Yunlan Xu
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Xuefeng He
- Intelligent Fiber Sensing Technology of Chongqing Municipal Engineering Research Center of Institutions of Higher Education, Chongqing Key Laboratory of Fiber Optic Sensor and Photodetector, Chongqing University of Technology, Chongqing 400054, China
- Jing Yang
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Yue Zhang
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Ting Zhang
- College of Chemistry and Chemical Engineering, Chongqing University of Technology, Chongqing 400054, China
- Qiang Liao
- Key laboratory of Low-grade Energy Utilization Technologies and Systems, Chongqing University, Ministry of Education, Chongqing 400030, China.
- Shih-Hsin Ho
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China.
|
23
|
Detecting Starch-Head and Mildewed Fruit in Dried Hami Jujubes Using Visible/Near-Infrared Spectroscopy Combined with MRSA-SVM and Oversampling. Foods 2022; 11:foods11162431. [PMID: 36010431 PMCID: PMC9407322 DOI: 10.3390/foods11162431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/05/2022] [Accepted: 08/09/2022] [Indexed: 11/17/2022] Open
Abstract
Dried Hami jujube has great commercial and nutritional value. Starch-head and mildewed fruit are defective jujubes that pose a threat to consumer health. A novel method for detecting starch-head and mildewed fruit in dried Hami jujubes with visible/near-infrared spectroscopy was proposed. For this, the diffuse reflectance spectra of dried Hami jujubes in the range of 400–1100 nm were obtained. Borderline synthetic minority oversampling technology (BL-SMOTE) was applied to solve the problem of imbalanced sample distribution, and its effectiveness was demonstrated by comparison with other methods. Then, the feature variables selected by competitive adaptive reweighted sampling (CARS) were used as the input to establish the support vector machine (SVM) classification model. The parameters of the SVM were optimized by the modified reptile search algorithm (MRSA). In MRSA, Tent chaotic mapping and the Gaussian random walk strategy were used to improve the optimization ability of the original reptile search algorithm (RSA). The final results showed that the MRSA-SVM method combined with BL-SMOTE had the best classification performance, and the detection accuracy reached 97.22%. In addition, its recall, precision, F1 and kappa coefficient outperformed those of the other models. Furthermore, this study provides a valuable reference for the detection of defects in other fruits.
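As an illustration of how a Tent chaotic map can seed a population-based optimizer, the sketch below iterates a common form of the map and scales the values into search bounds. The exact map variant and parameters used in MRSA are not given in the abstract, so the breakpoint `a`, the seed `x0`, and the hypothetical SVM bounds for C and gamma are assumptions.

```python
import numpy as np

def tent_sequence(n, x0=0.37, a=0.7):
    """Iterate the tent map: x -> x / a if x < a, else (1 - x) / (1 - a)."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs[i] = x
    return xs

def init_population(n_agents, lower, upper, x0=0.37):
    """Spread n_agents candidate solutions over [lower, upper] using the
    chaotic sequence instead of uniform random numbers."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    chaos = tent_sequence(n_agents * lower.size, x0).reshape(n_agents, lower.size)
    return lower + chaos * (upper - lower)

# Hypothetical search bounds for two SVM hyperparameters (C, gamma).
lower = np.array([0.01, 0.001])
upper = np.array([100.0, 10.0])
pop = init_population(20, lower, upper)
```

Each row of `pop` is one candidate (C, gamma) pair; an optimizer would then score each candidate by classification performance and update the population from there.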
|
24
|
Baek SS, Jung EY, Pyo J, Pachepsky Y, Son H, Cho KH. Hierarchical deep learning model to simulate phytoplankton at phylum/class and genus levels and zooplankton at the genus level. WATER RESEARCH 2022; 218:118494. [PMID: 35523035 DOI: 10.1016/j.watres.2022.118494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2022] [Revised: 04/19/2022] [Accepted: 04/20/2022] [Indexed: 06/14/2023]
Abstract
Harmful algal blooms (HABs) have become a global issue, affecting public health and water industries in numerous countries. Because funds for monitoring HABs are limited, model development may be an alternative approach for understanding and managing HABs. Continuous monitoring based on grab sampling is time-consuming, costly, and labor-intensive. However, improving simulation performance remains a major challenge in modeling, and current methods are limited to simulating phytoplankton (e.g., Microcystis sp., Anabaena sp., Aulacoseira sp., Cyclotella sp., Pediastrum sp., and Eudorina sp.) and zooplankton (e.g., Cyclotella sp., Pediastrum sp., and Eudorina sp.) at the genus level. The traditional modeling approach is limited for evaluating the interactions between phytoplankton and zooplankton. Recently, deep learning (DL) models have been proposed for solving modeling problems because of their large data handling capabilities and model structure flexibilities. In this study, we evaluated the applicability of DL for simulating phytoplankton at the phylum/class and genus levels and zooplankton at the genus level. Our work was an explicit representation of the taxonomic and ecological hierarchy of the DL model structure. The prerequisite for this model design was the data collection at two taxonomic and hierarchical levels. Our model consisted of hierarchical DL with classification transformer (TF) and regression TF models. These DL models were hierarchically connected; the output of the phylum/class level model was transferred to the genus level simulation model, and the output of the genus level model was fed into the zooplankton simulation model. The classification TF model determined the phytoplankton occurrence initiation date, whereas the regression TF model quantified the cell concentration of plankton. 
The hierarchical DL showed potential to simulate phytoplankton at the phylum/class and genus levels, producing average R² and root mean square error values of 0.42 and 0.83 [log(cells mL⁻¹)], respectively. All simulated plankton results closely matched the measured concentrations. In particular, the simulated cyanobacteria showed good agreement with the measured cell concentration, with an R² value of 0.72. In addition, the simulated peak concentrations showed good agreement with observations. However, a limitation remained in following the temporal variation of Tintinnopsis sp. and Bosmina sp. Using an importance map from the TF model, water temperature, total phosphorus, and total nitrogen were identified as significant variables influencing phytoplankton and zooplankton blooms. Overall, our study demonstrated that DL can be used for modeling HABs at the phylum/class and genus levels.
Affiliation(s)
- Sang-Soo Baek
- Department of Environmental Engineering, Yeungnam University, 280 Daehak-Ro, Gyeongsan-Si, Gyeongbuk 38541, South Korea
- Eun-Young Jung
- Center for Environmental Data Strategy, Korea Environment Institute, Sejong 30147, Republic of Korea
- JongCheol Pyo
- Busan Water Quality Institute, 421-1 Maeri, Sangdongmyun, Kimhae 621-813, Republic of Korea
- Yakov Pachepsky
- Environmental Microbial and Food Safety Laboratory, USDA-ARS, Beltsville, MD, USA
- Heejong Son
- Center for Environmental Data Strategy, Korea Environment Institute, Sejong 30147, Republic of Korea.
- Kyung Hwa Cho
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea.
|
25
|
Machine Learning and Multiple Imputation Approach to Predict Chlorophyll-a Concentration in the Coastal Zone of Korea. WATER 2022. [DOI: 10.3390/w14121862] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
The concentration of chlorophyll-a (Chl-a) is an integrative bio-indicator of aquatic ecosystems and a direct indicator of the ecological status of water bodies. In this study, we focused on predicting the Chl-a concentration in seawater using machine learning after replacing missing values. To replace the missing values in the marine environment observation data, a comparison experiment was performed using multiple built-in imputation methods (i.e., pmm, cart, rf, norm, norm.nob, norm.boot, and norm.predict) of the mice package in R. The cart method was selected as the most suitable. We built regression models with six machine learning algorithms (regression tree, support vector regression (SVR), bagging, random forest, gradient boosting machine (GBM), and extreme gradient boosting (XGBoost)) to predict the Chl-a concentration based on the complete imputed dataset. The prediction performance of the models was evaluated by four evaluation criteria using 10-fold cross-validation tests. XGBoost, an ensemble learning approach, outperformed the other models in predicting the Chl-a concentration; SVR, a single model, also showed good performance. The most important environmental factor in predicting the Chl-a concentration was particulate organic carbon; however, dissolved oxygen also showed potential. This study was conducted with field observations in the spring and summer in the coastal zone of Korea. A limitation of this machine learning application is that it excludes temporal and spatial factors. However, extensions to time series forecasting with deep learning or machine learning can lead to meaningful regional and seasonal analysis. Prediction performance can also improve with the long-term accumulation of field observations of more varied features (such as meteorological and hydrodynamic variables) besides water quality.
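The 10-fold cross-validation step can be sketched as follows, assuming NumPy only. The study itself used R and several learners with four unspecified criteria; here a least-squares linear model is a stand-in and RMSE is the single illustrative criterion, with all function names and the synthetic data being hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept (stand-in learner)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def kfold_rmse(X, y, fit, predict, k=10, seed=0):
    """Return the RMSE of each of k folds; each fold is held out once
    while the model is fitted on the remaining k - 1 folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        err = predict(model, X[test_idx]) - y[test_idx]
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return scores

# Synthetic complete (already imputed) dataset with three predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -0.5, 0.8]) + rng.normal(scale=0.1, size=300)
scores = kfold_rmse(X, y, fit_linear, predict_linear, k=10)
mean_rmse = sum(scores) / len(scores)
```

Passing `fit` and `predict` as arguments lets the same loop compare several learners on identical folds, which is what makes a comparison such as the one reported in the abstract fair.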
|