1
Wang L, Shan K, Yi Y, Yang H, Zhang Y, Xie M, Zhou Q, Shang M. Employing hybrid deep learning for near-real-time forecasts of sensor-based algal parameters in a Microcystis bloom-dominated lake. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 922:171009. [PMID: 38402991 DOI: 10.1016/j.scitotenv.2024.171009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/05/2024] [Accepted: 02/14/2024] [Indexed: 02/27/2024]
Abstract
Harmful cyanobacterial blooms (CyanoHABs) are increasingly affecting lake, reservoir and estuary ecosystems globally. The integration of real-time monitoring and deep learning technology has opened up new horizons for early warnings of CyanoHABs. However, unlike data from traditional methods such as pigment quantification or microscopy counting, high-frequency data from in-situ fluorometric sensors display unpredictable fluctuations and variability, making it difficult for predictive models to discern underlying trends within the time series. This study introduces a hybrid framework for near-real-time CyanoHAB predictions in Lake Dianchi, China, a lake dominated by the cyanobacterium Microcystis. The proposed model was validated using hourly chlorophyll-a (Chl a) concentrations and algal cell densities. Our results demonstrate that applying decomposition-based singular spectrum analysis (SSA) significantly enhances the prediction accuracy of subsequent CyanoHAB models, particularly for the temporal convolutional network (TCN). Comparative experiments revealed that the SSA-TCN model outperforms other SSA-based deep learning models in predicting Chl a (R² = 0.45-0.93, RMSE = 2.29-5.89 μg/L) and algal cell density (R² = 0.63-0.89, RMSE = 9489.39-16,015.37 cells/mL) for one- to four-step-ahead predictions. The forecast of bloom intensities achieved an accuracy of 98.56% and an average precision of 94.04% ± 0.05%. In addition, scenarios involving various input combinations of environmental factors showed that water temperature was the most effective driver for CyanoHAB predictions, with a mean RMSE of 2.94 ± 0.12 μg/L, MAE of 1.55 ± 0.09 μg/L, and R² of 0.83 ± 0.01. Overall, the newly developed approach underscores the potential of a well-designed hybrid deep-learning framework for accurately predicting sensor-based algal parameters. It offers novel perspectives for managing CyanoHABs through online monitoring and artificial intelligence in aquatic ecosystems.
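The abstract above reports forecast skill as RMSE, MAE and R². As a reading aid, here is a minimal sketch of how these three scores are commonly computed for a forecast series. The function names and the sample values are made up for illustration, and R² is taken here as the coefficient of determination, which is an assumption since the abstract does not state which definition the authors used.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up hourly Chl a values in ug/L, for illustration only (not data from the paper)
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]

print(f"RMSE={rmse(observed, predicted):.3f}  MAE={mae(observed, predicted):.3f}  "
      f"R2={r_squared(observed, predicted):.3f}")
```

RMSE and MAE carry the units of the target (μg/L or cells/mL), which is why the abstract's Chl a and cell-density errors are not directly comparable, while R² is unitless.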
Affiliation(s)
- Lan Wang
  - School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Chongqing Key Laboratory of Big Data and Intelligent Computing, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China; School of Artificial Intelligence, Chongqing University of Education, Chongqing 400065, China
- Kun Shan
  - Chongqing Key Laboratory of Big Data and Intelligent Computing, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Yang Yi
  - Chongqing Key Laboratory of Big Data and Intelligent Computing, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Hong Yang
  - Department of Geography and Environmental Science, University of Reading, Reading RG6 6AB, UK
- Yanyan Zhang
  - College of Resources, Sichuan Agricultural University, Chengdu 611130, China
- Mingjiang Xie
  - Chongqing Key Laboratory of Big Data and Intelligent Computing, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Qichao Zhou
  - Institute for Ecological Research and Pollution Control of Plateau Lakes, School of Ecology and Environmental Sciences, Yunnan University, Kunming 650500, China
- Mingsheng Shang
  - Chongqing Key Laboratory of Big Data and Intelligent Computing, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
2
Villanueva P, Yang J, Radmer L, Liang X, Leung T, Ikuma K, Swanner ED, Howe A, Lee J. One-Week-Ahead Prediction of Cyanobacterial Harmful Algal Blooms in Iowa Lakes. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:20636-20646. [PMID: 38011382 DOI: 10.1021/acs.est.3c07764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Cyanobacterial harmful algal blooms (CyanoHABs) pose serious risks to inland water resources. Despite advances in our understanding of associated environmental factors and in modeling, predicting CyanoHABs remains challenging. Leveraging an integrated water quality data collection effort in Iowa lakes, this study aimed to identify factors associated with hazardous microcystin levels and to develop one-week-ahead predictive classification models. Using water samples from 38 Iowa lakes collected between 2018 and 2021, feature selection was conducted considering both linear and nonlinear properties. Subsequently, we developed three model types (Neural Network, XGBoost, and Logistic Regression) with different sampling strategies using the nine selected variables (mcyA_M, TKN, % hay/pasture, pH, mcyA_M:16S, % developed, DOC, dewpoint temperature, and ortho-P). Evaluation metrics demonstrated the strong performance of the Neural Network with oversampling (ROC-AUC 0.940, accuracy 0.861, sensitivity 0.857, specificity 0.857, LR+ 5.993, and 1/LR- 5.993), as well as the XGBoost with downsampling (ROC-AUC 0.944, accuracy 0.831, sensitivity 0.928, specificity 0.833, LR+ 5.557, and 1/LR- 11.569). This study illustrated the intricacies of modeling with limited data and class imbalances, underscoring the importance of continuous monitoring and data collection to improve predictive accuracy. The methodologies employed can also serve as meaningful references for researchers tackling similar challenges in diverse environments.
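The likelihood ratios quoted in the abstract above follow from the reported sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and 1/LR- = specificity / (1 - sensitivity). The short sketch below reproduces the quoted figures from the rounded values in the abstract; the function name is an illustrative choice, not something taken from the paper.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, 1/LR-) from sensitivity and specificity.

    LR+   = sensitivity / (1 - specificity)
    1/LR- = specificity / (1 - sensitivity)
    """
    lr_pos = sensitivity / (1.0 - specificity)
    inv_lr_neg = specificity / (1.0 - sensitivity)
    return lr_pos, inv_lr_neg

# Sensitivity and specificity as reported in the abstract (rounded to 3 decimals)
nn = likelihood_ratios(0.857, 0.857)    # Neural Network with oversampling
xgb = likelihood_ratios(0.928, 0.833)   # XGBoost with downsampling

print(f"Neural Network: LR+ = {nn[0]:.3f}, 1/LR- = {nn[1]:.3f}")
print(f"XGBoost:        LR+ = {xgb[0]:.3f}, 1/LR- = {xgb[1]:.3f}")
```

Because the Neural Network's sensitivity and specificity are equal, its two ratios coincide, while XGBoost's higher sensitivity shows up as a larger 1/LR-.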
Affiliation(s)
- Paul Villanueva
  - Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
- Jihoon Yang
  - Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
- Lorien Radmer
  - Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
- Xuewei Liang
  - Department of Civil, Construction and Environmental Engineering, Iowa State University, Ames, Iowa 50011, United States
- Tania Leung
  - Department of Geological and Atmospheric Sciences, Iowa State University, Ames, Iowa 50011, United States
- Kaoru Ikuma
  - Department of Civil, Construction and Environmental Engineering, Iowa State University, Ames, Iowa 50011, United States
- Elizabeth D Swanner
  - Department of Geological and Atmospheric Sciences, Iowa State University, Ames, Iowa 50011, United States
- Adina Howe
  - Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
- Jaejin Lee
  - Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
3
Kim J, Jung W, An J, Oh HJ, Park J. Self-optimization of training dataset improves forecasting of cyanobacterial bloom by machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 866:161398. [PMID: 36621510 DOI: 10.1016/j.scitotenv.2023.161398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 11/30/2022] [Accepted: 01/01/2023] [Indexed: 06/17/2023]
Abstract
Data-driven model (DDM) prediction of aquatic ecological responses, such as cyanobacterial harmful algal blooms (CyanoHABs), is critically influenced by the choice of training dataset. However, a systematic method for choosing the optimal training dataset while considering data history has not yet been developed. A comprehensive procedure with an algorithm that selects the optimal training dataset automatically would allow a DDM to improve its own performance. In this study, a novel algorithm was developed to self-generate possible training dataset candidates from the available input and output variable data and to self-choose the optimal training dataset that maximizes CyanoHAB forecasting performance. Nine years of meteorological and water quality data (input) and CyanoHAB data (output) from a site on the Nakdong River, South Korea, were acquired and pretreated via an automated process. An artificial neural network (ANN) was chosen from among the DDM candidates by first-cut training and validation using the entire collected dataset. Optimal training datasets for the ANN were self-selected from among the self-generated candidates by systematically simulating performance across 46 periods and 40 sizes (number of data elements) of the generated training datasets. The best-performing models were screened to identify the candidate models. The best performance corresponded to 6-7 years of training data (about 18% lower error) for forecasting lead times (FLT) of 1-28 d. After the hyperparameters of the screened model candidates were fine-tuned, the best-performing model (7 years of data with 14 d FLT) was self-determined by comparing the forecasts with unseen CyanoHAB events. The self-determined model could reasonably predict CyanoHABs occurring in Korean waters (cyanobacteria cells/mL ≥ 1000). Thus, our proposed method of self-optimizing the training dataset effectively improved the predictive accuracy and operational efficiency of DDM prediction of CyanoHABs.
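The idea of generating candidate training datasets and keeping the one that forecasts best can be sketched generically. The code below is not the authors' algorithm: the window enumeration, the `step` parameter, the mean-value "model" standing in for the ANN, and the toy series are all hypothetical choices made only to illustrate the select-by-validation-error loop.

```python
def candidate_windows(n_points, sizes, step):
    """Yield (start, end) index pairs for every window size and start offset."""
    for size in sizes:
        for start in range(0, n_points - size + 1, step):
            yield start, start + size

def select_best_window(series, holdout, sizes, step, fit, error):
    """Fit on each candidate window and keep the one with the lowest holdout error.

    Returns (error, start, end); ties keep the first window found.
    """
    best = None
    for start, end in candidate_windows(len(series), sizes, step):
        model = fit(series[start:end])
        err = error(model, holdout)
        if best is None or err < best[0]:
            best = (err, start, end)
    return best

# Hypothetical stand-ins: a "model" that predicts the training mean, scored by MAE.
def fit_mean(window):
    return sum(window) / len(window)

def mean_abs_error(model, holdout):
    return sum(abs(model - h) for h in holdout) / len(holdout)

# Toy series with a regime shift; the unseen holdout resembles the recent regime.
series = [1, 1, 1, 1, 5, 5, 5, 5]
holdout = [5, 5]

best = select_best_window(series, holdout, sizes=[2, 4], step=2,
                          fit=fit_mean, error=mean_abs_error)
print(best)
```

In this toy case the lowest error comes from a window drawn entirely from the recent regime, which mirrors the abstract's finding that a subset of the record (6-7 of 9 years) outperformed using everything.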
Affiliation(s)
- Jayun Kim
  - Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
- Woosik Jung
  - Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
- Jusuk An
  - Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea; Department of Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang, Republic of Korea
- Hyun Je Oh
  - Department of Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang, Republic of Korea
- Joonhong Park
  - Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
4
Oscillation Flow Dam Operation Method for Algal Bloom Mitigation. WATER 2022. [DOI: 10.3390/w14081315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Green algae play an important role in ecosystems as primary producers, but they can cause algal blooms, which are a socio-environmental burden because responding to them requires water resources from dam reservoirs. This study proposes an alternative for reducing algal blooms through dam operation without using additional water resources. A novel oscillation flow concept is suggested: oscillating the dam discharge to produce irregular flow. To examine its effect, the Environmental Fluid Dynamics Code-National Institute of Environmental Research (EFDC-NIER) model was constructed and calibrated for the Namhan River, South Korea, from downstream of the Chungju Dam to downstream of Gangcheon Weir. The water quality in the study area was simulated and analyzed for August 2019, when the largest number of harmful cyanobacteria in recent years was reported. Our results showed that the oscillation flow produced significant variance in flow velocity, and that algal bloom density in the Namhan River was reduced by 20–30% through the operation of the Chungju Dam. However, further study and investigation are required before practical application.
5
A Classification-Based Machine Learning Approach to the Prediction of Cyanobacterial Blooms in Chilgok Weir, South Korea. WATER 2022. [DOI: 10.3390/w14040542] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/27/2023]
Abstract
Cyanobacterial blooms arise from complex causes such as water quality, climate, and hydrological factors. This study presents machine learning models that predict occurrences of these blooms efficiently and effectively. The dataset was classified into groups consisting of two, three, or four classes based on cyanobacterial cell density one week later, which was used as the target variable. We developed 96 machine learning models for Chilgok weir using four classification algorithms: k-Nearest Neighbor, Decision Tree, Logistic Regression, and Support Vector Machine. In the modeling methodology, we first selected input features by applying ANOVA (Analysis of Variance) and resolving multicollinearity; this feature selection step removes features that are irrelevant to the target variable. Next, we adopted an oversampling method to address the imbalanced dataset. The best performance was achieved by models using datasets divided into two classes, with an accuracy of 80% or more. By comparison, models using datasets divided into three classes reached a low accuracy of approximately 60%. Moreover, while models with overall high accuracy were obtained when logCyano (logarithm of cyanobacterial cell density) was used as a feature, several two-class models that combined it with air temperature and NO3-N (nitrate nitrogen) also demonstrated more than 80% accuracy. It can be concluded that very accurate classification-based machine learning models can be developed with two features related to cyanobacterial blooms. This shows that efficient and effective models can be built with a small number of inputs.
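Two steps described in the abstract above lend themselves to a short illustration: pairing this week's features with next week's class label, and oversampling the minority class. The sketch below uses only the Python standard library. The 1000 cells/mL threshold, the feature pair (log density, air temperature), the sample values, and the use of plain random oversampling are all assumptions for illustration; the abstract does not specify the class boundaries or which oversampling method was used.

```python
import math
import random
from collections import Counter

BLOOM_THRESHOLD = 1000.0  # cells/mL; hypothetical cut-off, not taken from the paper

def make_one_week_ahead_pairs(weekly_density, weekly_air_temp):
    """Pair features from week t with a binary label from week t + 1."""
    features, labels = [], []
    for t in range(len(weekly_density) - 1):
        features.append((math.log10(weekly_density[t]), weekly_air_temp[t]))
        labels.append(1 if weekly_density[t + 1] >= BLOOM_THRESHOLD else 0)
    return features, labels

def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_features, out_labels = list(features), list(labels)
    for cls, n in counts.items():
        pool = [f for f, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_features.append(rng.choice(pool))
            out_labels.append(cls)
    return out_features, out_labels

# Made-up weekly observations, for illustration only
density = [200.0, 500.0, 1500.0, 300.0, 800.0, 2500.0, 400.0]   # cells/mL
air_temp = [18.0, 21.0, 26.0, 22.0, 24.0, 28.0, 20.0]           # degrees C

X, y = make_one_week_ahead_pairs(density, air_temp)
X_bal, y_bal = random_oversample(X, y)
print("before:", dict(Counter(y)), "after:", dict(Counter(y_bal)))
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the data used to report accuracy.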
6
Parallelization of a 3-Dimensional Hydrodynamics Model Using a Hybrid Method with MPI and OpenMP. Processes (Basel) 2021. [DOI: 10.3390/pr9091548] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Process-based numerical models developed to perform hydraulic/hydrologic/water quality analysis of watersheds and rivers have become highly sophisticated, with a corresponding increase in their computation time. However, for incidents such as water pollution, rapid analysis and decision-making are critical. This paper proposes an optimized parallelization scheme to reduce the computation time of the Environmental Fluid Dynamics Code-National Institute of Environmental Research (EFDC-NIER) model, which has been continuously developed for water pollution or algal bloom prediction in rivers. An existing source code and a parallel computational code with open multi-processing (OpenMP) and a message passing interface (MPI) were optimized, and their computation times compared. Subsequently, the simulation results for the existing EFDC model and the model with the parallel computation code were compared. Furthermore, the optimal parallel combination for hybrid parallel computation was evaluated by comparing the simulation time based on the number of cores and threads. When code parallelization was applied, the performance improved by a factor of approximately five compared to the existing source code. Thus, if the parallel computational source code applied in this study is used, urgent decision-making will be easier for events such as water pollution incidents.