1
Miller T, Michoński G, Durlik I, Kozlovska P, Biczak P. Artificial Intelligence in Aquatic Biodiversity Research: A PRISMA-Based Systematic Review. BIOLOGY 2025; 14:520. [PMID: 40427709 PMCID: PMC12109572 DOI: 10.3390/biology14050520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2025] [Revised: 04/30/2025] [Accepted: 05/06/2025] [Indexed: 05/29/2025]
Abstract
Freshwater ecosystems are increasingly threatened by climate change and anthropogenic activities, necessitating innovative and scalable monitoring solutions. Artificial intelligence (AI) has emerged as a transformative tool in aquatic biodiversity research, enabling automated species identification, predictive habitat modeling, and conservation planning. This systematic review follows the PRISMA framework to analyze AI applications in freshwater biodiversity studies. Using a structured literature search across Scopus, Web of Science, and Google Scholar, we identified 312 relevant studies published between 2010 and 2024. This review categorizes AI applications into species identification, habitat assessment, ecological risk evaluation, and conservation strategies. A risk of bias assessment was conducted using QUADAS-2 and RoB 2 frameworks, highlighting methodological challenges, such as measurement bias and inconsistencies in the model validation. The citation trends demonstrate exponential growth in AI-driven biodiversity research, with leading contributions from China, the United States, and India. Despite the growing use of AI in this field, this review also reveals several persistent challenges, including limited data availability, regional imbalances, and concerns related to model generalizability and transparency. Our findings underscore AI's potential in revolutionizing biodiversity monitoring but also emphasize the need for standardized methodologies, improved data integration, and interdisciplinary collaboration to enhance ecological insights and conservation efforts.
Affiliation(s)
- Tymoteusz Miller
  - Institute of Marine and Environmental Sciences, University of Szczecin, 71-415 Szczecin, Poland
- Grzegorz Michoński
  - Institute of Marine and Environmental Sciences, University of Szczecin, 71-415 Szczecin, Poland
- Irmina Durlik
  - Polish Society of Bioinformatics and Data Science, Biodata, 71-214 Szczecin, Poland
  - Faculty of Navigation, Maritime University of Szczecin, 70-500 Szczecin, Poland
- Polina Kozlovska
  - Faculty of Economics, Finance and Management, University of Szczecin, 71-412 Szczecin, Poland
- Paweł Biczak
  - Polish Society of Bioinformatics and Data Science, Biodata, 71-214 Szczecin, Poland
2
Jeong B, Shin H, Shin J, Cha Y. The analysis of spatiotemporal effects of environmental factors on harmful algal blooms in a bloom-prone river using partial least squares structural equation modeling. WATER SCIENCE AND TECHNOLOGY : A JOURNAL OF THE INTERNATIONAL ASSOCIATION ON WATER POLLUTION RESEARCH 2025; 91:1128-1140. [PMID: 40448456 DOI: 10.2166/wst.2025.066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2024] [Accepted: 03/02/2025] [Indexed: 06/02/2025]
Abstract
Since rivers are adversely affected by harmful algal blooms (HABs), it is necessary to develop countermeasures through analyzing the relationship between environmental variables and HABs. This study focused on analyzing the connections between HABs and environmental variables in the middle reaches of the Nakdong River in South Korea. Partial least squares structural equation modeling (PLS-SEM) was used to identify those relationships. The study developed three different PLS-SEM models to investigate various aspects, including lagged effects of environmental factors, influences from upstream and tributaries, and interactions between genera of HABs. The results of the study revealed that the magnitude of HABs had the strongest relationships with nutrient concentrations, particularly 1 week prior to HABs measurement. Additionally, the magnitude of HABs showed stronger relationships with upstream nutrient concentrations compared with tributaries' nutrient concentrations. Furthermore, the dominant genus of HABs in the study area, Microcystis, showed significant relationships with temperature and nutrient concentrations. However, the study did not find significant relationships between Microcystis and other harmful cyanobacteria genera. The methodological framework provides valuable insight into the management of HABs. It allows for the analysis of multiple aspects of the relationships between environmental factors and HABs, which is crucial for effective water resource management.
Affiliation(s)
- Bongseok Jeong
  - School of Environmental Engineering, University of Seoul, Dongdaemun-gu, Seoul 02504, Republic of Korea
  - Present address: Division for Environmental Planning, Water and Land Research Group, Korea Environment Institute, Sejong, Republic of Korea
- Hyunjoo Shin
  - Water Quality Assessment Research Division, Water Environment Research Department, National Institute of Environmental Research, Incheon 22689, Republic of Korea
  - Department of Life Science, Graduate School of Kyonggi University, Yeongtong-gu, Suwon 16227, Republic of Korea
- Jihoon Shin
  - School of Environmental Engineering, University of Seoul, Dongdaemun-gu, Seoul 02504, Republic of Korea
- YoonKyung Cha
  - School of Environmental Engineering, University of Seoul, Dongdaemun-gu, Seoul 02504, Republic of Korea
3
Sheik AG, Sireesha M, Kumar A, Dasari PR, Patnaik R, Bagchi SK, Ansari FA, Bux F. The role of industry 4.0 enabling technologies for predicting, and managing of algal blooms: Bridging gaps and unlocking potential. MARINE POLLUTION BULLETIN 2025; 212:117493. [PMID: 39740519 DOI: 10.1016/j.marpolbul.2024.117493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 12/19/2024] [Accepted: 12/19/2024] [Indexed: 01/02/2025]
Abstract
Recent advancements in data analytics, predictive modeling, and optimization have highlighted the potential of integrating algal blooms (ABs) with Industry 4.0 technologies. Among these innovations, digital twins (DT) have gained prominence, driven by the rapid development of artificial intelligence (AI) and machine learning (ML) technologies, particularly those associated with the Internet of Things (IoT). AI is pivotal in enabling IoT and DT by enhancing decision-making, automating processes, and delivering actionable insights. The intersection of DT and AI in the context of ABs presents a promising new area for research exploration. Digital twins, which serve as virtual replicas of physical entities, systems, or processes, offer significant potential when combined with AI technologies, paving the way for novel research avenues in algal management (AM). This literature review examines digital twins' challenges and applications within AM. It also comprehensively analyzes the current state of IoT-based applications developed using AI and DT. The review further explores the tools for implementing DT systems and surveys existing AI techniques incorporating DTs. Additionally, it discusses the opportunities and challenges associated with creating various IoT-based applications by integrating AI and DT. The review concludes by identifying unexplored research avenues in this emerging field, underscoring the potential for future advancements in Artificial Intelligence of Things (AIoT) within AM.
Affiliation(s)
- Abdul Gaffar Sheik
  - Institute for Water and Wastewater Technology, Durban University of Technology, Durban-4001, South Africa; School of Engineering, The University of British Columbia Okanagan, 3333 University Way, Kelowna, BC V1V 1V7, Canada
- Mantena Sireesha
  - Center for Geospatial and Saline Studies, Sasi Institute of Technology & Engineering, Tadepalligudem, Andhra Pradesh-534101, India; Department of Computer Science and Engineering, Sasi Institute of Technology & Engineering, Tadepalligudem, Andhra Pradesh-534101, India
- Arvind Kumar
  - Institute for Water and Wastewater Technology, Durban University of Technology, Durban-4001, South Africa
- Purushottama Rao Dasari
  - Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Reeza Patnaik
  - Institute for Water and Wastewater Technology, Durban University of Technology, Durban-4001, South Africa
- Sourav Kumar Bagchi
  - Institute for Water and Wastewater Technology, Durban University of Technology, Durban-4001, South Africa
- Faiz Ahmad Ansari
  - Institute for Water and Wastewater Technology, Durban University of Technology, Durban-4001, South Africa
- Faizal Bux
  - Institute for Water and Wastewater Technology, Durban University of Technology, Durban-4001, South Africa
4
Kim JH, Byeon S, Lee H, Lee DH, Lee MY, Shin JK, Chon K, Jeong DS, Park Y. Deep-learning and data-resampling: A novel approach to predict cyanobacterial alert levels in a reservoir. ENVIRONMENTAL RESEARCH 2024; 263:120135. [PMID: 39393456 DOI: 10.1016/j.envres.2024.120135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 10/05/2024] [Accepted: 10/08/2024] [Indexed: 10/13/2024]
Abstract
The proliferation of harmful algal blooms results in adverse impacts on aquatic ecosystems and public health. An early warning system monitors algal bloom occurrences and provides management strategies for promptly addressing high-concentration algal blooms following their occurrence. In this study, we aimed to develop a proactive prediction model for cyanobacterial alert levels to enable efficient decision-making in management practices. We utilized 11 years of water quality, hydrodynamic, and meteorological data from a reservoir that experiences frequent harmful cyanobacterial blooms in summer. We used these data to construct a deep-learning model, specifically a 1D convolutional neural network (1D-CNN) model, to predict cyanobacterial alert levels one week in advance. However, the collected distribution of algal alert levels was imbalanced, leading to the biased training of data-driven models and performance degradation in model predictions. Therefore, an adaptive synthetic sampling method was applied to address the imbalance in the minority class data and improve the predictive performance of the 1D-CNN. The adaptive synthetic sampling method resolved the imbalance in the data during the training phase by incorporating an additional 156 and 196 data points for the caution and warning levels, respectively. The selected optimal 1D-CNN model, with a filter size of 5 and 16 filters, achieved training and testing prediction accuracies of 97.3% and 85.0%, respectively. During the test phase, the prediction accuracies for each algal alert level (L-0, L-1, and L-2) were 89.9%, 79.2%, and 71.4%, respectively, indicating reasonably consistent predictive results for all three alert levels. Therefore, the use of synthetic data addressed data imbalances and enhanced the predictive performance of the data-driven model. The reliable forecasts produced by the improved model can support the development of management strategies to mitigate harmful algal blooms in reservoirs and can aid in building an early warning system to facilitate effective responses.
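The abstract reports both an overall accuracy and a separate accuracy for each alert level. A minimal sketch of how such figures can be computed is given below. It assumes that the per-level accuracy is the share of observations at a given level that were predicted correctly (which is the same as per-class recall); the abstract does not state the exact definition, and the function name and toy labels are illustrative only.

```python
from collections import Counter


def per_level_accuracy(observed, predicted):
    """Return overall accuracy and the accuracy within each observed level."""
    totals = Counter(observed)
    # Count correct predictions, keyed by the observed level.
    hits = Counter(o for o, p in zip(observed, predicted) if o == p)
    overall = sum(hits.values()) / len(observed)
    by_level = {level: hits[level] / n for level, n in totals.items()}
    return overall, by_level


# Toy labels (0 = L-0, 1 = L-1, 2 = L-2); not data from the study.
observed = [0, 0, 0, 0, 1, 1, 2, 2]
predicted = [0, 0, 0, 1, 1, 2, 2, 2]
overall, by_level = per_level_accuracy(observed, predicted)
```

Reporting the per-level values alongside the overall value matters for imbalanced data, because a model can reach a high overall accuracy while performing poorly on the rare levels.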
Affiliation(s)
- Jin Hwi Kim
  - Future and Fusion Lab of Architectural Civil and Environmental Engineering, Korea University, Seoul, 02841, Republic of Korea
- Seohyun Byeon
  - Department of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul, 05029, Republic of Korea
- Hankyu Lee
  - Department of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul, 05029, Republic of Korea
- Dong Hoon Lee
  - Department of Civil and Environmental Engineering, Dongguk University-Seoul, 30, Pildong-ro 1-gil, Jung-gu, Seoul, 04620, Republic of Korea
- Min-Yong Lee
  - Division of Hazard Management, National Institute of Chemical Safety, Seogu, Incheon, 22689, Republic of Korea
- Jae-Ki Shin
  - Limnoecological Science Research Institute Korea, THE HANGANG, Gyeongnam, 50440, Republic of Korea
- Kangmin Chon
  - Department of Environmental Engineering, Kangwon National University, Gangwon-do, 24341, Republic of Korea
- Dae Seong Jeong
  - Future and Fusion Lab of Architectural Civil and Environmental Engineering, Korea University, Seoul, 02841, Republic of Korea
- Yongeun Park
  - Department of Civil and Environmental Engineering, Konkuk University, Gwangjin-gu, Seoul, 05029, Republic of Korea
5
Park J, Seong B, Park Y, Lee WH, Heo TY. Explainable artificial intelligence for the interpretation of ensemble learning performance in algal bloom estimation. WATER ENVIRONMENT RESEARCH : A RESEARCH PUBLICATION OF THE WATER ENVIRONMENT FEDERATION 2024; 96:e11140. [PMID: 39382139 DOI: 10.1002/wer.11140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 08/26/2024] [Accepted: 09/18/2024] [Indexed: 10/10/2024]
Abstract
Chlorophyll-a (Chl-a) concentrations, a key indicator of algal blooms, were estimated using the XGBoost machine learning model with 23 variables, including water quality and meteorological factors. The model performance was evaluated using three indices: root mean square error (RMSE), RMSE-observation standard deviation ratio (RSR), and Nash-Sutcliffe efficiency. Nine datasets were created by averaging 1-hour data to cover time frequencies ranging from 1 hour to 1 month. The datasets with relatively high observation frequencies (1-24 h) maintained stability, with an RSR ranging between 0.61 and 0.65. However, the model's performance declined significantly for datasets with weekly and monthly intervals. The Shapley value (SHAP) analysis, an explainable artificial intelligence (XAI) method, was further applied to provide a quantitative understanding of how environmental factors in the watershed impact the model's performance and to enhance the practical applicability of the model in the field. The number of input variables for model construction increased sequentially from 1 to 23, starting from the variable with the highest SHAP value to that with the lowest. The model's performance plateaued after considering five or more variables, demonstrating that stable performance could be achieved using only a small number of variables, including relatively easily measured data collected by real-time sensors, such as pH, dissolved oxygen, and turbidity. This result highlights the practicality of employing machine learning models and real-time sensor-based measurements for effective on-site water quality management. PRACTITIONER POINTS: XAI quantifies the effects of environmental factors on algal bloom prediction models. The effects of input variable frequency and seasonality were analyzed using XAI. XAI analysis on key variables ensures cost-effective model development.
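The three evaluation indices named in the abstract have standard definitions, sketched below in plain Python. The sketch assumes the common hydrological convention in which RSR is the square root of the sum of squared errors divided by the square root of the sum of squared deviations of the observations from their mean, so that NSE = 1 - RSR². The function name and sample values are illustrative, not taken from the study.

```python
import math


def fit_metrics(obs, pred):
    """Return RMSE, RSR and Nash-Sutcliffe efficiency for paired series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))  # squared errors
    sst = sum((o - mean_obs) ** 2 for o in obs)         # spread of observations
    return {
        "rmse": math.sqrt(sse / n),
        "rsr": math.sqrt(sse) / math.sqrt(sst),  # 0 is a perfect fit
        "nse": 1.0 - sse / sst,                  # 1 is a perfect fit
    }


# Toy values only.
metrics = fit_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

RSR and NSE are normalized by the variability of the observations, which makes them easier to compare across datasets with different averaging intervals than RMSE alone.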
Affiliation(s)
- Jungsu Park
  - Department of Civil and Environmental Engineering, Hanbat National University, Republic of Korea
- Byeongchan Seong
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Yeonjeong Park
  - Water Quality Assessment Research Division, National Institute of Environmental Research, Incheon, Republic of Korea
- Woo Hyoung Lee
  - Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL, USA
- Tae-Young Heo
  - Department of Information & Statistics, Chungbuk National University, Cheongju, Chungbuk, Republic of Korea
6
Lee B, Im JK, Han JW, Kang T, Kim W, Kim M, Lee S. Multiple remotely sensed datasets and machine learning models to predict chlorophyll-a concentration in the Nakdong River, South Korea. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024; 31:58505-58526. [PMID: 39316212 DOI: 10.1007/s11356-024-35005-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 09/13/2024] [Indexed: 09/25/2024]
Abstract
The Nakdong River is a crucial water resource in South Korea, supplying water for various purposes such as potable water, irrigation, and recreation. However, the river is vulnerable to algal blooms due to the inflow of pollutants from multiple point and non-point sources. Monitoring chlorophyll-a (Chl-a) concentrations, a proxy for algal biomass, is essential for assessing the trophic status of the river and managing its ecological health. This study aimed to improve the accuracy and reliability of Chl-a estimation in the Nakdong River using machine learning models (MLMs) and simultaneous use of multiple remotely sensed datasets. This study compared the performances of four MLMs: multi-layer perceptron (MLP), support vector machine (SVM), random forest (RF), and eXtreme Gradient Boosting (XGB), using three different input datasets: (1) two remotely sensed datasets (Sentinel-2 and Landsat-8), (2) standalone Sentinel-2, and (3) standalone Landsat-8. The results showed that the MLP model with multiple remotely sensed datasets outperformed the other MLMs, with R² values 0.43-0.86 higher and RMSE values 0.36-5.88 lower. The MLP model demonstrated the highest performance across the range of Chl-a concentrations and predicted peaks above 20 mg/m³ relatively well compared to other models. This was likely due to the capacity of MLP to handle imbalanced datasets. The predictive map of the spatial distribution of Chl-a generated by MLP captured the areas with high and low Chl-a concentrations well. This study pointed out the impacts of imbalanced Chl-a concentration observations (dominated by low Chl-a concentrations) on the performance of MLMs. The data imbalance likely led to MLMs being poorly trained for high Chl-a values, producing low prediction accuracy. In conclusion, this study demonstrated the value of multiple remotely sensed datasets in enhancing the accuracy and reliability of Chl-a estimation, mainly when using the MLP model. These findings would provide valuable insights into utilizing MLMs effectively for Chl-a monitoring.
Affiliation(s)
- Byeongwon Lee
  - Department of Environmental Science & Ecological Engineering, College of Life Sciences & Biotechnology, Korea University, 145, Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea
- Jong Kwon Im
  - National Institute of Environmental Research, 42, Hwangyeong-Ro, Seo-Gu, Incheon, 22689, South Korea
- Ji Woo Han
  - Han River Environment Research Center, National Institute of Environmental Research, 42, Dumulmeori-Gil 68Beon-Gil, Yangseo-Myeon, Yangpyeong-Gun, 12585, South Korea
- Taegu Kang
  - Han River Environment Research Center, National Institute of Environmental Research, 42, Dumulmeori-Gil 68Beon-Gil, Yangseo-Myeon, Yangpyeong-Gun, 12585, South Korea
- Wonkook Kim
  - Department of Civil and Environmental Engineering, Pusan National University, 2, Busandaehak-Ro 63Beon-Gil, Geumjeong-Gu, Busan, 46241, South Korea
- Moonil Kim
  - Division of ICT-Integrated Environment, Pyeongtaek University, 3825, Seodong-Daero, Pyeongtaek-Si, 17869, Gyeonggi-Do, South Korea
- Sangchul Lee
  - Department of Environmental Science & Ecological Engineering, College of Life Sciences & Biotechnology, Korea University, 145, Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea
7
Ugulen HS, Koestner D, Sandven H, Hamre B, Kristoffersen AS, Saetre C. Neural network approach for correction of multiple scattering errors in the LISST-VSF instrument. OPTICS EXPRESS 2023; 31:32737-32751. [PMID: 37859069 DOI: 10.1364/oe.495523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 08/31/2023] [Indexed: 10/21/2023]
Abstract
The LISST-VSF is a commercially developed instrument used to measure the volume scattering function (VSF) and attenuation coefficient in natural waters, which are important for remote sensing, environmental monitoring and underwater optical wireless communication. While the instrument has been shown to work well at relatively low particle concentration, previous studies have shown that the VSF obtained from the LISST-VSF instrument is heavily influenced by multiple scattering in turbid waters. High particle concentrations result in errors in the measured VSF, as well as the derived properties, such as the scattering coefficient and phase function, limiting the range at which the instrument can be used reliably. Here, we present a feedforward neural network approach for correcting this error, using only the measured VSF as input. The neural network is trained with a large dataset generated using Monte Carlo simulations of the LISST-VSF with scattering coefficients b = 0.05-50 m⁻¹, and tested on VSFs from measurements with natural water samples. The results show that the neural-network-estimated VSF is very similar to the expected VSF without multiple scattering errors, both in angular shape and magnitude. One example showed that the error in the scattering coefficient was reduced from 103% to 5% for a benchtop measurement of a natural water sample with expected b = 10.6 m⁻¹. Hence, the neural network drastically reduces uncertainties in the VSF and derived properties resulting from measurements with the LISST-VSF in turbid waters.
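For orientation, the derived properties mentioned in the abstract follow from the VSF by the standard definitions of ocean optics, written out below. The symbols β(θ) for the VSF, b for the scattering coefficient, and β̃(θ) for the phase function are the conventional ones; the percentage error ε is shown here as a plain relative error, which is an assumption about how the 103% and 5% figures were computed, since the abstract does not give the formula.

```latex
% Scattering coefficient: the VSF integrated over all directions,
% assuming azimuthal symmetry.
b = 2\pi \int_{0}^{\pi} \beta(\theta)\,\sin\theta \,\mathrm{d}\theta

% Phase function: the VSF normalized by the scattering coefficient.
\tilde{\beta}(\theta) = \frac{\beta(\theta)}{b}

% Relative error of a measured (or corrected) scattering coefficient
% against the expected value (assumed definition).
\varepsilon = \frac{\lvert b_{\mathrm{meas}} - b_{\mathrm{exp}} \rvert}{b_{\mathrm{exp}}} \times 100\%
```

Because b is an integral of the VSF, any multiple-scattering error in the measured angular shape carries over directly into b and into the normalized phase function.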
8
Kim J, Jung W, An J, Oh HJ, Park J. Self-optimization of training dataset improves forecasting of cyanobacterial bloom by machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 866:161398. [PMID: 36621510 DOI: 10.1016/j.scitotenv.2023.161398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 11/30/2022] [Accepted: 01/01/2023] [Indexed: 06/17/2023]
Abstract
Data-driven model (DDM) prediction of aquatic ecological responses, such as cyanobacterial harmful algal blooms (CyanoHABs), is critically influenced by the choice of training dataset. However, a systematic method to choose the optimal training dataset considering data history has not yet been developed. A comprehensive procedure with an algorithm that selects the optimal training dataset by itself would allow the DDM to improve its own performance. In this study, a novel algorithm was developed to self-generate possible training dataset candidates from the available input and output variable data and self-choose the optimal training dataset that maximizes CyanoHAB forecasting performance. Nine years of meteorological and water quality data (input) and CyanoHAB data (output) from a site on the Nakdong River, South Korea, were acquired and pretreated via an automated process. An artificial neural network (ANN) was chosen from among the DDM candidates by first-cut training and validation using the entire collected dataset. Optimal training datasets for the ANN were self-selected from among the possible self-generated training datasets by systematically simulating the performance in response to 46 periods and 40 sizes (number of data elements) of the generated training datasets. The best-performing models were screened to identify the candidate models. The best performance corresponded to 6-7 years of training data (~18% lower error) for forecasting 1-28 d ahead (1-28 d of forecasting lead time (FLT)). After the hyperparameters of the screened model candidates were fine-tuned, the best-performing model (7 years of data with 14 d FLT) was self-determined by comparing the forecasts with unseen CyanoHAB events. The self-determined model could reasonably predict CyanoHABs occurring in Korean waters (cyanobacteria cells/mL ≥ 1000). Thus, our proposed method of self-optimizing the training dataset effectively improved the predictive accuracy and operational efficiency of the DDM prediction of CyanoHAB.
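The core idea of the abstract, generating candidate training datasets that differ in period and size and keeping the one that forecasts best, can be illustrated with a small grid search. The sketch below is not the authors' algorithm: it replaces the ANN with a placeholder model (the mean of the training window), uses RMSE on a held-out block as the score, and all names and values are hypothetical.

```python
import math


def select_training_window(history, validation, sizes):
    """Try every (start, size) window of `history`; return the best one.

    Returns a tuple (start, size, rmse) for the window whose placeholder
    forecast has the lowest RMSE on the `validation` values.
    """
    best = None
    for size in sizes:
        for start in range(len(history) - size + 1):
            train = history[start:start + size]
            forecast = sum(train) / size  # placeholder model, not an ANN
            err = math.sqrt(
                sum((v - forecast) ** 2 for v in validation) / len(validation)
            )
            if best is None or err < best[2]:
                best = (start, size, err)
    return best


# Toy series with a regime shift: older data (0.0) no longer resembles
# the conditions seen in the validation block (10.0).
history = [0.0] * 10 + [10.0] * 10
validation = [10.0] * 5
best_start, best_size, best_rmse = select_training_window(
    history, validation, sizes=[5, 10, 20]
)
```

In this toy case the search prefers a recent window over the full history, which mirrors the abstract's finding that a subset of the available record (6-7 of 9 years) can outperform training on everything.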
Affiliation(s)
- Jayun Kim
  - Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
- Woosik Jung
  - Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
- Jusuk An
  - Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea; Department of Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang, Republic of Korea
- Hyun Je Oh
  - Department of Environmental Research, Korea Institute of Civil Engineering and Building Technology, Goyang, Republic of Korea
- Joonhong Park
  - Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
9
Wen J, Yang J, Li Y, Gao L. Harmful algal bloom warning based on machine learning in maritime site monitoring. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108569] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/18/2022]
10
A Classification-Based Machine Learning Approach to the Prediction of Cyanobacterial Blooms in Chilgok Weir, South Korea. WATER 2022. [DOI: 10.3390/w14040542] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
Cyanobacterial blooms arise from complex causes such as water quality, climate, and hydrological factors. This study aims to present machine learning models that predict occurrences of these complicated cyanobacterial blooms efficiently and effectively. The dataset was classified into groups consisting of two, three, or four classes based on cyanobacterial cell density after a week, which was used as the target variable. We developed 96 machine learning models for Chilgok weir using four classification algorithms: k-Nearest Neighbor, Decision Tree, Logistic Regression, and Support Vector Machine. In the modeling methodology, we first selected input features by applying ANOVA (Analysis of Variance) and resolving multi-collinearity as part of feature selection, which removes features that are irrelevant to the target variable. Next, we adopted an oversampling method to resolve the problem of having an imbalanced dataset. Consequently, the best performance was achieved for models using datasets divided into two classes, with an accuracy of 80% or more. Comparatively, we confirmed low accuracy of approximately 60% for models using datasets divided into three classes. Moreover, while we produced models with overall high accuracy when using logCyano (logarithm of cyanobacterial cell density) as a feature, several models in combination with air temperature and NO3-N (nitrate nitrogen) using two classes also demonstrated more than 80% accuracy. It can be concluded that it is possible to develop very accurate classification-based machine learning models with two features related to cyanobacterial blooms. This showed that efficient and effective models can be built with a small number of inputs.
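The ANOVA step mentioned in the abstract scores each candidate feature by how strongly its values differ between the target classes. A minimal sketch of the one-way ANOVA F-statistic for a single feature is shown below; the function name and the sample groups are illustrative, and the abstract does not describe the thresholds or software the authors used.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for one feature split by class.

    `groups` is a list of lists, one list of feature values per class.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the class means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the values around their own class mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy feature values for three classes.
f_stat = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F-statistic means the class means are well separated relative to the scatter within each class, so the feature is a plausible candidate to keep; features with small values can be dropped before the collinearity check.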
11
Xia R, Zou L, Zhang Y, Zhang Y, Chen Y, Liu C, Yang Z, Ma S. Algal bloom prediction influenced by the Water Transfer Project in the Middle-lower Hanjiang River. Ecol Modell 2022. [DOI: 10.1016/j.ecolmodel.2021.109814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
12
Liu L, Wang M, Li G, Wang Q. Construction of Predictive Model for Type 2 Diabetic Retinopathy Based on Extreme Learning Machine. Diabetes Metab Syndr Obes 2022; 15:2607-2617. [PMID: 36046759 PMCID: PMC9420743 DOI: 10.2147/dmso.s374767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 08/18/2022] [Indexed: 12/02/2022]
Abstract
PURPOSE: A common cause of blindness in people with type 2 diabetes (T2D) is diabetic retinopathy (DR). Early fundus examinations have been shown to prevent vision loss, but routine ophthalmic screenings for patients with diabetes present significant financial and material challenges to existing health-care systems. The purpose of this study is to build a DR prediction model based on the extreme learning machine (ELM) and to compare its performance with DR prediction models based on support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and artificial neural network (ANN). METHODS: From January 1, 2020 to November 31, 2021, data were collected from electronic inpatient medical records at Lu'an Hospital of Anhui Medical University in China. An extreme learning machine (ELM) algorithm was used to develop a prediction model based on demographic data and blood and urine test results. Several metrics were used to evaluate the model's performance: (1) classification accuracy (ACC), (2) sensitivity, (3) specificity, (4) precision, (5) negative predictive value (NPV), (6) training time and (7) area under the receiver operating characteristic (ROC) curve (AUC). RESULTS: In terms of ACC, sensitivity, specificity, precision, NPV and AUC, the DR prediction models based on SVM and ELM are better than those based on ANN, KNN and RF. The prediction model based on ELM is the best among them in terms of ACC, precision, specificity, training time and AUC, with 84.45%, 83.93%, 93.16%, 1.24 s, and 88.34%, respectively. The DR prediction model based on SVM is the best in terms of sensitivity and NPV, which are 70.82% and 85.60%, respectively. CONCLUSION: According to the findings of this study, the model based on the extreme learning machine performs very well in predicting diabetic retinopathy, thus providing technological assistance for screening of diabetic retinopathy.
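Five of the metrics listed in the abstract (ACC, sensitivity, specificity, precision and NPV) are simple ratios of the four cells of a binary confusion matrix. The sketch below shows those standard definitions; the function name and the counts are illustrative and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix ratios for a binary classifier."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases cleared
        "precision": tp / (tp + fp),    # share of positive calls that are right
        "npv": tn / (tn + fn),          # share of negative calls that are right
    }


# Toy counts only.
scores = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

Reading the metrics together is useful in a screening setting: a model can have high specificity and precision while still missing a substantial share of true cases, which shows up as lower sensitivity.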
Affiliation(s)
- Lei Liu
  - Graduate School of Bengbu Medical College, Bengbu Medical College, Bengbu City, People’s Republic of China
- Mengmeng Wang
  - Graduate School of Bengbu Medical College, Bengbu Medical College, Bengbu City, People’s Republic of China
- Guocheng Li
  - School of Finance & Mathematics, West Anhui University, Lu’an City, People’s Republic of China
- Qi Wang
  - Graduate School of Bengbu Medical College, Bengbu Medical College, Bengbu City, People’s Republic of China
  - Department of Endocrinology, Lu’an Hospital of Anhui Medical University, Lu’an City, People’s Republic of China
  - Correspondence: Qi Wang, Department of Endocrinology, Lu’an Hospital of Anhui Medical University, No. 21, Wanxi West Road, Lu’an City, People’s Republic of China, Tel +86-13966299858
13
Kim JH, Shin JK, Lee H, Lee DH, Kang JH, Cho KH, Lee YG, Chon K, Baek SS, Park Y. Improving the performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method. WATER RESEARCH 2021; 207:117821. [PMID: 34781184 DOI: 10.1016/j.watres.2021.117821] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/17/2021] [Revised: 10/23/2021] [Accepted: 10/26/2021] [Indexed: 06/13/2023]
Abstract
Many countries have attempted to monitor and predict harmful algal blooms to mitigate related problems and establish management practices. The current alert system, based on sampling of cell density, is used to intimate the bloom status and to inform rapid and adequate response from water-associated organizations. The objective of this study was to develop an early warning system for cyanobacterial blooms to allow for efficient decision making prior to the occurrence of algal blooms and to guide preemptive actions regarding management practices. In this study, two machine learning models, an artificial neural network (ANN) and a support vector machine (SVM), were constructed for the timely prediction of alert levels of algal bloom using eight years' worth of meteorological, hydrodynamic, and water quality data in a reservoir where harmful cyanobacterial blooms frequently occur during summer. However, the class imbalance across alert levels in the output variable leads to biased training of the data-driven model and degradation of model prediction performance. Therefore, synthetic data generated by an adaptive synthetic (ADASYN) sampling method were used to resolve the imbalance of minority class data in the original data and to improve the prediction performance of the models. The results showed that the overall prediction performance for the caution level (L1) and warning level (L2) in the models constructed using a combination of original and synthetic data was higher than in the models constructed using original data only. In particular, the optimal ANN and SVM constructed using a combination of original and synthetic data generated distinctly improved recall and precision values for L1 during both training (including validation) and test; L1 is a very critical alert level as it indicates a transition status from normalcy to bloom formation. In addition, both optimal models constructed using synthetic-added data exhibited improvement in recall and precision by more than 33.7% while predicting L1 and L2 during the test. Therefore, the application of synthetic data can improve detection performance of machine learning models by solving the imbalance of observed data. Reliable prediction by the improved models can be used to aid the design of management practices to mitigate algal blooms within a reservoir.
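The idea behind adaptive synthetic sampling can be sketched in a few lines: minority points whose neighbourhoods contain more majority points receive more synthetic neighbours, and each synthetic point is an interpolation between two minority points. The code below is a simplified illustration of that idea under stated assumptions, not the implementation used in the study; the function name `adasyn_sketch`, its parameters, and the toy coordinates are all hypothetical.

```python
import math
import random


def adasyn_sketch(minority, majority, k=3, beta=1.0, seed=0):
    """Simplified ADASYN-style oversampling for 2-class data (illustrative only).

    minority/majority are lists of equal-length numeric tuples.
    Assumes at least two minority points so that interpolation is possible.
    """
    rng = random.Random(seed)
    points = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    n_new = int((len(majority) - len(minority)) * beta)  # total synthetic points wanted
    if n_new <= 0 or len(minority) < 2:
        return []

    # Step 1: for each minority point, the share of majority points
    # among its k nearest neighbours (a measure of how hard it is to learn).
    ratios = []
    for i, p in enumerate(minority):
        others = [points[j] for j in range(len(points)) if j != i]
        others.sort(key=lambda q: math.dist(p, q[0]))
        ratios.append(sum(label for _, label in others[:k]) / k)
    total = sum(ratios)
    if total == 0:
        return []

    # Step 2: allocate synthetic points in proportion to that share and
    # create each one on the segment between p and a nearby minority point.
    synthetic = []
    for i, (p, r) in enumerate(zip(minority, ratios)):
        count = round(r / total * n_new)
        neighbours = [q for j, q in enumerate(minority) if j != i]
        neighbours.sort(key=lambda q: math.dist(p, q))
        neighbours = neighbours[:k]
        for _ in range(count):
            q = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(p, q)))
    return synthetic


# Toy data, not from the study
minority = [(1.0, 1.0), (1.2, 0.9), (3.0, 3.0)]
majority = [(3.1, 3.2), (2.9, 3.1), (3.3, 2.8), (0.0, 5.0),
            (5.0, 0.0), (5.0, 5.0), (4.0, 4.0)]
synthetic = adasyn_sketch(minority, majority)
print(len(synthetic), "synthetic minority points")
```

In this toy set the minority point at (3.0, 3.0) sits among majority points, so it receives a larger share of the synthetic samples than the two points near (1, 1).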
Affiliation(s)
- Jin Hwi Kim
- Department of Civil, Environmental and Plant Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Jae-Ki Shin
- Office for Busan Region Management of the Nakdong River, Korea Water Resources Corporation (K-water), Busan 49300, Republic of Korea
- Hankyu Lee
- Department of Civil, Environmental and Plant Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Dong Hoon Lee
- Department of Civil and Environmental Engineering, Dongguk University, Seoul, 04620, Republic of Korea
- Joo-Hyon Kang
- Department of Civil and Environmental Engineering, Dongguk University, Seoul, 04620, Republic of Korea
- Kyung Hwa Cho
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Yong-Gu Lee
- Department of Environmental Engineering, Kangwon National University, Gangwon-do 24341, Republic of Korea
- Kangmin Chon
- Department of Environmental Engineering, Kangwon National University, Gangwon-do 24341, Republic of Korea; Department of Integrated Energy and Infra System, Kangwon National University, Gangwon-do 24341, Republic of Korea
- Sang-Soo Baek
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Yongeun Park
- Department of Civil, Environmental and Plant Engineering, Konkuk University, Seoul 05029, Republic of Korea; Department of Civil and Environmental Engineering, Konkuk University, Seoul 05029, Republic of Korea
|
14
|
Ly QV, Nguyen XC, Lê NC, Truong TD, Hoang THT, Park TJ, Maqbool T, Pyo J, Cho KH, Lee KS, Hur J. Application of Machine Learning for eutrophication analysis and algal bloom prediction in an urban river: A 10-year study of the Han River, South Korea. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021; 797:149040. [PMID: 34311376 DOI: 10.1016/j.scitotenv.2021.149040] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 06/29/2021] [Accepted: 07/10/2021] [Indexed: 06/13/2023]
Abstract
The increasing release of nutrients to aquatic environments has led to great concern regarding eutrophication and the risk of unwanted algal blooms. Based on observational data of 20 water quality parameters measured on a monthly basis at 40 stations from 2011 to 2020, this study applied different Machine Learning (ML) algorithms to suggest the best option for algal bloom prediction in the Han River, a large river in South Korea. Eight different ML algorithms were categorized into several groups of statistical learning, regression family, and deep learning, and were then compared for their suitability to predict the chlorophyll-derived trophic index (TSI-Chla). ML algorithms helped identify the most important water quality parameters contributing to algal bloom prediction. The ML results confirmed that eutrophication and algal proliferation were governed by the complex interplay between nutrients (nitrogen and phosphorus), organic contaminants, and environmental factors. Of the models tested, the adaptive neuro-fuzzy inference system (ANFIS) exhibited the best performance owing to its consistent and outperforming prediction both quantitatively (i.e., via regression) and qualitatively (i.e., via classification), which was evidenced by the lowest value of mean absolute error (MAE) of 0.09, and the highest F1-score, Recall and Precision of 0.97, 0.98 and 0.96, respectively. In a further step, a representative web application was constructed to assist common users to predict the trophic status of the Han River. This study demonstrated that ML techniques are not only promising for highly accurate water quality modeling of urban rivers, but also reduce time and labor intensity for experiments, which decreases the number of monitored water quality parameters, providing further insights into the driving factors of water quality deterioration. They ultimately help devise proactive strategies for sustainable water management.
Affiliation(s)
- Quang Viet Ly
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, Guangdong, China
- Xuan Cuong Nguyen
- Laboratory of Energy and Environmental Science, Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam; Faculty of Environmental and Chemical Engineering, Duy Tan University, Da Nang 550000, Vietnam
- Ngoc C Lê
- School of Applied Mathematics and Informatics, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Tien-Dung Truong
- School of Applied Mathematics and Informatics, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Thu-Huong T Hoang
- School of Environmental Science and Technology, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Tae Jun Park
- Department of Environment and Energy, Sejong University, Seoul 05006, South Korea
- Tahir Maqbool
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, Guangdong, China
- JongCheol Pyo
- Center for Environmental Data Strategy, Korea Environment Institute, Sejong 30147, South Korea
- Kyung Hwa Cho
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 44919, South Korea
- Kwang-Sik Lee
- Korea Basic Science Institute, Yeongudanji-ro 162, Cheongwon-gu, Cheongju, Chungcheongbuk-do 28119, South Korea
- Jin Hur
- Department of Environment and Energy, Sejong University, Seoul 05006, South Korea
|
15
|
Sadeghi H, Mohandes SR, Hosseini MR, Banihashemi S, Mahdiyar A, Abdullah A. Developing an Ensemble Predictive Safety Risk Assessment Model: Case of Malaysian Construction Projects. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17228395. [PMID: 33202768 PMCID: PMC7696253 DOI: 10.3390/ijerph17228395] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/15/2020] [Revised: 10/02/2020] [Accepted: 10/15/2020] [Indexed: 11/16/2022]
Abstract
Occupational Health and Safety (OHS)-related injuries are vexing problems for construction projects in developing countries, mostly due to poor managerial-, governmental-, and technical safety-related issues. Though some studies have been conducted on OHS-associated issues in developing countries, research on this topic remains scarce. A review of the literature shows that presenting a predictive assessment framework through machine learning techniques can add much to the field. As for Malaysia, despite the ongoing growth of the construction sector, there has not been any study focused on OHS assessment of workers involved in construction activities. To fill these gaps, an Ensemble Predictive Safety Risk Assessment Model (EPSRAM) is developed in this paper as an effective tool to assess the OHS risks related to workers on construction sites. The developed EPSRAM is based on the integration of neural networks with fuzzy inference systems. To show the effectiveness of the EPSRAM developed, it is applied to several Malaysian construction case projects. This paper contributes to the field in several ways, through: (1) identifying major potential safety risks, (2) determining crucial factors that affect the safety assessment for construction workers, (3) predicting the magnitude of identified safety risks accurately, and (4) predicting the evaluation strategies applicable to the identified risks. It is demonstrated how EPSRAM can provide safety professionals and inspectors concerned with well-being of workers with valuable information, leading to improving the working environment of construction crew members.
Affiliation(s)
- Haleh Sadeghi
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Saeed Reza Mohandes
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- M. Reza Hosseini
- School of Architecture and Built Environment, Deakin University, Geelong 3217, VIC, Australia
- Saeed Banihashemi
- Department of Building and Construction Management, University of Canberra, Bruce 2617, ACT, Australia
- Amir Mahdiyar
- School of Housing, Building and Planning, Universiti Sains Malaysia, Penang 11800, Malaysia
- Correspondence:
- Arham Abdullah
- Universiti Malaysia Kelantan, Beg Bercunci No. 01, Bachok, Kelantan 16300, Malaysia
|
16
|
Alsayed A, Sadir H, Kamil R, Sari H. Prediction of Epidemic Peak and Infected Cases for COVID-19 Disease in Malaysia, 2020. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:E4076. [PMID: 32521641 PMCID: PMC7312594 DOI: 10.3390/ijerph17114076] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/08/2020] [Revised: 05/08/2020] [Accepted: 05/11/2020] [Indexed: 12/12/2022]
Abstract
The coronavirus COVID-19 has recently started to spread rapidly in Malaysia. The total number of infected cases had increased to 3662 by 05 April 2020, leading to the country being placed under lockdown. As the main public concern is whether the current situation will continue for the next few months, this study aims to predict the epidemic peak using the Susceptible-Exposed-Infectious-Recovered (SEIR) model, with incorporation of the mortality cases. The infection rate was estimated using the Genetic Algorithm (GA), while the Adaptive Neuro-Fuzzy Inference System (ANFIS) model was used to provide short-term forecasting of the number of infected cases. The results show that the estimated infection rate is 0.228 ± 0.013, while the basic reproductive number is 2.28 ± 0.13. The epidemic peak of COVID-19 in Malaysia could be reached on 26 July 2020, with an uncertainty window of 30 days (12 July-11 August). Possible interventions by the government to reduce the infection rate by 25% over two or three months would delay the epidemic peak by 30 and 46 days, respectively. The forecasting results using the ANFIS model show a low Normalized Root Mean Square Error (NRMSE) of 0.041; a low Mean Absolute Percentage Error (MAPE) of 2.45%; and a high coefficient of determination (R²) of 0.9964. The results also show that an intervention has a great effect on delaying the epidemic peak and that a longer intervention period would reduce the epidemic size at the peak. The study provides important information for public health providers and the government to control the COVID-19 epidemic.
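For readers unfamiliar with the SEIR structure mentioned in this abstract, the sketch below integrates a textbook SEIR model with a forward Euler step. It is an illustration under stated assumptions, not the study's model: it omits the mortality compartment the authors incorporate, and only the infection rate 0.228 comes from the abstract. The recovery rate `gamma = 0.1` is an assumption chosen so that the textbook relation R0 = beta / gamma gives the reported 2.28 (the abstract does not say how R0 was derived); the incubation rate `sigma`, the population size and the initial case count are hypothetical.

```python
def simulate_seir(beta, sigma, gamma, population, infectious0, days, dt=0.1):
    """Integrate a basic SEIR model with forward Euler.

    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - infectious0)
    e, i, r = 0.0, float(infectious0), 0.0
    history = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


beta = 0.228            # infection rate reported in the abstract
gamma = 0.1             # ASSUMPTION: recovery rate, picked so beta / gamma = 2.28
sigma = 0.2             # HYPOTHETICAL incubation rate (1 / 5 days)
population = 1_000_000  # HYPOTHETICAL population size

history = simulate_seir(beta, sigma, gamma, population, infectious0=10, days=365)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print(f"R0 = {beta / gamma:.2f}, simulated peak on day {peak_day}")
```

Because the population, initial conditions and rates other than beta are placeholders, the simulated peak day says nothing about the date reported in the study; the sketch only shows how the compartments interact.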
Affiliation(s)
- Abdallah Alsayed
- Department of Electrical and Electronic Engineering, Faculty of Engineering, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
- Hayder Sadir
- Department of Computer and Wireless Communication, Faculty of Engineering, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
- Raja Kamil
- Department of Electrical and Electronic Engineering, Faculty of Engineering, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
- Laboratory of Computational Statistics and Operations Research, Institute for Mathematical Research, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
- Hasan Sari
- College of Computer Science and Information Technology, Universiti Tenaga Nasional, Kajang 43000, Malaysia
|
17
|
Using Machine-Learning Algorithms for Eutrophication Modeling: Case Study of Mar Menor Lagoon (Spain). INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17041189. [PMID: 32069834 PMCID: PMC7068380 DOI: 10.3390/ijerph17041189] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/16/2020] [Revised: 02/07/2020] [Accepted: 02/09/2020] [Indexed: 11/16/2022]
Abstract
The Mar Menor is a hypersaline coastal lagoon with high environmental value and a characteristic example of a highly anthropized hydro-ecosystem located in the southeast of Spain. Unprecedented eutrophication crises in 2016 and 2019, with abrupt changes in the quality of its waters, caused great social alarm. Understanding and modeling the level of a eutrophication indicator, such as chlorophyll-a (Chl-a), benefits the management of this complex system. In this study, we investigate the potential of machine learning (ML) methods to predict the level of Chl-a. In particular, Multilayer Neural Networks (MLNNs) and Support Vector Regressions (SVRs) are evaluated on a dataset of up to nine different water quality parameters. The most relevant input combinations were extracted using wrapper feature selection methods, which simplified the structure of the model, resulting in a more accurate and efficient procedure. Although the performance in the validation phase showed that SVR models obtained better results than MLNNs, experimental results indicated that both ML algorithms provide satisfactory results in the prediction of Chl-a concentration, reaching a cross-validated coefficient of determination (R²CV) of up to 0.7 for the best-fit models.
|
18
|
Hussein AM, Abd Elaziz M, Abdel Wahed MS, Sillanpää M. A new approach to predict the missing values of algae during water quality monitoring programs based on a hybrid moth search algorithm and the random vector functional link network. JOURNAL OF HYDROLOGY 2019; 575:852-863. [DOI: 10.1016/j.jhydrol.2019.05.073] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/02/2023]
|