1. Chen F, Zhou B, Yang L, Zhuang J, Chen X. Assessing the risk of E. coli contamination from manure application in Chinese farmland by integrating machine learning and Phydrus. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 356:124345. [PMID: 38852664 DOI: 10.1016/j.envpol.2024.124345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 05/12/2024] [Accepted: 06/06/2024] [Indexed: 06/11/2024]
Abstract
This study assesses the risks associated with the residual presence and transport of Escherichia coli (E. coli) in soil following the application of livestock manure in Chinese farmlands by integrating machine learning algorithms with a mechanism-based model (Phydrus). We first reviewed 28 published papers to gather data on E. coli's die-off and attachment characteristics in soil. Machine learning models, including deep learning and gradient boosting machines, were employed to predict key parameters such as the die-off rate of E. coli and the first-order attachment coefficient in soil. Phydrus was then used to simulate E. coli transport and survival in 23692 subregions in China. The model considered regional differences in E. coli residual risk and transport, influenced by soil properties, soil depths, precipitation, seasonal variations, and regional disparities. The findings indicate higher residual risks in regions such as Northeast China and the Eastern Qinghai-Tibet Plateau, and pronounced transport risks in the fringe of the Sichuan Basin, the Loess Plateau, the North China Plain, the Northeast Plain, the Shigatse Basin, and the Shangri-La region. The study also demonstrates a significant reduction in both residual and transport risks one month after manure application, highlighting the importance of timing manure application and implementing region-specific standards. This research contributes to the broader understanding of pathogen behavior in agricultural soils and offers practical guidelines for managing the risks associated with manure use. The comprehensive method offers a potentially valuable tool for evaluating microbial contaminants in agricultural soils across the globe.
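The die-off rate mentioned in this abstract is commonly treated as a first-order decay constant. The following is a minimal sketch, assuming simple first-order kinetics C(t) = C0 · exp(−k · t); the function names and the example rate constant are hypothetical illustrations, since the abstract reports no numeric parameter values, and this is not the Phydrus implementation.

```python
import math


def remaining_fraction(k_per_day: float, days: float) -> float:
    """Fraction C(t)/C0 left after `days`, assuming first-order die-off."""
    if k_per_day < 0 or days < 0:
        raise ValueError("rate constant and time must be non-negative")
    return math.exp(-k_per_day * days)


def log10_reduction(k_per_day: float, days: float) -> float:
    """Number of log10 units lost after `days` under the same assumption."""
    return k_per_day * days / math.log(10)


if __name__ == "__main__":
    k = 0.1  # hypothetical die-off rate (1/day), NOT a value from the study
    for t in (0, 7, 30):
        print(t, remaining_fraction(k, t), log10_reduction(k, t))
```

Under this assumption, waiting longer after manure application lowers the residual concentration exponentially, which is in line with the abstract's observation that risks fall one month after application.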
Affiliation(s)
- Fengxian Chen
  - Key Laboratory of Pollution Ecology and Environmental Engineering, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, Liaoning 110016, China
- Bin Zhou
  - Chair of model-based environmental exposure science, Faculty of Medicine, University of Augsburg, Augsburg 86159, Germany
- Liqiong Yang
  - Key Laboratory of Pollution Ecology and Environmental Engineering, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, Liaoning 110016, China
- Jie Zhuang
  - Department of Biosystems Engineering and Soil Science, Institute for a Secure and Sustainable Environment, The University of Tennessee, Knoxville, TN 37996, United States
- Xijuan Chen
  - Sino-Spain Joint Laboratory for Agricultural Environment Emerging Contaminants of Zhejiang Province, College of Environmental and Resource Sciences, Zhejiang Agriculture and Forestry University, Hangzhou 311300, China
2. Narvaez-Montoya C, Mahlknecht J, Torres-Martínez JA, Mora A, Pino-Vargas E. FlowSOM clustering - A novel pattern recognition approach for water research: Application to a hyper-arid coastal aquifer system. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 915:169988. [PMID: 38211857 DOI: 10.1016/j.scitotenv.2024.169988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 01/04/2024] [Accepted: 01/05/2024] [Indexed: 01/13/2024]
Abstract
Monitoring and understanding of water resources have become essential in designing effective and sustainable management strategies to overcome the growing water quality challenges. In this context, the utilization of unsupervised learning techniques for evaluating environmental tracers has facilitated the exploration of sources and dynamics of groundwater systems through pattern recognition. However, conventional techniques may overlook spatial and temporal non-linearities present in water research data. This paper introduces the adaptation of FlowSOM, a pioneering approach that combines self-organizing maps (SOM) and minimal spanning trees (MST), with the fast-greedy network clustering algorithm to unravel intricate relationships within multivariate water quality datasets. By capturing connections within the data, this ensemble tool enhances clustering and pattern recognition. Applied to the complex water quality context of the hyper-arid transboundary Caplina/Concordia coastal aquifer system (Peru/Chile), the FlowSOM network and clustering yielded compelling results in pattern recognition of the aquifer salinization. Analyzing 143 groundwater samples across eight variables, including major ions, the approach supports the identification of distinct clusters and connections between them. Three primary sources of salinization were identified: river percolation, slow lateral aquitard recharge, and seawater intrusion. The analysis demonstrated the superiority of FlowSOM clustering over traditional techniques in the case study, producing clusters that align more closely with the actual hydrogeochemical pattern. The outcomes broaden the utilization of multivariate analysis in water research, presenting a comprehensive approach to support the understanding of groundwater systems.
Affiliation(s)
- Christian Narvaez-Montoya
  - Escuela de Ingenieria y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Jürgen Mahlknecht
  - Escuela de Ingenieria y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Juan Antonio Torres-Martínez
  - Escuela de Ingenieria y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Abrahan Mora
  - Escuela de Ingenieria y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Edwin Pino-Vargas
  - Facultad de Ingenieria Civil, Arquitectura y Geotecnia, Universidad Nacional Jorge Basadre Grohmann, Av. Miraflores S/N, Tacna 23000, Peru
3. Zahra Q, Gul J, Shah AR, Yasir M, Karim AM. Antibiotic resistance genes prevalence prediction and interpretation in beaches affected by urban wastewater discharge. One Health 2023; 17:100642. [PMID: 38024281 PMCID: PMC10665162 DOI: 10.1016/j.onehlt.2023.100642] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/19/2023] [Accepted: 10/10/2023] [Indexed: 12/01/2023] Open
Abstract
Background: The annual death toll of over 1.2 million worldwide is attributed to infections caused by resistant bacteria, driven by the significant impact of antibiotic misuse and overuse in spreading these bacteria and their associated antibiotic resistance genes (ARGs). While limited data suggest the presence of ARGs in beach environments, efficient prediction tools are needed for monitoring and detecting ARGs to ensure public health safety. This study aims to develop interpretable machine learning methods for predicting ARGs in beach waters, addressing the challenge of black-box models and enhancing our understanding of their internal mechanisms.
Methods: We systematically collected beach water samples and isolated bacteria from them using various differential and selective media supplemented with different antibiotics. Resistance profiles of the bacteria were determined using the Kirby-Bauer disk diffusion method. ARGs were detected and quantified using quantitative polymerase chain reaction (qPCR). The qPCR data and hydro-meteorological data were used to build an ML model with high prediction performance, and two explainable artificial intelligence (xAI) model-agnostic interpretation methods were then used to describe the internal behavior of the ML model.
Results: Using qPCR, we detected blaCTX-M, blaNDM, blaCMY, blaOXA, blatetX, blasul1, and blaaac(6'-Ib-cr) in the beach waters. We developed ML prediction models for blaaac(6'-Ib-cr), blasul1, and blatetX using the hydro-meteorological and qPCR-derived data, and the models demonstrated strong performance, with R² values of 0.957, 0.997, and 0.976, respectively.
Conclusions: Our findings show that environmental factors, such as water temperature, precipitation, and tide, are among the important predictors of the abundance of resistance genes at beaches.
Affiliation(s)
- Qandeel Zahra
  - Azra Naheed Medical College, Lahore 54000, Punjab, Pakistan
- Jawaria Gul
  - Al-Nafees Medical College & Hospital, Islamabad 44000, Pakistan
- Ali Raza Shah
  - Azra Naheed Medical College, Lahore 54000, Punjab, Pakistan
- Muhammad Yasir
  - Special Infectious Agents Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Asad Mustafa Karim
  - Department of Oriental Medicine and Biotechnology, College of Life Sciences, Kyung Hee University, Yongin-si 17104, South Korea
4. Iftikhar S, Karim AM, Karim AM, Karim MA, Aslam M, Rubab F, Malik SK, Kwon JE, Hussain I, Azhar EI, Kang SC, Yasir M. Prediction and interpretation of antibiotic-resistance genes occurrence at recreational beaches using machine learning models. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 328:116969. [PMID: 36495825 DOI: 10.1016/j.jenvman.2022.116969] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/10/2022] [Revised: 11/22/2022] [Accepted: 12/03/2022] [Indexed: 06/17/2023]
Abstract
Antibiotic-resistant bacteria and antibiotic resistance genes (ARGs) are pollutants of worldwide concern that seriously threaten public health and ecosystems. Machine learning (ML) prediction models have been applied to predict ARGs in beach waters. However, the existing studies were conducted at a single location and had low prediction performance. Moreover, ML models are "black boxes" that do not reveal the internal nuances and mechanisms behind their predictions. This lack of transparency and trust can result in serious consequences when these models are used in high-stakes decisions. In this study, we developed a gradient boosted regression tree (GBRT) based ML model and then described its behavior using six explainable artificial intelligence (XAI) model-agnostic explanation methods. We used hydro-meteorological and qPCR data from beaches in South Korea and Pakistan and developed ML prediction models for aac(6'-Ib-cr), sul1, and tetX with 10-fold time-blocked cross-validation performances of 4.9, 2.06 and 4.4 root mean squared logarithmic error, respectively. We then analyzed the local and global behavior of the developed ML model using four interpretation methods. The developed ML models showed that water temperature, precipitation and tide are the most important predictors of ARGs at recreational beaches. We show that the model-agnostic interpretation methods not only explain the behavior of the ML model but also provide insights into its behavior under new, unseen conditions. Moreover, these post-processing techniques can serve as a debugging tool for ML-based modeling.
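The evaluation described in this abstract combines a root mean squared logarithmic error with time-blocked cross-validation. The following is a minimal sketch of both ideas, assuming the common log(1 + x) definition of RMSLE and contiguous, time-ordered test blocks; the function names are hypothetical, and the abstract does not state which exact definitions or splitting scheme the authors used.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using the common log(1 + x) form."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def time_blocked_folds(n_samples, n_folds=10):
    """Yield (train, test) index lists; each test fold is one contiguous block.

    Samples are assumed to be sorted by time, so neighbouring observations
    stay together instead of being shuffled across folds.
    """
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop
```

Keeping each test fold contiguous is one way to limit leakage between temporally correlated samples; other blocking schemes (for example, leaving a gap around each block) are equally plausible readings of "time-blocked".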
Affiliation(s)
- Sara Iftikhar
  - Department of Electrical Engineering and Computer Sciences, National University of Sciences and Technology (NUST), Islamabad 64000, Pakistan
- Asad Mustafa Karim
  - Department of Biotechnology, College of Life Sciences, Kyung Hee University, Yongin-si 17104, Republic of Korea
- Aoun Murtaza Karim
  - Institute of Geology and Geophysics, University of Chinese Academy of Sciences, Beijing, China; Institute of Geology, University of the Punjab, Lahore 54590, Pakistan
- Muhammad Aslam
  - Department of Artificial Intelligence, Sejong University, Seoul, 05006, Republic of Korea
- Fazila Rubab
  - Department of Electrical and Computer Engineering, COMSATS University Islamabad, Wah Campus, Wah Cantt, 47040, Pakistan
- Sumera Kausar Malik
  - Department of Bioscience and Biotechnology, The University of Suwon, Hwaseong-si, Gyeonggi-do 18323, Republic of Korea
- Jeong Eun Kwon
  - Department of Biotechnology, College of Life Sciences, Kyung Hee University, Yongin-si 17104, Republic of Korea
- Imran Hussain
  - Environmental Biotechnology Lab, Department of Biotechnology Comsats University Islamabad, Abbottabad Campus, Pakistan
- Esam I Azhar
  - Special Infectious Agents Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Se Chan Kang
  - Department of Biotechnology, College of Life Sciences, Kyung Hee University, Yongin-si 17104, Republic of Korea
- Muhammad Yasir
  - Special Infectious Agents Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
5. Adedeji IC, Ahmadisharaf E, Sun Y. Predicting in-stream water quality constituents at the watershed scale using machine learning. JOURNAL OF CONTAMINANT HYDROLOGY 2022; 251:104078. [PMID: 36206579 DOI: 10.1016/j.jconhyd.2022.104078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 09/09/2022] [Accepted: 09/11/2022] [Indexed: 06/16/2023]
Abstract
Predicting in-stream water quality is necessary to support the decision-making process of protecting healthy waterbodies and restoring impaired ones. Data-driven modeling is an efficient technique that can be used to support such efforts. Our objective was to determine whether in-stream concentrations of contaminants, nutrients (total phosphorus [TP] and total nitrogen [TN]), total suspended solids (TSS), dissolved oxygen (DO), and fecal coliform bacteria (FC) can be predicted satisfactorily using machine learning (ML) algorithms based on publicly available datasets. To achieve this objective, we evaluated four modeling scenarios that differed in the required inputs: publicly available datasets (e.g., land use/land cover), antecedent conditions, and additional in-stream water quality observations (e.g., pH and turbidity). We implemented five ML algorithms (Support Vector Machines, Random Forest [RF], eXtreme Gradient Boost [XGB], ensemble RF-XGB, and Artificial Neural Network [ANN]) and demonstrated our modeling framework in an inland stream, Bullfrog Creek, located near Tampa, Florida. The results showed that, while including additional water quality drivers improved overall model performance for all target constituents, TP, TN, DO, and TSS could still be predicted satisfactorily using only publicly available datasets (Nash-Sutcliffe efficiency [NSE] > 0.75 and percent bias [PBIAS] < 10%), whereas FC could not (NSE < 0.49 and PBIAS > 25%). Additionally, antecedent conditions slightly improved predictions and reduced the predictive uncertainty, particularly when paired with other water quality observations (6.9% increase in NSE for FC, and 2.7% for TP, TN, DO, and TSS). Also, comparable model performances for all water quality constituents in wet and dry seasons suggest minimal season-dependence of the predictions (< 4% difference in NSE and < 10% difference in PBIAS). Our developed modeling framework is generic and can serve as a complementary tool for monitoring and predicting in-stream water quality constituents.
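The two goodness-of-fit measures quoted in this abstract, Nash-Sutcliffe efficiency (NSE) and percent bias (PBIAS), can be computed as in the sketch below. It assumes the widely used definitions NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))² and PBIAS = 100 · Σ(obs − sim) / Σ(obs); the sign convention for PBIAS varies between references and is an assumption here, as the abstract does not specify it.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def pbias(observed, simulated):
    """Percent bias; with this sign convention, positive means underprediction."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


def is_satisfactory(observed, simulated, nse_min=0.75, pbias_max=10.0):
    """Apply the thresholds quoted in the abstract (NSE > 0.75, |PBIAS| < 10%)."""
    return nse(observed, simulated) > nse_min and abs(pbias(observed, simulated)) < pbias_max
```

A model that always predicts the observed mean scores NSE = 0, which is why values well above zero are needed before a prediction is called satisfactory.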
Affiliation(s)
- Itunu C Adedeji
  - Department of Civil and Environmental Engineering, Resilient Infrastructure and Disaster Response Center, Florida A&M University-Florida State University College of Engineering, 2525 Pottsdamer St., Tallahassee, FL 32310, USA
- Ebrahim Ahmadisharaf
  - Department of Civil and Environmental Engineering, Resilient Infrastructure and Disaster Response Center, Florida A&M University-Florida State University College of Engineering, 2525 Pottsdamer St., Tallahassee, FL 32310, USA
- Yanshuo Sun
  - Department of Industrial and Manufacturing Engineering, Resilient Infrastructure and Disaster Response Center, Florida A&M University-Florida State University College of Engineering, 2525 Pottsdamer St., Tallahassee, FL 32310, USA
6. An application of machine learning regression to feature selection: a study of logistics performance and economic attribute. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07266-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
This study demonstrates how to profit from up-to-date dynamic economic big data, which contributes to selecting economic attributes that indicate logistics performance as reflected by the Logistics Performance Index (LPI). The analytical technique employs a high degree of productivity in machine learning (ML) for prediction or regression using adequate economic features. The goal of this research is to determine the ideal collection of economic attributes that best characterize a particular anticipated variable for predicting a country's logistics performance. In addition, several potential ML regression algorithms may be used to optimize prediction accuracy. The filter techniques of correlation and principal component analysis (PCA), as well as the embedded techniques of LASSO and Elastic-net regression, are used for feature selection. Then, based on the selected features, the ML regression approaches artificial neural network (ANN), multi-layer perceptron (MLP), support vector regression (SVR), random forest regression (RFR), and Ridge regression are used to train and validate the data set. The findings demonstrate that the PCA and Elastic-net feature sets give the closest to adequate performance based on the error measurement criteria. A feature union and intersection procedure over the acceptable feature sets is used to make a more precise decision. Finally, the union of feature sets yields the best results. The findings suggest that ML algorithms are capable of assisting in the selection of a proper set of economic factors that indicate a country's logistics performance. Furthermore, the ANN was shown to be the most effective prediction model in this investigation.
7. Buyrukoğlu S, Yılmaz Y, Topalcengiz Z. Correlation value determined to increase Salmonella prediction success of deep neural network for agricultural waters. ENVIRONMENTAL MONITORING AND ASSESSMENT 2022; 194:373. [PMID: 35435507 DOI: 10.1007/s10661-022-10050-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/27/2021] [Accepted: 04/09/2022] [Indexed: 06/14/2023]
Abstract
The use of computer-based tools has become popular in the field of produce safety. Various algorithms have been applied to predict the population and presence of indicator microorganisms and pathogens in agricultural water sources. The purpose of this study is to improve the Salmonella prediction success of a deep feed-forward neural network (DFNN) in agricultural surface waters with a determined correlation value based on selected features. Datasets were collected from six agricultural ponds in Central Florida. The most successful physicochemical and environmental features were selected by the gain ratio for the prediction of the generic Escherichia coli population with machine learning algorithms (decision tree, random forest, support vector machine). The Salmonella prediction success of the DFNN was evaluated with a dataset including the selected environmental and physicochemical features combined with predicted E. coli populations, with and without the correlation value. The performance of the correlation value was evaluated with all possible mathematical dataset combinations (nCr) of the six ponds. Across all dataset combinations, DFNN accuracy (%) was higher with the correlation value (88.89 to 98.41) than without it (83.68 to 96.99). The findings emphasize the success of the determined correlation value for the prediction of Salmonella presence in agricultural surface waters.
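The "all possible mathematical dataset combinations (nCr) of six ponds" can be enumerated as in the following sketch. The pond labels are hypothetical placeholders, and the choice to include every subset size from 1 to 6 is an assumption; the abstract does not say which values of r were actually evaluated.

```python
from itertools import combinations

# Hypothetical pond labels; the abstract only states that there were six ponds.
PONDS = ["pond_1", "pond_2", "pond_3", "pond_4", "pond_5", "pond_6"]


def all_pond_combinations(ponds, min_size=1):
    """Return every combination of ponds with at least `min_size` members."""
    return [
        combo
        for r in range(min_size, len(ponds) + 1)
        for combo in combinations(ponds, r)
    ]


if __name__ == "__main__":
    combos = all_pond_combinations(PONDS)
    print(len(combos))  # number of non-empty subsets of six items
```

With six ponds there are 2^6 − 1 = 63 non-empty subsets, so each evaluation setting in such a design would be repeated at most 63 times.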
Affiliation(s)
- Selim Buyrukoğlu
  - Department of Computer Engineering, Faculty of Engineering, Çankırı Karatekin University, 18100, Çankırı, Turkey
- Yıldıran Yılmaz
  - Computer Engineering Department, Faculty of Engineering and Architecture, Recep Tayyip Erdogan University, 53020, Rize, Turkey
- Zeynal Topalcengiz
  - Department of Food Engineering, Faculty of Engineering and Architecture, Muş Alparslan University, 49250, Muş, Turkey
8. Rubeck LM, Wells JE, Hanford KJ, Durso LM, Schacht WH, Berry ED. Management-intensive grazing impacts on total Escherichia coli, E. coli O157:H7, and antibiotic resistance genes in a riparian stream. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 817:152611. [PMID: 34995584 DOI: 10.1016/j.scitotenv.2021.152611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 12/17/2021] [Accepted: 12/18/2021] [Indexed: 06/14/2023]
Abstract
The impacts of management-intensive grazing (MIG) of cattle on concentrations of total Escherichia coli, total suspended solids (TSS), and nitrate-nitrite nitrogen (NO3 + NO2-N), and on the occurrence of E. coli O157:H7 and selected antibiotic resistance genes (ARGs) in stream water and/or sediments were evaluated. Cattle were grazed for two-week periods in May in each of three years. Overall, grazing increased total E. coli in downstream water by 0.89 log10 MPN/100 mL (p < 0.0001), and downstream total E. coli concentrations were higher than upstream over all sampling intervals. Downstream TSS levels also increased (p ≤ 0.0294) during grazing. In contrast, as a main effect of treatment, downstream NO3 + NO2-N was lower than upstream (3.59 versus 3.70 mg/L; p = 0.0323). Overwintering mallard ducks increased total E. coli and TSS concentrations in January and February (p < 0.05). For precipitation events during the 24 h before sampling, each increase of 1.00 cm of rainfall increased total E. coli by 0.49 log10 MPN/100 mL (p = 0.0005). In contrast, there was no association between previous 24 h precipitation volume and TSS (p = 0.1540), and there was a negative linear effect on NO3 + NO2-N (p = 0.0002). E. coli O157:H7 prevalence was low, but the pathogen was detected downstream up to 2½ months after grazing. Examination of the ARGs sul1, ermB, blaCTX-M-32, and intI1 identified the need for additional research to understand the impact of grazing on the ecology of these resistance determinants in pasture-based cattle production. While E. coli remained higher in downstream water compared to upstream, MIG may reduce the magnitude of the downstream E. coli concentrations. Likewise, the MIG strategy may prevent large increases in TSS and NO3 + NO2-N concentrations during heavy rain events. Results indicate that MIG can limit the negative effects of cattle grazing on stream water quality.
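Because the E. coli effects in this abstract are reported on a log10 scale, it can help to convert them to multiplicative changes in concentration. The sketch below does that arithmetic; the constant and function names are illustrative, and extending the per-centimetre rainfall effect linearly beyond 1 cm is an assumption rather than a result stated by the authors.

```python
# Effect sizes quoted in the abstract, in log10 MPN/100 mL.
GRAZING_EFFECT_LOG10 = 0.89
RAIN_EFFECT_LOG10_PER_CM = 0.49


def fold_change(log10_increase: float) -> float:
    """Convert an additive log10 increase into a multiplicative factor."""
    return 10.0 ** log10_increase


def rainfall_fold_change(rain_cm: float) -> float:
    """Factor for `rain_cm` of rain in the previous 24 h, assuming linearity in log10."""
    return fold_change(RAIN_EFFECT_LOG10_PER_CM * rain_cm)


if __name__ == "__main__":
    print(fold_change(GRAZING_EFFECT_LOG10))  # roughly a 7.8-fold increase
    print(rainfall_fold_change(1.0))          # roughly a 3.1-fold increase per cm
```

Read this way, the grazing effect corresponds to a nearly eightfold rise in downstream total E. coli, and each additional centimetre of rain to roughly a threefold rise.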
Affiliation(s)
- Laura M Rubeck
  - University of Nebraska-Lincoln, U.S. Meat Animal Research Center, 844 Road 313, Clay Center, NE 68933, USA
- James E Wells
  - USDA, Agricultural Research Service, U.S. Meat Animal Research Center, 844 Road 313, Clay Center, NE 68933, USA
- Kathryn J Hanford
  - University of Nebraska-Lincoln, Department of Statistics, 343A Hardin Hall, Lincoln, NE 68583, USA
- Lisa M Durso
  - USDA, Agricultural Research Service, Agroecosystem Management Research Unit, 251 Filley Hall, University of Nebraska-Lincoln East Campus, Lincoln, NE 68583, USA
- Walter H Schacht
  - University of Nebraska-Lincoln, Department of Agronomy and Horticulture, 202 Keim Hall, Lincoln, NE 68583, USA
- Elaine D Berry
  - USDA, Agricultural Research Service, U.S. Meat Animal Research Center, 844 Road 313, Clay Center, NE 68933, USA
9. Kim T, Lee D, Shin J, Kim Y, Cha Y. Learning hierarchical Bayesian networks to assess the interaction effects of controlling factors on spatiotemporal patterns of fecal pollution in streams. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 812:152520. [PMID: 34953848 DOI: 10.1016/j.scitotenv.2021.152520] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/30/2021] [Revised: 11/28/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
The dynamics of fecal indicator bacteria, such as fecal coliforms (FC) in streams, are influenced by the interactions of a myriad of factors. To predict complex spatiotemporal patterns of FC in streams and assess the relative importance of numerous controlling factors, the adoption of a hierarchical Bayesian network (HBN) was proposed in this study. By introducing latent variables correlated to the observed variables into a Bayesian network, the HBN can represent causal relationships among a large set of variables with a multilevel hierarchy. The study area encompasses 215 sites across the watersheds of the four major rivers in South Korea. The monitoring data collected during the 2012-2019 period included 32 input variables pertaining to meteorology, geography, soil characteristics, land cover, urbanization index, livestock density, and point sources. As model endpoints, the exceedance probability of the FC standard concentration as well as two pollution characteristics (i.e., pollution degree and type), derived from FC load duration curves were used. The probability of exceeding an FC threshold value (200 CFU/100 mL) showed spatiotemporal variations, whereas pollution degree and type showed spatial variations that represent long-term severity and relative dominance of nonpoint and point source fecal pollution, respectively. The conceptual model was validated using structural equation modeling to develop the HBN. The results demonstrate that the HBN effectively simplified the model structure, while showing strong model performance (AUC = 0.81, accuracy = 0.74). The results of the sensitivity analysis indicate that land cover is the most important factor in predicting the probability of exceedance and pollution degree, whereas the urbanization index explains most of the variability in pollution type. Furthermore, the results of the scenario analysis suggest that the HBN provides an interpretable framework in which the interaction of controlling factors has causal relationships at different levels that can be identified and visualized.
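One of the model endpoints in this abstract is the probability of exceeding the FC threshold of 200 CFU/100 mL. As a minimal sketch of what that quantity means, the empirical exceedance probability at a site can be estimated as the fraction of samples above the threshold. This is a plain frequency estimate with hypothetical function and variable names, not the hierarchical Bayesian network described by the authors.

```python
FC_THRESHOLD_CFU_PER_100ML = 200.0  # threshold value quoted in the abstract


def exceedance_probability(concentrations, threshold=FC_THRESHOLD_CFU_PER_100ML):
    """Fraction of samples whose FC concentration is strictly above the threshold."""
    values = list(concentrations)
    if not values:
        raise ValueError("at least one sample is required")
    return sum(1 for c in values if c > threshold) / len(values)


if __name__ == "__main__":
    # Hypothetical FC measurements (CFU/100 mL) for a single monitoring site.
    site_samples = [100.0, 250.0, 300.0, 50.0]
    print(exceedance_probability(site_samples))
```

A network such as the one in the study would predict this probability from site and weather variables; the empirical fraction above is the kind of observed target such predictions can be compared against.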
Affiliation(s)
- TaeHo Kim
  - School of Environment Engineering, University of Seoul, 163, Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- DoYeon Lee
  - School of Environment Engineering, University of Seoul, 163, Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Jihoon Shin
  - School of Environment Engineering, University of Seoul, 163, Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- YoungWoo Kim
  - School of Environment Engineering, University of Seoul, 163, Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- YoonKyung Cha
  - School of Environment Engineering, University of Seoul, 163, Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
10. Stocker MD, Pachepsky YA, Hill RL. Prediction of E. coli Concentrations in Agricultural Pond Waters: Application and Comparison of Machine Learning Algorithms. Front Artif Intell 2022; 4:768650. [PMID: 35088045 PMCID: PMC8787305 DOI: 10.3389/frai.2021.768650] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2021] [Accepted: 12/13/2021] [Indexed: 11/13/2022] Open
Abstract
The microbial quality of irrigation water is an important issue as the use of contaminated waters has been linked to several foodborne outbreaks. To expedite microbial water quality determinations, many researchers estimate concentrations of the microbial contamination indicator Escherichia coli (E. coli) from the concentrations of physicochemical water quality parameters. However, these relationships are often non-linear and exhibit changes above or below certain threshold values. Machine learning (ML) algorithms have been shown to make accurate predictions in datasets with complex relationships. The purpose of this work was to evaluate several ML models for the prediction of E. coli in agricultural pond waters. Two ponds in Maryland were monitored from 2016 to 2018 during the irrigation season. E. coli concentrations along with 12 other water quality parameters were measured in water samples. The resulting datasets were used to predict E. coli using stochastic gradient boosting (SGB) machines, random forest (RF), support vector machines (SVM), and k-nearest neighbor (kNN) algorithms. The RF model provided the lowest RMSE value for predicted E. coli concentrations in both ponds in individual years and over consecutive years in almost all cases. For individual years, the RMSE of the predicted E. coli concentrations (log10 CFU per 100 mL) ranged from 0.244 to 0.346 and 0.304 to 0.418 for Pond 1 and 2, respectively. For the 3-year datasets, these values were 0.334 and 0.381 for Pond 1 and 2, respectively. In most cases there was no significant difference (P > 0.05) between the RMSE of RF and other ML models when these RMSE were treated as statistics derived from 10-fold cross-validation performed with five repeats. Important E. coli predictors were turbidity, dissolved organic matter content, specific conductance, chlorophyll concentration, and temperature. Model predictive performance did not significantly differ when 5 predictors were used vs. 8 or 12, indicating that more tedious and costly measurements provide no substantial improvement in the predictive accuracy of the evaluated algorithms.
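The comparison in this abstract rests on RMSE values collected from 10-fold cross-validation repeated five times, which yields 50 held-out folds per model. The following is a minimal sketch of how such splits and the RMSE can be generated with the standard library only; the function names and the seeding scheme are hypothetical, and the authors' actual tooling is not stated in the abstract.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error (here intended for log10-transformed concentrations)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation.

    Each repeat reshuffles the sample order, then deals the samples into
    `n_folds` disjoint test folds, so every sample is held out once per repeat.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test = order[fold::n_folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test
```

Collecting one RMSE per (train, test) pair gives 50 values per algorithm, which is what allows the RMSE of different models to be compared as a sample statistic rather than a single number.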
Affiliation(s)
- Matthew D. Stocker
  - Environmental Microbial and Food Safety Laboratory, United States Department of Agriculture–Agricultural Research Service, Beltsville, MD, United States
  - Oak Ridge Institute for Science and Education, Oak Ridge, TN, United States
  - Department of Environmental Science and Technology, University of Maryland, College Park, MD, United States
- Yakov A. Pachepsky
  - Environmental Microbial and Food Safety Laboratory, United States Department of Agriculture–Agricultural Research Service, Beltsville, MD, United States
- Robert L. Hill
  - Department of Environmental Science and Technology, University of Maryland, College Park, MD, United States
11. Tousi EG, Duan JG, Gundy PM, Bright KR, Gerba CP. Evaluation of E. coli in sediment for assessing irrigation water quality using machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021; 799:149286. [PMID: 34388882 DOI: 10.1016/j.scitotenv.2021.149286] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/12/2021] [Revised: 07/03/2021] [Accepted: 07/22/2021] [Indexed: 06/13/2023]
Abstract
Fresh produce irrigated with contaminated water poses a substantial risk to human health. This study evaluated whether incorporating sediment information improves the performance of machine learning models that quantify E. coli levels in irrigation water. Field samples were collected from irrigation canals in the Southwest U.S., for which meteorological, chemical, and physical water quality variables were obtained, as well as three additional flow and sediment properties: the concentration of E. coli in sediment, sediment median size, and bed shear stress. Water quality was classified based on E. coli concentration exceeding two standard levels: 1 and 126 E. coli colony forming units (CFU) per 100 ml of irrigation water. Two feature sets, one including (FIS) and one excluding (FES) sediment features, were selected using multi-variant filter feature selection. The correlation analysis revealed that including sediment features improves the correlation with the target standards for E. coli compared with excluding these features. Support vector machine, logistic regression, and ridge classifier models were tested in this study. The support vector machine model performed best for both targeted standards. In addition, incorporating sediment features improved the performance of all models. Therefore, the concentration of E. coli in sediment and bed shear stress are major factors influencing E. coli concentration in irrigation water.
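The two-threshold classification targets described in this abstract can be illustrated with a short sketch. The thresholds (1 and 126 CFU per 100 ml) come from the abstract; the function name `exceedance_labels`, the strict "greater than" comparison, and the sample concentrations are assumptions made for illustration and are not taken from the study.

```python
# Thresholds quoted in the abstract, in CFU per 100 ml of irrigation water.
THRESHOLDS_CFU_PER_100ML = (1.0, 126.0)


def exceedance_labels(concentration, thresholds=THRESHOLDS_CFU_PER_100ML):
    """Return one binary label per threshold: 1 if the concentration exceeds it.

    Assumption: "exceeding" is read as strictly greater than the threshold.
    """
    return tuple(int(concentration > t) for t in thresholds)


# Hypothetical sample concentrations, for illustration only.
samples = [0.0, 1.0, 35.0, 126.0, 410.0]
for c in samples:
    print(c, exceedance_labels(c))
```

Each threshold yields its own binary target, so a classifier such as a support vector machine would be trained once per standard rather than on a single three-class label.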
Affiliation(s)
- Erfan Ghasemi Tousi
  - Department of Civil & Architectural Engineering and Mechanics, The University of Arizona, 1209 E. 2nd St., Tucson, AZ, USA
- Jennifer G Duan
  - Department of Civil & Architectural Engineering and Mechanics, The University of Arizona, 1209 E. 2nd St., Tucson, AZ, USA
- Patricia M Gundy
  - Department of Environmental Science, The University of Arizona, Water & Energy Sustainable Technology (WEST) Center, 2959 W. Calle Agua Nueva, Tucson, AZ 85745, USA
- Kelly R Bright
  - Department of Environmental Science, The University of Arizona, Water & Energy Sustainable Technology (WEST) Center, 2959 W. Calle Agua Nueva, Tucson, AZ 85745, USA
- Charles P Gerba
  - Department of Environmental Science, The University of Arizona, Water & Energy Sustainable Technology (WEST) Center, 2959 W. Calle Agua Nueva, Tucson, AZ 85745, USA
|
12
|
Jang J, Abbas A, Kim M, Shin J, Kim YM, Cho KH. Prediction of antibiotic-resistance genes occurrence at a recreational beach with deep learning models. WATER RESEARCH 2021; 196:117001. [PMID: 33744657 DOI: 10.1016/j.watres.2021.117001] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/06/2020] [Revised: 02/27/2021] [Accepted: 03/01/2021] [Indexed: 06/12/2023]
Abstract
Antibiotic resistance genes (ARGs) have been reported to threaten the public health of beachgoers worldwide. Although ARG monitoring and beach guidelines are necessary, substantial efforts are required for ARG sampling and analysis. Accordingly, in this study, we predicted the occurrence of ARGs that are primarily found on the coast after rainfall using a conventional long short-term memory (LSTM) model, an LSTM-convolutional neural network (CNN) hybrid model, and an input attention (IA)-LSTM model. To develop the models, 10 categories of environmental data collected at 30-min intervals and concentration data for 4 major ARGs (i.e., aac(6')-Ib-cr, blaTEM, sul1, and tetX) obtained at Gwangalli Beach in South Korea between 2018 and 2019 were used. When predicting the occurrence of each ARG individually, the conventional LSTM and IA-LSTM exhibited poor R² values during training and testing. In contrast, the LSTM-CNN exhibited a 2- to 6-fold improvement in accuracy over the conventional LSTM and IA-LSTM. However, when predicting the occurrence of all ARGs simultaneously, the IA-LSTM model performed better overall than the LSTM-CNN. Additionally, the influence of environmental variables on prediction was investigated using the IA-LSTM model, and the ranges of input variables that affect each ARG were identified. Consequently, this study demonstrated the possibility of predicting the occurrence and distribution of major ARGs at the beach based on various environmental variables, and the results are expected to contribute to the management of ARG occurrence at a recreational beach.
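The abstract compares models by their R² values. As a reminder of what that metric measures, the sketch below computes the usual coefficient of determination, 1 − SS_res/SS_tot. It assumes this standard definition (the abstract does not state which variant the authors used), and the observed and predicted values are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, [1.0, 2.0, 3.0, 4.0]))  # perfect prediction
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean
print(r_squared(observed, [4.0, 3.0, 2.0, 1.0]))  # worse than predicting the mean
```

Under this definition a model that does no better than predicting the mean scores 0, and a model that does worse scores below 0, which is one way a "poor" R² can arise.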
Affiliation(s)
- Jiyi Jang
  - Department of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology (UNIST), 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 44919, South Korea
- Ather Abbas
  - Department of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology (UNIST), 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 44919, South Korea
- Minjeong Kim
  - Division of Radioactive Waste Disposal Research, Korea Atomic Energy Research Institute (KAERI), 989-111, Daedeok-daero, Yuseong-gu, Daejeon, 34057, South Korea
- Jingyeong Shin
  - Department of Civil and Environmental Engineering, Hanyang University, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, South Korea
- Young Mo Kim
  - Department of Civil and Environmental Engineering, Hanyang University, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, South Korea
- Kyung Hwa Cho
  - Department of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology (UNIST), 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 44919, South Korea
|
13
|
Modeling and Prioritizing Interventions Using Pollution Hotspots for Reducing Nutrients, Atrazine and E. coli Concentrations in a Watershed. SUSTAINABILITY 2020. [DOI: 10.3390/su13010103] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
Excess nutrients and herbicides remain two major causes of waterbody impairment globally. To better understand pollutant sources in the Big Sandy Creek Watershed (BSCW) and the prospects for successful remediation, a program was initiated to assist agricultural producers with the implementation of best management practices (BMPs). The objectives were (1) to simulate BMPs within hotspots to determine reductions in pollutant loads and (2) to determine whether water-quality standards are met at the watershed outlet. The regression-based load estimator (LOADEST) was used for determining sediment, nutrient and atrazine loads, while artificial neural networks (ANN) were used for determining E. coli concentrations. Among individual BMPs, grassed waterways implemented across all hotspots reduced sediment, total nitrogen and total phosphorus loads by an average of 97%, 53% and 65%, respectively. Although reducing the atrazine application rate by 50% in all hotspots was the most effective BMP for reducing atrazine concentrations (21%) at gauging station 06883940, the resulting concentration was still six times higher than the target concentration. Similarly, with grassed waterways established in all hotspots, the 64% reduction in E. coli concentration was not enough to meet the target at the gauging station. With scaled-down acreage based on the proposed implementation plan, filter strips led to greater pollutant reductions at the targeted hotspots. Overall, a combination of filter strips, grassed waterways and atrazine rate reduction will most likely yield measurable improvement both in the hotspots (>20% reduction in sediment, total nitrogen and total phosphorus pollution) and at the gauging station. Despite the model’s uncertainties, the results showed the possibility of using the Soil and Water Assessment Tool (SWAT) to assess the effectiveness of various BMPs in agricultural watersheds.
|