1
Liu T, Zhang H, Wu J, Liu W, Fang Y. Wastewater treatment process enhancement based on multi-objective optimization and interpretable machine learning. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 364:121430. [PMID: 38875983 DOI: 10.1016/j.jenvman.2024.121430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 04/22/2024] [Accepted: 06/07/2024] [Indexed: 06/16/2024]
Abstract
Optimization and control of the wastewater treatment process (WTP) can contribute to cost reduction and efficiency. A wastewater treatment process multi-objective optimization (WTPMO) framework is proposed in this paper to provide suggestions for decision-making in setting parameters of the WTP. First, prediction models based on Extreme Gradient Boosting (XGB) with Bayesian optimization (BO) are developed for predicting effluent water quality (EQ) and energy consumption (EC) for different influent quality and process parameter settings. Then, the SHapley Additive exPlanations (SHAP) algorithm is used to complement the interpretability of machine learning by quantitatively evaluating the impact of different features on the predicted targets. Finally, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) with the Technique for Ordering Preferences on Similarity of Ideal Solutions (TOPSIS) is introduced to solve and make decisions on the multi-objective optimization problem. The applicability of WTPMO is validated on Benchmark Simulation Model 1 (BSM1). The results show that BOXGB achieves accurate prediction of EQ and EC, with R² values of 0.923 and 0.965, respectively, indicating that BO can effectively select the model hyperparameters in XGB. SHAP supplements the interpretability of the model and explains how the influent water quality and decision variables affect the EQ and EC of the WTP. In addition, the optimized process parameters are determined based on NSGA-II and TOPSIS, and the EC optimization rate is 1.552% while guaranteeing water quality compliance. Overall, this research can effectively achieve the optimization of the WTP, ensure that the effluent water quality meets the standards while reducing energy consumption, assist wastewater treatment plants (WWTPs) in achieving more intelligent and efficient operation and maintenance management, and provide strong support for environmental protection and sustainable development goals.
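The TOPSIS decision step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `topsis`, the equal weights, and the candidate values are all hypothetical, and vector normalisation is assumed as the scaling rule. It scores each Pareto candidate by its relative closeness to an ideal point.

```python
import math

def topsis(matrix, weights, benefit):
    """Return closeness scores in [0, 1]; a higher score is closer to the ideal.

    matrix  : list of alternatives, each a sequence of criterion values
    weights : one weight per criterion (assumed to be chosen by the analyst)
    benefit : True if the criterion is maximised, False if minimised
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points depend on the direction of each criterion.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical Pareto candidates: (effluent quality index, energy consumption).
# Both criteria are treated as "lower is better" in this illustration.
candidates = [(6100.0, 3700.0), (6000.0, 3800.0), (6300.0, 3600.0)]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[False, False])
best = max(range(len(candidates)), key=scores.__getitem__)
print(best, scores)
```

In a workflow like the one described, the candidates would come from the final non-dominated front of the genetic algorithm, and the weights would encode how the operator trades effluent quality against energy use.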
Affiliation(s)
- Tianxiang Liu
  - National Center of Technology Innovation for Digital Construction, Huazhong University of Science & Technology, Wuhan, Hubei, 430074, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Heng Zhang
  - National Center of Technology Innovation for Digital Construction, Huazhong University of Science & Technology, Wuhan, Hubei, 430074, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Junhao Wu
  - National Center of Technology Innovation for Digital Construction, Huazhong University of Science & Technology, Wuhan, Hubei, 430074, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Wenli Liu
  - National Center of Technology Innovation for Digital Construction, Huazhong University of Science & Technology, Wuhan, Hubei, 430074, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.
- Yihai Fang
  - Department of Civil Engineering, Monash University, Clayton, 3800, Victoria, Australia
2
Zhang S, Zhao J, Zhu L. Predicting removal efficiency of organic pollutants by soil vapor extraction based on an optimized machine learning method. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 927:172438. [PMID: 38614354 DOI: 10.1016/j.scitotenv.2024.172438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 04/07/2024] [Accepted: 04/10/2024] [Indexed: 04/15/2024]
Abstract
Soil vapor extraction (SVE) is a cost-effective technology for remediating soils contaminated with volatile and semi-volatile organic compounds. Many factors, including SVE parameters, soil properties, and contaminant characteristics, significantly influence the remediation efficiency of SVE. The optimal conditions for organic pollutant removal efficiency are site-specific and vary among studies. Therefore, a generalized model was needed to predict the remediation efficiency of SVE in organic contaminated soils. This study employed machine learning to predict the removal efficiency of organic pollutants by SVE. The model was developed on a training set, and its predictive capabilities were evaluated on a test set. An XGBoost (XGB) model was derived from literature data (R² = 0.9728). Time, pollutant type, and temperature were identified as the three most important features affecting SVE remediation efficiency. The accuracy (R² = 0.9799) and universality of the model were enhanced through an optimization scheme. The developed XGB model demonstrated the ability to predict the removal efficiency of organic pollutants by considering all collected influential factors. The mechanism of multi-factor interaction on remediation efficiency was clarified. Overall, this study would contribute to evaluating the remediation potential of SVE for specific organic contaminated soils, aiding in maximizing the removal efficiency of organic pollutants under optimal conditions.
Affiliation(s)
- Shuai Zhang
  - College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; Zhejiang Provincial Key Laboratory of Organic Pollution Process and Control, Hangzhou, Zhejiang 310058, China
- Jiating Zhao
  - College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; Zhejiang Provincial Key Laboratory of Organic Pollution Process and Control, Hangzhou, Zhejiang 310058, China
- Lizhong Zhu
  - College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; Zhejiang Provincial Key Laboratory of Organic Pollution Process and Control, Hangzhou, Zhejiang 310058, China.
3
Oliullah K, Rasel MH, Islam MM, Islam MR, Wadud MAH, Whaiduzzaman M. A stacked ensemble machine learning approach for the prediction of diabetes. J Diabetes Metab Disord 2024; 23:603-617. [PMID: 38932863 PMCID: PMC11196524 DOI: 10.1007/s40200-023-01321-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 09/22/2023] [Indexed: 06/28/2024]
Abstract
Objectives: Diabetes has become a leading cause of mortality in both developed and developing countries, impacting a growing number of individuals worldwide. As the prevalence of the disease continues to rise, researchers have worked towards developing accurate diabetes prediction models. The primary aim of this study is to use a diverse set of machine learning algorithms to detect the presence of diabetes, particularly in females, at an early stage. This research seeks to provide physicians with tools to identify the disease early, enabling timely interventions and improving patient outcomes. Methods: In this study, several state-of-the-art machine learning techniques, such as random forest classifiers with GridSearchCV, XGBoost, NGBoost, Bagging, LightGBM, and AdaBoost classifiers, were employed. These models were chosen as the base layer of the proposed stacked ensemble model because of their high accuracy. Before the data were fed into the models, the dataset was preprocessed to improve performance. Results: The accuracy achieved in this study was 92.91%, which demonstrates its competitiveness with the existing approaches. Moreover, the use of Shapley additive explanations (SHAP) facilitated the interpretation of the machine learning models. Conclusion: We anticipate that these findings will be beneficial to healthcare providers, stakeholders, students, and researchers involved in diabetes prediction research and development.
Affiliation(s)
- Khondokar Oliullah
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Mahedi Hasan Rasel
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Md. Manzurul Islam
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Md. Reazul Islam
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Md. Anwar Hussen Wadud
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Md. Whaiduzzaman
  - School of Information Systems, Queensland University of Technology, Brisbane, Australia
4
Tseng KY, Hsieh YT, Lin HC. Machine learning prediction on wetland succession and the impact of artificial structures from a decade of field data. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 937:173426. [PMID: 38796015 DOI: 10.1016/j.scitotenv.2024.173426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 05/19/2024] [Accepted: 05/19/2024] [Indexed: 05/28/2024]
Abstract
Artificial structures can influence wetland topology and sediment properties, thereby shaping plant distribution and composition. Macrobenthos composition is correlated with plant cover. Previous studies on the impact of artificial structures on plant distribution have rarely incorporated time-series data or extended field surveys. In this study, a machine-learning-based species distribution model with decade-long observation was analyzed to investigate the correlation between the shift in the distribution of B. planiculmis, artificial structure-induced elevation changes and the expansion of other plants, as well as their connection to soil properties and crab composition dynamics under plants in Gaomei Wetland. A long short-term memory (LSTM) model with Shapley additive explanations (SHAP) was employed for predicting the distribution of B. planiculmis and explaining feature importance. The results indicated that wetland topology was influenced by both artificial structures and plants. Areas initially colonized by B. planiculmis were replaced by other species. Soil properties showed significant differences among plant patches; however, principal component analysis (PCA) of sediment properties and niche similarity analysis showed that the niches of the plants overlapped. Crab composition differed under different plants. The presence probability of B. planiculmis near woody paths decreased according to LSTM and field survey data. SHAP analysis suggested that the distribution of other plants, the historical distribution of B. planiculmis, and sediment properties significantly contributed to the presence probability of B. planiculmis. A sharp decrease in SHAP values with increasing NDVI at suitable elevations, overlap in PCA of sediment properties, and niche similarity indicated potential competition among plants. This decade-long time-series field survey revealed the joint effects of artificial structures and vegetation on the dynamics of topology and soil properties. These changes influenced the plant distribution through potential plant competition. LSTM with SHAP provided valuable insights into the mechanisms underlying the effects of artificial structures on the plant zonation process.
Affiliation(s)
- Kuang-Yu Tseng
  - Department of Life Science, Tunghai University, Taichung 407, Taiwan
- Yun-Ting Hsieh
  - Department of Life Science, Tunghai University, Taichung 407, Taiwan
- Hui-Chen Lin
  - Department of Life Science, Tunghai University, Taichung 407, Taiwan; Center for Ecology and Environment, Tunghai University, Taiwan.
5
Yang Z, Sun Y, Gao S, Yu Q, Zhao Y, Huo Y, Wan Z, Huang S, Wang Y, Gu X. General Model for Predicting Response of Gas-Sensitive Materials to Target Gas Based on Machine Learning. ACS Sens 2024; 9:2509-2519. [PMID: 38642064 DOI: 10.1021/acssensors.4c00186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2024]
Abstract
Gas sensors play a crucial role in various industries and applications, and demand for them has increased in recent years. However, the current method for screening gas-sensitive materials is costly in time, energy, and money, so the screening efficiency needs to be improved. In this study, we proposed a collaborative screening strategy that integrates density functional theory and machine learning. Taking zinc oxide (ZnO) as an example, the responsiveness of ZnO to the target gas was determined quickly on the basis of the changes in the electronic state and structure before and after gas adsorption. In this work, the adsorption energy and the electronic and structural characteristics of ZnO after adsorbing 24 kinds of gases were calculated. These computed features served as the basis for training a machine learning model. Subsequently, various machine learning and evaluation algorithms were used to train the fast screening model. The importance of feature values was evaluated by the AdaBoost, Random Forest, and Extra Trees models. Specifically, charge transfer was assigned importance values of 0.160, 0.127, and 0.122, respectively, ranking as the highest among the 11 features. Following closely was the d-band center, which was presumed to exert influence on electrical conductivity and, consequently, adsorption properties. Using the Extra Trees model with 5-fold cross-validation, the 24-sample data set achieved an accuracy of 88%. The 72-sample data set achieved an accuracy of 78% using a multilayer perceptron with 5-fold cross-validation, with both data sets exhibiting low standard deviations. This verified the accuracy and reliability of the strategy, showcasing its potential for rapidly screening a material's responsiveness to the target gas.
Affiliation(s)
- Zijiang Yang
  - School of Materials Science and Physics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
- Yujiao Sun
  - School of Materials Science and Physics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
- Shasha Gao
  - School of Materials Science and Physics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
- Qiuchen Yu
  - School of Materials Science and Physics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
- Yizhe Zhao
  - National Narcotics Laboratory Beijing Regional Center, Beijing 100164, China
- Yumeng Huo
  - National Narcotics Laboratory Beijing Regional Center, Beijing 100164, China
- Zixin Wan
  - National Narcotics Laboratory Beijing Regional Center, Beijing 100164, China
- Sheng Huang
  - School of Materials Science and Physics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
  - School of Safety Engineering, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
- Yanyan Wang
  - National Narcotics Laboratory Beijing Regional Center, Beijing 100164, China
- Xiuquan Gu
  - School of Materials Science and Physics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
6
Jun BM, Chae SH, Kim D, Jung JY, Kim TJ, Nam SN, Yoon Y, Park C, Rho H. Adsorption of uranyl ion on hexagonal boron nitride for remediation of real U-contaminated soil and its interpretation using random forest. JOURNAL OF HAZARDOUS MATERIALS 2024; 469:134072. [PMID: 38522201 DOI: 10.1016/j.jhazmat.2024.134072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/09/2024] [Accepted: 03/16/2024] [Indexed: 03/26/2024]
Abstract
Acid leaching has been widely applied to treat contaminated soil; however, it contains several inorganic pollutants. The decommissioning of nuclear power plants introduces radioactive and soluble U(VI), a substance posing chemical toxicity to humans. Our investigation sought to ascertain the efficacy of hexagonal boron nitride (h-BN), a highly efficient adsorbent, in treating U(VI) in wastewater. The adsorption of U(VI) by h-BN reached equilibrium within 2 h. The adsorption of U(VI) by h-BN appears to be facilitated through electrostatic attraction, as evidenced by the observed impact of pH variations, acidic agents (i.e., HCl or H2SO4), and the presence of background ions on the adsorption performance. A reusability test demonstrated the successful completion of five cycles of adsorption/desorption, relying on the surface characteristics of h-BN as influenced by solution pH. Based on the experimental variables of initial U(VI) concentration, exposure time, temperature, pH, and the presence of background ions/organic matter, a feature importance analysis using random forest (RF) was carried out to evaluate the correlation between performance and conditions. To the best of our knowledge, this study is the first attempt to conduct the adsorption of U(VI) generated from real contaminated soil by h-BN, followed by interpretation of the correlation between performance and conditions using RF. Lastly, a plausible adsorption mechanism between U(VI) and h-BN was explained based on the experimental results, characterizations, and a comparison with previous adsorption studies on the removal of heavy metals by h-BN.
Affiliation(s)
- Byung-Moon Jun
  - Radwaste Management Center, Korea Atomic Energy Research Institute (KAERI), 111 Daedeok-Daero 989beon-gil, Yuseong-Gu, Daejeon 34057, Republic of Korea
- Sung Ho Chae
  - Center for Water Cycle Research, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Deokhwan Kim
  - Department of Environment Research, Korea Institute of Civil Engineering and Building Technology (KICT), 283 Goyang-Daero, Ilsanseo-Gu, Goyang-si, Gyeonggi-do 10223, Republic of Korea; Department of Civil and Environment Engineering, University of Science and Technology (UST), 217 Gajeong-Ro, Yuseong-Gu, Daejeon 34113, Republic of Korea
- Jun-Young Jung
  - Radwaste Management Center, Korea Atomic Energy Research Institute (KAERI), 111 Daedeok-Daero 989beon-gil, Yuseong-Gu, Daejeon 34057, Republic of Korea
- Tack-Jin Kim
  - Radwaste Management Center, Korea Atomic Energy Research Institute (KAERI), 111 Daedeok-Daero 989beon-gil, Yuseong-Gu, Daejeon 34057, Republic of Korea
- Seong-Nam Nam
  - Department of Chemical and Environmental Science, Korea Army Academy, Yeong-Cheon 495 Hoguk-ro, Gokyeong-myeon, Yeongcheon-si, Gyeongsangbuk-do, Republic of Korea
- Yeomin Yoon
  - Department of Environmental Science and Engineering, Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, Republic of Korea
- Chanhyuk Park
  - Department of Environmental Science and Engineering, Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, Republic of Korea
- Hojung Rho
  - Department of Environment Research, Korea Institute of Civil Engineering and Building Technology (KICT), 283 Goyang-Daero, Ilsanseo-Gu, Goyang-si, Gyeonggi-do 10223, Republic of Korea; Department of Civil and Environment Engineering, University of Science and Technology (UST), 217 Gajeong-Ro, Yuseong-Gu, Daejeon 34113, Republic of Korea.
7
Yang C, Huebner ES, Tian L. Prediction of suicidal ideation among preadolescent children with machine learning models: A longitudinal study. J Affect Disord 2024; 352:403-409. [PMID: 38387673 DOI: 10.1016/j.jad.2024.02.070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 02/15/2024] [Accepted: 02/19/2024] [Indexed: 02/24/2024]
Abstract
BACKGROUND: Machine learning (ML) has been widely used to predict suicidal ideation (SI) in adolescents and adults. Nevertheless, studies of accurate and efficient models of SI prediction with preadolescent children are still needed because SI is surprisingly prevalent during the transition into adolescence. This study aimed to explore the potential of ML models to predict SI among preadolescent children. METHODS: A total of 4691 Chinese children (54.89% boys, mean age = 10.92 at baseline) and their parents completed relevant measures at baseline, and the children provided 6-month follow-up data for SI. The current study compared four ML models, Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), to predict SI and to identify variables with predictive value based on the best-performing model among Chinese preadolescent children. RESULTS: The RF model achieved the highest discriminant performance, with an AUC of 0.92 and an accuracy of 0.93 (balanced accuracy = 0.88). Internalizing problems, externalizing problems, neuroticism, childhood maltreatment, and subjective well-being in school demonstrated the highest values in predicting SI. CONCLUSION: The findings of this study suggested that ML models based on the observation and assessment of children's general characteristics and experiences in everyday life can serve as convenient screening and evaluation tools for suicide risk assessment among Chinese preadolescent children. The findings also provide insights for early intervention.
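The abstract reports both accuracy and balanced accuracy, which can diverge when one class is rare. A minimal sketch of the two definitions follows; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def accuracy_and_balanced_accuracy(tp, fn, tn, fp):
    """Compute plain accuracy and balanced accuracy from binary confusion counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced = (sensitivity + specificity) / 2
    return accuracy, balanced

# Hypothetical counts for an imbalanced sample of 1000 cases (50 positives).
acc, bal = accuracy_and_balanced_accuracy(tp=40, fn=10, tn=930, fp=20)
print(f"accuracy={acc:.3f}, balanced accuracy={bal:.3f}")
```

Because balanced accuracy averages the per-class recalls, it is not inflated by a large majority class, which is one plausible reason for reporting it alongside plain accuracy.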
Affiliation(s)
- Chi Yang
  - Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents, South China Normal University, Ministry of Education, Guangzhou 510631, People's Republic of China; School of Psychology, South China Normal University, Guangzhou 510631, People's Republic of China
- E Scott Huebner
  - Department of Psychology, University of South Carolina, Columbia, SC 29208, USA
- Lili Tian
  - Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents, South China Normal University, Ministry of Education, Guangzhou 510631, People's Republic of China.
8
Su G, Jiang P. Machine learning models for predicting biochar properties from lignocellulosic biomass torrefaction. BIORESOURCE TECHNOLOGY 2024; 399:130519. [PMID: 38437964 DOI: 10.1016/j.biortech.2024.130519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/14/2024] [Accepted: 02/29/2024] [Indexed: 03/06/2024]
Abstract
This study developed six machine learning models to predict the biochar properties from the dry torrefaction of lignocellulosic biomass by using biomass characteristics and torrefaction conditions as input variables. After optimization, gradient boosting machines were the optimal model, with the highest coefficient of determination ranging from 0.89 to 0.94. Torrefaction conditions exhibited a higher relative contribution to the yield and higher heating value (HHV) of biochar than biomass characteristics. Temperature was the dominant contributor to the elemental and proximate composition and the yield and HHV of biochar. Feature importance and SHapley Additive exPlanations revealed the effect of each influential factor on the target variables and the interactions between these factors in torrefaction. Software that can accurately predict the element, yield, and HHV of biochar was developed. These findings provide a comprehensive understanding of the key factors and their interactions influencing the torrefaction process and biochar properties.
Affiliation(s)
- Guangcan Su
  - Department of Mechanical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur 50603, Malaysia; Centre for Energy Sciences, University of Malaya, Kuala Lumpur 50603, Malaysia.
- Peng Jiang
  - State Key Laboratory of Materials-oriented Chemical Engineering, College of Chemical Engineering, Nanjing Tech University, Nanjing 211816, China
9
Kang X, Zhao Y, Yao L, Tan Z. Explainable machine learning for predicting the geographical origin of Chinese Oysters via mineral elements analysis. Curr Res Food Sci 2024; 8:100738. [PMID: 38659973 PMCID: PMC11039350 DOI: 10.1016/j.crfs.2024.100738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 04/06/2024] [Accepted: 04/12/2024] [Indexed: 04/26/2024] [Open Access]
Abstract
The traceability of geographic origin is essential for guaranteeing the quality, safety, and protection of oyster brands. However, current traceability results lack credibility because they do not adequately explain the model's predictions. Consequently, we conducted a study to evaluate the efficacy of explainable machine learning combined with mineral element analysis. The study findings revealed that 18 elements have the ability to determine regional orientation. In addition, consumers should pay closer attention to the potential risks associated with oyster consumption due to the regional differences in the essential and toxic elements that oysters contain. The light gradient boosting machine (LightGBM) model exhibited strong performance, achieving high accuracy, precision, recall, F1 score, and AUC, with values of 96.77%, 96.43%, 98.53%, 97.32%, and 0.998, respectively. The SHapley Additive exPlanations (SHAP) method was used to evaluate the output of the LightGBM model, revealing differences in feature interactions among oysters from different provinces. Specifically, the features Na, Zn, V, Mg, and K were found to have a significant impact on the predictive process of the model. Consistent with existing research, the use of explainable machine learning techniques can provide insights into the complex connections between important product attributes and relevant geographical information.
Affiliation(s)
- Xuming Kang
  - Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, 266071, China
- Yanfang Zhao
  - Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, 266071, China
- Lin Yao
  - Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, 266071, China
- Zhijun Tan
  - Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, 266071, China
  - Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, 266071, China
  - Collaborative Innovation Center of Seafood Deep Processing, Dalian Polytechnic University, Dalian, 116034, China
10
Gholizadeh M, Saeedi R, Bagheri A, Paeezi M. Machine learning-based prediction of effluent total suspended solids in a wastewater treatment plant using different feature selection approaches: A comparative study. ENVIRONMENTAL RESEARCH 2024; 246:118146. [PMID: 38215928 DOI: 10.1016/j.envres.2024.118146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 12/31/2023] [Accepted: 01/05/2024] [Indexed: 01/14/2024]
Abstract
Accurately predicting the characteristics of effluent discharged from wastewater treatment plants (WWTPs) is crucial for reducing sampling requirements, labor, costs, and environmental pollution. Machine learning (ML) techniques can be effective in achieving this goal. To optimize ML-based models, various feature selection (FS) methods are employed. This study aims to investigate the impact of six FS methods (categorized as Wrapper, Filter, and Embedded methods) on the accuracy of three supervised ML algorithms in predicting total suspended solids (TSS) concentration in the effluent of a municipal wastewater treatment plant. Based on the features proposed by each FS method, five distinct scenarios were defined. Within each scenario, three ML algorithms, namely artificial neural network-multilayer perceptron (ANN-MLP), K-nearest neighbors (KNN), and adaptive boosting (AdaBoost), were applied. The features used for predicting TSS concentration in the WWTP effluent included BOD5, COD, TSS, TN, and NH3 in the influent, and BOD5, COD, residual Cl2, NO3, TN, and NH4 in the effluent. To construct the models, the dataset was randomly divided into training and testing subsets, and K-fold cross-validation was employed to control overfitting and underfitting. The evaluation metrics were root mean squared error (RMSE), mean absolute error (MAE), and correlation coefficient (R2). The most efficient scenario was identified as Scenario IV, with the Sequential Backward Selection FS method. The features selected by this method were CODe, BOD5e, BOD5i, and TNi. Furthermore, the ANN-MLP algorithm demonstrated the best performance, achieving the highest R2 value. This algorithm exhibited acceptable performance in both the training and testing subsets (R2 = 0.78 and R2 = 0.8, respectively).
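The evaluation procedure described above (K-fold splitting plus RMSE, MAE, and R2) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the study's code: the helper names are hypothetical, and R2 is computed here as the coefficient of determination, which may differ from the definition the authors used.

```python
import math
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k shuffled, non-overlapping folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R2); assumes y_true is not constant."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

# Each sample index appears in exactly one test fold.
splits = list(k_fold_indices(n_samples=10, k=5))
```

In practice a model would be fitted on each `train` list and scored on the matching `test` list, and the per-fold metrics would then be averaged.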
Affiliation(s)
- Mahdi Gholizadeh
  - Environmental and Occupational Hazards Control Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Department of Health, Safety and Environment, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Reza Saeedi
  - Department of Health, Safety and Environment, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Workplace Health Promotion Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amin Bagheri
  - Environmental and Occupational Hazards Control Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Department of Health, Safety and Environment, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Mohammad Paeezi
  - Department of Health, Safety and Environment, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Workplace Health Promotion Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
11
Zeng F, Su X, Liang X, Liao M, Zhong H, Xu J, Gou W, Zhang X, Shen L, Zheng JS, Chen YM. Gut microbiome features and metabolites in non-alcoholic fatty liver disease among community-dwelling middle-aged and older adults. BMC Med 2024; 22:104. [PMID: 38454425 PMCID: PMC10921631 DOI: 10.1186/s12916-024-03317-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 02/23/2024] [Indexed: 03/09/2024] [Open Access]
Abstract
BACKGROUND The specific microbiota and associated metabolites linked to non-alcoholic fatty liver disease (NAFLD) are still controversial. Thus, we aimed to understand how the core gut microbiota and metabolites impact NAFLD. METHODS The data for the discovery cohort were collected from the Guangzhou Nutrition and Health Study (GNHS) follow-up conducted between 2014 and 2018. We collected 272 metadata points from 1546 individuals. The metadata were input into four interpretable machine learning models to identify important gut microbiota associated with NAFLD. These models were subsequently applied to two validation cohorts [the internal validation cohort (n = 377) and the prospective validation cohort (n = 749)] to assess generalizability. We constructed an individual microbiome risk score (MRS) based on the identified gut microbiota and conducted an animal faecal microbiome transplantation (FMT) experiment using faecal samples from individuals with different levels of MRS to determine the relationship between MRS and NAFLD. Additionally, we conducted targeted metabolomic sequencing of faecal samples to analyse potential metabolites. RESULTS Among the four machine learning models used, the lightGBM algorithm achieved the best performance. A total of 12 taxa-related features of the microbiota were selected by the lightGBM algorithm and further used to calculate the MRS. Increased MRS was positively associated with the presence of NAFLD, with an odds ratio (OR) of 1.86 (1.72, 2.02) per 1-unit increase in MRS. An elevated abundance of the faecal microbiota (f__veillonellaceae) was associated with increased NAFLD risk, whereas f__rikenellaceae, f__barnesiellaceae, and s__adolescentis were associated with a decreased presence of NAFLD. Higher levels of specific gut microbiota-derived metabolites of bile acids (taurocholic acid) might be positively associated with both a higher MRS and NAFLD risk. FMT in mice further confirmed a causal association between a higher MRS and the development of NAFLD. CONCLUSIONS We confirmed that an alteration in the composition of the core gut microbiota might be biologically relevant to NAFLD development. Our work demonstrated the role of the microbiota in the development of NAFLD.
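As a reading aid for the reported odds ratio of 1.86 per 1-unit increase in MRS: if the estimate comes from a logistic model in which MRS enters linearly (an assumption, since the abstract does not state the model), the odds ratio for a difference of \(\Delta\) units compounds multiplicatively. The coefficient \(\beta\) below is derived from the reported OR and is not a value given in the source.

```latex
\[
\beta = \ln(1.86) \approx 0.62,
\qquad
\mathrm{OR}(\Delta) = e^{\beta \Delta} = 1.86^{\Delta},
\qquad
\mathrm{OR}(2) = 1.86^{2} \approx 3.46 .
\]
```

Under that assumption, a 2-unit higher MRS would correspond to roughly 3.5 times the odds of NAFLD, not 2 × 1.86.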
Affiliation(s)
- Fangfang Zeng
  - Department of Public Health and Preventive Medicine, School of Medicine, Jinan University, No.601 Huangpu Road West, Guangzhou, 510632, China.
  - Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-Sen University, Guangzhou, 510275, China.
- Xin Su
  - Department of Public Health and Preventive Medicine, School of Medicine, Jinan University, No.601 Huangpu Road West, Guangzhou, 510632, China
- Xinxiu Liang
  - Zhejiang Key Laboratory of Multi-Omics in Infection and Immunity, School of Medicine and School of Life Sciences, Westlake University, Hangzhou, 310030, China
- Minqi Liao
  - Institute of Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, Ingolstädter Landstr. 1, 85764, Neuherberg, Germany
- Haili Zhong
  - Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-Sen University, Guangzhou, 510275, China
- Jinjian Xu
  - Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-Sen University, Guangzhou, 510275, China
- Wanglong Gou
  - Zhejiang Key Laboratory of Multi-Omics in Infection and Immunity, School of Medicine and School of Life Sciences, Westlake University, Hangzhou, 310030, China
- Xiangzhou Zhang
  - Big Data Decision Institute, Jinan University, No.601 Huangpu Road West, Guangzhou, 510632, China
- Luqi Shen
  - Zhejiang Key Laboratory of Multi-Omics in Infection and Immunity, School of Medicine and School of Life Sciences, Westlake University, Hangzhou, 310030, China
- Ju-Sheng Zheng
  - Zhejiang Key Laboratory of Multi-Omics in Infection and Immunity, School of Medicine and School of Life Sciences, Westlake University, Hangzhou, 310030, China.
- Yu-Ming Chen
  - Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-Sen University, Guangzhou, 510275, China.
12
|
Ye G, Wan J, Deng Z, Wang Y, Chen J, Zhu B, Ji S. Prediction of effluent total nitrogen and energy consumption in wastewater treatment plants: Bayesian optimization machine learning methods. BIORESOURCE TECHNOLOGY 2024; 395:130361. [PMID: 38286171 DOI: 10.1016/j.biortech.2024.130361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 01/18/2024] [Accepted: 01/18/2024] [Indexed: 01/31/2024]
Abstract
The control of effluent total nitrogen (TN) and total energy consumption (TEC) is a key issue in managing wastewater treatment plants. In this study, effluent TN and TEC predictive models were established by selecting influent water quality and process control indicators as input features. The prediction performance of machine learning methods under different random seeds was explored, the moving average method was used for data amplification, and the Bayesian algorithm was used for hyperparameter optimization. The results showed that compared with the traditional hyperparameter optimization method for effluent TN prediction, the coefficient of determination (R2) increased by 0.092 and 0.067, reaching 0.725, and the root mean square error (RMSE) decreased by 0.262 and 0.215 mg/L, reaching 1.673 mg/L, respectively, after Bayesian optimization and data amplification. During TEC prediction, R2 increased by 0.068 and 0.042, reaching 0.884, and the RMSE decreased by 232.444 and 197.065 kWh, reaching 1305.829 kWh, respectively.
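The moving average method mentioned in this abstract can be illustrated with a short sketch. The abstract does not specify the window length or how the smoothed values are combined with the original records to enlarge the dataset, so the trailing window and the function name `moving_average` below are assumptions made only for illustration.

```python
def moving_average(values, window):
    """Simple moving average over a sliding window (window size is an assumption).

    Returns len(values) - window + 1 smoothed points.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical daily effluent TN readings in mg/L (made-up numbers).
tn_series = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = moving_average(tn_series, window=3)
```

One plausible use, again an assumption, is to add the smoothed records to the training set alongside the raw ones; the source gives no detail on this step.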
Affiliation(s)
- Gang Ye
  - College of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Jinquan Wan
  - College of Environment and Energy, South China University of Technology, Guangzhou 510006, China.
- Zhicheng Deng
  - College of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Yan Wang
  - College of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Jian Chen
  - College of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Bin Zhu
  - Guangdong Shunkong Zihua Technology Co, Ltd, Foshan 528300, China
- Shiming Ji
  - Guangdong Shunkong Zihua Technology Co, Ltd, Foshan 528300, China
13
|
Manav-Demir N, Gelgor HB, Oz E, Ilhan F, Ulucan-Altuntas K, Tiwary A, Debik E. Effluent parameters prediction of a biological nutrient removal (BNR) process using different machine learning methods: A case study. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 351:119899. [PMID: 38159310 DOI: 10.1016/j.jenvman.2023.119899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 12/16/2023] [Accepted: 12/19/2023] [Indexed: 01/03/2024]
Abstract
This paper proposes a novel targeted blend of machine learning (ML) based approaches for controlling wastewater treatment plant (WWTP) operation by predicting distributions of key effluent parameters of a biological nutrient removal (BNR) process. Two years of data were collected from the Plajyolu wastewater treatment plant in Kocaeli, Türkiye, and the effluent parameters were predicted using six machine learning algorithms to compare their performances. Based on the mean absolute percentage error (MAPE) metric alone, the support vector regression machine (SVRM) with a linear kernel showed good agreement for COD and BOD5, with MAPE values of about 9% and 0.9%, respectively. Random Forest (RF) and EXtreme Gradient Boosting (XGBoost) regression were found to be the best algorithms for the TN and TP effluent parameters, with MAPE values of about 34% and 27%, respectively. Further, when the results were evaluated together across all the performance metrics, RF, SVRM (with both linear and RBF kernels), and Hybrid Regression algorithms generally made more successful predictions than Light GBM and XGBoost algorithms for all the parameters. This case study demonstrates that selective application of ML algorithms can predict different effluent parameters more effectively. Wider implementation of this approach can potentially reduce the resource demands of actively monitoring the environmental performance of WWTPs.
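Because the comparison above rests on MAPE, a minimal sketch of that metric may help when reading the percentages. The function name and the sample values are hypothetical; the formula is the standard mean of absolute relative errors expressed in percent, and it is undefined when an observed value is zero.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)

# Made-up effluent COD observations and predictions in mg/L.
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
error_pct = mape(observed, predicted)  # each point is off by 10% of its observed value
```

Since MAPE scales each error by the observed value, parameters with small magnitudes (such as TP) tend to show larger percentages for the same absolute error.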
Affiliation(s)
- Neslihan Manav-Demir
  - Yildiz Technical University, Environmental Engineering Department, Esenler, Istanbul, 34220, Turkey.
- Huseyin Baran Gelgor
  - Yildiz Technical University, Environmental Engineering Department, Esenler, Istanbul, 34220, Turkey
- Ersoy Oz
  - Yildiz Technical University, Statistics Department, Esenler, Istanbul, 34220, Turkey.
- Fatih Ilhan
  - Yildiz Technical University, Environmental Engineering Department, Esenler, Istanbul, 34220, Turkey
- Kubra Ulucan-Altuntas
  - Istanbul Technical University, Environmental Engineering Department, Maslak, Istanbul, 34469, Turkey
- Abhishek Tiwary
  - De Montfort University, School of Engineering and Sustainable Development, The Gateway, Leicester, LE1 9BH, United Kingdom
- Eyup Debik
  - Yildiz Technical University, Environmental Engineering Department, Esenler, Istanbul, 34220, Turkey
14
|
Kerem A, Yuce E. Electrical energy recovery from wastewater: prediction with machine learning algorithms. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:125019-125032. [PMID: 36462079 DOI: 10.1007/s11356-022-24482-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 11/26/2022] [Indexed: 06/17/2023]
Abstract
Wind, solar, biomass, and tidal power are renewable energy sources obtained from nature. Among these, biomass is a significant energy source. Producing biogas from waste and converting it into electrical energy has become increasingly popular, because it generates clean, sustainable, and eco-friendly energy while the waste is managed. Estimating the electrical energy that will be produced by wastewater recovery using machine learning (ML) algorithms is vital and has not yet been investigated; this study fills that gap. The study aims to predict the electrical energy recovery potential of the sewage sludge of the Kahramanmaraş Advanced Biological Wastewater Treatment Plant (KABWWTP) (Turkey) through incineration and anaerobic digestion. To this end, six distinct ML algorithms were used: linear regression (LR), extreme gradient boosting (XGB), Gaussian process regression (GPR), ridge regression (RR), Lasso regression (LASReg), and Bayesian ridge regression (BR). Another novelty of this study is the restricted number of input parameters: the electrical energy (output parameter) is predicted using only three input parameters (gas flow, conductivity, and TSS). With a MAPE value of 1.032, the XGB method was the most successful model. Heat mapping and correlation analyses are used to evaluate the relationship between these parameters. Performance results are presented in tables and graphs.
Affiliation(s)
- Alper Kerem
  - Department of Electrical Electronics Engineering, Engineering and Architecture Faculty, Kahramanmaraş Sütçü İmam University, Kahramanmaraş, Turkey.
- Ekrem Yuce
  - Department of Electrical Electronics Engineering, Engineering and Architecture Faculty, Kahramanmaraş Sütçü İmam University, Kahramanmaraş, Turkey
15
|
Liu K, Zhang Y, He H, Xiao H, Wang S, Zhang Y, Li H, Qian X. Time series prediction of the chemical components of PM 2.5 based on a deep learning model. CHEMOSPHERE 2023; 342:140153. [PMID: 37714468 DOI: 10.1016/j.chemosphere.2023.140153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 08/26/2023] [Accepted: 09/11/2023] [Indexed: 09/17/2023]
Abstract
Modeling-based prediction methods enable more rapid, reagent-free air pollution detection from inexpensive multi-source data than traditional chemical reaction-based detection methods, making it possible to understand the air pollution situation quickly. In this study, a convolutional neural network (CNN) and a long short-term memory (LSTM) neural network are integrated to create a CNN-LSTM time series prediction model that predicts the concentration of PM2.5 and its chemical components (i.e., heavy metals, carbon components, and water-soluble ions) using meteorological data and air pollutants (PM2.5, SO2, NO2, CO, and O3). In the integrated CNN-LSTM model, the CNN uses convolutional and pooling layers to extract features from the data, whereas the powerful nonlinear mapping and learning capabilities of the LSTM enable the time series prediction of air pollution. The experimental results showed that the CNN-LSTM exhibited good generalization ability in the prediction of As, Cd, Cr, Cu, Ni, and Zn, with a mean R2 above 0.9. The mean R2 for PM2.5, Pb, Ti, EC, OC, SO₄²⁻, and NO₃⁻ ranged from 0.85 to 0.9. Shapley values showed that PM2.5, NO2, SO2, and CO had a greater influence on the predicted heavy metal results of the model. For water-soluble ions, the predicted results were dominantly influenced by PM2.5, CO, and humidity. The prediction of the carbon fraction was affected mainly by the PM2.5 concentration. Additionally, several input variables for various components were eliminated without affecting the prediction accuracy of the model, with R2 between 0.70 and 0.84, thereby maximizing modeling efficiency and lowering operational costs. The fully trained model's predictions showed that most predicted components of PM2.5 were lower during January to March 2020 than in 2018 and 2019. This study provides insight into improving the accuracy of modeling-based detection methods and promotes the development of integrated air pollution monitoring in a more sustainable direction.
Affiliation(s)
- Kai Liu
  - School of Environment, Nanjing Normal University, Nanjing 210023, PR China
- Yuanhang Zhang
  - School of Environment, Nanjing Normal University, Nanjing 210023, PR China
- Huan He
  - School of Environment, Nanjing Normal University, Nanjing 210023, PR China; Jiangsu Province Engineering Research Center of Environmental Risk Prevention and Emergency Response Technology, Nanjing 210023, PR China
- Hui Xiao
  - School of Environment, Nanjing Normal University, Nanjing 210023, PR China
- Siyuan Wang
  - School of Environment, Nanjing Normal University, Nanjing 210023, PR China
- Yuteng Zhang
  - School of Environment, Nanjing Normal University, Nanjing 210023, PR China
- Huiming Li
  - School of Environment, Nanjing Normal University, Nanjing 210023, PR China; Jiangsu Province Engineering Research Center of Environmental Risk Prevention and Emergency Response Technology, Nanjing 210023, PR China.
- Xin Qian
  - Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, PR China
16
|
Guo X, Xiong H, Li H, Gui X, Hu X, Li Y, Cui H, Qiu Y, Zhang F, Ma C. Designing dynamic groundwater management strategies through a composite groundwater vulnerability model: Integrating human-related parameters into the DRASTIC model using LightGBM regression and SHAP analysis. ENVIRONMENTAL RESEARCH 2023; 236:116871. [PMID: 37573023 DOI: 10.1016/j.envres.2023.116871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 07/20/2023] [Accepted: 08/09/2023] [Indexed: 08/14/2023]
Abstract
Groundwater nitrate contamination has emerged as a pressing global concern. Given its potential for long-term impacts on aquifers, protective measures should primarily focus on prevention. Drawing on the theory of groundwater vulnerability (GV), the original DRASTIC model and parameters related to human activities are employed as inputs and integrated with the LightGBM regression algorithm to facilitate nitrate index (NI) prediction tasks. The SHAP analysis is conducted to effectively examine the contribution of parameters to the NI prediction and interpret the issue of parameter interactions. In addition, to mitigate the limitations of the intrinsic GV model, a composite nitrate index (CNI) is developed by linearly combining the DRASTIC index with the NI. The framework presented in this study provides adaptive strategies for managing groundwater resources over different time periods. A representative region for arid and semiarid climates, the Yinchuan region, is studied using the framework. As compared to 2012, the intrinsic GV index has changed spatially in 2022. Human activities have increased the influence of the nitrate concentration as shown by the Pearson correlation coefficient of -0.082 between the DRASTIC index and nitrate concentration. A significant increase in pollution levels was predicted by NI, ranging from -0.116 to 0.968. According to SHAP analysis, the significant increase in NI levels in 2022 was mainly due to high-value industrial and agricultural production. In 2022, 12.02% of the areas had an increase of at least 0.549 in the CNI. 42.1% of the areas were classified as moderate or high CNI levels. The farm was identified as a high-contributing source to nitrate pollution. The small-scale agricultural and livestock activities in non-urban areas also contribute to groundwater pollution. Dynamic groundwater management strategies need to be implemented in high-growth and high-level CNI areas.
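The abstract describes the CNI as a linear combination of the DRASTIC index and the NI, and it reports a Pearson correlation between the DRASTIC index and nitrate concentration. The sketch below illustrates both ideas under stated assumptions: the weight `alpha`, the min-max rescaling of the DRASTIC index, and all function names are hypothetical, since the abstract gives neither the weights nor the normalization.

```python
import math

def min_max(values):
    """Rescale values to [0, 1] (assumed normalization, not from the source)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant series")
    return [(v - lo) / (hi - lo) for v in values]

def composite_index(drastic, ni, alpha=0.5):
    """Linear combination of the rescaled DRASTIC index and NI.

    alpha is a hypothetical weight on the DRASTIC term.
    """
    d_norm = min_max(drastic)
    return [alpha * d + (1 - alpha) * n for d, n in zip(d_norm, ni)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A correlation near zero, such as the reported -0.082, indicates that the intrinsic index alone explains little of the observed nitrate pattern, which is the motivation the abstract gives for adding human-activity parameters.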
Affiliation(s)
- Xu Guo
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China.
- Hanxiang Xiong
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China
- Haixue Li
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China; Center for Hydrogeology and Environmental Geology Survey, China Geological Survey, Baoding, 071051, Hebei, China
- Xiaojing Hu
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China
- Yonggang Li
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China
- Hao Cui
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China
- Yang Qiu
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China
- Fawang Zhang
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China; Center for Hydrogeology and Environmental Geology Survey, China Geological Survey, Baoding, 071051, Hebei, China.
- Chuanming Ma
  - School of Environmental Studies, China University of Geosciences, Wuhan, 430074, China.
17
|
Bellamoli F, Di Iorio M, Vian M, Melgani F. Machine learning methods for anomaly classification in wastewater treatment plants. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 344:118594. [PMID: 37473555 DOI: 10.1016/j.jenvman.2023.118594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 06/21/2023] [Accepted: 07/03/2023] [Indexed: 07/22/2023]
Abstract
Modern wastewater treatment plants base their biological processes on advanced control systems which ensure compliance with discharge limits and minimize energy consumption responding to information from on-line probes. The correct readings of probes are particularly crucial for intermittent aeration controllers, which rely on real-time measurements of ammonia and oxygen in biological tanks. These data are also an important resource for developing artificial intelligence algorithms that can identify process or sensor anomalies, thus guiding the choices of plant operators and automatic process controllers. However, using anomaly detection and classification algorithms in real-time wastewater treatment is challenging because of the noisy nature of sensor measurements, the difficulty of obtaining labeled real-plant data, and the complex and interdependent mechanisms that govern biological processes. This work aims at thoroughly exploring the performance of machine learning methods in detecting and classifying the main anomalies in plants operating with intermittent aeration. Using oxygen, ammonia and aeration power measurements from a set of plants in Italy, we perform both binary and multiclass classification, and we compare them through a rigorous validation procedure that includes a test on an unknown dataset, proposing a new evaluation protocol. The classification methods explored are support vector machine, multilayer perceptron, random forest, and two gradient boosting methods (LightGBM and XGBoost). The best performance was achieved using the gradient boosting ensemble algorithms, with up to 96% of anomalies detected and up to 84% and 62% of anomalies classified correctly on the first and second datasets respectively.
Affiliation(s)
- Francesca Bellamoli
  - University of Trento, Department of Information Engineering and Computer Science, via Sommarive 9, Trento, 38123, Italy; ETC Sustainable Solutions Srl, via dei Palustei 16, Trento, 38121, Italy.
- Marco Vian
  - ETC Sustainable Solutions Srl, via dei Palustei 16, Trento, 38121, Italy
- Farid Melgani
  - University of Trento, Department of Information Engineering and Computer Science, via Sommarive 9, Trento, 38123, Italy
18
|
Ghaheri P, Nasiri H, Shateri A, Homafar A. Diagnosis of Parkinson's disease based on voice signals using SHAP and hard voting ensemble method. Comput Methods Biomech Biomed Engin 2023:1-17. [PMID: 37771234 DOI: 10.1080/10255842.2023.2263125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 09/17/2023] [Indexed: 09/30/2023]
Abstract
Parkinson's disease (PD) is the second most common progressive neurological condition after Alzheimer's. The significant number of individuals afflicted with this illness makes it essential to develop a method to diagnose the condition in its early phases. PD is typically identified from motor symptoms or via neuroimaging techniques. These methods are expensive, time-consuming, unavailable to the general public, and not very accurate. Another issue to be addressed is the black-box nature of machine learning methods, which need interpretation. These issues encourage us to develop a novel technique using Shapley additive explanations (SHAP) and the Hard Voting Ensemble Method based on voice signals to diagnose PD more accurately. Another purpose of this study is to interpret the output of the model and determine the most important features in diagnosing PD. The present article uses Pearson Correlation Coefficients to understand the relationship between input features and the output. Input features with high correlation are selected and then classified by Extreme Gradient Boosting, Light Gradient Boosting Machine, Gradient Boosting, and Bagging. Moreover, the weights in the Hard Voting Ensemble Method are determined based on the performance of these classifiers. At the final stage, SHAP is used to determine the most important features in PD diagnosis. The effectiveness of the proposed method is validated using the 'Parkinson Dataset with Replicated Acoustic Features' from the UCI machine learning repository, on which it achieved an accuracy of 85.42%. The findings demonstrate that the proposed method outperformed state-of-the-art approaches and can assist physicians in diagnosing Parkinson's cases.
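The weighted hard voting step described in this abstract can be sketched for a single sample as follows. This is an illustration only: the function name is hypothetical, and using each classifier's validation accuracy directly as its weight is an assumption, since the abstract says only that the weights are based on classifier performance.

```python
from collections import defaultdict

def weighted_hard_vote(predictions, weights):
    """Return the class label with the largest total weight.

    predictions: one predicted label per classifier for a single sample.
    weights: one non-negative weight per classifier (assumed here to be
             a performance score such as validation accuracy).
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Four hypothetical classifiers voting on one voice recording (1 = PD, 0 = healthy).
votes = [1, 0, 0, 1]
accuracies = [0.9, 0.6, 0.7, 0.5]  # made-up weights
decision = weighted_hard_vote(votes, accuracies)
```

With equal weights this reduces to ordinary majority voting; unequal weights let a stronger classifier outvote weaker ones, as in the example where label 1 wins with a total of 1.4 against 1.3.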
Affiliation(s)
- Paria Ghaheri
  - Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
- Hamid Nasiri
  - Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
- Ahmadreza Shateri
  - Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
- Arman Homafar
  - Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
19
|
Yan Y, Shi T, Bao X, Gai Y, Liang X, Jiang Y, Li Q. Combined network analysis and interpretable machine learning reveals the environmental adaptations of more than 10,000 ruminant microbial genomes. Front Microbiol 2023; 14:1147007. [PMID: 37799596 PMCID: PMC10548237 DOI: 10.3389/fmicb.2023.1147007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 08/28/2023] [Indexed: 10/07/2023] Open
Abstract
Background The ruminant gastrointestinal tract contains numerous microbiomes that serve a crucial role in sustaining the host's productivity and health. Numerous recent studies have revealed that variations in influencing factors, including the environment, diet, and host, contribute to shaping the adaptation of gastrointestinal microbes to specific states. Therefore, understanding how host and environmental factors affect gastrointestinal microbes will help to improve the sustainability of ruminant production systems. Results Taking a graphical analysis perspective, this study elucidates the microbial topology and robustness of the gastrointestinal tract of different ruminant species, showing that the microbial network is more resistant to random attacks. The risk of transmission of high-risk metagenome-assembled genomes (MAGs) was also demonstrated based on a large-scale survey of the distribution of antibiotic resistance genes (ARGs) in the microbiota of most types of ecosystems. In addition, an interpretable machine learning framework was developed to study the complex, high-dimensional data of the gastrointestinal microbial genome. The evolution of gastrointestinal microbial adaptations to the environment in ruminants was analyzed, and the adaptive changes of microorganisms to different altitudes were identified, including microbial transcriptional repair. Conclusion Our findings indicate that the environment has an impact on the functional features of microbiomes in ruminants. The findings provide new insight for the future development of microbial resources for sustainable development in agriculture.
Affiliation(s)
- Yueyang Yan
  - Key Laboratory for Zoonoses Research of the Ministry of Education, Institute of Zoonosis, College of Veterinary Medicine, Jilin University, Changchun, China
- Tao Shi
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Xin Bao
  - Department of Stomatology, Taian Central Hospital, Tai'an, Shandong, China
- Yunpeng Gai
  - School of Grassland Science, Beijing Forestry University, Beijing, China
- Xingxing Liang
  - School of Grassland Science, Beijing Forestry University, Beijing, China
- Yu Jiang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Qiushi Li
  - Key Laboratory for Zoonoses Research of the Ministry of Education, Institute of Zoonosis, College of Veterinary Medicine, Jilin University, Changchun, China
  - Department of Stomatology, The Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, Guangdong, China
20
|
Esteki B, Masoomi M, Moosazadeh M, Yoo C. Data-Driven Prediction of Janus/Core-Shell Morphology in Polymer Particles: A Machine-Learning Approach. LANGMUIR : THE ACS JOURNAL OF SURFACES AND COLLOIDS 2023; 39:4943-4958. [PMID: 36999232 DOI: 10.1021/acs.langmuir.2c03355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The majority of research on Janus particles prepared by solvent evaporation-induced phase separation technique uses models based on interfacial tension or free energy to predict Janus/core-shell morphology. Data-driven predictions, in contrast, utilize multiple samples to identify patterns and outliers. Using machine-learning algorithms and explainable artificial intelligence (XAI) analysis, we developed a model based on a 200-instance data set to predict particle morphology. As model features, simplified molecular input line entry system syntax identifies explanatory variables, including cohesive energy density, molar volume, the Flory-Huggins interaction parameter of polymers, and the solvent solubility parameter. Our most accurate ensemble classifiers predict morphology with an accuracy of 90%. In addition, we employ innovative XAI tools to interpret system behavior, suggesting phase-separated morphology to be most affected by solvent solubility, polymer cohesive energy difference, and blend composition. While polymers with cohesive energy densities above a certain threshold favor the core-shell structure, systems with weak intermolecular interactions favor the Janus structure. The correlation between molar volume and morphology suggests that increasing the size of polymer repeating units favors Janus particles. Additionally, the Janus structure is preferred when the Flory-Huggins interaction parameter exceeds 0.4. XAI analysis introduces feature values that generate the thermodynamically low driving force of phase separation, resulting in kinetically stable morphologies as opposed to thermodynamically stable ones. The Shapley plots of this study also reveal novel methods for creating Janus or core-shell particles based on solvent evaporation-induced phase separation by selecting feature values that strongly favor a given morphology.
Affiliation(s)
- Bahareh Esteki
  - Department of Chemical Engineering, Polymer Group, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Mahmood Masoomi
  - Department of Chemical Engineering, Polymer Group, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Mohammad Moosazadeh
  - Integrated Engineering Major, Department of Environmental Science and Engineering, Kyung Hee University, Seocheon-dong 1, Giheung-gu, Yongin-Si, Gyeonggi-Do 446-701, South Korea
- ChangKyoo Yoo
  - Integrated Engineering Major, Department of Environmental Science and Engineering, Kyung Hee University, Seocheon-dong 1, Giheung-gu, Yongin-Si, Gyeonggi-Do 446-701, South Korea
21
|
Zhu X, Liu B, Sun L, Li R, Deng H, Zhu X, Tsang DCW. Machine learning-assisted exploration for carbon neutrality potential of municipal sludge recycling via hydrothermal carbonization. BIORESOURCE TECHNOLOGY 2023; 369:128454. [PMID: 36503096 DOI: 10.1016/j.biortech.2022.128454] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/28/2022] [Revised: 12/02/2022] [Accepted: 12/04/2022] [Indexed: 06/17/2023]
Abstract
In the context of advocating carbon neutrality, there are new requirements for sustainable management of municipal sludge (MS). Hydrothermal carbonization (HTC) is a promising technology to deal with high-moisture MS considering its low energy consumption (without drying pretreatment) and value-added products (i.e., hydrochar). This study applied machine learning (ML) methods to conduct a holistic assessment with higher heating value (HHV) of hydrochar, carbon recovery (CR), and energy recovery (ER) as model targets, yielding accurate prediction models with R2 of 0.983, 0.844 and 0.858, respectively. Furthermore, MS properties showed positive (e.g., carbon content, HHV) and negative (e.g., ash content, O/C, and N/C) influences on the hydrochar HHV. By comparison, HTC parameters play a critical role for CR (51.7%) and ER (52.5%) prediction. The primary sludge was an optimal HTC feedstock while anaerobic digestion sludge had the lowest potential. This study provided a comprehensive reference for sustainable MS treatment and industrial application.
Affiliation(s)
- Xinzhe Zhu
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China
- Bingyou Liu
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Lianpeng Sun
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China.
- Ruohong Li
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China
- Huanzhong Deng
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Xiefei Zhu
- Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China; Department of Thermal Science and Energy Engineering, University of Science and Technology of China, 96 Jinzhai Road, Hefei, Anhui 230026, China
- Daniel C W Tsang
- Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China

22
Yi Z, Wu L. Identification of factors influencing net primary productivity of terrestrial ecosystems based on interpretable machine learning --evidence from the county-level administrative districts in China. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 326:116798. [PMID: 36435139 DOI: 10.1016/j.jenvman.2022.116798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Revised: 11/10/2022] [Accepted: 11/13/2022] [Indexed: 06/16/2023]
Abstract
Global climate change is rooted in the imbalance between carbon sources and sinks, and net-zero greenhouse gas emissions should focus not only on the source-side drivers but also on the sink-side influencing factors. Taking the county-level administrative districts in China as the sample, this study uses machine learning models to fit the relationship between socioeconomic development (SED) and net primary productivity (NPP) of terrestrial ecosystems. Moreover, it identifies key influencing factors and their effects based on the SHapley Additive exPlanations (SHAP) algorithm. The results show that the districts with low terrestrial NPP show the characteristics of agglomeration distribution. The eight key factors, in order, are as follows: agricultural development level, latitude, population size, longitude, animal husbandry development level, economic scale, time trend and industrialization level. In this study, via SHAP interaction plots, we found that the effects of population, economic growth, and industrialization on terrestrial NPP are regionally heterogeneous; via cluster analysis, we found the stage characteristics of the mode of SED affecting terrestrial NPP. Therefore, the conservation of terrestrial NPP needs to be combined with the stage changes of SED, as well as inter-regional differences, to develop a regionally coordinated and time-coherent ecological carbon sink conservation plan.
Affiliation(s)
- Zhaoqiang Yi
- School of Economics and Management, Southeast University, Nanjing, 211189, China
- Lihua Wu
- School of Economics and Management, Southeast University, Nanjing, 211189, China.

23
Sadri Moghaddam S, Mesghali H. A new hybrid ensemble approach for the prediction of effluent total nitrogen from a full-scale wastewater treatment plant using a combined trickling filter-activated sludge system. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:1622-1639. [PMID: 35921006 DOI: 10.1007/s11356-022-21864-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2022] [Accepted: 07/01/2022] [Indexed: 06/15/2023]
Abstract
In this study, different K-nearest neighbors (KNN), support vector regression (SVR), decision tree (DT), and random forest (RF) algorithms integrated with the Bayesian optimization algorithm (BOP) have been applied as novel hybrid modeling/optimization tools to predict the total nitrogen in treated wastewater of Southern Tehran Wastewater Treatment Plant (STWWTP). In order to enhance the outcomes of hybrid models, the chosen sub-models (the best and least correlated hybrid models) were used to generate voting average and stacked regression ensemble models. Throughout the preprocessing step, two alternative scenarios were used to handle missing values from the samples, including elimination versus estimation via linear interpolation. The results of this research demonstrated that ensemble models were generally better than individual hybrid models, although not all ensemble models were superior to single models. The results also revealed that the stacking regression ensemble model using KNN-BOP and SVR-BOP as sub-models was the best among the developed models, with the coefficient of determination (R2) = 0.640, root mean squared error (RMSE) = 2.378, and mean absolute error (MAE) = 1.838 on the test data. The best hybrid ensemble model, which predicts the concentration of total nitrogen (TN) in the effluent, can provide early warning of water pollution caused by eutrophication.
Affiliation(s)
- Hassan Mesghali
- Faculty of Civil Engineering, K. N. Toosi University of Technology, Tehran, Iran

24
Zhao GY, Suzuki S, Deng JH, Fujita M. Machine learning estimation of biodegradable organic matter concentrations in municipal wastewater. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2022; 323:116191. [PMID: 36108510 DOI: 10.1016/j.jenvman.2022.116191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/29/2022] [Accepted: 09/03/2022] [Indexed: 06/15/2023]
Abstract
This study investigates whether a novel estimation method based on machine learning can feasibly predict the readily biodegradable chemical oxygen demand (RB-COD) and slowly biodegradable COD (SB-COD) in municipal wastewater from the oxidation-reduction potential (ORP) data of anoxic batch experiments. Anoxic batch experiments were conducted with highly mixed liquor volatile suspended solids under different RB-COD and SB-COD conditions. As the RB-COD increased, the ORP breakpoint appeared earlier, and fermentation occurred in the interior of the activated sludge, even under anoxic conditions. Therefore, the ORP decline rates before and after the breakpoint were significantly correlated with the RB-COD and SB-COD, respectively (p < 0.05). The two biodegradable CODs were estimated separately using six machine learning models: an artificial neural network (ANN), support vector regression (SVR), an ANN-based AdaBoost, an SVR-based AdaBoost, decision tree, and random forest. Against the ORP dataset, the RB-COD and SB-COD estimation correlation coefficients of SVR-based AdaBoost were 0.96 and 0.88, respectively. To identify which ORP data are useful for estimations, the ORP decline rates before and after the breakpoint were separately input as datasets to the estimation methods. All six machine learning models successfully estimated the two biodegradable CODs simultaneously with accuracies of ≥0.80 from only ORP time-series data. Sensitivity analysis using the Shapley additive explanation method demonstrated that the ORP decline rates before and after the breakpoint clearly contributed to the estimation of RB-COD and SB-COD, respectively, indicating that acquiring the ORP data with various decline rates before and after the breakpoint improved the estimations of RB-COD and SB-COD, respectively. This novel estimation method for RB-COD and SB-COD can assist the rapid control of biological wastewater treatment when the biodegradable organic matter concentration dynamically changes in influent wastewater.
Affiliation(s)
- Guang-Yao Zhao
- Graduate School of Science and Engineering, Ibaraki University, Hitachi, Ibaraki, 316-8511, Japan
- Shunya Suzuki
- Graduate School of Science and Engineering, Ibaraki University, Hitachi, Ibaraki, 316-8511, Japan
- Jia-Hao Deng
- Graduate School of Science and Engineering, Ibaraki University, Hitachi, Ibaraki, 316-8511, Japan
- Masafumi Fujita
- Global and Local Environment Co-creation Institute, Ibaraki University, Hitachi, Ibaraki, 316-8511, Japan.

25
Xiang Q, Chen K, Peng L, Luo J, Jiang J, Chen Y, Lan L, Song H, Zhou X. Prediction of the trajectories of depressive symptoms among children in the adolescent brain cognitive development (ABCD) study using machine learning approach. J Affect Disord 2022; 310:162-171. [PMID: 35545159 DOI: 10.1016/j.jad.2022.05.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 03/02/2022] [Accepted: 05/05/2022] [Indexed: 02/08/2023]
Abstract
BACKGROUND Depression often first emerges during adolescence and evidence shows that the long-term patterns of depressive symptoms over time are heterogeneous. It is meaningful to predict the trajectory of depressive symptoms in adolescents to find early intervention targets. METHODS Based on the Adolescent Brain Cognitive Development Study, we included 4962 participants aged 9-10 who were followed up for 2 years. Trajectories of depressive symptoms were identified by Latent Class Growth Analyses (LCGA). Four types of machine learning models were built to predict the identified trajectories and to obtain variables with predictive value based on the best-performing model. RESULTS Of all participants, 536 (10.80%) were classified as increasing, 269 (5.42%) as persistently high, 433 (8.73%) as decreasing, and 3724 (75.05%) as persistently low by LCGA. The Gradient Boosting Machine (GBM) model achieved the highest discriminant performance. Sleep quality, parental emotional state and family financial adversities were the most important predictors, and three resting-state functional magnetic resonance imaging functional connectivity measures were also helpful to distinguish trajectories. LIMITATION We only have depressive symptom scores at three time points. Some valuable predictors are not specific to depression. External validation is an important next step. These predictors should not be interpreted as etiology and some variables were reported by parents/caregivers. CONCLUSION Using GBM combined with baseline characteristics, the trajectories of depressive symptoms within two years among adolescents aged 9-10 years can be well predicted, which might further facilitate the identification of adolescents at high risk of depressive symptoms and development of effective early interventions.
Affiliation(s)
- Qu Xiang
- West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
- Kai Chen
- School of Public Health, University of Texas Health Center at Houston, Houston, TX, USA
- Li Peng
- Department of Radiology, West China Hospital, Sichuan University, Chengdu, China
- Jiawei Luo
- West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
- Jingwen Jiang
- West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
- Yang Chen
- West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
- Lan Lan
- West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
- Huan Song
- West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China.
- Xiaobo Zhou
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

26
Prediction of Dichloroethene Concentration in the Groundwater of a Contaminated Site Using XGBoost and LSTM. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19159374. [PMID: 35954730 PMCID: PMC9367752 DOI: 10.3390/ijerph19159374] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/07/2022] [Revised: 07/22/2022] [Accepted: 07/27/2022] [Indexed: 02/04/2023]
Abstract
Chlorinated aliphatic hydrocarbons (CAHs) are widely used in agriculture and industries and have become one of the most common groundwater contaminations. With the excellent performance of the deep learning method in predicting, LSTM and XGBoost were used to forecast dichloroethene (DCE) concentrations in a pesticide-contaminated site undergoing natural attenuation. The input variables included BTEX, vinyl chloride (VC), and five water quality indicators. In this study, the predictive performances of long short-term memory (LSTM) and extreme gradient boosting (XGBoost) were compared, and the influences of variables on models’ performances were evaluated. The results indicated XGBoost was more likely to capture DCE variation and was robust in high values, while the LSTM model presented better accuracy for all wells. The well with higher DCE concentrations would lower the model’s accuracy, and its influence was more evident in XGBoost than LSTM. The explanation of the SHapley Additive exPlanations (SHAP) value of each variable indicated high consistency with the rules of biodegradation in the real environment. LSTM and XGBoost could predict DCE concentrations through only using water quality variables, and LSTM performed better than XGBoost.

27
Jiang Y, Li C, Song H, Wang W. Deep learning model based on urban multi-source data for predicting heavy metals (Cu, Zn, Ni, Cr) in industrial sewer networks. JOURNAL OF HAZARDOUS MATERIALS 2022; 432:128732. [PMID: 35334271 DOI: 10.1016/j.jhazmat.2022.128732] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/07/2022] [Revised: 03/14/2022] [Accepted: 03/15/2022] [Indexed: 06/14/2023]
Abstract
The high concentrations of heavy metals in municipal industrial sewer networks will seriously impact the microorganisms of the activated sludge in the wastewater treatment plant (WWTP), thus deteriorating the effluent quality and destroying the stability of sewage treatment. Therefore, timely prediction and early warning of heavy metal concentrations in industrial sewer networks is crucial. However, due to the complex sources of heavy metals in industrial sewer networks, traditional physical modeling and linear methods cannot establish an accurate prediction model. Herein, we developed a Gated Recurrent Unit (GRU) neural network model based on a deep learning algorithm for predicting the concentrations of heavy metals in industrial sewer networks. To train the GRU model, we used low-cost and easy-to-obtain urban multi-source data, including socio-environmental indicator data, air environmental indicator data, water quantity indicator data, and easily measurable water quality indicator data. The model was applied to predict the concentrations of heavy metals (Cu, Zn, Ni, and Cr) in the sewer networks of an industrial area in southern China. The results are compared with the commonly used Artificial Neural Network (ANN) model. In this study, it was shown that the GRU had better prediction performance for Cu, Zn, Ni, and Cr concentrations, with the average R2 significantly increased by 12.35%, 11.94%, 9.21%, and 8.13%, respectively, compared to ANN predictions. The sensitivity analysis based on Shapley (SHAP) values revealed that conductivity (σ), temperature (T), pH, and sewage flow (Flow) contributed significantly to the prediction results of the model. Furthermore, the three input variables including air pressure (AP), land area (A), and population (Pop.) were removed without affecting the prediction performance of the model, which maximized the modeling efficiency and reduced the operational cost. This study provides an economical and feasible technical method for early warning of abnormal heavy metal concentrations in urban industrial sewer networks.
Affiliation(s)
- Yiqi Jiang
- School of Civil and Environmental Engineering, Harbin Institute of Technology, Shenzhen 518055, China
- Chaolin Li
- School of Civil and Environmental Engineering, Harbin Institute of Technology, Shenzhen 518055, China; State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology, Harbin 150090, China.
- Hongxing Song
- Shenzhen Hydrology and Water Quality Center, Shenzhen 518038, China
- Wenhui Wang
- School of Civil and Environmental Engineering, Harbin Institute of Technology, Shenzhen 518055, China.

28
Fatahi R, Nasiri H, Dadfar E, Chehreh Chelgani S. Modeling of energy consumption factors for an industrial cement vertical roller mill by SHAP-XGBoost: a "conscious lab" approach. Sci Rep 2022; 12:7543. [PMID: 35534588 PMCID: PMC9085744 DOI: 10.1038/s41598-022-11429-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/12/2022] [Accepted: 04/25/2022] [Indexed: 11/30/2022] Open
Abstract
Cement production is one of the most energy-intensive manufacturing industries, and the milling circuit of cement plants consumes around 4% of a year's global electrical energy production. It is well understood that modeling and digitalizing industrial-scale processes would help control production circuits better, improve efficiency, enhance personal training systems, and decrease plants' energy consumption. This tactical approach could be integrated using conscious lab (CL) as an innovative concept in the internet age. Surprisingly, no CL has been reported for the milling circuit of a cement plant. A robust CL interconnects datasets originating from monitoring operational variables in the plants and translates them to human basis information using explainable artificial intelligence (EAI) models. By initiating a CL for an industrial cement vertical roller mill (VRM), this study conducted a novel strategy to explore relationships between VRM monitored operational variables and their representative energy consumption factors (output temperature and motor power). Using SHapley Additive exPlanations (SHAP), one of the most recent EAI models, helped fill the lack of information about correlations within VRM variables. SHAP analyses highlighted that working pressure and input gas rate, with positive relationships, are the key factors influencing energy consumption. eXtreme Gradient Boosting (XGBoost), as a powerful predictive tool, could accurately model energy representative factors with an R-square over 0.80 in the testing phase. Comparison assessments indicated that SHAP-XGBoost could provide higher accuracy for the VRM-CL structure than conventional modeling tools (Pearson correlation, Random Forest, and Support vector regression).

29
A Review on Interpretable and Explainable Artificial Intelligence in Hydroclimatic Applications. WATER 2022. [DOI: 10.3390/w14081230] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
This review focuses on the use of Interpretable Artificial Intelligence (IAI) and eXplainable Artificial Intelligence (XAI) models for data imputations and numerical or categorical hydroclimatic predictions from nonlinearly combined multidimensional predictors. The AI models considered in this paper involve Extreme Gradient Boosting, Light Gradient Boosting, Categorical Boosting, Extremely Randomized Trees, and Random Forest. These AI models can transform into XAI models when they are coupled with the explanatory methods such as the Shapley additive explanations and local interpretable model-agnostic explanations. The review highlights that the IAI models are capable of unveiling the rationale behind the predictions while XAI models are capable of discovering new knowledge and justifying AI-based results, which are critical for enhanced accountability of AI-driven predictions. The review also elaborates the importance of domain knowledge and interventional IAI modeling, potential advantages and disadvantages of hybrid IAI and non-IAI predictive modeling, unequivocal importance of balanced data in categorical decisions, and the choice and performance of IAI versus physics-based modeling. The review concludes with a proposed XAI framework to enhance the interpretability and explainability of AI models for hydroclimatic applications.