1. Yao L, Han Y, Qi X, Huang D, Che H, Long X, Du Y, Meng L, Yao X, Zhang L, Chen Y. Determination of major drive of ozone formation and improvement of O3 prediction in typical North China Plain based on interpretable random forest model. Sci Total Environ 2024; 934:173193. [PMID: 38744393 DOI: 10.1016/j.scitotenv.2024.173193] [Received: 02/19/2024] [Revised: 04/23/2024] [Accepted: 05/11/2024] [Indexed: 05/16/2024]
Abstract
O3 pollution in China has become increasingly prominent in recent years and is now one of the most challenging issues in air pollution control. We used atmospheric pollutant and meteorological data from 2019 to 2021 to build an interpretable random forest (RF) model and applied it to predict O3 concentrations in 2022 in five cities in the southwestern North China Plain. The model was also used to identify and explain the influence of various factors on O3 formation. Between predicted and observed O3 concentrations, the coefficient of determination (R2) was 0.82, the mean absolute error (MAE) was 15.15 μg/m3, and the root mean square error (RMSE) was 20.29 μg/m3, indicating that the model can effectively predict O3 concentrations in the study area. Correlation analysis, feature importance, and driving-factor analysis with SHapley Additive exPlanations (SHAP) all indicated that temperature (T), NO2, and relative humidity (RH) are the top three features affecting O3 prediction, whereas wind speed and wind direction carried relatively low weights. Thus, O3 in the southwestern North China Plain may come mainly from local photochemical formation. The dominant factors behind O3 also varied by season. In spring and autumn, O3 pollution is more likely to occur under high NO2 concentrations and high temperatures, while in summer it is more likely under high-temperature, precipitation-free weather. In winter, NO2 is the dominant factor in O3 formation. Finally, the interpretable RF model was used to predict future O3 concentrations from features provided by the Community Multiscale Air Quality (CMAQ) and Weather Research and Forecasting (WRF) models. This improved on CMAQ's simulation of O3 concentrations to a certain extent, which can improve the prediction of future O3 pollution and help guide pollution control.
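SHAP attributes each prediction to its input features using Shapley values from cooperative game theory. As a minimal sketch (not the authors' code, and not the `shap` library, which relies on faster estimators such as TreeSHAP for tree ensembles rather than a single reference point), the function below computes exact Shapley values by enumerating every feature coalition, setting "absent" features to a reference point. `toy_model` is hypothetical: its coefficients for T, NO2, and RH are invented and not fitted to any data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   reference: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x, with absent features set to `reference`.

    Enumerates all coalitions, so the cost grows as 2**n; only practical
    for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition come from x; all others from the reference point.
        z = [x[j] if j in coalition else reference[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical toy model over (T, NO2, RH); coefficients are invented for illustration.
def toy_model(z: Sequence[float]) -> float:
    t, no2, rh = z
    return 3.0 * t + 0.5 * no2 - 1.0 * rh + 0.02 * t * no2


x = [30.0, 40.0, 60.0]          # hypothetical observation
reference = [20.0, 30.0, 70.0]  # hypothetical reference conditions
phi = shapley_values(toy_model, x, reference)
print(dict(zip(["T", "NO2", "RH"], phi)))
```

The attributions always sum to `toy_model(x) - toy_model(reference)` (the efficiency property), which is a quick sanity check for any Shapley implementation.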
Affiliation(s)
- Liyin Yao
  - College of Environmental and Chemical Engineering, Chongqing Three Gorges University, Chongqing 404199, China; Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Yan Han
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Xin Qi
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Dasheng Huang
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Hanxiong Che
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Xin Long
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Yang Du
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Lingshuo Meng
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Xiaojiang Yao
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Liuyi Zhang
  - College of Environmental and Chemical Engineering, Chongqing Three Gorges University, Chongqing 404199, China
- Yang Chen
  - Research Center for Atmospheric Environment, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
2. Lv S, Zhu Y, Cheng L, Zhang J, Shen W, Li X. Evaluation of the prediction effectiveness for geochemical mapping using machine learning methods: A case study from northern Guangdong Province in China. Sci Total Environ 2024; 927:172223. [PMID: 38588737 DOI: 10.1016/j.scitotenv.2024.172223] [Received: 10/25/2023] [Revised: 03/06/2024] [Accepted: 04/03/2024] [Indexed: 04/10/2024]
Abstract
This study compares seven machine learning models to investigate whether they improve the accuracy of geochemical mapping relative to ordinary kriging (OK). Arsenic (As), which enters soil through both human activities and soil parent material, is widespread and highly toxic, and predicting the spatial distribution of elements in soil has become a research hotspot. Lianzhou City in northern Guangdong Province, China, was chosen as the study area, and a total of 2908 surface soil samples (0 to 20 cm depth) were collected. The seven models were Random Forest (RF), Support Vector Machine (SVM), Ridge Regression (Ridge), Gradient Boosting Decision Tree (GBDT), Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Gaussian Process Regression (GPR). In addition to weighing the advantages and disadvantages of machine learning and traditional geostatistical models for predicting the spatial distribution of heavy metal elements, the study analyzes factors affecting prediction accuracy. Among the original models, the two best performers, RF (R2 = 0.445) and GBDT (R2 = 0.414), did not outperform OK (R2 = 0.459). Ridge and GPR, the worst-performing methods, reached R2 values of only 0.201 and 0.248, respectively. To improve prediction accuracy, a spatial regionalized (SR) covariate index was added. Improvements varied among methods: RF and GBDT raised their R2 values from about 0.4 to 0.78, whereas GPR improved least, reaching an R2 of only 0.25. The study concluded that choosing an appropriate machine learning model and accounting for factors that influence prediction accuracy, such as regional variation and the number and distribution of sampling points, are crucial for accurate predictions, providing useful insights for future research in this area.
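Ordinary kriging, the geostatistical baseline here, predicts an unsampled location as a weighted average of the samples, with weights obtained from a linear system built on a semivariogram and constrained to sum to one. The sketch below is a minimal pure-Python version assuming an exponential semivariogram with user-supplied nugget, partial sill, and range (in practice these are fitted to the empirical semivariogram). It is illustrative only, not the study's implementation, and the sample sites and values are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def exp_semivariogram(h: float, nugget: float, psill: float, vrange: float) -> float:
    """Exponential semivariogram model; gamma(0) is 0 by definition."""
    if h == 0.0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / vrange))


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def ordinary_kriging(points: Sequence[Point], values: Sequence[float], target: Point,
                     nugget: float = 0.0, psill: float = 1.0, vrange: float = 1.0):
    """Return (prediction, weights) at `target` from samples (points, values)."""
    n = len(points)

    def gamma(p: Point, q: Point) -> float:
        return exp_semivariogram(math.dist(p, q), nugget, psill, vrange)

    # Kriging system with a Lagrange multiplier enforcing sum(weights) == 1:
    # [Gamma 1; 1^T 0] [w; mu] = [gamma(x_i, x0); 1]
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    weights = solve_linear(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values)), weights


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hypothetical sample sites
zs = [1.0, 2.0, 3.0, 4.0]                                  # hypothetical As values
pred, w = ordinary_kriging(pts, zs, (0.5, 0.5), psill=1.0, vrange=2.0)
print(round(pred, 6), [round(wi, 4) for wi in w])
```

With a zero nugget, kriging is an exact interpolator: predicting at a sample site returns that sample's value.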
Affiliation(s)
- Songjian Lv
  - Center for Health Geology & Carbon Peak and Carbon Neutrality of Lanzhou University, Key Laboratory of Western China's Environmental Systems (Ministry of Education), College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, China
- Ying Zhu
  - Center for Health Geology & Carbon Peak and Carbon Neutrality of Lanzhou University, Key Laboratory of Western China's Environmental Systems (Ministry of Education), College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, China
- Li Cheng
  - Center for Health Geology & Carbon Peak and Carbon Neutrality of Lanzhou University, Key Laboratory of Western China's Environmental Systems (Ministry of Education), College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, China
- Jingru Zhang
  - Center for Health Geology & Carbon Peak and Carbon Neutrality of Lanzhou University, Key Laboratory of Western China's Environmental Systems (Ministry of Education), College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, China; Guangdong Province Academy of Environmental Science, Guangzhou 510045, China
- Wenjie Shen
  - School of Earth Sciences and Engineering, Sun Yat-sen University, Zhuhai 519000, China
- Xingyuan Li
  - Center for Health Geology & Carbon Peak and Carbon Neutrality of Lanzhou University, Key Laboratory of Western China's Environmental Systems (Ministry of Education), College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, China
3. Fu P, Li X, Zhang J, Ma C, Wang Y, Meng F. Remote sensing inversion on heavy metal content in salinized soil of Yellow River Delta based on Random Forest Regression-a case study of Gudao Town. Sci Rep 2024; 14:11216. [PMID: 38755273 PMCID: PMC11099045 DOI: 10.1038/s41598-024-62087-y] [Received: 02/01/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024]
Abstract
To explore the potential of using mineral alteration information extracted by remote sensing to indirectly estimate the heavy metal content of salinized soil, 23 uniformly distributed sampling points were set up in 2022 in Gudao Town, Yellow River Delta, which served as the study area. The concentrations of seven heavy metals (Cr, Cu, Pb, Zn, As, Mn, and Ni) at the sampling points were determined by laboratory analysis. Spectral derivative indices, topographic factors, and mineral alteration information (iron staining, hydroxyl, and carbonate ions) were extracted using Sentinel-2 imagery and screened as modeling factors. An inversion model of heavy metal content was constructed using the random forest algorithm, and model accuracy was evaluated by cross-validation. The results show that: (1) Hydroxyl and carbonate ion alteration can be used effectively to invert soil As and Ni content in this study area, while iron-stained alteration can serve as a modeling factor for inverting Cr, Cu, Pb, Zn, and Mn concentrations. (2) Including alteration information improves the accuracy of heavy metal content inversion. Cu concentration was predicted best, with an RMSE of 3.309, a MAPE of 11.072%, and an R2 of 0.904, followed by As, Ni, and Zn; predictive performance for Mn, Cr, and Pb was moderate. (3) According to the inversion results, high-concentration areas of As, Ni, and Mn are distributed mainly on both sides of the river and around lakes and ponds. High-concentration areas of Zn are distributed mainly in farmland on both sides of the river. High-concentration areas of Cu are distributed mainly in the eastern oil extraction area, on both sides of the rivers, and around lakes.
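The accuracy figures quoted for Cu (RMSE, MAPE, R2) are standard regression metrics computed on cross-validated predictions. The sketch below shows one common set of definitions plus a simple k-fold loop that produces out-of-fold predictions. The abstract does not state the exact cross-validation scheme, so the fold assignment here (index modulo k, no shuffling) is an assumption, and `fit_mean` is a hypothetical placeholder for any training routine (a random forest in the study).

```python
import math
from typing import Callable, Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs: Sequence[float], pred: Sequence[float]) -> float:
    # Mean absolute percentage error in percent; undefined if any observation is zero.
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


Model = Callable[[Sequence[float]], float]


def kfold_predictions(X: Sequence[Sequence[float]], y: Sequence[float],
                      fit: Callable[[list, list], Model], k: int = 5) -> list[float]:
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    preds = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(X[i])
    return preds


def fit_mean(X_train: list, y_train: list) -> Model:
    # Hypothetical trivial baseline: always predicts the training mean.
    m = sum(y_train) / len(y_train)
    return lambda x: m
```

For example, `r2(y, kfold_predictions(X, y, fit_mean, k=5))` scores a mean-only baseline, which any useful model should beat.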
Affiliation(s)
- Pingjie Fu
  - School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan, 250101, China
- Xiaotong Li
  - School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan, 250101, China
- Jiawei Zhang
  - School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan, 250101, China
  - College of Geodesy and Geomatics, Shandong University of Science and Technology, Qingdao, 266590, China
- Chijie Ma
  - School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan, 250101, China
- Yuqiang Wang
  - Disaster Reduction Center of Shandong Province, Jinan, 250101, China
- Fei Meng
  - School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan, 250101, China
4. Gou Z, Liu C, Qi M, Zhao W, Sun Y, Qu Y, Ma J. Machine learning-based prediction of cadmium bioaccumulation capacity and associated analysis of driving factors in tobacco grown in Zunyi City, China. J Hazard Mater 2024; 463:132910. [PMID: 37926014 DOI: 10.1016/j.jhazmat.2023.132910] [Received: 08/07/2023] [Revised: 10/17/2023] [Accepted: 10/30/2023] [Indexed: 11/07/2023]
Abstract
Tobacco grown in areas with high geochemical backgrounds exhibits considerably different cadmium (Cd) bioaccumulation abilities due to regional disparities and environmental changes. However, how key factors affect the Cd bioaccumulation ability of tobacco grown in karst regions with high selenium (Se) geochemical backgrounds remains unclear. Herein, 365 paired rhizosphere soil and tobacco samples and 321 topsoil samples were collected from typical karst tobacco-growing soils in southwestern China and analyzed for Cd and Se. XGBoost was used to predict and evaluate the Cd bioaccumulation ability of tobacco and its potential influencing factors. The results showed that regional geochemical characteristics, such as soil Cd and Se contents, soil type, and lithology, have the greatest influence on the Cd bioaccumulation ability of tobacco, accounting for 46.5% of the overall variation. Moreover, soil Se content in high-geochemical-background areas considerably affects Cd bioaccumulation in tobacco, with a threshold for the mutual suppression effects of Cd and Se at a soil Se content of 0.8 mg/kg. Bivariate local indicators of spatial association (LISA) analysis showed that tobacco cultivated in the central, northeastern, and southeastern regions of Zunyi City carries a lower risk of soil Cd contamination. This study provides new insights for managing tobacco cultivation in karst regions.
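The abstract does not say how the 0.8 mg/kg Se threshold was located. One simple, generic way to find such a threshold, offered here as a substitute technique rather than the authors' method, is a grid search for the breakpoint that best splits the data into two separate least-squares lines. The data below are synthetic and noise-free, built with a change at x = 0.8; they are not study data.

```python
from typing import Sequence


def ols(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float, float]:
    """Simple least-squares line; returns (intercept, slope, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def find_breakpoint(xs: Sequence[float], ys: Sequence[float],
                    min_points: int = 3) -> tuple[float, float]:
    """Grid-search the x value that best splits the data into two separate lines.

    Assumes distinct x values. Returns (breakpoint, total SSE); points with
    x >= breakpoint belong to the right-hand segment.
    """
    pairs = sorted(zip(xs, ys))
    x_sorted = [p[0] for p in pairs]
    y_sorted = [p[1] for p in pairs]
    best_x, best_sse = None, float("inf")
    for k in range(min_points, len(pairs) - min_points + 1):
        sse = ols(x_sorted[:k], y_sorted[:k])[2] + ols(x_sorted[k:], y_sorted[k:])[2]
        if sse < best_sse:
            best_x, best_sse = x_sorted[k], sse
    return best_x, best_sse


# Synthetic data: slope change and jump at x = 0.8 (hypothetical, not study data).
se = [i / 10 for i in range(17)]
cd = [1.0 + 2.0 * x if x < 0.8 else 5.0 - x for x in se]
print(find_breakpoint(se, cd))
```

With real, noisy data the SSE curve is flatter, so a breakpoint found this way should be checked with resampling before being read as a threshold.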
Affiliation(s)
- Zilun Gou
  - State Key Laboratory of Environmental Geochemistry, Institute of Geochemistry, Chinese Academy of Sciences, Guiyang 550081, China; State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Chengshuai Liu
  - State Key Laboratory of Environmental Geochemistry, Institute of Geochemistry, Chinese Academy of Sciences, Guiyang 550081, China
- Meng Qi
  - State Key Laboratory of Environmental Geochemistry, Institute of Geochemistry, Chinese Academy of Sciences, Guiyang 550081, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wenhao Zhao
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Yi Sun
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Yajing Qu
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Jin Ma
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
5. Lakhoo DP, Chersich MF, Jack C, Maimela G, Cissé G, Solarin I, Ebi KL, Chande KS, Dumbura C, Makanga PT, van Aardenne L, Joubert BR, McAllister KA, Ilias M, Makhanya S, Luchters S. Protocol of an individual participant data meta-analysis to quantify the impact of high ambient temperatures on maternal and child health in Africa (HE2AT IPD). BMJ Open 2024; 14:e077768. [PMID: 38262654 PMCID: PMC10824032 DOI: 10.1136/bmjopen-2023-077768] [Received: 07/14/2023] [Accepted: 11/13/2023] [Indexed: 01/25/2024]
Abstract
INTRODUCTION: Recognition of the harmful impacts of high ambient temperatures (heat) on the health of pregnant women and children is growing globally. However, major evidence gaps remain on the extent to which heat increases the risk of adverse health outcomes and on how this varies between settings. These gaps are especially large in Africa. We will conduct an individual participant data (IPD) meta-analysis to quantify the impacts of heat on maternal and child health in sub-Saharan Africa. A detailed understanding and quantification of the links between heat and maternal and child health is essential for developing solutions in this critical research and policy area.
METHODS AND ANALYSIS: We will use IPD from existing large, longitudinal trials and cohort studies of pregnant women and children in sub-Saharan Africa. Eligible studies will be identified systematically through a mapping review, searches of data repositories, and suggestions from experts. IPD will be acquired from data repositories or through collaboration with data providers. Existing satellite imagery, climate reanalysis data, and station-based weather observations will be used to quantify weather and environmental exposures. IPD will be recoded and harmonised before being linked with climate, environmental, and socioeconomic data by location and time. Using both one-stage and two-stage meta-analysis methods, analytical models such as time-to-event analysis, generalised additive models, and machine learning approaches will be employed to quantify associations between heat exposure and adverse maternal and child health outcomes.
ETHICS AND DISSEMINATION: The study has been approved by ethics committees. There is minimal risk to study participants. Participant privacy is protected through anonymisation of data for analysis, secure data transfer, and restricted access. Findings will be disseminated through conferences, journal publications, and relevant policy and research fora, and data may be shared in accordance with the data sharing policies of the National Institutes of Health.
PROSPERO REGISTRATION NUMBER: CRD42022346068.
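In a two-stage IPD meta-analysis, an association is first estimated within each study (for example, a log odds ratio per 1 °C) and the study estimates are then pooled. The protocol abstract does not name the pooling estimator, so the sketch below assumes a common choice: inverse-variance pooling with the DerSimonian-Laird random-effects estimator of between-study variance (tau^2). The per-study numbers in the usage example are hypothetical.

```python
import math
from typing import Sequence


def inverse_variance_pool(estimates: Sequence[float],
                          variances: Sequence[float]) -> tuple[float, float]:
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * t for w, t in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def dersimonian_laird(estimates: Sequence[float],
                      variances: Sequence[float]) -> tuple[float, float, float]:
    """Random-effects pooling; returns (pooled estimate, SE, tau^2)."""
    weights = [1.0 / v for v in variances]
    fixed, _ = inverse_variance_pool(estimates, variances)
    # Cochran's Q measures between-study heterogeneity around the fixed-effect mean.
    q = sum(w * (t - fixed) ** 2 for w, t in zip(weights, estimates))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    # Re-pool with the between-study variance added to each study's variance.
    pooled, se = inverse_variance_pool(estimates, [v + tau2 for v in variances])
    return pooled, se, tau2


# Hypothetical per-study log odds ratios per 1 °C and their variances (not real data).
log_or = [0.04, 0.10, 0.02]
var = [0.0004, 0.0009, 0.0006]
pooled, se, tau2 = dersimonian_laird(log_or, var)
print(f"pooled OR per 1 °C = {math.exp(pooled):.3f}, tau^2 = {tau2:.5f}")
```

When tau^2 is zero the random-effects result reduces to the fixed-effect one; a one-stage analysis would instead fit a single mixed model to all participants at once.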
Affiliation(s)
- Darshnika Pemi Lakhoo
  - Wits RHI, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Chris Jack
  - Climate System Analysis Group, University of Cape Town, Rondebosch, South Africa
- Gloria Maimela
  - Wits RHI, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Guéladio Cissé
  - University Peleforo Gon Coulibaly, Korhogo, Côte d'Ivoire
- Ijeoma Solarin
  - Wits RHI, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Kshama S Chande
  - Wits RHI, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Cherlynn Dumbura
  - Centre for Sexual Health and HIV/AIDS Research, Harare, Zimbabwe
- Prestige Tatenda Makanga
  - Centre for Sexual Health and HIV/AIDS Research, Harare, Zimbabwe
  - Place Alert Labs, Department of Surveying and Geomatics, Faculty of the Built Environment, Midlands State University, Gweru, Zimbabwe
- Lisa van Aardenne
  - Climate System Analysis Group, University of Cape Town, Rondebosch, South Africa
- Bonnie R Joubert
  - National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Durham, North Carolina, USA
- Kimberly A McAllister
  - National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Durham, North Carolina, USA
- Maliha Ilias
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland, USA
- Stanley Luchters
  - Centre for Sexual Health and HIV/AIDS Research, Harare, Zimbabwe
  - Liverpool School of Tropical Medicine, Liverpool, UK
  - Department of Public Health and Primary Care, Ghent University, Ghent, Belgium
6. Zhao W, Ma J, Liu Q, Dou L, Qu Y, Shi H, Sun Y, Chen H, Tian Y, Wu F. Accurate Prediction of Soil Heavy Metal Pollution Using an Improved Machine Learning Method: A Case Study in the Pearl River Delta, China. Environ Sci Technol 2023; 57:17751-17761. [PMID: 36821784 DOI: 10.1021/acs.est.2c07561] [Indexed: 06/18/2023]
Abstract
In traditional soil heavy metal (HM) pollution assessment, spatial interpolation is often applied to a limited number of sampling points to estimate the overall status of HM pollution across the study area. However, in many machine learning algorithms that enhance spatial information, the additional spatial information introduced fails to reflect the hierarchical heterogeneity of the study area. We therefore designed hierarchical regionalization labels based on three interpolation techniques (inverse distance weighting, ordinary kriging, and trend surface interpolation) as new spatial covariates for a machine learning (ML) model. The added regional spatial information improved the model's prediction performance (R2 > 0.7). On the basis of the prediction results, HM pollution in the Pearl River Delta (PRD) region was evaluated: cadmium (Cd) and copper (Cu) were the most serious pollutants, with 18.77% and 12.95% of sampling points, respectively, exceeding the standard. Analysis of index importance and bivariate local indicators of spatial association (LISA) shows that the key factors affecting the spatial distribution of heavy metals are geographical and climatic conditions [namely, altitude, humidity index, and normalized difference vegetation index (NDVI)] and certain industrial activities (such as metal processing, printing and dyeing, and the electronics industry). This study develops a novel approach to improving existing spatial interpolation techniques, which will enable more precise, science-based soil environmental management.
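The abstract does not spell out how the hierarchical regionalization labels are constructed. As a hedged illustration of the general idea only, the sketch below interpolates a value with inverse distance weighting (one of the three techniques named) and then bins the interpolated value into classes. `region_label`, the break values, and the sample data are all hypothetical; such an integer label could then be appended as an extra covariate for the ML model.

```python
import math
from bisect import bisect_right
from typing import Sequence

Point = tuple[float, float]


def idw(points: Sequence[Point], values: Sequence[float], target: Point,
        power: float = 2.0) -> float:
    """Inverse distance weighting: weights are 1 / distance**power."""
    num = den = 0.0
    for p, z in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return z  # IDW reproduces the sample value at a sample location
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def region_label(value: float, breaks: Sequence[float]) -> int:
    """Hypothetical labelling rule: index of the class interval containing value.

    `breaks` must be sorted ascending; values below breaks[0] get label 0.
    """
    return bisect_right(breaks, value)


# Hypothetical Cd samples (mg/kg) and class breaks, for illustration only.
samples = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
cd = [0.1, 0.5, 0.3]
breaks = [0.2, 0.4]
for site in [(1.0, 0.0), (1.0, 1.0), (0.2, 0.1)]:
    z = idw(samples, cd, site)
    print(site, round(z, 3), "label", region_label(z, breaks))
```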
Affiliation(s)
- Wenhao Zhao
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Jin Ma
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Qiyuan Liu
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Lei Dou
  - Guangdong Geological Survey Institute, Guangzhou 510110, P. R. China
- Yajing Qu
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Huading Shi
  - Technical Centre for Soil, Agricultural and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, P. R. China
- Yi Sun
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Haiyan Chen
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Yuxin Tian
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
- Fengchang Wu
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, P. R. China
7. Yun Q, Wang X, Yao C, Wang H. Random forest method for estimation of brake specific fuel consumption. Sci Rep 2023; 13:17741. [PMID: 37853230 PMCID: PMC10584861 DOI: 10.1038/s41598-023-45026-1] [Received: 08/12/2023] [Accepted: 10/14/2023] [Indexed: 10/20/2023]
Abstract
The internal combustion engine is widely used as a power source in many fields, and its energy utilization is measured by brake specific fuel consumption (BSFC). The BSFC map plays a crucial role in the analysis, optimization, and assessment of internal combustion engines. However, because of cost constraints, some values on the BSFC map are estimated using techniques such as K-nearest neighbors, inverse distance weighted interpolation, and multilayer perceptrons, which are known to have limited accuracy, particularly with distributed sampled data. To address this, an improved random forest method is proposed for estimating BSFC. Polynomial features are used to expand the feature space of the random forest through nonlinear transformation, and critical parameters are optimized by a particle swarm optimization (PSO) algorithm. Different methods were compared on two datasets in estimating 20%, 30%, and 40% of the BSFC data, and the results show that the proposed method outperforms other common methods and is suitable for estimating the BSFC map.
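Both ingredients named here can be sketched in pure Python: degree-2 polynomial feature expansion (the same bias, linear, square, and pairwise-product terms, in the same order, that scikit-learn's `PolynomialFeatures(degree=2)` produces) and a basic global-best particle swarm optimizer. The abstract does not say which random-forest parameters PSO tunes or what objective it minimizes, so `toy_cv_error` is a hypothetical stand-in for a cross-validation error surface, and the coefficients w = 0.7, c1 = c2 = 1.5 are common textbook values rather than the authors' settings.

```python
import math
import random
from itertools import combinations_with_replacement
from typing import Callable, Sequence


def poly_features(x: Sequence[float], degree: int = 2) -> list[float]:
    """Bias, linear terms, and all products of features up to `degree`."""
    out = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            out.append(math.prod(x[i] for i in combo))
    return out


def pso_minimize(f: Callable[[list[float]], float],
                 bounds: Sequence[tuple[float, float]],
                 n_particles: int = 30, n_iter: int = 200,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> tuple[list[float], float]:
    """Global-best particle swarm optimization over a box-bounded space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for a cross-validation error over two tuning knobs.
def toy_cv_error(p: list[float]) -> float:
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


print(poly_features([2.0, 3.0]))
best, err = pso_minimize(toy_cv_error, [(-5.0, 5.0), (-5.0, 5.0)])
print([round(v, 3) for v in best], err)
```

For integer-valued forest parameters (such as tree count or depth), particle positions would be rounded before evaluating the objective.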
Affiliation(s)
- Qinsheng Yun
  - Naval University of Engineering, Wuhan, 430000, China
  - Shanghai Marine Diesel Engine Research Institute, Shanghai, 200000, China
- Xiangjun Wang
  - Naval University of Engineering, Wuhan, 430000, China
- Chen Yao
  - Shanghai Marine Diesel Engine Research Institute, Shanghai, 200000, China
- Haiyan Wang
  - Shanghai Maritime University, Shanghai, 200000, China
8. Stevenazzi S, Zuffetti C, Camera CAS, Lucchelli A, Beretta GP, Bersezio R, Masetti M. Hydrogeological characteristics and water availability in the mountainous aquifer systems of Italian Central Alps: A regional scale approach. J Environ Manage 2023; 340:117958. [PMID: 37116412 DOI: 10.1016/j.jenvman.2023.117958] [Received: 11/04/2022] [Revised: 02/17/2023] [Accepted: 04/14/2023] [Indexed: 05/12/2023]
Abstract
Groundwater resources in mountain areas are strategically important for maintaining an adequate water supply for domestic use, farming, industrial activities, and energy production, especially given the growing demand expected under ongoing climate change. Within this framework, the study aims to develop a regional approach, compliant with the European requirements of the Water Framework Directive 2000/60/EC and the Groundwater Directive 2006/118/EC, that can help public agencies and water companies efficiently manage and protect the available water resources in mountainous environments. The proposed approach identifies and delineates groundwater bodies by coupling a 3D hydro-stratigraphic model with the definition of the water budget and hydrochemical fingerprints in a geologically complex Alpine environment in Northern Italy. Sixteen groundwater bodies (GWBs) were identified across the 10,290 km2 area, with an average storage capacity of more than 500 Mm³ y⁻¹ (about 3% of the average total inflow from precipitation and snowmelt) and up to fourfold differences between GWBs composed mainly of carbonate rocks and those composed predominantly of crystalline or terrigenous rocks. Groundwater quality in the study domain is generally excellent, with a few exceptions due to geogenic (i.e., natural) or anthropogenic sources of contamination. The results show the advantages of coupling 3D hydro-stratigraphic modelling with meteorological, hydrological, and hydrogeological information, namely: i) identifying the Strategic Storage Reservoir that is most valuable in terms of both quality and storage capacity; ii) evaluating present groundwater and surface water availability; iii) detecting areas of specific interest for implementing groundwater monitoring networks; and iv) recognising the recharge areas of the most relevant springs, to implement resource protection strategies.
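A rough back-of-envelope reading of the water-budget figures, assuming the 500 Mm³ y⁻¹ refers to the whole study area and taking "more than 500" as 500 and "about 3%" as exactly 3%: the implied mean total inflow, and its equivalent water depth over an area A of 10,290 km², are

```latex
Q_{\mathrm{in}} \approx \frac{500\ \mathrm{Mm^{3}\,y^{-1}}}{0.03} \approx 1.67\times 10^{4}\ \mathrm{Mm^{3}\,y^{-1}},
\qquad
\frac{Q_{\mathrm{in}}}{A} \approx \frac{1.67\times 10^{10}\ \mathrm{m^{3}\,y^{-1}}}{1.029\times 10^{10}\ \mathrm{m^{2}}} \approx 1.6\ \mathrm{m\,y^{-1}}
```

that is, on the order of 1,600 mm of precipitation and snowmelt per year averaged over the area.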
Affiliation(s)
- Stefania Stevenazzi
  - Dipartimento di Ingegneria Civile, Edile e Ambientale, Università degli Studi di Napoli Federico II, Piazzale Tecchio, 80, Naples, 80125, Italy
- Chiara Zuffetti
  - Dipartimento di Scienze della Terra "A. Desio", Università degli Studi di Milano, Via Luigi Mangiagalli, 34, Milan, 20133, Italy
- Corrado A S Camera
  - Dipartimento di Scienze della Terra "A. Desio", Università degli Studi di Milano, Via Luigi Mangiagalli, 34, Milan, 20133, Italy
- Alice Lucchelli
  - Dipartimento di Scienze della Terra "A. Desio", Università degli Studi di Milano, Via Luigi Mangiagalli, 34, Milan, 20133, Italy
- Giovanni Pietro Beretta
  - Dipartimento di Scienze della Terra "A. Desio", Università degli Studi di Milano, Via Luigi Mangiagalli, 34, Milan, 20133, Italy
- Riccardo Bersezio
  - Dipartimento di Scienze della Terra "A. Desio", Università degli Studi di Milano, Via Luigi Mangiagalli, 34, Milan, 20133, Italy
- Marco Masetti
  - Dipartimento di Scienze della Terra "A. Desio", Università degli Studi di Milano, Via Luigi Mangiagalli, 34, Milan, 20133, Italy
9. Elío J, Petermann E, Bossew P, Janik M. Machine learning in environmental radon science. Appl Radiat Isot 2023; 194:110684. [PMID: 36706518 DOI: 10.1016/j.apradiso.2023.110684] [Received: 06/01/2022] [Revised: 11/10/2022] [Accepted: 01/13/2023] [Indexed: 01/16/2023]
Abstract
The temporal dynamics and spatial variability of environmental radon are controlled by factors such as meteorology, lithology, soil properties, hydrogeology, tectonics, and seismicity. In addition, indoor radon concentration is subject to anthropogenic factors, such as the physical characteristics of a building and its usage pattern. Many new tools for spatial and time series analysis and prediction belong to what is commonly called machine learning (ML). The ML algorithms presented here build models from sample and predictor data to extract information and make predictions. We give a short overview of ML methods and discuss their respective merits, their applications, and ways of validating results. We present examples of 1) geogenic radon mapping in Germany involving a number of predictors, and 2) time series analysis of a long-term experiment being carried out in Chiba, Japan, involving indoor radon concentrations and meteorological predictors. Finally, we identify the main weaknesses of these techniques and suggest actions to overcome their limitations.
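For time series such as indoor radon with meteorological predictors, shuffled k-fold cross-validation can let a model train on observations from after the period it is scored on. A common remedy, sketched below as an illustration rather than a method taken from the paper, is forward-chaining (expanding-window) validation. The split layout resembles the default of scikit-learn's `TimeSeriesSplit`, but `forward_chaining_splits` is our own hypothetical helper.

```python
from typing import Iterator


def forward_chaining_splits(n: int, n_splits: int) -> Iterator[tuple[list[int], list[int]]]:
    """Expanding-window splits for time-ordered data (indices 0..n-1 in time order).

    Each test block lies strictly after all of its training indices, so the
    model is never trained on observations from the future it is scored on.
    """
    test_size = n // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    for i in range(n_splits):
        train_end = n - (n_splits - i) * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


# Hypothetical: 10 consecutive daily radon/meteorology records, 3 splits.
for train, test in forward_chaining_splits(10, 3):
    print("train", train[0], "-", train[-1], "| test", test)
```

For the spatial mapping case, an analogous precaution is to hold out spatially contiguous blocks rather than randomly scattered points.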
Affiliation(s)
- Javier Elío
  - Department of Mechanical and Marine Engineering, Western Norway University of Applied Sciences, Inndalsveien 28, Bergen, 5063, Norway
- Eric Petermann
  - Federal Office for Radiation Protection (BfS), Köpenicker Allee 120-130, Berlin, 10318, Germany
- Peter Bossew
  - Retired from Federal Office for Radiation Protection (BfS), Köpenicker Allee 120-130, Berlin, 10318, Germany
- Miroslaw Janik
  - The National Institutes for Quantum and Radiological Science and Technology, National Institute of Radiological Sciences (NIRS), 4-9-1 Anagawa, Inage-ku, 263-8555, Chiba, Japan
10. Yang H, Ma W, Liu T, Li W. Assessing farmland suitability for agricultural machinery in land consolidation schemes in hilly terrain in China: A machine learning approach. Front Plant Sci 2023; 14:1084886. [PMID: 36950352 PMCID: PMC10025464 DOI: 10.3389/fpls.2023.1084886] [Received: 10/31/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Identifying available farmland suitable for agricultural machinery is the most promising way of optimizing agricultural production and increasing agricultural mechanization. Farmland consolidation suitable for agricultural machinery (FCAM) is implemented as an effective tool for increasing sustainable production and mechanized agriculture. Using a machine learning approach, this study assesses the suitability of farmland for agricultural machinery in land consolidation schemes based on four parameters: natural resource endowment, accessibility of agricultural machinery, socioeconomic level, and ecological limitations. Based on "suitability" and "potential improvement in farmland productivity", we classified land into four zones: the priority consolidation zone, the moderate consolidation zone, the comprehensive consolidation zone, and the reserve consolidation zone. The results showed that most of the farmland (76.41%) was either basically or moderately suitable for FCAM. Although slope was often an indicator that land was suitable for agricultural machinery, other factors, such as the poor accessibility of tractor roads, continuous depopulation, and ecological fragility, contributed greatly to reducing the overall suitability of land for FCAM. Moreover, the potential productivity of farmland was estimated to increase by 720.8 kg/ha if FCAM were implemented. The four zones constitute a useful basis for determining the implementation sequence and differentiating strategies for FCAM schemes, making this zoning an effective solution for implementing them. However, successfully implementing FCAM schemes and achieving a modern and sustainable agriculture system will require additional strategies, such as strengthening farmland ecosystem protection, promoting R&D into agricultural machinery suitable for hilly terrain, and providing more financial support.
Affiliation(s)
- Heng Yang
- College of Engineering, China Agricultural University, Beijing, China
- Wenqiu Ma
- College of Engineering, China Agricultural University, Beijing, China
- Tongxin Liu
- College of Engineering, China Agricultural University, Beijing, China
- Wenqing Li
- Key Laboratory of Land Consolidation and Rehabilitation, Land Consolidation and Rehabilitation Center, Ministry of Natural Resources, Beijing, China
11
Chen H, Wang J. Active Learning for Efficient Soil Monitoring in Large Terrain with Heterogeneous Sensor Network. SENSORS (BASEL, SWITZERLAND) 2023; 23:2365. [PMID: 36904569 PMCID: PMC10007343 DOI: 10.3390/s23052365] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/17/2023] [Revised: 02/13/2023] [Accepted: 02/15/2023] [Indexed: 06/18/2023]
Abstract
Soils are a complex ecosystem that provides critical services, such as growing food, supplying antibiotics, filtering wastes, and maintaining biodiversity; hence, monitoring soil health and domestication is required for sustainable human development. Low-cost, high-resolution soil monitoring systems are challenging to design and build. Given the sheer size of the monitoring area of interest and the variety of biological, chemical, and physical parameters to monitor, naive approaches that add or schedule more sensors suffer from cost and scalability problems. We investigate a multi-robot sensing system integrated with an active learning-based predictive modeling technique. Taking advantage of advances in machine learning, the predictive model allows us to interpolate and predict soil attributes of interest from the data collected by sensors and soil surveys. The system provides high-resolution predictions when the modeling output is calibrated with static land-based sensors. The active learning modeling technique allows our system to adapt its data collection strategy to time-varying data fields, using aerial and land robots to collect new sensor data. We evaluated our approach using numerical experiments with a soil dataset focusing on heavy metal concentration in a flooded area. The experimental results demonstrate that our algorithms can reduce sensor deployment costs via optimized sensing locations and paths while providing high-fidelity data prediction and interpolation. More importantly, the results verify that the system adapts to spatial and temporal variations in soil conditions.
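The abstract does not state which acquisition rule drives the active learning loop. One common choice is committee-based uncertainty sampling: train several models, predict the soil attribute at candidate locations, and send robots to the locations where the models disagree most. The Python sketch below assumes that rule; the function name `rank_candidates` and the example predictions are hypothetical, and the sketch does not reproduce the authors' system, which also optimizes robot paths.

```python
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple

Location = Tuple[float, float]


def rank_candidates(committee_preds: Dict[Location, Sequence[float]], k: int) -> List[Location]:
    """Return the k candidate locations where committee members disagree most.

    committee_preds maps each candidate location to the predictions of several
    models (e.g. an ensemble) for the soil attribute at that location. The
    population variance of those predictions serves as the uncertainty score.
    """
    ranked = sorted(
        committee_preds,
        key=lambda loc: pvariance(committee_preds[loc]),
        reverse=True,  # highest disagreement first
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical heavy-metal predictions from three models at three candidate sites.
    preds = {
        (0.0, 0.0): [1.0, 1.1, 0.9],  # models roughly agree
        (5.0, 5.0): [0.2, 2.0, 1.1],  # models disagree strongly
        (9.0, 1.0): [3.0, 3.0, 3.0],  # models agree exactly
    }
    print(rank_candidates(preds, 2))  # [(5.0, 5.0), (0.0, 0.0)]
```

After the selected locations are measured, the models would be retrained with the new samples and the ranking repeated; that loop is what lets the sampling adapt to time-varying fields.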
Affiliation(s)
- Hui Chen
- Department of Computer & Information Science, CUNY Brooklyn College, Brooklyn, NY 11210, USA
- Department of Computer Science, CUNY Graduate Center, New York, NY 10016, USA
- Ju Wang
- Department of Computer Science, Virginia State University, Petersburg, VA 23806, USA
12
Li C, Wang Y, Gao Z, Sun B, Xing H, Zang Y. Identification of Typical Ecosystem Types by Integrating Active and Passive Time Series Data of the Guangdong-Hong Kong-Macao Greater Bay Area, China. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:15108. [PMID: 36429839 PMCID: PMC9690903 DOI: 10.3390/ijerph192215108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 11/11/2022] [Accepted: 11/13/2022] [Indexed: 06/16/2023]
Abstract
The identification of ecosystem types is important in ecological environmental assessment. However, because of cloud and rain and complex land cover characteristics, commonly used ecosystem identification methods have always lacked accuracy in subtropical urban agglomerations. In this study, China's Guangdong-Hong Kong-Macao Greater Bay Area (GBA) was taken as the study area, and Sentinel-1 and Sentinel-2 data were used to fuse active and passive remote sensing with time series data to distinguish typical ecosystem types in subtropical urban agglomerations. Our results showed the following: (1) The importance of different features varies widely among ecosystem types. For grassland and arable land, two specific texture features (VV_dvar and VH_diss) are most important; in forest and mangrove areas, synthetic-aperture radar (SAR) data for the months of October and September are most important. (2) Using active time series remote sensing data can significantly improve classification accuracy, by 3.33%, while passive time series remote sensing data improve it by 4.76%. When they are integrated, accuracy improves further, reaching 84.29%. (3) Time series passive data (NDVI) serve best to distinguish grassland from arable land, while time series active data (SAR data) are best able to distinguish mangrove from forest. Integrating active and passive time series data also improves precision in distinguishing vegetation ecosystem types, such as forest, mangrove, arable land, and especially grassland, where accuracy increased by 21.88%. By providing real-time and more accurate information on land cover type change, this study could better serve regional change detection and ecosystem service function assessment at different scales, thereby supporting decision makers in urban agglomerations.
Affiliation(s)
- Changlong Li
- School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, China
- Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing 100091, China
- Key Laboratory of Forestry Remote Sensing and Information System, NFGA, Beijing 100091, China
- Yan Wang
- Shandong Geographical Institute of Land Spatial Data and Remote Sensing Technology, Jinan 250002, China
- Zhihai Gao
- Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing 100091, China
- Key Laboratory of Forestry Remote Sensing and Information System, NFGA, Beijing 100091, China
- Bin Sun
- Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing 100091, China
- Key Laboratory of Forestry Remote Sensing and Information System, NFGA, Beijing 100091, China
- He Xing
- School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, China
- Yu Zang
- School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, China
13
Mateo RG, Arellano G, Gómez-Rubio V, Tello JS, Fuentes AF, Cayola L, Loza MI, Cala V, Macía MJ. Insights on biodiversity drivers to predict species richness in tropical forests at the local scale. Ecol Modell 2022. [DOI: 10.1016/j.ecolmodel.2022.110133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
14
Zeyliger A, Chinilin A, Ermolaeva O. Spatial Interpolation of Gravimetric Soil Moisture Using EM38-mk Induction and Ensemble Machine Learning (Case Study from Dry Steppe Zone in Volgograd Region). SENSORS (BASEL, SWITZERLAND) 2022; 22:6153. [PMID: 36015913 PMCID: PMC9414959 DOI: 10.3390/s22166153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/15/2022] [Accepted: 08/16/2022] [Indexed: 06/15/2023]
Abstract
Sustainable management of the interaction between agriculture and the environment requires an increasingly deep understanding and numerical description of soil genesis and soil properties. One area where this knowledge is applied is digital irrigated agriculture. When developing such technologies, traditional methods of soil research can be expensive and time consuming. Proximal soil sensing combined with predictive soil mapping can significantly reduce the complexity of the work. In this study, we used topographic variables and data from the electromagnetic induction meter (EM38-mk), together with soil surface hydrological variables, to produce cartographic models of gravimetric soil moisture for several depth intervals. For this purpose, a test site was established under dry steppe conditions at the border of a parcel containing an irrigated soybean crop, where 50 soil samples were taken at different points alongside electrical conductivity (ECa) data measured in situ. Gravimetric soil moisture was modeled with stepwise inclusion of independent variables, using ensemble machine learning methods and spatial cross-validation. The resulting cartographic models performed satisfactorily, with the best cross-validated performance of R2cv = 0.59–0.64. The combination of predictors that gave the best model performance for gravimetric soil moisture was geographical variables (buffer zone distances) combined with the initial variables converted into principal components.
The cartographic models of gravimetric soil moisture variability obtained this way can support managed irrigated agriculture and variable-rate fertilizer application, thereby optimizing resource use by crop producers, which can ultimately contribute to the sustainable management of natural resources.
Affiliation(s)
- Anatoly Zeyliger
- Saratov State University of Genetics, Biotechnology and Engineering Named after N.I. Vavilov, 410012 Saratov, Russia
- Andrey Chinilin
- Department of Soil Geography, FRC “V.V. Dokuchaev Soil Science Institute”, 119017 Moscow, Russia
- Olga Ermolaeva
- Department of Applied Informatics, Russian State Agrarian University—Moscow Timiryazev Agricultural Academy, 127550 Moscow, Russia
15
Advanced Feature-Selection-Based Hybrid Ensemble Learning Algorithms for Network Intrusion Detection Systems. Symmetry (Basel) 2022. [DOI: 10.3390/sym14071461] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
As cyber-attacks become remarkably sophisticated, effective Intrusion Detection Systems (IDSs) are needed to monitor computer resources and to raise alerts about unusual or suspicious behavior. Although several machine learning (ML) and data mining methods have been used to make these systems highly effective, they have not proven ideal. Current intrusion detection algorithms suffer from high dimensionality, redundancy, meaningless data, and high error, false alarm, and false-negative rates. This paper proposes a novel network IDS model based on an Ensemble Learning (EL) algorithm. Efficient feature selection is achieved via a hybrid of Correlation Feature Selection and Forest Penalizing Attributes (CFS–FPA). Intrusion detection is improved by using the AdaBoost and bagging ensemble learning algorithms to enhance four classifiers: Support Vector Machine, Random Forest, Naïve Bayes, and K-Nearest Neighbor. These four enhanced classifiers were applied first with AdaBoost and then with bagging, and their outputs were aggregated by average voting. For better benchmarking, the model is evaluated in both binary and multi-class classification settings. Applied to the CICIDS2017 dataset, the model achieved promising results: 99.7% accuracy, a false-negative rate of 0.053, and a false alarm rate of 0.004. This system should be effective for information technology-based organizations, as it is expected to provide a high level of symmetry between information security and the detection of attacks and malicious intrusion.
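The abstract states that the enhanced classifiers are combined by a "voting average". Reading that as soft voting (averaging the class probabilities predicted by each classifier) is an assumption; the Python sketch below shows soft voting next to plain majority (hard) voting for comparison. The function names and probabilities are hypothetical, and the sketch does not include the CFS–FPA feature selection or the AdaBoost and bagging stages.

```python
from collections import Counter
from typing import Sequence


def soft_vote(prob_rows: Sequence[Sequence[float]]) -> int:
    """Average per-class probabilities across classifiers; return the winning class index."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    mean = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def hard_vote(labels: Sequence[int]) -> int:
    """Majority vote over predicted labels; ties go to the smallest label."""
    counts = Counter(labels)
    return min(counts, key=lambda label: (-counts[label], label))


if __name__ == "__main__":
    # Hypothetical [P(benign), P(attack)] outputs from SVM, RF, NB and KNN for one flow.
    probs = [[0.2, 0.8], [0.4, 0.6], [0.7, 0.3], [0.45, 0.55]]
    print(soft_vote(probs))         # 1 (attack): mean P(attack) = 0.5625
    print(hard_vote([1, 1, 0, 1]))  # 1 (attack)
```

Soft voting lets a confident classifier outweigh several hesitant ones, whereas hard voting counts each classifier's label equally regardless of its confidence.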
16
Novel MLR-RF-Based Geospatial Techniques: A Comparison with OK. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2022. [DOI: 10.3390/ijgi11070371] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/10/2022]
Abstract
Geostatistical estimation methods rely on experimental variograms that are mostly erratic, which leads to subjective model fitting, and they assume a normal distribution during conditional simulations. In contrast, machine learning algorithms (MLAs) (1) are free of such limitations and (2) can incorporate information from multiple sources; they are therefore attracting increasing interest for real-time resource estimation and automation. However, MLAs still need to be explored for robust learning of phenomena, better accuracy, and computational efficiency. This paper compares two MLAs, Multiple Linear Regression (MLR) and Random Forest (RF), with Ordinary Kriging (OK). The techniques were applied to the publicly available Walker Lake sample dataset and validated against the exhaustive Walker Lake dataset. The MLR results were significant (p < 10 × 10⁻⁵), with a correlation coefficient of 0.81 (R² = 0.65), compared with 0.79 (R² = 0.62) for the RF and OK methods. Additionally, MLR was automated (free of the intermediate variogram-modelling step required by OK), produced unbiased estimates, identified key samples representing different zones, and was more computationally efficient.
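The reported statistics can be cross-checked. For an ordinary least-squares linear fit with an intercept, evaluated on the data it was fitted to, the coefficient of determination R² = 1 − SS_res/SS_tot equals the square of the Pearson correlation r. Squaring the reported r values gives 0.81² ≈ 0.656 and 0.79² ≈ 0.624, which match the reported R² of 0.65 and 0.62 to within the rounding of r. Whether the paper computed R² as r² or as 1 − SS_res/SS_tot on the validation data is an assumption here; on independent validation data the two need not coincide. The Python sketch below uses hypothetical helper names and toy data.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


if __name__ == "__main__":
    # In-sample OLS fit on toy data: R^2 and r^2 coincide (both 0.6 here).
    x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    print(round(r_squared(y, fitted), 6), round(pearson_r(x, y) ** 2, 6))  # 0.6 0.6
    # Squaring the correlation coefficients reported in the abstract.
    for r in (0.81, 0.79):
        print(f"r = {r:.2f} -> r^2 = {r * r:.3f}")
```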
17
Ahmad R, Wazirali R, Abu-Ain T. Machine Learning for Wireless Sensor Networks Security: An Overview of Challenges and Issues. SENSORS 2022; 22:s22134730. [PMID: 35808227 PMCID: PMC9269255 DOI: 10.3390/s22134730] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/20/2022] [Revised: 06/15/2022] [Accepted: 06/21/2022] [Indexed: 12/04/2022]
Abstract
Energy and security are major challenges in a wireless sensor network, and they work against each other: as security complexity increases, battery drain increases. Because of the limited power in wireless sensor networks, relying on the security of ordinary protocols based on encryption and key management is futile, given the nature of communication between sensors and the ever-changing network topology. Machine learning algorithms are therefore one of the proposed solutions for providing security services in this type of network, adding monitoring and decision-making intelligence. However, machine learning algorithms present additional hurdles in terms of training and the amount of training data required. This paper provides a convenient reference on wireless sensor network infrastructure and the security challenges it faces. It also discusses how machine learning algorithms could reduce the security costs of wireless sensor networks in several domains, as well as the challenges of, and proposed solutions for, improving the ability of sensors to identify threats, attacks, risks, and malicious nodes through learning and self-development with machine learning algorithms. Furthermore, this paper discusses open issues related to adapting machine learning algorithms to the capabilities of sensors in this type of network.
Affiliation(s)
- Rami Ahmad
- Institute of Networked and Embedded Systems, University of Klagenfurt, 9020 Klagenfurt, Austria
- Ubiquitous Sensing Systems Lab, University of Klagenfurt-Silicon Austria Labs, 9020 Klagenfurt, Austria
- Correspondence: (R.A.); (R.W.)
- Raniyah Wazirali
- College of Computing and Informatics, Saudi Electronic University, Riyadh 11673, Saudi Arabia
- Correspondence: (R.A.); (R.W.)
- Tarik Abu-Ain
- College of Computing and Informatics, Saudi Electronic University, Riyadh 11673, Saudi Arabia
18
Baratto PFB, Cecílio RA, de Sousa Teixeira DB, Zanetti SS, Xavier AC. Random forest for spatialization of daily evapotranspiration (ET0) in watersheds in the Atlantic Forest. ENVIRONMENTAL MONITORING AND ASSESSMENT 2022; 194:449. [PMID: 35606615 DOI: 10.1007/s10661-022-10110-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 05/15/2022] [Indexed: 06/15/2023]
Abstract
Daily reference evapotranspiration (ET0) data have become increasingly important in recent years because of their relevance to planning and decision making for irrigated agriculture, water production, and forest restoration. Given the scarcity of in situ measurements, studying interpolation methods capable of representing ET0 is important. This study therefore evaluated the adequacy of the Random Forest (RF) method for the spatialization of ET0 in the watersheds of the Mid-South region of Espírito Santo State, within the Atlantic Forest biome, Brazil. The RF method was found to be the most suitable for ET0 spatialization when compared with angular distance weighting (ADW) and inverse distance weighting (IDW). The spatializations produced by this method were converted into gridded databases and made available online. The RF database was also compared with other ET0 grid databases and performed better than them.
Affiliation(s)
- Roberto Avelino Cecílio
- Department of Forest and Wood Sciences, Federal University of Espírito Santo, Jerônimo Monteiro, ES, 29550-000, Brazil
- David Bruno de Sousa Teixeira
- Department of Agricultural Engineering, Federal University of Viçosa, Avenue Peter Henry Rolfs, Viçosa, MG, 36570-900, Brazil
- Sidney Sara Zanetti
- Department of Forest and Wood Sciences, Federal University of Espírito Santo, Jerônimo Monteiro, ES, 29550-000, Brazil
- Alexandre Cândido Xavier
- Department of Forest and Wood Sciences, Federal University of Espírito Santo, Jerônimo Monteiro, ES, 29550-000, Brazil
19
Meray A, Sturla S, Siddiquee MR, Serata R, Uhlemann S, Gonzalez-Raymat H, Denham M, Upadhyay H, Lagos LE, Eddy-Dilek C, Wainwright HM. PyLEnM: A Machine Learning Framework for Long-Term Groundwater Contamination Monitoring Strategies. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:5973-5983. [PMID: 35427133 PMCID: PMC9069689 DOI: 10.1021/acs.est.1c07440] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/09/2021] [Revised: 03/08/2022] [Accepted: 03/21/2022] [Indexed: 06/14/2023]
Abstract
In this study, we developed a comprehensive machine learning (ML) framework for long-term groundwater contamination monitoring as the Python package PyLEnM (Python for Long-term Environmental Monitoring). PyLEnM aims to establish a seamless data-to-ML pipeline with various utility functions, such as quality assurance and quality control (QA/QC), coincident/colocated data identification, automated ingestion and processing of publicly available spatial data layers, and novel data summarization/visualization. The key ML innovations include (1) time series/multianalyte clustering to find groups of wells with similar groundwater dynamics and to inform spatial interpolation and well optimization, (2) automated model selection and parameter tuning, comparing multiple regression models for spatial interpolation, (3) a proxy-based spatial interpolation method that includes spatial data layers or in situ measurable variables as predictors for contaminant concentrations and groundwater levels, and (4) a new well optimization algorithm that identifies the most effective subset of wells for maintaining spatial interpolation ability in long-term monitoring. We demonstrate our methodology using monitoring data from the Savannah River Site F-Area. Through this open-source PyLEnM package, we aim to improve the transparency of data analytics at contaminated sites, empowering concerned citizens as well as improving public relations.
Affiliation(s)
- Aurelien O. Meray
- Applied Research Center, Florida International University, 10555 W Flagler Street, Miami, Florida 33174, United States
- Savannah Sturla
- Department of Environmental Science, Policy, and Management, University of California Berkeley, Mulford Hall, 2521 Hearst Avenue, Berkeley, California 94709, United States
- Masudur R. Siddiquee
- Applied Research Center, Florida International University, 10555 W Flagler Street, Miami, Florida 33174, United States
- Rebecca Serata
- Department of Civil and Environmental Engineering, University of California Berkeley, Davis Hall, 2521 Hearst Avenue, Berkeley, California 94709, United States
- Sebastian Uhlemann
- Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, MS 74R-316C, Berkeley 94704, United States
- Hansell Gonzalez-Raymat
- Savannah River National Laboratory, Savannah River Site, Aiken, South Carolina 29808, United States
- Miles Denham
- Panoramic Environmental Consulting, LLC, P.O. Box 906, Aiken, South Carolina 29802, United States
- Himanshu Upadhyay
- Applied Research Center, Florida International University, 10555 W Flagler Street, Miami, Florida 33174, United States
- Leonel E. Lagos
- Applied Research Center, Florida International University, 10555 W Flagler Street, Miami, Florida 33174, United States
- Carol Eddy-Dilek
- Savannah River National Laboratory, Savannah River Site, Aiken, South Carolina 29808, United States
- Haruko M. Wainwright
- Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, MS 74R-316C, Berkeley 94704, United States
- Department of Nuclear Science & Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
20
Ahmad S, Koh KY, Lee JI, Suh GH, Lee CM. Interpolation of Point Prevalence Rate of the Highly Pathogenic Avian Influenza Subtype H5N8 Second Phase Epidemic in South Korea. Vet Sci 2022; 9:vetsci9030139. [PMID: 35324867 PMCID: PMC8954420 DOI: 10.3390/vetsci9030139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Revised: 03/09/2022] [Accepted: 03/10/2022] [Indexed: 11/29/2022]
Abstract
Humans and animals are both susceptible to highly pathogenic avian influenza (HPAI) viruses. In the future, HPAI has the potential to become a source of zoonoses and a driver of pandemic disease. It is therefore necessary to identify high-risk areas that are more vulnerable to HPAI infections. In this study, we applied unbiased predictions based on known information to find localities with a high predicted point prevalence rate. To carry out these predictions, we used inverse distance weighting (IDW) and kriging in the R statistical computing environment. Jeollanam-do, Gyeonggi-do, Chungcheongbuk-do, and Ulsan have a high anticipated risk. This research might aid in managing avian influenza threats associated with various potential risks.
Affiliation(s)
- Saleem Ahmad
- Veterinary Public Health Lab, College of Veterinary Medicine, Chonnam National University, Gwangju 61186, Korea
- Kye-Young Koh
- Veterinary Public Health Lab, College of Veterinary Medicine, Chonnam National University, Gwangju 61186, Korea
- Jae-il Lee
- Veterinary Public Health Lab, College of Veterinary Medicine, Chonnam National University, Gwangju 61186, Korea
- Guk-Hyun Suh
- Department of Veterinary Internal Medicine, College of Veterinary Medicine and BK21 FOUR Program, Chonnam National University, Gwangju 61186, Korea
- Chang-Min Lee
- Department of Veterinary Internal Medicine, College of Veterinary Medicine and BK21 FOUR Program, Chonnam National University, Gwangju 61186, Korea
21
Interpolation-Based Fusion of Sentinel-5P, SRTM, and Regulatory-Grade Ground Stations Data for Producing Spatially Continuous Maps of PM2.5 Concentrations Nationwide over Thailand. ATMOSPHERE 2022. [DOI: 10.3390/atmos13020161] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
Atmospheric pollution has recently drawn significant attention because of its proven adverse effects on public health and the environment. This concern has been aggravated particularly in Southeast Asia by increasing vehicle use, industrial activity, and agricultural burning practices. Consequently, elevated PM2.5 concentrations have become a matter of intervention for national authorities, which have addressed the need to monitor air pollution by operating ground stations. However, their spatial coverage is limited, and their installation and maintenance are costly. Alternative approaches are therefore necessary at national and regional scales. In this paper, we investigated interpolation models that fuse PM2.5 measurements from ground stations with satellite data to produce spatially continuous maps of PM2.5 nationwide over Thailand. Four approaches are compared, namely inverse distance weighting (IDW), ordinary kriging (OK), random forest (RF), and random forest combined with OK (RFK), leveraging the NO2, SO2, CO, HCHO, AI, and O3 products from the Sentinel-5P satellite, regulatory-grade ground PM2.5 measurements, and topographic parameters. The results suggest that RFK is the most robust, especially when pollution levels are moderate or extreme, achieving an RMSE of 7.11 μg/m3 and an R2 of 0.77 during a 10-day period in February, and an RMSE of 10.77 μg/m3 and an R2 of 0.91 during the entire month of March. The proposed approach can be adopted operationally and expanded by leveraging regulatory-grade stations, low-cost sensors, and upcoming satellite missions such as GEMS and Sentinel-5.
22
Flash Flood Water Depth Estimation Using SAR Images, Digital Elevation Models, and Machine Learning Algorithms. REMOTE SENSING 2022. [DOI: 10.3390/rs14030440] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023]
Abstract
In this article, the local spatial correlation of multiple remote sensing datasets, such as those from Sentinel-1, Sentinel-2, and digital surface models (DSMs), is linked to machine learning (ML) regression algorithms for flash floodwater depth retrieval. Edge detection filters are applied to remote sensing images to extract features, which the ML algorithms use as independent variables to estimate flood depths. Data for the dependent variable were obtained from the Hydrologic Engineering Center’s River Analysis System (HEC-RAS 2D) simulation model, applied to the 24–26 April 2018 flash flood event in New Cairo, Egypt. Gradient boosting regression (GBR), random forest regression (RFR), linear regression (LR), extreme gradient boosting regression (XGBR), multilayer perceptron neural network regression (MLPR), k-nearest neighbors regression (KNR), and support vector regression (SVR) were used to estimate floodwater depths; their outputs were compared and evaluated for accuracy using the root-mean-square error (RMSE). The RMSE of all ML algorithms ranged from 0.18 to 0.22 m for depths less than 1 m (96% of all test data), indicating that ML models are relatively portable and capable of computing floodwater depths using remote sensing data as input.
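The abstract reports RMSE only for depths below 1 m. A minimal Python sketch of that kind of restricted score follows; the `below` filter parameter and the sample depths are hypothetical and illustrate the metric, not the authors' evaluation code.

```python
import math
from typing import Optional, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float],
         below: Optional[float] = None) -> float:
    """Root-mean-square error, sqrt(mean((obs - pred)^2)).

    If `below` is given, only pairs whose observed value is strictly less
    than `below` are scored (e.g. depths shallower than 1 m).
    """
    pairs = [(o, p) for o, p in zip(observed, predicted)
             if below is None or o < below]
    if not pairs:
        raise ValueError("no observations left to score")
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


if __name__ == "__main__":
    # Hypothetical simulated vs. ML-predicted depths in metres.
    depth_obs = [0.5, 0.8, 0.2, 1.6]
    depth_pred = [0.4, 1.0, 0.2, 1.0]
    print(round(rmse(depth_obs, depth_pred, below=1.0), 3))  # 0.129 (deep cell excluded)
    print(round(rmse(depth_obs, depth_pred), 3))             # 0.32 (all cells)
```

Because squared errors grow quickly with depth, restricting the score to shallow cells can yield a noticeably smaller RMSE than scoring every cell, so the depth range should always be reported alongside the number.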
23
Evaluation of Air Quality Index by Spatial Analysis Depending on Vehicle Traffic during the COVID-19 Outbreak in Turkey. ENERGIES 2021. [DOI: 10.3390/en14185729] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/23/2022]
Abstract
As in other countries, the Turkish government has implemented many partial and total lockdown measures to prevent the spread of the virus. After the first case was detected, the public authorities imposed restrictions to reduce human and traffic mobility, which also had a positive effect on air quality. This paper therefore examines how the pandemic affected traffic mobility and air quality in Istanbul. Because the pandemic's impact is not limited to human health, this study also investigates its social and environmental effects. In our analysis, we observe, visualize, compare, and discuss Istanbul’s traffic mobility and air quality before and after the lockdown. To do so, a geographic information system (GIS)-based approach is proposed. Various spatial analyses are performed in GIS on the statistical data so that the environmental effects of the pandemic can be better observed. We test the hypothesis that the lockdown reduced traffic mobility and improved air quality, using a traffic density cluster set and data from air monitoring stations (five air pollutant parameters) over five months. The results show positive changes in both traffic mobility and air quality, especially in April–May. PM10, SO2, CO, NO2, and NOx values improved by 21.21%, 16.55%, 18.82%, 28.62%, and 39.99%, respectively. In addition, the average traffic speed increased by 7%. To make these changes permanent, it is recommended to integrate e-mobility and sharing systems into the current transportation network.
24
An enhanced dual IDW method for high-quality geospatial interpolation. Sci Rep 2021; 11:9903. [PMID: 33972610 PMCID: PMC8110750 DOI: 10.1038/s41598-021-89172-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/26/2020] [Accepted: 04/20/2021] [Indexed: 11/08/2022]
Abstract
Many geoscience problems involve predicting attributes of interest at unsampled locations. Inverse distance weighting (IDW) is a standard solution to such problems. However, IDW generally does not produce favorable results for clustered data, which are common in geospatial data processing. To address this concern, this paper presents a novel interpolation approach, dual IDW (DIDW), which integrates data-to-data correlation into conventional IDW and reformulates it within the geostatistical framework with locally varying exponents. Traditional IDW, DIDW, and ordinary kriging are employed to evaluate the interpolation performance of the proposed method. The evaluation is based on a case study using the public Walker Lake dataset, with interpolations performed in various settings, such as different sample data sizes and variogram parameters. The results demonstrate that DIDW with locally varying exponents consistently produces more accurate and reliable estimates than conventional IDW and DIDW. In addition, it yields more robust estimates than ordinary kriging when variogram parameters vary. Given its stability and accuracy, the proposed method can serve as a preferred spatial interpolation method for most applications.
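For reference, the sketch below implements only the conventional IDW baseline the paper starts from, not the authors' DIDW (whose data-to-data correlation term and locally varying exponents are not specified in the abstract). The function name `idw` and the default exponent of 2 are assumptions chosen for illustration.

```python
import math


def idw(samples, x, y, power=2.0):
    """Conventional IDW estimate at (x, y).

    samples: iterable of (xi, yi, value) observations.
    power: distance exponent p, so each weight is 1 / d**p (p = 2 is common).
    """
    num = den = 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return float(vi)  # query coincides with a sample: honor the data
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den


pts = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = idw(pts, 1.0, 0.0)  # equidistant samples, so the plain average: 15.0

# Adding a redundant sample at (0, 0) pulls the estimate toward 10, because
# IDW weighs every point independently of its neighbors.
clustered = pts + [(0.0, 0.0, 10.0)]
biased = idw(clustered, 1.0, 0.0)
```

The second call illustrates the clustered-data weakness described above: the duplicated point counts twice, shifting the midpoint estimate from 15 to about 13.33 even though no new information was added.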
25
Sekulić A, Kilibarda M, Protić D, Bajat B. A high-resolution daily gridded meteorological dataset for Serbia made by Random Forest Spatial Interpolation. Sci Data 2021; 8:123. [PMID: 33931656 PMCID: PMC8087659 DOI: 10.1038/s41597-021-00901-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/21/2020] [Accepted: 03/23/2021] [Indexed: 11/09/2022]
Abstract
We produced the first daily gridded meteorological dataset at 1-km spatial resolution across Serbia for 2000–2019, named MeteoSerbia1km. The dataset consists of five daily variables: maximum, minimum, and mean temperature, mean sea-level pressure, and total precipitation. In addition to daily summaries, we produced monthly and annual summaries, as well as daily, monthly, and annual long-term means. Daily gridded data were interpolated using the Random Forest Spatial Interpolation methodology, which uses the nearest observations and the distances to them as spatial covariates, together with environmental covariates, in a random forest model. The accuracy of the MeteoSerbia1km daily dataset was assessed using nested 5-fold leave-location-out cross-validation. All temperature variables and sea-level pressure showed high accuracy, whereas accuracy was lower for total precipitation because of the discontinuity in its spatial distribution. MeteoSerbia1km was also compared with the coarser-resolution E-OBS dataset: both datasets showed similar coarse-scale patterns for all daily meteorological variables except total precipitation. Because of its high resolution, MeteoSerbia1km is suitable for further environmental analyses.
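The core idea of Random Forest Spatial Interpolation (RFSI), as described above, is that the values of the nearest observations and the distances to them become ordinary covariates. A minimal sketch of building one such covariate row follows; the function name `rfsi_features`, the default of 3 neighbors, and the `[value, distance, ...]` column order are assumptions for illustration, not the authors' exact configuration.

```python
import math


def rfsi_features(target, stations, n=3, env=()):
    """RFSI-style covariates for one location on one day.

    target:   (x, y) of the grid cell or held-out station.
    stations: (x, y, value) observations for that day; when building a training
              row at a station, leave that station out of this list.
    n:        number of nearest observations (hypothetical default).
    env:      environmental covariates for the target, appended unchanged.
    Returns [v1, d1, ..., vn, dn, *env], ordered from nearest to farthest.
    """
    tx, ty = target
    ranked = sorted(
        ((math.hypot(sx - tx, sy - ty), v) for sx, sy, v in stations),
        key=lambda pair: pair[0],
    )
    if len(ranked) < n:
        raise ValueError("fewer stations than requested neighbors")
    row = []
    for dist, value in ranked[:n]:
        row.extend([value, dist])
    row.extend(env)
    return row


obs = [(0.0, 0.0, 10.0), (3.0, 4.0, 20.0), (6.0, 8.0, 30.0), (1.0, 0.0, 5.0)]
# Grid cell at (0, 1) with one hypothetical covariate, e.g. elevation in metres.
row = rfsi_features((0.0, 1.0), obs, n=2, env=(250.0,))
```

Each training row, paired with the observed value at the held-out station, would then be fed to a random forest regressor (for example scikit-learn's `RandomForestRegressor`, not shown here); at prediction time the same function is applied to every grid cell.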
Affiliation(s)
- Aleksandar Sekulić: University of Belgrade, Faculty of Civil Engineering, Department of Geodesy and Geoinformatics, Belgrade, 11000, Serbia
- Milan Kilibarda: University of Belgrade, Faculty of Civil Engineering, Department of Geodesy and Geoinformatics, Belgrade, 11000, Serbia
- Dragutin Protić: University of Belgrade, Faculty of Civil Engineering, Department of Geodesy and Geoinformatics, Belgrade, 11000, Serbia
- Branislav Bajat: University of Belgrade, Faculty of Civil Engineering, Department of Geodesy and Geoinformatics, Belgrade, 11000, Serbia
26
Spatio-Temporal Classification Framework for Mapping Woody Vegetation from Multi-Temporal Sentinel-2 Imagery. REMOTE SENSING 2020. [DOI: 10.3390/rs12172845] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
The inventory of woody vegetation is of great importance for good forest management. Advances in remote sensing have provided excellent tools for this purpose, reducing the time and labor required while offering high accuracy and rich information. Sentinel-2 is a relatively new satellite mission whose 13 spectral bands and short revisit time have proved very useful for forest monitoring. This study proposes a novel spatio-temporal classification framework for mapping woody vegetation from multitemporal Sentinel-2 data. The framework is based on probability random forest classification, with temporal information explicitly defined in the model. As a result, several predictions are made for each pixel of the study area, which allows specific spatio-temporal aggregation to be performed. The proposed methodology was successfully applied to map eight potential forest and shrubby vegetation types across Serbia. Several spatio-temporal aggregation approaches were tested, divided into two main groups: pixel-based and neighborhood-based. The validation metrics show that assigning the most common vegetation type class within a 5 × 5 pixel neighborhood provides the best results. The overall accuracy and kappa coefficient obtained from five-fold cross-validation are 82.97% and 0.75, respectively. The corresponding producer’s accuracies range from 36.74% to 97.99%, and user’s accuracies range from 46.31% to 98.43%. The proposed methodology proved applicable for mapping woody vegetation in Serbia and shows potential for implementation in other areas, although further testing is necessary to confirm this.
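The best-performing aggregation above takes the most common class in a 5 × 5 neighborhood. The sketch below shows one plausible reading, assuming every temporal prediction inside the window casts one vote; the exact pooling rule, the function names `neighborhood_mode` and `aggregate`, and the class labels "oak" and "pine" are hypothetical placeholders.

```python
from collections import Counter


def neighborhood_mode(predictions, row, col, size=5):
    """Most common class label in a size x size window centred on (row, col).

    predictions: list of 2D label grids, one per temporal prediction; every
    label inside the window, from every grid, counts as one vote. The window
    is clipped at the image border. Ties go to the label encountered first.
    """
    half = size // 2
    n_rows, n_cols = len(predictions[0]), len(predictions[0][0])
    votes = Counter()
    for grid in predictions:
        for r in range(max(0, row - half), min(n_rows, row + half + 1)):
            for c in range(max(0, col - half), min(n_cols, col + half + 1)):
                votes[grid[r][c]] += 1
    return votes.most_common(1)[0][0]


def aggregate(predictions, size=5):
    """Apply the neighborhood vote to every pixel."""
    n_rows, n_cols = len(predictions[0]), len(predictions[0][0])
    return [[neighborhood_mode(predictions, r, c, size) for c in range(n_cols)]
            for r in range(n_rows)]


# Two hypothetical dates: a lone "pine" pixel on date 1 is outvoted by "oak".
date1 = [["oak"] * 5 for _ in range(5)]
date1[2][2] = "pine"
date2 = [["oak"] * 5 for _ in range(5)]
smoothed = aggregate([date1, date2])
```

Setting `size=1` with a single grid reduces this to the raw per-pixel prediction, which is a convenient baseline when comparing pixel-based and neighborhood-based approaches.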
27
A New Approach for Understanding Urban Microclimate by Integrating Complementary Predictors at Different Scales in Regression and Machine Learning Models. REMOTE SENSING 2020. [DOI: 10.3390/rs12152434] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
Climate change is a major contemporary phenomenon with multiple consequences. In urban areas, it exacerbates the urban heat island phenomenon, affecting inhabitants' health and the thermal discomfort they experience. It is therefore necessary to estimate air temperature as accurately as possible at any point of a territory, particularly given the ongoing rationalization of Météo-France's network of fixed meteorological stations. Air temperature estimates are also increasingly in demand as inputs to quantitative models in a wide range of fields, such as hydrology, ecology, and climate change studies. This study therefore models air temperature, measured during four mobile campaigns carried out in clear-sky weather during the summer months between 2016 and 2019 in Lyon (France), using regression models based on 33 explanatory variables derived from traditionally used data, LiDAR (Light Detection and Ranging) remote sensing, and Landsat 8 satellite acquisitions. Three types of statistical regression were tested: partial least squares regression, multiple linear regression, and a machine learning method, random forest regression. For example, for 30 August 2016, multiple linear regression explained 89% of the variance, with a root mean square error (RMSE) of only 0.23 °C. Variables such as surface temperature, the Normalized Difference Vegetation Index (NDVI), and the Modified Normalized Difference Water Index (MNDWI) have a strong impact on the estimation model. This study contributes to the emergence of urban cooling systems, for which a variety of solutions is available. These may include increasing the proportion of vegetation on the ground, facades, or roofs; increasing the number of basins and water bodies to promote urban cooling; choosing water-retaining materials; humidifying the pavement; increasing the number of public fountains and foggers; or creating shade with stretched canvas.
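The abstract reports multiple linear regression results as explained variance (R²) and RMSE. The sketch below shows how those two figures are computed for an ordinary least squares fit with an intercept, solved through the normal equations in pure Python. The function names, the two predictors, and the noise-free toy data are all hypothetical and unrelated to the Lyon measurements.

```python
import math


def fit_ols(X, y):
    """Multiple linear regression with intercept via the normal equations.

    X: list of rows of explanatory variables; y: list of observed values.
    Returns [b0, b1, ..., bk]. The normal equations are simple but less
    numerically stable than QR-based solvers for ill-conditioned data.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


def r2_rmse(y, y_hat):
    """Coefficient of determination and root-mean-square error."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Toy data with two hypothetical predictors (say, surface temperature and NDVI).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2.0, 4.5, 5.0, 7.5, 8.5]  # generated as 1 + 2*x1 - 0.5*x2, no noise
beta = fit_ols(X, y)
r2, rmse = r2_rmse(y, predict(beta, X))
```

Because the toy targets contain no noise, the fit recovers the generating coefficients and gives R² of 1 and RMSE of 0; with real field data and 33 candidate predictors, R² below 1 and a non-zero RMSE (such as the 89% and 0.23 °C reported) are expected.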