1
Zhang Y, Du S, Guan L, Chen X, Lei L, Liu L. Estimating global 0.1° scale gridded anthropogenic CO2 emissions using TROPOMI NO2 and a data-driven method. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 949:175177. [PMID: 39094662 DOI: 10.1016/j.scitotenv.2024.175177] [Received: 11/24/2023] [Revised: 07/03/2024] [Accepted: 07/29/2024] [Indexed: 08/04/2024]
Abstract
Satellite remote sensing is a promising approach for monitoring global CO2 emissions. However, existing satellite-based CO2 observations are too coarse to meet the requirements of fine-scale global mapping. We propose a novel data-driven method to estimate global anthropogenic CO2 emissions at a 0.1° scale, which integrates emissions inventories and satellite data while bypassing the inadequate accuracy of CO2 observations. Because nitrogen oxides (NOx = NO + NO2) and CO2 are co-emitted by anthropogenic sources, high-resolution NO2 measurements from the TROPOspheric Monitoring Instrument (TROPOMI) are employed to map anthropogenic emissions at a global 0.1° scale. We construct the driving features from NO2 data and also incorporate gridded CO2/NOx emission ratios and NOx/NO2 conversion ratios as driving data to describe co-emissions. Both ratios are predicted using a long short-term memory (LSTM) neural network (with an R2 of 0.984 for the CO2/NOx emission ratio and an R2 of 0.980 for the NOx/NO2 conversion ratio). The data-driven model for estimating anthropogenic CO2 emissions is implemented by random forest regression (RFR) and trained using the Emissions Database for Global Atmospheric Research (EDGAR). The satellite-based anthropogenic CO2 emission dataset at a global 0.1° scale agrees well with the national CO2 emission inventories (an R2 of 0.998 with Global Carbon Budget (GCB) and an R2 of 0.996 with EDGAR) and is consistent with city-level emission estimates from Carbon Monitor Cities (CMC), with an R2 of 0.824. This data-driven method based on satellite-observed NO2 provides a new perspective for fine-resolution anthropogenic CO2 emissions estimation.
Affiliation(s)
- Yucong Zhang
- Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Shanshan Du
- Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
- Linlin Guan
- Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
- Xiaoyu Chen
- Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Liping Lei
- Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
- Liangyun Liu
- Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China; University of Chinese Academy of Sciences, Beijing 100049, China.
2
Fatima M, Ahmad A, Butt I, Arshad S, Kiani B. Geospatial modelling of ambient air pollutants and chronic obstructive pulmonary diseases at regional scale in Pakistan. ENVIRONMENTAL MONITORING AND ASSESSMENT 2024; 196:929. [PMID: 39271595 DOI: 10.1007/s10661-024-13105-z] [Received: 03/10/2024] [Accepted: 09/06/2024] [Indexed: 09/15/2024]
Abstract
Pakistan is among the South Asian countries most vulnerable to the negative health impacts of air pollution. In this context, the study aimed to analyze the spatiotemporal patterns of chronic obstructive pulmonary disease (COPD) incidence and its relationship with air pollutants including aerosol absorbing index (AAI), carbon monoxide, sulfur dioxide (SO2), and nitrogen dioxide. Spatial scan statistics were employed to identify temporal, spatial, and spatiotemporal clusters of COPD. Generalized linear regression (GLR) and random forest (RF) models were utilized to evaluate the linear and non-linear relationships between COPD and air pollutants for the years 2019 and 2020. The findings revealed three spatial clusters of COPD in the eastern and central regions, with a high-risk spatiotemporal cluster in the east. The GLR identified a weak linear relationship between COPD and air pollutants, with R2 = 0.1, and weak residual autocorrelation, with Moran's index = -0.09. The RF model provided more accurate COPD predictions, with improved R2 values of 0.8 and 0.9 in the respective years and a very low Moran's I = -0.02, indicating a random residual distribution. The RF findings also suggested that AAI and SO2 were the most important predictors for 2019 and 2020. Hence, the strong association of COPD clusters with some air pollutants highlights the urgency of comprehensive measures to combat air pollution in the region to avoid future health risks.
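The abstract uses Moran's I to check whether model residuals are spatially random. A minimal sketch of the global Moran's I statistic follows; the chain-shaped neighbourhood and the residual values are hypothetical illustrations and are not taken from the study, which does not state its spatial weights here.

```python
from typing import Dict, List, Tuple


def morans_i(values: List[float], weights: Dict[Tuple[int, int], float]) -> float:
    """Global Moran's I for values with a sparse spatial weight matrix.

    weights maps (i, j) index pairs to w_ij; missing pairs count as zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(weights.values())
    # Weighted cross-products of deviations between neighbouring units.
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


def chain_weights(n: int) -> Dict[Tuple[int, int], float]:
    """Hypothetical neighbourhood: units on a line, adjacent units are neighbours."""
    w = {}
    for i in range(n - 1):
        w[(i, i + 1)] = 1.0
        w[(i + 1, i)] = 1.0
    return w


# Hypothetical residuals, chosen only to show the two extremes.
trend_residuals = [1.0, 2.0, 3.0, 4.0]          # smooth gradient
alternating_residuals = [1.0, -1.0, 1.0, -1.0]  # checkerboard pattern

i_trend = morans_i(trend_residuals, chain_weights(4))
i_alt = morans_i(alternating_residuals, chain_weights(4))
print(round(i_trend, 3), round(i_alt, 3))
```

Positive values indicate that similar residuals cluster together and negative values that neighbours differ. Values close to zero, such as the -0.02 reported for the RF model, are what one would expect when residuals show no spatial pattern.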
Affiliation(s)
- Munazza Fatima
- Department of Geography, The Islamia University of Bahawalpur, Punjab, 63100, Pakistan.
- Adeel Ahmad
- Taylor Geospatial Institute, St. Louis, 63103, USA
- Department of Computer Science & Engineering, Washington University in St. Louis, St. Louis, 63130, USA
- Institute of Geography, University of Punjab Lahore, Lahore, 54590, Pakistan
- Ibtisam Butt
- Institute of Geography, University of Punjab Lahore, Lahore, 54590, Pakistan
- Sana Arshad
- Department of Geography, The Islamia University of Bahawalpur, Punjab, 63100, Pakistan
- Behzad Kiani
- Centre for Clinical Research, The University of Queensland, Brisbane, Australia
3
Zhang K, Lin J, Li Y, Sun Y, Tong W, Li F, Chien LC, Yang Y, Su WC, Tian H, Fu P, Qiao F, Romeiko XX, Lin S, Luo S, Craft E. Unmasking the sky: high-resolution PM2.5 prediction in Texas using machine learning techniques. JOURNAL OF EXPOSURE SCIENCE & ENVIRONMENTAL EPIDEMIOLOGY 2024; 34:814-820. [PMID: 38561475 DOI: 10.1038/s41370-024-00659-w] [Received: 09/20/2023] [Revised: 03/06/2024] [Accepted: 03/07/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND: Although PM2.5 (fine particulate matter with an aerodynamic diameter less than 2.5 µm) is an air pollutant of great concern in Texas, the limited number of regulatory monitors poses a significant challenge for decision-making and environmental studies. OBJECTIVE: This study aimed to predict PM2.5 concentrations at a fine spatial scale on a daily basis by using novel machine learning approaches and incorporating satellite-derived Aerosol Optical Depth (AOD) and a variety of weather and land use variables. METHODS: We compiled a comprehensive dataset in Texas from 2013 to 2017, including ground-level PM2.5 concentrations from regulatory monitors; AOD values at 1-km resolution based on images retrieved from the MODIS satellite; and weather, land-use, and population density variables, among others. We built predictive models for each year separately to estimate PM2.5 concentrations using two machine learning approaches, gradient boosted trees and random forest. We evaluated the model prediction performance using in-sample and out-of-sample validations. RESULTS: Our predictive models demonstrate excellent in-sample model performance, as indicated by high R2 values generated from the gradient boosting models (0.94-0.97) and random forest models (0.81-0.90). However, the out-of-sample R2 values fall within a range of 0.52-0.75 for gradient boosting models and 0.44-0.69 for random forest models. Model performance varies slightly across years. A generally decreasing trend in predicted PM2.5 concentrations over time is observed in Eastern Texas. IMPACT STATEMENT: We utilized machine learning approaches to predict PM2.5 levels in Texas. Both gradient boosting and random forest models perform well. Gradient boosting models perform slightly better than random forest models. Our models showed excellent in-sample prediction performance (R2 > 0.9).
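The gap between in-sample and out-of-sample R2 reported above is typical of flexible learners. The sketch below illustrates the distinction with a toy nearest-neighbour model that memorises its training data; the model and all data values are hypothetical stand-ins, not the gradient boosting or random forest models or the Texas data used in the study.

```python
from typing import Callable, List


def r_squared(observed: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_nearest_neighbour(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Toy model that memorises the training data (stand-in for a flexible learner)."""
    pairs = list(zip(xs, ys))

    def predict(x: float) -> float:
        # Return the target of the closest training point.
        return min(pairs, key=lambda pair: abs(pair[0] - x))[1]

    return predict


# Hypothetical training and held-out samples.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
test_x = [0.4, 1.6, 2.4, 3.6, 4.6]
test_y = [0.5, 1.5, 2.5, 3.5, 4.5]

model = fit_nearest_neighbour(train_x, train_y)
r2_in = r_squared(train_y, [model(x) for x in train_x])   # scored on data the model has seen
r2_out = r_squared(test_y, [model(x) for x in test_x])    # scored on held-out data
print(f"in-sample R2 = {r2_in:.3f}, out-of-sample R2 = {r2_out:.3f}")
```

Because the toy model reproduces every training target exactly, its in-sample score is perfect while the held-out score is lower, which is why out-of-sample validation is the more informative figure.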
Affiliation(s)
- Kai Zhang
- Department of Environmental Health Sciences, School of Public Health, University at Albany, State University of New York, Rensselaer, NY, USA.
- Jeffrey Lin
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yuanfei Li
- Asian Demographic Research Institute, Shanghai University, Shanghai, China
- Yue Sun
- Department of International Development, Community, and Environment, Clark University, Worcester, MA, USA
- Weitian Tong
- Department of Computer Science, Georgia Southern University, Statesboro, GA, USA
- Fangyu Li
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Lung-Chang Chien
- Department of Epidemiology and Biostatistics, School of Public Health, University of Nevada, Las Vegas, Las Vegas, NV, USA
- Yiping Yang
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Wei-Chung Su
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Hezhong Tian
- State Key Joint Laboratory of Environmental Simulation & Pollution Control, School of Environment, Beijing Normal University, Beijing, China
- Center for Atmospheric Environmental Studies, Beijing Normal University, Beijing, China
- Peng Fu
- Department of Plant Biology, University of Illinois, Urbana, IL, USA
- Center for Economy, Environment, and Energy, Harrisburg University, Harrisburg, PA, USA
- Fengxiang Qiao
- Innovative Transportation Research Institute, Texas Southern University, Houston, TX, USA
- Xiaobo Xue Romeiko
- Department of Environmental Health Sciences, School of Public Health, University at Albany, State University of New York, Rensselaer, NY, USA
- Shao Lin
- Department of Environmental Health Sciences, School of Public Health, University at Albany, State University of New York, Rensselaer, NY, USA
- Sheng Luo
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, USA
4
Rakholia R, Le Q, Vu K, Ho BQ, Carbajo RS. Accurate PM2.5 urban air pollution forecasting using multivariate ensemble learning Accounting for evolving target distributions. CHEMOSPHERE 2024; 364:143097. [PMID: 39154769 DOI: 10.1016/j.chemosphere.2024.143097] [Received: 02/19/2024] [Revised: 07/28/2024] [Accepted: 08/13/2024] [Indexed: 08/20/2024]
Abstract
Over the past decades, air pollution has caused severe environmental and public health problems. According to the World Health Organization (WHO), fine particulate matter (PM2.5), a key component reflecting air quality, is the fourth leading cause of death worldwide after cardiovascular disease, smoking, and diet. Various research efforts have aimed to develop PM2.5 forecasting models that can be integrated into a solution to mitigate the adverse effects of air pollution. However, PM2.5 forecasting is challenging because air pollution data are non-stationary and influenced by multiple random effects. This paper proposes an effective multivariate multi-step ensemble machine learning model for predicting continuous 24-h PM2.5 concentrations, considering meteorological conditions, the rolling mean of PM2.5 time series, and temporal features. PM2.5 concentrations vary strongly over space and time. Therefore, forecasting results from one location are insufficient to represent the level of air pollution for an entire city. In this study, we established six real-time air quality monitoring sites in different regions, including traffic, residential, and industrial areas in Ho Chi Minh City (HCMC), and generated forecasting results for each station. Various statistical methods are used to evaluate the performance of the model. The experimental results confirm that the model performs well, substantially improving forecasting accuracy compared to existing PM2.5 forecasting models developed for HCMC. In addition, we analyze the contribution of different feature groups to model performance. The model can serve as a reference for citizens scheduling local travel and for healthcare providers issuing early warnings.
Affiliation(s)
- Rajnish Rakholia
- Ireland's National Centre for Artificial Intelligence (CeADAR), University College Dublin, NexusUCD, Belfield Office Park, Dublin, Ireland
- Quan Le
- Ireland's National Centre for Artificial Intelligence (CeADAR), University College Dublin, NexusUCD, Belfield Office Park, Dublin, Ireland.
- Khue Vu
- Institute for Environment and Resources (IER), Ho Chi Minh City, 700000, Viet Nam
- Bang Quoc Ho
- Institute for Environment and Resources (IER), Ho Chi Minh City, 700000, Viet Nam; Department of Science and Technology, Vietnam National University, Ho Chi Minh City, 700000, Viet Nam
- Ricardo Simon Carbajo
- Ireland's National Centre for Artificial Intelligence (CeADAR), University College Dublin, NexusUCD, Belfield Office Park, Dublin, Ireland
5
Zalzal J, Minet L, Brook J, Mihele C, Chen H, Hatzopoulou M. Capturing Exposure Disparities with Chemical Transport Models: Evaluating the Suitability of Downscaling Using Land Use Regression. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024. [PMID: 39092553 DOI: 10.1021/acs.est.4c03725] [Indexed: 08/04/2024]
Abstract
High-resolution exposure surfaces are essential to capture disparities in exposure to traffic-related air pollution in urban areas. In this study, we develop an approach to downscale Chemical Transport Model (CTM) simulations to a hyperlocal level (~100 m) in the Greater Toronto Area (GTA) under three scenarios where emissions from cars, trucks and buses are zeroed out, thus capturing the burden of each transportation mode. This proposed approach statistically fuses CTMs with Land-Use Regression using machine learning techniques. With this proposed downscaling approach, changes in air pollutant concentrations under different scenarios are appropriately captured by downscaling factors that are trained to reflect the spatial distribution of emission reductions. Our validation analysis shows that high-resolution models performed better than coarse models when compared with observations at reference stations. We used this downscaling approach to assess disparities in exposure to nitrogen dioxide (NO2) for populations composed of renters, low-income households, recent immigrants, and visible minorities. Individuals in all four categories were disproportionately exposed to the burden of cars, trucks, and buses. We conducted this analysis at spatial resolutions of 12 km, 4 km, 1 km, and 100 m and observed that disparities were significantly underestimated when using coarse spatial resolutions. This reinforces the need for high-spatial-resolution exposure surfaces for environmental justice analyses.
Affiliation(s)
- Jad Zalzal
- Department of Civil & Mineral Engineering, University of Toronto, 35 St George Street, Toronto, Ontario M5S 1A4, Canada
- Laura Minet
- Department of Civil Engineering, University of Victoria, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
- Jeffrey Brook
- Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, Ontario M5T 3M7, Canada
- Cristian Mihele
- Air Quality Research Division, Environment and Climate Change Canada, 4905 Dufferin Street, North York, Ontario M3H 5T4, Canada
- Hong Chen
- Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, Ontario M5T 3M7, Canada
- Environmental Health Science and Research Bureau, Health Canada, 50 Colombine Driveway, Ottawa, Ontario K1A 0K9, Canada
- Public Health Ontario, 480 University Avenue, Toronto, Ontario M5G 1V2, Canada
- ICES, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada
- Marianne Hatzopoulou
- Department of Civil & Mineral Engineering, University of Toronto, 35 St George Street, Toronto, Ontario M5S 1A4, Canada
6
Mohammadi Dashtaki N, Mirahmadizadeh A, Fararouei M, Mohammadi Dashtaki R, Hoseini M, Nayeb MR. The Lag-Effects of Air Pollutants and Meteorological Factors on COVID-19 Infection Transmission and Severity: Using Machine Learning Techniques. J Res Health Sci 2024; 24:e00622. [PMID: 39311105 PMCID: PMC11380733 DOI: 10.34172/jrhs.2024.157] [Received: 01/22/2024] [Revised: 02/12/2024] [Accepted: 05/20/2024] [Indexed: 09/27/2024] Open
Abstract
BACKGROUND: Exposure to air pollution is a major health problem worldwide. This study aimed to investigate the effect of the level of air pollutants and meteorological parameters, with their related lag times, on the transmission and severity of coronavirus disease 2019 (COVID-19) using machine learning (ML) techniques in Shiraz, Iran. Study Design: An ecological study. METHODS: In this ecological research, three main ML techniques, including decision trees, random forest, and extreme gradient boosting (XGBoost), were applied to correlate meteorological parameters and air pollutants with infection transmission, hospitalization, and death due to COVID-19 from 1 October 2020 to 1 March 2022. These parameters and pollutants included particulate matter (PM2), sulfur dioxide (SO2), nitrogen dioxide (NO2), nitric oxide (NO), ozone (O3), carbon monoxide (CO), temperature (T), relative humidity (RH), dew point (DP), air pressure (AP), and wind speed (WS). RESULTS: Based on the three ML techniques, NO2 (lag 5 days), CO (lag 4), and T (lag 25) were the most important environmental features affecting the spread of COVID-19 infection. In addition, the most important features contributing to hospitalization due to COVID-19 included RH (lag 28), T (lag 11), and O3 (lag 10). After adjusting for the number of infections, the most important features affecting the number of deaths caused by COVID-19 were NO2 (lag 20), O3 (lag 22), and NO (lag 23). CONCLUSION: Our findings suggest that, for epidemics caused by COVID-19 and (possibly) similarly transmitted viral infections, including flu, air pollutants and meteorological parameters can be used to predict the burden on the community and health system. In addition, meteorological and air quality data should be included in preventive measures.
Affiliation(s)
- Alireza Mirahmadizadeh
- Non-communicable Diseases Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Mohammad Fararouei
- AIDS/HIV Research Center, School of Public Health, Shiraz University of Medical Sciences, Shiraz, Iran
- Mohammad Hoseini
- Department of Environmental Health Engineering, School of Health, Shiraz University of Medical Sciences, Shiraz, Iran
- Mohammad Reza Nayeb
- Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
7
Lim B, Song W. Exploring CrossFit performance prediction and analysis via extensive data and machine learning. J Sports Med Phys Fitness 2024; 64:640-649. [PMID: 38916087 DOI: 10.23736/s0022-4707.24.15786-6] [Indexed: 06/26/2024]
Abstract
BACKGROUND: The analysis of athletic performance has always aroused great interest among sport scientists. This study utilized machine learning methods to build predictive models using a comprehensive CrossFit (CF) dataset, aiming to reveal valuable insights into the factors influencing performance and emerging trends. METHODS: Random forest (RF) and multiple linear regression (MLR) were employed to predict performance in four key weightlifting exercises within CF: clean and jerk, snatch, back squat, and deadlift. Performance was evaluated using R-squared (R2) values and mean squared error (MSE). Feature importance analysis was conducted using RF, XGBoost, and AdaBoost models. RESULTS: The RF model excelled in deadlift performance prediction (R2=0.80), while the MLR model demonstrated remarkable accuracy in clean and jerk (R2=0.93). Across exercises, clean and jerk consistently emerged as a crucial predictor. The feature importance analysis revealed intricate relationships among exercises, with gender significantly impacting deadlift performance. CONCLUSIONS: This research advances our understanding of performance prediction in CF through machine learning techniques. It provides actionable insights for practitioners to optimize performance and demonstrates the potential for future advancements in data-driven sports analytics.
Affiliation(s)
- Byunggul Lim
- Health and Exercise Science Laboratory, Department of Physical Education, Seoul National University, Seoul, South Korea
- Institute on Aging, Seoul National University, Seoul, South Korea
- Wook Song
- Health and Exercise Science Laboratory, Department of Physical Education, Seoul National University, Seoul, South Korea
- Institute on Aging, Seoul National University, Seoul, South Korea
- Institute of Sport Science, Seoul National University, Seoul, South Korea
8
Venkatraman Jagatha J, Schneider C, Sauter T. Parsimonious Random-Forest-Based Land-Use Regression Model Using Particulate Matter Sensors in Berlin, Germany. SENSORS (BASEL, SWITZERLAND) 2024; 24:4193. [PMID: 39000970 PMCID: PMC11244214 DOI: 10.3390/s24134193] [Received: 04/16/2024] [Revised: 06/07/2024] [Accepted: 06/21/2024] [Indexed: 07/16/2024]
Abstract
Machine learning (ML) methods are widely used in particulate matter prediction modelling, especially through use of air quality sensor data. Despite their advantages, these methods' black-box nature obscures the understanding of how a prediction has been made. Major issues with these types of models include the data quality and computational intensity. In this study, we employed feature selection methods using recursive feature elimination and global sensitivity analysis for a random-forest (RF)-based land-use regression model developed for the city of Berlin, Germany. Land-use-based predictors, including local climate zones, leaf area index, daily traffic volume, population density, building types, building heights, and street types, were used to create a baseline RF model. Five additional models, three using the recursive feature elimination method and two using a Sobol-based global sensitivity analysis (GSA), were implemented, and their performance was compared against that of the baseline RF model. The predictors that had a large effect on the prediction, as determined using both methods, are discussed. Through feature elimination, the number of predictors was reduced from 220 in the baseline model to eight in the parsimonious models without sacrificing model performance. The model metrics were compared, which showed that the GSA_parsimonious model performs better than the baseline model and reduces the mean absolute error (MAE) from 8.69 µg/m3 to 3.6 µg/m3 and the root mean squared error (RMSE) from 9.86 µg/m3 to 4.23 µg/m3 when applying the trained model to reference station data. The better performance of the GSA_parsimonious model is made possible by the curtailment of the uncertainties propagated through the model via the reduction of multicollinear and redundant predictors. The parsimonious model validated against reference stations was able to predict the PM2.5 concentrations with an MAE of less than 5 µg/m3 for 10 out of 12 locations.
The GSA_parsimonious model performed best in all model metrics and improved the R2 from 3% in the baseline model to 17%. However, the predictions exhibited a degree of uncertainty, making the model unreliable for regional-scale modelling. The GSA_parsimonious model can nevertheless be adapted to local scales to highlight the land-use parameters that are indicative of PM2.5 concentrations in Berlin. Overall, population density, leaf area index, and traffic volume are the major predictors of PM2.5, while building type and local climate zones are the less significant predictors. Feature selection based on sensitivity analysis has a large impact on the model performance. Optimising models through sensitivity analysis can enhance the interpretability of the model dynamics and potentially reduce computational costs and time when modelling is performed for larger areas.
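The comparison above rests on two error metrics, MAE and RMSE. A minimal sketch of both is given below; the observed and predicted concentrations are hypothetical values chosen for illustration and are not data from the Berlin reference stations.

```python
import math
from typing import List


def mean_absolute_error(observed: List[float], predicted: List[float]) -> float:
    """Average magnitude of the prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed: List[float], predicted: List[float]) -> float:
    """Square root of the mean squared error; penalises large errors more than MAE."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


# Hypothetical PM2.5 concentrations in µg/m3.
observed_pm25 = [10.0, 12.0, 15.0, 20.0]
predicted_pm25 = [12.0, 11.0, 15.0, 16.0]

mae = mean_absolute_error(observed_pm25, predicted_pm25)
rmse = root_mean_squared_error(observed_pm25, predicted_pm25)
print(f"MAE = {mae:.2f} µg/m3, RMSE = {rmse:.2f} µg/m3")
```

RMSE is never smaller than MAE on the same errors, and the gap widens when a few errors are much larger than the rest, so reporting both gives a rough view of how uneven the errors are.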
Affiliation(s)
- Christoph Schneider
- Geography Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
- Tobias Sauter
- Geography Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
9
Jitkajornwanich K, Vijaranakul N, Jaiyen S, Srestasathiern P, Lawawirojwong S. Enhancing risk communication and environmental crisis management through satellite imagery and AI for air quality index estimation. MethodsX 2024; 12:102611. [PMID: 38420115 PMCID: PMC10901142 DOI: 10.1016/j.mex.2024.102611] [Received: 03/07/2023] [Accepted: 02/10/2024] [Indexed: 03/02/2024] Open
Abstract
Due to climate change, the air pollution problem has become more and more prominent [23]. Air pollution affects people globally and is considered one of the leading risk factors for premature death worldwide; it was ranked as number 4 according to the website [24]. A study, 'The Global Burden of Disease,' reported that 4,506,193 deaths were caused by outdoor air pollution in 2019 [22,25]. The air pollution problem becomes even more apparent in developing countries [22], including Thailand [26]. In this research, we focus on and analyze air pollution in Thailand, where the annual average PM2.5 (particulate matter 2.5) concentration falls between 15 and 25, classified as interim target 2 by the 2021 WHO AQG (World Health Organization's Air Quality Guidelines) [27]. (The interim targets refer to areas where the air pollutant concentration is high, with 1 being the highest concentration and decreasing down to 4 [27,28].) However, the methodology proposed here can also be adopted in other areas. During the winter in Thailand, Bangkok and its surrounding metroplex face air pollution (e.g., PM2.5) every year. Currently, air quality is measured simply by installing physical air quality measurement devices at a designated but limited number of locations. In this work, we propose a method that allows us to estimate the Air Quality Index (AQI) on a larger scale by utilizing Landsat 8 images with machine learning techniques. We propose and compare hybrid models with pure regression models to enhance AQI prediction based on satellite images.
Our hybrid model and the pure regressor model differ as follows:
- The hybrid model consists of two parts, the classification part and the estimation part, whereas the pure regressor model consists of only one part, which is a pure regression model for AQI estimation.
- The two parts of the hybrid model work hand in hand: the classification part classifies data points into each class of air quality standard, and the result is then passed to the estimation part to estimate the final AQI.
From our experiments, after considering all factors and comparing their performances, we conclude that the hybrid model performs slightly better than the pure regressor model, although both models can generally achieve a minimum R2 above 0.7. We also introduced and tested an additional factor, DOY (day of year), and incorporated it into our model. Additional experiments with similar approaches were also performed and compared, and the results show that our hybrid model outperforms them. Keywords: climate change, air pollution, air quality assessment, air quality index, AQI, machine learning, AI, Landsat 8, satellite imagery analysis, environmental data analysis, natural disaster monitoring and management, crisis and disaster management and communication.
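The classify-then-estimate structure described above can be sketched as a two-stage estimator. Everything in this sketch is an assumption made for illustration: a single hypothetical scalar feature stands in for the satellite-derived inputs, the class labels and values are invented, and a nearest-centroid classifier with per-class least-squares lines replaces whatever classifier and regressor the authors actually used.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


class HybridEstimator:
    """Stage 1 assigns an air quality class; stage 2 applies that class's regressor."""

    def fit(self, xs: List[float], ys: List[float], labels: List[str]) -> "HybridEstimator":
        grouped: Dict[str, Tuple[List[float], List[float]]] = {}
        for x, y, label in zip(xs, ys, labels):
            grouped.setdefault(label, ([], []))
            grouped[label][0].append(x)
            grouped[label][1].append(y)
        # Classification part: one feature centroid per class.
        self.centroids = {label: sum(gx) / len(gx) for label, (gx, _) in grouped.items()}
        # Estimation part: one regression line per class.
        self.lines = {label: fit_line(gx, gy) for label, (gx, gy) in grouped.items()}
        return self

    def classify(self, x: float) -> str:
        return min(self.centroids, key=lambda label: abs(self.centroids[label] - x))

    def predict(self, x: float) -> float:
        slope, intercept = self.lines[self.classify(x)]
        return slope * x + intercept


# Hypothetical training data: feature value, AQI, and air quality class.
features = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
aqi = [10.0, 20.0, 30.0, 150.0, 160.0, 170.0]
classes = ["good", "good", "good", "poor", "poor", "poor"]

hybrid = HybridEstimator().fit(features, aqi, classes)
print(hybrid.classify(2.5), hybrid.predict(2.5))
print(hybrid.classify(10.5), hybrid.predict(10.5))
```

A pure regressor would fit a single function across all samples; the hybrid form lets each class have its own relationship between the inputs and AQI, at the cost of propagating any misclassification from the first stage into the estimate.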
Affiliation(s)
- Kulsawasd Jitkajornwanich
- Department of Computer Science, School of Science, King Mongkut's Institute of Technology Ladkrabang (KMITL), Bangkok 10520, Thailand
- Nattadet Vijaranakul
- College of Media and Communication, Texas Tech University, Lubbock, TX 79409, USA
- Saichon Jaiyen
- School of Information Technology, King Mongkut's University of Technology Thonburi (KMUTT), Bangkok 10140, Thailand
- Panu Srestasathiern
- Geo-Informatics and Space Technology Development Agency, GISTDA (Public Organization), Bangkok 10210, Thailand
- Siam Lawawirojwong
- Geo-Informatics and Space Technology Development Agency, GISTDA (Public Organization), Bangkok 10210, Thailand
10
Xia Y, McCracken T, Liu T, Chen P, Metcalf A, Fan C. Understanding the Disparities of PM2.5 Air Pollution in Urban Areas via Deep Support Vector Regression. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:8404-8416. [PMID: 38698567 DOI: 10.1021/acs.est.3c09177] [Indexed: 05/05/2024]
Abstract
In densely populated urban areas, PM2.5 has a direct impact on residents' health and quality of life. Thus, understanding the disparities of PM2.5 is crucial for ensuring urban sustainability and public health. Traditional prediction models often overlook the spillover effects within urban areas and the complexity of the data, leading to inaccurate spatial predictions of PM2.5. We propose Deep Support Vector Regression (DSVR), which models the urban area as a graph, with grid center points as the nodes and the connections between grids as the edges. Natural and human activity features of each grid are used to initialize the representation of each node. Based on the graph, DSVR uses random diffusion-based deep learning to quantify the spillover effects of PM2.5. It leverages random walks to uncover more extensive spillover relationships between nodes, thereby capturing both the local and nonlocal spillover effects of PM2.5. It then performs predictive learning using the feature vectors that encapsulate spillover effects, enhancing the understanding of PM2.5 disparities and connections across different regions. By applying our proposed model in the northern region of New York for predictive performance analysis, we found that DSVR consistently outperforms other models. During periods of PM2.5 surges, the R-squared of DSVR reaches as high as 0.729, outperforming non-spillover models by 2.5 to 5.7 times and traditional spatial metric models by 2.2 to 4.6 times. Therefore, our proposed model holds significant importance for understanding disparities of PM2.5 air pollution in urban areas, taking the first steps toward a new method that considers both the spillover effects and the nonlinear features of the data for prediction.
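The idea of using random walks on a grid graph to reach beyond immediate neighbours can be sketched as follows. This is only an illustration of the random-walk step under stated assumptions: the 4-connected grid, the walk length, the number of walks, and the helper names are hypothetical, and the sketch does not reproduce DSVR's diffusion-based learning or its regression stage.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def grid_graph(rows: int, cols: int) -> Dict[Node, List[Node]]:
    """Grid cells as nodes, with edges between horizontally or vertically adjacent cells."""
    graph: Dict[Node, List[Node]] = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    neighbours.append((nr, nc))
            graph[(r, c)] = neighbours
    return graph


def random_walk(graph: Dict[Node, List[Node]], start: Node, length: int,
                rng: random.Random) -> List[Node]:
    """Walk `length` steps, moving to a uniformly chosen neighbour each time."""
    path = [start]
    for _ in range(length):
        path.append(rng.choice(graph[path[-1]]))
    return path


def visit_frequencies(graph: Dict[Node, List[Node]], start: Node, walks: int,
                      length: int, seed: int = 0) -> Dict[Node, float]:
    """Share of walk steps landing on each node; a rough proxy for spillover reach."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(walks):
        counts.update(random_walk(graph, start, length, rng)[1:])
    total = sum(counts.values())
    return {node: count / total for node, count in counts.items()}


city = grid_graph(3, 3)
reach = visit_frequencies(city, start=(1, 1), walks=200, length=4)
for node, share in sorted(reach.items(), key=lambda item: -item[1])[:3]:
    print(node, round(share, 3))
```

Nodes that are not directly adjacent to the start cell still receive a non-zero share of visits, which is the sense in which walks can expose nonlocal relationships that a fixed adjacency matrix would miss.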
Affiliation(s)
- Yuling Xia
- School of Mathematics, Southwest Jiaotong University, Chengdu, Sichuan Province 611756, China
- Teague McCracken
- School of Civil and Environmental Engineering and Earth Sciences, Clemson University, Clemson, South Carolina 29634, United States
- Tong Liu
- School of Civil and Environmental Engineering and Earth Sciences, Clemson University, Clemson, South Carolina 29634, United States
- Pei Chen
- Department of Computer Science and Engineering, Texas A&M University, College Station, Texas 77843, United States
- Andrew Metcalf
- School of Civil and Environmental Engineering and Earth Sciences, Clemson University, Clemson, South Carolina 29634, United States
- Chao Fan
- School of Civil and Environmental Engineering and Earth Sciences, Clemson University, Clemson, South Carolina 29634, United States
|
11
|
Zhao S, Chen K, Xiong B, Guo C, Dang Z. Prediction of adsorption of metal cations by clay minerals using machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 924:171733. [PMID: 38492590 DOI: 10.1016/j.scitotenv.2024.171733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/24/2024] [Accepted: 03/13/2024] [Indexed: 03/18/2024]
Abstract
Adsorption of heavy metals by clay minerals occurs widely at the solid-liquid interface in natural environments. In this paper, the adsorption of Cd2+, Cu2+, Pb2+, Zn2+, Ni2+ and Co2+ by montmorillonite, kaolinite and illite was simulated using machine learning. We first used six machine learning models, namely Random Forest (R), Extremely Forest (E), Gradient Boosting Decision Tree (G), Extreme Gradient Boosting (X), Light Gradient Boosting (LGB) and Category Boosting (CAT), for feature engineering of the metal cations and the mineral parameters. Based on the feature engineering results, we selected the first-order hydrolysis constant (log K), solubility product constant (SPC), and higher hydrolysis constant (HHC) as the descriptors of the metal cations, and site density (SD) and cation exchange capacity (CEC) as the descriptors of the clay minerals. After comparing the predictive performance of different data cleaning methods (pH50 method, Box method and pH50-Box method) and six model combinations, we concluded that the best simulation results were achieved by using the pH50-Box method for data cleaning and Extreme Gradient Boosting for modelling (RMSE = 4.158%, R2 = 0.977). Finally, the model was interpreted using Shapley additive explanations (SHAP) plots and partial dependence plots (PDP) to analyse the potential connection between each input variable and the output. This study combines machine learning with geochemical analysis of the mechanism of heavy metal adsorption by clay minerals, providing a research perspective different from the traditional surface complexation model.
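As a rough illustration of the partial dependence idea mentioned in this abstract, the sketch below computes a one-dimensional partial dependence curve for an arbitrary prediction function. The `toy_model` function is a hypothetical stand-in for a fitted regressor, and the (pH, CEC) feature layout and sample values are invented for the example; none of it comes from the paper.

```python
def partial_dependence(predict, rows, feature_idx, grid):
    """For each grid value, overwrite one feature in every row and average the predictions."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)           # copy so the input data is left untouched
            modified[feature_idx] = value
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve

def toy_model(row):
    # Hypothetical stand-in for a trained model: adsorption (%) rises with pH and CEC.
    ph, cec = row
    return min(100.0, 10.0 * ph + 0.5 * cec)

rows = [[4.0, 10.0], [6.0, 20.0], [8.0, 40.0]]  # made-up (pH, CEC) samples
pd_curve = partial_dependence(toy_model, rows, feature_idx=0, grid=[2.0, 5.0, 8.0])
```

Each point on `pd_curve` is the model's average prediction when pH is forced to the grid value while the other inputs keep their observed values, which is how such a curve isolates the marginal effect of one variable.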
Affiliation(s)
- Shoushi Zhao
- School of Environment and Energy, South China University of Technology, Guangzhou 510006, PR China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou 510006, PR China
- Kai Chen
- School of Environment and Energy, South China University of Technology, Guangzhou 510006, PR China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou 510006, PR China
- Beiyi Xiong
- School of Environment and Energy, South China University of Technology, Guangzhou 510006, PR China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou 510006, PR China
- Chuling Guo
- School of Environment and Energy, South China University of Technology, Guangzhou 510006, PR China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou 510006, PR China.
- Zhi Dang
- School of Environment and Energy, South China University of Technology, Guangzhou 510006, PR China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou 510006, PR China
|
12
|
Shi TL, Jia KH, Bao YT, Nie S, Tian XC, Yan XM, Chen ZY, Li ZC, Zhao SW, Ma HY, Zhao Y, Li X, Zhang RG, Guo J, Zhao W, El-Kassaby YA, Müller N, Van de Peer Y, Wang XR, Street NR, Porth I, An X, Mao JF. High-quality genome assembly enables prediction of allele-specific gene expression in hybrid poplar. PLANT PHYSIOLOGY 2024; 195:652-670. [PMID: 38412470 PMCID: PMC11060683 DOI: 10.1093/plphys/kiae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 01/08/2024] [Accepted: 01/09/2024] [Indexed: 02/29/2024]
Abstract
Poplar (Populus) is a well-established model system for tree genomics and molecular breeding, and hybrid poplar is widely used in forest plantations. However, distinguishing its diploid homologous chromosomes is difficult, complicating advanced functional studies on specific alleles. In this study, we applied a trio-binning design and PacBio high-fidelity long-read sequencing to obtain haplotype-phased telomere-to-telomere genome assemblies for the 2 parents of the well-studied F1 hybrid "84K" (Populus alba × Populus tremula var. glandulosa). Almost all chromosomes, including the telomeres and centromeres, were completely assembled for each haplotype subgenome apart from 2 small gaps on one chromosome. By incorporating information from these haplotype assemblies and extensive RNA-seq data, we analyzed gene expression patterns between the 2 subgenomes and alleles. Transcription bias at the subgenome level was not uncovered, but extensive expression differences were detected between alleles. We developed machine-learning (ML) models to predict allele-specific expression (ASE) with high accuracy and identified the underlying genome features that most strongly influence ASE. One of our models with 15 predictor variables achieved 77% accuracy on the training set and 74% accuracy on the testing set. ML models identified gene body CHG methylation, sequence divergence, and transposon occupancy both upstream and downstream of alleles as important factors for ASE. Our haplotype-phased genome assemblies and ML strategy highlight an avenue for functional studies in Populus and provide additional tools for studying ASE and heterosis in hybrids.
Affiliation(s)
- Tian-Le Shi
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Kai-Hua Jia
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Key Laboratory of Crop Genetic Improvement & Ecology and Physiology, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Ji’nan 250100, China
- Yu-Tao Bao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Shuai Nie
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, China
- Xue-Chan Tian
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Xue-Mei Yan
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Zhao-Yang Chen
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Zhi-Chao Li
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Shi-Wei Zhao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Hai-Yao Ma
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Ye Zhao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Xiang Li
- School of Agriculture, Ningxia University, Yinchuan 750021, China
- Ren-Gang Zhang
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
- Jing Guo
- College of Forestry, Shandong Agricultural University, Tai’an 271000, China
- Wei Zhao
- Umeå Plant Science Centre, Department of Ecology and Environmental Science, Umeå University, SE-901 87 Umeå, Sweden
- Yousry Aly El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Niels Müller
- Thünen-Institute of Forest Genetics, 22927 Grosshansdorf, Germany
- Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Centre for Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa
- College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing 210095, China
- Xiao-Ru Wang
- Umeå Plant Science Centre, Department of Ecology and Environmental Science, Umeå University, SE-901 87 Umeå, Sweden
- Nathaniel Robert Street
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, SE-901 87 Umeå, Sweden
- Ilga Porth
- Département des Sciences du Bois et de la Forêt, Faculté de Foresterie, de Géographie et Géomatique, Université Laval, Québec, QC G1V 0A6, Canada
- Xinmin An
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Jian-Feng Mao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, SE-901 87 Umeå, Sweden
|
13
|
Wood DA. Trend-attribute forecasting of hourly PM2.5 trends in fifteen cities of Central England applying optimized machine learning feature selection. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 356:120561. [PMID: 38479290 DOI: 10.1016/j.jenvman.2024.120561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 02/18/2024] [Accepted: 03/05/2024] [Indexed: 04/07/2024]
Abstract
Recorded particulate matter (PM2.5) hourly trends are compared for fifteen urban recording sites distributed across central England for the period 2018 to 2022. They include ten urban-background and five urban-traffic (roadside) sites, with some located within the same urban area. The sites all show consistent background and peak distributions, with mean annual values and standard deviations higher for 2018 and 2019 than for 2020 to 2022. The objective of this study is to demonstrate that trend attributes extracted from hourly recorded univariate PM2.5 trends at these sites can be used to provide reliable short-term hourly predictions and valuable insight into the regional variations in the recorded trends. Fifteen trend attributes extracted from the prior 12 h (t-1 to t-12) of recorded PM2.5 data were compiled and used as input to four supervised machine learning (SML) models to forecast PM2.5 concentrations up to 13 h ahead (t0 to t+12). All recording sites delivered forecasts with similar ranges of error levels for specific hours ahead, which are consistent with their PM2.5 recorded ranges. Forecasting results for four representative sites are presented in detail using models trained and cross-validated with 2020 and 2021 hourly data to forecast 2021 and 2022 hourly data, respectively. A novel optimized feature selection procedure using a suite of five optimizers is used to improve the efficiency of the forecasting models. The LASSO and support vector regression models generate the best and most generalizable hourly PM2.5 forecasts from trained and validated SML models, with mean absolute error (MAE) of between ∼1 and ∼3 μg/m3 for t0 to t+3 h ahead. A novel overfitting indicator, exploiting the cross-validation mean values, demonstrates that these two models are not affected by overfitting. Forecasts for t+6 to t+12 h forward generate higher MAE values between ∼3 and ∼4 μg/m3 due to their tendency to underestimate some of the extreme PM2.5 peaks. These findings indicate that further model refinements are required to generate more reliable short-term predictions for t+6 to t+24 h ahead.
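To make the idea of trend attributes concrete, the following sketch summarises a 12-hour window of readings with a handful of statistics and also shows how MAE is computed. The abstract does not list the fifteen attributes, so the six used here (mean, min, max, last value, last difference, least-squares slope) are illustrative assumptions, as are the function names and sample values.

```python
def trend_attributes(window):
    """Summarise the prior hours (oldest first) with a few hand-picked attributes."""
    n = len(window)
    mean = sum(window) / n
    # Least-squares slope of concentration against hour index.
    x_mean = (n - 1) / 2
    num = sum((i - x_mean) * (v - mean) for i, v in enumerate(window))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return {
        "mean": mean,
        "min": min(window),
        "max": max(window),
        "last": window[-1],
        "last_diff": window[-1] - window[-2],
        "slope": num / den,
    }

def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up readings in micrograms per cubic metre for hours t-12 .. t-1.
prior_12h = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
attrs = trend_attributes(prior_12h)
```

A dictionary like `attrs` would form one input row for a regressor, with the concentration at a chosen hour ahead as the target.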
|
14
|
Ma Z, Wang B, Luo W, Jiang J, Liu D, Wei H, Luo H. Air pollutant prediction model based on transfer learning two-stage attention mechanism. Sci Rep 2024; 14:7385. [PMID: 38548823 PMCID: PMC10978953 DOI: 10.1038/s41598-024-57784-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 03/21/2024] [Indexed: 04/01/2024] Open
Abstract
Atmospheric pollution significantly impacts the regional economy and human health, and its prediction has received increasing attention. The performance of traditional prediction methods is limited by the lack of historical data at new atmospheric monitoring sites. Therefore, this paper proposes a two-stage attention mechanism model based on transfer learning (TL-AdaBiGRU). The first stage of the model uses a temporal distribution characterization algorithm to segment the air pollutant sequences into periods and introduces a temporal attention mechanism that assigns self-learned weights to the period segments in order to select the essential period features. In the second stage, a multi-head external attention mechanism is introduced to mine the key features of the network's hidden layer. Finally, the knowledge learned by the model at the source-domain site is transferred to the new site to improve prediction capability there. The results show that (1) the model is built from a data-distribution perspective and mines in depth the critical information within the sequence of period segments; (2) the model employs a two-stage attention mechanism to capture complex nonlinear relationships in air pollutant data; and (3) compared with existing models, the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) of the model decreased by 14%, 13%, and 4%, respectively, and the prediction accuracy was greatly improved.
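The temporal attention step, in which weights are assigned to period segments, can be sketched generically as a softmax-weighted sum. This is not the TL-AdaBiGRU architecture: in the paper the weights are learned, whereas here the scores are fixed by hand, and the segment vectors and function names are assumptions made for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(segments, scores):
    """Weighted sum of period-segment feature vectors using softmax attention weights."""
    weights = softmax(scores)
    dim = len(segments[0])
    context = [sum(w * seg[d] for w, seg in zip(weights, segments)) for d in range(dim)]
    return weights, context

# Three hypothetical period segments, each summarised by a 2-dimensional feature vector.
segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]  # hand-set relevance scores; a real model would learn these
weights, context = attend(segments, scores)
```

The segment with the highest score dominates `context`, which is the sense in which attention emphasises the most informative periods.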
Affiliation(s)
- Zhanfei Ma
- School of Information Science and Technology, Baotou Teachers' College, Baotou, 014010, Inner Mongolia, China
- School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, Inner Mongolia, China
- Bisheng Wang
- School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, Inner Mongolia, China.
- Wenli Luo
- School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, Inner Mongolia, China
- Jing Jiang
- School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, Inner Mongolia, China
- Dongxiang Liu
- School of Information Science and Technology, Baotou Teachers' College, Baotou, 014010, Inner Mongolia, China
- Hui Wei
- School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, Inner Mongolia, China
- HaoYe Luo
- School of Information Science and Technology, Baotou Teachers' College, Baotou, 014010, Inner Mongolia, China
|
15
|
Ayinde BO, Musa MR, Ayinde AAO. Application of machine learning models and Landsat 8 data for estimating seasonal PM2.5 concentrations. Environ Anal Health Toxicol 2024; 39:e2024011-0. [PMID: 38631403 PMCID: PMC11079408 DOI: 10.5620/eaht.2024011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 03/12/2024] [Indexed: 04/19/2024] Open
Abstract
Air pollution is a significant global challenge that affects many cities. In Europe, Bosnia and Herzegovina (BiH) is among the countries most heavily affected by air pollution. In this study, we integrate open-source Landsat 8 remote sensing products, topographical data, and the limited ground-truth PM2.5 data to spatially predict the air quality level across different seasons in Tuzla Canton, BiH, by adopting three pre-existing machine learning models, namely XGBoost, K-Nearest Neighbour (KNN) and Naive Bayes (NB). These classification models were implemented based on Landsat 8 bands, environmentally derived indices, and topographical variables generated for the study area. Based on the predicted results, the XGBoost model exhibited the highest overall accuracy across all seasons. The predicted model results were used to generate spatial air quality maps. Based on the classification maps, the PM2.5 air quality level predicted for Tuzla Canton in the winter season is very unhealthy. The findings indicate that PM2.5 air quality in Tuzla Canton is relatively unsatisfactory and requires urgent intervention by the government to prevent further deterioration of air quality in Tuzla and other affected cantons in BiH.
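Of the three classifiers named in this abstract, K-Nearest Neighbour is simple enough to sketch in a few lines. The example below is a generic KNN majority vote, not the study's pipeline: the two-dimensional feature vectors, the class labels, and the choice of `k=3` are invented for illustration.

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up, already-normalised features per pixel (for example a band value and an index).
train_x = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]
train_y = ["good", "good", "good", "unhealthy", "unhealthy", "unhealthy"]
label = knn_predict(train_x, train_y, [0.82, 0.88])
```

Applying such a classifier to every pixel's feature vector is one way a per-season air quality class map could be produced from raster inputs.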
|
16
|
Brahimi N, Zhang H, Zaidi SDA, Dai L. A Unified Spatio-Temporal Inference Network for Car-Sharing Serial Prediction. SENSORS (BASEL, SWITZERLAND) 2024; 24:1266. [PMID: 38400424 PMCID: PMC10892602 DOI: 10.3390/s24041266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 02/07/2024] [Accepted: 02/14/2024] [Indexed: 02/25/2024]
Abstract
Car-sharing systems require accurate demand prediction to ensure efficient resource allocation and scheduling decisions. However, developing precise predictive models for vehicle demand remains a challenging problem due to the complex spatio-temporal relationships. This paper introduces USTIN, the Unified Spatio-Temporal Inference Prediction Network, a novel neural network architecture for demand prediction. The model consists of three key components: a temporal feature unit, a spatial feature unit, and a spatio-temporal feature unit. The temporal unit utilizes historical demand data and comprises four layers, each corresponding to a different time scale (hourly, daily, weekly, and monthly). Meanwhile, the spatial unit incorporates contextual points of interest data to capture geographic demand factors around parking stations. Additionally, the spatio-temporal unit incorporates weather data to model the meteorological impacts across locations and time. We conducted extensive experiments on real-world car-sharing data. The proposed USTIN model demonstrated its ability to effectively learn intricate temporal, spatial, and spatiotemporal relationships, and outperformed existing state-of-the-art approaches. Moreover, we employed negative binomial regression with uncertainty to identify the most influential factors affecting car usage.
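The four time scales used by the temporal unit (hourly, daily, weekly, monthly) suggest inputs drawn from several look-back distances. The sketch below only shows how such multi-scale lagged inputs might be assembled from an hourly demand series; it says nothing about the USTIN layers themselves. Treating a month as 30 days, and the names `HOURS_PER_SCALE` and `multi_scale_lags`, are assumptions.

```python
# Look-back distance in hours for each scale; the 30-day month is an assumption.
HOURS_PER_SCALE = {"hourly": 1, "daily": 24, "weekly": 24 * 7, "monthly": 24 * 30}

def multi_scale_lags(demand, t):
    """Return demand observed one hour, day, week, and roughly one month before hour t."""
    feats = {}
    for scale, lag in HOURS_PER_SCALE.items():
        idx = t - lag
        # None marks a lag that reaches before the start of the record.
        feats[scale] = demand[idx] if idx >= 0 else None
    return feats

# Synthetic hourly series in which the value equals the hour index, for easy checking.
demand = list(range(24 * 31))
lags = multi_scale_lags(demand, t=24 * 30 + 5)
```

Because the synthetic series stores the hour index itself, each entry in `lags` directly shows which past hour was picked for that scale.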
Affiliation(s)
- Huaping Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; (N.B.); (S.D.A.Z.); (L.D.)
|
17
|
Guastavino S, Piana M, Benvenuto F. Bad and Good Errors: Value-Weighted Skill Scores in Deep Ensemble Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:1993-2002. [PMID: 35776819 DOI: 10.1109/tnnls.2022.3186068] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Forecast verification is a crucial task for assessing the predictive power of prognostic model forecasts and it is usually implemented by checking quality-based skill scores. In this article, we propose a novel approach to realize forecast verification focusing not just on the forecast quality but rather on its value. Specifically, we introduce a strategy for assessing the severity of forecast errors based on the evidence that, on the one hand, a false alarm just anticipating an occurring event is better than one in the middle of consecutive nonoccurring events, and that, on the other hand, a miss of an isolated event has a worse impact than a miss of a single event, which is part of several consecutive occurrences. Relying on this idea, we introduce a notion of value-weighted skill scores giving greater importance to the value of the prediction rather than to its quality. Then, we introduce an ensemble strategy to maximize quality-based and value-weighted skill scores independently of one another. We test it on the predictions provided by deep learning methods for binary classification in the case of four applications concerned with pollution, space weather, stock price, and IoT data stream forecasting. Our experimental studies show that using the ensemble strategy for maximizing the value-weighted skill scores generally improves both the value and quality of the forecast.
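The distinction between "bad" and "good" errors can be illustrated with a toy weighting scheme. The sketch below is not the paper's scoring rule: the weights (0.5 and 2.0), the one-step look-ahead used to decide whether a false alarm anticipates an event, and the use of the critical success index are all assumptions chosen to mirror the two intuitions stated in the abstract.

```python
def weighted_errors(pred, obs, near_fp=0.5, far_fp=2.0, isolated_fn=2.0, clustered_fn=0.5):
    """Return TP, TN, weighted FP, and weighted FN for binary sequences (hypothetical weights)."""
    n = len(obs)
    tp = tn = 0
    fp = fn = 0.0
    for t in range(n):
        if pred[t] == 1 and obs[t] == 1:
            tp += 1
        elif pred[t] == 0 and obs[t] == 0:
            tn += 1
        elif pred[t] == 1:
            # False alarm: milder if an event really occurs at the next step.
            event_soon = t + 1 < n and obs[t + 1] == 1
            fp += near_fp if event_soon else far_fp
        else:
            # Miss: milder if the missed event sits next to other observed events.
            has_neighbor = (t > 0 and obs[t - 1] == 1) or (t + 1 < n and obs[t + 1] == 1)
            fn += clustered_fn if has_neighbor else isolated_fn
    return tp, tn, fp, fn

def csi(tp, fp, fn):
    """Critical success index; with weighted fp/fn it becomes a value-weighted variant."""
    return tp / (tp + fp + fn)

obs = [0, 0, 1, 1, 0, 0, 1, 0]
pred_good = [0, 1, 1, 0, 0, 0, 1, 0]  # early alarm, and a miss inside a run of events
pred_bad = [1, 0, 1, 1, 0, 0, 0, 0]   # alarm far from any event, and a missed isolated event

tp_g, tn_g, fp_g, fn_g = weighted_errors(pred_good, obs)
tp_b, tn_b, fp_b, fn_b = weighted_errors(pred_bad, obs)
value_good = csi(tp_g, fp_g, fn_g)
value_bad = csi(tp_b, fp_b, fn_b)
```

Both forecasts contain exactly one false alarm and one miss, so an unweighted score rates them equally, while the weighted variant prefers `pred_good`.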
|
18
|
Verma A, Ranga V, Vishwakarma DK. A novel approach for forecasting PM2.5 pollution in Delhi using CATALYST. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:1457. [PMID: 37950817 DOI: 10.1007/s10661-023-12020-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 10/23/2023] [Indexed: 11/13/2023]
Abstract
Air pollution is one of the main environmental issues in densely populated urban areas like Delhi. Predictions of the PM2.5 concentration must be accurate for pollution reduction strategies and policy actions to succeed. This research article presents a novel approach for forecasting PM2.5 pollution in Delhi by combining a pre-trained CNN model with a transformer-based model called CATALYST (Convolutional and Transformer model for Air Quality Forecasting). To derive features from the PM2.5 time series, a pre-existing CNN model is used to transform the data into visual representations, which are subsequently analyzed. The CATALYST model is trained to predict future PM2.5 pollution levels using a sliding-window training approach on the extracted features. The model is used to analyze temporal dependencies in PM2.5 time-series data. It builds on advances in the transformer architecture, which was initially designed for natural language processing applications. CATALYST combines positional encoding with the Transformer architecture to capture intricate patterns and variations resulting from diverse meteorological, geographical, and anthropogenic factors. In addition, an innovative approach is proposed for building input-output pairs, intended to address the problem of missing or partial data in environmental time-series datasets while ensuring that all training data blocks are complete. We evaluate the proposed CATALYST model on a PM2.5 dataset and compare its performance with other standard time-series forecasting approaches, such as ARIMA and LSTM. The experimental results demonstrate that the proposed model performs better than conventional methods and is a potential strategy for accurately forecasting PM2.5 pollution. The applicability of CATALYST to real-world scenarios can be tested by running more experiments on real-world datasets. This can help develop efficient pollution mitigation measures, impacting public health and environmental sustainability.
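One simple way to build sliding-window input-output pairs while keeping every training block complete is to skip any window that contains a gap. The abstract does not describe the authors' exact procedure, so the sketch below is an assumed reading: missing readings are represented as `None`, and the window sizes and function name are hypothetical.

```python
def complete_windows(series, n_in, n_out):
    """Slide over the series and keep only input/output pairs with no missing (None) values."""
    pairs = []
    span = n_in + n_out
    for start in range(len(series) - span + 1):
        block = series[start:start + span]
        if any(v is None for v in block):
            continue  # drop windows that touch a gap so every block is complete
        pairs.append((block[:n_in], block[n_in:]))
    return pairs

# Made-up PM2.5 readings with one missing value.
pm25 = [35.0, 40.0, 42.0, None, 50.0, 55.0, 53.0, 60.0]
pairs = complete_windows(pm25, n_in=2, n_out=1)
```

With one missing reading, three of the six candidate windows survive. The trade-off is that data adjacent to gaps is discarded rather than imputed.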
Affiliation(s)
- Abhishek Verma
- Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Bawana Road, Delhi 110042, India
- Virender Ranga
- Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Bawana Road, Delhi 110042, India
- Dinesh Kumar Vishwakarma
- Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Bawana Road, Delhi 110042, India
|
19
|
Kieu HT, Pak HY, Trinh HL, Pang DSC, Khoo E, Law AWK. UAV-based remote sensing of turbidity in coastal environment for regulatory monitoring and assessment. MARINE POLLUTION BULLETIN 2023; 196:115482. [PMID: 37864857 DOI: 10.1016/j.marpolbul.2023.115482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 08/30/2023] [Accepted: 09/01/2023] [Indexed: 10/23/2023]
Abstract
The adoption of Unmanned Aerial Vehicle (UAV) remote sensing for the regulatory monitoring of turbidity plumes induced by land reclamation operations remains a difficult task. Compared to UAV remote sensing of ambient turbidity in estuaries and rivers, such monitoring of construction-induced turbidity plumes requires significantly higher spatial resolution and accuracy, as well as wider turbidity ranges with nonlinear reflectance. In this study, a pilot-scale deployment of UAV-based hyperspectral sensing is carried out for this objective, with specific new elements developed to overcome the challenges and minimise the uncertainties involved. In particular, machine learning (ML) models for turbidity determination were trained on the large dataset collected, to better capture the nonlinearity of the relationship between the water-leaving reflectance and the turbidity level. The models achieve an R2 score of 0.75, which is deemed acceptable in view of the uncertainties associated with construction and land reclamation work.
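For readers unfamiliar with the R2 score quoted here, the sketch below computes the coefficient of determination from observed and predicted values. The turbidity numbers are invented for the example and do not come from the study.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up turbidity observations and model predictions.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
score = r2_score(observed, predicted)
```

A score of 1 means the predictions match the observations exactly, while a score of 0 means the model does no better than always predicting the observed mean.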
Affiliation(s)
- Hieu Trung Kieu
- Environmental Process Modelling Centre, Nanyang Environment and Water Research Institute, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Hui Ying Pak
- Environmental Process Modelling Centre, Nanyang Environment and Water Research Institute, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore; Interdisciplinary Graduate Programme, Graduate College, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Ha Linh Trinh
- Environmental Process Modelling Centre, Nanyang Environment and Water Research Institute, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
- Dawn Sok Cheng Pang
- Environmental Process Modelling Centre, Nanyang Environment and Water Research Institute, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
| | - Eugene Khoo
- Engineering and Project Management Division, Maritime and Port Authority of Singapore, Singapore 119963, Singapore
| | - Adrian Wing-Keung Law
- Environmental Process Modelling Centre, Nanyang Environment and Water Research Institute, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore; School of Civil and Environmental Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore.
| |
Collapse
|
20
|
Morapedi TD, Obagbuwa IC. Air pollution particulate matter (PM2.5) prediction in South African cities using machine learning techniques. Front Artif Intell 2023; 6:1230087. [PMID: 37881653 PMCID: PMC10595005 DOI: 10.3389/frai.2023.1230087] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/28/2023] [Accepted: 09/04/2023] [Indexed: 10/27/2023] Open
Abstract
Background: Air pollution contributes to the most severe environmental and health problems due to industrial emissions and atmosphere contamination, produced by climate and traffic factors, fossil fuel combustion, and industrial characteristics. Because this is a global issue, several nations have established control of air pollution stations in various cities to monitor pollutants like Nitrogen Dioxide (NO2), Ozone (O3), Sulfur Dioxide (SO2), Carbon Monoxide (CO), Particulate Matter (PM2.5, PM10), to notify inhabitants when pollution levels surpass the quality threshold. With the rise in air pollution, it is necessary to construct models to capture data on air pollutant concentrations. Compared to other parts of the world, Africa has a scarcity of reliable air quality sensors for monitoring and predicting Particulate Matter (PM2.5). This demonstrates the possibility of extending research in air pollution control.
Methods: Machine learning techniques were utilized in this study to identify air pollution in terms of time, cost, and efficiency so that different scenarios and systems may select the optimal way for their needs. To assess and forecast the behavior of Particulate Matter (PM2.5), this study presented a Machine Learning approach that includes Cat Boost Regressor, Extreme Gradient Boosting Regressor, Random Forest Classifier, Logistic Regression, Support Vector Machine, K-Nearest Neighbor, and Decision Tree.
Results: Cat Boost Regressor and Extreme Gradient Boosting Regressor were implemented to predict the latest PM2.5 concentrations for South African Cities with recording stations using past dated recordings, then the best performing model between the two is used to predict PM2.5 concentrations for South African Cities with no recording stations and also to predict future PM2.5 concentrations for South African Cities. K-Nearest Neighbor, Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest Classifier were implemented to create a system predicting the Air Quality Index (AQI) Status.
Conclusion: This study investigated various machine learning techniques for air pollution to analyze and predict air pollution behavior regarding air quality and air pollutants, detecting which areas are most affected in South African cities.
Affiliation(s)
- Ibidun Christiana Obagbuwa
  - Department of Computer Science and Information Technology, School of Natural and Applied Sciences, Sol Plaatje University, Kimberley, South Africa

21
Guo Q, Zhang H, Zhang Y, Jiang X. Prediction of PM2.5 concentration based on the CEEMDAN-RLMD-BiLSTM-LEC model. PeerJ 2023; 11:e15931. [PMID: 37663301 PMCID: PMC10470446 DOI: 10.7717/peerj.15931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 07/30/2023] [Indexed: 09/05/2023] Open
Abstract
Air quality has emerged as a critical concern in recent years, with the concentration of PM2.5 recognized as a vital index for assessing it. The accuracy of predicting PM2.5 concentrations holds significant value for effective air quality monitoring and management. In response to this, a combined model comprising CEEMDAN-RLMD-BiLSTM-LEC has been introduced, analyzed, and compared against various other models. The combined decomposition method effectively underlines the fundamental characteristics of the data compared to individual decomposition techniques. Additionally, local error correction (LEC) efficiently addresses the issue of prediction errors induced by excessive disturbances. The empirical results of nine steps indicate that the combined CEEMDAN-RLMD-BiLSTM-LEC model outperforms single prediction models such as RLMD and CEEMDAN, reducing MAE, RMSE, and SMAPE by 36.16%, 28.63%, 45.27% and 16.31%, 6.15%, 37.76%, respectively. Moreover, the inclusion of LEC in the model further diminishes MAE, RMSE, and SMAPE by 20.69%, 7.15%, and 44.65%, respectively, exhibiting commendable performance in generalization experiments. These findings demonstrate that the combined CEEMDAN-RLMD-BiLSTM-LEC model offers high predictive accuracy and robustness, effectively handling noisy data predictions and severe local variations. With its wide applicability, this model emerges as a potent tool for addressing various related challenges in the field.
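The error metrics named in this abstract (MAE, RMSE, SMAPE) and the idea of a percentage reduction relative to a baseline can be made concrete with a short sketch. The Python below is illustrative only: the function names are hypothetical, the SMAPE formula is one common definition (the paper's exact variant is not stated here), and the sample numbers are invented for the example rather than taken from the study.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def smape(obs, pred):
    """Symmetric MAPE in percent (assumed definition: mean of 2|o-p| / (|o|+|p|)).

    Assumes no pair has both values equal to zero.
    """
    terms = (2 * abs(o - p) / (abs(o) + abs(p)) for o, p in zip(obs, pred))
    return 100.0 * sum(terms) / len(obs)

def percent_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline

# Invented sample values, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(mae(observed, predicted), rmse(observed, predicted), smape(observed, predicted))
```

A statement such as "MAE reduced by 20.69%" corresponds to `percent_reduction(mae_baseline, mae_new)` under this reading.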
Affiliation(s)
- Qiao Guo
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Haoyu Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Yuhao Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Xuchu Jiang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China

22
Zhen Y, Wang L, Sun H, Liu C. Prediction of microplastic abundance in surface water of the ocean and influencing factors based on ensemble learning. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2023; 331:121834. [PMID: 37209894 DOI: 10.1016/j.envpol.2023.121834] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/01/2023] [Revised: 04/18/2023] [Accepted: 05/13/2023] [Indexed: 05/22/2023]
Abstract
Microplastics are regarded as emergent contaminants posing a serious threat to the marine ecosystem. It is time-consuming and labor-intensive to determine the number of microplastics in different seas using traditional sampling and detection methods. Machine learning can provide a promising tool for prediction, but there is a lack of research on this. To screen high-performance models for the prediction of microplastic abundance in the marine surface water and explore the influencing factors, three ensemble learning models, random forest (RF), gradient boosted decision tree (GBDT), and extreme gradient boosting (XGBoost), were developed and compared. A total of 1169 samples were collected, and multi-classification prediction models were constructed with 16 features of the data as inputs and six classes of microplastic abundance intervals as outputs. Our results show that the XGBoost model has the best performance of prediction, with a total accuracy rate of 0.719 and an ROC AUC (Receiver Operating Characteristic curve, Area Under Curve) value of 0.914. Seawater phosphate (PHOS) and seawater temperature (TEMP) have negative effects on the abundance of microplastics in surface seawater, while the distance between the sampling point and the coast (DIS), wind stress (WS), human development index (HDI), and sampling latitude (LAT) have positive effects. This work not only predicts the abundance of microplastics in different seas but also offers a framework for the use of machine learning in the study of marine microplastics.
Affiliation(s)
- Yu Zhen
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China
- Lei Wang
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China
- Hongwen Sun
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China
- Chunguang Liu
  - Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education), Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300350, China

23
Lu Y, Li K. Multistation collaborative prediction of air pollutants based on the CNN-BiLSTM model. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:92417-92435. [PMID: 37490250 DOI: 10.1007/s11356-023-28877-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 07/16/2023] [Indexed: 07/26/2023]
Abstract
The development of industry has led to serious air pollution problems. It is very important to establish high-precision and high-performance air quality prediction models and take corresponding control measures. In this paper, based on 4 years of air quality and meteorological data from Tianjin, China, the relationships between various meteorological factors and air pollutant concentrations are analyzed. A hybrid deep learning model consisting of a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) is proposed to predict pollutant concentrations. In addition, a Bayesian optimization algorithm is applied to obtain the optimal combination of hyperparameters for the proposed deep learning model, which enhances the generalization ability of the model. Furthermore, based on air quality data from multiple stations in the region, a multistation collaborative prediction method is designed, and the concept of a strongly correlated station (SCS) is defined. The predictive model is modified using the idea of SCS and is used to predict the pollutant concentration in Tianjin. The coefficients of determination (R2) for PM2.5, PM10, SO2, NO2, CO, and O3 are 0.89, 0.84, 0.69, 0.83, 0.92, and 0.84, respectively. The results show that our model is capable of predicting air pollutant concentrations with satisfactory accuracy.
Affiliation(s)
- Yanan Lu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, 200433, China
- Kun Li
  - School of Economics and Management, Tiangong University, Tianjin, 300387, China

24
Zhang Y, Wu W, Li Y, Li Y. An investigation of PM2.5 concentration changes in Mid-Eastern China before and after COVID-19 outbreak. ENVIRONMENT INTERNATIONAL 2023; 175:107941. [PMID: 37146469 PMCID: PMC10119641 DOI: 10.1016/j.envint.2023.107941] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/25/2023] [Revised: 03/24/2023] [Accepted: 04/17/2023] [Indexed: 05/07/2023]
Abstract
With the Chinese government revising ambient air quality standards and strengthening the monitoring and management of pollutants such as PM2.5, the concentrations of air pollutants in China have gradually decreased in recent years. Meanwhile, the strong control measures taken by the Chinese government in response to COVID-19 in 2020 had a profound impact on the reduction of pollutants in China. Investigating pollutant concentration changes in China before and after the COVID-19 outbreak is therefore necessary, but the number of monitoring stations is very limited, making it difficult to conduct a high spatial density investigation. In this study, we construct a modern deep learning model based on multi-source data, which includes remotely sensed AOD data products, other reanalysis element data, and ground monitoring station data. Combining satellite remote sensing techniques, we realize a high spatial density method for investigating PM2.5 concentration changes, and analyze the seasonal and annual, spatial and temporal characteristics of PM2.5 concentrations in Mid-Eastern China from 2016 to 2021 and the impact of epidemic closure and control measures on regional and provincial PM2.5 concentrations. We find that PM2.5 concentrations in Mid-Eastern China during these years are mainly characterized by "north-south superiority and central inferiority"; seasonal differences are evident, with the highest concentrations in winter, the second highest in autumn and the lowest in summer, and there is a gradual decrease in overall concentration during the year. According to our experimental results, the annual average PM2.5 concentration decreases by 3.07 % in 2020, and decreases by 24.53 % during the shutdown period, which is probably caused by China's epidemic control measures. At the same time, some provinces with a large share of secondary industry see PM2.5 concentrations drop by more than 30 %. By 2021, PM2.5 concentrations rebound slightly, rising by 10 % in most provinces.
Affiliation(s)
- Yongjun Zhang
  - School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
- Wenpin Wu
  - School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
- Yiliang Li
  - School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
- Yansheng Li
  - School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China

25
Spatio-temporal air quality analysis and PM2.5 prediction over Hyderabad City, India using artificial intelligence techniques. ECOL INFORM 2023. [DOI: 10.1016/j.ecoinf.2023.102067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]

26
Islam ARMT, Al Awadh M, Mallick J, Pal SC, Chakraborty R, Fattah MA, Ghose B, Kakoli MKA, Islam MA, Naqvi HR, Bilal M, Elbeltagi A. Estimating ground-level PM2.5 using subset regression model and machine learning algorithms in Asian megacity, Dhaka, Bangladesh. AIR QUALITY, ATMOSPHERE, & HEALTH 2023; 16:1117-1139. [PMID: 37303964 PMCID: PMC9961308 DOI: 10.1007/s11869-023-01329-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/19/2022] [Accepted: 02/16/2023] [Indexed: 06/13/2023]
Abstract
Fine particulate matter (PM2.5) has become a prominent pollutant due to rapid economic development, urbanization, industrialization, and transport activities, and it has serious adverse effects on human health and the environment. Many studies have employed traditional statistical models and remote-sensing technologies to estimate PM2.5 concentrations. However, statistical models have shown inconsistency in PM2.5 concentration predictions, while machine learning algorithms have excellent predictive capacity, but little research has been done on the complementary advantages of diverse approaches. The present study proposed the best subset regression model and machine learning approaches, including random tree, additive regression, reduced error pruning tree, and random subspace, to estimate the ground-level PM2.5 concentrations over Dhaka. This study used advanced machine learning algorithms to measure the effects of meteorological factors and air pollutants (NOX, SO2, CO, and O3) on the dynamics of PM2.5 in Dhaka from 2012 to 2020. Results showed that the best subset regression model performed well in forecasting PM2.5 concentrations for all sites based on the integration of precipitation, relative humidity, temperature, wind speed, SO2, NOX, and O3. Precipitation, relative humidity, and temperature have negative correlations with PM2.5. The concentration levels of pollutants are much higher at the beginning and end of the year. Random subspace is the optimal model for estimating PM2.5 because it has the lowest statistical error metrics compared to other models. This study suggests ensemble learning models to estimate PM2.5 concentrations. This study will help quantify ground-level PM2.5 concentration exposure and recommend regional government actions to prevent and regulate PM2.5 air pollution. Supplementary Information: The online version contains supplementary material available at 10.1007/s11869-023-01329-w.
Affiliation(s)
- Mohammed Al Awadh
  - Department of Industrial Engineering, College of Engineering, King Khalid University, Abha, 61421 Saudi Arabia
- Javed Mallick
  - Department of Civil Engineering, King Khalid University, Abha, Saudi Arabia
- Subodh Chandra Pal
  - Department of Geography, The University of Burdwan, Bardhaman, West Bengal 713104 India
- Rabin Chakraborty
  - Department of Geography, The University of Burdwan, Bardhaman, West Bengal 713104 India
- Md. Abdul Fattah
  - Department of Urban and Regional Planning, Khulna University of Engineering and Technology, Khulna, Bangladesh
- Bonosri Ghose
  - Department of Disaster Management, Begum Rokeya University, Rangpur, Rangpur, 5400 Bangladesh
- Md. Aminul Islam
  - Department of Disaster Management, Begum Rokeya University, Rangpur, Rangpur, 5400 Bangladesh
- Hasan Raja Naqvi
  - Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia (A Central University), New Delhi, 110025 India
- Muhammad Bilal
  - School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, 45003 China
- Ahmed Elbeltagi
  - Agricultural Engineering Dept., Faculty of Agriculture, Mansoura University, Mansoura, 35516 Egypt

27
Cha GW, Choi SH, Hong WH, Park CW. Developing a Prediction Model of Demolition-Waste Generation-Rate via Principal Component Analysis. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:3159. [PMID: 36833851 PMCID: PMC9968033 DOI: 10.3390/ijerph20043159] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/13/2023] [Revised: 02/08/2023] [Accepted: 02/09/2023] [Indexed: 06/18/2023]
Abstract
Construction and demolition waste accounts for a sizable proportion of global waste and is harmful to the environment. Its management is therefore a key challenge in the construction industry. Many researchers have utilized waste generation data for waste management, and more accurate and efficient waste management plans have recently been prepared using artificial intelligence models. Here, we developed a hybrid model to forecast the demolition-waste-generation rate in redevelopment areas in South Korea by combining principal component analysis (PCA) with decision tree, k-nearest neighbors, and linear regression algorithms. Without PCA, the decision tree model exhibited the highest predictive performance (R2 = 0.872) and the k-nearest neighbors (Chebyshev distance) model exhibited the lowest (R2 = 0.627). The hybrid PCA-k-nearest neighbors (Euclidean uniform) model exhibited significantly better predictive performance (R2 = 0.897) than the non-hybrid k-nearest neighbors (Euclidean uniform) model (R2 = 0.664) and the decision tree model. The mean of the observed values, k-nearest neighbors (Euclidean uniform) and PCA-k-nearest neighbors (Euclidean uniform) models were 987.06 (kg·m-2), 993.54 (kg·m-2) and 991.80 (kg·m-2), respectively. Based on these findings, we propose the k-nearest neighbors (Euclidean uniform) model using PCA as a machine-learning model for demolition-waste-generation rate predictions.
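The k-nearest neighbors variants compared in this abstract differ in their distance metric (Euclidean versus Chebyshev) while using uniform weights. The following is a minimal pure-Python sketch of that idea, not the authors' implementation: the function names and the toy data are hypothetical, and the PCA preprocessing step of the hybrid model is deliberately left out.

```python
def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def chebyshev(a, b):
    """Largest absolute difference along any single feature."""
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Uniform-weight k-NN regression: average the targets of the k closest samples."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy data (invented): the two metrics disagree on which sample is closest to the origin.
X = [(3.0, 3.0), (4.0, 0.0)]
y = [1.0, 2.0]
print(knn_predict(X, y, (0.0, 0.0), k=1, distance=euclidean))  # picks (4, 0)
print(knn_predict(X, y, (0.0, 0.0), k=1, distance=chebyshev))  # picks (3, 3)
```

In the hybrid setting described above, the feature vectors passed to such a predictor would be principal-component scores rather than the raw inputs.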
Affiliation(s)
- Gi-Wook Cha
  - School of Science and Technology Acceleration Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
- Se-Hyu Choi
  - School of Architectural, Civil, Environmental and Energy Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
- Won-Hwa Hong
  - School of Architectural, Civil, Environmental and Energy Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
- Choon-Wook Park
  - Industry Academic Cooperation Foundation, Kyungpook National University, Daegu 41566, Republic of Korea

28
Bagheri H. Using deep ensemble forest for high-resolution mapping of PM2.5 from MODIS MAIAC AOD in Tehran, Iran. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:377. [PMID: 36757448 DOI: 10.1007/s10661-023-10951-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 01/20/2023] [Indexed: 06/18/2023]
Abstract
High-resolution mapping of PM2.5 concentration over Tehran city is challenging because of the complicated behavior of numerous sources of pollution and the insufficient number of ground air quality monitoring stations. Alternatively, high-resolution satellite Aerosol Optical Depth (AOD) data can be employed for high-resolution mapping of PM2.5. For this purpose, different data-driven methods have been used in the literature. Recently, deep learning methods have demonstrated their ability to estimate PM2.5 from AOD data. However, these methods have several weaknesses in solving the problem of estimating PM2.5 from satellite AOD data. In this paper, the potential of the deep ensemble forest method for estimating the PM2.5 concentration from AOD data was evaluated. The results showed that the deep ensemble forest method with [Formula: see text] gives a higher accuracy of PM2.5 estimation than deep learning methods ([Formula: see text]) as well as classic data-driven methods such as random forest ([Formula: see text]). Additionally, the estimated values of PM2.5 using the deep ensemble forest algorithm were used along with ground data to generate a high-resolution map of PM2.5. Evaluation of produced PM2.5 map revealed the good performance of the deep ensemble forest for modeling the variation of PM2.5 in the city of Tehran.
Affiliation(s)
- Hossein Bagheri
  - Faculty of Civil Engineering and Transportation, University of Isfahan, Azadi Square, Isfahan, 8174673441, Iran

29
Hikouei IS, Eshleman KN, Saharjo BH, Graham LLB, Applegate G, Cochrane MA. Using machine learning algorithms to predict groundwater levels in Indonesian tropical peatlands. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 857:159701. [PMID: 36306856 DOI: 10.1016/j.scitotenv.2022.159701] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/23/2022] [Revised: 09/12/2022] [Accepted: 10/20/2022] [Indexed: 06/16/2023]
Abstract
Tropical peatlands play a vital role in the global carbon cycle as large carbon reservoirs and substantial carbon sinks. Indonesia possesses the largest share (65 %) of tropical peat carbon, equal to 57.4 Gt C. Human perturbations such as extensive logging, deforestation and canalization exacerbate water losses, especially during dry seasons, when low precipitation and high evapotranspiration rates combine with the increased drainage to lower groundwater levels. Drying and increasing temperatures of the surface peat exacerbate ignition and wildfire risks within the peat soils. As such, it is critically important to know how groundwater levels in peatlands are changing over space and time. In this study, a multilinear regression model as well as two machine learning algorithms, random forest and extreme gradient boosting, were used to model groundwater level over the study period (2010-12) within a peat dome impacted by drainage canals and multiple wildfires in Central Kalimantan, Indonesia. Although all three models performed well, based on overall fit, spatial modeling of groundwater level results revealed that extreme gradient boosting (R2 = 0.998, RMSE = 0.048 m) outperformed random forest (R2 = 0.997, RMSE = 0.054 m) and multilinear regression (R2 = 0.970, RMSE = 0.221 m) near drainage canals, which are key fire ignition risk locations in the peatlands. Our study also shows that, on average, elevation and precipitation are the most important parameters influencing groundwater level spatiotemporally.
Affiliation(s)
- Iman Salehi Hikouei
  - Appalachian Laboratory, University of Maryland Center for Environmental Science, Frostburg, MD, USA
- Keith N Eshleman
  - Appalachian Laboratory, University of Maryland Center for Environmental Science, Frostburg, MD, USA
- Laura L B Graham
  - Borneo Orangutan Survival Foundation, Palangka Raya, Indonesia; Tropical Forests and People Research Centre, University of the Sunshine Coast, Sippy Downs, QLD 4556, Australia
- Grahame Applegate
  - Tropical Forests and People Research Centre, University of the Sunshine Coast, Sippy Downs, QLD 4556, Australia
- Mark A Cochrane
  - Appalachian Laboratory, University of Maryland Center for Environmental Science, Frostburg, MD, USA

30
Fan K, Dhammapala R, Harrington K, Lamb B, Lee Y. Machine learning-based ozone and PM2.5 forecasting: Application to multiple AQS sites in the Pacific Northwest. Front Big Data 2023; 6:1124148. [PMID: 36910164 PMCID: PMC9999009 DOI: 10.3389/fdata.2023.1124148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 02/06/2023] [Indexed: 03/14/2023] Open
Abstract
Air quality in the Pacific Northwest (PNW) of the U.S. has generally been good in recent years, but unhealthy events were observed due to wildfires in summer or wood burning in winter. The current air quality forecasting system, which uses chemical transport models (CTMs), has had difficulty forecasting these unhealthy air quality events in the PNW. We developed a machine learning (ML) based forecasting system, which consists of two components, ML1 (random forest classifiers and multiple linear regression models) and ML2 (two-phase random forest regression model). Our previous study showed that the ML system provides reliable forecasts of O3 at a single monitoring site in Kennewick, WA. In this paper, we expand the ML forecasting system to predict both O3 in the wildfire season and PM2.5 in wildfire and cold seasons at all available monitoring sites in the PNW during 2017-2020, and evaluate our ML forecasts against the existing operational CTM-based forecasts. For O3, both ML1 and ML2 are used to achieve the best forecasts, which was the case in our previous study: ML2 performs better overall (R2 = 0.79), especially for low-O3 events, while ML1 correctly captures more high-O3 events. Compared to the CTM-based forecast, our O3 ML forecasts reduce the normalized mean bias (NMB) from 7.6 to 2.6% and normalized mean error (NME) from 18 to 12% when evaluated against the observations. For PM2.5, ML2 performs the best and thus is used for the final forecasts. Compared to the CTM-based PM2.5, ML2 clearly improves PM2.5 forecasts for both wildfire season (May to September) and cold season (November to February): ML2 reduces NMB (-27 to 7.9% for wildfire season; 3.4 to 2.2% for cold season) and NME (59 to 41% for wildfire season; 67 to 28% for cold season) significantly and captures more high-PM2.5 events correctly. Our ML air quality forecast system requires fewer computing resources and fewer input datasets, yet it provides forecasts that are more reliable than, or at least comparable to, the CTM-based forecast. It demonstrates that our ML system is a low-cost, reliable air quality forecasting system that can support regional/local air quality management.
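The two evaluation statistics quoted in this abstract, normalized mean bias (NMB) and normalized mean error (NME), can be sketched as follows. This uses the definitions commonly applied in air quality model evaluation (summed forecast-minus-observation differences divided by summed observations); the abstract itself does not spell out the formulas, so treat them as an assumption. Function names and sample values are hypothetical.

```python
def normalized_mean_bias(obs, pred):
    """NMB in percent: signed differences summed, divided by the sum of observations."""
    return 100.0 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)

def normalized_mean_error(obs, pred):
    """NME in percent: absolute differences summed, divided by the sum of observations."""
    return 100.0 * sum(abs(p - o) for o, p in zip(obs, pred)) / sum(obs)

# Invented sample values, for illustration only.
observed = [10.0, 20.0, 30.0]
forecast = [12.0, 18.0, 33.0]
print(normalized_mean_bias(observed, forecast))   # positive: forecasts are too high on balance
print(normalized_mean_error(observed, forecast))  # always non-negative
```

Because over- and under-predictions cancel in NMB but not in NME, a forecast can show a small bias alongside a large error, which is why the abstract reports both.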
Affiliation(s)
- Kai Fan
  - Center for Advanced Systems Understanding, Görlitz, Germany; Helmholtz-Zentrum Dresden Rossendorf, Dresden, Germany; Laboratory for Atmospheric Research, Department of Civil and Environmental Engineering, Washington State University, Pullman, WA, United States
- Ranil Dhammapala
  - South Coast Air Quality Management District, Diamond Bar, CA, United States
- Brian Lamb
  - Laboratory for Atmospheric Research, Department of Civil and Environmental Engineering, Washington State University, Pullman, WA, United States
- Yunha Lee
  - Center for Advanced Systems Understanding, Görlitz, Germany; Helmholtz-Zentrum Dresden Rossendorf, Dresden, Germany; Laboratory for Atmospheric Research, Department of Civil and Environmental Engineering, Washington State University, Pullman, WA, United States

31
Karimian H, Li Y, Chen Y, Wang Z. Evaluation of different machine learning approaches and aerosol optical depth in PM2.5 prediction. ENVIRONMENTAL RESEARCH 2023; 216:114465. [PMID: 36241075 DOI: 10.1016/j.envres.2022.114465] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/01/2022] [Revised: 09/11/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
Atmospheric Aerosol Optical Depth (AOD), derived from polar-orbiting satellites, has shown potential in PM2.5 predictions. However, this important source of data suffers from low temporal resolution. Recently, geostationary satellites have begun to provide AOD data at high temporal and spatial resolution. However, the feasibility of these data in PM2.5 prediction needs further study. In this paper, we analyzed the impact of AOD derived from Himawari-8 in PM2.5 predictions. Moreover, by combining wavelet, machine learning techniques, and minimum redundancy maximum relevance (mRMR), a novel hybrid model was proposed. The results showed that the AOD missing rate over the Yangtze River Delta region is the highest in Nanjing, Hefei, and Maanshan. In addition, missing rates are the lowest in winter and summer (∼80%). Moreover, we found that considering AOD as an auxiliary variable in the model could not improve the accuracy of PM2.5 predictions, and in some cases decreased it slightly. In comparison with other models, our proposed hybrid model showed higher prediction accuracy: R2 is improved by 11.64% on average, and root mean square error, mean absolute error, and mean absolute percentage error are reduced by 26.82%, 27.24%, and 29.88%, respectively. This research provides a general overview of the availability of Himawari-8 AOD data and its feasibility in PM2.5 predictions. In addition, it evaluates different machine learning approaches in PM2.5 predictions. Our proposed framework can be used in other regions to predict the concentrations of different air pollutants and can be used as an aid for air pollution control programs.
Collapse
Affiliation(s)
- Hamed Karimian
- School of Civil and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou, 341000, China
| | - Yaqian Li
- School of Civil and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou, 341000, China
| | - Youliang Chen
- School of Civil and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou, 341000, China; School of Geosciences and Info Physics, Central South University, Changsha, China.
| | - Zhaoru Wang
- School of Resources and Environmental Engineering, Jiangxi University of Science and Technology, Ganzhou, 341000, China
| |
|
32
|
Cha GW, Choi SH, Hong WH, Park CW. Development of Machine Learning Model for Prediction of Demolition Waste Generation Rate of Buildings in Redevelopment Areas. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 20:107. [PMID: 36612429 PMCID: PMC9819715 DOI: 10.3390/ijerph20010107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/24/2022] [Revised: 12/14/2022] [Accepted: 12/17/2022] [Indexed: 06/17/2023]
Abstract
Owing to a rapid increase in waste, waste management has become essential, for which waste generation (WG) information has been effectively utilized. Various studies have recently focused on the development of reliable predictive models by applying artificial intelligence to the construction and prediction of WG information. In this study, machine learning (ML) models were developed to predict the demolition waste generation rate (DWGR) of buildings in redevelopment areas in South Korea. Various ML algorithms (i.e., artificial neural network (ANN), K-nearest neighbors (KNN), linear regression (LR), random forest (RF), and support vector machine (SVM)) were applied to the development of an optimal predictive model, and the main hyperparameters (HPs) for each algorithm were optimized. The results suggest that ANN-ReLu (coefficient of determination (R2) 0.900, the ratio of percent deviation (RPD) 3.16), SVM-polynomial (R2 0.889, RPD 3.00), and ANN-logistic (R2 0.883, RPD 2.92) are the best ML models for predicting the DWGR. They showed average errors of 7.3%, 7.4%, and 7.5%, respectively, compared to the average observed values, confirming accurate predictive performance. In the uncertainty analysis, the d-factor of the models was less than 1, showing that the presented models are reliable. Through a comparison with the ML algorithms and HPs applied in previous related studies, the results also showed that the selection of ML algorithms and HPs is important in developing optimal ML models for WG management.
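The abstract reports R2 and RPD side by side without defining RPD. As a sketch only, assuming the common definition of RPD as the standard deviation of the observed values divided by the RMSE (with the population standard deviation), the two metrics are linked by RPD = 1 / sqrt(1 - R2). The function names and sample data below are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Assumed definition: population SD of observations divided by RMSE."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return sd / rmse


def rpd_from_r2(r2):
    """RPD implied by R2 under the assumed definition above."""
    return 1.0 / math.sqrt(1.0 - r2)


# Made-up data to check that both routes give the same RPD.
observed = [10.0, 12.0, 15.0, 20.0, 23.0]
predicted = [11.0, 12.5, 14.0, 19.0, 24.0]
print(rpd(observed, predicted), rpd_from_r2(r_squared(observed, predicted)))

# R2 values quoted in the abstract.
for r2 in (0.900, 0.889, 0.883):
    print(r2, round(rpd_from_r2(r2), 2))
```

Under this assumed definition, the three R2 values in the abstract map to RPD values of 3.16, 3.00, and 2.92, which agrees with the reported figures. A definition based on the sample standard deviation would give slightly different numbers.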
Affiliation(s)
- Gi-Wook Cha
- School of Science and Technology Acceleration Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
| | - Se-Hyu Choi
- School of Architectural, Civil, Environmental and Energy Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
| | - Won-Hwa Hong
- School of Architectural, Civil, Environmental and Energy Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
| | - Choon-Wook Park
- Industry Academic Cooperation Foundation, Kyungpook National University, Daegu 41566, Republic of Korea
| |
|
33
|
A Review on Pollution Treatment in Cement Industrial Areas: From Prevention Techniques to Python-Based Monitoring and Controlling Models. Processes (Basel) 2022. [DOI: 10.3390/pr10122682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Anthropogenic climate change, global warming, environmental pollution, and fossil fuel depletion have been identified as critical current problems and future challenges. Among industries, cement plants are one of the most impactful sources, emitting 15% of worldwide contaminants into the environment. These contaminants adversely affect human well-being, flora, and fauna. Meanwhile, the use of cement-based substances in various fields, such as civil engineering and medical applications, is inevitable due to the continuous growth of population and urbanization. To cope with this challenge, numerous filtering methods, recycling techniques, and modeling approaches have been introduced. Among the various statistical, mathematical, and computational modeling solutions, Python has received tremendous attention because of the benefits of smart libraries, heterogeneous data integration, and meta-models. Python-based models are able to optimize raw material contents and monitor the pollutants released in cement complex outputs with intelligent predictions. Correspondingly, this paper summarizes existing studies to illuminate the emissions from cement complexes, their treatment methods, and the crucial role of Python modeling in the highly efficient production of cement via a green and eco-friendly procedure. This comprehensive review sheds light on applying smart modeling techniques rather than experimental analysis for fundamental and applied research and on developing future opportunities.
|
34
|
Fallah-Shorshani M, Yin X, McConnell R, Fruin S, Franklin M. Estimating traffic noise over a large urban area: An evaluation of methods. ENVIRONMENT INTERNATIONAL 2022; 170:107583. [PMID: 36272254 DOI: 10.1016/j.envint.2022.107583] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Revised: 09/29/2022] [Accepted: 10/11/2022] [Indexed: 06/16/2023]
Abstract
Unlike air pollution, traffic-related noise remains unregulated and has been under-studied despite evidence of its deleterious health impacts. To characterize population exposure to traffic noise, both acoustic-based numerical models and data-driven statistical approaches can generate estimates over large urban areas. The aim of this work is to formally compare the performances of the most common traffic noise models by evaluating their estimates for different categories of roads and validating them against a unique dataset of measured noise in Long Beach, California. Specifically, a statistical land use regression model, an extreme gradient boosting machine learning model (XGB), and three numerical/acoustic traffic noise models: the US Noise Model (FHWA-TNM2.5), a commercial noise model (CadnaA), and an open-source European model (Harmonoise) were optimized and compared. The results demonstrate that XGB and CadnaA were the most effective models for estimating traffic noise, and they are particularly adept at differentiating noise levels on different categories of road.
Affiliation(s)
- Masoud Fallah-Shorshani
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA.
| | - Xiaozhe Yin
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA.
| | - Rob McConnell
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA.
| | - Scott Fruin
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA.
| | - Meredith Franklin
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA; Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.
| |
|
35
|
Tella A, Balogun AL. GIS-based air quality modelling: spatial prediction of PM10 for Selangor State, Malaysia using machine learning algorithms. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:86109-86125. [PMID: 34533750 DOI: 10.1007/s11356-021-16150-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/11/2021] [Accepted: 08/20/2021] [Indexed: 06/13/2023]
Abstract
Rapid urbanization has caused severe deterioration of air quality globally, leading to increased hospitalization and premature deaths. Therefore, accurate prediction of air quality is crucial for mitigation planning to support urban sustainability and resilience. Although some studies have predicted air pollutants such as particulate matter (PM) using machine learning algorithms (MLAs), there is a paucity of studies on spatial hazard assessment with respect to the air quality index (AQI). Incorporating PM in AQI studies is crucial because of its easily inhalable micro-size, which has adverse impacts on ecology, environment, and human health. Accurate and timely prediction of the air quality index can ensure adequate intervention to aid air quality management. Therefore, this study undertakes a spatial hazard assessment of the air quality index using particulate matter with a diameter of 10 μm or less (PM10) in Selangor, Malaysia, by developing four machine learning models: eXtreme Gradient Boosting (XGBoost), random forest (RF), K-nearest neighbour (KNN), and Naive Bayes (NB). Spatially processed data such as NDVI, SAVI, BU, LST, Ws, slope, elevation, and road density were used for the modelling. The model was trained with 70% of the dataset, while 30% was used for cross-validation. Results showed that XGBoost has the highest overall accuracy and precision of 0.989 and 0.995, followed by random forest (0.989, 0.993), K-nearest neighbour (0.987, 0.984), and Naive Bayes (0.917, 0.922), respectively. The spatial air quality maps were generated by integrating the geographical information system (GIS) with the four MLAs, which correlated with Malaysia's air pollution index. The maps indicate that air quality in Selangor is satisfactory and poses no threat to health. Nevertheless, the two algorithms with the best performance (XGBoost and RF) indicate that a high percentage of the air quality is moderate. The study concludes that successful air pollution management policies such as green infrastructure practice, improvement of energy efficiency, and restrictions on heavy-duty vehicles can be adopted in Selangor and other Southeast Asian cities to prevent deterioration of air quality in the future.
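The 70%/30% split and the accuracy and precision scores mentioned in this abstract can be illustrated with a small sketch. It assumes a simple shuffled hold-out split and per-class precision; the abstract does not say how the split was drawn or how precision was averaged across classes, so the function names, class labels, and data below are hypothetical.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle a copy of the samples and return (train, test) at a 70/30 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """Share of samples predicted as `positive` that truly are `positive`."""
    truths_for_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_for_predicted:
        return 0.0
    return sum(t == positive for t in truths_for_predicted) / len(truths_for_predicted)


# Hypothetical grid cells labelled with an air quality class.
cells = list(range(10))
train_cells, test_cells = split_70_30(cells)
print(len(train_cells), len(test_cells))

# Made-up labels and predictions for four held-out cells.
y_true = ["good", "moderate", "good", "good"]
y_pred = ["good", "good", "good", "moderate"]
print(accuracy(y_true, y_pred), precision(y_true, y_pred, "good"))
```

Accuracy counts all correct predictions, whereas precision only looks at the cells assigned to one class, which is why the two scores in the abstract can differ for the same model.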
Affiliation(s)
- Abdulwaheed Tella
- Geospatial Analysis and Modelling (GAM) Research Laboratory, Department of Civil and Environmental Engineering, Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Perak, Malaysia.
| | - Abdul-Lateef Balogun
- Geospatial Analysis and Modelling (GAM) Research Laboratory, Department of Civil and Environmental Engineering, Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Perak, Malaysia
| |
|
36
|
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China.
| | - Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
| | - Huihui Li
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
| | - Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
| | - Jianan Zhang
- MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
| | - Michael S Olsen
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
| | - Rajeev K Varshney
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
| | - Boddupalli M Prasanna
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
| | - Qian Qian
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| |
|
37
|
Deep matrix factorization models for estimation of missing data in a low-cost sensor network to measure air quality. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
38
|
Li J, Kang CM, Wolfson JM, Alahmad B, Al-Hemoud A, Garshick E, Koutrakis P. Estimation of fine particulate matter in an arid area from visibility based on machine learning. JOURNAL OF EXPOSURE SCIENCE & ENVIRONMENTAL EPIDEMIOLOGY 2022; 32:926-931. [PMID: 36151455 PMCID: PMC9742157 DOI: 10.1038/s41370-022-00480-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/24/2022] [Revised: 09/12/2022] [Accepted: 09/13/2022] [Indexed: 05/04/2023]
Abstract
BACKGROUND The absence of air pollution monitoring networks makes it difficult to assess historical fine particulate matter (PM2.5) exposures for countries in areas, such as Kuwait, that are severely impacted by desert dust and anthropogenic pollution. OBJECTIVE We constructed an ensemble machine learning model to predict daily PM2.5 concentrations for regions lacking PM2.5 observations. METHODS The model was constructed based on daily PM2.5, visibility, and other meteorological data collected at two sites in Kuwait. Then, our model was applied to predict daily PM2.5 concentrations for eight airports located in Kuwait and Iraq from 2013 to 2020. RESULTS Compared to traditional statistical models, the proposed machine learning methods improved the accuracy of using visibility to predict daily PM2.5 concentrations, with a cross-validation R2 of 0.68. The predicted daily PM2.5 concentrations were consistent with previous measurements. The predicted average yearly PM2.5 concentration for the eight stations is 50.65 µg/m3. For all stations, the monthly average PM2.5 concentrations reached their maximum in July and their minimum in November. SIGNIFICANCE These findings make it possible to retrospectively estimate daily PM2.5 exposures using the large-scale databases of historical visibility in regions with few particulate matter monitoring stations. IMPACT STATEMENT The scarcity of air pollution ground monitoring networks makes it difficult to assess historical fine particulate matter exposures for countries in arid areas such as Kuwait. Visibility is closely related to atmospheric particulate matter concentrations, and historical airport visibility records are commonly available in most countries. Our model makes it possible to retrospectively estimate daily PM2.5 exposures using the large-scale databases of historical visibility in arid regions with few particulate matter ground monitoring stations. The product of such models can be critical for environmental risk assessments and population health studies.
Affiliation(s)
- Jing Li
- Institute of Child and Adolescent Health, School of Public Health, Peking University, Beijing, 100191, China.
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, 02115, USA.
| | - Choong-Min Kang
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, 02115, USA
| | - Jack M Wolfson
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, 02115, USA
| | - Barrak Alahmad
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, 02115, USA
| | - Ali Al-Hemoud
- Crisis Decision Support Program, Environment and Life Sciences Research Center, Kuwait Institute for Scientific Research, Safat, 13109, Kuwait
| | - Eric Garshick
- Pulmonary, Allergy, Sleep, and Critical Care Medicine Section, Medical Service, VA Boston Healthcare System, Boston, MA, 02132, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
| | - Petros Koutrakis
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, 02115, USA
| |
|
39
|
Xu H, Zhang A, Xu X, Li P, Ji Y. Prediction of Particulate Concentration Based on Correlation Analysis and a Bi-GRU Model. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:13266. [PMID: 36293843 PMCID: PMC9603264 DOI: 10.3390/ijerph192013266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Revised: 10/08/2022] [Accepted: 10/11/2022] [Indexed: 06/16/2023]
Abstract
In recent decades, particulate pollution in the air has caused severe health problems. Therefore, accurately predicting particulate concentrations has become a hot research topic. Particle concentration has a strong spatial-temporal correlation due to pollution transport between regions, making it important to understand how to utilize these features to predict particulate concentration. In this paper, Pearson Correlation Coefficients (PCCs) are used to compare the particle concentrations at the target site with those at other locations. Models based on bi-directional gated recurrent units (Bi-GRUs) and PCCs are proposed to predict particle concentrations. The proposed model has the advantage of requiring fewer samples and can forecast particulate concentrations in real time within the next six hours. As a final step, the model is tested on hourly pollutant concentrations from several Beijing air quality monitoring stations. Based on the correlation analysis and the proposed prediction model, the prediction error within the first six hours is smaller than those of the other three models. The model can help environmental researchers improve the prediction accuracy of fine particle concentrations and can give environmental policymakers tools for implementing relevant pollution control policies. With the correlation analysis between the target site and adjacent sites, an accurate pollution control decision can be made based on the internal relationship.
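The correlation step described in this abstract can be sketched as follows. This is an illustration only: it computes the Pearson correlation coefficient between a target site and each neighbouring site and keeps the most correlated ones. The selection rule (top k by coefficient), the site names, and the concentration values are assumptions, since the abstract does not give them.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_sites(target, neighbours, top_k):
    """Return the names of the top_k neighbours most correlated with the target."""
    scores = {name: pearson(target, series) for name, series in neighbours.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Made-up hourly concentrations at a target site and three hypothetical neighbours.
target = [30.0, 45.0, 60.0, 50.0, 40.0]
neighbours = {
    "site_a": [28.0, 44.0, 63.0, 52.0, 39.0],
    "site_b": [60.0, 50.0, 40.0, 45.0, 55.0],
    "site_c": [10.0, 10.0, 12.0, 11.0, 10.0],
}
print(rank_sites(target, neighbours, top_k=2))
```

The series from the selected sites could then be stacked with the target series as inputs to a recurrent model such as a Bi-GRU; that modelling step is outside this sketch.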
Affiliation(s)
- He Xu
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing 210003, China
| | - Aosheng Zhang
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing 210003, China
| | - Xin Xu
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing 210003, China
| | - Peng Li
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing 210003, China
| | - Yimu Ji
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing 210003, China
| |
|
40
|
Childs ML, Li J, Wen J, Heft-Neal S, Driscoll A, Wang S, Gould CF, Qiu M, Burney J, Burke M. Daily Local-Level Estimates of Ambient Wildfire Smoke PM 2.5 for the Contiguous US. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:13607-13621. [PMID: 36134580 DOI: 10.1021/acs.est.2c02934] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Smoke from wildfires is a growing health risk across the US. Understanding the spatial and temporal patterns of such exposure and its population health impacts requires separating smoke-driven pollutants from non-smoke pollutants and a long time series to quantify patterns and measure health impacts. We develop a parsimonious and accurate machine learning model of daily wildfire-driven PM2.5 concentrations using a combination of ground, satellite, and reanalysis data sources that are easy to update. We apply our model across the contiguous US from 2006 to 2020, generating daily estimates of smoke PM2.5 over a 10 km-by-10 km grid and use these data to characterize levels and trends in smoke PM2.5. Smoke contributions to daily PM2.5 concentrations have increased by up to 5 μg/m3 in the Western US over the last decade, reversing decades of policy-driven improvements in overall air quality, with concentrations growing fastest for higher income populations and predominantly Hispanic populations. The number of people in locations with at least 1 day of smoke PM2.5 above 100 μg/m3 per year has increased 27-fold over the last decade, including nearly 25 million people in 2020 alone. Our data set can bolster efforts to comprehensively understand the drivers and societal impacts of trends and extremes in wildfire smoke.
Affiliation(s)
- Marissa L Childs
- Emmett Interdisciplinary Program in Environment and Resources, Stanford University, Stanford, California 94305, United States
| | - Jessica Li
- Center on Food Security and the Environment, Stanford University, Stanford, California 94305, United States
| | - Jeffrey Wen
- Department of Earth System Science, Stanford University, Stanford, California 94305, United States
| | - Sam Heft-Neal
- Center on Food Security and the Environment, Stanford University, Stanford, California 94305, United States
| | - Anne Driscoll
- Center on Food Security and the Environment, Stanford University, Stanford, California 94305, United States
| | - Sherrie Wang
- Goldman School of Public Policy, UC Berkeley, Berkeley, California 94720, United States
| | - Carlos F Gould
- Department of Earth System Science, Stanford University, Stanford, California 94305, United States
| | - Minghao Qiu
- Department of Earth System Science, Stanford University, Stanford, California 94305, United States
| | - Jennifer Burney
- Global Policy School, UC San Diego, San Diego, California 92093, United States
| | - Marshall Burke
- Center on Food Security and the Environment, Stanford University, Stanford, California 94305, United States
- Department of Earth System Science, Stanford University, Stanford, California 94305, United States
- National Bureau of Economic Research, Cambridge, Massachusetts 02138, United States
| |
|
41
|
Abstract
The knowledge of tree species distribution at a national scale provides benefits for forest management practices and decision making for site-adapted tree species selection. An accurate assignment of tree species in relation to their location allows conclusions about potential resilience or vulnerability to biotic and abiotic factors. Identifying areas at risk helps the long-term strategy of forest conversion towards a natural, diverse, and climate-resilient forest. In the framework of the national forest inventory (NFI) in Germany, data on forest tree species are collected in sample plots, but there is a lack of a full coverage map of the tree species distribution. The NFI data were used to train and test a machine-learning approach that classifies a dense Sentinel-2 time series with the result of a dominant tree species map of German forests with seven main tree species classes. The test of the model’s accuracy for the forest type classification showed a weighted average F1-score for deciduous tree species (Beech, Oak, Larch, and Other Broadleaf) between 0.77 and 0.91 and for non-deciduous tree species (Spruce, Pine, and Douglas fir) between 0.85 and 0.94. Two additional plausibility checks with independent forest stand inventories and statistics from the NFI show conclusive agreement. The results are provided to the public via a web-based interactive map, in order to initiate a broad discussion about the potential and limitations of satellite-supported forest management.
|
42
|
Khan N, Kamaruddin MA, Ullah Sheikh U, Zawawi MH, Yusup Y, Bakht MP, Mohamed Noor N. Prediction of Oil Palm Yield Using Machine Learning in the Perspective of Fluctuating Weather and Soil Moisture Conditions: Evaluation of a Generic Workflow. PLANTS (BASEL, SWITZERLAND) 2022; 11:1697. [PMID: 35807648 PMCID: PMC9268852 DOI: 10.3390/plants11131697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 06/20/2022] [Accepted: 06/24/2022] [Indexed: 11/19/2022]
Abstract
Recent developments in precision agriculture have underscored the role of machine learning in crop yield prediction. Machine learning algorithms are capable of learning linear and nonlinear patterns in complex agro-meteorological data. However, the application of machine learning methods for predictive analysis is lacking in the oil palm industry. This work evaluated a supervised machine learning approach to develop an explainable and reusable oil palm yield prediction workflow. The input data included 12 weather and three soil moisture parameters along with 420 months of actual yield records of the study site. Multisource data and conventional machine learning techniques were coupled with an automated model selection process. The performance of two top regression models, namely Extra Tree and AdaBoost, was evaluated using six statistical evaluation metrics. The prediction was followed by data preprocessing and feature selection. Selected regression models were compared with Random Forest, Gradient Boosting, Decision Tree, and other non-tree algorithms to prove the R2-driven performance superiority of tree-based ensemble models. In addition, the learning process of the models was examined using model-based feature importance, learning curve, validation curve, residual analysis, and prediction error. Results indicated that rainfall frequency, root-zone soil moisture, and temperature could make a significant impact on oil palm yield. The most influential features that contributed to the prediction process are rainfall, cloud amount, number of rain days, wind speed, and root zone soil wetness. It is concluded that machine learning has great potential for predicting oil palm yield using weather and soil moisture data.
Affiliation(s)
- Nuzhat Khan
- School of Industrial Technology, Universiti Sains Malaysia, Gelugor 11800, Malaysia; (N.K.); (Y.Y.)
| | - Mohamad Anuar Kamaruddin
- School of Industrial Technology, Universiti Sains Malaysia, Gelugor 11800, Malaysia; (N.K.); (Y.Y.)
| | - Usman Ullah Sheikh
- School of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia;
| | - Mohd Hafiz Zawawi
- Department of Civil Engineering, Universiti Tenaga Nasional, Kajang 43000, Malaysia
| | - Yusri Yusup
- School of Industrial Technology, Universiti Sains Malaysia, Gelugor 11800, Malaysia; (N.K.); (Y.Y.)
| | - Muhammed Paend Bakht
- School of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia;
- Faculty of Information and Communication Technology, BUITEMS, Quetta 87300, Pakistan
| | - Norazian Mohamed Noor
- Sustainable Environment Research Group (SERG), Centre of Excellence Geopolymer and Green Technology (CEGeoGTech), Faculty of Civil Engineering Technology, Universiti Malaysia Perlis, Arau 01000, Malaysia;
| |
|
43
|
Prediction of Air Pollutant Concentrations via RANDOM Forest Regressor Coupled with Uncertainty Analysis—A Case Study in Ningxia. ATMOSPHERE 2022. [DOI: 10.3390/atmos13060960] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
Air pollution did not receive much attention until recent years, when people started to understand its dreadful impacts on human health. Using air pollution and meteorological monitoring data from 1 January 2016 to 31 December 2017 in Ningxia, we analyzed the impact of ground surface temperature, air temperature, relative humidity and the power of wind on air pollutant concentrations. We also analyzed the relationships between air pollutant concentrations and meteorological variables using a decision tree regressor (DTR), a feedforward artificial neural network with back-propagation algorithm (FFANN-BP) and a random forest regressor (RFR), based on air-monitoring station data. For all pollutants, the RFR improves on the R2 of FFANN-BP and DTR by up to 0.53 and 0.42, respectively, reduces root mean square error (RMSE) by up to 68.7 and 41.2, and reduces MAE by up to 25.2 and 17. The empirical results show that the proposed RFR displays the best forecasting performance and could provide local authorities with reliable and precise predictions of air pollutant concentrations. The RFR effectively establishes the relationships between the influential factors and air pollutant concentrations, suppresses overfitting, and improves the accuracy of prediction. In addition, the limitation of machine learning for single-site prediction is overcome.
|
44
|
Environmental Pollution Analysis and Impact Study-A Case Study for the Salton Sea in California. ATMOSPHERE 2022. [DOI: 10.3390/atmos13060914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
A natural experiment conducted on the shrinking Salton Sea, a saline lake in California, showed that each one-foot drop in lake elevation resulted in a 2.6% average increase in PM2.5 concentrations. The shrinking has caused the asthma rate among children to keep increasing, with one in five children being sent to the emergency department for asthma-related reasons. In this paper, several data-driven machine learning (ML) models are developed for forecasting air quality and dust emission in order to study, evaluate and predict the impacts on human health of the shrinkage of a lake such as the Salton Sea. The paper presents an improved long short-term memory (LSTM) model to predict the hourly air quality (O3 and CO) based on air pollutants and weather data in the previous 5 h. According to our experimental results, the model achieves R2 scores of 0.924 and 0.835 for O3 and CO, respectively. In addition, the paper proposes an ensemble model based on random forest (RF) and gradient boosting (GBoost) algorithms for forecasting hourly PM2.5 and PM10 using the air quality and weather data in the previous 5 h. Furthermore, the paper shares our research results for PM2.5 and PM10 prediction based on the proposed ensemble ML models using satellite remote sensing data. Daily PM2.5 and PM10 concentration maps for 2018 are created to display the regional air pollution density and severity. Finally, the paper reports Artificial Intelligence (AI) based research findings on measuring the impact of air pollution on the asthma prevalence rate of local residents in the Salton Sea region. A stacked ensemble model based on support vector regression (SVR), elastic net regression (ENR), RF and GBoost is developed for asthma prediction, with an R2 score of 0.978.
|
45
|
Xia W, Jiang Y, Chen X, Zhao R. Application of machine learning algorithms in municipal solid waste management: A mini review. WASTE MANAGEMENT & RESEARCH : THE JOURNAL OF THE INTERNATIONAL SOLID WASTES AND PUBLIC CLEANSING ASSOCIATION, ISWA 2022; 40:609-624. [PMID: 34269157 PMCID: PMC9016669 DOI: 10.1177/0734242x211033716] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Indexed: 05/05/2023]
Abstract
Population growth and the acceleration of urbanization have led to a sharp increase in municipal solid waste production, and researchers have sought to use advanced technology to solve this problem. Machine learning (ML) algorithms are good at modeling complex nonlinear processes and have been gradually adopted to promote municipal solid waste management (MSWM) and help the sustainable development of the environment in the past few years. In this study, more than 200 publications published over the last two decades (2000-2020) were reviewed and analyzed. This paper summarizes the application of ML algorithms in the whole process of MSWM, from waste generation to collection and transportation, to final disposal. Through this comprehensive review, the gaps and future directions of ML application in MSWM are discussed, providing theoretical and practical guidance for follow-up related research.
Affiliation(s)
- Wanjun Xia
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan, China
- Library, Southwest Jiaotong University, Chengdu, Sichuan, China
- Wanjun Xia, School of Computing and Artificial Intelligence, Southwest Jiaotong University, West Park of Hi-Tech Zone, Chengdu, Sichuan 611756, China.
| | - Yanping Jiang
- Library, Southwest Jiaotong University, Chengdu, Sichuan, China
| | - Xiaohong Chen
- Library, Southwest Jiaotong University, Chengdu, Sichuan, China
| | - Rui Zhao
- Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, Sichuan, China
| |
|
46
|
Wijnands JS, Nice KA, Seneviratne S, Thompson J, Stevenson M. The impact of the COVID-19 pandemic on air pollution: A global assessment using machine learning techniques. ATMOSPHERIC POLLUTION RESEARCH 2022; 13:101438. [PMID: 35506000 PMCID: PMC9047632 DOI: 10.1016/j.apr.2022.101438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 06/14/2023]
Abstract
In response to the COVID-19 pandemic, most countries implemented public health ordinances that resulted in restricted mobility and a resultant change in air quality. This has provided an opportunity to quantify the extent to which carbon-based transport and industrial activity affect air quality. However, quantification of these complex effects has proven to be difficult, depending on the stringency of restrictions, country-specific emission source profiles, long-term trends and meteorological effects on atmospheric chemistry, emission levels and in-flow from nearby countries. In this study, confounding factors were disentangled for a direct comparison of pandemic-related reductions in absolute pollution levels, globally. The non-linear relationships between atmospheric processes and daily ground-level NO2, PM10, PM2.5 and O3 measurements were captured in city- and pollutant-specific XGBoost models for over 700 cities, adjusting for weather, seasonality and trends. City-level modelling allowed adaptation to the distinct topography, urban morphology, climate and atmospheric conditions for each city, individually, as the weather variables that were most predictive varied across cities. Pollution forecasts for 2020 in the absence of a pandemic were generated based on weather and formed an ensemble for country-level pollution reductions. Findings were robust to modelling assumptions and consistent with various published case studies. NO2 reduced most in China, Europe and India, following severe government restrictions as part of the initial lockdowns. Reductions were highly correlated with changes in mobility levels, especially trips to transit stations, workplaces, retail and recreation venues. Further, NO2 did not fully revert to pre-pandemic levels in 2020. Ambient PM2.5 pollution, which has severe adverse health consequences, reduced most in China and India. Since positive health effects could be offset to some extent by prolonged exposure to indoor pollution, alternative transport initiatives could prove to be an important pathway towards better health outcomes in these countries. Increased O3 levels during initial lockdowns have been documented widely. However, our analyses also found a subsequent reduction in O3 for many countries below what was expected based on meteorological conditions during summer months (e.g., China, United Kingdom, France, Germany, Poland, Turkey). The effects in periods with high O3 levels are especially important for the development of effective mitigation strategies to improve health outcomes.
Affiliation(s)
- Jasper S Wijnands
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
- Royal Netherlands Meteorological Institute (KNMI), 3731 GA De Bilt, The Netherlands
| | - Kerry A Nice
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
| | - Sachith Seneviratne
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
| | - Jason Thompson
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
| | - Mark Stevenson
- Transport, Health and Urban Design Research Lab, Melbourne School of Design, The University of Melbourne, Parkville VIC 3010, Australia
- Faculty of Engineering and Information Technology, The University of Melbourne, Parkville VIC 3010, Australia
- Melbourne School of Population and Global Health, The University of Melbourne, Parkville VIC 3010, Australia
| |
|
47
|
Zhang P, Yang L, Ma W, Wang N, Wen F, Liu Q. Spatiotemporal estimation of the PM2.5 concentration and human health risks combining the three-dimensional landscape pattern index and machine learning methods to optimize land use regression modeling in Shaanxi, China. ENVIRONMENTAL RESEARCH 2022; 208:112759. [PMID: 35077716 DOI: 10.1016/j.envres.2022.112759] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/31/2021] [Revised: 01/05/2022] [Accepted: 01/16/2022] [Indexed: 06/14/2023]
Abstract
PM2.5 pollution endangers human health and urban sustainable development. Land use regression (LUR) is one of the most important methods to reveal the temporal and spatial heterogeneity of PM2.5, and the introduction of characteristic variables of geographical factors and the improvement of model construction methods are important research directions for its optimization. However, the complex non-linear correlation between PM2.5 and influencing indicators is not captured by traditional regression models. The two-dimensional landscape pattern index struggles to reflect real surface information, so the resulting accuracy cannot meet requirements. As such, a novel LUR model (LTX) that integrates a three-dimensional landscape pattern index (TDLPI) with the extreme gradient boosting (XGBOOST) machine learning method is developed to estimate the spatiotemporal heterogeneity in the fine particle concentration in Shaanxi, China, and the health risks of exposure to and inhalation of PM2.5 were explored. The LTX model performed well, with R2 = 0.88, RMSE of 8.73 μg/m3 and MAE of 5.85 μg/m3. Our findings suggest that integrated three-dimensional landscape pattern information and XGBOOST approaches can accurately estimate annual and seasonal variations of PM2.5 pollution. The Guanzhong Plain and northern Shaanxi always feature high PM2.5 values, which exhibit similar distribution trends to those of the observed PM2.5 pollution. This study demonstrated the outstanding performance of the LTX model, which outperforms most models in past research. On the whole, the LTX approach is reliable and can improve the accuracy of pollutant concentration prediction. The health risks of human exposure to fine particles are relatively high in winter. The central part is a high health risk area, while the northern area is a low-risk one. Our study provides a new method for assessing atmospheric pollutants, which is important for LUR model optimization, high-precision PM2.5 pollution prediction and landscape pattern planning. These results can also inform assessments of human health exposure risks and future epidemiological studies of air pollution.
Affiliation(s)
- Ping Zhang
- School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China; Shaanxi Key Laboratory of Land Consolidation, Xi'an, 710075, China.
| | - Lianwei Yang
- School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China
| | - Wenjie Ma
- School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China
| | - Ning Wang
- School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China
| | - Feng Wen
- School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China.
| | - Qi Liu
- School of Environmental and Chemical Engineering, Xi'an Polytechnic University, Xi'an, 710048, China; The First Institute of Photogrammetry and Remote Sensing, MNR, Xi'an, 710054, China.
| |
|
48
|
Estimating Hourly Surface Solar Irradiance from GK2A/AMI Data Using Machine Learning Approach around Korea. REMOTE SENSING 2022. [DOI: 10.3390/rs14081840] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
Surface solar irradiance (SSI) is a crucial component in climatological and agricultural applications. Because the use of renewable energy is crucial, the importance of SSI has increased. In situ measurements are often used to investigate SSI; however, their availability is limited in spatial coverage. To precisely estimate the distribution of SSI with fine spatiotemporal resolutions, we used the GEOstationary Korea Multi-Purpose SATellite 2A (GEO-KOMPSAT 2A, GK2A) equipped with the Advanced Meteorological Imager (AMI). To obtain an optimal model for estimating hourly SSI around Korea using GK2A/AMI, the convolutional neural network (CNN) model as a machine learning (ML) technique was applied. Through statistical verification, CNN showed a high accuracy, with a root mean square error (RMSE) of 0.180 MJ m−2, a bias of −0.007 MJ m−2, and a Pearson’s R of 0.982. The SSI obtained through an ML approach showed an accuracy higher than the GK2A/AMI operational SSI product. The CNN SSI was evaluated by comparing it with the in situ SSI from the Ieodo Ocean Research Station and from flux towers over land; these in situ SSI values were not used for training the model. We investigated the error characteristics of the CNN SSI regarding environmental conditions including local time, solar zenith angle, in situ visibility, and in situ cloud amount. Furthermore, monthly and annual mean daily SSI were calculated for the period from 1 January 2020 to 31 January 2022, and regional characteristics of SSI around Korea were analyzed. This study addressed the availability of satellite-derived SSI to resolve the limitations of in situ measurements. This could play a principal role in climatological and renewable energy applications.
|
49
|
Identification of Smartwatch-Collected Lifelog Variables Affecting Body Mass Index in Middle-Aged People Using Regression Machine Learning Algorithms and SHapley Additive Explanations. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12083819] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022]
Abstract
Body mass index (BMI) plays a vital role in determining the health of middle-aged people, and a high BMI is associated with various chronic diseases. This study aims to identify important lifelog factors related to BMI. The sleep, gait, and body data of 47 middle-aged women and 71 middle-aged men were collected using smartwatches. Variables were derived to examine the relationships between these factors and BMI. Because height is, by the definition of BMI, the most influential variable, the data were divided into groups according to height. The data were analyzed using regression and tree-based models: Ridge Regression, eXtreme Gradient Boosting (XGBoost), and Category Boosting (CatBoost). Moreover, the importance of the BMI variables was visualized and examined using the SHapley Additive Explanations Technique (SHAP). The results showed that total sleep time, average morning gait speed, and sleep efficiency significantly affected BMI. However, the variables with the most substantial effects differed among the height groups. This indicates that the factors most profoundly affecting BMI differ according to body characteristics, suggesting the possibility of developing efficient methods for personalized healthcare.
|
50
|
Gill M, Anderson R, Hu H, Bennamoun M, Petereit J, Valliyodan B, Nguyen HT, Batley J, Bayer PE, Edwards D. Machine learning models outperform deep learning models, provide interpretation and facilitate feature selection for soybean trait prediction. BMC PLANT BIOLOGY 2022; 22:180. [PMID: 35395721 PMCID: PMC8991976 DOI: 10.1186/s12870-022-03559-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/30/2021] [Accepted: 03/21/2022] [Indexed: 05/26/2023]
Abstract
Recent growth in crop genomic and trait data has opened opportunities for the application of novel approaches to accelerate crop improvement. Machine learning and deep learning are at the forefront of prediction-based data analysis. However, few approaches for genotype to phenotype prediction compare machine learning with deep learning and further interpret the models that support the predictions. This study uses genome-wide molecular markers and traits across 1110 soybean individuals to develop accurate prediction models. For 13/14 sets of predictions, XGBoost or random forest outperformed deep learning models in prediction performance. Top-ranked SNPs by F-score were identified from XGBoost, and further investigation found that they overlap with significantly associated loci identified from GWAS and previous literature. Feature importance rankings were used to reduce marker input by up to 90%, and subsequent models maintained or improved their prediction performance. These findings support interpretable machine learning as an approach for genomic-based prediction of traits in soybean and other crops.
Affiliation(s)
- Mitchell Gill
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Haifei Hu
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Mohammed Bennamoun
- Department of Computer Science and Software Engineering, The University of Western Australia, Perth, WA, Australia
| | - Jakob Petereit
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Babu Valliyodan
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO, 65211, USA
- Department of Agriculture and Environmental Sciences, Lincoln University, Jefferson City, MO, 65101, USA
| | - Henry T Nguyen
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO, 65211, USA
| | - Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Philipp E Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia.
| |
|