1
Hua V, Nguyen T, Dao MS, Nguyen HD, Nguyen BT. The impact of data imputation on air quality prediction problem. PLoS One 2024; 19:e0306303. [PMID: 39264957 PMCID: PMC11392267 DOI: 10.1371/journal.pone.0306303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Accepted: 06/15/2024] [Indexed: 09/14/2024] Open
Abstract
With rising environmental concerns, accurate air quality predictions have become paramount as they help in planning preventive measures and policies for potential health hazards and environmental problems caused by poor air quality. Most of the time, air quality data are time series data. However, due to various reasons, we often encounter missing values in datasets collected during data preparation and aggregation steps. The inability to analyze and handle missing data will significantly hinder the data analysis process. To address this issue, this paper offers an extensive review of air quality prediction and missing data imputation techniques for time series, particularly in relation to environmental challenges. In addition, we empirically assess eight imputation methods, including mean, median, kNNI, MICE, SAITS, BRITS, MRNN, and Transformer, to scrutinize their impact on air quality data. The evaluation is conducted using diverse air quality datasets gathered from numerous cities globally. Based on these evaluations, we offer practical recommendations for practitioners dealing with missing data in time series scenarios for environmental data.
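The two simplest methods named in this abstract, mean and median imputation, can be illustrated with a minimal sketch. The function name `impute`, the sample readings, and the convention that missing values appear as `None` or NaN are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
import math
from statistics import mean, median


def impute(series, strategy="mean"):
    """Fill missing readings (None or NaN) with the mean or median of the observed ones."""

    def is_missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    observed = [x for x in series if not is_missing(x)]
    if not observed:
        raise ValueError("series has no observed values to impute from")

    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")

    # Observed readings are kept unchanged; only the gaps are filled.
    return [fill if is_missing(x) else x for x in series]


# Hypothetical hourly PM2.5 readings with two gaps and one pollution spike.
pm25 = [12.0, None, 15.0, 90.0, float("nan"), 11.0]
filled_mean = impute(pm25, "mean")
filled_median = impute(pm25, "median")
```

With a spike in the series, the mean fill is pulled toward the spike while the median fill stays near typical readings, which is one reason the two strategies can affect a downstream predictor differently.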
Affiliation(s)
- Van Hua
  - Faculty of Mathematics and Computer Science, University of Science, Ho Chi Minh City, Vietnam
  - Vietnam National University Ho Chi Minh City, Ho Chi Minh City, Vietnam
  - Faculty of Information Technology, HUTECH University, Ho Chi Minh City, Vietnam
- Minh-Son Dao
  - National Institute of Information and Communications Technology, Tokyo, Japan
- Hien D Nguyen
  - Vietnam National University Ho Chi Minh City, Ho Chi Minh City, Vietnam
  - University of Information Technology, Ho Chi Minh City, Vietnam
- Binh T Nguyen
  - Faculty of Mathematics and Computer Science, University of Science, Ho Chi Minh City, Vietnam
  - Vietnam National University Ho Chi Minh City, Ho Chi Minh City, Vietnam
2
Xia H, Chen X, Wang Z, Chen X, Dong F. A Multi-Modal Deep-Learning Air Quality Prediction Method Based on Multi-Station Time-Series Data and Remote-Sensing Images: Case Study of Beijing and Tianjin. ENTROPY (BASEL, SWITZERLAND) 2024; 26:91. [PMID: 38275499 PMCID: PMC11154360 DOI: 10.3390/e26010091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/16/2024] [Accepted: 01/20/2024] [Indexed: 01/27/2024]
Abstract
The profound impacts of severe air pollution on human health, ecological balance, and economic stability are undeniable. Precise air quality forecasting is a crucial necessity, enabling governmental bodies and vulnerable communities to take proactive measures to reduce exposure to harmful pollutants. Previous research has primarily focused on predicting air quality using only time-series data, while remote-sensing image data has received limited attention. This paper proposes a new multi-modal deep-learning model, Res-GCN, which integrates high spatial resolution remote-sensing images and time-series air quality data from multiple stations to forecast future air quality. Res-GCN employs two deep-learning networks, one using a residual network to extract hidden visual information from remote-sensing images, and another using a dynamic spatio-temporal graph convolution network to capture spatio-temporal information from time-series data. Extracting features from the two modalities improves predictive performance. To demonstrate the effectiveness of the proposed model, experiments were conducted on two real-world datasets. The results show that the Res-GCN model effectively extracts multi-modal features, significantly enhancing the accuracy of multi-step predictions. Compared to the best-performing baseline model, the multi-step prediction's mean absolute error, root mean square error, and mean absolute percentage error improved by approximately 6%, 7%, and 7%, respectively.
Affiliation(s)
- Hanzhong Xia
  - Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
- Xiaoxia Chen
  - Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
- Zhen Wang
  - Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
- Xinyi Chen
  - School of Mathematics and Statistics, Ningbo University, Ningbo 315211, China
- Fangyan Dong
  - Faculty of Mechanical Engineering and Mechanics, Ningbo University, Ningbo 315211, China
3
Srivastava H, Kumar Das S. Air pollution prediction system using XRSTH-LSTM algorithm. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:125313-125327. [PMID: 37481499 DOI: 10.1007/s11356-023-28393-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/23/2023] [Accepted: 06/19/2023] [Indexed: 07/24/2023]
Abstract
Globally, there is significant concern about rising air pollution (AP) from substances that harm human health and other living organisms and that disturb the environmental balance. To address the problem, an AI-based prediction model is needed. Therefore, an attempt was made to develop a novel AP prediction system based on Xavier Reptile Switan-h-based Long-Short Term Memory (XRSTH-LSTM), which undergoes fine-tuning at various steps such as pre-processing, attribute extraction, and air-quality index prediction, in order to reduce computational cost and to increase accuracy and precision. The dataset used to train the proposed methodology is Air Quality Data in India (2015-2020), taken from the publicly available source Kaggle. The dataset includes information on the AQI and air quality at different stations in numerous Indian cities at hourly and daily intervals. Performance has been evaluated using MSE, MAPE, RMSE, precision, recall, and F-measure. The robustness of the proposed model is tested using measures such as negative predictive value and the Matthews correlation coefficient. The proposed model is found to process air quality data efficiently, with an improved accuracy of 98.52% and precision of 99.79%, which is 0.74% higher than the existing state-of-the-art model. The test results showed that the proposed approach worked better than current models and offered a higher rate of accuracy in predicting air pollution.
Affiliation(s)
- Harshit Srivastava
  - Department of Electronics and Communication, National Institute of Technology, Rourkela, 769008, Odisha, India
- Santos Kumar Das
  - Department of Electronics and Communication, National Institute of Technology, Rourkela, 769008, Odisha, India
4
Lu Y, Li K. Multistation collaborative prediction of air pollutants based on the CNN-BiLSTM model. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:92417-92435. [PMID: 37490250 DOI: 10.1007/s11356-023-28877-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 07/16/2023] [Indexed: 07/26/2023]
Abstract
The development of industry has led to serious air pollution problems. It is very important to establish high-precision and high-performance air quality prediction models and take corresponding control measures. In this paper, based on 4 years of air quality and meteorological data from Tianjin, China, the relationships between various meteorological factors and air pollutant concentrations are analyzed. A hybrid deep learning model consisting of a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) is proposed to predict pollutant concentrations. In addition, a Bayesian optimization algorithm is applied to obtain the optimal combination of hyperparameters for the proposed deep learning model, which enhances the generalization ability of the model. Furthermore, based on air quality data from multiple stations in the region, a multistation collaborative prediction method is designed, and the concept of a strongly correlated station (SCS) is defined. The predictive model is modified using the idea of SCS and is used to predict the pollutant concentration in Tianjin. The coefficients of determination (R2) for PM2.5, PM10, SO2, NO2, CO, and O3 are 0.89, 0.84, 0.69, 0.83, 0.92, and 0.84, respectively. The results show that the model predicts air pollutant concentrations with satisfactory accuracy.
Affiliation(s)
- Yanan Lu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, 200433, China
- Kun Li
  - School of Economics and Management, Tiangong University, Tianjin, 300387, China
5
Duan J, Gong Y, Luo J, Zhao Z. Air-quality prediction based on the ARIMA-CNN-LSTM combination model optimized by dung beetle optimizer. Sci Rep 2023; 13:12127. [PMID: 37495616 PMCID: PMC10372025 DOI: 10.1038/s41598-023-36620-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/23/2023] [Accepted: 06/07/2023] [Indexed: 07/28/2023] Open
Abstract
Air pollution is a serious problem that affects economic development and people's health, so an efficient and accurate air quality prediction model would help to manage the air pollution problem. In this paper, we build a combined model to accurately predict the AQI based on real AQI data from four cities. First, we use an ARIMA model to fit the linear part of the data and a CNN-LSTM model to fit the non-linear part. Then, to avoid blind selection of the CNN-LSTM hyperparameters, we use the Dung Beetle Optimizer algorithm to search for the optimal hyperparameters and check the accuracy of the model. Finally, we compare the proposed model with nine other widely used models. The experimental results show that the model proposed in this paper outperforms the comparison models in terms of root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination (R2). The RMSE values for the four cities were 7.594, 14.94, 7.841 and 5.496; the MAE values were 5.285, 10.839, 5.12 and 3.77; and the R2 values were 0.989, 0.962, 0.953 and 0.953, respectively.
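The evaluation metrics that recur throughout these abstracts (MAE, RMSE, MAPE and R2) follow standard definitions, sketched below in plain Python. The function names and the sample AQI values are illustrative assumptions and are not taken from any of the listed papers.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires non-zero true values)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted AQI values.
observed = [50.0, 100.0, 150.0, 200.0]
predicted = [55.0, 95.0, 160.0, 190.0]
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "R2": r2(observed, predicted),
}
```

MAE, RMSE and MAPE are error measures, so lower is better, whereas R2 closer to 1 indicates a better fit. RMSE and MAE carry the units of the predicted quantity, so values reported for different cities or pollutants are not directly comparable.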
Affiliation(s)
- Jiahui Duan
  - School of Marine Engineer Equipment, Zhejiang Ocean University, Zhoushan, China
- Yaping Gong
  - School of Marine Engineer Equipment, Zhejiang Ocean University, Zhoushan, China
- Jun Luo
  - School of Marine Engineer Equipment, Zhejiang Ocean University, Zhoushan, China
- Zhiyao Zhao
  - School of Marine Engineer Equipment, Zhejiang Ocean University, Zhoushan, China
6
Wang K, Fan X, Yang X, Zhou Z. An AQI decomposition ensemble model based on SSA-LSTM using improved AMSSA-VMD decomposition reconstruction technique. ENVIRONMENTAL RESEARCH 2023:116365. [PMID: 37301497 DOI: 10.1016/j.envres.2023.116365] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/11/2023] [Revised: 05/21/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023]
Abstract
Air quality index (AQI) is a key index for monitoring air pollution and can be used as a guide for ensuring good public health. Accurate AQI prediction allows timely control and management of air pollution. In this study, a new integrated learning model was constructed to predict AQI. A smart reverse learning approach based on AMSSA was utilized to increase the diversity of populations, and an improved AMSSA (IAMSSA) was established. The optimum parameters of VMD, the penalty factor α and the mode number K, were obtained using IAMSSA. The IAMSSA-VMD was used to decompose nonlinear and non-stationary AQI series into several regular and smooth sub-sequences. The Sparrow Search Algorithm (SSA) was used to determine the optimum LSTM parameters. The results showed that: (1) In simulation experiments on 12 test functions, IAMSSA exhibited faster convergence and higher accuracy and stability than seven conventional optimization algorithms. (2) Decomposing the original air quality data with IAMSSA-VMD resulted in multiple uncoupled intrinsic mode function (IMF) components and one residual (RES). An SSA-LSTM model was built for each IMF and for the RES component, which effectively extracted the predicted values. (3) LSTM, SSA-LSTM, VMD-LSTM, VMD-SSA-LSTM, AMSSA-VMD-SSA-LSTM, and IAMSSA-VMD-SSA-LSTM models were used for prediction of AQI based on data from three cities (Chengdu, Guangzhou, and Shenyang). IAMSSA-VMD-SSA-LSTM exhibited the best prediction performance, with MAE, RMSE, MAPE, and R2 of 3.692, 4.909, 6.241, and 0.981, respectively. (4) Generalization experiments showed that the IAMSSA-VMD-SSA-LSTM model had the best generalization ability. In summary, the decomposition ensemble model proposed in this study has higher prediction accuracy, improved fitting effect and generalization ability compared with other models. These properties indicate the superiority of the decomposition ensemble model and provide a theoretical and technical basis for prediction of air pollution and ecosystem restoration.
Affiliation(s)
- Kai Wang
  - College of Mathematics and Physics, Chengdu University of Technology, Chengdu, 610059, China
- Xinyue Fan
  - College of Management Science, Chengdu University of Technology, Chengdu, 610059, China
- Xiaoyi Yang
  - College of Mathematics and Physics, Chengdu University of Technology, Chengdu, 610059, China
- Zhongli Zhou
  - College of Mathematics and Physics, Chengdu University of Technology, Chengdu, 610059, China
  - College of Management Science, Chengdu University of Technology, Chengdu, 610059, China
7
Song Q, Zou J, Xu M, Xi M, Zhou Z. Air quality prediction for Chengdu based on long short-term memory neural network with improved jellyfish search optimizer. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:64416-64442. [PMID: 37067716 DOI: 10.1007/s11356-023-26782-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 03/29/2023] [Indexed: 05/11/2023]
Abstract
Air quality prediction plays an important role in preventing air pollution and improving the living environment. Many indicators can be employed to reflect air quality, among which the air quality index (AQI) is the most commonly used. However, existing methods are relatively simple and their prediction accuracy needs to be improved. In particular, the prediction accuracy is affected by the parameter selection of the methods, and the corresponding optimization problems are usually non-convex and multi-modal. Therefore, based on a long short-term memory (LSTM) neural network with an improved jellyfish search optimizer (IJSO), a novel hybrid model denoted IJSO-LSTM is proposed to predict AQI for Chengdu. To evaluate the optimizing ability of IJSO, other variants of the jellyfish search optimizer as well as other state-of-the-art meta-heuristic algorithms are applied to optimize the hyperparameters of the LSTM neural network for comparison, and the results confirm that IJSO is more suitable for optimizing the LSTM neural network. In addition, compared with other well-known models, the results demonstrate that IJSO-LSTM has higher prediction accuracy, with root-mean-square error, mean absolute error, and mean absolute percentage error kept below 4, 3, and 4%, respectively.
Affiliation(s)
- Qixian Song
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101, Sichuan, China
- Jing Zou
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101, Sichuan, China
- Min Xu
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101, Sichuan, China
- Mingyang Xi
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101, Sichuan, China
- Zhaorong Zhou
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101, Sichuan, China
  - Meteorological Information and Signal Processing Key Laboratory of Sichuan Higher Education Institutes, Chengdu University of Information Technology, Chengdu, 610225, Sichuan, China
8
Méndez M, Merayo MG, Núñez M. Machine learning algorithms to forecast air quality: a survey. Artif Intell Rev 2023; 56:1-36. [PMID: 36820441 PMCID: PMC9933038 DOI: 10.1007/s10462-023-10424-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Accepted: 02/01/2023] [Indexed: 02/18/2023]
Abstract
Air pollution is a risk factor for many diseases that can lead to death. Therefore, it is important to develop forecasting mechanisms that can be used by the authorities, so that they can anticipate measures when high concentrations of certain pollutants are expected in the near future. Machine Learning models, in particular, Deep Learning models, have been widely used to forecast air quality. In this paper we present a comprehensive review of the main contributions in the field during the period 2011-2021. We have searched the main scientific publications databases and, after a careful selection, we have considered a total of 155 papers. The papers are classified in terms of geographical distribution, predicted values, predictor variables, evaluation metrics and Machine Learning model.
Affiliation(s)
- Manuel Méndez
  - Design and Testing of Reliable Systems Research Group, Universidad Complutense de Madrid, C/ Profesor José García Santesmases, 9, 28040 Madrid, Madrid Spain
- Mercedes G. Merayo
  - Design and Testing of Reliable Systems Research Group, Universidad Complutense de Madrid, C/ Profesor José García Santesmases, 9, 28040 Madrid, Madrid Spain
- Manuel Núñez
  - Design and Testing of Reliable Systems Research Group, Universidad Complutense de Madrid, C/ Profesor José García Santesmases, 9, 28040 Madrid, Madrid Spain
9
A novel spatiotemporal multigraph convolutional network for air pollution prediction. APPL INTELL 2023. [DOI: 10.1007/s10489-022-04418-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
10
Ajdour A, Adnane A, Ydir B, Ben Hmamou D, Khomsi K, Amghar H, Chelhaoui Y, Chaoufi J, Leghrib R. A new hybrid models based on the neural network and discrete wavelet transform to identify the CHIMERE model limitation. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:13141-13161. [PMID: 36127529 DOI: 10.1007/s11356-022-23084-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 09/14/2022] [Indexed: 06/15/2023]
Abstract
A greater understanding of ozone damage to the environment and health led to an increased demand for accurate predictions. This study provides two new accurate hybrid models of ozone prediction. The first one (CHIMERE-NARX) is based on a NARX model as a post-processing of the CHIMERE model. In the second (CHIMERE-NARX-DWT), a discrete wavelet transform (DWT) has been added. Our models were built and validated using ozone measurements from the Mediouna station in Casablanca, Morocco, from February 1st to March 27th, 2021. The results highlighted the CHIMERE model limitations, such as wind speed overestimation and insufficient emission data. The first hybrid successfully increased the correlation coefficient from 88 to 93% and reduced RMSE from 23.99 μg/m3 to -3.54 μg/m3, overcoming CHIMERE limitations to some extent, especially during nighttime. A second hybrid addressed the first hybrid limitation, such as using ozone as a single input. This hybrid successfully balanced the weight of NARX at night against the day, increasing the correlation coefficient to 98% and decreasing RMSE to -0.02 μg/m3. This study presents a new generation of post-processing based on deterministic model processes, with the possibility of training them with minimum input data, which can be applied to other models using various pollutants.
Affiliation(s)
- Amine Ajdour
  - LETSMP, Department of Physics, Faculty of Science, University Ibn Zohr, Agadir, Morocco
- Anas Adnane
  - LETSMP, Department of Physics, Faculty of Science, University Ibn Zohr, Agadir, Morocco
  - General Directorate of Meteorology, Face Préfecture Hay Hassani, B.P. 8106 Casa-Oasis, Casablanca, Morocco
- Brahim Ydir
  - LETSMP, Department of Physics, Faculty of Science, University Ibn Zohr, Agadir, Morocco
- Dris Ben Hmamou
  - LETSMP, Department of Physics, Faculty of Science, University Ibn Zohr, Agadir, Morocco
- Kenza Khomsi
  - General Directorate of Meteorology, Face Préfecture Hay Hassani, B.P. 8106 Casa-Oasis, Casablanca, Morocco
- Hassan Amghar
  - General Directorate of Meteorology, Face Préfecture Hay Hassani, B.P. 8106 Casa-Oasis, Casablanca, Morocco
- Youssef Chelhaoui
  - General Directorate of Meteorology, Face Préfecture Hay Hassani, B.P. 8106 Casa-Oasis, Casablanca, Morocco
- Jamal Chaoufi
  - LETSMP, Department of Physics, Faculty of Science, University Ibn Zohr, Agadir, Morocco
- Radouane Leghrib
  - LETSMP, Department of Physics, Faculty of Science, University Ibn Zohr, Agadir, Morocco
11
Barthwal A. A Markov chain-based IoT system for monitoring and analysis of urban air quality. ENVIRONMENTAL MONITORING AND ASSESSMENT 2022; 195:235. [PMID: 36574091 DOI: 10.1007/s10661-022-10857-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/03/2022] [Accepted: 12/15/2022] [Indexed: 06/17/2023]
Abstract
Severe deterioration of urban air quality in Asian cities is the cause of a large number of deaths every year. A Markov chain-based IoT system is developed in this study to monitor, analyze, and predict urban air quality. The proposed sensing setup is integrated with an automobile and is used for collecting air quality information. An Android application is used to transfer and store the sensed data in the data cloud. The data stored is used to generate the transition matrix of the AQI states and calculate return periods for each AQI state. The estimated time interval after which an AQI event recurs or is repeated is known as return period. The actual return periods for each AQI state at the test locations in Delhi-NCR are compared with those predicted using discrete time Markov chain (DTMC) models. Average absolute forecast error using our model was found to be 3.38% and 4.06%, respectively, at the selected locations.
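The transition-matrix and return-period steps described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the function names and the sample state sequence are hypothetical, and the return period is computed with the standard result that, for an irreducible discrete-time Markov chain, the mean recurrence time of a state equals the reciprocal of its stationary probability.

```python
def transition_matrix(states_seq):
    """Estimate a row-stochastic transition matrix from an observed state sequence."""
    labels = sorted(set(states_seq))
    index = {s: i for i, s in enumerate(labels)}
    n = len(labels)
    counts = [[0] * n for _ in range(n)]
    for current, following in zip(states_seq, states_seq[1:]):
        counts[index[current]][index[following]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a state with no observed outgoing transition gets a uniform row.
        matrix.append([c / total for c in row] if total else [1.0 / n] * n)
    return labels, matrix


def stationary_distribution(matrix, iterations=500):
    """Approximate the stationary distribution by repeated multiplication (power iteration)."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


def return_periods(states_seq):
    """Mean number of sampling steps between successive visits to each state."""
    labels, matrix = transition_matrix(states_seq)
    pi = stationary_distribution(matrix)
    return {label: (1.0 / p if p > 0 else float("inf")) for label, p in zip(labels, pi)}


# Hypothetical sequence of AQI categories sampled at a fixed interval.
aqi_states = ["good", "good", "bad", "good", "good", "bad", "good"]
periods = return_periods(aqi_states)
```

The resulting periods are expressed in sampling steps, so they need to be multiplied by the sampling interval to obtain a time. Power iteration converges only for chains that are irreducible and aperiodic, which holds for the sample sequence above because the "good" state can follow itself.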
Affiliation(s)
- Anurag Barthwal
  - Department of Computer Science and Engineering, SRM Institute of Science and Technology, NCR Campus, Ghaziabad, Uttar Pradesh, India
12
Huang W, Cao Y, Cheng X, Guo Z. Research on air quality prediction based on improved long short-term memory network algorithm. PeerJ Comput Sci 2022; 8:e1187. [PMID: 37346303 PMCID: PMC10280268 DOI: 10.7717/peerj-cs.1187] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2022] [Accepted: 11/21/2022] [Indexed: 06/23/2023]
Abstract
Air quality is changing due to the influence of industry, agriculture, people's living activities and other factors. Traditional machine learning methods generally do not consider the time series of the data itself and cannot handle long-range dependencies, thus ignoring information relevant to the predicted items and affecting the accuracy of air quality predictions. Therefore, an attention mechanism is introduced based on the long short term memory network model (LSTM), which attenuates unimportant information by controlling the proportion of the weight distribution. Finally, an integrated lightGBM+LSTM-attention model was constructed based on the light gradient boosting machine (lightGBM), and the prediction results were compared with those of 11 models. The experimental results show that the integrated model constructed in this article performs better, with the coefficient of determination (R2) of prediction accuracy reaching 0.969 and the root mean square error (RMSE) improving by 5.09, 4.94, 4.85 and 4.0 respectively compared to other models, verifying the superiority of the model.
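The idea of down-weighting unimportant information through a weight distribution can be illustrated with a generic attention-pooling sketch over a sequence of hidden states. The abstract does not specify the scoring function, so the dot-product scoring, the `query` vector, and all names below are assumptions made for illustration rather than the paper's design.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(dot(h_t, query)) and return the weighted sum."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context


# Hypothetical LSTM hidden states for three time steps (two units each).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(hidden, query=[1.0, 1.0])
```

Time steps whose hidden state aligns more closely with the query receive larger weights, so they dominate the pooled context vector, while the remaining steps are attenuated rather than discarded.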
Affiliation(s)
- Wenchao Huang
  - School of Information and Control Engineering, Liaoning Petrochemical University, Fushun, Liaoning, China
- Yu Cao
  - School of Information and Control Engineering, Liaoning Petrochemical University, Fushun, Liaoning, China
- Xu Cheng
  - School of Economics and Management, Shenyang Agricultural University, Shenyang, Liaoning, China
- Zongkai Guo
  - Liaoning Meteorological Equipment Support Center, Shenyang, Liaoning, China
13
Jiang W, Zhu G, Shen Y, Xie Q, Ji M, Yu Y. An Empirical Mode Decomposition Fuzzy Forecast Model for Air Quality. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1803. [PMID: 36554208 PMCID: PMC9778395 DOI: 10.3390/e24121803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 11/30/2022] [Accepted: 12/07/2022] [Indexed: 06/17/2023]
Abstract
Air quality has a significant influence on people's health. Severe air pollution can cause respiratory diseases, while good air quality is beneficial to physical and mental health. Therefore, the prediction of air quality is very important. Since the concentration data of air pollutants are time series, their temporal characteristics should be considered in their prediction. However, the traditional neural network for time series prediction is limited by its own structure, which makes it prone to falling into a local optimum during training. This paper proposes an empirical mode decomposition fuzzy forecast model for air quality based on the extreme learning machine. Empirical mode decomposition captures the changing trend of air quality at different time scales. For the trend at each time scale, the extreme learning machine is trained quickly to obtain the corresponding prediction value. An adaptive fuzzy inference system is then fitted to obtain the final air quality prediction result. The experimental results show that our model improves the accuracy of both short-term and long-term prediction by about 30% compared to other models, which indicates the efficacy of our approach. This research can provide the government with accurate future air quality information, so that it can take targeted control measures.
14
Bhimavarapu U, Sreedevi M. An enhanced loss function in deep learning model to predict PM2.5 in India. INTELLIGENT DECISION TECHNOLOGIES 2022. [DOI: 10.3233/idt-220111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Fine particulate matter (PM2.5) is one of the major air pollutants and is an important parameter for measuring air quality levels. High concentrations of PM2.5 affect human health, the environment, and climate change. An accurate prediction of PM2.5 is significant to air pollution detection, environmental management, human health, and social development. The primary approach is to boost forecast performance by reducing the error in the deep learning model. There is therefore a need for an enhanced loss function (ELF) that decreases the error and improves the prediction of daily PM2.5 concentrations. This paper proposes the ELF in CTLSTM (Chi-Square test Long Short Term Memory) to improve the PM2.5 forecast. The ELF in the CTLSTM model gives more accurate results than the standard forecast models and other state-of-the-art deep learning techniques. The proposed ELFCTLSTM reduces the prediction error by up to 10 to 25 percent compared with the state-of-the-art deep learning models.
Affiliation(s)
- Usharani Bhimavarapu
  - Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
- M. Sreedevi
  - Department of CSE, Amrita Sai Institute of Science and Technology, Paritala, Andhra Pradesh, India
15
Xu S, Li W, Zhu Y, Xu A. A novel hybrid model for six main pollutant concentrations forecasting based on improved LSTM neural networks. Sci Rep 2022; 12:14434. [PMID: 36002466 PMCID: PMC9402967 DOI: 10.1038/s41598-022-17754-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/06/2022] [Accepted: 07/30/2022] [Indexed: 12/03/2022] Open
Abstract
In recent years, air pollution has become a factor that cannot be ignored, affecting human lives and health. The distribution of high-density populations and high-intensity development and construction have accentuated the problem of air pollution in China. To accelerate air pollution control and effectively improve environmental air quality, we targeted cities with serious air pollution problems and established a model for air pollution prediction. We used the daily monitoring data of air pollution from January 2016 to December 2020 for the respective cities. We used the long short term memory networks (LSTM) algorithm model to solve the problem of gradient explosion in recurrent neural networks, then used the particle swarm optimization algorithm to determine the parameters of the CNN-LSTM model, and finally introduced the complete ensemble empirical mode decomposition of adaptive noise (CEEMDAN) to decompose the air pollution series and improve the accuracy of model prediction. The experimental results show that compared with a single LSTM model, the CEEMDAN-CNN-LSTM model has higher accuracy and lower prediction errors. The CEEMDAN-CNN-LSTM model enables a more precise prediction of air pollution, and may thus be useful for sustainable management and the control of air pollution.
Collapse
Affiliation(s)
- Shenyi Xu
- School of Statistics and Mathematics, Zhejiang Gongshang University, No.18 Xuezheng Street, Xiasha Higher Education Park, Hangzhou, Zhejiang, China
| | - Wei Li
- School of Statistics and Mathematics, Zhejiang Gongshang University, No.18 Xuezheng Street, Xiasha Higher Education Park, Hangzhou, Zhejiang, China
| | - Yuhan Zhu
- School of Statistics and Mathematics, Zhejiang Gongshang University, No.18 Xuezheng Street, Xiasha Higher Education Park, Hangzhou, Zhejiang, China; Collaborative Innovation Center of Statistical Data Engineering, Technology & Application, Zhejiang Gongshang University, Hangzhou, China
| | - Aiting Xu
- School of Statistics and Mathematics, Zhejiang Gongshang University, No.18 Xuezheng Street, Xiasha Higher Education Park, Hangzhou, Zhejiang, China; Collaborative Innovation Center of Statistical Data Engineering, Technology & Application, Zhejiang Gongshang University, Hangzhou, China
| |
|
16
|
Design of induction motor speed observer based on long short-term memory. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07458-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
17
|
Wang J, Li X, Jin L, Li J, Sun Q, Wang H. An air quality index prediction model based on CNN-ILSTM. Sci Rep 2022; 12:8373. [PMID: 35589914 PMCID: PMC9120089 DOI: 10.1038/s41598-022-12355-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/13/2022] [Accepted: 05/10/2022] [Indexed: 11/10/2022]
Abstract
The air quality index (AQI) is an essential measure for evaluating air pollution: it describes the degree of air pollution and its impact on health, so accurate prediction of the AQI is important. This paper presents an AQI prediction model based on Convolutional Neural Networks (CNN) and Improved Long Short-Term Memory (ILSTM), named CNN-ILSTM. ILSTM deletes the output gate in LSTM, improves its input gate and forget gate, and introduces a Conversion Information Module (CIM) to prevent supersaturation in the learning process. ILSTM learns efficiently from historical data, improves prediction accuracy, and reduces the training time. CNN extracts the features of the input data effectively. This paper uses air quality data from 00:00 on January 1, 2017, to 23:00 on June 30, 2021, in Shijiazhuang City, Hebei Province, China, as experimental data sets, and compares this model with eight prediction models (SVR, RFR, MLP, LSTM, GRU, ILSTM, CNN-LSTM, and CNN-GRU) to demonstrate the validity and accuracy of the CNN-ILSTM prediction model. The experimental results show that the MAE of CNN-ILSTM is 8.4134, the MSE is 202.1923, R² is 0.9601, and the training time is 85.3 s. In this experiment, the model performs better than the other models.
Affiliation(s)
- Jingyang Wang
- School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, 050018, China
| | - Xiaolei Li
- School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, 050018, China
| | - Lukai Jin
- School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, 050018, China
| | - Jiazheng Li
- School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, 050018, China
| | - Qiuhong Sun
- School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, 050018, China
| | - Haiyao Wang
- School of Ocean Mechatronics, Xiamen Ocean Vocational College, Xiamen, 361100, China.
| |
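The three error metrics reported in the CNN-ILSTM abstract above (MAE, MSE, and R²) can be computed as in the following minimal sketch. The function name `regression_metrics` and the toy AQI values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, and the coefficient of determination R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))            # mean absolute error
    mse = float(np.mean(err ** 2))               # mean squared error
    ss_res = float(np.sum(err ** 2))             # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "R2": r2}

# Hypothetical AQI observations and predictions, for illustration only.
observed = [50.0, 80.0, 120.0, 90.0]
predicted = [55.0, 75.0, 110.0, 100.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and MSE are in the units (and squared units) of the AQI, whereas R² is dimensionless, which is why abstracts such as this one usually report them together.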
|
18
|
Tasyurek M, Celik M. 4D-GWR: geographically, altitudinal, and temporally weighted regression. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07311-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
|
19
|
Abstract
Owing to climate change, industrial pollution, and population gathering, the air quality status in many places in China is not optimal. The continuous deterioration of air-quality conditions has considerably affected the economic development and health of China’s people. However, the diversity and complexity of the factors which affect air pollution render air quality monitoring data complex and nonlinear. To improve the accuracy of prediction of the air quality index (AQI) and obtain more accurate AQI data with respect to their nonlinear and nonsmooth characteristics, this study introduces an air quality prediction model based on the empirical mode decomposition (EMD) of LSTM and uses improved particle swarm optimization (IPSO) to identify the optimal LSTM parameters. First, the model performed the EMD decomposition of air quality data and obtained uncoupled intrinsic mode function (IMF) components after removing noisy data. Second, we built an EMD–IPSO–LSTM air quality prediction model for each IMF component and extracted prediction values. Third, the results of validation analyses of the algorithm showed that compared with LSTM and EMD–LSTM, the improved model had higher prediction accuracy and improved the model fitting effect, which provided theoretical and technical support for the prediction and management of air pollution.
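The parameter search described in the abstract above can be illustrated with a plain particle swarm optimizer. This is a minimal sketch under stated assumptions: it implements standard PSO rather than the paper's improved variant (IPSO), and `toy_validation_loss` is a hypothetical stand-in for training an LSTM and measuring its validation error. The two searched parameters (hidden units and log10 of the learning rate) and their bounds are also assumptions.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard (not improved) particle swarm optimization within box bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                  # per-particle best position
    best_val = np.array([objective(p) for p in pos])       # per-particle best value
    g = int(np.argmin(best_val))
    gbest_pos, gbest_val = best_pos[g].copy(), float(best_val[g])
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity blends momentum, pull toward own best, and pull toward swarm best.
        vel = (inertia * vel
               + c1 * r1 * (best_pos - pos)
               + c2 * r2 * (gbest_pos - pos))
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        g = int(np.argmin(best_val))
        if best_val[g] < gbest_val:
            gbest_pos, gbest_val = best_pos[g].copy(), float(best_val[g])
    return gbest_pos, gbest_val

def toy_validation_loss(params):
    """Hypothetical loss surface with its minimum at 64 units and lr = 1e-2."""
    units, log_lr = params
    return ((units - 64.0) / 64.0) ** 2 + (log_lr + 2.0) ** 2

best_params, best_loss = pso_minimize(
    toy_validation_loss, lower=[8.0, -4.0], upper=[256.0, -1.0])
print(best_params, best_loss)
```

In a pipeline like the one the abstract describes, the objective would be evaluated once per particle per iteration for each IMF component, so the swarm size and iteration count directly control the training budget.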
|
20
|
Prediction of Daily Mean PM10 Concentrations Using Random Forest, CART Ensemble and Bagging Stacked by MARS. Sustainability 2022. [DOI: 10.3390/su14020798] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/05/2023]
Abstract
A novel framework for stacked regression based on machine learning was developed to predict the daily average concentrations of particulate matter (PM10), one of Bulgaria’s primary health concerns. The measurements of nine meteorological parameters were introduced as independent variables. The goal was to carefully study a limited number of initial predictors and extract stochastic information from them to build an extended set of data that allowed the creation of highly efficient predictive models. Four base models using random forest, CART ensemble and bagging, and their rotation variants, were built and evaluated. The heterogeneity of these base models was achieved by introducing five types of diversities, including a new simplified selective ensemble algorithm. The predictions from the four base models were then used as predictors in multivariate adaptive regression splines (MARS) models. All models were statistically tested using out-of-bag or with 5-fold and 10-fold cross-validation. In addition, a variable importance analysis was conducted. The proposed framework was used for short-term forecasting of out-of-sample data for seven days. It was shown that the stacked models outperformed all single base models. An index of agreement IA = 0.986 and a coefficient of determination of about 95% were achieved.
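The stacking idea in the abstract above, where base-model predictions become the predictors of a second-level model, can be sketched as follows. This is an illustration under loud assumptions: the base learners here are single-feature linear regressions rather than random forest, CART ensemble, or bagging, and the meta-learner is ordinary least squares rather than MARS. The synthetic data and all function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def out_of_fold_predictions(X, y, n_folds=5):
    """Predict each sample with a model that never saw it during fitting."""
    oof = np.empty(len(y))
    for test_idx in np.array_split(np.arange(len(y)), n_folds):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef = fit_linear(X[train_mask], y[train_mask])
        oof[test_idx] = predict_linear(coef, X[test_idx])
    return oof

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Synthetic stand-in for meteorological predictors and a PM10-like target.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=n)

# Each (deliberately weak) base model sees only one feature.
feature_subsets = [[0], [1], [2]]
oof_matrix = np.column_stack(
    [out_of_fold_predictions(X[:, cols], y) for cols in feature_subsets])

# Meta-learner: base-model predictions are its only inputs.
meta_coef = fit_linear(oof_matrix, y)
stacked_pred = predict_linear(meta_coef, oof_matrix)

base_mse = [mse(y, oof_matrix[:, j]) for j in range(oof_matrix.shape[1])]
stacked_mse = mse(y, stacked_pred)
print(base_mse, stacked_mse)
```

The stacked error printed here is measured on the same samples used to fit the meta-learner, so it is optimistic; an honest comparison would hold out data for the second level, as the out-of-sample forecasting in the abstract does.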
|
21
|
Prediction of Air Pollutant Concentration Based on One-Dimensional Multi-Scale CNN-LSTM Considering Spatial-Temporal Characteristics: A Case Study of Xi’an, China. Atmosphere 2021. [DOI: 10.3390/atmos12121626] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022]
Abstract
Air pollution has become a serious problem threatening human health. Effective prediction models can help reduce the adverse effects of air pollutants, and accurate predictions of air pollutant concentration can provide a scientific basis for air pollution prevention and control. However, previous prediction models mainly addressed overall air quality, or the prediction of only one or two air pollutants. Moreover, the temporal and spatial characteristics of pollutants and the multiple factors affecting them were not fully considered. Herein, we establish a deep learning model for atmospheric pollutant prediction that applies both a one-dimensional multi-scale convolution kernel (ODMSCNN) and a long short-term memory (LSTM) network on the basis of temporal and spatial characteristics, combining the respective advantages of CNN and LSTM networks. First, ODMSCNN is used to extract the temporal and spatial characteristics of air pollutant-related data to form a feature vector; the feature vector is then input into the LSTM network to predict the concentration of air pollutants. The data set comprises daily and hourly concentration data for six atmospheric pollutants (PM2.5, PM10, NO2, CO, O3, SO2) and 17 types of meteorological data in Xi’an. Daily concentration prediction, hourly concentration prediction, group data prediction, and multi-factor prediction were used to verify the effectiveness of the model. In general, the air pollutant concentration prediction model based on ODMSCNN-LSTM shows a better prediction effect than multi-layer perceptron (MLP), CNN, and LSTM models.
|
22
|
Short and Medium-Term Prediction of Winter Wheat NDVI Based on the DTW–LSTM Combination Method and MODIS Time Series Data. Remote Sensing 2021. [DOI: 10.3390/rs13224660] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
The normalized difference vegetation index (NDVI) is an important agricultural parameter that is closely correlated with crop growth. In this study, a novel method combining the dynamic time warping (DTW) model and the long short-term memory (LSTM) deep recurrent neural network model was developed to predict the short and medium-term winter wheat NDVI. LSTM is well-suited for modelling long-term dependencies, but this method may be susceptible to overfitting. In contrast, DTW possesses good predictive ability and is less susceptible to overfitting. Therefore, by utilizing the combination of these two models, the prediction error caused by overfitting is reduced, thus improving the final prediction accuracy. The combined method proposed here utilizes the historical MODIS time series data with an 8-day time resolution from 2015 to 2020. First, fast Fourier transform (FFT) is used to decompose the time series into two parts. The first part reflects the inter-annual and seasonal variation characteristics of winter wheat NDVI, and the DTW model is applied for prediction. The second part reflects the short-term change characteristics of winter wheat NDVI, and the LSTM model is applied for prediction. Next, the results from both models are combined to produce a final prediction. A case study in Hebei Province that predicts the NDVI of winter wheat at five prediction horizons in the future indicates that the DTW–LSTM model proposed here outperforms the LSTM model according to multiple evaluation indicators. The results of this study suggest that the DTW–LSTM model is highly promising for short and medium-term NDVI prediction.
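The first step in the abstract above, using a fast Fourier transform to split a series into a slow (inter-annual and seasonal) part and a fast (short-term) part, can be sketched as below. The cutoff `n_low`, the synthetic NDVI-like series, and the assumption of 46 eight-day composites per year are illustrative choices, not values from the paper; the DTW and LSTM predictors applied to each part are not shown.

```python
import numpy as np

def fft_split(series, n_low):
    """Split a series into a low-frequency part and the remaining residual.

    The low-frequency part keeps only the first `n_low` bins of the real FFT
    (including the mean); the residual is whatever is left over, so the two
    parts always sum back to the original series.
    """
    series = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(series)
    low_spectrum = np.zeros_like(spectrum)
    low_spectrum[:n_low] = spectrum[:n_low]
    slow = np.fft.irfft(low_spectrum, n=series.size)
    fast = series - slow
    return slow, fast

# Synthetic NDVI-like signal: five "years" of 46 composites each (assumed).
t = np.arange(230)
rng = np.random.default_rng(0)
seasonal = 0.5 + 0.3 * np.sin(2 * np.pi * t / 46)
noise = 0.05 * rng.normal(size=t.size)
ndvi_like = seasonal + noise

slow, fast = fft_split(ndvi_like, n_low=8)
print(np.std(slow), np.std(fast))
```

Because the two parts are exactly additive, forecasts produced separately for each part can simply be summed to obtain the combined prediction, which matches the final combination step the abstract describes.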
|
23
|
Zhang Z, Zeng Y, Yan K. A hybrid deep learning technology for PM2.5 air quality forecasting. Environmental Science and Pollution Research International 2021; 28:39409-39422. [PMID: 33759095 DOI: 10.1007/s11356-021-12657-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 10/23/2020] [Accepted: 01/20/2021] [Indexed: 06/12/2023]
Abstract
The concentration of PM2.5 is one of the main factors in evaluating air quality in environmental science. Severe PM2.5 levels directly affect public health, the economy, and social development. Owing to the strong nonlinearity and instability of air quality, it is difficult to predict the volatile changes of PM2.5 over time. In this paper, a hybrid deep learning model, VMD-BiLSTM, is constructed, which combines variational mode decomposition (VMD) and a bidirectional long short-term memory network (BiLSTM) to predict PM2.5 changes in cities in China. VMD decomposes the original complex PM2.5 time series into multiple sub-signal components in the frequency domain. BiLSTM is then employed to predict each sub-signal component separately, which significantly improves forecasting accuracy. Through a comprehensive comparison with existing models, such as EMD-based models and other VMD-based models, we show that the proposed VMD-BiLSTM model outperforms all compared models. The results show that prediction is significantly improved with the proposed forecasting framework. Moreover, the prediction models integrating VMD are better than those integrating EMD. Among all the models integrating VMD, the proposed VMD-BiLSTM model is the most stable forecasting method.
Affiliation(s)
- Zhendong Zhang
- Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, College of Information Engineering, China Jiliang University, Hangzhou, 310018, China
| | - Yongkang Zeng
- Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, College of Information Engineering, China Jiliang University, Hangzhou, 310018, China
| | - Ke Yan
- Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, College of Information Engineering, China Jiliang University, Hangzhou, 310018, China.
- National University of Singapore, Singapore, 117566, Singapore.
| |
|