1
Liu Z, Wang L. Semi-supervised urban haze pollution prediction based on multi-source heterogeneous data. Heliyon 2024; 10:e33332. [PMID: 39022081 PMCID: PMC11252978 DOI: 10.1016/j.heliyon.2024.e33332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Revised: 05/29/2024] [Accepted: 06/19/2024] [Indexed: 07/20/2024] Open
Abstract
Particulate matter (PM) is defined by the Texas Commission on Environmental Quality (TCEQ) as "a mixture of solid particles and liquid droplets found in the air". These particles vary widely in size. Particles less than 2.5 μm in aerodynamic diameter are known as Particulate Matter 2.5, or PM2.5. Urban haze pollution, represented by PM2.5, is becoming serious, so air pollution monitoring is very important. However, because of their high cost, the number of air monitoring stations is limited. Our work focuses on integrating multi-source heterogeneous data for Nanchang, China, including taxi tracks, human mobility, road networks, points of interest (POIs), meteorology (e.g., temperature, dew point, humidity, wind speed, wind direction, atmospheric pressure, weather activity, weather conditions) and PM2.5 forecast data from air monitoring stations. This research presents an innovative approach to air quality prediction that integrates these data sets from various sources and uses diverse architectures. To this end, semi-supervised learning techniques are used, namely the collaborative training algorithm Co-Training (Co-T) and its further refinement, Tri-Training (Tri-T). The objective is to accurately estimate haze pollution by integrating and using these multi-source heterogeneous data. We achieved this for the first time by employing a semi-supervised co-training strategy to accurately estimate pollution levels after applying the U-Air system to environmental data. In particular, the algorithm of the U-Air system is reproduced on these highly diverse heterogeneous data of Nanchang City, and the semi-supervised learning methods Co-T and Tri-T are used to conduct more detailed urban haze pollution prediction. Compared with Co-T, which trains a time classifier (TC) and a subspace classifier (SC) separately from the temporal and spatial perspectives, Tri-T is more accurate and faster, with a testing accuracy of up to 85.62%. The forecast results also show the potential of the city's multi-source heterogeneous data and the effectiveness of semi-supervised learning. We hope that this synthesis will motivate atmospheric environmental officials, scientists, and environmentalists in China to explore machine learning technology for controlling the discharge of pollutants and for environmental management.
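As a rough illustration of the co-training idea named in this abstract, the sketch below trains one simple classifier per feature view and lets the more confident view pseudo-label unlabeled samples for both. It is not the authors' implementation: the nearest-centroid classifiers, the scalar views, the `min_margin` threshold, and all function names are assumptions made for the example. Tri-training, which uses three classifiers and labels a sample for one of them when the other two agree, is not shown.

```python
def fit_centroids(xs, ys):
    """Nearest-centroid model on one scalar feature: the mean value per label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, margin), where margin is the distance gap between
    the runner-up centroid and the closest centroid."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    return best, abs(x - centroids[ranked[1]]) - abs(x - centroids[best])


def fit_views(labeled):
    """Fit one model per view from samples shaped ((view_a, view_b), label)."""
    labels = [y for _, y in labeled]
    model_a = fit_centroids([v[0] for v, _ in labeled], labels)
    model_b = fit_centroids([v[1] for v, _ in labeled], labels)
    return model_a, model_b


def co_train(labeled, unlabeled, rounds=5, min_margin=1.0):
    """Grow the labeled set: a sample is pseudo-labeled when at least one
    view is confident, using the label from the more confident view."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model_a, model_b = fit_views(labeled)
        remaining = []
        for v in pool:
            label_a, margin_a = predict(model_a, v[0])
            label_b, margin_b = predict(model_b, v[1])
            if max(margin_a, margin_b) >= min_margin:
                labeled.append((v, label_a if margin_a >= margin_b else label_b))
            else:
                remaining.append(v)
        if len(remaining) == len(pool):
            break  # no confident predictions left, so stop early
        pool = remaining
    model_a, model_b = fit_views(labeled)
    return model_a, model_b, labeled


# Hypothetical data: view_a is informative, view_b is ambiguous for the pool.
seed = [((0.0, 0.0), "low"), ((10.0, 10.0), "high")]
pool = [(1.0, 5.0), (9.0, 5.0)]
model_a, model_b, grown = co_train(seed, pool)
```

In this toy run the first view is confident about both pooled samples, so its labels are added to the shared training set and the second view's centroids are refit using them.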
Affiliation(s)
- Zuhan Liu
  - School of Information Engineering, Nanchang Institute of Technology, Nanchang, China
- Lili Wang
  - College of Science, Nanchang Institute of Technology, Nanchang, China
2
Ren Y, Guan X, Zhang Q, Li L, Tao C, Ren S, Wang Q, Wang W. A machine learning-based study on the impact of COVID-19 on three kinds of pollution in Beijing-Tianjin-Hebei region. The Science of the Total Environment 2023; 884:163190. [PMID: 37061051 PMCID: PMC10102532 DOI: 10.1016/j.scitotenv.2023.163190] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Revised: 03/25/2023] [Accepted: 03/27/2023] [Indexed: 05/07/2023]
Abstract
Large-scale restrictions on anthropogenic activities in China in 2020 due to coronavirus disease 2019 (COVID-19) indirectly led to improvements in air quality. Previous studies have paid little attention to the changes in nitrogen dioxide (NO2), fine particulate matter (PM2.5) and ozone (O3) concentrations at different levels of anthropogenic activity limitation and their interactions. In this study, machine learning models were used to simulate the concentrations of the three pollutants during periods of different levels of lockdown and to compare them with observations during the same periods. The results show that the difference between the simulated and observed values of NO2 concentrations varies at different stages of the lockdown. Variation between simulated and observed O3 and PM2.5 concentrations was less distinct at different stages of the lockdown. During the most severe period of the lockdown, NO2 concentrations decreased significantly, with a maximum decrease of 65.28 %, and O3 concentrations increased, with a maximum increase of 75.69 %. During the first two weeks of the lockdown, the titration reaction in the atmosphere was disrupted due to the rapid decrease in NO2 concentrations, leading to the redistribution of Ox (NO2 + O3) in the atmosphere and eventually to the production of O3 and secondary PM2.5. The effect of traffic restrictions on the reduction of NO2 concentrations is significant. However, it is also important to consider the increase in O3 due to the constant volatile organic compounds (VOCs) and the decrease in NOx (NO + NO2). Traffic restrictions had a limited effect on improving PM2.5 pollution, so other beneficial measures were needed to sustainably reduce particulate matter pollution. Research on COVID-19 could provide new insights into future clean air action.
Affiliation(s)
- Yuchao Ren
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Xu Guan
  - Shandong Academy for Environmental Planning, Jinan 250101, PR China
- Qingzhu Zhang
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Lei Li
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Chenliang Tao
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Shilong Ren
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Qiao Wang
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
- Wenxing Wang
  - Big Data Research Center for Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, PR China
3
Wang H, Wang G. The prediction model for haze pollution based on stacking framework and feature extraction of time series images. The Science of the Total Environment 2022; 839:156003. [PMID: 35595147 DOI: 10.1016/j.scitotenv.2022.156003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 04/27/2022] [Accepted: 05/12/2022] [Indexed: 06/15/2023]
Abstract
In this paper, we propose a new model called the "image-feature-stacking prediction model" to study the prediction problem of univariate time series data. Its main idea is to convert univariate time series data into corresponding images and then use an optimized Inception-v1 network to extract hidden features from the images as input variables. Based on these features, a two-layer stacking ensemble learning framework is constructed to output the final predicted values. The main contribution of the newly proposed model is to convert one-dimensional time series data into two-dimensional images and automatically extract features from the images. This method can mine the intrinsic relationships in the data instead of simply relying on descriptive statistical features to replace the original time series, thereby improving the prediction performance of the model. We use the new prediction model to predict daily PM2.5 concentration. For one-step prediction, the results show that, compared with the other three time series prediction models, the proposed model reduces the mean absolute percentage error and mean absolute scaled error to 19.204% and 1.242, respectively, which are 76.607% and 77.004% lower than the maximum values of these two errors among the four prediction models. We also make two-step and three-step predictions, for which the newly proposed model also shows encouraging performance.
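The two error measures quoted in this abstract can be sketched as follows, using their standard textbook definitions rather than anything taken from the paper's code. The helper `implied_baseline` is a hypothetical name, and it assumes that "lower than the maximum value" means a relative reduction.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual, predicted, train):
    """Mean absolute scaled error: the forecast MAE divided by the in-sample
    MAE of the one-step naive forecast (previous value) on the training series."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae / naive_mae


def implied_baseline(value, percent_lower):
    """Baseline b such that `value` is `percent_lower` percent below b.
    Hypothetical helper; assumes the reduction is relative, not absolute."""
    return value / (1.0 - percent_lower / 100.0)


# Worst-model errors implied by the figures in the abstract, under that assumption.
worst_mape = implied_baseline(19.204, 76.607)
worst_mase = implied_baseline(1.242, 77.004)
```

Under the relative-reduction reading, the abstract's figures imply a worst-case mean absolute percentage error of roughly 82.1% and a worst-case mean absolute scaled error of roughly 5.40 among the four models.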
Affiliation(s)
- Hui Wang
  - Department of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, PR China
- Guizhi Wang
  - Department of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, PR China
4
Peng Z, Zhang C, Cao B, Hong Z, Han X. An interpretable prediction of FCM driven by small samples for energy analysis based on air quality prediction. Journal of the Air & Waste Management Association (1995) 2022; 72:985-999. [PMID: 35394412 DOI: 10.1080/10962247.2022.2064006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Revised: 03/31/2022] [Accepted: 04/04/2022] [Indexed: 06/14/2023]
Abstract
To enable prevention and control of air pollution through advance adjustment of energy consumption, the paper proposes a Fuzzy Cognitive Map (FCM) of the energy resources affecting air quality, an incremental prediction algorithm for the FCM, and a gradient descent method for learning the FCM from small-sample data on energy consumption and air pollutant concentrations. As an interpretable prediction method, the FCM can not only predict future air quality more accurately but also analyze and interpret the effect of each energy type on future air quality. Because energy consumption affects air pollutant concentrations with a time delay, the quantitative time-sequence influence relationships (causality) in the FCM are mined directly from these data, and the air quality affected by each type of energy consumption is predicted based on the FCM. Accordingly, the energy types affecting air pollution can be identified to inform prior decisions on adjusting the energy consumption structure. The experimental results in Beijing-Tianjin-Hebei show that FCM modeling outperforms Support Vector Regression (SVR), Linear Regression (LR), Principal Component Analysis (PCA)-based forecasting, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) methods in predicting air quality affected by energy resources. In addition, the interpretable prediction results of the FCM yield, in advance, some interesting results and suggestions on energy consumption types in the Beijing-Tianjin-Hebei region.
Implications: At present, China's air pollution control has entered a difficult stage (the "deep-water area"), and the biggest challenge is how to adjust the energy consumption structure. This study therefore completed two important tasks: (1) driven by small-sample data on energy consumption, the paper provides a better-performing interpretable prediction model and method for preventing and controlling air pollution through advance adjustment of energy consumption; (2) based on the interpretable prediction results, the paper obtains some interesting results that can guide energy consumption adjustment in the Beijing-Tianjin-Hebei region. This study will provide beneficial suggestions and strategies for air pollution prevention and control in Beijing-Tianjin-Hebei, will help improve the region's air quality and energy consumption structure, and can also be extended to other regions.
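For readers unfamiliar with fuzzy cognitive maps, the sketch below shows one common formulation of an FCM update together with a single gradient-descent step on the weights. It is a generic illustration and not the paper's incremental algorithm: the sigmoid squashing function, the squared-error loss, the learning rate, and the function names are all assumptions.

```python
import math


def sigmoid(x):
    """Squashing function that keeps concept activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fcm_step(state, weights):
    """Advance the map one time step.

    state[j] is the activation of concept j; weights[j][i] is the signed
    influence of concept j on concept i (an assumed layout).
    """
    n = len(state)
    return [sigmoid(sum(weights[j][i] * state[j] for j in range(n))) for i in range(n)]


def gradient_step(state, target, weights, lr=0.1):
    """Return new weights after one gradient-descent update on the loss
    0.5 * sum_i (fcm_step(state)[i] - target[i]) ** 2."""
    n = len(state)
    pred = fcm_step(state, weights)
    new_weights = [row[:] for row in weights]  # leave the input untouched
    for i in range(n):
        # Chain rule: error term times the sigmoid derivative pred * (1 - pred).
        delta = (pred[i] - target[i]) * pred[i] * (1.0 - pred[i])
        for j in range(n):
            new_weights[j][i] -= lr * delta * state[j]
    return new_weights


# Hypothetical two-concept example: an energy-use level and a pollutant level.
weights = [[0.0, 0.0], [0.0, 0.0]]
state, target = [0.8, 0.3], [0.8, 0.9]
updated = gradient_step(state, target, weights)
```

Because each learned weight links one named concept to another, the fitted map can be read directly as "how strongly does this energy type push this pollutant up or down", which is the sense in which the abstract calls the method interpretable.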
Affiliation(s)
- Zhen Peng
  - Information Management Department, Beijing Institute of Petrochemical Technology, Daxing, Beijing, People's Republic of China
- Caixiao Zhang
  - Information Management Department, Beijing Institute of Petrochemical Technology, Daxing, Beijing, People's Republic of China
- Boyang Cao
  - Information Management Department, Beijing Institute of Petrochemical Technology, Daxing, Beijing, People's Republic of China
- Zitao Hong
  - School of Computer Science, Xi'an Shiyou University, Huyi, Shaanxi, People's Republic of China
- Xue Han
  - New Material Application Technology Center of GRIMAT Engineering Institute Co., General Research Institute for Nonferrous Metals, Huairou, Beijing, People's Republic of China
5
Zou W. Analysis and Research on the Rehabilitation Effect of Physical Exercise on College Students' Mental Depression Based on Multidimensional Data Mining. Occup Ther Int 2022; 2022:7656782. [PMID: 35912309 PMCID: PMC9300367 DOI: 10.1155/2022/7656782] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/07/2022] [Revised: 06/13/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022] Open
Abstract
To explore the physical exercise and psychological changes of college students, this study adopts the method of multidimensional data mining, taking 23,146 undergraduates from a university in Guangzhou, Guangdong Province, as the research object. On the basis of summarizing and analyzing the previous research literature, this study expounds the development background, current situation, and future challenges of multidimensional data mining technology. This paper introduces the methods and principles of sample selection and multidimensional assessment of the physical and mental depression of college students, analyzes the physical health status of college students, summarizes the psychological changes of college students before and after the intervention, and discusses the relationship between physical exercise and mental health of college students. In addition, this paper analyzes the influencing factors and psychological changes of college students' physical exercise and puts forward suggestions for improving the physical and mental health of college students. The results of this study show that with increasing age, college students have a lower risk of moderate anxiety and mild depression; girls are more prone to mild depression than boys; rural college students are more prone to mild and moderate anxiety; and college students with nonmedical backgrounds are more likely to experience moderate anxiety than college students with medical backgrounds. During the intervention control period, continuous connection with others may encourage college students to use internal resources to actively cope with obstacles and setbacks, and, as a protective factor, psychological resilience can appropriately weaken the association between risk factors in life and anxiety and thereby relieve anxiety. The results of this study can provide a reference for further research on the physical exercise and psychological changes of college students.
Affiliation(s)
- Wen Zou
  - Chengdu Sports University, Chengdu, Sichuan 610041, China
6
A Variational Bayesian Deep Network with Data Self-Screening Layer for Massive Time-Series Data Forecasting. Entropy 2022; 24:e24030335. [PMID: 35327846 PMCID: PMC8947458 DOI: 10.3390/e24030335] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Received: 01/24/2022] [Revised: 02/23/2022] [Accepted: 02/23/2022] [Indexed: 02/01/2023]
Abstract
Compared with mechanism-based modeling methods, data-driven modeling based on big data has become a popular research field in recent years because of its applicability. However, more data is not always better when building a forecasting model in practice. Because of the noise, conflict, redundancy, and inconsistency of big time-series data, forecasting accuracy may instead decrease. This paper proposes a deep network that improves performance by selecting and understanding data. First, a data self-screening layer (DSSL) with a maximal information distance coefficient (MIDC) is designed to select input data with high correlation and low redundancy; then, a variational Bayesian gated recurrent unit (VBGRU) is used to improve the anti-noise ability and robustness of the model. Beijing’s air quality and meteorological data are used in a verification experiment on 24 h PM2.5 concentration forecasting, which shows that the proposed model is superior to other models in accuracy.
7
Zhang T, Rong M, Shan H, Liu M. Causal Asymmetry Analysis in the View of Concept-Cognitive Learning by Incremental Concept Tree. Cognit Comput 2021. [DOI: 10.1007/s12559-021-09930-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]