1
Rincon G, Morantes Quintana G, Gonzalez A, Buitrago Y, Gonzalez JC, Molina C, Jones B. PM2.5 exceedances and source appointment as inputs for an early warning system. ENVIRONMENTAL GEOCHEMISTRY AND HEALTH 2022; 44:4569-4593. [PMID: 35192100 PMCID: PMC9675665 DOI: 10.1007/s10653-021-01189-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2021] [Accepted: 12/17/2021] [Indexed: 05/05/2023]
Abstract
Between June 2018 and April 2019, a sampling campaign was carried out in the Sartenejas Valley, Venezuela, to collect PM2.5 while monitoring meteorological parameters and anthropogenic events. We developed a logistic model for PM2.5 exceedances (≥ 12.5 µg m⁻³). Source apportionment was performed using the elemental composition and morphology of PM determined by scanning electron microscopy coupled with energy-dispersive spectroscopy (SEM-EDS). A proposal for an early warning system (EWS) for PM pollution episodes is presented. The logistic model has an overall success rate of 94%, with forest fires and motor vehicle flows as significant variables. Source apportionment by occurrence of events showed that samples with higher PM concentrations contained carbon-rich particles and traces of K associated with biomass burning, as well as aluminosilicates and metallic elements associated with the resuspension of soil dust by motor vehicles. Quantitative source apportionment showed that soil dust, garbage burning/marine aerosols and wildfires are the three major sources of PM. The proposed EWS for PM pollution episodes around the Sartenejas Valley draws on the variables and elements mentioned above.
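The abstract describes a logistic model that classifies samples as PM2.5 exceedances (≥ 12.5 µg m⁻³) and reports its success rate. A minimal pure-Python sketch of that idea follows. It assumes two hypothetical predictors (a forest-fire flag and a scaled vehicle-flow value), fits the model by plain batch gradient descent, and takes "success rate" to mean the fraction of correctly classified samples; the paper's actual variables, fitting procedure and probability cutoff are not given here, and the toy data are invented for illustration.

```python
import math

THRESHOLD_UG_M3 = 12.5  # exceedance threshold quoted in the abstract

def exceedance_label(pm25):
    """1 if a PM2.5 concentration (ug/m3) counts as an exceedance, else 0."""
    return 1 if pm25 >= THRESHOLD_UG_M3 else 0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_score(w, xi):
    """Bias w[0] plus the weighted sum of the predictors."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, xi, cutoff=0.5):
    """Predicted class (1 = exceedance) for one sample."""
    return 1 if sigmoid(linear_score(w, xi)) >= cutoff else 0

def success_rate(w, X, y):
    """Fraction of samples whose exceedance / no-exceedance class is predicted correctly."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)

# Hypothetical toy data: [forest_fire_flag, scaled_vehicle_flow] per sample.
X = [[0, 0.1], [0, 0.3], [1, 0.7], [1, 0.9]]
pm25 = [8.0, 10.1, 15.2, 20.4]
y = [exceedance_label(c) for c in pm25]
w = fit_logistic(X, y)
print(success_rate(w, X, y))
```

With real data, the success rate should be computed on samples held out from fitting; scoring on the training samples, as in the toy call, only checks that the pieces work together.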
Affiliation(s)
- Gladys Rincon
- Escuela Superior Politécnica del Litoral, ESPOL, Facultad de Ingeniería Marítima y Ciencias del Mar (FIMCM), Guayaquil, Ecuador
- Pacific International Center for Disaster Risk Reduction, ESPOL, Guayaquil, Ecuador
- Giobertti Morantes Quintana
- Department of Architecture and Built Environment, University of Nottingham, Nottingham, NG7 2RD, UK
- Departamento de Procesos y Sistemas, Laboratorio de Residuales de Petróleo, Universidad Simón Bolívar, Caracas, Venezuela
- Ahilymar Gonzalez
- Departamento de Procesos y Sistemas, Laboratorio de Residuales de Petróleo, Universidad Simón Bolívar, Caracas, Venezuela
- Yudeisy Buitrago
- Departamento de Procesos y Sistemas, Laboratorio de Residuales de Petróleo, Universidad Simón Bolívar, Caracas, Venezuela
- Jean Carlos Gonzalez
- Departamento de Procesos y Sistemas, Laboratorio de Residuales de Petróleo, Universidad Simón Bolívar, Caracas, Venezuela
- Constanza Molina
- Escuela de Construcción Civil, Pontificia Universidad Católica de Chile, Santiago de Chile, Chile
- Benjamin Jones
- Department of Architecture and Built Environment, University of Nottingham, Nottingham, NG7 2RD, UK
2
Alazmi A, Rakha H. Assessing and Validating the Ability of Machine Learning to Handle Unrefined Particle Air Pollution Mobile Monitoring Data Randomly, Spatially, and Spatiotemporally. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:10098. [PMID: 36011733 PMCID: PMC9408314 DOI: 10.3390/ijerph191610098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 06/19/2022] [Accepted: 06/20/2022] [Indexed: 06/15/2023]
Abstract
Many epidemiological studies have evaluated the accuracy of machine learning models in predicting particle number (PN) and black carbon (BC) concentrations. However, few studies have investigated the ability of machine learning to predict pollutant concentrations from unrefined mobile measurement data or explored the reliability of the resulting prediction models. Additionally, researchers are moving away from fixed-site data in favor of mobile monitoring data collected in a variety of locations to develop hourly empirical models of particulate air pollution. This study compared long-term (daily average) and short-term (hourly average and 1 s unrefined data) model performance under three classes of cross-validation: random, spatial, and spatiotemporal. This study used secondary data describing BC and PN levels in the rural location of Blacksburg (VA). Our results show that the model based on unrefined data was able to detect pollutant hot spot areas with accuracy similar to that of the aggregated model. Moreover, performance improved when temporal data were added to the model: the 10-fold MAEs for BC and PN were 0.44 μg/m³ and 3391 pt/cm³, respectively, for the unrefined (1 s) data model. These findings add to the literature on the relationship between data (pre)processing and the efficacy of machine learning models in predicting pollution levels, while also improving our understanding of more reliable validation strategies.
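The study contrasts random, spatial and spatiotemporal cross-validation scored by MAE. The sketch below shows one common way to build such folds in plain Python. It assumes each record carries a hypothetical group key (for example a road-segment ID for spatial folds, or a (segment, time block) tuple for spatiotemporal folds); it is not the authors' code, and the mean-value baseline is a hypothetical stand-in for their machine learning models.

```python
import random

def random_folds(n_records, k, seed=0):
    """Random k-fold: shuffle record indices and deal them into k folds."""
    rng = random.Random(seed)
    idx = list(range(n_records))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k) if idx[i::k]]

def grouped_folds(groups, k, seed=0):
    """Grouped k-fold: every record sharing a group key (e.g. a road segment,
    or a (segment, time block) tuple) lands in the same fold."""
    rng = random.Random(seed)
    unique = sorted(set(groups))
    rng.shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    folds = [[] for _ in range(k)]
    for i, g in enumerate(groups):
        folds[fold_of[g]].append(i)
    return [f for f in folds if f]  # drop empty folds if there are fewer groups than k

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def cross_validated_mae(y, folds, fit, predict):
    """Average test-fold MAE; each fold is held out once while the rest train the model."""
    all_idx = set(range(len(y)))
    scores = []
    for test_idx in folds:
        train_idx = sorted(all_idx - set(test_idx))
        model = fit(train_idx)
        preds = [predict(model, i) for i in test_idx]
        scores.append(mae([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)

def mean_baseline(y):
    """Hypothetical stand-in model: always predict the training-set mean."""
    def fit(train_idx):
        return sum(y[i] for i in train_idx) / len(train_idx)
    def predict(model, i):
        return model
    return fit, predict
```

Grouping keeps every record of a segment (or segment and time block) in a single fold, so the model is scored on places it did not see during training. Random folds let neighbouring, highly correlated records fall on both sides of the split, which can make the MAE look better than it would be at genuinely new locations.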
Affiliation(s)
- Asmaa Alazmi
- Department of Construction Project, Ministry of Public Work of Kuwait, Kuwait City 12011, Kuwait
- Hesham Rakha
- Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
3
Abstract
Air pollution and its consequences negatively impact the world population and the environment, which makes air quality monitoring and forecasting techniques essential tools for combating this problem. To predict air quality as accurately as possible, it is crucial to consider not only the implemented models and the quantity of data but also the types of datasets used. This study selected a set of research works in the field of air quality prediction and focused on exploring the datasets used in them. The most significant findings of this work are: (1) meteorological datasets were used in 94.6% of the papers, far ahead of any other dataset type, and were typically complemented by others, such as temporal and spatial data; (2) combinations of different datasets have been used since 2009; and (3) open data have been used since 2012, with 32.3% of the studies using open data and 63.4% not providing their data.
4
Molina-Gómez NI, Calderón-Rivera DS, Sierra-Parada R, Díaz-Arévalo JL, López-Jiménez PA. Analysis of incidence of air quality on human health: a case study on the relationship between pollutant concentrations and respiratory diseases in Kennedy, Bogotá. INTERNATIONAL JOURNAL OF BIOMETEOROLOGY 2021; 65:119-132. [PMID: 32661801 DOI: 10.1007/s00484-020-01955-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2019] [Revised: 06/10/2020] [Accepted: 06/12/2020] [Indexed: 05/13/2023]
Abstract
Thousands of deaths associated with air pollution each year could be prevented by forecasting the behavior and geographical distribution of factors that pose risks to people's health. Proximity to pollution sources, degree of urbanization, and population density are among the factors whose spatial distribution helps identify possible influences on the presence of respiratory diseases (RD). Bogotá is currently among the cities with the poorest air quality in Latin America. Specifically, the locality of Kennedy is one of the zones in the city with the highest recorded concentrations of local pollutants over the last 10 years. From 2009 to 2016, there were 8619 deaths associated with respiratory and cardiovascular diseases in the locality. Given these characteristics, this study set out to identify and analyze the areas in which the primary socioeconomic and environmental conditions contribute to the presence of symptoms associated with RD. To this end, information collected in the field through georeferenced surveys was analyzed with geostatistical and machine learning tools for cluster and pattern analysis. Random forests and AdaBoost were applied to identify hot spots where RD could occur, given the combination of predictor variables in the micro-territory. Random forests outperformed AdaBoost, with an AUC of 0.63. This approach is particularly applicable to densely populated municipalities with high levels of air pollution. Using these tools, municipalities can anticipate environmental health situations and reduce the cost of respiratory disease treatments.
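The abstract ranks random forests above AdaBoost by the area under the ROC curve (AUC). To make that number concrete, the sketch below computes AUC directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative one. It is an illustrative helper, not the authors' evaluation code; the O(n²) pair loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as half a win (the Mann-Whitney formulation)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.63 indicates modest discriminative ability. Comparing two models means computing this quantity on the same held-out labels with each model's scores.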
Affiliation(s)
- Nidia Isabel Molina-Gómez
- Department of Environmental Engineering, Universidad Santo Tomás, Bogotá, 110231, Colombia
- Hydraulic and Environmental Engineering Department, Universitat Politècnica de València, Valencia, 46022, Spain
- Ronal Sierra-Parada
- Department of Environmental Engineering, Universidad Santo Tomás, Bogotá, 110231, Colombia
- José Luis Díaz-Arévalo
- Department of Civil and Agricultural Engineering, Universidad Nacional de Colombia, Bogotá, 111321, Colombia
- P Amparo López-Jiménez
- Hydraulic and Environmental Engineering Department, Universitat Politècnica de València, Valencia, 46022, Spain
5
Wang WCV, Lung SCC, Liu CH. Application of Machine Learning for the in-Field Correction of a PM2.5 Low-Cost Sensor Network. SENSORS 2020; 20:s20175002. [PMID: 32899301 PMCID: PMC7506620 DOI: 10.3390/s20175002] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/12/2020] [Revised: 08/27/2020] [Accepted: 08/31/2020] [Indexed: 01/12/2023]
Abstract
Many low-cost sensors (LCSs) are deployed for air monitoring without rigorous calibration. This work applies machine learning, using PM2.5 data from Taiwan monitoring stations, to conduct in-field corrections on a network of 39 PM2.5 LCSs from July 2017 to December 2018. Three candidate models were evaluated: multiple linear regression (MLR), support vector regression (SVR), and random forest regression (RFR). The model-corrected PM2.5 levels were compared with GRIMM-calibrated PM2.5. RFR was superior to MLR and SVR in both correction accuracy and computing efficiency. Compared with SVR, the root mean square errors (RMSEs) of RFR were 35% and 85% lower for the training and validation sets, respectively, and its computation was 35 times faster. An RFR with 300 decision trees was chosen as the optimal setting, considering both correction performance and modeling time. An RFR with a nighttime pattern was established as the optimal correction model, reducing the RMSEs from 18.4 ± 6.5 μg/m³ before correction to 5.9 ± 2.0 μg/m³. This is the first work to correct LCSs at locations without monitoring stations, validated using laboratory-calibrated data. Similar models could be established in other countries to greatly enhance the usefulness of their PM2.5 sensor networks.
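RMSE is the metric behind the before/after figures reported above. The sketch below computes it and applies the simplest possible correction: a one-variable least-squares line mapping raw sensor readings to reference values. This is a single-predictor version of the MLR baseline, not the RFR model the authors selected, and the readings in the toy call are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def fit_linear_correction(raw, reference):
    """Ordinary least squares for reference ~ a + b * raw; returns (a, b)."""
    n = len(raw)
    mx = sum(raw) / n
    my = sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings are constant; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(raw, reference))
    b = sxy / sxx
    return my - b * mx, b

def apply_correction(coeffs, raw):
    """Map raw low-cost-sensor readings onto the reference scale."""
    a, b = coeffs
    return [a + b * x for x in raw]

# Hypothetical co-located readings (ug/m3): low-cost sensor vs. reference instrument.
raw = [10.0, 20.0, 30.0, 40.0]
reference = [7.0, 12.0, 17.0, 22.0]
coeffs = fit_linear_correction(raw, reference)
corrected = apply_correction(coeffs, raw)
print(rmse(reference, raw), rmse(reference, corrected))
```

In practice the line (or a random forest) would be fitted on co-located training data and the RMSE reported on a separate validation set, as the abstract's training/validation comparison implies; the toy call fits and scores on the same four points only to show the mechanics.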
Affiliation(s)
- Wen-Cheng Vincent Wang
- Research Center for Environmental Changes, Academia Sinica, Nangang, Taipei 115, Taiwan
- Shih-Chun Candice Lung
- Research Center for Environmental Changes, Academia Sinica, Nangang, Taipei 115, Taiwan
- Department of Atmospheric Sciences, National Taiwan University, Taipei 106, Taiwan
- Institute of Environmental Health, National Taiwan University, Taipei 106, Taiwan
- Chun-Hu Liu
- Research Center for Environmental Changes, Academia Sinica, Nangang, Taipei 115, Taiwan
6
A data ensemble approach for real-time air quality forecasting using extremely randomized trees and deep neural networks. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04287-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 11/28/2022]
7
Eslami E, Choi Y, Lops Y, Sayeed A. A real-time hourly ozone prediction system using deep convolutional neural network. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04282-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 10/26/2022]
8
Machine Learning Approaches for Outdoor Air Quality Modelling: A Systematic Review. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8122570] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 11/16/2022]
Abstract
Current studies show that traditional deterministic models tend to struggle to capture the non-linear relationship between the concentrations of air pollutants and their sources of emission and dispersion. To tackle this limitation, the most promising approach is to use statistical models based on machine learning techniques. Nevertheless, it is often unclear why one algorithm is chosen over another for a given task. This systematic review aims to clarify this question by providing the reader with a comprehensive description of the principles underlying these algorithms and how they are applied to enhance prediction accuracy. A rigorous search conforming to the PRISMA guidelines was performed and resulted in the selection of the 46 most relevant journal papers in the area. These studies were synthesized and linked to each other through a factorial analysis method. The main findings of this review show that: (i) machine learning is mainly applied in Eurasia and North America, and (ii) estimation problems tend to use ensemble learning and regression, whereas forecasting problems use neural networks and support vector machines. The next challenges for this approach are to improve the prediction of pollution peaks and of contaminants that have recently come into the spotlight (e.g., nanoparticles).
9
Bellinger C, Mohomed Jabbar MS, Zaïane O, Osornio-Vargas A. A systematic review of data mining and machine learning for air pollution epidemiology. BMC Public Health 2017; 17:907. [PMID: 29179711 PMCID: PMC5704396 DOI: 10.1186/s12889-017-4914-3] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 04/26/2017] [Accepted: 11/14/2017] [Indexed: 01/05/2023]
Abstract
Background: Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but they also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology.
Methods: We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed.
Results: Our search queries returned 400 research articles. Our fine-grained analysis applied our inclusion/exclusion criteria to reduce the results to 47 articles, which we separated into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications showed a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. As potential new directions, we identified deep learning and geospatial pattern mining as two burgeoning areas of data mining with good potential for future applications in air pollution epidemiology.
Conclusions: We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advances in temporal and geospatial mining and in deep learning. This is further supported by new sensors and storage media that enable larger, better-quality data. This suggests that many more fruitful applications can be expected in the future.
Affiliation(s)
- Colin Bellinger
- Department of Computing Science, University of Alberta, Edmonton, Canada
- Osmar Zaïane
- Department of Computing Science, University of Alberta, Edmonton, Canada
10
Fuller D, Buote R, Stanley K. A glossary for big data in population and public health: discussion and commentary on terminology and research methods. J Epidemiol Community Health 2017; 71:1113-1117. [PMID: 28918390 DOI: 10.1136/jech-2017-209608] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 06/19/2017] [Revised: 08/15/2017] [Accepted: 08/15/2017] [Indexed: 11/03/2022]
Abstract
The volume and velocity of data are growing rapidly and big data analytics are being applied to these data in many fields. Population and public health researchers may be unfamiliar with the terminology and statistical methods used in big data. This creates a barrier to the application of big data analytics. The purpose of this glossary is to define terms used in big data and big data analytics and to contextualise these terms. We define the five Vs of big data and provide definitions and distinctions for data mining, machine learning and deep learning, among other terms. We provide key distinctions between big data and statistical analysis methods applied to big data. We contextualise the glossary by providing examples where big data analysis methods have been applied to population and public health research problems and provide brief guidance on how to learn big data analysis methods.
Affiliation(s)
- Daniel Fuller
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, St John's, Canada
- Richard Buote
- Division of Community Health and Humanities, Faculty of Medicine, Memorial University of Newfoundland, St John's, Canada
- Kevin Stanley
- Department of Computer Science, College of Arts and Science, University of Saskatchewan, Saskatoon, Canada