1
VoPham T, White AJ, Jones RR. Geospatial Science for the Environmental Epidemiology of Cancer in the Exposome Era. Cancer Epidemiol Biomarkers Prev 2024; 33:451-460. [PMID: 38566558 PMCID: PMC10996842 DOI: 10.1158/1055-9965.epi-23-1237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 12/11/2023] [Accepted: 01/29/2024] [Indexed: 04/04/2024]
Abstract
Geospatial science is the science of location or place that harnesses geospatial tools, such as geographic information systems (GIS), to understand the features of the environment according to their locations. Geospatial science has been transformative for cancer epidemiologic studies through enabling large-scale environmental exposure assessments. As the research paradigm for the exposome, or the totality of environmental exposures across the life course, continues to evolve, geospatial science will serve a critical role in determining optimal practices for how to measure the environment as part of the external exposome. The objectives of this article are to provide a summary of key concepts, present a conceptual framework that illustrates how geospatial science is applied to environmental epidemiology in practice and through the lens of the exposome, and discuss the following opportunities for advancing geospatial science in cancer epidemiologic research: enhancing spatial and temporal resolutions and extents for geospatial data; geospatial methodologies to measure climate change factors; approaches facilitating the use of patient addresses in epidemiologic studies; combining internal exposome data and geospatial exposure models of the external exposome to provide insights into biological pathways for environment-disease relationships; and incorporation of geospatial data into personalized cancer screening policies and clinical decision making.
Affiliation(s)
- Trang VoPham
  - Epidemiology Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington
  - Department of Epidemiology, University of Washington, Seattle, Washington
- Alexandra J. White
  - Epidemiology Branch, Division of Intramural Research, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina
- Rena R. Jones
  - Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, NCI, NIH, Department of Health and Human Services, Bethesda, Maryland
2
Wei Y, Qiu X, Yazdi MD, Shtein A, Shi L, Yang J, Peralta AA, Coull BA, Schwartz JD. The Impact of Exposure Measurement Error on the Estimated Concentration-Response Relationship between Long-Term Exposure to PM2.5 and Mortality. ENVIRONMENTAL HEALTH PERSPECTIVES 2022; 130:77006. [PMID: 35904519 PMCID: PMC9337229 DOI: 10.1289/ehp10389] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 05/06/2023]
Abstract
BACKGROUND Exposure measurement error is a central concern in air pollution epidemiology. Given that studies have been using ambient air pollution predictions as proxy exposure measures, the potential impact of exposure error on health effect estimates needs to be comprehensively assessed. OBJECTIVES We aimed to generate wide-ranging scenarios to assess direction and magnitude of bias caused by exposure errors under plausible concentration-response relationships between annual exposure to fine particulate matter [PM ≤2.5 μm in aerodynamic diameter (PM2.5)] and all-cause mortality. METHODS In this simulation study, we use daily PM2.5 predictions at 1-km² spatial resolution to estimate annual PM2.5 exposures and their uncertainties for ZIP Codes of residence across the contiguous United States between 2000 and 2016. We consider scenarios in which we vary the error type (classical or Berkson) and the true concentration-response relationship between PM2.5 exposure and mortality (linear, quadratic, or soft-threshold, i.e., a smooth approximation to the hard-threshold model). In each scenario, we generate numbers of deaths using error-free exposures and confounders of concurrent air pollutants and neighborhood-level covariates and perform epidemiological analyses using error-prone exposures under correct specification or misspecification of the concentration-response relationship between PM2.5 exposure and mortality, adjusting for the confounders. RESULTS We simulate 1,000 replicates of each of 162 scenarios investigated. In general, both classical and Berkson errors can bias the concentration-response curve toward the null. The biases remain small even when using three times the predicted uncertainty to generate errors and are relatively larger at higher exposure levels.
DISCUSSION Our findings suggest that the causal determination for long-term PM2.5 exposure and mortality is unlikely to be undermined when using high-resolution ambient predictions given that the estimated effect is generally smaller than the truth. The small magnitude of bias suggests that epidemiological findings are relatively robust against the exposure error. In practice, the use of ambient predictions with a finer spatial resolution will result in smaller bias. https://doi.org/10.1289/EHP10389.
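The distinction between classical and Berkson error in the abstract above can be illustrated in a much simpler setting than the study's. The following is a minimal sketch, assuming a single exposure, a linear outcome model, and Gaussian errors; the function names and all parameter values are hypothetical and are not taken from the paper. In this textbook linear case, classical error attenuates the slope by roughly var(x) / (var(x) + var(error)), while Berkson error leaves it approximately unbiased. The study reports that both error types can bias estimates toward the null under its own scenarios, so this is only an aid to intuition.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n=50_000, beta=0.5, sd_x=1.0, sd_err=1.0, seed=0):
    """Return (slope under classical error, slope under Berkson error).

    All values are illustrative assumptions, not the study's settings.
    """
    rng = random.Random(seed)

    # Classical error: observed exposure = true exposure + noise.
    x_true = [rng.gauss(10.0, sd_x) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]
    w_classical = [x + rng.gauss(0.0, sd_err) for x in x_true]

    # Berkson error: true exposure = assigned exposure + noise.
    w_assigned = [rng.gauss(10.0, sd_x) for _ in range(n)]
    x_berkson = [w + rng.gauss(0.0, sd_err) for w in w_assigned]
    y_berkson = [beta * x + rng.gauss(0.0, 1.0) for x in x_berkson]

    return ols_slope(w_classical, y), ols_slope(w_assigned, y_berkson)


if __name__ == "__main__":
    classical, berkson = simulate_slopes()
    print(f"true slope 0.5, classical: {classical:.3f}, Berkson: {berkson:.3f}")
```

With equal exposure and error variances, as assumed here, the classical-error slope should land near half the true slope.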
Affiliation(s)
- Yaguang Wei
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Xinye Qiu
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Mahdieh Danesh Yazdi
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Alexandra Shtein
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Liuhua Shi
  - Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
- Jiabei Yang
  - Department of Biostatistics, School of Public Health, Brown University, Providence, Rhode Island, USA
- Adjani A. Peralta
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Brent A. Coull
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Joel D. Schwartz
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
3
Research on statistical characteristics modeling of matching probability and measurement error based on machine learning. INTERNATIONAL JOURNAL OF INFORMATION SYSTEMS IN THE SERVICE SECTOR 2022. [DOI: 10.4018/ijisss.290548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
To address the problems of current methods for modeling the statistical characteristics of matching probability and measurement error, a modeling method based on machine learning is proposed. The sequence matching probability is calculated according to the requirements for total sequence matching probability and the number of system matches. The measurement error is analyzed during acquisition and matching, and the measurable interference parameters are obtained. Based on the analysis results, the mean matching measurement error is standardized, and a statistical characteristics model of matching probability and measurement error is established. The experimental results show that the statistical model produced by this method has high accuracy and performs well in practical application.
4
Habre R, Girguis M, Urman R, Fruin S, Lurmann F, Shafer M, Gorski P, Franklin M, McConnell R, Avol E, Gilliland F. Contribution of tailpipe and non-tailpipe traffic sources to quasi-ultrafine, fine and coarse particulate matter in southern California. JOURNAL OF THE AIR & WASTE MANAGEMENT ASSOCIATION (1995) 2021; 71:209-230. [PMID: 32990509 PMCID: PMC8112073 DOI: 10.1080/10962247.2020.1826366] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 06/29/2020] [Revised: 08/21/2020] [Accepted: 09/09/2020] [Indexed: 05/19/2023]
Abstract
Exposure to traffic-related air pollution (TRAP) in the near-roadway environment is associated with multiple adverse health effects. This study aimed to characterize the relative contribution of tailpipe and non-tailpipe TRAP sources to particulate matter (PM) in the quasi-ultrafine (PM0.2), fine (PM2.5) and coarse (PM2.5-10) size fractions and identify their spatial determinants in southern California (CA). Month-long integrated PM0.2, PM2.5 and PM2.5-10 samples (n = 461, 265 and 298, respectively) were collected across cool and warm seasons in 8 southern CA communities (2008-2009). Concentrations of PM mass, elements, carbons and major ions were obtained. Enrichment ratios (ER) in PM0.2 and PM10 relative to PM2.5 were calculated for each element. The Positive Matrix Factorization model was used to resolve and estimate the relative contribution of TRAP sources to PM in three size fractions. Generalized additive models (GAMs) with bivariate loess smooths were used to understand the geographic variation of TRAP sources and identify their spatial determinants. EC, OC, and B had the highest median ER in PM0.2 relative to PM2.5. Six, seven and five sources (with characteristic species) were resolved in PM0.2, PM2.5 and PM2.5-10, respectively. Combined tailpipe and non-tailpipe traffic sources contributed 66%, 32% and 18% of PM0.2, PM2.5 and PM2.5-10 mass, respectively. Tailpipe traffic emissions (EC, OC, B) were the largest contributor to PM0.2 mass (58%). Distinct gasoline and diesel tailpipe traffic sources were resolved in PM2.5. Others included fuel oil, biomass burning, secondary inorganic aerosol, sea salt, and crustal/soil. CALINE4 dispersion model nitrogen oxides, trucks and intersections were most correlated with TRAP sources. The influence of smaller roadways and intersections became more apparent once Long Beach was excluded. Non-tailpipe emissions constituted ~8%, 11% and 18% of PM0.2, PM2.5 and PM2.5-10, respectively, with important exposure and health implications.
Future efforts should consider non-linear relationships amongst predictors when modeling exposures. Implications: Vehicle emissions result in a complex mix of air pollutants with both tailpipe and non-tailpipe components. As mobile source regulations lead to decreased tailpipe emissions, the relative contribution of non-tailpipe traffic emissions to near-roadway exposures is increasing. This study documents the presence of non-tailpipe abrasive vehicular emissions (AVE) from brake and tire wear, catalyst degradation and resuspended road dust in the quasi-ultrafine (PM0.2), fine and coarse particulate matter size fractions, with contributions reaching up to 30% in PM0.2 in some southern California communities. These findings have important exposure and policy implications given the high metal content of AVE and the efficiency of PM0.2 at reaching the alveolar region of the lungs and other organ systems once inhaled. This work also highlights important considerations for building models that can accurately predict tailpipe and non-tailpipe exposures for population health studies.
Affiliation(s)
- Rima Habre
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Mariam Girguis
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Robert Urman
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Scott Fruin
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Martin Shafer
  - Wisconsin State Laboratory of Hygiene, University of Wisconsin-Madison, Madison, WI
  - Environmental Chemistry & Technology Program, University of Wisconsin-Madison, Madison, WI
- Patrick Gorski
  - Wisconsin State Laboratory of Hygiene, University of Wisconsin-Madison, Madison, WI
- Meredith Franklin
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Rob McConnell
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Ed Avol
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA
- Frank Gilliland
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA
5
Li L, Girguis M, Lurmann F, Pavlovic N, McClure C, Franklin M, Wu J, Oman LD, Breton C, Gilliland F, Habre R. Ensemble-based deep learning for estimating PM2.5 over California with multisource big data including wildfire smoke. ENVIRONMENT INTERNATIONAL 2020; 145:106143. [PMID: 32980736 PMCID: PMC7643812 DOI: 10.1016/j.envint.2020.106143] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/07/2020] [Revised: 08/14/2020] [Accepted: 09/13/2020] [Indexed: 05/21/2023]
Abstract
INTRODUCTION Estimating PM2.5 concentrations and their prediction uncertainties at a high spatiotemporal resolution is important for air pollution health effect studies. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic emissions, meteorology, topography (e.g., desert surfaces, mountains, snow cover) and land use. METHODS Using ensemble-based deep learning with big data fused from multiple sources we developed a PM2.5 prediction model with uncertainty estimates at a high spatial (1 km × 1 km) and temporal (weekly) resolution for a 10-year time span (2008-2017). We leveraged autoencoder-based full residual deep networks to model complex nonlinear interrelationships among PM2.5 emission, transport and dispersion factors and other influential features. These included remote sensing data (MAIAC aerosol optical depth (AOD), normalized difference vegetation index, impervious surface), MERRA-2 GMI Replay Simulation (M2GMI) output, wildfire smoke plume dispersion, meteorology, land cover, traffic, elevation, and spatiotemporal trends (geo-coordinates, temporal basis functions, time index). As one of the primary predictors of interest with substantial missing data in California related to bright surfaces, cloud cover and other known interferences, missing MAIAC AOD observations were imputed and adjusted for relative humidity and vertical distribution. Wildfire smoke contribution to PM2.5 was also calculated through HYSPLIT dispersion modeling of smoke emissions derived from MODIS fire radiative power using the Fire Energetics and Emissions Research version 1.0 model. RESULTS Ensemble deep learning to predict PM2.5 achieved an overall mean training RMSE of 1.54 μg/m3 (R2: 0.94) and test RMSE of 2.29 μg/m3 (R2: 0.87).
The top predictors included M2GMI carbon monoxide mixing ratio in the bottom layer, temporal basis functions, spatial location, air temperature, MAIAC AOD, and PM2.5 sea salt mass concentration. In an independent test using three long-term AQS sites and one short-term non-AQS site, our model achieved a high correlation (>0.8) and a low RMSE (<3 μg/m3). Statewide predictions indicated that our model can capture the spatial distribution and temporal peaks in wildfire-related PM2.5. The coefficient of variation indicated highest uncertainty over deciduous and mixed forests and open water land covers. CONCLUSION Our method can be generalized to other regions, including those having a mix of major urban areas, deserts, intensive smoke events, snow cover and complex terrains, where PM2.5 has previously been challenging to predict. Prediction uncertainty estimates can also inform further model development and measurement error evaluations in exposure and health studies.
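The RMSE and R2 figures reported above are standard regression metrics. As a minimal sketch, they can be computed from observed and predicted concentrations as shown below. The function names are hypothetical, and R2 is taken here as the coefficient of determination, which is an assumption; the paper may have computed it differently, for example as a squared correlation.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up weekly PM2.5 values (μg/m3), for illustration only.
    obs = [8.0, 12.5, 30.0, 9.5, 15.0]
    pred = [9.0, 11.0, 27.5, 10.0, 16.5]
    print(f"RMSE = {rmse(obs, pred):.2f}, R2 = {r_squared(obs, pred):.2f}")
```

Computing both metrics separately on training and held-out data, as the abstract does, is what makes the gap between them informative about overfitting.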
Affiliation(s)
- Lianfa Li
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources, Chinese Academy of Sciences, Beijing, China
- Mariam Girguis
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Meredith Franklin
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Jun Wu
  - Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, CA, USA
- Luke D Oman
  - Goddard Space Flight Center, National Aeronautics and Space Administration, Greenbelt, MD, USA
- Carrie Breton
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Frank Gilliland
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Rima Habre
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
6
Girguis MS, Li L, Lurmann F, Wu J, Breton C, Gilliland F, Stram D, Habre R. Exposure Measurement Error in Air Pollution Studies: The Impact of Shared, Multiplicative Measurement Error on Epidemiological Health Risk Estimates. AIR QUALITY, ATMOSPHERE, & HEALTH 2020; 13:631-643. [PMID: 32601528 PMCID: PMC7323995 DOI: 10.1007/s11869-020-00826-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/21/2019] [Accepted: 04/08/2020] [Indexed: 05/29/2023]
Abstract
Spatiotemporal air pollution models are increasingly being used to estimate health effects in epidemiological studies. Although such exposure prediction models typically result in improved spatial and temporal resolution of air pollution predictions, they remain subject to shared measurement error, a type of measurement error common in spatiotemporal exposure models which occurs when measurement error is not independent of exposures. A fundamental challenge of exposure measurement error in air pollution assessment is the strong correlation and sometimes identical (shared) error of exposure estimates across geographic space and time. When exposure estimates with shared measurement error are used to estimate health risk in epidemiological analyses, complex errors are potentially introduced, resulting in biased epidemiological conclusions. We demonstrate the influence of using a three-stage spatiotemporal exposure prediction model and introduce formal methods of shared, multiplicative measurement error (SMME) correction of epidemiological health risk estimates. Using our three-stage, ensemble learning based nitrogen oxides (NOx) exposure prediction model, we quantified SMME. We conducted an epidemiological analysis of wheeze risk in relation to NOx exposure among school-aged children. To demonstrate the incremental influence of exposure modeling stage, we iteratively estimated the health risk using assigned exposure predictions from each stage of the NOx model. We then determined the impact of SMME on the variance of the health risk estimates under various scenarios. Depending on the stage of the spatiotemporal exposure model used, we found that wheeze odds ratio ranged from 1.16 to 1.28 for an interquartile range increase in NOx. With each additional stage of exposure modeling, the health effect estimate moved further away from the null (OR=1). 
When corrected for observed SMME, the health effects confidence intervals slightly lengthened, but our epidemiological conclusions were not altered. When the variance estimate was corrected for the potential "worst case scenario" of SMME, the standard error further increased, having a meaningful influence on epidemiological conclusions. Our framework can be expanded and used to understand the implications of using exposure predictions subject to shared measurement error in future health investigations.
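The odds ratios above are expressed per interquartile range (IQR) increase in NOx. As a minimal sketch of that rescaling, assume a logistic regression coefficient `beta` on the log-odds scale per unit of exposure, with standard error `se`. The conversion is then OR = exp(beta × IQR), with a Wald interval obtained the same way. The function names and any numbers passed in are hypothetical, not values from the study, and the sketch does not implement the SMME correction itself.

```python
import math
import statistics


def interquartile_range(values):
    """IQR from the first and third quartiles of the exposure values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def odds_ratio_per_iqr(beta, iqr):
    """Odds ratio for an IQR increase, given a per-unit log-odds slope."""
    return math.exp(beta * iqr)


def wald_ci_per_iqr(beta, se, iqr, z=1.96):
    """Approximate 95% confidence interval on the odds-ratio scale."""
    lower = math.exp((beta - z * se) * iqr)
    upper = math.exp((beta + z * se) * iqr)
    return lower, upper


if __name__ == "__main__":
    # Made-up exposure values and coefficient, for illustration only.
    nox = [12.0, 15.5, 18.0, 22.5, 30.0, 41.0, 55.0]
    iqr = interquartile_range(nox)
    print(odds_ratio_per_iqr(0.01, iqr), wald_ci_per_iqr(0.01, 0.004, iqr))
```

A larger standard error, such as one inflated to account for shared error, widens the interval without moving the point estimate, which matches the pattern the abstract describes.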
Affiliation(s)
- Mariam S Girguis
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Lianfa Li
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Jun Wu
  - Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, CA, USA
- Carrie Breton
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Frank Gilliland
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Daniel Stram
  - Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Rima Habre
  - Division of Environmental Health, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
7
A Robust Deep Learning Approach for Spatiotemporal Estimation of Satellite AOD and PM2.5. REMOTE SENSING 2020. [DOI: 10.3390/rs12020264] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/11/2022]
Abstract
Accurate estimation of fine particulate matter with diameter ≤2.5 μm (PM2.5) at a high spatiotemporal resolution is crucial for the evaluation of its health effects. Previous studies face multiple challenges including limited ground measurements and availability of spatiotemporal covariates. Although the multiangle implementation of atmospheric correction (MAIAC) retrieves satellite aerosol optical depth (AOD) at a high spatiotemporal resolution, massive non-random missingness considerably limits its application in PM2.5 estimation. Here, a deep learning approach, i.e., bootstrap aggregating (bagging) of autoencoder-based residual deep networks, was developed to make robust imputation of MAIAC AOD and further estimate PM2.5 at a high spatial (1 km) and temporal (daily) resolution. The base model consisted of autoencoder-based residual networks where residual connections were introduced to improve learning performance. Bagging of residual networks was used to generate ensemble predictions for better accuracy and uncertainty estimates. As a case study, the proposed approach was applied to impute daily satellite AOD and subsequently estimate daily PM2.5 in the Jing-Jin-Ji metropolitan region of China in 2015. The presented approach achieved competitive performance in AOD imputation (mean test R2: 0.96; mean test RMSE: 0.06) and PM2.5 estimation (test R2: 0.90; test RMSE: 22.3 μg/m3). In the additional independent tests using ground AERONET AOD and PM2.5 measurements at the monitoring station of the U.S. Embassy in Beijing, this approach achieved high R2 (0.82–0.97). Compared with the state-of-the-art machine learning method, XGBoost, the proposed approach generated more reasonable spatial variation for predicted PM2.5 surfaces. Publicly available covariates used included meteorology, MERRA2 PBLH and AOD, coordinates, and elevation. Other covariates such as cloud fractions or land-use were not used due to unavailability.
The results of validation and independent testing demonstrate the usefulness of the proposed approach in exposure assessment of PM2.5 using satellite AOD having massive missing values.
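Bootstrap aggregating, as described above, trains many base models on resampled data and combines their predictions, and the spread across ensemble members gives a rough uncertainty estimate. The sketch below swaps the paper's autoencoder-based residual networks for a one-variable least-squares line purely to keep the example short. All names and data are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(xs, ys, x_new, n_models=50, seed=0):
    """Return (ensemble mean, ensemble standard deviation) at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        # Draw a bootstrap sample: n indices chosen with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        if len(set(boot_x)) < 2:
            continue  # degenerate resample, a line cannot be fitted
        slope, intercept = fit_line(boot_x, boot_y)
        predictions.append(slope * x_new + intercept)
    return statistics.mean(predictions), statistics.stdev(predictions)


if __name__ == "__main__":
    # Made-up AOD-like predictor and noisy PM2.5-like response.
    data_rng = random.Random(1)
    aod = [i / 10 for i in range(1, 41)]
    pm = [20.0 * a + 5.0 + data_rng.gauss(0.0, 3.0) for a in aod]
    mean_pred, sd_pred = bagged_predict(aod, pm, x_new=2.0)
    print(f"prediction {mean_pred:.1f} ± {sd_pred:.1f}")
```

Dividing the ensemble standard deviation by the ensemble mean gives a coefficient of variation, one common way to summarize this kind of uncertainty on a relative scale.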
8
Developing an ANFIS-PSO Model to Predict Mercury Emissions in Combustion Flue Gases. MATHEMATICS 2019. [DOI: 10.3390/math7100965] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 12/26/2022]
Abstract
Accurate prediction of mercury content emitted from fossil-fueled power stations is of the utmost importance for environmental pollution assessment and hazard mitigation. In this paper, mercury content in the output gas of power stations’ boilers was predicted using an adaptive neuro-fuzzy inference system (ANFIS) method integrated with particle swarm optimization (PSO). The input parameters of the model included coal characteristics and the operational parameters of the boilers. The dataset was collected from 82 sample points in power plants and used to train and test the proposed model. To evaluate the performance of the proposed hybrid ANFIS-PSO model, the statistical metric MARE% was used, which resulted in 0.003266 and 0.013272 for training and testing, respectively. Furthermore, relative errors between the acquired data and predicted values were between −0.25% and 0.1%, which confirms the accuracy of the model in dealing with non-linearity and in representing the dependency of flue gas mercury content on the specifications of coal and the boiler type.
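The abstract does not define MARE%. Assuming it denotes the usual mean absolute relative error expressed as a percentage, MARE% = (100/n) Σ |(y_i − ŷ_i) / y_i|, a minimal sketch follows. The function name and sample values are hypothetical.

```python
def mare_percent(observed, predicted):
    """Mean absolute relative error as a percentage.

    Assumes every observed value is non-zero, since each error is
    scaled by the corresponding observation.
    """
    relative_errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


if __name__ == "__main__":
    # Made-up mercury concentrations, for illustration only.
    measured = [4.2, 5.1, 3.8, 6.0]
    modelled = [4.1, 5.3, 3.9, 5.8]
    print(f"MARE% = {mare_percent(measured, modelled):.3f}")
```

Because each error is divided by the measured value, the metric is unit-free, which makes it convenient for comparing training and testing performance.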