1
Clark LP, Zilber D, Schmitt C, Fargo DC, Reif DM, Motsinger-Reif AA, Messier KP. A review of geospatial exposure models and approaches for health data integration. JOURNAL OF EXPOSURE SCIENCE & ENVIRONMENTAL EPIDEMIOLOGY 2024:10.1038/s41370-024-00712-8. [PMID: 39251872 DOI: 10.1038/s41370-024-00712-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 08/01/2024] [Accepted: 08/05/2024] [Indexed: 09/11/2024]
Abstract
BACKGROUND Geospatial methods are common in environmental exposure assessments and increasingly integrated with health data to generate comprehensive models of environmental impacts on public health. OBJECTIVE Our objective is to review geospatial exposure models and approaches for health data integration in environmental health applications. METHODS We conduct a literature review and synthesis. RESULTS First, we discuss key concepts and terminology for geospatial exposure data and models. Second, we provide an overview of workflows in geospatial exposure model development and health data integration. Third, we review modeling approaches, including proximity-based, statistical, and mechanistic approaches, across diverse exposure types, such as air quality, water quality, climate, and socioeconomic factors. For each model type, we provide descriptions, general equations, and example applications for environmental exposure assessment. Fourth, we discuss the approaches used to integrate geospatial exposure data and health data, such as methods to link data sources with disparate spatial and temporal scales. Fifth, we describe the landscape of open-source tools supporting these workflows.
Affiliation(s)
- Lara P Clark
  - National Institute of Environmental Health Sciences, Office of the Scientific Director, Office of Data Science, Durham, NC, USA
- Daniel Zilber
  - National Institute of Environmental Health Sciences, Division of Translational Toxicology, Predictive Toxicology Branch, Durham, NC, USA
- Charles Schmitt
  - National Institute of Environmental Health Sciences, Office of the Scientific Director, Office of Data Science, Durham, NC, USA
- David C Fargo
  - National Institute of Environmental Health Sciences, Office of the Director, Office of Environmental Science Cyberinfrastructure, Durham, NC, USA
- David M Reif
  - National Institute of Environmental Health Sciences, Division of Translational Toxicology, Predictive Toxicology Branch, Durham, NC, USA
- Alison A Motsinger-Reif
  - National Institute of Environmental Health Sciences, Division of Intramural Research, Biostatistics and Computational Biology Branch, Durham, NC, USA
- Kyle P Messier
  - National Institute of Environmental Health Sciences, Division of Translational Toxicology, Predictive Toxicology Branch, Durham, NC, USA
  - National Institute of Environmental Health Sciences, Division of Intramural Research, Biostatistics and Computational Biology Branch, Durham, NC, USA
2
Tatalovich Z, Chtourou A, Zhu L, Dellavalle C, Hanson HA, Henry KA, Penberthy L. Landscape analysis of environmental data sources for linkage with SEER cancer patients database. J Natl Cancer Inst Monogr 2024; 2024:132-144. [PMID: 39102880 DOI: 10.1093/jncimonographs/lgae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 02/28/2024] [Accepted: 03/17/2024] [Indexed: 08/07/2024] Open
Abstract
One of the challenges associated with understanding environmental impacts on cancer risk and outcomes is estimating potential exposures of individuals diagnosed with cancer to adverse environmental conditions over the life course. Historically, this has been partly due to the lack of reliable measures of cancer patients' potential environmental exposures before a cancer diagnosis. The emerging sources of cancer-related spatiotemporal environmental data and residential history information, coupled with novel technologies for data extraction and linkage, present an opportunity to integrate these data into the existing cancer surveillance data infrastructure, thereby facilitating more comprehensive assessment of cancer risk and outcomes. In this paper, we performed a landscape analysis of the available environmental data sources that could be linked to historical residential address information of cancer patients' records collected by the National Cancer Institute's Surveillance, Epidemiology, and End Results Program. The objective is to enable researchers to use these data to assess potential exposures at the time of cancer initiation through the time of diagnosis and even after diagnosis. The paper addresses the challenges associated with data collection and completeness at various spatial and temporal scales, as well as opportunities and directions for future research.
Affiliation(s)
- Zaria Tatalovich
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, MD, USA
- Amina Chtourou
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, MD, USA
- Li Zhu
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, MD, USA
- Curt Dellavalle
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, MD, USA
- Heidi A Hanson
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, US Department of Energy, Oak Ridge, TN, USA
- Kevin A Henry
  - Temple University, Philadelphia, PA, USA
  - Cancer Prevention and Control, Fox Chase Cancer Center, Philadelphia, PA, USA
- Lynne Penberthy
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, MD, USA
3
Abad S, Badilla P, Marshall AT, Smith C, Tsui B, Cardenas-Iniguez C, Herting MM. Lifetime residential history collection and processing for environmental data linkages in the ABCD study. Health Place 2024; 87:103238. [PMID: 38677137 PMCID: PMC11132178 DOI: 10.1016/j.healthplace.2024.103238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 03/01/2024] [Accepted: 03/28/2024] [Indexed: 04/29/2024]
Abstract
By using geospatial information such as participants' residential history along with external datasets of environmental exposures, ongoing studies can enrich their cohorts to investigate the role of the environment on brain-behavior health outcomes. However, challenges may arise if clear guidance and key quality control steps are not taken at the outset of data collection of residential information. Here, we detail the protocol development aimed at improving the collection of lifetime residential address information from the Adolescent Brain Cognitive Development (ABCD) Study. This protocol generates a workflow for minimizing gaps in residential information, improving data collection processes, and reducing misclassification error in exposure estimates.
Affiliation(s)
- Shermaine Abad
  - Department of Radiology, University of California, San Diego, USA
- Paola Badilla
  - Institute for Behavioral Genetics, University of Colorado, Boulder, USA
- Andrew T Marshall
  - Department of Pediatrics (Division of Neurology), Children's Hospital Los Angeles, USA
- Calen Smith
  - Department of Psychiatry, University of California, San Diego, USA
- Brandon Tsui
  - Department of Radiology, University of California, San Diego, USA
- Carlos Cardenas-Iniguez
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, USA
- Megan M Herting
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, USA
  - Department of Pediatrics, Children's Hospital Los Angeles, USA
4
Humphrey JL, Kinnee EJ, Robinson LF, Clougherty JE. Disentangling impacts of multiple pollutants on acute cardiovascular events in New York city: A case-crossover analysis. ENVIRONMENTAL RESEARCH 2024; 242:117758. [PMID: 38029813 PMCID: PMC11378578 DOI: 10.1016/j.envres.2023.117758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 10/29/2023] [Accepted: 11/21/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND Ambient air pollution contributes to an estimated 6.67 million deaths annually, and has been linked to cardiovascular disease (CVD), the leading cause of death. Short-term increases in air pollution have been associated with increased risk of CVD event, though relatively few studies have directly compared effects of multiple pollutants using fine-scale spatio-temporal data, thoroughly adjusting for co-pollutants and temperature, in an exhaustive citywide hospitals dataset, towards identifying key pollution sources within the urban environment to most reduce, and reduce disparities in, the leading cause of death worldwide. OBJECTIVES We aimed to examine multiple pollutants against multiple CVD diagnoses, across lag days, in models adjusted for co-pollutants and meteorology, and inherently adjusted by design for non-time-varying individual and aggregate-level covariates, using fine-scale space-time exposure estimates, in an exhaustive dataset of emergency department visits and hospitalizations across an entire city, thereby capturing the full population-at-risk. METHODS We used conditional logistic regression in a case-crossover design - inherently controlling for all confounders not varying within case month - to examine associations between spatio-temporal nitrogen dioxide (NO2), fine particulate matter (PM2.5), sulfur dioxide (SO2), and ozone (O3) in New York City, 2005-2011, on individual risk of acute CVD event (n = 837,523), by sub-diagnosis [ischemic heart disease (IHD), heart failure (HF), stroke, ischemic stroke, acute myocardial infarction]. RESULTS We found significant same-day associations between NO2 and risk of overall CVD, IHD, and HF - and between PM2.5 and overall CVD or HF event risk - robust to all adjustments and multiple comparisons. Results were comparable by sex and race - though median age at CVD was 10 years younger for Black New Yorkers than White New Yorkers. 
Associations for NO2 were comparable for adults younger or older than 69 years, though PM2.5 associations were stronger among older adults. DISCUSSION Our results indicate immediate, robust effects of combustion-related pollution on CVD risk, by sub-diagnosis. Though acute impacts differed minimally by age, sex, or race, the much younger age-at-event for Black New Yorkers calls attention to cumulative social susceptibility.
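The case-crossover design in this abstract compares each patient's exposure on the event day with the same patient's exposure on nearby referent days. The sketch below shows one common way to pick those referent days: time-stratified selection of same-weekday dates within the event's calendar month. The abstract states only that confounders not varying within the case month are controlled, so the same-weekday rule and the function name `referent_days` are assumptions for illustration, not the authors' documented procedure.

```python
from datetime import date, timedelta


def referent_days(event_day: date) -> list[date]:
    """Return same-weekday dates in the event's calendar month, excluding the event day.

    Assumption: time-stratified referents (same month, same weekday).
    The abstract does not spell out the exact referent rule.
    """
    # Step back in whole weeks to the first same-weekday date of the month.
    day = event_day
    while (day - timedelta(days=7)).month == event_day.month:
        day -= timedelta(days=7)

    referents = []
    # Walk forward one week at a time until the month changes.
    while day.month == event_day.month:
        if day != event_day:
            referents.append(day)
        day += timedelta(days=7)
    return referents
```

Each event day and its referent days form one stratum. A conditional logistic regression then contrasts exposure on the event day with exposure on the referent days within each stratum.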
Affiliation(s)
- Jamie L Humphrey
  - Center Public Health Methods; RTI International, Research Triangle Park, NC, 27709, USA
- Ellen J Kinnee
  - University Center for Social and Urban Research, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Lucy F Robinson
  - Department of Epidemiology and Biostatistics, Dornsife School of Public Health, Drexel University, Philadelphia, PA, 19104, USA
- Jane E Clougherty
  - Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, PA, 19104, USA
5
Klaus CA, Henry KA, Il'yasova D. Capturing emergency dispatch address points as geocoding candidates to quantify delimited confidence in residential geolocation. Int J Health Geogr 2023; 22:25. [PMID: 37752482 PMCID: PMC10523746 DOI: 10.1186/s12942-023-00347-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 09/14/2023] [Indexed: 09/28/2023] Open
Abstract
BACKGROUND In response to citizens' concerns about elevated cancer incidence in their locales, US CDC proposed publishing cancer incidence at sub-county scales. At these scales, confidence in patients' residential geolocation becomes a key constraint of geospatial analysis. To support monitoring cancer incidence in sub-county areas, we presented summary metrics to numerically delimit confidence in residential geolocation. RESULTS We defined a concept of Residential Address Discriminant Power (RADP) as theoretically perfect within all residential addresses and its practical application, i.e., using Emergency Dispatch (ED) Address Point Candidates of Equivalent Likelihood (CEL) to quantify Residential Geolocation Discriminant Power (RGDP) to approximate RADP. Leveraging different productivity of probabilistic, deterministic, and interactive geocoding record linkage, we simultaneously detected CEL for 5,807 cancer cases reported to North Carolina Central Cancer Registry (NC CCR) in January 2022. Batch-match probabilistic and deterministic algorithms matched 86.0% of cases to their unique ED address point candidates or a CEL, 4.4% to parcel site address, and 1.4% to street centerline. Interactively geocoded cases were 8.2%. To demonstrate differences in residential geolocation confidence between enumeration areas, we calculated summary RGDP (sRGDP) for cancer cases by county and assessed the existing uncertainty within the ED data, i.e., identified duplicate addresses (as CEL) for each ED address point in the 2014 version of the NC ED data and calculated ED_sRGDP by county. Both sRGDP (0.62-1.00) and ED_sRGDP (0.36-1.00) varied across counties and were lower in rural counties (p < 0.05); sRGDP correlated with ED_sRGDP (r = 0.42, p < 0.001). The discussion covered multiple conceptual and economic issues attendant to quantifying confidence in residential geolocation and presented a set of organizing principles for future work.
CONCLUSIONS Our methodology produces simple metrics - sRGDP - to capture confidence in residential geolocation via leveraging ED address points as CEL. Two facts demonstrate the usefulness of sRGDP as area-based summary metrics: sRGDP variability between counties and the overall lower quality of residential geolocation in rural vs. urban counties. Low sRGDP for the cancer cases within the area of interest helps manage expectations for the uncertainty in cancer incidence data. By supplementing cancer incidence data with sRGDP and ED_sRGDP, CCRs can demonstrate transparency in geocoding success, which may help win citizen trust.
Affiliation(s)
- Kevin A Henry
  - Department of Geography, Environment and Urban Studies, Temple University, Philadelphia, PA, USA
  - Division of Cancer Prevention and Control, Fox Chase Cancer Center, Philadelphia, PA, USA
- Dora Il'yasova
  - Center for Social and Clinical Research, National Minority Quality Forum, Washington, DC, USA
  - Department of Community and Family Health, Duke University School of Medicine, Durham, NC, USA
6
Cortés S, Leiva C, Ojeda MJ, Bustamante-Ara N, Wambaa W, Dominguez A, Pasten Salvo C, Rodriguez Peralta C, Rojas Arenas B, Vargas Mesa D, Ahumada-Padilla E. Air Pollution and Cardiorespiratory Changes in Older Adults Living in a Polluted Area in Central Chile. ENVIRONMENTAL HEALTH INSIGHTS 2022; 16:11786302221107136. [PMID: 35782316 PMCID: PMC9243574 DOI: 10.1177/11786302221107136] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/02/2021] [Accepted: 05/12/2022] [Indexed: 06/15/2023]
Abstract
One recognized cause of cardiorespiratory diseases is air pollution. Older adults (OA) are one of the groups most vulnerable to its adverse effects. The objective of the study was to analyze the association between exposure to air pollution and changes in cardiorespiratory variables in OA. This was an observational prospective cohort study. OA living in communes with high contamination rates were evaluated during periods of high and low exposure to air pollution using health questionnaires, blood pressure (BP) measurements, lung function, respiratory symptoms, physical activity levels, and physical fitness. Linear and logistic models were created to adjust for variables of interest. A total of 92 OA participated in this study. 73.9% of the subjects were women, aged 72.3 ± 5.6 years. 46.7% were obese, and 12.1% consumed tobacco. The most prevalent diseases were hypertension, diabetes, and cardiovascular disease. In adjusted linear models, systolic BP was higher by 6.77 mmHg (95% CI: 1.04-12.51) and diastolic BP by 3.51 mmHg (95% CI: 0.72-6.29) during the period of high exposure to air pollution. The adjusted logistic regression model indicated that the odds of respiratory symptoms in OA were about 4 times higher (OR: 4.43, 95% CI: 2.07-10.04) during the period of high exposure to air pollution. The results are consistent with an adverse effect on cardiorespiratory variables in periods of high exposure to air pollution in the OA population.
Affiliation(s)
- Sandra Cortés
  - Department of Public Health, Pontificia Universidad Católica de Chile, Santiago, Chile
  - Advanced Center for Chronic Diseases (ACCDIS), Pontificia Universidad Católica de Chile, Santiago, Chile
  - Center for Sustainable Urban Development (CEDEUS), Pontificia Universidad Católica de Chile, Santiago, Chile
- Cinthya Leiva
  - Department of Public Health, Pontificia Universidad Católica de Chile, Santiago, Chile
  - Center for Sustainable Urban Development (CEDEUS), Pontificia Universidad Católica de Chile, Santiago, Chile
- María José Ojeda
  - Department of Public Health, Pontificia Universidad Católica de Chile, Santiago, Chile
- Alan Dominguez
  - Department of Public Health, Pontificia Universidad Católica de Chile, Santiago, Chile
  - Department of Experimental and Health Sciences, Pompeu Fabra University, Barcelona, España
7
Harper G, Stables D, Simon P, Ahmed Z, Smith K, Robson J, Dezateux C. Evaluation of the ASSIGN open-source deterministic address-matching algorithm for allocating unique property reference numbers to general practitioner-recorded patient addresses. Int J Popul Data Sci 2021; 6:1674. [PMID: 34970633 PMCID: PMC8678979 DOI: 10.23889/ijpds.v6i1.1674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
INTRODUCTION Linking places to people is a core element of the UK government's geospatial strategy. Matching patient addresses in electronic health records to their Unique Property Reference Numbers (UPRNs) enables spatial linkage for research, innovation and public benefit. Available algorithms are not transparent or evaluated for use with addresses recorded by health care providers. OBJECTIVES To describe and quality assure the open-source deterministic ASSIGN address-matching algorithm applied to general practitioner-recorded patient addresses. METHODS Best practice standards were used to report the ASSIGN algorithm match rate, sensitivity and positive predictive value using gold-standard datasets from London and Wales. We applied the ASSIGN algorithm to the recorded addresses of a sample of 1,757,018 patients registered with all general practices in north east London. We examined bias in match results for the study population using multivariable analyses to estimate the likelihood of an address-matched UPRN by demographic, registration, and organisational variables. RESULTS We found a 99.5% and 99.6% match rate with high sensitivity (0.999, 0.998) and positive predictive value (0.996, 0.998) for the Welsh and London gold standard datasets respectively, and a 98.6% match rate for the study population. The 1.4% of the study population without a UPRN match were more likely to have changed registered address in the last 12 months (match rate: 95.4%), be from a Chinese ethnic background (95.5%), or registered with a general practice using the SystmOne clinical record system (94.4%). Conversely, people registered for more than 6.5 years with their general practitioner were more likely to have a match (99.4%) than those with shorter registration durations. CONCLUSIONS ASSIGN is a highly accurate open-source address-matching algorithm with a high match rate and minimal biases when evaluated against a large sample of general practice-recorded patient addresses.
ASSIGN has potential to be used in other address-based datasets including those with information relevant to the wider determinants of health.
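The three evaluation metrics named in this abstract (match rate, sensitivity, and positive predictive value) can be computed from a confusion table built against a gold-standard dataset. The sketch below uses the standard definitions. The function name, argument names, and example counts are hypothetical and are not taken from the ASSIGN evaluation.

```python
def match_metrics(true_pos: int, false_pos: int, false_neg: int, true_neg: int) -> dict[str, float]:
    """Compute address-matching quality metrics from gold-standard comparison counts.

    true_pos:  records matched to the correct reference number
    false_pos: records matched to a wrong reference number
    false_neg: records left unmatched although a correct match exists
    true_neg:  records correctly left unmatched
    """
    total = true_pos + false_pos + false_neg + true_neg
    return {
        # Share of all records that received any match, right or wrong.
        "match_rate": (true_pos + false_pos) / total,
        # Share of matchable records that were matched correctly.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Share of returned matches that are correct.
        "ppv": true_pos / (true_pos + false_pos),
    }


# Hypothetical counts for illustration only.
example = match_metrics(true_pos=990, false_pos=5, false_neg=3, true_neg=2)
```

A high match rate alone does not show that the matches are correct, which is why the abstract reports sensitivity and positive predictive value alongside it.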
Affiliation(s)
- Gill Harper
  - Clinical Effectiveness Group, Centre for Primary Care, Wolfson Institute of Population Health, Barts and the London School of Medicine and Dentistry, Queen Mary, University of London
- Zaheer Ahmed
  - Clinical Effectiveness Group, Centre for Primary Care, Wolfson Institute of Population Health, Barts and the London School of Medicine and Dentistry, Queen Mary, University of London
- Kelvin Smith
  - Clinical Effectiveness Group, Centre for Primary Care, Wolfson Institute of Population Health, Barts and the London School of Medicine and Dentistry, Queen Mary, University of London
- John Robson
  - Clinical Effectiveness Group, Centre for Primary Care, Wolfson Institute of Population Health, Barts and the London School of Medicine and Dentistry, Queen Mary, University of London
- Carol Dezateux
  - Clinical Effectiveness Group, Centre for Primary Care, Wolfson Institute of Population Health, Barts and the London School of Medicine and Dentistry, Queen Mary, University of London
8
Torok M, Konings P, Passioura J, Chen NA, Hewett M, Phillips M, Burnett A, Shand F, Christensen H. Spatial Errors in Automated Geocoding of Incident Locations in Australian Suicide Mortality Data. Epidemiology 2021; 32:896-903. [PMID: 34310446 DOI: 10.1097/ede.0000000000001403] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
Abstract
BACKGROUND There is increasing interest in the spatial analysis of suicide data to identify high-risk (often public) locations likely to benefit from access restriction measures. The identification of such locations, however, relies on accurately geocoded data. This study aims to examine the extent to which common completeness and positional spatial errors are present in suicide data due to the underlying geocoding process. METHODS Using Australian suicide mortality data from the National Coronial Information System for the period of 2008-2017, we compared the custodian automated geocoding process to an alternate multiphase process. Descriptive and kernel density cluster analyses were conducted to ascertain data completeness (address matching rates) and positional accuracy (distance revised) differences between the two datasets. RESULTS The alternate geocoding process initially improved address matching from 67.8% in the custodian dataset to 78.4%. Additional manual identification of nonaddress features (such as cliffs or bridges) improved overall match rates to 94.6%. Nearly half (49.2%) of nonresidential suicide locations were revised more than 1,000 m from data custodian coordinates. Spatial misattribution rates were greatest at the smallest levels of geography. Kernel density maps showed clear misidentification of hotspots relying solely on autogeocoded data. CONCLUSION Suicide incidents that occur at nonresidential addresses are being erroneously geocoded to centralized fall-back locations in autogeocoding processes, which can lead to misidentification of suicide clusters. Our findings provide insights toward defining the nature of the problem and refining geocoding processes, so that suicide data can be used reliably for the detection of suicide hotspots. See video abstract at, http://links.lww.com/EDE/B862.
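The positional accuracy measure in this abstract ("distance revised") is the displacement between the custodian's coordinates and the revised coordinates for the same incident. The sketch below computes that displacement with the haversine great-circle formula and reports the share of records moved beyond a threshold such as 1,000 m. The spherical-Earth approximation, the function names, and the coordinate-pair layout are assumptions, and the study may have measured distance differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_revised_beyond(pairs, threshold_m: float = 1000.0) -> float:
    """Fraction of (original, revised) coordinate pairs displaced by more than threshold_m.

    Each pair is ((lat, lon), (lat, lon)); this layout is hypothetical.
    """
    distances = [haversine_m(*original, *revised) for original, revised in pairs]
    return sum(d > threshold_m for d in distances) / len(distances)
```

For example, `share_revised_beyond([((0, 0), (0, 0.001)), ((0, 0), (0, 0.02))])` compares one record moved roughly 111 m with one moved roughly 2.2 km.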
Affiliation(s)
- Michelle Torok
  - Black Dog Institute, University of New South Wales, Sydney, NSW, Australia
- Paul Konings
  - National Centre for Geographic Resources & Analysis in Primary Health Care, Research School of Population Health, Australian National University, Canberra, Australia
- Jason Passioura
  - National Centre for Geographic Resources & Analysis in Primary Health Care, Research School of Population Health, Australian National University, Canberra, Australia
- Nicole A Chen
  - Orygen Youth Mental Health, University of Melbourne, Parkville, VIC, Australia
- Michael Hewett
  - National Centre for Geographic Resources & Analysis in Primary Health Care, Research School of Population Health, Australian National University, Canberra, Australia
- Matthew Phillips
  - Black Dog Institute, University of New South Wales, Sydney, NSW, Australia
- Alexander Burnett
  - Black Dog Institute, University of New South Wales, Sydney, NSW, Australia
- Fiona Shand
  - Black Dog Institute, University of New South Wales, Sydney, NSW, Australia
- Helen Christensen
  - Black Dog Institute, University of New South Wales, Sydney, NSW, Australia
9
Cortes TR, Silveira IHD, Junger WL. Improving geocoding matching rates of structured addresses in Rio de Janeiro, Brazil. CAD SAUDE PUBLICA 2021; 37:e00039321. [PMID: 34346979 DOI: 10.1590/0102-311x00039321] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2021] [Accepted: 06/11/2021] [Indexed: 11/22/2022] Open
Abstract
Strategies for improving geocoded data often rely on interactive manual processes that can be time-consuming and impractical for large-scale projects. In this study, we evaluated different automated strategies for improving address quality and geocoding matching rates using a large dataset of addresses from death records in Rio de Janeiro, Brazil. Mortality data included 132,863 records with address information in a structured format. We performed regular expressions and dictionary-based methods for address standardization and enrichment. All records were linked by their postal code or street name to the Brazilian National Address Directory (DNE) obtained from Brazil's Postal Service. Residential addresses were geocoded using Google Maps. Records with address data validated down to the street level and location type returned as rooftop, range interpolated, or geometric center were considered a geocoding match. The overall performance was assessed by manually reviewing a sample of addresses. Out of the original 132,863 records, 85.7% (n = 113,876) were geocoded and validated, out of which 83.8% were matched as rooftop (high accuracy). Overall sensitivity and specificity were 87% (95%CI: 86-88) and 98% (95%CI: 96-99), respectively. Our results indicate that address quality and geocoding completeness can be reliably improved with an automated geocoding process. R scripts and instructions to reproduce all the analyses are available at https://github.com/reprotc/geocoding.
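The abstract describes address standardization with regular expressions and dictionary-based methods before geocoding. The sketch below illustrates that general idea in Python rather than the R used by the authors. The abbreviation dictionary, the cleaning rules, and the function name are hypothetical, and the authors' actual scripts are in the repository linked in the abstract.

```python
import re

# Hypothetical abbreviation dictionary for Brazilian street types (illustration only).
ABBREVIATIONS = {
    "r": "rua",
    "av": "avenida",
    "trav": "travessa",
    "estr": "estrada",
}


def standardize_address(raw: str) -> str:
    """Lower-case an address, strip punctuation, and expand known abbreviations."""
    text = raw.lower().strip()
    # Regular-expression step: turn common punctuation into spaces.
    text = re.sub(r"[.,;]", " ", text)
    # Collapse the runs of whitespace left behind by the previous step.
    text = re.sub(r"\s+", " ", text).strip()
    # Dictionary step: expand each token that is a known abbreviation.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split(" ")]
    return " ".join(tokens)
```

With these rules, `standardize_address("  Av.  Rio Branco, 156 ")` gives `"avenida rio branco 156"`. Normalizing addresses this way before linkage lets spelling variants of the same street resolve to a single reference entry.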
Affiliation(s)
- Taísa Rodrigues Cortes
  - Instituto de Medicina Social, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brasil
- Ismael Henrique da Silveira
  - Instituto de Medicina Social, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brasil
  - Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador, Brasil
- Washington Leite Junger
  - Instituto de Medicina Social, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brasil
10
Clougherty JE, Humphrey JL, Kinnee EJ, Robinson LF, McClure LA, Kubzansky LD, Reid CE. Social Susceptibility to Multiple Air Pollutants in Cardiovascular Disease. Res Rep Health Eff Inst 2021; 2021:1-71. [PMID: 36004603 PMCID: PMC9403800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023] Open
Abstract
INTRODUCTION Cardiovascular disease (CVD) is the leading cause of death in the United States, and substantial research has linked ambient air pollution to elevated rates of CVD etiology and events. Much of this research identified increased effects of air pollution in lower socioeconomic position (SEP) communities, where pollution exposures are also often higher. The complex spatial confounding between air pollution and SEP makes it very challenging, however, to disentangle the impacts of these very different exposure types and to accurately assess their interactions. The specific causal components (i.e., specific social stressors) underlying this SEP-related susceptibility remain unknown, because there are myriad pathways through which poverty and/or lower-SEP conditions may influence pollution susceptibility - including diet, smoking, coexposures in the home and occupational environments, health behaviors, and healthcare access. Growing evidence suggests that a substantial portion of SEP-related susceptibility may be due to chronic psychosocial stress - given the known wide-ranging impacts of chronic stress on immune, endocrine, and metabolic function - and to a higher prevalence of unpredictable chronic stressors in many lower-SEP communities, including violence, job insecurity, and housing instability. As such, elucidating susceptibility to pollution in the etiology of CVD, and in the risk of CVD events, has been identified as a research priority. This interplay among social and environmental conditions may be particularly relevant for CVD, because pollution and chronic stress both impact inflammation, metabolic function, oxidative stress, hypertension, atherosclerosis, and other processes relevant to CVD etiology. Because pollution exposures are often spatially patterned by SEP, disentangling their effects - and quantifying any interplay - is especially challenging. 
Doing so, however, would help to improve our ability to identify and characterize susceptible populations and to improve our understanding of how community stressors may alter responses to multiple air pollutants. More clearly characterizing susceptible populations will improve our ability to design and target interventions more effectively (and cost-effectively) and may reveal greater benefits of pollution reduction in susceptible communities, strengthening cost-benefit and accountability analyses, ultimately reducing the disproportionate burden of CVD and reducing health disparities. METHODS In the current study, we aimed to quantify combined effects of multiple pollutants and stressor exposures on CVD events, using a number of unique datasets we have compiled and verified, including the following. 1. Poverty metrics, violent crime rates, a composite socioeconomic deprivation index (SDI), an index of racial and economic segregation, noise disturbance metrics, and three composite spatial factors produced from a factor analysis of 27 community stressors. All indicators have citywide coverage and were verified against individual reports of stress and stressor exposure, in citywide focus groups and surveys. 2. Spatial surfaces for multiple pollutants from the New York City (NYC) Community Air Survey (NYCCAS), which monitored multiple pollutants year-round at 150 sites and used land use regression (LUR) modeling to estimate fine-scale (100-m) intra-urban spatial variance in fine particles (PM2.5), nitrogen dioxide (NO2), sulfur dioxide (SO2), and ozone (O3). 3. Daily data and time-trends derived from all U.S. Environmental Protection Agency (EPA) Air Quality System (AQS) monitors in NYC for 2005-2011, which we combined with NYCCAS surfaces to create residence- and day-specific spatiotemporal exposure estimates. 4. 
Complete data on in- and out-patient unscheduled CVD events presented in NYC hospitals for 2005-2011 (n = 1,113,185) from the New York State (NYS) Department of Health's Statewide Planning and Research Cooperative System (SPARCS). In the study, we quantified relationships between multiple pollutant exposures and both community CVD event rates and individual risk of CVD events in NYC and tested whether pollution-CVD associations varied by community SEP and social stressor exposures. We hypothesized (1) that greater chronic community-level SEP, stressor, and pollution exposures would be associated with higher community CVD rates; (2) that spatiotemporal variations in multiple pollutants would be associated with excess risk of CVD events; and (3) that pollution-CVD associations would be stronger in communities of lower SEP or higher stressor exposures. RESULTS To first understand the separate and combined associations with CVD for both stressors and pollutants measured at the same spatial and temporal scale of resolution, we used ecological cross-sectional models to examine spatial relationships between multiple chronic pollutant and stressor exposures and age-adjusted community CVD rates. Using census-tract-level annual averages (n = 2,167), we compared associations with CVD rates for multiple pollutant concentrations and social stressors. We found that associations with community CVD rates were consistently stronger for social stressors than for pollutants, in terms of both magnitude and significance. We note, however, that this result may be driven by the relatively greater variation (on a proportional basis) for stressors than for pollutants in NYC. We also tested effect modification of pollutant-CVD associations by each social stressor and found evidence of stronger associations for NO2, PM2.5, and wintertime SO2 with CVD rates, particularly across quintiles of increasing community violence or assault rates (P trend < 0.0001). 
To assess individual-level associations between spatiotemporal exposures to multiple pollutants and the risk of CVD events across multiple lag days, we examined the combined effects of multiple pollutant exposures using spatiotemporal (day- and residence-specific) pollution exposure estimates and hospital data on individual CVD events in case-crossover models, which inherently adjust for non-time-varying individual confounders (e.g., sex and race) and comorbidities. We found consistent significant relationships only for same-day pollutant exposures and the risk of CVD events, suggesting very acute impacts of pollution on CVD risk. Associations with CVD were positive for NO2, PM2.5, and SO2, as hypothesized, and we found inverse associations for O3 (a secondary pollutant chemically decreased ["scavenged"] by fresh emissions that, in NYC, displays spatial and temporal patterns opposite those of NO2). Finally, to test effect modification by chronic community social stressors on the relationships between spatiotemporal pollution measures and the risk of CVD events, we used individual-level case-crossover models, adding interaction terms with categorical versions of each social stressor. We found that associations between NO2 and the risk of CVD events were significantly elevated only in communities with the highest exposures to social stressors (i.e., in the highest quintiles of poverty, socioeconomic deprivation, violence, or assault). The largest positive associations for PM2.5 and winter SO2 were generally found in the highest-stressor communities but were not significant in any quintile. We again found inverse associations for O3, which were likewise stronger for individuals living in communities with greater stressor exposures. CONCLUSIONS In ecological models, we found stronger relationships with community CVD rates for social stressors than for pollutant exposures. 
In case-crossover analyses, higher exposures to NO2, PM2.5, and SO2 were associated with greater excess risk of CVD events but only on the case day (there were no consistent significant lagged-day effects). In effect-modification analyses at both the community and individual level, we found evidence of stronger pollution-CVD associations in communities with higher stressor exposures. Given substantial spatial confounding across multiple social stressors, further research is needed to disentangle these effects in order to identify the predominant social stressors driving this observed differential susceptibility.
Affiliation(s)
- J E Clougherty, Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, Pennsylvania
- J L Humphrey, Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, Pennsylvania
- E J Kinnee, University of Pittsburgh Center for Social & Urban Research, Pittsburgh, Pennsylvania
- L F Robinson, Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, Pennsylvania
- L A McClure, Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, Pennsylvania
- L D Kubzansky, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- C E Reid, University of Colorado, Boulder, Colorado
|
11
|
Spatial Heterogeneity in Positional Errors: A Comparison of Two Residential Geocoding Efforts in the Agricultural Health Study. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18041637. [PMID: 33572119 PMCID: PMC7915413 DOI: 10.3390/ijerph18041637] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 01/18/2021] [Accepted: 02/04/2021] [Indexed: 02/01/2023]
Abstract
Geocoding is a powerful tool for environmental exposure assessments that rely on spatial databases. Geocoding processes, locators, and reference datasets have improved over time; however, these improvements have not been well characterized. Enrollment addresses for the Agricultural Health Study, a cohort of pesticide applicators and their spouses in Iowa (IA) and North Carolina (NC), were geocoded in 2012–2016 and then again in 2019. We calculated distances between geocodes in the two periods. For a subset, we computed positional errors using “gold standard” rooftop coordinates (IA; N = 3566) or Global Positioning Systems (GPS) (IA and NC; N = 1258) and compared errors between periods. We used linear regression to model the change in positional error between time periods (improvement) by rural status and population density, and we used spatial relative risk functions to identify areas with significant improvement. Median improvement between time periods in IA was 41 m (interquartile range, IQR: −2 to 168 m) based on rooftop coordinates and 9 m (IQR: −80 to 133 m) based on GPS. Median improvement in NC was 42 m (IQR: −1 to 109 m) based on GPS. Positional error was greater in rural and low-density areas than in towns and more densely populated areas. Areas of significant improvement in accuracy were identified and mapped across both states. Our findings underscore the importance of evaluating determinants and spatial distributions of errors in geocodes used in environmental epidemiology studies.
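The positional-error comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say how distances were computed, so the haversine (great-circle) formula used here is an assumption, and the function names and all coordinates are hypothetical. Improvement is taken as the earlier error minus the later error, so positive values mean the later geocode is closer to the reference point.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def positional_improvement(reference, earlier, later):
    """Per-address improvement in metres: earlier error minus later error."""
    improvements = []
    for ref, old, new in zip(reference, earlier, later):
        err_old = haversine_m(*ref, *old)  # error of the first geocoding effort
        err_new = haversine_m(*ref, *new)  # error of the second geocoding effort
        improvements.append(err_old - err_new)
    return improvements


# Hypothetical (lat, lon) pairs; not taken from the study.
reference = [(41.6000, -93.6000), (35.8000, -78.6000), (42.0000, -93.5000)]
earlier = [(41.6010, -93.6000), (35.8000, -78.6020), (42.0002, -93.5000)]
later = [(41.6002, -93.6000), (35.8000, -78.6005), (42.0005, -93.5000)]

improvements = positional_improvement(reference, earlier, later)
median_improvement = median(improvements)
```

A negative improvement, as for the third hypothetical address, corresponds to the negative lower bounds of the IQRs reported above: the later geocode can be farther from the reference point than the earlier one.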
|
12
|
Maternal proximity to Central Appalachia surface mining and birth outcomes. Environ Epidemiol 2021; 5:e128. [PMID: 33778360 PMCID: PMC7939414 DOI: 10.1097/ee9.0000000000000128] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/24/2020] [Accepted: 12/23/2020] [Indexed: 01/09/2023] Open
Abstract
Maternal residency in Central Appalachia counties with coal production has been previously associated with increased rates of low birth weight (LBW). To refine the relationship between surface mining and birth outcomes, this study employs finer spatiotemporal estimates of exposure.
|