1
Chin MK, Đoàn LN, Russo RG, Roberts T, Persaud S, Huang E, Fu L, Kui KY, Kwon SC, Yi SS. Methods for retrospectively improving race/ethnicity data quality: a scoping review. Epidemiol Rev 2023; 45:127-139. [PMID: 37045807 DOI: 10.1093/epirev/mxad002] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/11/2022] [Revised: 02/27/2023] [Accepted: 04/04/2023] [Indexed: 04/14/2023]
Abstract
Improving race and ethnicity (hereafter, race/ethnicity) data quality is imperative to ensure underserved populations are represented in data sets used to identify health disparities and inform health care policy. We performed a scoping review of methods that retrospectively improve race/ethnicity classification in secondary data sets. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, searches were conducted in the MEDLINE, Embase, and Web of Science Core Collection databases in July 2022. A total of 2,441 abstracts were dually screened, 453 full-text articles were reviewed, and 120 articles were included. Study characteristics were extracted and described in a narrative analysis. Six main method types for improving race/ethnicity data were identified: expert review (n = 9, 8%), name lists (n = 27, 23%), name algorithms (n = 55, 46%), machine learning (n = 14, 12%), data linkage (n = 9, 8%), and other (n = 6, 5%). The main racial/ethnic groups targeted for classification were Asian (n = 56, 47%) and White (n = 51, 43%). Some form of validation evaluation was included in 86 articles (72%). We discuss the strengths and limitations of different method types and potential harms of identified methods. Innovative methods are needed to better identify racial/ethnic subgroups, as are further validation studies. Accurately collecting and reporting disaggregated data by race/ethnicity are critical to address the systematic missingness of relevant demographic data that can erroneously guide policymaking and hinder the effectiveness of health care practices and intervention.
Affiliation(s)
- Matthew K Chin
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
- Lan N Đoàn
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
- Rienna G Russo
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
- Timothy Roberts
  - NYU Langone Health Sciences Library, NYU Grossman School of Medicine, New York, NY 10016, United States
- Sonia Persaud
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
  - Department of Health Policy and Management, CUNY School of Public Health & Health Policy, New York, NY 10027, United States
- Emily Huang
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
- Lauren Fu
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
  - Georgetown University, Washington DC 20007, United States
- Kiran Y Kui
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
  - Department of Epidemiology, Columbia Mailman School of Public Health, New York, NY 10032, United States
- Simona C Kwon
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
- Stella S Yi
  - Section for Health Equity, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
2
Adjaye-Gbewonyo D, Bednarczyk RA, Davis RL, Omer SB. Using the Bayesian Improved Surname Geocoding Method (BISG) to create a working classification of race and ethnicity in a diverse managed care population: a validation study. Health Serv Res 2013; 49:268-83. [PMID: 23855558 DOI: 10.1111/1475-6773.12089] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Accepted: 05/06/2013] [Indexed: 12/31/2022]
Abstract
OBJECTIVE: To validate classification of race/ethnicity based on the Bayesian Improved Surname Geocoding method (BISG) and assess variations in validity by gender and age.

DATA SOURCES/STUDY SETTING: Secondary data on members of Kaiser Permanente Georgia, an integrated managed care organization, through 2010.

STUDY DESIGN: For 191,494 members with self-reported race/ethnicity, probabilities for belonging to each of six race/ethnicity categories predicted from the BISG algorithm were used to assign individuals to a race/ethnicity category over a range of cutoffs greater than a probability of 0.50. Overall as well as gender- and age-stratified sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. Receiver operating characteristic (ROC) curves were generated and used to identify optimal cutoffs for race/ethnicity assignment.

PRINCIPAL FINDINGS: The overall cutoffs for assignment that optimized sensitivity and specificity ranged from 0.50 to 0.57 for the four main racial/ethnic categories (White, Black, Asian/Pacific Islander, Hispanic). Corresponding sensitivity, specificity, PPV, and NPV ranged from 64.4 to 81.4 percent, 80.8 to 99.7 percent, 75.0 to 91.6 percent, and 79.4 to 98.0 percent, respectively. Accuracy of assignment was better among males and individuals of 65 years or older.

CONCLUSIONS: BISG may be useful for classifying race/ethnicity of health plan members when needed for health care studies.
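The validation measures named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `validity_at_cutoff` and `best_cutoff`, the toy probabilities, and the use of Youden's J (sensitivity + specificity - 1) as the criterion for the "optimal" cutoff are all assumptions made for illustration, since the abstract does not state the exact criterion used. Each category is treated one-vs-rest against self-reported membership.

```python
from dataclasses import dataclass


@dataclass
class Validity:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(num, den):
    # Return NaN instead of raising when a denominator is empty.
    return num / den if den else float("nan")


def validity_at_cutoff(probs, is_member, cutoff):
    """One-vs-rest validity when a category is assigned if its
    predicted probability is at least `cutoff` (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for p, truth in zip(probs, is_member):
        assigned = p >= cutoff
        if assigned and truth:
            tp += 1
        elif assigned:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return Validity(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


def best_cutoff(probs, is_member, candidates):
    """Pick the candidate cutoff maximizing Youden's J.
    Assumption: the study's optimality criterion may differ."""
    def youden(c):
        v = validity_at_cutoff(probs, is_member, c)
        return v.sensitivity + v.specificity - 1
    return max(candidates, key=youden)


# Toy data, invented for illustration only (not from the study).
toy_probs = [0.9, 0.8, 0.55, 0.4, 0.52, 0.3, 0.2, 0.1]
toy_truth = [True, True, True, True, False, False, False, False]

if __name__ == "__main__":
    chosen = best_cutoff(toy_probs, toy_truth, [0.50, 0.55, 0.60])
    print(chosen, validity_at_cutoff(toy_probs, toy_truth, chosen))
```

Repeating the calculation within gender or age strata would amount to calling `validity_at_cutoff` on the corresponding subsets of members.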
3
Derose SF, Contreras R, Coleman KJ, Koebnick C, Jacobsen SJ. Race and ethnicity data quality and imputation using U.S. Census data in an integrated health system: the Kaiser Permanente Southern California experience. Med Care Res Rev 2013; 70:330-45. [PMID: 23169896 DOI: 10.1177/1077558712466293] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Indexed: 11/16/2022]
Abstract
Research on racial and ethnic disparities using health system databases can shed light on the usual health care and outcomes of large numbers of individuals so that health inequities can be better understood and addressed. Such research often suffers from limitations in race/ethnicity data quality. We examined the quality of race/ethnicity data in a large, diverse, integrated health system that repeatedly collects these data on utilization of services. We tested the accuracy of Bayesian Improved Surname Geocoding for imputation of race/ethnicity data. Administrative race/ethnicity data were accurate as judged by comparison with self-report in adults. The Bayesian Improved Surname Geocoding method produced imputation results far better than chance assignment for the four most common race/ethnicity groups in the health system: Whites, Hispanics, Blacks, and Asians. These results support renewed efforts to conduct studies of racial and ethnic disparities in large health systems.