1. Perets O, Stagno E, Yehuda EB, McNichol M, Anthony Celi L, Rappoport N, Dorotic M. Inherent Bias in Electronic Health Records: A Scoping Review of Sources of Bias. medRxiv: the preprint server for health sciences 2024:2024.04.09.24305594. [PMID: 38680842 PMCID: PMC11046491 DOI: 10.1101/2024.04.09.24305594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
Objectives: Biases inherent in electronic health records (EHRs), and therefore in medical artificial intelligence (AI) models, may significantly exacerbate health inequities and challenge the adoption of ethical and responsible AI in healthcare. Biases arise from multiple sources, some of which are less well documented in the literature. Biases are encoded in how the data has been collected and labeled, by implicit and unconscious biases of clinicians, or by the tools used for data processing. These biases and their encoding in healthcare records undermine the reliability of such data and bias clinical judgments and medical outcomes. Moreover, when healthcare records are used to build data-driven solutions, the biases are further exacerbated, resulting in systems that perpetuate biases and induce healthcare disparities. This literature scoping review aims to categorize the main sources of biases inherent in EHRs. Methods: We queried PubMed and Web of Science on January 19th, 2023, for peer-reviewed sources in English, published between 2016 and 2023, using the PRISMA approach to stepwise scoping of the literature. To select the papers that empirically analyze bias in EHR, from the initial yield of 430 papers, 27 duplicates were removed, and 403 studies were screened for eligibility. 196 articles were removed after the title and abstract screening, and 96 articles were excluded after the full-text review, resulting in a final selection of 116 articles. Results: Systematic categorizations of diverse sources of bias are scarce in the literature, while the effects of separate studies are often convoluted and methodologically contestable. Our categorization of published empirical evidence identified the six main sources of bias: a) bias arising from past clinical trials; b) data-related biases arising from missing, incomplete information or poor labeling of data; human-related bias induced by c) implicit clinician bias, d) referral and admission bias; e) diagnosis or risk disparities bias; and finally, f) biases in machinery and algorithms. Conclusions: Machine learning and data-driven solutions can potentially transform healthcare delivery, but not without limitations. The core inputs in the systems (data and human factors) currently contain several sources of bias that are poorly documented and analyzed for remedies. The current evidence heavily focuses on data-related biases, while other sources are less often analyzed or anecdotal. However, these different sources of biases add to one another exponentially. Therefore, to understand the issues holistically we need to explore these diverse sources of bias. While racial biases in EHR have been often documented, other sources of biases have been less frequently investigated and documented (e.g. gender-related biases, sexual orientation discrimination, socially induced biases, and implicit, often unconscious, human-related cognitive biases). Moreover, some existing studies lack causal evidence, illustrating the different prevalences of disease across groups, which does not per se prove the causality. Our review shows that data-, human- and machine biases are prevalent in healthcare and they significantly impact healthcare outcomes and judgments and exacerbate disparities and differential treatment. Understanding how diverse biases affect AI systems and recommendations is critical. We suggest that researchers and medical personnel should develop safeguards and adopt data-driven solutions with a "bias-in-mind" approach. More empirical evidence is needed to tease out the effects of different sources of bias on health outcomes.
2. Al-Sahab B, Leviton A, Loddenkemper T, Paneth N, Zhang B. Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview. Journal of Healthcare Informatics Research 2024; 8:121-139. [PMID: 38273982 PMCID: PMC10805748 DOI: 10.1007/s41666-023-00153-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/05/2023] [Accepted: 11/07/2023] [Indexed: 01/27/2024]
Abstract
Electronic Health Records (EHR) are increasingly being perceived as a unique source of data for clinical research as they provide unprecedentedly large volumes of real-time data from real-world settings. In this review of the secondary uses of EHR, we identify the anticipated breadth of opportunities, pointing out the data deficiencies and potential biases that are likely to limit the search for true causal relationships. This paper provides a comprehensive overview of the types of biases that arise along the pathways that generate real-world evidence and the sources of these biases. We distinguish between two levels in the production of EHR data where biases are likely to arise: (i) at the healthcare system level, where the principal source of bias resides in access to, and provision of, medical care, and in the acquisition and documentation of medical and administrative data; and (ii) at the research level, where biases arise from the processes of extracting, analyzing, and interpreting these data. Due to the plethora of biases, mainly in the form of selection and information bias, we conclude with advising extreme caution about making causal inferences based on secondary uses of EHRs.
Affiliation(s)
- Ban Al-Sahab
  - Department of Family Medicine, College of Human Medicine, Michigan State University, B100 Clinical Center, 788 Service Road, East Lansing, MI USA
- Alan Leviton
  - Department of Neurology, Harvard Medical School, Boston, MA USA
  - Department of Neurology, Boston Children’s Hospital, Boston, MA USA
- Tobias Loddenkemper
  - Department of Neurology, Harvard Medical School, Boston, MA USA
  - Department of Neurology, Boston Children’s Hospital, Boston, MA USA
- Nigel Paneth
  - Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI USA
  - Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, East Lansing, MI USA
- Bo Zhang
  - Department of Neurology, Boston Children’s Hospital, Boston, MA USA
  - Biostatistics and Research Design, Institutional Centers of Clinical and Translational Research, Boston Children’s Hospital, Boston, MA USA
  - Harvard Medical School, Boston, MA USA
3. Hubbard RA, Pujol TA, Alhajjar E, Edoh K, Martin ML. Sources of Disparities in Surveillance Mammography Performance and Risk-Guided Recommendations for Supplemental Breast Imaging: A Simulation Study. Cancer Epidemiol Biomarkers Prev 2023; 32:1531-1541. [PMID: 37351916 PMCID: PMC10750297 DOI: 10.1158/1055-9965.epi-23-0330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Revised: 05/22/2023] [Accepted: 06/21/2023] [Indexed: 06/24/2023]
Abstract
BACKGROUND Surveillance mammography is recommended for all women with a history of breast cancer. Risk-guided surveillance incorporating advanced imaging modalities based on individual risk of a second cancer could improve cancer detection. However, personalized surveillance may also amplify disparities. METHODS In simulated populations using inputs from the Breast Cancer Surveillance Consortium (BCSC), we investigated race- and ethnicity-based disparities. Disparities were decomposed into those due to primary breast cancer and treatment characteristics, social determinants of health (SDOH) and differential error in second cancer ascertainment by modeling populations with or without variation across race and ethnicity in the distribution of these characteristics. We estimated effects of disparities on mammography performance and supplemental imaging recommendations stratified by race and ethnicity. RESULTS In simulated cohorts based on 65,446 BCSC surveillance mammograms, when only cancer characteristics varied by race and ethnicity, mammograms for Black women had lower sensitivity compared with the overall population (64.1% vs. 71.1%). Differences between Black women and the overall population were larger when both cancer characteristics and SDOH varied by race and ethnicity (53.8% vs. 71.1%). Basing supplemental imaging recommendations on high predicted second cancer risk resulted in less frequent recommendations for Hispanic (6.7%) and Asian/Pacific Islander women (6.4%) compared with the overall population (10.0%). CONCLUSIONS Variation in cancer characteristics and SDOH led to disparities in surveillance mammography performance and recommendations for supplemental imaging. IMPACT Risk-guided surveillance imaging may exacerbate disparities. Decision-makers should consider implications for equity in cancer outcomes resulting from implementing risk-guided screening programs. See related In the Spotlight, p. 1479.
Affiliation(s)
- Rebecca A. Hubbard
  - Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Elie Alhajjar
  - Department of Mathematical Sciences, United States Military Academy, West Point, NY
- Kossi Edoh
  - Department of Mathematics, North Carolina Agricultural & Technical State University, Greensboro, NC
- Melissa L. Martin
  - Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
4. Ro SK, Zhang W, Jiang Q, Li XN, Liu R, Lu CC, Marchenko O, Sun L, Zhao J. Statistical Considerations on the Use of RWD/RWE for Oncology Drug Approvals: Overview and Lessons Learned. Ther Innov Regul Sci 2023; 57:899-910. [PMID: 37179264 PMCID: PMC10276785 DOI: 10.1007/s43441-023-00528-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/06/2023] [Accepted: 04/14/2023] [Indexed: 05/15/2023]
Abstract
Despite increasing utilization of real-world data (RWD)/real-world evidence (RWE) in regulatory submissions, their application to oncology drug approvals has seen limited success. Real-world data is most commonly summarized as a benchmark control for a single arm study or used to augment the concurrent control in a randomized clinical trial (RCT). While there has been substantial research on usage of RWD/RWE, our goal is to provide a comprehensive overview of their use in oncology drug approval submissions to inform future RWD/RWE study design. We will review examples of applications and summarize the strengths and weaknesses of each example identified by regulatory agencies. A few noteworthy case studies will be reviewed in detail. Operational aspects of RWD/RWE study design/analysis will be also discussed.
Affiliation(s)
- Sunhee K Ro
  - Sierra Oncology Inc: GlaxoSmithKline Inc, San Mateo, USA.
- Rong Liu
  - Bristol Myers Squibb Co., New York, USA
5. Zhou Y, Shi J, Stein R, Liu X, Baldassano RN, Forrest CB, Chen Y, Huang J. Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research. J Am Med Inform Assoc 2023; 30:1246-1256. [PMID: 37337922 PMCID: PMC10280351 DOI: 10.1093/jamia/ocad066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 03/20/2023] [Accepted: 04/08/2023] [Indexed: 06/21/2023]
Abstract
OBJECTIVES The impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary depending on the type and pattern of missing data. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods. MATERIALS AND METHODS We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing scenarios and used the propensity scores to control for confounding. We compared the performance of the multiple imputation and spline smoothing methods to handle missing data. RESULTS When missing data depended on the stochastic progression of disease and medical practice patterns, the spline smoothing method produced results that were close to those obtained when there were no missing data. Compared to multiple imputation, the spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. The multiple imputation can still reduce study bias and power loss in some restrictive scenarios, eg, when missing data did not depend on the stochastic process of disease progression. DISCUSSION AND CONCLUSION Missing data in EHRs could lead to biased estimates of treatment effects and false negative findings in CER even after missing data were imputed. It is important to leverage the temporal information of disease trajectory to impute missing values when using EHRs as a data resource for CER and to consider the missing rate and the effect size when choosing an imputation method.
Affiliation(s)
- Yizhao Zhou
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Jiasheng Shi
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Ronen Stein
  - Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
  - Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Xiaokang Liu
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Robert N Baldassano
  - Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
  - Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Christopher B Forrest
  - Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
  - Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jing Huang
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
6. Coulombe J, Moodie EEM, Shortreed SM, Renoux C. Estimating individualized treatment rules in longitudinal studies with covariate-driven observation times. Stat Methods Med Res 2023; 32:868-884. [PMID: 36927216 PMCID: PMC10248307 DOI: 10.1177/09622802231158733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
The sequential treatment decisions made by physicians to treat chronic diseases are formalized in the statistical literature as dynamic treatment regimes. To date, methods for dynamic treatment regimes have been developed under the assumption that observation times, that is, treatment and outcome monitoring times, are determined by study investigators. That assumption is often not satisfied in electronic health records data in which the outcome, the observation times, and the treatment mechanism are associated with patients' characteristics. The treatment and observation processes can lead to spurious associations between the treatment of interest and the outcome to be optimized under the dynamic treatment regime if not adequately considered in the analysis. We address these associations by incorporating two inverse weights that are functions of a patient's covariates into dynamic weighted ordinary least squares to develop optimal single stage dynamic treatment regimes, known as individualized treatment rules. We show empirically that our methodology yields consistent, multiply robust estimators. In a cohort of new users of antidepressant drugs from the United Kingdom's Clinical Practice Research Datalink, the proposed method is used to develop an optimal treatment rule that chooses between two antidepressants to optimize a utility function related to the change in body mass index.
Affiliation(s)
- Janie Coulombe
  - Department of Mathematics and Statistics, Université de Montréal, Montreal, Canada
- Erica EM Moodie
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada
- Susan M Shortreed
  - Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
  - Biostatistics Department, University of Washington, Seattle, Washington, USA
- Christel Renoux
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Canada
  - Department of Neurology and Neurosurgery, McGill University, Montreal, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada
7. Estimation of marginal structural models under irregular visits and unmeasured confounder: calibrated inverse probability weights. BMC Med Res Methodol 2023; 23:4. [PMID: 36611135 PMCID: PMC9825036 DOI: 10.1186/s12874-022-01831-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 12/26/2022] [Indexed: 01/09/2023]
Abstract
Clinical information collected in electronic health records (EHRs) is becoming an essential source to emulate randomized experiments. Since patients do not interact with the healthcare system at random, the longitudinal information in large observational databases must account for irregular visits. Moreover, we need to also account for subject-specific unmeasured confounders which may act as a common cause for treatment assignment mechanism (e.g. glucose-lowering medications) while also influencing the outcome (e.g. Hemoglobin A1c). We used the calibration of longitudinal weights to improve the finite sample properties and to account for subject-specific unmeasured confounders. A Monte Carlo simulation study is conducted to evaluate the performance of calibrated inverse probability estimators using time-dependent treatment assignment and irregular visits with subject-specific unmeasured confounders. The simulation study showed that the longitudinal weights with calibrated restrictions improved the finite sample bias when compared to the stabilized weights. The application of the calibrated weights is demonstrated using the exposure of glucose lowering medications and the longitudinal outcome of Hemoglobin A1c. Our results support the effectiveness of glucose lowering medications in reducing Hemoglobin A1c among type II diabetes patients with elevated glycemic index ([Formula: see text]) using stabilized and calibrated weights.
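For readers unfamiliar with the stabilized weights that this abstract uses as its comparator, the following is a minimal sketch for a single time point with one binary covariate. It is not the calibrated longitudinal estimator from the paper. The function name `stabilized_weights` and the toy data are hypothetical, and the conditional treatment probability is estimated from simple stratum frequencies rather than a fitted model.

```python
from collections import Counter


def stabilized_weights(treatment, covariate):
    """Return stabilized inverse probability weights P(A=a) / P(A=a | L=l).

    Simplifying assumptions (not from the cited paper): a single time point,
    discrete treatment and covariate, probabilities estimated by frequencies.
    """
    n = len(treatment)
    marginal = Counter(treatment)                # counts of each treatment level
    stratum = Counter(covariate)                 # counts of each covariate level
    joint = Counter(zip(covariate, treatment))   # counts of (covariate, treatment)

    weights = []
    for a, l in zip(treatment, covariate):
        p_marginal = marginal[a] / n                 # numerator: P(A = a)
        p_conditional = joint[(l, a)] / stratum[l]   # denominator: P(A = a | L = l)
        weights.append(p_marginal / p_conditional)
    return weights


# Toy data: treatment is more common when the covariate equals 1.
treatment = [1, 1, 1, 0, 1, 0, 0, 0]
covariate = [1, 1, 1, 1, 0, 0, 0, 0]
weights = stabilized_weights(treatment, covariate)
mean_weight = sum(weights) / len(weights)
```

In this toy example, subjects whose treatment is unusual for their covariate stratum receive larger weights, and the weights average to one, which is the property that tends to make stabilized weights less variable than unstabilized ones.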
8. Carrero JJ, Fu EL, Vestergaard SV, Jensen SK, Gasparini A, Mahalingasivam V, Bell S, Birn H, Heide-Jørgensen U, Clase CM, Cleary F, Coresh J, Dekker FW, Gansevoort RT, Hemmelgarn BR, Jager KJ, Jafar TH, Kovesdy CP, Sood MM, Stengel B, Christiansen CF, Iwagami M, Nitsch D. Defining measures of kidney function in observational studies using routine health care data: methodological and reporting considerations. Kidney Int 2023; 103:53-69. [PMID: 36280224 DOI: 10.1016/j.kint.2022.09.020] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/10/2021] [Revised: 08/31/2022] [Accepted: 09/09/2022] [Indexed: 11/06/2022]
Abstract
The availability of electronic health records and access to a large number of routine measurements of serum creatinine and urinary albumin enhance the possibilities for epidemiologic research in kidney disease. However, the frequency of health care use and laboratory testing is determined by health status and indication, imposing certain challenges when identifying patients with kidney injury or disease, when using markers of kidney function as covariates, or when evaluating kidney outcomes. Depending on the specific research question, this may influence the interpretation, generalizability, and/or validity of study results. This review illustrates the heterogeneity of working definitions of kidney disease in the scientific literature and discusses advantages and limitations of the most commonly used approaches using 3 examples. We summarize ways to identify and overcome possible biases and conclude by proposing a framework for reporting definitions of exposures and outcomes in studies of kidney disease using routinely collected health care data.
Affiliation(s)
- Juan Jesus Carrero
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden.
- Edouard L Fu
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden; Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA; Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, the Netherlands
- Søren V Vestergaard
  - Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Simon Kok Jensen
  - Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Alessandro Gasparini
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden
- Viyaasan Mahalingasivam
  - Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Samira Bell
  - Division of Population Health and Genomics, University of Dundee, Dundee, UK
- Henrik Birn
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Biomedicine, Aarhus University, Aarhus, Denmark; Department of Renal Medicine, Aarhus University Hospital, Aarhus, Denmark
- Uffe Heide-Jørgensen
  - Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Catherine M Clase
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada; Department of Health Research and Methodology, McMaster University, Hamilton, Ontario, Canada
- Faye Cleary
  - Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Josef Coresh
  - Department of Epidemiology, Johns Hopkins University, Baltimore, Maryland, USA
- Friedo W Dekker
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, the Netherlands
- Ron T Gansevoort
  - Department of Nephrology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Kitty J Jager
  - ERA Registry, Amsterdam UMC location University of Amsterdam, Medical Informatics, Meibergdreef, Amsterdam, Netherlands; Amsterdam Public Health Research Institute, Quality of Care, Amsterdam, the Netherlands
- Tazeen H Jafar
  - Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore
- Csaba P Kovesdy
  - Division of Nephrology, Department of Medicine, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Manish M Sood
  - Department of Medicine, the Ottawa Hospital Research Institute, University of Ottawa, Ottawa, Ontario, Canada
- Bénédicte Stengel
  - CESP (Center for Research in Epidemiology and Population Health), Clinical Epidemiology Team, University Paris-Saclay, University Versailles-Saint Quentin, Inserm U1018, Villejuif, France
- Christian F Christiansen
  - Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Masao Iwagami
  - Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK; Department of Health Services Research, University of Tsukuba, Ibaraki, Japan
- Dorothea Nitsch
  - Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK; Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; UK Renal Registry, UK Kidney Association, Bristol, UK.
9. Chavez-Yenter D, Goodman MS, Chen Y, Chu X, Bradshaw RL, Lorenz Chambers R, Chan PA, Daly BM, Flynn M, Gammon A, Hess R, Kessler C, Kohlmann WK, Mann DM, Monahan R, Peel S, Kawamoto K, Del Fiol G, Sigireddi M, Buys SS, Ginsburg O, Kaphingst KA. Association of Disparities in Family History and Family Cancer History in the Electronic Health Record With Sex, Race, Hispanic or Latino Ethnicity, and Language Preference in 2 Large US Health Care Systems. JAMA Netw Open 2022; 5:e2234574. [PMID: 36194411 PMCID: PMC9533178 DOI: 10.1001/jamanetworkopen.2022.34574] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/14/2022]
Abstract
IMPORTANCE Clinical decision support (CDS) algorithms are increasingly being implemented in health care systems to identify patients for specialty care. However, systematic differences in missingness of electronic health record (EHR) data may lead to disparities in identification by CDS algorithms. OBJECTIVE To examine the availability and comprehensiveness of cancer family history information (FHI) in patients' EHRs by sex, race, Hispanic or Latino ethnicity, and language preference in 2 large health care systems in 2021. DESIGN, SETTING, AND PARTICIPANTS This retrospective EHR quality improvement study used EHR data from 2 health care systems: University of Utah Health (UHealth) and NYU Langone Health (NYULH). Participants included patients aged 25 to 60 years who had a primary care appointment in the previous 3 years. Data were collected or abstracted from the EHR from December 10, 2020, to October 31, 2021, and analyzed from June 15 to October 31, 2021. EXPOSURES Prior collection of cancer FHI in primary care settings. MAIN OUTCOMES AND MEASURES Availability was defined as having any FHI and any cancer FHI in the EHR and was examined at the patient level. Comprehensiveness was defined as whether a cancer family history observation in the EHR specified the type of cancer diagnosed in a family member, the relationship of the family member to the patient, and the age at onset for the family member and was examined at the observation level. RESULTS Among 144 484 patients in the UHealth system, 53.6% were women; 74.4% were non-Hispanic or non-Latino and 67.6% were White; and 83.0% had an English language preference. Among 377 621 patients in the NYULH system, 55.3% were women; 63.2% were non-Hispanic or non-Latino, and 55.3% were White; and 89.9% had an English language preference. 
Patients from historically medically underserved groups had significantly lower availability and comprehensiveness of cancer FHI (P < .001), specifically Black vs White patients (UHealth: 17.3% [95% CI, 16.1%-18.6%] vs 42.8% [95% CI, 42.5%-43.1%]; NYULH: 24.4% [95% CI, 24.0%-24.8%] vs 33.8% [95% CI, 33.6%-34.0%]), Hispanic or Latino vs non-Hispanic or non-Latino patients (UHealth: 27.2% [95% CI, 26.5%-27.8%] vs 40.2% [95% CI, 39.9%-40.5%]; NYULH: 24.4% [95% CI, 24.1%-24.7%] vs 31.6% [95% CI, 31.4%-31.8%]), Spanish-speaking vs English-speaking patients (UHealth: 18.4% [95% CI, 17.2%-19.1%] vs 40.0% [95% CI, 39.7%-40.3%]; NYULH: 15.1% [95% CI, 14.6%-15.6%] vs 31.1% [95% CI, 30.9%-31.2%]), and men vs women (UHealth: 30.8% [95% CI, 30.4%-31.2%] vs 43.0% [95% CI, 42.6%-43.3%]; NYULH: 23.1% [95% CI, 22.9%-23.3%] vs 34.9% [95% CI, 34.7%-35.1%]). CONCLUSIONS AND RELEVANCE These findings suggest that systematic differences in the availability and comprehensiveness of FHI in the EHR may introduce informative presence bias as inputs to CDS algorithms. The observed differences may also exacerbate disparities for medically underserved groups. System-, clinician-, and patient-level efforts are needed to improve the collection of FHI.
Affiliation(s)
- Daniel Chavez-Yenter
  - Huntsman Cancer Institute, University of Utah, Salt Lake City
  - Department of Communication, University of Utah, Salt Lake City
- Melody S. Goodman
  - School of Global Public Health, New York University, New York, New York
- Yuyu Chen
  - School of Global Public Health, New York University, New York, New York
- Xiangying Chu
  - School of Global Public Health, New York University, New York, New York
- Richard L. Bradshaw
  - Department of Biomedical Informatics, University of Utah, Salt Lake City
  - School of Medicine, University of Utah Health, Salt Lake City, Utah
- Brianne M. Daly
  - Huntsman Cancer Institute, University of Utah, Salt Lake City
- Michael Flynn
  - School of Medicine, University of Utah Health, Salt Lake City, Utah
- Amanda Gammon
  - Huntsman Cancer Institute, University of Utah, Salt Lake City
- Rachel Hess
  - Department of Population Health Sciences, University of Utah, Salt Lake City
  - Department of Internal Medicine, University of Utah, Salt Lake City
- Cecelia Kessler
  - Huntsman Cancer Institute, University of Utah, Salt Lake City
- Devin M. Mann
  - Department of Population Health, New York University Grossman School of Medicine, New York University, New York, New York
- Rachel Monahan
  - Perlmutter Cancer Center, NYU Langone Health, New York, New York
  - Department of Population Health, New York University Grossman School of Medicine, New York University, New York, New York
- Sara Peel
  - Huntsman Cancer Institute, University of Utah, Salt Lake City
- Kensaku Kawamoto
  - Department of Biomedical Informatics, University of Utah, Salt Lake City
- Guilherme Del Fiol
  - Department of Biomedical Informatics, University of Utah, Salt Lake City
- Saundra S. Buys
  - Huntsman Cancer Institute, University of Utah, Salt Lake City
  - Department of Internal Medicine, University of Utah, Salt Lake City
- Ophira Ginsburg
  - Center for Global Health, National Cancer Institute, Rockville, Maryland
- Kimberly A. Kaphingst
  - Huntsman Cancer Institute, University of Utah, Salt Lake City
  - Department of Communication, University of Utah, Salt Lake City
|
10
|
Kopylova OV, Ershova AI, Efimova IA, Blokhina AV, Limonova AS, Borisova AL, Pokrovskaya MS, Drapkina OM. Electronic medical records and biobanking. Cardiovascular Therapy and Prevention 2022. [DOI: 10.15829/1728-8800-2022-3425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
Biosample preservation for future research is a fundamental component of translational medicine. At the same time, the value of stored biosamples is largely determined by the presence of related clinical data and other information. Electronic medical records are a unique source of a large amount of information received over a long period of time. In this regard, genetic and other types of data obtained from the biosample analysis can be associated with phenotypic and other types of information stored in electronic medical records, which pushes the boundaries in large-scale genetic research and improves healthcare. The aim of this review was to analyze the literature on the potential of combining electronic medical records and biobank databases in research and clinical practice.
Affiliation(s)
- O. V. Kopylova
  - National Medical Research Center for Therapy and Preventive Medicine
- A. I. Ershova
  - National Medical Research Center for Therapy and Preventive Medicine
- I. A. Efimova
  - National Medical Research Center for Therapy and Preventive Medicine
- A. V. Blokhina
  - National Medical Research Center for Therapy and Preventive Medicine
- A. S. Limonova
  - National Medical Research Center for Therapy and Preventive Medicine
- A. L. Borisova
  - National Medical Research Center for Therapy and Preventive Medicine
- M. S. Pokrovskaya
  - National Medical Research Center for Therapy and Preventive Medicine
- O. M. Drapkina
  - National Medical Research Center for Therapy and Preventive Medicine

11
Levenson M, He W, Chen L, Dharmarajan S, Izem R, Meng Z, Pang H, Rockhold F. Statistical consideration for fit-for-use real-world data to support regulatory decision making in drug development. Stat Biopharm Res 2022. [DOI: 10.1080/19466315.2022.2120533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Weili He
  - Global Medical Affairs Statistics, Data and Statistical Sciences, AbbVie, North Chicago, IL
- Li Chen
  - Global Medical Affairs Statistics, Data and Statistical Sciences, AbbVie, North Chicago, IL
- Rima Izem
  - Novartis Institutes for BioMedical Research Basel, Basel, Basel-Stadt, CH
- Frank Rockhold
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, NC
  - Duke Clinical Research Institute, Duke University, Durham, NC

12
McGee G, Haneuse S, Coull BA, Weisskopf MG, Rotem RS. On the Nature of Informative Presence Bias in Analyses of Electronic Health Records. Epidemiology 2022; 33:105-113. [PMID: 34711733 PMCID: PMC8633193 DOI: 10.1097/ede.0000000000001432] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 01/03/2023]
Abstract
Electronic health records (EHRs) offer unprecedented opportunities to answer epidemiologic questions. However, unlike in ordinary cohort studies or randomized trials, EHR data are collected somewhat idiosyncratically. In particular, patients who have more contact with the medical system have more opportunities to receive diagnoses, which are then recorded in their EHRs. The goal of this article is to shed light on the nature and scope of this phenomenon, known as informative presence, which can bias estimates of associations. We show how this can be characterized as an instance of misclassification bias. As a consequence, we show that informative presence bias can occur in a broader range of settings than previously thought, and that simple adjustment for the number of visits as a confounder may not fully correct for bias. Additionally, where previous work has considered only underdiagnosis, investigators are often concerned about overdiagnosis; we show how this changes the settings in which bias manifests. We report on a comprehensive series of simulations to shed light on when to expect informative presence bias, how it can be mitigated in some cases, and cases in which new methods need to be developed.
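The mechanism described in this abstract, where more contact with the medical system creates more chances for a diagnosis to be recorded, can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the visit counts, the per-visit detection probability `p_detect`, and the prevalence are hypothetical values chosen only to make the effect visible. True disease is generated independently of exposure, so any association in the recorded data comes from differential recording alone (the underdiagnosis case).

```python
import random

def simulate_informative_presence(n=20000, p_detect=0.3, seed=0):
    """Simulate underdiagnosis driven by visit frequency.

    True disease is independent of exposure, so the true risk ratio is ~1.
    Exposed patients simply have more visits, hence more chances for an
    existing disease to be recorded in the EHR.
    """
    rng = random.Random(seed)
    # Per exposure group: [patients, true cases, recorded cases].
    stats = {False: [0, 0, 0], True: [0, 0, 0]}
    for _ in range(n):
        exposed = rng.random() < 0.5
        diseased = rng.random() < 0.2  # same prevalence in both groups
        # Hypothetical visit process: exposed patients are seen more often.
        n_visits = rng.randint(3, 6) if exposed else rng.randint(1, 2)
        # A diagnosis is recorded if any visit detects an existing disease.
        recorded = diseased and any(
            rng.random() < p_detect for _ in range(n_visits)
        )
        group = stats[exposed]
        group[0] += 1
        group[1] += diseased
        group[2] += recorded

    def risk(group, idx):
        return group[idx] / group[0]

    true_rr = risk(stats[True], 1) / risk(stats[False], 1)
    observed_rr = risk(stats[True], 2) / risk(stats[False], 2)
    return true_rr, observed_rr

true_rr, observed_rr = simulate_informative_presence()
print(f"true RR = {true_rr:.2f}, EHR-recorded RR = {observed_rr:.2f}")
```

Under these assumptions the risk ratio computed from recorded diagnoses is well above the true value of about 1, which is the misclassification view of informative presence that the abstract describes.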
Affiliation(s)
- Glen McGee
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Sebastien Haneuse
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Brent A. Coull
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Marc G. Weisskopf
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA
- Ran S. Rotem
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA
  - Kahn-Sagol-Maccabi Research and Innovation Institute, Maccabi Healthcare Services, Tel Aviv, Israel

13
Harton J, Mitra N, Hubbard RA. OUP accepted manuscript. J Am Med Inform Assoc 2022; 29:1191-1199. [PMID: 35438796 PMCID: PMC9196698 DOI: 10.1093/jamia/ocac050] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/19/2022] [Revised: 02/21/2022] [Accepted: 03/29/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVE Electronic health record (EHR)-derived data are extensively used in health research. However, the pattern of patient interaction with the healthcare system can result in informative presence bias if those who have poorer health have more data recorded than healthier patients. We aimed to determine how informative presence affects bias across multiple scenarios informed by real-world healthcare utilization patterns. MATERIALS AND METHODS We conducted an analysis of EHR data from a pediatric healthcare system as well as simulation studies to characterize conditions under which informative presence bias is likely to occur. This analysis extends prior work by examining a variety of scenarios for the relationship between a biomarker and a health event of interest and the healthcare visit process. RESULTS Using biomarker values gathered at both informative and noninformative visits when estimating the effect of the biomarker on the event of interest resulted in minimal bias when the biomarker was relatively stable over time but produced substantial bias when the biomarker was more volatile. Adjusting analyses for the number of prior visits within a fixed look-back window was able to reduce but not eliminate this bias. DISCUSSION These results suggest that bias may arise frequently in commonly encountered scenarios and may not be eliminated by adjusting for prior visit intensity. CONCLUSION Depending on the context, the estimated effect from analyses using data from all visits available may diverge from the true effect. Sensitivity analyses using only visits likely to be informative or noninformative based on visit type may aid in the assessment of the magnitude of potential bias.
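The adjustment covariate mentioned in the results, the number of prior visits within a fixed look-back window, can be derived from a patient's visit dates roughly as follows. This is a minimal sketch under assumed conventions: the function name `prior_visit_counts`, the 365-day default window, and the inclusive window boundary are illustrative choices, not details taken from the study.

```python
from datetime import date, timedelta

def prior_visit_counts(visit_dates, lookback_days=365):
    """For each visit, count earlier visits inside the look-back window.

    Returns a list of (visit_date, n_prior_visits) pairs in date order.
    The window is inclusive of its start date (an assumption).
    """
    visits = sorted(visit_dates)
    counts = []
    for i, current in enumerate(visits):
        window_start = current - timedelta(days=lookback_days)
        # Only visits strictly earlier in the sorted list are "prior".
        n_prior = sum(1 for d in visits[:i] if d >= window_start)
        counts.append((current, n_prior))
    return counts

# Hypothetical visit history for one patient.
history = [date(2020, 1, 1), date(2020, 6, 1), date(2021, 3, 1), date(2021, 4, 1)]
visit_intensity = prior_visit_counts(history)
```

The resulting count can be entered as a covariate in the outcome model. As the abstract notes, this kind of adjustment reduced but did not eliminate the bias, so it is best treated as a partial correction.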
Affiliation(s)
- Joanna Harton
  - Corresponding Author: Joanna Harton, Department of Biostatistics, Epidemiology, and Informatics, 423 Guardian Drive, Philadelphia, PA 19104, USA

14
Peer K, Adams WG, Legler A, Sandel M, Levy JI, Boynton-Jarrett R, Kim C, Leibler JH, Fabian MP. Developing and evaluating a pediatric asthma severity computable phenotype derived from electronic health records. J Allergy Clin Immunol 2021; 147:2162-2170. [PMID: 33338540 PMCID: PMC8328264 DOI: 10.1016/j.jaci.2020.11.045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Revised: 11/23/2020] [Accepted: 11/26/2020] [Indexed: 10/22/2022]
Abstract
BACKGROUND Extensive data available in electronic health records (EHRs) have the potential to improve asthma care and understanding of factors influencing asthma outcomes. However, this work can be accomplished only when the EHR data allow for accurate measures of severity, which at present are complex and inconsistent. OBJECTIVE Our aims were to create and evaluate a standardized pediatric asthma severity phenotype based in clinical asthma guidelines for use in EHR-based health initiatives and studies and also to examine the presence and absence of these data in relation to patient characteristics. METHODS We developed an asthma severity computable phenotype and compared the concordance of different severity components contributing to the phenotype to trends in the literature. We used multivariable logistic regression to assess the presence of EHR data relevant to asthma severity. RESULTS The asthma severity computable phenotype performs as expected in comparison with national statistics and the literature. Severity classification for a child is maximized when based on the long-term medication regimen component and minimized when based only on the symptom data component. Use of the severity phenotype results in better, clinically grounded classification. Children for whom severity could be ascertained from these EHR data were more likely to be seen for asthma in the outpatient setting and less likely to be older or Hispanic. Black children were less likely to have lung function testing data present. CONCLUSION We developed a pragmatic computable phenotype for pediatric asthma severity that is transportable to other EHRs.
Affiliation(s)
- Komal Peer
  - Department of Environmental Health, Boston University School of Public Health, Boston, Mass
- William G Adams
  - Boston Medical Center, Boston, Mass; Department of Pediatrics, Boston University School of Medicine, Boston, Mass
- Megan Sandel
  - Boston Medical Center, Boston, Mass; Department of Pediatrics, Boston University School of Medicine, Boston, Mass
- Jonathan I Levy
  - Department of Environmental Health, Boston University School of Public Health, Boston, Mass
- Renée Boynton-Jarrett
  - Boston Medical Center, Boston, Mass; Department of Pediatrics, Boston University School of Medicine, Boston, Mass
- Chanmin Kim
  - Department of Statistics, SungKyunKwan University, Seoul, Korea
- Jessica H Leibler
  - Department of Environmental Health, Boston University School of Public Health, Boston, Mass
- M Patricia Fabian
  - Department of Environmental Health, Boston University School of Public Health, Boston, Mass

15
Izzy S, Tahir Z, Grashow R, Cote DJ, Jarrah AA, Dhand A, Taylor H, Whalen M, Nathan DM, Miller KK, Speizer F, Baggish A, Weisskopf MG, Zafonte R. Concussion and Risk of Chronic Medical and Behavioral Health Comorbidities. J Neurotrauma 2021; 38:1834-1841. [PMID: 33451255 DOI: 10.1089/neu.2020.7484] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/20/2022]
Abstract
While chronic neurological effects from concussion have been studied widely, little is known about possible links between concussion and long-term medical and behavioral comorbidities. We performed a retrospective cohort study of 9205 adult patients with concussion, matched to non-concussion controls from a hospital-based electronic medical registry. Patients with comorbidities before the index visit were excluded. Behavioral and medical comorbidities were defined by International Classification of Diseases, Ninth and Tenth Revision codes. Groups were followed for up to 10 years to identify comorbidity incidence after a concussion. Cox proportional hazards models were used to calculate associations between concussion and comorbidities after multi-variable adjustment. Patients with concussion were 57% male (median age: 31; interquartile range [IQR] = 23-48 years) at enrollment with a median follow-up time of 6.1 years (IQR = 4.2-9.1) and well-matched to healthy controls. Most (83%) concussions were evaluated in outpatient settings (5% inpatient). During follow-up, we found significantly higher risks of developing cardiovascular risk factors, including hypertension (hazard ratio [HR] = 1.7, 95% confidence interval [CI]: 1.5-1.9), obesity (HR = 1.7, 95% CI: 1.3-2.0), and diabetes mellitus (HR = 1.8, 95% CI: 1.4-2.3), in the concussion group compared with controls. Similarly, risks of psychiatric and neurological disorders such as depression (HR = 3.0, 95% CI: 2.6-3.5), psychosis (HR = 6.0, 95% CI: 4.2-8.6), stroke (HR = 2.1, 95% CI: 1.5-2.9), and epilepsy (HR = 4.4, 95% CI: 3.2-5.9) were higher in the concussion group. Most comorbidities developed less than five years post-concussion. The risks for post-concussion comorbidities were also higher in patients under 40 years old compared with controls. Patients with concussion demonstrated an increased risk of development of medical and behavioral health comorbidities. Prospective studies are warranted to better describe the burden of long-term comorbidities in patients with concussion.
Affiliation(s)
- Saef Izzy
  - Department of Neurology, Divisions of Stroke, Cerebrovascular, and Critical Care Neurology, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Zabreen Tahir
  - Department of Neurology, Divisions of Stroke, Cerebrovascular, and Critical Care Neurology, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Rachel Grashow
  - Department of Environmental Health, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
  - The Football Players Health Study at Harvard University, Boston, Massachusetts, USA
- David J Cote
  - Harvard Medical School, Boston, Massachusetts, USA
- Ali Al Jarrah
  - Department of Neurology, Divisions of Stroke, Cerebrovascular, and Critical Care Neurology, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Amar Dhand
  - Department of Neurology, Divisions of Stroke, Cerebrovascular, and Critical Care Neurology, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
  - Network Science Institute, Northeastern University, Boston, Massachusetts, USA
- Herman Taylor
  - The Football Players Health Study at Harvard University, Boston, Massachusetts, USA
  - Morehouse School of Medicine, Atlanta, Georgia, USA
- Michael Whalen
  - Department of Pediatrics, Cardiovascular Performance Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- David M Nathan
  - Harvard Medical School, Boston, Massachusetts, USA
  - The Football Players Health Study at Harvard University, Boston, Massachusetts, USA
  - Diabetes Center, Cardiovascular Performance Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- Karen K Miller
  - Harvard Medical School, Boston, Massachusetts, USA
  - The Football Players Health Study at Harvard University, Boston, Massachusetts, USA
  - Neuroendocrine Unit, Cardiovascular Performance Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- Frank Speizer
  - Department of Environmental Health, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
- Aaron Baggish
  - Harvard Medical School, Boston, Massachusetts, USA
  - The Football Players Health Study at Harvard University, Boston, Massachusetts, USA
  - Department of Internal Medicine, Cardiovascular Performance Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- Marc G Weisskopf
  - Department of Environmental Health, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
  - The Football Players Health Study at Harvard University, Boston, Massachusetts, USA
- Ross Zafonte
  - Harvard Medical School, Boston, Massachusetts, USA
  - The Football Players Health Study at Harvard University, Boston, Massachusetts, USA
  - Department of Physical Medicine and Rehabilitation, Massachusetts General Hospital, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Spaulding Rehabilitation Hospital, Charlestown, Massachusetts, USA

16
Levenson M, He W, Chen J, Fang Y, Faries D, Goldstein BA, Ho M, Lee K, Mishra-Kalyani P, Rockhold F, Wang H, Zink RC. Biostatistical Considerations When Using RWD and RWE in Clinical Studies for Regulatory Purposes: A Landscape Assessment. Stat Biopharm Res 2021. [DOI: 10.1080/19466315.2021.1883473] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 02/08/2023]
Affiliation(s)
- Weili He
  - Global Medical Affairs Statistics, Data and Statistical Sciences, AbbVie, North Chicago, IL
- Jie Chen
  - Overland Pharmaceuticals, Dover, DE
- Yixin Fang
  - Global Medical Affairs Statistics, Data and Statistical Sciences, AbbVie, North Chicago, IL
- Douglas Faries
  - Global Statistical Sciences, Eli Lilly & Company, Indianapolis, IN
- Benjamin A. Goldstein
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, NC
  - Duke Clinical Research Institute, Duke University, Durham, NC
- Kwan Lee
  - Statistics and Decision Sciences, Janssen Research and Development (retired), Spring House, PA
- Frank Rockhold
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, NC
  - Duke Clinical Research Institute, Duke University, Durham, NC
- Hongwei Wang
  - Global Medical Affairs Statistics, Data and Statistical Sciences, AbbVie, North Chicago, IL

17
Sisk R, Lin L, Sperrin M, Barrett JK, Tom B, Diaz-Ordaz K, Peek N, Martin GP. Informative presence and observation in routine health data: A review of methodology for clinical risk prediction. J Am Med Inform Assoc 2021; 28:155-166. [PMID: 33164082 PMCID: PMC7810439 DOI: 10.1093/jamia/ocaa242] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/26/2020] [Accepted: 09/17/2020] [Indexed: 12/20/2022]
Abstract
Objective Informative presence (IP) is the phenomenon whereby the presence or absence of patient data is potentially informative with respect to their health condition, with informative observation (IO) being the longitudinal equivalent. These phenomena predominantly exist within routinely collected healthcare data, in which data collection is driven by the clinical requirements of patients and clinicians. The extent to which IP and IO are considered when using such data to develop clinical prediction models (CPMs) is unknown, as is the existing methodology aiming at handling these issues. This review aims to synthesize such existing methodology, thereby helping identify an agenda for future methodological work. Materials and Methods A systematic literature search was conducted by 2 independent reviewers using prespecified keywords. Results Thirty-six articles were included. We categorized the methods presented within as derived predictors (including some representation of the measurement process as a predictor in the model), modeling under IP, and latent structures. Including missing indicators or summary measures as predictors is the most commonly presented approach amongst the included studies (24 of 36 articles). Discussion This is the first review to collate the literature in this area under a prediction framework. A considerable body of relevant literature exists, and we present ways in which the described methods could be developed further. Guidance is required for specifying the conditions under which each method should be used to enable applied prediction modelers to use these methods. Conclusions A growing recognition of IP and IO exists within the literature, and methodology is increasingly becoming available to leverage these phenomena for prediction purposes. IP and IO should be approached differently in a prediction context than when the primary goal is explanation. The work included in this review has demonstrated theoretical and empirical benefits of incorporating IP and IO, and therefore we recommend that applied health researchers consider incorporating these methods in their work.
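The "derived predictors" approach named in the results, in which missing indicators are included as predictors, can be sketched as below. This is an illustrative example only: the helper `add_missing_indicators`, the column names, the mean-fill choice, and the patient values are hypothetical and are not drawn from any of the reviewed studies.

```python
from statistics import mean

def add_missing_indicators(rows, columns):
    """Add a 0/1 `<column>_missing` flag per column and mean-fill the gaps.

    `rows` is a list of dicts; an unmeasured value is represented by None
    or by an absent key. The input rows are not modified.
    """
    fills = {}
    for col in columns:
        observed = [r[col] for r in rows if r.get(col) is not None]
        # Fall back to 0.0 when a column was never measured (an assumption).
        fills[col] = mean(observed) if observed else 0.0
    out = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            is_missing = row.get(col) is None
            new_row[col] = fills[col] if is_missing else row[col]
            new_row[f"{col}_missing"] = int(is_missing)
        out.append(new_row)
    return out

# Hypothetical patients; None means the test was never ordered.
patients = [
    {"id": 1, "lactate": 2.1, "hba1c": None},
    {"id": 2, "lactate": None, "hba1c": 6.4},
    {"id": 3, "lactate": 3.9, "hba1c": 7.0},
]
features = add_missing_indicators(patients, ["lactate", "hba1c"])
```

The indicator columns let a prediction model use the fact that a test was or was not ordered, which is the signal that informative presence refers to. The fill value matters less here than keeping the indicator alongside it.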
Affiliation(s)
- Rose Sisk
  - Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Lijing Lin
  - Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Matthew Sperrin
  - Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Jessica K Barrett
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
  - Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Brian Tom
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Karla Diaz-Ordaz
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Niels Peek
  - Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
  - NIHR Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom
  - Alan Turing Institute, University of Manchester, London, United Kingdom
- Glen P Martin
  - Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom

18
Lavery JA, Callahan MK, Panageas KS. Apples and Oranges? Considerations for EHR-Based Analyses Aggregating Data From Interventional Clinical Trials and Point-of-Care Encounters in Oncology. JCO Clin Cancer Inform 2021; 5:21-23. [PMID: 33411618 DOI: 10.1200/cci.20.00096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Affiliation(s)
- Jessica A Lavery
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
- Margaret K Callahan
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Katherine S Panageas
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY

19
Lin AL, Chen WC, Hong JC. Electronic health record data mining for artificial intelligence healthcare. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00008-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]

20
Gallis JA, Kusibab K, Egger JR, Olsen MK, Askew S, Steinberg DM, Bennett GG. Can Electronic Health Records Validly Estimate the Effects of Health System Interventions Aimed at Controlling Body Weight? Obesity (Silver Spring) 2020; 28:2107-2115. [PMID: 32985131 PMCID: PMC8351620 DOI: 10.1002/oby.22958] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/29/2020] [Revised: 06/08/2020] [Accepted: 06/26/2020] [Indexed: 11/12/2022]
Abstract
OBJECTIVE This study aimed to compare weight collected at clinics and recorded in the electronic health record (EHR) with primary study-collected trial weights to assess the validity of using EHR data in future pragmatic weight loss or weight gain prevention trials. METHODS For both the Track and Shape obesity intervention randomized trials, clinic EHR weight data were compared with primary trial weight data over the same time period. In analyzing the EHR weights, intervention effects were estimated on the primary outcome of weight (in kilograms) with EHR data, using linear mixed effects models. RESULTS EHR weight measurements were higher on average and more variable than trial weight measurements. The mean difference and 95% CI were similar at all time points between the estimates using EHR and study-collected weights. CONCLUSIONS The results of this study can be used to help guide the planning of future pragmatic weight-related trials. This study provides evidence that body weight measurements abstracted from the EHR can provide valid, efficient, and cost-effective data to estimate treatment effects from randomized clinical weight loss and weight management trials. However, care should be taken to properly understand the data-generating process and any mechanisms that may affect the validity of these estimates.
Affiliation(s)
- John A. Gallis
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, United States
  - Duke Global Health Institute, Duke University, Durham, NC, United States
- Kristie Kusibab
  - During the study, Ms. Kusibab was a Master of Science student in the Department of Biostatistics & Bioinformatics at Duke University
  - PharPoint Research, Inc., Durham, NC, United States
- Joseph R. Egger
  - Duke Global Health Institute, Duke University, Durham, NC, United States
- Maren K. Olsen
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, United States
  - Center for Health Services Research in Primary Care, Durham VA Medical Center, Durham, NC, United States
- Sandy Askew
  - Duke Global Health Institute, Duke University, Durham, NC, United States
  - Duke Global Digital Health Science Center, Duke University, Durham, NC, United States
- Dori M. Steinberg
  - Duke Global Health Institute, Duke University, Durham, NC, United States
  - Duke Global Digital Health Science Center, Duke University, Durham, NC, United States
  - Duke School of Nursing, Duke University, Durham, NC, United States
- Gary G. Bennett
  - Duke Global Health Institute, Duke University, Durham, NC, United States
  - Duke Global Digital Health Science Center, Duke University, Durham, NC, United States
  - Department of Psychology and Neuroscience, Duke University, Durham, NC, United States
  - Corresponding Author Contact Info: Gary G. Bennett; 919-668-3420; 116 Allen Building, Box 90024, Durham NC 27708

21
Goldstein BA. Five analytic challenges in working with electronic health records data to support clinical trials with some solutions. Clin Trials 2020; 17:370-376. [DOI: 10.1177/1740774520931211] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Electronic health records data are becoming a key data resource in clinical research. Owing to issues of data efficiency, electronic health records data are being used for clinical trials. This includes both large-scale pragmatic trials and smaller, more focused point-of-care trials. While electronic health records data open up a number of scientific opportunities, they also present a number of analytic challenges. This article discusses five particular challenges related to organizing electronic health records data for analytic purposes. These are as follows: (1) data are not organized for research purposes, (2) data are both densely and irregularly observed, (3) we don’t have all data elements we may want or need, (4) data are both cross-sectional and longitudinal, and (5) data may be informatively observed. While laying out these challenges, the article notes how many of these challenges can be addressed by careful and thoughtful study design as well as by integration of clinicians and informaticians into the analytic team.