1
Al-Sahab B, Leviton A, Loddenkemper T, Paneth N, Zhang B. Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2024; 8:121-139. [PMID: 38273982 PMCID: PMC10805748 DOI: 10.1007/s41666-023-00153-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/05/2023] [Accepted: 11/07/2023] [Indexed: 01/27/2024]
Abstract
Electronic Health Records (EHR) are increasingly perceived as a unique source of data for clinical research because they provide unprecedentedly large volumes of real-time data from real-world settings. In this review of the secondary uses of EHR, we identify the anticipated breadth of opportunities and point out the data deficiencies and potential biases that are likely to limit the search for true causal relationships. This paper provides a comprehensive overview of the types of biases that arise along the pathways that generate real-world evidence, and of the sources of these biases. We distinguish between two levels in the production of EHR data where biases are likely to arise: (i) at the healthcare system level, where the principal sources of bias reside in access to, and provision of, medical care, and in the acquisition and documentation of medical and administrative data; and (ii) at the research level, where biases arise from the processes of extracting, analyzing, and interpreting these data. Given the plethora of biases, mainly in the form of selection and information bias, we conclude by advising extreme caution about making causal inferences based on secondary uses of EHRs.
Affiliation(s)
- Ban Al-Sahab
  - Department of Family Medicine, College of Human Medicine, Michigan State University, B100 Clinical Center, 788 Service Road, East Lansing, MI USA
- Alan Leviton
  - Department of Neurology, Harvard Medical School, Boston, MA USA
  - Department of Neurology, Boston Children’s Hospital, Boston, MA USA
- Tobias Loddenkemper
  - Department of Neurology, Harvard Medical School, Boston, MA USA
  - Department of Neurology, Boston Children’s Hospital, Boston, MA USA
- Nigel Paneth
  - Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI USA
  - Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, East Lansing, MI USA
- Bo Zhang
  - Department of Neurology, Boston Children’s Hospital, Boston, MA USA
  - Biostatistics and Research Design, Institutional Centers of Clinical and Translational Research, Boston Children’s Hospital, Boston, MA USA
  - Harvard Medical School, Boston, MA USA
2
van Os HJA, Kanning JP, Wermer MJH, Chavannes NH, Numans ME, Ruigrok YM, van Zwet EW, Putter H, Steyerberg EW, Groenwold RHH. Developing Clinical Prediction Models Using Primary Care Electronic Health Record Data: The Impact of Data Preparation Choices on Model Performance. FRONTIERS IN EPIDEMIOLOGY 2022; 2:871630. [PMID: 38455328 PMCID: PMC10910909 DOI: 10.3389/fepid.2022.871630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 04/11/2022] [Indexed: 03/09/2024]
Abstract
Objective: To quantify prediction model performance in relation to data preparation choices when using electronic health records (EHR). Study Design and Setting: Cox proportional hazards models were developed for predicting the first-ever main adverse cardiovascular events using Dutch primary care EHR data. The reference model was based on a 1-year run-in period, cardiovascular events were defined based on both EHR diagnosis and medication codes, and missing values were multiply imputed. We compared data preparation choices based on (i) length of the run-in period (2- or 3-year run-in); (ii) outcome definition (EHR diagnosis codes or medication codes only); and (iii) methods addressing missing values (mean imputation or complete case analysis) by varying these choices in the derivation set and testing their impact in a validation set. Results: We included 89,491 patients in whom 6,736 first-ever main adverse cardiovascular events occurred during a median follow-up of 8 years. Outcome definition based only on diagnosis codes led to a systematic underestimation of risk (calibration curve intercept: 0.84; 95% CI: 0.83-0.84), while complete case analysis led to overestimation (calibration curve intercept: -0.52; 95% CI: -0.53 to -0.51). Differences in the length of the run-in period showed no relevant impact on calibration and discrimination. Conclusion: Data preparation choices regarding outcome definition or methods to address missing values can have a substantial impact on the calibration of predictions, hampering reliable clinical decision support. This study further illustrates the urgency of transparent reporting of modeling choices in an EHR data setting.
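The abstract contrasts complete case analysis with mean imputation. The sketch below shows one way each choice can distort a cohort, assuming (purely for illustration; the abstract does not describe the missingness mechanism) that a single covariate is recorded more often for patients who later have an event. The covariate `chol`, the functions `complete_case` and `mean_impute`, and every numeric setting are hypothetical.

```python
import numpy as np


def complete_case(X, y):
    """Keep only rows with no missing covariate values."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def mean_impute(X):
    """Replace each missing value with its column's observed mean."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


rng = np.random.default_rng(42)
n = 10_000

# Hypothetical cholesterol-like covariate; event risk rises with it (logistic).
chol = rng.normal(5.0, 1.0, n)
p_event = 1.0 / (1.0 + np.exp(-(-3.0 + 0.5 * (chol - 5.0))))
y = rng.random(n) < p_event

# Assumed mechanism: the covariate is recorded for 90% of future cases
# but only for 40% of non-cases.
p_recorded = np.where(y, 0.9, 0.4)
X = chol.reshape(-1, 1).copy()
X[rng.random(n) > p_recorded, 0] = np.nan

X_cc, y_cc = complete_case(X, y)
X_mean = mean_impute(X)

print(f"event rate, full cohort:    {y.mean():.3f}")
print(f"event rate, complete cases: {y_cc.mean():.3f}")
print(f"SD observed: {np.nanstd(X):.2f}, SD after mean imputation: {X_mean.std():.2f}")
```

Under this assumed mechanism, complete case analysis discards non-cases disproportionately, so the retained subset has a higher event rate than the full cohort; a model fitted to it would tend to overestimate risk, which is one possible route to the negative calibration intercept reported above. Mean imputation keeps every row but shrinks the covariate's spread.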
Affiliation(s)
- Hendrikus J. A. van Os
  - Department of Neurology, Leiden University Medical Hospital, Leiden, Netherlands
  - National eHealth Living Lab, Leiden University Medical Hospital, Leiden, Netherlands
  - Department of Public Health & Primary Care, Leiden University Medical Hospital, Leiden, Netherlands
- Jos P. Kanning
  - Department of Neurology, University Medical Center Utrecht, Utrecht, Netherlands
- Marieke J. H. Wermer
  - Department of Neurology, Leiden University Medical Hospital, Leiden, Netherlands
- Niels H. Chavannes
  - National eHealth Living Lab, Leiden University Medical Hospital, Leiden, Netherlands
  - Department of Public Health & Primary Care, Leiden University Medical Hospital, Leiden, Netherlands
- Mattijs E. Numans
  - Department of Public Health & Primary Care, Leiden University Medical Hospital, Leiden, Netherlands
- Ynte M. Ruigrok
  - Department of Neurology, University Medical Center Utrecht, Utrecht, Netherlands
- Erik W. van Zwet
  - Department of Biomedical Data Sciences, Leiden University Medical Hospital, Leiden, Netherlands
- Hein Putter
  - Department of Biomedical Data Sciences, Leiden University Medical Hospital, Leiden, Netherlands
- Ewout W. Steyerberg
  - Department of Biomedical Data Sciences, Leiden University Medical Hospital, Leiden, Netherlands
- Rolf H. H. Groenwold
  - Department of Biomedical Data Sciences, Leiden University Medical Hospital, Leiden, Netherlands
  - Department of Clinical Epidemiology, Leiden University Medical Hospital, Leiden, Netherlands
3
Artificial Intelligence in Clinical Immunology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_83] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
4
AIM in Medical Informatics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
5
Artificial Intelligence in Clinical Immunology. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_83-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
6
Bruno P, Calimeri F, Greco G. AIM in Medical Informatics. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_32-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
7
Oliver D, Spada G, Colling C, Broadbent M, Baldwin H, Patel R, Stewart R, Stahl D, Dobson R, McGuire P, Fusar-Poli P. Real-world implementation of precision psychiatry: Transdiagnostic risk calculator for the automatic detection of individuals at-risk of psychosis. Schizophr Res 2021; 227:52-60. [PMID: 32571619 PMCID: PMC7875179 DOI: 10.1016/j.schres.2020.05.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/01/2020] [Revised: 05/01/2020] [Accepted: 05/04/2020] [Indexed: 12/31/2022]
Abstract
BACKGROUND Risk estimation models integrated into Electronic Health Records (EHRs) can deliver innovative approaches in psychiatry, but clinicians' endorsement and their real-world usability are unknown. This study aimed to investigate the real-world feasibility of implementing an individualised, transdiagnostic risk calculator to automatically screen EHRs and detect individuals at-risk for psychosis. METHODS Feasibility implementation study encompassing an in-vitro phase (March 2018 to May 2018) and an in-vivo phase (May 2018 to April 2019). The in-vitro phase addressed implementation barriers and embedded the risk calculator (predictors: age, gender, ethnicity, index cluster diagnosis, age*gender) into the local EHR. The in-vivo phase investigated the real-world feasibility of screening individuals accessing secondary mental healthcare at the South London and Maudsley NHS Trust. The primary outcome was adherence of clinicians to automatic EHR screening, defined as the proportion of clinicians who responded to alerts from the risk calculator, over those contacted. RESULTS In-vitro phase: implementation barriers were identified and overcome with clinician and service user engagement, and the calculator was successfully integrated into the local EHR through the CogStack platform. In-vivo phase: 3722 individuals were automatically screened and 115 were detected. Clinician adherence was 74% without outreach and 85% with outreach. One-third of clinicians responded to the first email (37.1%) or phone calls (33.7%). Among those detected, cumulative risk of developing psychosis was 12% at six-month follow-up. CONCLUSION This is the first implementation study suggesting that combining precision psychiatry and EHR methods to improve detection of individuals with emerging psychosis is feasible. Future psychiatric implementation research is urgently needed.
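The abstract lists the calculator's predictors (including an age*gender interaction) and defines adherence as responders over clinicians contacted. A minimal sketch of how such a calculator could turn those predictors into an absolute risk follows, assuming a Cox-type model with risk = 1 - S0(t)^exp(lp). Every coefficient, the baseline survival, and the category labels (`group_a`, `cluster_2`, ...) are hypothetical placeholders, not the published calculator's values.

```python
import math

# HYPOTHETICAL coefficients for illustration only; NOT the published model.
COEF_AGE = -0.03
COEF_MALE = 0.40
COEF_AGE_X_MALE = 0.01  # the age*gender interaction term
COEF_ETHNICITY = {"group_a": 0.0, "group_b": 0.35, "group_c": 0.20}
COEF_DIAGNOSIS = {"cluster_1": 0.0, "cluster_2": 1.20, "cluster_3": 0.60}
BASELINE_SURVIVAL = 0.98  # hypothetical S0(t) at the chosen horizon


def linear_predictor(age, male, ethnicity, diagnosis):
    """Cox-style linear predictor, including the age*gender interaction."""
    male = 1 if male else 0
    return (COEF_AGE * age
            + COEF_MALE * male
            + COEF_AGE_X_MALE * age * male
            + COEF_ETHNICITY[ethnicity]
            + COEF_DIAGNOSIS[diagnosis])


def absolute_risk(lp, baseline_survival=BASELINE_SURVIVAL):
    """Risk by the horizon under proportional hazards: 1 - S0(t) ** exp(lp)."""
    return 1.0 - baseline_survival ** math.exp(lp)


def adherence(responded, contacted):
    """Share of contacted clinicians who responded to an alert."""
    if contacted == 0:
        raise ValueError("no clinicians contacted")
    return responded / contacted


lp = linear_predictor(age=22, male=True, ethnicity="group_b", diagnosis="cluster_2")
print(f"{absolute_risk(lp):.3f}")
print(adherence(responded=3, contacted=4))  # 0.75
```

Keeping the interaction as an explicit term makes it easy to see that the effect of age differs by gender in this hypothetical parameterization.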
Affiliation(s)
- Dominic Oliver
  - Early Psychosis: Interventions and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
- Giulia Spada
  - Early Psychosis: Interventions and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
- Craig Colling
  - National Institute for Health Research, Maudsley Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
- Matthew Broadbent
  - National Institute for Health Research, Maudsley Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
- Helen Baldwin
  - Early Psychosis: Interventions and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
  - National Institute for Health Research, Maudsley Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
- Rashmi Patel
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
- Robert Stewart
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
  - South London and Maudsley Foundation Trust, London, United Kingdom
- Daniel Stahl
  - Department of Biostatistics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, United Kingdom
- Richard Dobson
  - National Institute for Health Research, Maudsley Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
  - Institute of Health Informatics Research, University College London, London, United Kingdom
  - Health Data Research UK London, University College London, London, United Kingdom
- Philip McGuire
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
  - OASIS Service, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
- Paolo Fusar-Poli
  - Early Psychosis: Interventions and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
  - National Institute for Health Research, Maudsley Biomedical Research Centre, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
  - OASIS Service, South London and Maudsley National Health Service (NHS) Foundation Trust, London, United Kingdom
  - Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
8
Increasing the Density of Laboratory Measures for Machine Learning Applications. J Clin Med 2020; 10:jcm10010103. [PMID: 33396741 PMCID: PMC7795258 DOI: 10.3390/jcm10010103] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/25/2020] [Revised: 12/23/2020] [Accepted: 12/25/2020] [Indexed: 12/12/2022]
Abstract
Background. The imputation of missingness is a key step in Electronic Health Records (EHR) mining, as it can significantly affect the conclusions derived from the downstream analysis in translational medicine. The missingness of laboratory values in EHR is not at random, yet imputation techniques tend to disregard this key distinction. Consequently, the development of an adaptive imputation strategy designed specifically for EHR is an important step in addressing data imbalance and enhancing the predictive power of modeling tools for healthcare applications. Method. We analyzed the laboratory measures derived from Geisinger’s EHR on patients in three distinct cohorts: patients tested for Clostridioides difficile (Cdiff) infection, patients with a diagnosis of inflammatory bowel disease (IBD), and patients with a diagnosis of hip or knee osteoarthritis (OA). We extracted Logical Observation Identifiers Names and Codes (LOINC), excluding codes with 75% or more missingness. Comorbidities, primary and secondary diagnoses, and active problem lists were also extracted. The adaptive imputation strategy was designed based on a hybrid approach. The comorbidity patterns of patients were transformed into latent patterns and then clustered. Imputation was performed within each patient cluster, independently for each cohort, to show the generalizability of the method. The results were compared with imputation applied to the complete dataset without incorporating the information from comorbidity patterns. Results. We analyzed a total of 67,445 patients (11,230 IBD patients, 10,000 OA patients, and 46,215 patients tested for C. difficile infection). We extracted 495 LOINC and 11,230 diagnosis codes for the IBD cohort, 8160 diagnosis codes for the Cdiff cohort, and 2042 diagnosis codes for the OA cohort based on the primary/secondary diagnosis and active problem list in the EHR.
Overall, the greatest improvement from this strategy was observed for laboratory measures with higher levels of missingness. The best root mean square error (RMSE) difference for each dataset was −35.5 for the Cdiff, −8.3 for the IBD, and −11.3 for the OA dataset. Conclusions. An adaptive imputation strategy designed specifically for EHR that uses complementary information from the clinical profile of the patient can be used to improve the imputation of missing laboratory values, especially when laboratory codes with high levels of missingness are included in the analysis.
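The pipeline described above (drop labs with 75% or more missingness, map comorbidity patterns to a latent space, cluster patients, impute within clusters, compare RMSE against whole-dataset imputation) can be sketched as follows. The abstract does not specify the "hybrid approach", so truncated SVD, plain k-means, and cluster-mean imputation are swapped in as simple stand-ins; all function names and the simulated data are hypothetical. Entries are hidden completely at random here only so that imputations can be scored against known values, unlike the non-random missingness of real EHR labs.

```python
import numpy as np


def drop_sparse_labs(labs, max_missing=0.75):
    """Drop lab columns whose missing fraction is max_missing or more."""
    frac_missing = np.isnan(labs).mean(axis=0)
    return labs[:, frac_missing < max_missing]


def latent_patterns(comorbidities, k=2):
    """Project comorbidity indicators onto their top-k principal axes (SVD)."""
    centered = comorbidities - comorbidities.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def kmeans(points, n_clusters, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    centers = points[rng.choice(len(points), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


def impute_by_group(labs, groups):
    """Fill each missing value with the mean of its group's observed values."""
    out = labs.copy()
    global_means = np.nanmean(labs, axis=0)
    for g in np.unique(groups):
        rows = groups == g
        sub = out[rows]
        seen = ~np.isnan(sub)
        counts = seen.sum(axis=0)
        sums = np.where(seen, sub, 0.0).sum(axis=0)
        # Fall back to the global mean if a group has no observed values.
        means = np.where(counts > 0, sums / np.maximum(counts, 1), global_means)
        r, c = np.where(~seen)
        sub[r, c] = means[c]
        out[rows] = sub
    return out


rng = np.random.default_rng(7)
n = 600
group = rng.integers(0, 2, n)  # hidden patient subgroup

# Subgroup 0 mostly carries comorbidity codes 0-4, subgroup 1 codes 5-9.
prev_0 = np.concatenate([np.full(5, 0.8), np.full(5, 0.1)])
prev = np.where(group[:, None] == 0, prev_0, prev_0[::-1])
comorb = (rng.random((n, 10)) < prev).astype(float)

# Three labs whose level depends on the subgroup, plus one 95%-missing lab.
true_labs = rng.normal(5.0 * group[:, None], 1.0, size=(n, 3))
sparse = np.full((n, 1), np.nan)
sparse[:30] = 1.0
labs = drop_sparse_labs(np.hstack([true_labs, sparse]))

# Hide 40% of entries so the imputations can be scored against truth.
mask = rng.random(labs.shape) < 0.4
masked = np.where(mask, np.nan, labs)

clusters = kmeans(latent_patterns(comorb), n_clusters=2, rng=rng)
global_fill = impute_by_group(masked, np.zeros(n, dtype=int))
cluster_fill = impute_by_group(masked, clusters)


def rmse(estimate):
    return float(np.sqrt(np.mean((estimate[mask] - labs[mask]) ** 2)))


print(f"RMSE difference (cluster - global): {rmse(cluster_fill) - rmse(global_fill):.2f}")
```

As in the abstract, a negative RMSE difference means the cluster-informed imputation is closer to the true values than imputation over the whole dataset.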
9
Abstract
PURPOSE OF REVIEW Healthcare has already been impacted by the fourth industrial revolution, exemplified by tip-of-the-spear technologies such as artificial intelligence and quantum computing. Yet much remains to be accomplished: systems remain suboptimal, and full interoperability of digital records has not been realized. Given the footprint of technology in healthcare, the field of clinical immunology will certainly see improvements related to these tools. RECENT FINDINGS Biomedical informatics spans the gamut of technology in biomedicine. Within this distinct field, advances are being made that allow systems to be engineered to automate disease detection, create computable phenotypes and improve record portability. Within clinical immunology, technologies are emerging along these lines and are expected to continue. SUMMARY This review highlights advancements in digital health, including learning health systems, electronic phenotyping, artificial intelligence and the use of registries. Technological advancements for improving the diagnosis and care of patients with primary immunodeficiency diseases are also highlighted.
10
Abstract
PURPOSE OF REVIEW Artificial intelligence has pervasively transformed many industries and is beginning to shape medical practice. New use cases are being identified in subspecialty domains of medicine; in particular, artificial intelligence has found its way into the practice of allergy-immunology. Here, we summarize recent developments, emerging applications and obstacles to realizing its full potential. RECENT FINDINGS Artificial/augmented intelligence and machine learning are being used to reduce dimensional complexity, understand cellular interactions and advance vaccine work in the basic sciences. In genomics, bioinformatic methods are critical for variant calling and classification. For clinical work, artificial intelligence is enabling disease detection, risk profiling and decision support. These approaches are just beginning to have an impact on the field of clinical immunology, and much opportunity exists for further advancement. SUMMARY This review highlights the use of computational methods for analysis of large datasets across the spectrum of research and clinical care for patients with immunological disorders. Here, we discuss how big data methods are presently being used across the field of clinical immunology.
11
Gillies CE, Taylor DF, Cummings BC, Ansari S, Islim F, Kronick SL, Medlin RP, Ward KR. Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution. J Biomed Inform 2020; 110:103528. [PMID: 32795506 DOI: 10.1016/j.jbi.2020.103528] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/17/2020] [Revised: 05/20/2020] [Accepted: 08/03/2020] [Indexed: 01/04/2023]
Abstract
When using tree-based methods to develop predictive analytics and early warning systems for preventive healthcare, it is important to use an appropriate imputation method to prevent learning the missingness pattern. To demonstrate this, we developed a novel simulation that generated synthetic electronic health record data using a variational autoencoder with a custom loss function, which took into account the high missing rate of electronic health data. We showed that when tree-based methods learn missingness patterns (correlated with adverse events) in electronic health record data, this leads to decreased performance if the system is used in a new setting that has different missingness patterns. In this scenario, performance is worst when the difference in missing rate between patients with and without an adverse event is greatest. We found that randomized and Bayesian regression imputation methods mitigate the issue of learning the missingness pattern for tree-based methods. We used this information to build a novel early warning system for predicting patient deterioration in general wards and telemetry units: PICTURE (Predicting Intensive Care Transfers and other UnfoReseen Events). To develop, tune, and test PICTURE, we used labs and vital signs from electronic health records of adult patients over four years (n = 133,089 encounters). We analyzed primary outcomes of unplanned intensive care unit transfer, emergency vasoactive medication administration, cardiac arrest, and death. We compared PICTURE with existing early warning systems and logistic regression at multiple levels of granularity.
When analyzing PICTURE on the testing set using all observations within a hospital encounter (event rate = 3.4%), PICTURE had an area under the receiver operating characteristic curve (AUROC) of 0.83 and an adjusted (event rate = 4%) area under the precision-recall curve (AUPR) of 0.27, while the next-best method tested, regularized logistic regression, had an AUROC of 0.80 and an adjusted AUPR of 0.22. To ensure system interpretability, we applied a state-of-the-art prediction explainer that provided a ranked list of features contributing most to the prediction. Though it is currently difficult to compare machine learning-based early warning systems, a rudimentary comparison with published scores demonstrated that PICTURE is on par with state-of-the-art machine learning systems. To facilitate more robust comparisons and development of early warning systems in the future, we have released our variational autoencoder's code and weights so researchers can (a) test their models on data similar to our institution's and (b) make their own synthetic datasets.
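The abstract's central point is that a model can learn *missingness* rather than physiology when missingness correlates with adverse events. The sketch below illustrates this with a lab value that carries no signal at all, assuming (for illustration only) that it is missing for 80% of events and 20% of non-events. Filling gaps with a constant sentinel, which a tree can split on, makes the feature look predictive; drawing imputations at random from the observed values removes that artifact. The random-draw imputer is a simple stand-in for the "randomized" imputation the authors mention, not their exact procedure, and AUROC is computed with the Mann-Whitney rank formulation. The functions `auroc` and `random_draw_impute` are hypothetical names.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; tied scores count as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    mid_ranks = np.cumsum(counts) - (counts - 1) / 2.0  # average rank per tie group
    ranks = mid_ranks[inverse]
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def random_draw_impute(x, rng):
    """Fill NaNs with values sampled (with replacement) from observed entries."""
    x = x.copy()
    missing = np.isnan(x)
    x[missing] = rng.choice(x[~missing], size=missing.sum(), replace=True)
    return x


rng = np.random.default_rng(0)
n = 20_000
event = rng.random(n) < 0.10
lab = rng.normal(0.0, 1.0, n)  # the lab value itself is independent of the event

# Assumed: the lab is missing for 80% of events but only 20% of non-events.
missing = rng.random(n) < np.where(event, 0.8, 0.2)
lab_obs = np.where(missing, np.nan, lab)

sentinel = np.where(missing, 999.0, lab_obs)  # constant fill a tree can split on
randomized = random_draw_impute(lab_obs, rng)

print(f"AUROC, sentinel fill:      {auroc(sentinel, event):.2f}")
print(f"AUROC, random-draw impute: {auroc(randomized, event):.2f}")
```

In this setup the sentinel-filled feature reaches an AUROC near 0.8 purely from the missingness pattern, a signal that would shift or vanish at a site with different missingness rates, while the randomly imputed feature stays near 0.5. The adjusted AUPR reported above is not reproduced here.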
Affiliation(s)
- Christopher E Gillies
  - Department of Emergency Medicine, United States
  - Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
  - Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, United States
- Daniel F Taylor
  - Department of Emergency Medicine, United States
  - Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Brandon C Cummings
  - Department of Emergency Medicine, United States
  - Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Sardar Ansari
  - Department of Emergency Medicine, United States
  - Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Fadi Islim
  - School of Nursing, United States
  - Michigan Dialysis Services, Canton, MI, United States
  - Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Steven L Kronick
  - Department of Emergency Medicine, United States
  - Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Richard P Medlin
  - Department of Emergency Medicine, United States
  - Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Kevin R Ward
  - Department of Emergency Medicine, United States
  - Department of Biomedical Engineering, United States
  - Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
  - Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, United States